iplookup

This program is used to add FipHdr fields matched against a key field in a
lookup file. The original header and text are left untouched.

The format of the lookup file is :
    (Search String) (separator) (NewField1) [optional sep] [opt new2]...(eoln)
One item per line.
Lines starting with a semicolon ';' are considered comments and are ignored.

Examples - a tab delimited file :
    ICI	ICI.EU	CHEMICALS	"Imperial Chemical Industries"
    ATT	ATT.US	TELECOMS	"A.T. & T."

or a CSV file as Lotus or Excel might generate :
    "Maserati","Italian","Brillo"
    "TVR","British","Knockout"

or a file of your own making - such as one with pipes as separators :
    747|Boeing|long distance|4 engines|commercial
    300|Airbus|short distance|2 engines|commercial

In each of these cases the program will attempt to match the content of a
specified FipHdr field with the FIRST field on the line.

Note that if you have a very big lookup file, processing is vastly sped up by
sorting it beforehand in ascending order and letting the program hash this
sorted file (see the 'sorted' keyword).
For Unix sort :
    sort -b -o sorted.file orig.file
or if the separator is a pipe '|' :
    sort -b -t\| -o sorted.file orig.file

In versions before 04c, the program read the lookup file once only and used
the copy in memory for processing, so if you changed the lookup file you had
to kill and restart the program to get it to read the new version. Current
versions check the lookup file for changes and reload it (see below).

The parameter file, which describes the lookup file and how the data is to be
matched against it, is held in tables/setup and defaults to 'LOOKUP'. This can
be overridden by the DY FipHdr field. As usual, the name of the parameter file
is forced to uppercase.

There is also the question of where to send the output file: by default it is
put in spool/2go for IPWHEEL to distribute, so it needs one or more
Destinations (the DU FipHdr field). This is set as follows :
    - If there is a DX FipHdr field in the input file, that is used.
    - If not, the 'dest' keyword in the parameter file is used.
    - If that is not specified either, it is sent to 'woops', the Intercept
      queue.
    - You may also specify it from the incoming data or attribute-data using
      the 'fixhdr' keyword. In this case the contents of DX, 'dest' or 'woops'
      will be the default if there is no data.

If using the Reuters MetaData Repository switch (-R), the lookup (but NOT
template) files are ignored and the data is added to the output file directly
(minus the NewsML tags).

Keywords for the LOOKUP parameter file are :

Mandatory:
    ;            comment line
    lookup:      file containing codes to match and any additional fields.
                 eg lookup:/data1/MATCHCODES
    match:(existing FipHdr field)
                 Match this FipHdr field with the lookup table entry.

    Optional subKeywords
    newfld:(Fip1, Fip2, Fip3)
                 one or more 2-letter FipHdr codes.
                 The additional fields on the lookup lines will be allocated
                 to these FipHdr names. By default these are L0 for the first
                 additional field, L1 for the second etc.
                 eg if the lookup file has 5 fields - the first being the
                 match -
                     newfld:AB,FF,AC,L6
                 will stuff field 2 in AB, 3 in FF, 4 in AC and 5 into L6.
    default:     Default value if NOT found (in FipSeq).
                 There is NO default default - the field is ignored.

    Up to 50 FipHdr fields may be matched.
    There is also a 'repeat-match' keyword which allows a single field
    containing multiple items to be broken automatically into zones, and EACH
    zone is matched in turn (see below).

Optional:
    sorted       The match field (first field) of the lookup file is in the
                 correct sort order.
                 default: no
    sep:         single-character field separator in the lookup file.
                 This defaults to any run of tabs/spaces.
    casesens:y   Match fields are case sensitive - normally NOT.
    fmt:csv      Comma separated lookup file - ie strip double quotes.
                 This does NOT affect the separator, which should be set if it
                 is NOT tabs/spaces.
                 default is space/tab separated
    outque:      output queue for the new file.
    dest:        Destination (as in sys/USERS)
    extra-fiphdr: or fixhdr:
                 more fixed FipHdr fields to add to the file (before any new
                 matched additions)
    script:      script to run against the new file.
                 default: none
    newname:     name of the output file.
    template:    Use the template and fill in with the new FipHdr values.
    repeat-sep:  Separator for repeat-match fields
                 default: '+'
    log-line:    Substitute log line.

    REUTERS-HEADLINE:KK,M9
    REUTERS-SLUGLINE:MD,M8
    REUTERS-LANGUAGE:KF
    REUTERS-PRIORITY:KH
    REUTERS-GET-XML:yes          for XXnews
    reuters-topic-lookup:\\R8
    reuters-ccs-lookup:\\R6
    REUTERS-iso-LC-file:mrm.language.fip
                 This is read once on startup and should be in the default
                 parameter file only.
                 The syntax is LANG-VARIANT-Duid (NL)
                 eg en-GB-T0024959592

The Genre processing uses FipHdr fields J0 to J9 plus
    R9 for the Genre Duid
    R8 for the Language Duid
    R0 for the Language Variant
    R6 for the list of paths/topics

For Reuters PNAC processing, the relevant FipHdr fields need :
    reuters-sq-topics:KB
    reuters-sq-priority:KF
    reuters-sq-filed:WK
    reuters-sq-source:QR
    combie:QR M3|KZ|KV,RTRS
    ;move these from Topic to Products
    reuters-rnp:DF XRNP
    ; NTM teXt or Table FipHdr to add a <PRE>
    reuters-TX-fiphdr:KE

Where only sections of FipHdr fields are required, or the output style needs
changing, use the keywords : fixed, partial, combie, optional, repeat,
newdate, unique and/or style (see the SysAdmin manual for more information).
They are normally specified as :
    fixed:QZ 1234543
    partial:QT ST,3,2,U,<,>
    combie:QY ep|na,(0000000)a
    option:QE ep,11,7,s
    repeat:QK XK,-,3
      or repeat:QP PK,,4,#X
    style:QS XN,%.03d
    unique:QT XT

------------------------------------------------------
Example LOOKUP file :

; Codes file is in MATCHCODES
lookup:/fip/tables/setup/MATCHCODES
; field sep in MATCHCODES is a pipe
sep:|
; incoming FipHdr field is in the format :
;SH::LP:N0297:VZ:H-----:PU:CF:TFXJ.AU-PBL.AU-BRY.NZ:KHeavy Selling in Australian Publishing
; Break up the XT Fip Source Header field into 3 fields called Q1,Q2,Q3 divided by a hyphen
; note that Q1,Q2,Q3 are for internal use only - they do NOT get created in the output file.
repeat:Q1 XT,-,1
repeat:Q2 XT,-,2
repeat:Q3 XT,-,3
; match each one of these against the MATCHCODES table and create (up to) 4 new header fields.
match:Q1
newfld:A1,A2,A3,A4
match:Q2
newfld:B1,B2,B3,B4
match:Q3
newfld:C1,C2,C3,C4
--
The MATCHCODES file in this case could look like :
; with a sep of PIPE, first field is the key, a newline finishes each line
; the file should be sorted on the first field.
FXJ.AU|AU000000FXJ5|6467074|FAIRFAX(JOHN)|AU|PUB
PBL.AU|AU000000PBL6|6637082|PUBLISHING & BROAD|AU|PUB
------------------------------------------------------

Notes
-----
Use repeat-match for cases where the input field looks something like :
    WT:ASIA EMRG IN IND AUT MAC RES
To use :
    ; make sure the codes are unique and separated by a single plus sign
    unique:AT WT
    ; go get a match for each one ! - rptfld holds the FipHdr containing just this single search.
    repeat-match:AT
    newfld:W2,W3,W4
    rptfld:W1

You then have to define what to do with the output.
    - If you do nothing, then there will be multiple W1, W2 etc in the FipHdr
      BUT only the last one will be accessible!
    - If you are using templates, then a new template is generated for each
      match. Each template is appended to the last for the output file.
------------------------------
Normally 'iplookup' only adds new FipHdrs and the data or text of the file is
not changed at all. There are times, however, when you wish to use the new
lookup headers immediately.
    V1 is the index to our table - which provides uniqueness inside the file
       for Duids and other ids
    V2 is the original search key
    V3... is the first new field;
       if there are any more fields, they are V4, V5 etc
text-template: (file in tables/setup)
    <rn2wEntity><Category Type="COMPANY" Resolve="tf_entity" Alias="\V1" SearchCount="1"><SearchResult IdRef="\V3"/></Category>\V2</Entity>
newV1: (FipHdr to use in place of V1, V2 and V3 if they are used elsewhere)
------------------------------
In the current version only ONE lookup file can be searched per file. To
search more, you need to run the file through the program twice, hopefully
against different parameter files!
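The Example LOOKUP file and MATCHCODES table shown earlier can be traced by hand with a minimal Python sketch. It is an illustration under stated assumptions, not iplookup's own code: the function names load_lookup, repeat_zone and match_key are hypothetical, the XT value FXJ.AU-PBL.AU-BRY.NZ is assumed from the sample header, the 'default:' keyword and 'fmt:csv' quote stripping are not modelled, and any lookup fields beyond the newfld: names are simply dropped.

```python
# Hypothetical sketch of the repeat / match / newfld steps; NOT iplookup's source.
# All function names here are illustrative only.


def load_lookup(lines, sep=None, case_sens=False):
    """Map each key (first field) to the list of its remaining fields."""
    table = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # Blank lines and ';' comment lines are ignored.
        if not line.strip() or line.startswith(";"):
            continue
        # sep=None splits on any run of tabs/spaces, like the default 'sep:'.
        fields = line.split(sep) if sep else line.split()
        key = fields[0] if case_sens else fields[0].upper()
        table[key] = fields[1:]
    return table


def repeat_zone(value, sep, n):
    """Zone n (1-based) of value, as in 'repeat:Q1 XT,-,1'; '' if absent."""
    zones = value.split(sep)
    return zones[n - 1] if 0 < n <= len(zones) else ""


def match_key(table, search, newfld=None, case_sens=False):
    """Return {FipHdr name: value} for a match, or {} if none is found."""
    key = search if case_sens else search.upper()
    extras = table.get(key)
    if extras is None:
        return {}  # no match: nothing added ('default:' is not modelled here)
    # Without newfld:, the extra fields go to L0, L1, ... in order.
    names = newfld or [f"L{i}" for i in range(len(extras))]
    # zip() stops at the shorter list, so surplus fields are dropped (assumption).
    return dict(zip(names, extras))


MATCHCODES = """\
; with a sep of PIPE, first field is the key, a newline finishes each line
FXJ.AU|AU000000FXJ5|6467074|FAIRFAX(JOHN)|AU|PUB
PBL.AU|AU000000PBL6|6637082|PUBLISHING & BROAD|AU|PUB
""".splitlines()

table = load_lookup(MATCHCODES, sep="|")
xt = "FXJ.AU-PBL.AU-BRY.NZ"  # assumed value of the XT FipHdr field

new_hdrs = {}
for zone, prefix in ((1, "A"), (2, "B"), (3, "C")):
    search = repeat_zone(xt, "-", zone)  # Q1, Q2, Q3 (internal only)
    names = [f"{prefix}{i}" for i in range(1, 5)]  # newfld:A1,A2,A3,A4 etc
    new_hdrs.update(match_key(table, search, names))

for name, value in new_hdrs.items():
    print(f"{name}:{value}")
```

Run as-is, it prints A1 to A4 and B1 to B4; no C fields are created because BRY.NZ has no line in the table and no 'default:' is set. Each run works against a single table, mirroring the one-lookup-file-per-pass limit described just above.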
In the first parameter file, use 'outque' and 'extra-fiphdr' to loop :
    ; stuff it back into 2lookup afterwards
    outque:2lookup
    ; use the 2nd parameter file /fip/tables/setup/PARAM2
    extra-fiphdr:#DY:param2

OR, if there are two very big lookups, you will not want to read and hash
each lookup for every file coming through - so if sub-second speed is
important, use two 'iplookups' in the SYSTEM file and use either the '-o'
switch or the 'outque' keyword to pass the files from one to the other.
    look1 local iplookup -i 2lookup -o 2lookup2 -z param1
    look2 local iplookup -i 2lookup2 -z param2
------------------------------
The parameter file is read every time a file is processed. The lookup file,
however, is only reloaded if it is not the same file as for the last request
or if it has changed since it was loaded.
Note this changed in version 04c; before that, a change to the lookup file
needed a stop and restart of iplookup to load the new version.
------------------------------
There are extra parameters for MRM v2 :
    reuters-check-headline:K5
        This will also check the FipHdr field K5 and replace any RICs with
        Entity tags. Any '<' and '>' characters are mapped to 036 and 037 for
        mapping back in a subsequent xchg.
    reuters-RICs-in-text:U9
        Split CompanyIds/RICs into the original FipHdr field plus U9 for any
        that appear in the text.
    reuters-ignore-RICS-in-text:B9
        If this FipHdr field is NON-blank, do NOT look for RICs in the text.
------------------------------
Input Parameters are (all optional) :
Either
    -1 : path/filename for single shot              default: spooled
         The input file is NOT deleted.
         If this does NOT start with a '/', it is assumed relative to the
         current path.
Or
    -i : input queue                                default: spool/2lookup
         If this does NOT start with a '/', it is assumed under spool.
    -D : time in seconds to sweep the queue         default: not enabled
         Use this to batch and delay files, eg for 20 mins : -D 1200
    -h : show results from Reuters MRM              default: no
    -H : display ONLY the new FipHdr fields
         The default is a complete file with FipHdr and data.
         This is of most use with the '-1' single file switch.
    -l : do NOT log incoming files                  default: log
    -o : output queue                               default: spool/2go
         If this does NOT start with a '/', it is assumed under spool.
         This is overridden by the 'outque' or 'outdest' parameters if they
         exist.
    -r : Reuters Duid checking for internal NewsML feed
                                                    default: no
    -R : use the Reuters Metadata Repository for the lookup
                                                    default: no
    -C : do NOT use Reuters CCS codes (ie old variants)
                                                    default: CCS
    -w : wait time for files arriving across a network
                                                    default: 8 secs
    -z : default parameter file in tables/setup     default: tables/setup/LOOKUP
    -v : print version number and exit

Version Control
;08r 10dec04 Roy-mrmCache added
;f 07feb05 added NLS/Genre Cache too
;g-i 21feb05, Roy - ignore language variants
;j 09mar05 Roy added table-pre
;k-o 21mar05 Roy buglettes
;p-q 15jul05 xxnews - added REUTERS-GET-XML
;07z7 05may04 N2Wversion4
;b-i 01jun04 woops - only for .ULs
;j-l 28jul04 Roy - leg/CCS dual Rics and NSC codes too for -r/ccs
;m-r 04aug04 for RFC 46 [*] (p for dualric bug) (q speedy) (r-duid<002)
;s 18oct04 AVANT-PAPIER catered for plus FEATURE
;t-u 20oct04 ADVISORY and allow 4 x head and 5 x slug genres
;v-z7 27oct04 rfc87/46/110 work
;06w 13aug03 bugette in RTR Genre
;c upped Rics 300->500
;d 06oct03 reworked Duid bit for EITHER n2000 OR rrr.
;e 31oct03 timings
;f-h 21nov03 v3.0 RICs - markup NoCoys and ONLY markup the RIC not the name
;i 09dec03 added FIP_maxFipHdrSize = (4*STDBUF) - 1;
;j 17dec03 allow multiple TOPICS for a single N2000 code inbound
;k 05feb04 bug in non-mrm version - double FipHdrs
;l-n 01mar04 RTR added UnlistedRics and made Genre generic
;o-q 12mar04 added -h and reuters-iso.lc-tags (q v4 mrm .h)
;r-w 26mar04 bugette in Genre-Headline and UTF8
;05z 11apr03 added Dual Rics for RTR
;b-g 26may03 added Genre
;h-k 17jun03 added reuters-check-alertline plus & in Genre plus Features plus reuters-priority
;l-n 23jun03 NNG variant ie V3
;o-p 04jul03 bugette - missing last number.in headline plus if UPDATE with no number - default is 1
;q 12jul03 CoInst chg to APARTMENTTHREADED
;r-u 16jul03 bugette - ignoring 1st Co Duid plus redid rn2wcats plus bug in Genre
;v-z 31jul03 **see notes of Nathan's changes (y-FEATURE bugette)
;04z 29nov01 added COM support for Reuters Metadata Repository
;b 23jan02 added text-template
;c/d/e 05apr02 added check time/size of lookup file
;f Reuters-lookup NOT be uppercase
;g 29aug02 bug for Reuters-lookup picking up bad xml
;h-m 17oct02 Reuters MRM version 2 (j=added headline and bug if no data)
;n-p 09dec02 Tuned routine for finding Reuters MRM name
;r-s 14feb03 MRMv2 build 58/59 mods for RICs
;t-z 11mar03 if FipHdr B9 is TOP - ignore Rics-in-text
;03 29oct01 added repeat-match: and template

(copyright) 2021 and previous years FingerPost Ltd.