ipramsflt

This program is used to filter and process a Reuters AMS/SelectFeed plus stream.

a.1 Remove Dow Jones duplicates
    - check StoryDT
    - save ALL Headlines in another structure
    - check Headline
    - reload on startup
    - purge according to DropDate/Time AND PNAC
a.2 Chr translations using a mapping table in the parameter file for quotes etc
a.3 Mangle location for Agency/Area/City/Dateline/By/Email
a.4 PreClassification of Headings - ANALYSIS, UPDATE etc according to Language
a.5 -not done - PreClass of Topics/Products according to ben's algo.
a.6 (mar06) new means of locating the City
a.7 Total number of Elements, Words and Chrs - was done in ipramstxt (BX, BW, BL) but NOT now as we need NEP data

b. The following items have already been done by ipramstxt.
b.1 Aggregate Story elements
b.2 NONO - Total number of Elements, Words and Chrs - done in ipramstxt (BX, BW, BL)
b.3 De-duplicate the incoming data of redundant/already sent elements.

Keywords for RamsFilter parameter file are :
    ; comment line
    dest: Destination (as in sys/USERS)
    extra-fiphdr: more fixed Fip Hdr fields to add to the file (before any new matched additions)
    script: script to run against the New file.
        default: none
    newname: name of the output file.
    preclass-headlines:RamsFlt.head-preclass.fip
        If we want to preclass headlines, this is a language/keywords file.
        The syntax is
            ; comment line
            (lang) (spc or tabs) (kwd)
            EN ANALYSIS
        The entries should/must be in ascending sort order.
    text-in-fiphdr:(max size)
        Put the text into FipHdr fields of 1000 chrs each for K1-K5
        Use 'hdrchr' to map end-of-lines like CR NL to SPC and quotes etc
    hdrchr:(octal/dec/hex number):(FipSeq chr)
        Replace this character with the new Chr
        This can be a printable chr or an escaped number.
        The number is octal/dec/hex depending on the preceding 'number' keyword (if any).
        eg  hdrchr:\313:�
            hdrchr:<:\333
    process-headlines:(yes/no)
        for Rics, Numbers, Genres etc
        default is yes
    redundant-primary-host: (hostname)
    redundant-timeout: (seconds)
        For redundant running, this is the primary host
        If ramsflt is running on the secondary :
            If the primary is up and running, files will be deleted after 60 seconds (or the 'redundant-timeout').
            If the primary is down, files are passed to the next process.
    balance-group: (group)
        balance group for the DJ-duplicates file to send to the redundant system
    redun-sys: (host1, host2)
    tph-codes: (filename)
        This has the format of :
            (country), (opt city), (+tphno), gibberish
            United Arab Emirates,,+971,,
            United Kingdom,London,+44 20,+44 (0)20,
            United Kingdom,Edinburgh,+44 131,+44 (0)131,
    desk: (source) (desk)
    stop-desk: (desk name)
    allow-unicode-in-fiphdrs:yes/no
        contents of BA, BB, BC, BD and BZ
        NO - should be plain ascii only (default)
        YES - allow unicode
    source-reuters-flag: (FipSeq)
        Check against the NAOMI_SOURCE_CODES.IP file to see if the Attribution or Source Code is RTRS
        default: just RTRS
            ; ,CRF,Thomson Reuters Commodities and Research Forecasts,Y,Y
            ; ,CSE,Copenhagen Stock Exchange,,
            lookup:Q1 TU file=NAOMI_SOURCE_CODES.FIP key=2 sep=, value=4
            source-reuters-flag:\Q1

Where sections of FipHdr fields are required, or changes to the output style are needed, use keywords : fixed, partial, combie, optional, repeat, newdate and/or style. (see the SysAdmin manual for more information).
They are normally specified :
    fixed:QZ 1234543
    partial:QT ST,3,2,U,<,>
    combie:QY ep|na,(0000000)a
    option:QE ep,11,7,s
    repeat:QK XK,-,3 or repeat:QP PK,,4,#X
    style:QS XN,%.03d

Input Parameters are (all optional) :
  Either
    -1 : path/filename for single shot              default: spooled
         The input file is NOT deleted
         If this does NOT start with a '/', it is assumed relative to the current path.
  Or
    -A : do NOT calculate BX, BW, BL                default: do!
    -i : input queue                                default: spool/ramsfilter
         If this does NOT start with a '/', it is assumed under spool.
    -C : Log interval                               default: 600 (for 10 mins)
         set to 0 for no logging
    -z : default parameter file in tables/setup     default: tables/setup/RamsFilter
    -w : file wait for files arriving across a network.   def: 8 secs
    -l : log files in                               default: log only totals
    -L : log files in                               default: log only totals
    -o : output queue                               default: spool/2meta_nfcp
         If this does NOT start with a '/', it is assumed under spool.
    -v : print version number and exit

---NOTES---

FipHdr fields used are :
    BA  Agency Name from Dateline
    BB  Byline
    BC  City from Dateline
    BD  Date from Dateline
    BH  Preclassified Headline Keyword
    BL  Byte/Chr count (now in ipramsflt - was from ipramstxt or iprdfmeta)
    BM  Email from signoff
    BN  Numbers in the Headline
    BR  Rics in the Headline
    BW  Word count (now in ipramsflt - was from ipramstxt or iprdfmeta)
    BX  No of Text Segments (now in ipramsflt - was from ipramstxt or iprdfmeta)
    BY  Preclassification of Topics/Products - not used for the moment
    BZ  Area from Dateline

  if the text-in-fiphdr: is set
    K1: First 900 chrs
    K2: Second 900 chrs
    K3: Third 900 chrs
    K4: Fourth chunk of chrs - normally around 500 chrs
        etc up to value of TextInFipHdr (3500 by default)
    K9: Optional last 1000 chrs ONLY if the text size is > 3200/TextInFipHdr

Version Control
;02r 20nov13 cleanups
    ;c added K9 as LAST block of text for big files
    ;d-k bugs.. totSubbo upped to 40
    ;l-q 24jun15 added BX,BW,BL (counts for eles, words, chrs) as rdfmeta has no data (now in NEP)
;01d 3mar06 added new DateLine/Source filters
;00u 30dec02 original version
    ;l 29apr03 added extra-fiphdr + process-headline + digits 0-9 only plus 9-9
    ;m 16jun03 cleanups
    ;n-p 21jul03 added new Dateline processing; only first 8 lines, zap punct from City etc
        added redundancy and balance
    ;r-t 16sep03 added redun-sys
    ;u 20oct03 added moodys and s&p datelines

(copyright) 2024 and previous years FingerPost Ltd.
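
Example: reading a tph-codes table

The tph-codes format described above (country, optional city, +tphno, then a free-form fourth field) can be read with a few lines of Python. This is only a sketch and not how ipramsflt itself is implemented: the names TphCode, parse_tph_codes and match_prefix are hypothetical, skipping ';' comment lines is an assumption borrowed from the other parameter files, and choosing the longest matching prefix is an assumed lookup rule.

```python
import csv
from typing import Iterable, List, NamedTuple, Optional


class TphCode(NamedTuple):
    country: str
    city: str    # empty string when the optional city is absent
    prefix: str  # the (+tphno) field, e.g. "+44 20"
    extra: str   # fourth field, kept verbatim


def parse_tph_codes(lines: Iterable[str]) -> List[TphCode]:
    """Parse comma-separated tph-codes lines into records."""
    records = []
    for row in csv.reader(lines):
        fields = [f.strip() for f in row]
        # Skip blank lines; skipping ';' comment lines is an assumption.
        if not any(fields) or fields[0].startswith(";"):
            continue
        fields += ["", "", "", ""]  # pad short rows
        records.append(TphCode(*fields[:4]))
    return records


def match_prefix(records: List[TphCode], number: str) -> Optional[TphCode]:
    """Return the record with the longest prefix that starts `number`."""
    best = None
    for rec in records:
        if rec.prefix and number.startswith(rec.prefix):
            if best is None or len(rec.prefix) > len(best.prefix):
                best = rec
    return best


# The three sample lines from the tph-codes description.
SAMPLE = [
    "United Arab Emirates,,+971,,",
    "United Kingdom,London,+44 20,+44 (0)20,",
    "United Kingdom,Edinburgh,+44 131,+44 (0)131,",
]
```

With the sample lines, parse_tph_codes(SAMPLE) gives three records, and match_prefix on a number starting "+44 20" picks the London entry.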