ipramsflt
This program is used to filter and process a Reuters AMS/SelectFeed plus
stream.
a.1 Remove Dow Jones duplicates
- check StoryDT
- save ALL Headlines in another structure
- check Headline
- reload on startup
- purge according to DropDate/Time AND PNAC
a.2 Chr translations using a mapping table in the parameter file for quotes etc
a.3 mangle location for Agency/Area/City/Dateline/By/Email
a.4 PreClassification of Headings - ANALYSIS, UPDATE etc according to Language
a.5 -not done - PreClass of Topics/Products according to ben's algo.
a.6 (mar06) new means of locating the City
a.7 Total number of Elements, Words and Chrs - was done in ipramstxt (BX, BW,
BL) but NOT now as we need NEP data
b. The following items have already been done by ipramstxt.
b.1 Aggregate Story elements
b.2 NONO - Total number of Elements, Words and Chrs - done in ipramstxt (BX,
BW, BL)
b.3 De-duplicate the incoming data of redundant/already sent elements.
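The duplicate check in a.1 could be sketched roughly as below. This is only an illustration, not the program's actual logic: the class name HeadlineCache, the use of the normalised headline as the key and the single retention window are all assumptions. The real program also takes DropDate/Time and PNAC into account when purging, and reloads its list on startup.

```python
from datetime import datetime, timedelta


class HeadlineCache:
    """Hypothetical store of already-seen headlines (not the real ipramsflt structure)."""

    def __init__(self, keep_for: timedelta):
        self.keep_for = keep_for
        self.seen = {}  # normalised headline -> StoryDT when first seen

    @staticmethod
    def normalise(headline: str) -> str:
        # Assumption: duplicates differ only in case and spacing.
        return " ".join(headline.upper().split())

    def is_duplicate(self, headline: str, story_dt: datetime) -> bool:
        """Return True if the headline was already seen; otherwise remember it."""
        key = self.normalise(headline)
        if key in self.seen:
            return True
        self.seen[key] = story_dt
        return False

    def purge(self, now: datetime) -> int:
        """Drop entries older than the retention window; return how many were removed."""
        stale = [k for k, dt in self.seen.items() if now - dt > self.keep_for]
        for k in stale:
            del self.seen[k]
        return len(stale)
```

A story whose headline is already in the cache would be dropped; purge() would be called periodically so the cache does not grow without limit.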
Keywords for RamsFilter parameter file are :
; comment line
dest: Destination (as in sys/USERS)
extra-fiphdr: more fixed Fip Hdr fields to add to the file (before any new
matched additions)
script: script to run against the New file. default: none
newname: name of the output file.
preclass-headlines:RamsFlt.head-preclass.fip
If headlines are to be preclassified, this names a file of language/keyword pairs.
The syntax is:
; comment line
(lang) (spc or tabs) (kwd)
EN ANALYSIS
The entries must be in ascending sort order.
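As an illustration of why the sort order matters, the file can be loaded into a list and searched with a binary search. This is a sketch only: the function names are hypothetical, and matching on the leading word of the headline is an assumption (the text above does not say how a keyword is matched against a headline).

```python
import re
from bisect import bisect_left


def load_preclass(lines):
    """Parse '(lang) (spaces or tabs) (keyword)' lines, skipping ';' comments and blanks."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            entries.append((parts[0], parts[1].strip()))
    return entries  # relied on to be in ascending order already, as the file requires


def has_keyword(entries, lang, keyword):
    """Binary search for (lang, keyword); only valid if entries are sorted."""
    target = (lang, keyword)
    i = bisect_left(entries, target)
    return i < len(entries) and entries[i] == target


def preclass_headline(entries, lang, headline):
    """Assumption: the keyword is the leading word, as in 'ANALYSIS-...' or 'UPDATE 1-...'."""
    m = re.match(r"[A-Za-z]+", headline.strip())
    if not m:
        return None
    word = m.group(0).upper()
    return word if has_keyword(entries, lang, word) else None
```

The returned keyword is what would end up in the BH field.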
text-in-fiphdr:(max size)
Put the text into FipHdr fields of 1000 chrs each for K1-K5
Use 'hdrchr' to map end-of-lines like CR NL to SPC and quotes etc
hdrchr:(octal/dec/hex number):(FipSeq chr)
Replace this character with the new Chr
This can be a printable chr or an escaped number. The number is
octal/dec/hex depending on the preceding 'number' keyword (if any).
eg hdrchr:\313:�
hdrchr:<:\333
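The effect of a set of hdrchr lines can be pictured as a character translation table. The sketch below is illustrative only; the function names are hypothetical and it assumes every backslash escape is octal, as in the examples above (the real program also allows decimal and hex).

```python
def parse_chr(token: str) -> int:
    """Return the code point for a printable chr or a backslash-escaped octal number."""
    if token.startswith("\\") and len(token) > 1:
        return int(token[1:], 8)  # assumption: octal only
    if len(token) != 1:
        raise ValueError(f"expected a single chr or an escape, got {token!r}")
    return ord(token)


def load_hdrchr(lines):
    """Build a translation table from 'hdrchr:(from):(to)' lines; other lines are ignored."""
    table = {}
    for line in lines:
        keyword, _, rest = line.strip().partition(":")
        if keyword != "hdrchr":
            continue
        src, _, dst = rest.partition(":")
        table[parse_chr(src)] = chr(parse_chr(dst))
    return table


def apply_hdrchr(text: str, table) -> str:
    """Replace every mapped character; unmapped characters pass through unchanged."""
    return text.translate(table)
```

For example, a line mapping \012 to \040 would turn each NL in the text into a SPC before the text is placed in the FipHdr.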
process-headlines:(yes/no) for Rics, Numbers Genres etc
default is yes
redundant-primary-host: (hostname)
redundant-timeout: (seconds)
For redundant running, this is the primary host
If ramsflt is running on the secondary :
If the primary is up and running, files will be deleted after
60 seconds (or the 'redundant-timeout').
If the primary is down, files are passed to the next process.
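The behaviour on the secondary can be summarised as a small decision function. This is a sketch under stated assumptions: the function name and the 'hold' state are hypothetical, and how the program actually detects that the primary is up is not described here, so it is passed in as a flag.

```python
def secondary_action(primary_up: bool, file_age_secs: float,
                     timeout_secs: float = 60.0) -> str:
    """Return 'pass', 'delete' or 'hold' for one queued file on the secondary host."""
    if not primary_up:
        # Primary is down: the secondary takes over and forwards the file.
        return "pass"
    if file_age_secs >= timeout_secs:
        # Primary is up and has had its chance: discard the secondary's copy.
        return "delete"
    # Primary is up but the timeout has not expired yet: keep the file for now.
    return "hold"
```

With redundant-timeout left at its default, a file on the secondary is held for up to 60 seconds and is only forwarded if the primary goes down in that time.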
balance-group: (group)
balance group for the DJ-duplicates file to send to the redundant system
redun-sys: (host1, host2)
tph-codes: (filename)
This has the format of :
(country), (opt city), (+tphno), gibberish
United Arab Emirates,,+971,,
United Kingdom,London,+44 20,+44 (0)20,
United Kingdom,Edinburgh,+44 131,+44 (0)131,
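A file in this format can be read as plain comma-separated text. The sketch below is illustrative only: the names TphCode, load_tph_codes and match_tph are hypothetical, and the longest-prefix match on digits is an assumption about how a phone number might be tied to a country and city, not a description of what ipramsflt does.

```python
import csv
from typing import NamedTuple


class TphCode(NamedTuple):
    country: str
    city: str    # empty when the entry covers the whole country
    prefix: str  # e.g. '+44 20'


def load_tph_codes(lines):
    """Read '(country),(opt city),(+tphno),...' lines; trailing fields are ignored."""
    codes = []
    for row in csv.reader(lines):
        if len(row) < 3 or not row[2].strip().startswith("+"):
            continue  # skip comments, blanks and malformed lines
        codes.append(TphCode(row[0].strip(), row[1].strip(), row[2].strip()))
    return codes


def digits(s: str) -> str:
    """Keep only the digits so '+44 20' and '+44 (0)20'-style spacing do not matter."""
    return "".join(c for c in s if c.isdigit())


def match_tph(codes, phone):
    """Return the entry with the longest matching dialling prefix, or None."""
    number = digits(phone)
    hits = [c for c in codes if number.startswith(digits(c.prefix))]
    return max(hits, key=lambda c: len(digits(c.prefix)), default=None)
```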
desk: (source) (desk)
stop-desk: (desk name)
allow-unicode-in-fiphdrs:yes/no
contents of BA, BB, BC, BD and BZ
NO - should be plain ascii only (default)
YES - allow unicode
source-reuters-flag: (FipSeq)
Check against the NAOMI_SOURCE_CODES.IP file to see if the
Attribution or Source Code is RTRS default: just RTRS
; ,CRF,Thomson Reuters Commodities and Research Forecasts,Y,Y
; ,CSE,Copenhagen Stock Exchange,,
lookup:Q1 TU file=NAOMI_SOURCE_CODES.FIP key=2 sep=, value=4
source-reuters-flag:\Q1
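The lookup line above can be read as "split each line of the file on sep, use field 2 as the key and return field 4". A sketch of that reading follows; the function name is hypothetical, and the 1-based field numbering is an assumption drawn from the two sample lines (without their leading '; ').

```python
def build_lookup(lines, key=2, sep=",", value=4):
    """Map field `key` to field `value` (both 1-based, an assumption) for each line."""
    table = {}
    for line in lines:
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) >= max(key, value):
            table[fields[key - 1].strip()] = fields[value - 1].strip()
    return table


sample = [
    ",CRF,Thomson Reuters Commodities and Research Forecasts,Y,Y",
    ",CSE,Copenhagen Stock Exchange,,",
]
flags = build_lookup(sample)
```

On this reading, a story whose source code is CRF would get Q1 set to Y, while CSE would give an empty Q1.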
Where sections of FipHdr fields are required, or the output style needs changing,
use the keywords : fixed, partial, combie, optional, repeat, newdate and/or style
(see the SysAdmin manual for more information).
They are normally specified :
fixed:QZ 1234543
partial:QT ST,3,2,U,<,>
combie:QY ep|na,(0000000)a
option:QE ep,11,7,s
repeat:QK XK,-,3
or repeat:QP PK,,4,#X
style:QS XN,%.03d
Input Parameters are (all optional) :
Either
-1 : path/filename for single shot default: spooled
The input file is NOT deleted
If this does NOT start with a '/', it is assumed relative to the current
path.
Or
-A : do NOT calculate BX, BW, BL default: do!
-i : input queue default: spool/ramsfilter
If this does NOT start with a '/', it is assumed under spool.
-C : Log interval default: 600 (for 10 mins)
set to 0 for no logging
-z : default parameter file in tables/setup default: tables/setup/RamsFilter
-w : file wait for files arriving across a network. def: 8 secs
-l : log files in default: log only totals
-L : log files in default: log only totals
-o : output queue default: spool/2meta_nfcp
If this does NOT start with a '/', it is assumed under spool.
-v : print version number and exit
---NOTES---
FipHdr fields used are :
BA Agency Name from Dateline
BB Byline
BC City from Dateline
BD Date from Dateline
BH Preclassified Headline Keyword
BL Byte/Chr count (now in ipramsflt - was from ipramstxt or iprdfmeta)
BM Email from signoff
BN Numbers in the Headline
BR Rics in the Headline
BW Word count (now in ipramsflt - was from ipramstxt or iprdfmeta)
BX No of Text Segments (now in ipramsflt - was from ipramstxt or iprdfmeta)
BY Preclassification of Topics/Products - not used for the moment
BZ Area from Dateline
if the text-in-fiphdr: is set
K1: First 900 chrs
K2: Second 900 chrs
K3: Third 900 chrs
K4: Fourth chunk of chrs - normally around 500 chrs etc up to value of
TextInFipHdr (3500 by default)
K9: Optional last 1000 chrs ONLY if the text size is > 3200/TextInFipHdr
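The split of text into K fields can be sketched as below. This is an illustration under assumptions, not the program's code: the function name is hypothetical, a chunk size of 900 and a limit of 3500 are taken from the notes above, and K9 is assumed to be added whenever the text is longer than the limit.

```python
def text_to_fiphdr(text: str, limit: int = 3500, chunk: int = 900,
                   tail: int = 1000) -> dict:
    """Split text into K1, K2, ... of `chunk` chrs, stopping at `limit` chrs in total."""
    fields = {}
    head = text[:limit]
    for n, start in enumerate(range(0, len(head), chunk), start=1):
        fields[f"K{n}"] = head[start:start + chunk]
    if len(text) > limit:
        # Assumption: big files also carry their last `tail` chrs in K9.
        fields["K9"] = text[-tail:]
    return fields
```

With the defaults, a 4000 chr story gives K1 to K3 of 900 chrs each, a K4 of 800 chrs and a K9 holding the final 1000 chrs.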
Version Control
;02r 20nov13 cleanups
;c added K9 as LAST block of text for big files
;d-k bugs.. totSubbo upped to 40
;l-q 24jun15 added BX,BW,BL (counts for eles, words, chrs) as rdfmeta has no
data (now in NEP)
;01d 3mar06 added new DateLine/Source filters
;00u 30dec02 original version
;l 29apr03 added extra-fiphdr + process-headline
+ digits 0-9 only plus 9-9
;m 16jun03 cleanups
;n-p 21jul03 added new Dateline processing;
only first 8 lines, zap punct from City etc
added redundancy and balance
;r-t 16sep03 added redun-sys
;u 20oct03 added moodys and s&p datelines
(copyright) 2025 and previous years FingerPost Ltd.