ipramsflt (Sat Oct 25 2014 01:31:01)

ipramsflt

This program filters and processes a Reuters AMS/SelectFeed Plus
stream.

a.1 Remove Dow Jones duplicates
	- check StoryDT
	- save ALL Headlines in another structure
	- check Headline
	- reload on startup
	- purge according to DropDate/Time AND PNAC

a.2 Chr translations (quotes etc) using a mapping table in the parameter file

a.3 mangle location for Agency/Area/City/Dateline/By/Email

a.4 PreClassification of Headlines - ANALYSIS, UPDATE etc according to Language

a.5 (not done) PreClassification of Topics/Products according to Ben's algorithm.

a.6 (mar06) new means of locating the City
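The duplicate check in a.1 can be pictured as a small store of every headline seen. The Python below is a minimal sketch under stated assumptions, not ipramsflt's own code: the names (DjDupStore, SeenStory, drop_at, keep_for) are hypothetical, a story is treated as a duplicate when both StoryDT and Headline match, and JSON stands in for whatever file the program really reloads on startup.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import List, Optional


@dataclass
class SeenStory:
    pnac: str       # Reuters PNAC of the story
    story_dt: str   # StoryDT exactly as received
    headline: str
    drop_at: str    # ISO timestamp after which the entry may be purged


class DjDupStore:
    """Hypothetical store of ALL headlines seen, used to spot Dow Jones duplicates."""

    def __init__(self, path: Optional[Path] = None):
        self.path = path
        self.seen: List[SeenStory] = []
        if path is not None and path.exists():  # "reload on startup"
            self.seen = [SeenStory(**d) for d in json.loads(path.read_text())]

    @staticmethod
    def _norm(headline: str) -> str:
        # Collapse whitespace and ignore case when comparing headlines.
        return " ".join(headline.split()).lower()

    def is_duplicate(self, story_dt: str, headline: str) -> bool:
        # Assumption: a duplicate has the same StoryDT AND the same Headline.
        key = self._norm(headline)
        return any(s.story_dt == story_dt and self._norm(s.headline) == key
                   for s in self.seen)

    def remember(self, pnac: str, story_dt: str, headline: str,
                 now: datetime, keep_for: timedelta = timedelta(hours=24)) -> None:
        self.seen.append(SeenStory(pnac, story_dt, headline,
                                   (now + keep_for).isoformat()))

    def purge(self, now: datetime, pnac: Optional[str] = None) -> None:
        # Keep entries that have not expired and do not belong to `pnac`.
        self.seen = [s for s in self.seen
                     if datetime.fromisoformat(s.drop_at) > now and s.pnac != pnac]

    def save(self) -> None:
        if self.path is not None:
            self.path.write_text(json.dumps([asdict(s) for s in self.seen]))
```

Purging by PNAC as well as by drop time lets a whole story thread be forgotten at once; whether ipramsflt combines the two in this way is an assumption.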


b. The following items have already been done by ipramstxt.

b.1 Aggregate Story elements

b.2 Total number of Elements, Words and Chrs - done in ipramstxt (BX, BW, BL)

b.3 De-duplicate the incoming data by removing redundant/already-sent elements.

Keywords for the RamsFilter parameter file are :

	; comment line
	dest:	Destination (as in sys/USERS)
	extra-fiphdr:	more fixed FipHdr fields to add to the file (before any new matched additions)
	script:	script to run against the New file.		default: none
	newname: name of the output file.

	preclass-headlines:RamsFlt.head-preclass.fip
		If headlines are to be preclassified, this is the file of language/keyword pairs.
		The syntax is	; comment line
				(lang) (spc or tabs) (kwd)
				EN ANALYSIS
		The entries must be in ascending sort order.
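A preclass-headlines file in this format could be loaded and searched as below. This is a sketch, assuming that a headline matches when its leading word equals a keyword for the story's language (as in Reuters-style "ANALYSIS-..." or "UPDATE 1-..." headlines); the function names are hypothetical. The ascending sort order is what makes a binary search (Python's bisect) possible, so the loader rejects an unsorted file.

```python
import bisect
import re
from typing import List, Optional, Tuple


def load_preclass(text: str) -> List[Tuple[str, str]]:
    """Parse '(lang) (spc or tabs) (kwd)' lines, skipping ';' comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        lang, kwd = line.split(None, 1)
        pairs.append((lang.upper(), kwd.strip().upper()))
    # The binary search below relies on the file already being sorted.
    if pairs != sorted(pairs):
        raise ValueError("preclass-headlines entries are not in ascending order")
    return pairs


def preclass_headline(pairs: List[Tuple[str, str]], lang: str,
                      headline: str) -> Optional[str]:
    """Return the keyword (a candidate for BH) if the leading word matches."""
    m = re.match(r"\s*([A-Za-z]+)", headline)
    if not m:
        return None
    probe = (lang.upper(), m.group(1).upper())
    i = bisect.bisect_left(pairs, probe)
    if i < len(pairs) and pairs[i] == probe:
        return probe[1]
    return None
```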

	text-in-fiphdr:(max size)
		Put the text into FipHdr fields of 1000 chrs each for K1-K5
		Use 'hdrchr' to map end-of-lines like CR NL to SPC and quotes etc
	hdrchr:(octal/dec/hex number):(FipSeq chr)
		Replace this character with the new Chr
		This can be a printable chr or an escaped number. The number is
		octal/dec/hex depending on the preceding 'number' keyword (if any).
		eg	hdrchr:313:�
			hdrchr:<:333
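The hdrchr: rules amount to a character translation table. The sketch assumes that numbers are octal unless a 'number' keyword says otherwise (the examples 313 and 333 only make sense as octal), that the 'number' values are spelled octal/decimal/hex, and that a single non-digit character is taken literally; full FipSeq escapes are not handled. All names are hypothetical.

```python
from typing import Dict, Iterable

# Assumed spellings for the 'number' keyword.
BASES = {"octal": 8, "decimal": 10, "hex": 16}


def parse_chr(token: str, base: int) -> str:
    """A single non-digit character is literal; anything else is a number."""
    if len(token) == 1 and not token.isdigit():
        return token
    return chr(int(token, base))


def build_hdrchr_map(lines: Iterable[str], number: str = "octal") -> Dict[int, str]:
    """Build a str.translate() table from 'hdrchr:(old):(new)' lines."""
    base = BASES[number]
    table: Dict[int, str] = {}
    for line in lines:
        if not line.startswith("hdrchr:"):
            continue
        old, new = line[len("hdrchr:"):].split(":", 1)
        table[ord(parse_chr(old, base))] = parse_chr(new, base)
    return table
```

For example, mapping 015 (CR) and 012 (NL) to 040 (SPC) flattens end-of-lines before the text goes into the K fields.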

	process-headlines:(yes/no) process the Headline for RICs, Numbers, Genres etc
		default is yes
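With process-headlines set, RICs go to BR and numbers to BN (see the FipHdr field list in the notes). A minimal sketch, assuming RICs appear in angle brackets such as <VOD.L>, and reading the version note "digits 0-9 only plus 9-9" as: keep plain digit runs and ranges like 3-4, but not decimals or tokens such as 1-Shares. Genres are not covered, and the function name is hypothetical.

```python
import re
from typing import Dict, List

RIC_RE = re.compile(r"<([A-Za-z0-9.=^_/-]+)>")   # assumed: RICs in <...>
NUM_RE = re.compile(r"[0-9]+(?:-[0-9]+)?")        # digits, or digits-digits


def process_headline(headline: str) -> Dict[str, List[str]]:
    rics = RIC_RE.findall(headline)
    rest = RIC_RE.sub(" ", headline)       # don't count digits inside RICs
    numbers = []
    for tok in rest.split():
        tok = tok.strip(",.;:()'\"")       # drop surrounding punctuation
        if NUM_RE.fullmatch(tok):
            numbers.append(tok)
    return {"BR": rics, "BN": numbers}
```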

	redundant-primary-host: (hostname)
	redundant-timeout: (seconds)
		For redundant running, this is the primary host
		If ramsflt is running on the secondary :
			If the primary is up and running, files are deleted after
			60 seconds (or after 'redundant-timeout' seconds, if set).
			If the primary is down, files are passed to the next process.
	balance-group: (group)
		balance group for the DJ-duplicates file to send to the redundant system
	redun-sys: (host1, host2)
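The secondary's behaviour can be reduced to a per-file decision. How the secondary learns whether the primary is up is not described here, so this sketch simply takes it as a flag; the class and the 'process'/'hold'/'delete' labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RedundancyPolicy:
    this_host: str
    primary_host: str        # redundant-primary-host
    timeout: float = 60.0    # redundant-timeout, in seconds

    def decide(self, file_age: float, primary_is_up: bool) -> str:
        """Return 'process', 'hold' or 'delete' for one queued file."""
        if self.this_host == self.primary_host:
            return "process"      # the primary always processes its files
        if not primary_is_up:
            return "process"      # primary down: pass to the next process
        # Primary up: keep the file until the timeout, then drop it.
        return "delete" if file_age >= self.timeout else "hold"
```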

	tph-codes: (filename)
		This has the format of :
			(country), (opt city), (+tphno), gibberish
			United Arab Emirates,,+971,,
			United Kingdom,London,+44 20,+44 (0)20,
			United Kingdom,Edinburgh,+44 131,+44 (0)131,
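One plausible use of tph-codes is locating the City from a telephone number in the story (compare a.6); that purpose is an assumption. The sketch reads the first three columns, ignores the rest, drops the "(0)" trunk digit and picks the longest matching dialling prefix, so +44 131 wins over any shorter +44 entry. Function names are hypothetical.

```python
import csv
import io
from typing import List, Optional, Tuple


def digits(s: str) -> str:
    """Keep only the digits, after dropping a '(0)' trunk prefix."""
    return "".join(ch for ch in s.replace("(0)", "") if ch.isdigit())


def load_tph_codes(text: str) -> List[Tuple[str, str, str]]:
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or not row[2].strip():
            continue
        country, city, tph = (f.strip() for f in row[:3])
        rows.append((digits(tph), country, city))
    # Longest prefix first, so the most specific entry is tried first.
    rows.sort(key=lambda r: len(r[0]), reverse=True)
    return rows


def locate_by_phone(rows: List[Tuple[str, str, str]],
                    phone: str) -> Optional[Tuple[str, str]]:
    num = digits(phone)
    for prefix, country, city in rows:
        if num.startswith(prefix):
            return country, city
    return None
```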
	desk: (source) (desk)
	stop-desk: (desk name)

	source-reuters-flag: (FipSeq)
		Check against the NAOMI_SOURCE_CODES.FIP file to see if the
		Attribution or Source Code is RTRS			default: just RTRS
		; ,CRF,Thomson Reuters Commodities and Research Forecasts,Y,Y
		; ,CSE,Copenhagen Stock Exchange,,
		lookup:Q1	TU	file=NAOMI_SOURCE_CODES.FIP key=2 sep=, value=4
		source-reuters-flag:Q1
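A reading of the lookup line above, assuming key= and value= are 1-based column numbers (which fits the sample rows: column 2 is CRF, column 4 is Y), that TU holds the incoming source code, and that a 'Y' in the value column sets the flag. The function names are hypothetical, and a plain split means a name containing a comma would break the lookup.

```python
from typing import Dict


def build_lookup(text: str, key: int = 2, value: int = 4,
                 sep: str = ",") -> Dict[str, str]:
    """Map column `key` to column `value` (both 1-based) for each line."""
    table = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith(";"):
            continue                      # skip blank and comment lines
        fields = line.split(sep)
        if len(fields) >= max(key, value):
            table[fields[key - 1].strip()] = fields[value - 1].strip()
    return table


def is_reuters_source(code: str, table: Dict[str, str]) -> bool:
    # RTRS always counts (the default); other codes need a 'Y' flag.
    return code == "RTRS" or table.get(code, "") == "Y"
```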

Where sections of FipHdr fields are required or changes to the output style,
use keywords : fixed, partial, combie, optional, repeat, newdate and/or style.
(see the SysAdmin manual for more information).

	They are normally specified :
		fixed:QZ	1234543
		partial:QT	ST,3,2,U,<,>
		combie:QY	ep|na,(0000000)a
		option:QE	ep,11,7,s
		repeat:QK	XK,-,3
	or	repeat:QP	PK,,4,#X
		style:QS	XN,%.03d


Input Parameters are (all optional) :
Either
	-1 : path/filename for single shot		default: spooled
		The input file is NOT deleted
		If this does NOT start with a '/', it is assumed relative to the current path.
Or
	-i : input queue			default: spool/ramsfilter
		If this does NOT start with a '/', it is assumed under spool.

	-C : Log interval		default: 600  (for 10 mins)
			set to 0 for no logging
	-z : default parameter file in tables/setup	default: tables/setup/RamsFilter
	-w : wait time for files arriving across a network.	default: 8 secs
	-l : log files in			default: log only totals
	-L : log files in			default: log only totals
	-o : output queue			default: spool/2go
		If this does NOT start with a '/', it is assumed under spool.
	-v : print version number and exit
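The path rules for -1, -i and -o can be sketched as follows. The spool root /fip/spool is a hypothetical placeholder for the real spool directory, and the function names are illustrative only; PurePosixPath keeps the sketch independent of the host platform.

```python
from pathlib import PurePosixPath

SPOOL = PurePosixPath("/fip/spool")   # hypothetical spool root


def resolve_queue(name: str, spool: PurePosixPath = SPOOL) -> PurePosixPath:
    """-i / -o: a name not starting with '/' is assumed to be under spool."""
    return PurePosixPath(name) if name.startswith("/") else spool / name


def resolve_single_shot(name: str, cwd: PurePosixPath) -> PurePosixPath:
    """-1: a name not starting with '/' is relative to the current path."""
    return PurePosixPath(name) if name.startswith("/") else cwd / name
```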

---NOTES---


  FipHdr fields used are :

	BA	Agency Name from Dateline
	BB	Byline
	BC	City from Dateline
	BD	Date from Dateline
	BH	Preclassified Headline Keyword
	BL	Byte/Chr count (from ipramstxt or iprdfmeta)
	BM	Email from signoff
	BN	Numbers in the Headline
	BR	Rics in the Headline
	BW	Word count (from ipramstxt or iprdfmeta)
	BX	No of Text Segments (from ipramstxt or iprdfmeta)
	BY	Preclassification of Topics/Products - not used for the moment
	BZ	Area from Dateline
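The Dateline fields BC, BD and BA can be illustrated with a classic Reuters dateline such as "NEW YORK, Oct 24 (Reuters) -". This is a sketch only: the pattern is an assumption, it scans just the first 8 lines and strips punctuation from the City (both taken from the version notes), and Area (BZ) plus the Moody's and S&P variants are not handled.

```python
import re
from typing import Dict, Optional

DATELINE_RE = re.compile(
    r"^\s*(?P<city>[A-Z][A-Z .'-]*?)\s*,\s*"      # CITY,
    r"(?P<date>[A-Z][a-z]{2,8}\.?\s+\d{1,2})\s*"  # Oct 24
    r"\((?P<agency>[^)]+)\)\s*-"                   # (Reuters) -
)


def parse_dateline(text: str, max_lines: int = 8) -> Optional[Dict[str, str]]:
    for line in text.splitlines()[:max_lines]:
        m = DATELINE_RE.match(line)
        if m:
            city = re.sub(r"[^\w ]", "", m.group("city")).strip()  # zap punct
            return {"BC": city, "BD": m.group("date"), "BA": m.group("agency")}
    return None
```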

If text-in-fiphdr: is set :
	K1:	First 900 chrs
	K2:	Second 900 chrs
	K3:	Third 900 chrs
	K4:	Fourth chunk of chrs - normally around 500 chrs etc, up to the value of TextInFipHdr (3500 by default)
	K9:	Optional last 1000 chrs ONLY if the text size is > 3200/TextInFipHdr
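The K-field layout above can be sketched as below, using the 900-chr chunks and the 3500 default from these notes. The parameter names are hypothetical, and CR/NL are simply turned into spaces as a stand-in for real hdrchr: mapping. Under these assumptions K4 gets whatever remains up to the limit (800 chrs for a long story with the default), and K9 is added only when the text is longer than the limit.

```python
from typing import Dict


def text_to_fiphdr(text: str, limit: int = 3500, chunk: int = 900,
                   tail: int = 1000) -> Dict[str, str]:
    """Split text into K1..K4 (+ K9) FipHdr fields as the notes describe."""
    text = text.replace("\r", " ").replace("\n", " ")  # stand-in for hdrchr
    head = text[:limit]
    fields = {}
    for i in range(3):                       # K1-K3: fixed-size chunks
        part = head[i * chunk:(i + 1) * chunk]
        if part:
            fields[f"K{i + 1}"] = part
    if head[3 * chunk:]:
        fields["K4"] = head[3 * chunk:]      # K4: the rest, up to `limit`
    if len(text) > limit:
        fields["K9"] = text[-tail:]          # K9: last `tail` chrs of big texts
    return fields
```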

Version Control
;002k	20nov13 cleanups
	;c added K9 as LAST block of text for big files
	;d-k bugs.. totSubbo upped to 40
;001d	 3mar06 added new DateLine/Source filters
;000u	30dec02 original version
	;l 29apr03 added extra-fiphdr + process-headline
		+ digits 0-9 only plus 9-9
	;m 16jun03 cleanups
	;n-p 21jul03 added new Dateline processing;
		only first 8 lines, zap punct from City etc
		added redundancy and balance
	;r-t 16sep03 added redun-sys
	;u 20oct03 added moodys and s&p datelines

(copyright) 2014 and previous years FingerPost Ltd.