ipramsflt

This program is used to filter and process a Reuters AMS/SelectFeed plus
stream.

a.1 Remove Dow Jones duplicates
    - check StoryDT
    - save ALL Headlines in another structure
    - check Headline
    - reload on startup
    - purge according to DropDate/Time AND PNAC

a.2 Chr translations using a mapping table in the parameter file for quotes etc

a.3 mangle location for Agency/Area/City/Dateline/By/Email

a.4 PreClassification of Headings - ANALYSIS, UPDATE etc according to Language

a.5 -not done - PreClass of Topics/Products according to ben's algo.

a.6 (mar06) new means of locating the City

a.7 Total number of Elements, Words and Chrs - was done in ipramstxt (BX, BW,
BL) but NOT now as we need NEP data

b. The following items have already been done by ipramstxt.

b.1 Aggregate Story elements

b.2 NONO - Total number of Elements, Words and Chrs - done in ipramstxt (BX,
BW, BL)

b.3 De-duplicate the incoming data of redundant/already sent elements.

Keywords for RamsFilter parameter file are :

    ; comment line
    dest:   Destination (as in sys/USERS)
    extra-fiphdr:   more fixed FipHdr fields to add to the file (before any new
                    matched additions)
    script: script to run against the New file.     default: none
    newname: name of the output file.

    preclass-headlines:RamsFlt.head-preclass.fip
        To preclassify headlines, this names a language/keyword file.
        The syntax is   ; comment line
                (lang) (spc or tabs) (kwd)
                EN ANALYSIS
        They should/must be in ascending sort order.
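
A minimal sketch of reading a file in the preclass-headlines syntax above.
It is written in Python purely for illustration. The function names are
hypothetical, and the use of a binary search (one plausible reason for the
ascending sort order) is an assumption, not something this document confirms
about ipramsflt.

```python
import bisect

def load_preclass(lines):
    """Parse '(lang) (spc or tabs) (kwd)' lines, skipping ';' comments and blanks."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        parts = line.split(None, 1)          # split on the first run of spaces/tabs
        if len(parts) != 2:
            continue                         # assumption: malformed lines are ignored
        entries.append((parts[0], parts[1].strip()))
    if entries != sorted(entries):
        raise ValueError("preclass entries are not in ascending sort order")
    return entries

def has_keyword(entries, lang, kwd):
    """Binary search over the sorted (lang, kwd) pairs."""
    i = bisect.bisect_left(entries, (lang, kwd))
    return i < len(entries) and entries[i] == (lang, kwd)

entries = load_preclass(["; comment line", "EN ANALYSIS", "EN\tUPDATE"])
print(has_keyword(entries, "EN", "UPDATE"))
```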

    text-in-fiphdr:(max size)
        Put the text into FipHdr fields of 1000 chrs each for K1-K5
        Use 'hdrchr' to map end-of-lines like CR NL to SPC and quotes etc
    hdrchr:(octal/dec/hex number):(FipSeq chr)
        Replace this character with the new Chr
        This can be a printable chr or an escaped number. The number is
        octal/dec/hex depending on the preceding 'number' keyword (if any).
        eg  hdrchr:\313:�
            hdrchr:<:\333
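
As an illustration of the hdrchr mapping, here is a hedged Python sketch
that parses hdrchr lines and applies them to a piece of text. The helper
names are hypothetical. Treating escaped numbers as octal when no 'number'
keyword is given is an assumption, and a literal ':' as the source chr is
not handled.

```python
def parse_chr(token, base=8):
    """Return the chr for a printable chr or a backslash-escaped number.

    Assumption: escaped numbers are octal unless another base is passed in.
    """
    if token.startswith("\\") and len(token) > 1:
        return chr(int(token[1:], base))
    return token

def parse_hdrchr(line, base=8):
    """Split 'hdrchr:(from):(to)' into a (from_chr, to_chr) pair."""
    keyword, rest = line.split(":", 1)
    if keyword != "hdrchr":
        raise ValueError("not a hdrchr line")
    src, dst = rest.split(":", 1)
    return parse_chr(src, base), parse_chr(dst, base)

def apply_hdrchr(text, pairs):
    """Replace every 'from' chr in text with its 'to' chr."""
    return text.translate({ord(src): dst for src, dst in pairs})

# Map CR and NL to a space, as suggested for text-in-fiphdr.
pairs = [parse_hdrchr(r"hdrchr:\015:\040"), parse_hdrchr(r"hdrchr:\012:\040")]
print(apply_hdrchr("line one\r\nline two", pairs))
```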

    process-headlines:(yes/no) for Rics, Numbers Genres etc
        default is yes

    redundant-primary-host: (hostname)
    redundant-timeout: (seconds)
        For redundant running, this is the primary host.
        If ramsflt is running on the secondary :
            If the primary is up and running, files will be deleted after
            60 seconds (or the 'redundant-timeout').
            If the primary is down, files are passed to the next process.
    balance-group: (group)
        balance group for the DJ-duplicates file to send to the redundant system
    redun-sys: (host1, host2)
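
The secondary-host behaviour described under redundant-primary-host can be
pictured with the small Python sketch below. It is only a reading of the
text above: the function name, the 'hold' state for files younger than the
timeout, and the way primary status is supplied are all assumptions. How
ipramsflt actually detects that the primary is up is not described here.

```python
def secondary_action(primary_is_up, file_age_secs, redundant_timeout=60):
    """Decide what a secondary host does with a queued file.

    Returns 'pass'   - primary is down, hand the file to the next process
            'hold'   - primary is up, file is younger than the timeout
            'delete' - primary is up and the timeout has expired
    """
    if not primary_is_up:
        return "pass"
    if file_age_secs >= redundant_timeout:
        return "delete"
    return "hold"

print(secondary_action(primary_is_up=True, file_age_secs=75))
```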

    tph-codes: (filename)
        This has the format of :
            (country), (opt city), (+tphno), gibberish
            United Arab Emirates,,+971,,
            United Kingdom,London,+44 20,+44 (0)20,
            United Kingdom,Edinburgh,+44 131,+44 (0)131,
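
A hedged Python sketch of reading a file in the tph-codes format above and
matching a telephone number against it. The function names, the
longest-prefix rule and the sample telephone number are illustrative
assumptions. This document does not state how ipramsflt uses the table.

```python
import csv

def load_tph_codes(lines):
    """Parse '(country),(opt city),(+tphno),...' rows; later columns are ignored."""
    codes = []
    for row in csv.reader(lines):
        if len(row) < 3 or not row[2].strip().startswith("+"):
            continue                         # skips comments and short rows
        country, city, prefix = row[0].strip(), row[1].strip(), row[2].strip()
        codes.append((prefix.replace(" ", ""), country, city or None))
    # Longest prefix first, so a city code wins over a bare country code.
    codes.sort(key=lambda c: len(c[0]), reverse=True)
    return codes

def locate(codes, phone):
    """Return (country, city) for the first matching prefix, else None."""
    digits = phone.replace(" ", "")
    for prefix, country, city in codes:
        if digits.startswith(prefix):
            return country, city
    return None

codes = load_tph_codes([
    "United Arab Emirates,,+971,,",
    "United Kingdom,London,+44 20,+44 (0)20,",
    "United Kingdom,Edinburgh,+44 131,+44 (0)131,",
])
print(locate(codes, "+44 20 7946 0000"))
```
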
    desk: (source) (desk)
    stop-desk: (desk name)

    allow-unicode-in-fiphdrs:yes/no
        contents of BA, BB, BC, BD and BZ
        NO  - should be plain ascii only (default)
        YES - allow unicode

    source-reuters-flag: (FipSeq)
        Check against the NAOMI_SOURCE_CODES.FIP file to see if the
        Attribution or Source Code is RTRS          default: just RTRS
        ; ,CRF,Thomson Reuters Commodities and Research Forecasts,Y,Y
        ; ,CSE,Copenhagen Stock Exchange,,
        lookup:Q1   TU  file=NAOMI_SOURCE_CODES.FIP key=2 sep=, value=4
        source-reuters-flag:\Q1
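
One way to read the lookup line in the source-reuters-flag example is
sketched below in Python: find the row whose 2nd comma-separated column
equals the source code and return its 4th column. The function name and the
1-based column numbering are assumptions. The authoritative description of
'lookup' is in the SysAdmin manual.

```python
def lookup(rows, key_value, key=2, value=4, sep=","):
    """Return column `value` of the first row whose column `key` matches.

    Columns are counted from 1 (an assumption). Blank lines and lines
    starting with ';' are skipped. Returns None if nothing matches.
    """
    for row in rows:
        if not row.strip() or row.lstrip().startswith(";"):
            continue
        cols = row.rstrip("\n").split(sep)
        if len(cols) >= max(key, value) and cols[key - 1] == key_value:
            return cols[value - 1]
    return None

table = [
    ",CRF,Thomson Reuters Commodities and Research Forecasts,Y,Y",
    ",CSE,Copenhagen Stock Exchange,,",
]
print(lookup(table, "CRF"))
```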

Where sections of FipHdr fields are required or changes to the output style,
use keywords : fixed, partial, combie, optional, repeat, newdate and/or style.
(see The SysAdmin manual for more information).

    They are normally specified :
        fixed:QZ    1234543
        partial:QT  ST,3,2,U,<,>
        combie:QY   ep|na,(0000000)a
        option:QE   ep,11,7,s
        repeat:QK   XK,-,3
    or  repeat:QP   PK,,4,#X
        style:QS    XN,%.03d

Input Parameters are (all optional) :
Either
    -1 : path/filename for single shot      default: spooled
        The input file is NOT deleted
        If this does NOT start with a '/', it is assumed relative to the
        current path.
Or
    -A : do NOT calculate BX, BW, BL    default: do!
    -i : input queue            default: spool/ramsfilter
        If this does NOT start with a '/', it is assumed under spool.

    -C : Log interval       default: 600  (for 10 mins)
            set to 0 for no logging
    -z : default parameter file in tables/setup default: tables/setup/RamsFilter
    -w : file wait for files arriving across a network. def: 8 secs
    -l : log files in           default: log only totals
    -L : log files in           default: log only totals
    -o : output queue           default: spool/2meta_nfcp
        If this does NOT start with a '/', it is assumed under spool.
    -v : print version number and exit

---NOTES---

  FipHdr fields used are :

    BA  Agency Name from Dateline
    BB  Byline
    BC  City from Dateline
    BD  Date from Dateline
    BH  Preclassified Headline Keyword
    BL  Byte/Chr count (now in ipramsflt - was from ipramstxt or iprdfmeta)
    BM  Email from signoff
    BN  Numbers in the Headline
    BR  Rics in the Headline
    BW  Word count (now in ipramsflt - was from ipramstxt or iprdfmeta)
    BX  No of Text Segments (now in ipramsflt - was from ipramstxt or iprdfmeta)
    BY  Preclassification of Topics/Products - not used for the moment
    BZ  Area from Dateline

if the text-in-fiphdr: is set
    K1: First 900 chrs
    K2: Second 900 chrs
    K3: Third 900 chrs
    K4: Fourth chunk of chrs - normally around 500 chrs
        etc up to value of TextInFipHdr (3500 by default)
    K9: Optional last 1000 chrs ONLY if the text size is > 3200/TextInFipHdr
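
The K1-K9 layout above can be pictured with the Python sketch below. It is
an illustration only: the function name is hypothetical, the 900-chr chunk
size and 1000-chr tail are taken from this list, and the exact point at
which K9 is added (here: text longer than the maximum size) is an
assumption.

```python
def text_to_fiphdr(text, max_size=3500, chunk=900, tail=1000):
    """Split text into K1, K2, ... fields, plus K9 for oversized text."""
    fields = {}
    head = text[:max_size]                   # nothing past max_size goes in K1-Kn
    for n, start in enumerate(range(0, len(head), chunk), start=1):
        fields[f"K{n}"] = head[start:start + chunk]
    if len(text) > max_size:
        fields["K9"] = text[-tail:]          # last block of a big file
    return fields

fields = text_to_fiphdr("x" * 4000)
print({name: len(value) for name, value in fields.items()})
```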

Version Control
;02r    20nov13 cleanups
    ;c added K9 as LAST block of text for big files
    ;d-k bugs.. totSubbo upped to 40
    ;l-q 24jun15 added BX,BW,BL (counts for eles, words, chrs) as rdfmeta has no
        data (now in NEP)
;01d     3mar06 added new DateLine/Source filters
;00u    30dec02 original version
    ;l 29apr03 added extra-fiphdr + process-headline
        + digits 0-9 only plus 9-9
    ;m 16jun03 cleanups
    ;n-p 21jul03 added new Dateline processing;
        only first 8 lines, zap punct from City etc
        added redundancy and balance
    ;r-t 16sep03 added redun-sys
    ;u 20oct03 added moodys and s&p datelines

(copyright) 2024 and previous years FingerPost Ltd.