ipw4

This program generates w4 structures - lists and files - for the w4 browser-based tasting system.

Each incoming file is compared against the parameter file and added to each directory (list) found. The text is left unaltered, as ipw4 assumes 'ipxchg' has already cleaned it up.

A single file can be in none, one or many lists. A single copy of the file is maintained for all the lists so that, when it is copied/exported, the audit message is inserted at all relevant points in all relevant lists.

If more than one publication is specified, a copy of the file is made for each publication. In this case the audit message is restricted to the lists belonging to that publication.

To decide which lists a file should be in, each destination in the Fip destination field 'DU' is compared to all the entries of the 'dest' parameters. Then the same is done for any 'testforlist' parameters. Note there MUST always be a DU field - even if you are only using 'testforlist'.

The parameter file is in tables/w4 and, by default, is called W4. The syntax is the normal Fip style:

; comment

dest:
    Define which lists an entry is inserted into for each destination.
    eg dest:w4arte list:KULTUR,ARTE
    Each list must be defined by a 'list:' parameter as below. These are fixed lists.
    eg dest:w4soccer list:SOCCER_SUNDAY,SPORT_SUNDAY
       list:SOCCER_SUNDAY maxitems:200
       list:SPORT_SUNDAY
    .. or a specified FipSeq:
       dest:w4client list:\DA
       list:SUBLIST
       default-list-parameters:SUBLIST
    If you do not use 'default-list', any non-matching file is ignored.
    Case is ignored for the names of the 'list' and 'pub'.
    There can be multiple lists, separated by a comma.
    The same 'list' can be defined on several 'dest' lines but only one entry will be made.

testforlist:
    Define lists for one or more FipHdr tests.
    Syntax is testforlist:(list1,list2,..) (FipHdr)=(test) (FipHdr)#(test)
    There can be one or more lists, separated by a comma (no spaces).
    There can be one or more tests, which can be either equal '=' or not equal '#' (not equal can also be '!=').
    For the test, a single wildcard '*' can be added at the end.
    To test for a blank field (or a field which does not exist), use double quotes: XY="" ZZ#""
    eg: testforlist:AFX_SPORTS SU=afx XC=s* XC#sdd
    Both the FipHdr and the Test fields can be FipSeq .. eg
       ; Check if the source is 'epd'
       ; should be the XA field XA:epd
       ; BUT also XA:/AFP-SX77, so repeat on punctuation
       ; if XA does NOT exist or there is no data, chk SU
       repeat:Q1 XA,,1,#x
       repeat:Q2 XA,,2,#x
       combie:QA Q1|Q2|SU
       testforlist:epd QA=epd\$d
    Note that 'testforlist' and 'dest' can be equivalent, except that the test for 'testforlist' is case INsensitive while the test for 'dest' is case sensitive.

list:
    Define the size of a list and an optional ticker. Sub-parameters are:
    maxitems      maximum number of items. Specify zero to mean all items.
                  default is all files.
    maxsize       maximum no of chrs of text per item.
                  default is 1000 bytes
    ticker-items  maximum no of items for the ticker.
                  if not specified, there is no ticker
    ticker-size   max no of chrs per ticker item.
                  default is 60 bytes
    refresh       (optional) refresh the main list only once every X secs.
                  Normally the main LIST is refreshed on every new file.
    pub           (optional) publication name. It restricts audits to a single publication.
                  pub:sunday
    entry         (optional) name of a specific entry if not the default.
    group         (optional) Group List name. Use this to build collated lists of several sub lists.
                  eg group:ALL_SPORT
                  The item is also put in this LIST.
                  The GROUP list is specified as an ordinary List but may or may not have any 'dests' or 'testforlists' pointing to it.
                  It must be specified BEFORE any other list refers to it.
    maint:500     (optional) Trim the Top List to this number of items
                  NONE - implying no maintenance - ie all items will be left in the main top list
                         ** Only use this with extreme care as the list can get very big !!
                  MIDNIGHT or 0 (default) - trim the top list to start at midnight
                  (number of items from 1 to 3000) - just that !
    eg list:MOTOR maxitems:0 maxsize:300 ticker-items:100 ticker-size:60

pub:
    Define a publication - optional, use only for multi-pub sites.
    The same parameter is added to each 'list' line, which means that any audit message will be restricted to that pub.
    eg pub:sunday

before:
    text to add at the top of the data file.
after:
    text to add at the bottom of the data file.
filebefore:
    file to add at the top of the data file.
fileafter:
    file to add at the bottom of the data file.

entry:
    List entry for each file in HTML with FipSeq. This is the directory line in the LIST.
    Special care should be taken if you need to change from the default, as certain key fields are required for Audit and Search.
    These include the '<!-- @@## -->' and the first '<br>'.
    If more than one entry is specified (up to 100 may be), the first (ie top of file) is considered the default.
    Syntax: entry:(name) (HTML in FipSeq)
    see below for an example
entry-abstract:
    ditto for the abstract part of the list (ie the bit underneath the clickable link to the data)
search-entry:
    ditto for the search entry, which defaults to "\\WQ/\\WN|\\$U|\\WK|\\WM|"
ticker-entry:
    The List entry for Tickers. This is generally fairly short, with no or few comments, to reduce the size of each ticker list.
script:
    Run a script after the file has been written.
log:
    Item log entry if not default.
folder:
    name of a sub-folder under /fip/data/w4 for this list.
    This should be used for your own scripts, as the standard Fip w4 does not normally track folders.
default-maxsize:(number of bytes)
    default is 1000 bytes
metadata-for-source:
    Define the MetaData for a particular source. Do NOT put a tab, NL or CR in it.
    syntax is metadata-for-source:(agencyName) (Meta Strings)
    Default is the 'default-metadata' keyword, else 'pri=\WP cat=\WC' for non-fip search and '\WP \WC' for fip search
    eg metadata-for-source:WIRE2 sender=\XU ref=\XR
    The Headline (\WK), Source (\SU) and Filename (\SN) are always added automatically and do not need to be specified.
default-metadata:
    Define default search metadata.
    Default meta is 'pri=\WP cat=\WC' for non-fip search and '\WP \WC' for fip search

Other less often used parameters:

missing-list: (name of list)
    If the file is NOT in any other list, it is added to this one.
    default: file is ignored
default-list-parameters: (name of list)
    Use this 'list' for all default parameters NOT specified.
syndication-list: (FipSeq containing name of client)
default-list-for-syndication: (name of list)
    Any file that matches a 'dest' whose 'list' is not specified uses the parameters of this 'list', but is named as in the 'dest' line.
    ie if DA:biggles and DU:w4planes :
       syndication-list:\DA
       list:heros maxitems:0
       default-list-for-syndication:heros
    so the list will be called BIGGLES with no check on the number of items.
use-hour-folders:
    Where there are masses of data, store the files in hour folders in order to improve disk access.
number:
    default number system - octal, decimal or hex.
chkexists:
    for NFS or NT mapped drives, a check-file to make sure the drive is valid
outputdrive:
    (NT only) drive letter for data
audit-msg:
    Html string to replace the default audit message
    default: <font color=\"green\">Fetched by \\WA at \\WT<br></font>
audit-text:
    Html string to replace the default audit text point
output-filename:
    change the output filename
supercede:
    files with the same name are normally replaced
no-supercede:
    files with the same name are normally replaced; use this to create a new file every time.
owner:
    Unix only, logon of the owner of the files if not yours
archive:
    Archive the file in log/data
NewSU:
    FipHdr field for source if NOT 'SU'
wild:
    wild string chr for matching if not the default '*'
singlewild:
    wild single chr for matching if not the default '?'
hostname:
    Name of this host if not that booted from (for IP address)
log-unmatched-files:
    If a file is NOT in any list - log it with a !ox flag
allow-deletes:
    Allow Delete tokens to zap files and list items - default:no
chrmap: (old 8 bit chr) (replacement 8 bit chr)
    default:no
    eg chrmap:\236\243
    for FipHdr fields only
list-end-of-line:
    String (in FipSeq) to flag an end-of-line in a List or Search.
    default is none - all end-of-lines are translated to a space.
    Take care not to reuse a special chr which you are using to flag something else - in particular the text-marker, which is usually a TAB, or a NL/CR, which are end-of-item.
default-unwrap-abstract: yes/no
    If the abstract text is wrapped - at 64 chrs for example - use this to put the list-end-of-line marker at the end of a para.
    Each file is checked for the optional FipHdr field W4_ABSTRACT_UNWRAP: yes/no which, if found, will override the default.
zap-xml-abstract: yes/no
    Zap all xml <p>, <br> etc in the abstract
    default: just zap the < and >
    Each file is checked for the optional FipHdr field W4_ABSTRACT_ZAP_XML: yes/no which, if found, will override the default.
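
How 'default-unwrap-abstract', 'zap-xml-abstract', 'list-end-of-line' and the per-file W4_ABSTRACT_UNWRAP / W4_ABSTRACT_ZAP_XML overrides combine can be sketched in Python as below. This is an illustration under stated assumptions, not ipw4's code: the function names are hypothetical, paragraphs are assumed to be separated by blank lines, and the FipSeq end-of-line string is reduced to a plain string.

```python
import re


def _flag(fiphdr: dict, name: str, default: bool) -> bool:
    """A per-file yes/no FipHdr field, if present, overrides the parameter default."""
    if name in fiphdr:
        return fiphdr[name].strip().lower() == "yes"
    return default


def build_abstract(text: str, fiphdr: dict, default_unwrap: bool = False,
                   default_zap_xml: bool = False, eol_marker: str = "") -> str:
    unwrap = _flag(fiphdr, "W4_ABSTRACT_UNWRAP", default_unwrap)
    zap_xml = _flag(fiphdr, "W4_ABSTRACT_ZAP_XML", default_zap_xml)

    if zap_xml:
        text = re.sub(r"<[^>]*>", "", text)  # drop whole tags such as <p> and <br>
    else:
        text = text.replace("<", "").replace(">", "")  # default: only the brackets go

    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if not eol_marker:
        return " ".join(lines)  # no list-end-of-line: every end-of-line becomes a space
    if not unwrap:
        return eol_marker.join(lines)  # every end-of-line is flagged with the marker
    # Unwrap: join the wrapped lines of each paragraph and flag only paragraph ends.
    # Assumption: paragraphs are separated by blank lines.
    out, para = [], []
    for line in lines + [""]:  # the extra "" flushes the last paragraph
        if line.strip():
            para.append(line.strip())
        elif para:
            out.append(" ".join(para) + eol_marker)
            para = []
    return "".join(out)
```

In this sketch unwrapping only has a visible effect when a list-end-of-line string is set, because without one every end-of-line already becomes a space.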
hdr-hash:\005
    A single chr (usually a control chr - \005 or \035) to use internally in place of a hash '#' in the FipHdr
    default is 035
hdr-passthru:\023
    A single chr (usually a control chr eg \023) to use as a placeholder for another chr (usually a hash).
    Normally this is used in conjunction with web/setup/(block).setup and w4_readfile.pl : passthru:\023#
    default is 000, indicating NO passthru chr
allow-flow: (version9)
    Allow data to be input into the Fip Web Flow system
    version 0 for pre 2014 multi-instance mods (default)
    version 1 for multi-instance
flow-default-section:
    default section
    default: fip
flow-default-status:
    default status
    default: Input
flow-unique-id:
    FipSeq for generating the unique-id if there is not a W4_FLOW_ID
    default:\\WR
flow-ext:
    File extension for files
    default: fip
    Do Not add the '.'
    This should match any filemapping on the client side for flow_edit.pl or flow_read.pl
flow-balance:
    Balance Group for all data files
    default:none

    Files should have one or more of the FipHdr fields:
       W4_FLOW:          This flag is needed to signal the file is part of a flow. no parameters required
       W4_FLOW_SECTION:  (section name required - if not default)
       W4_FLOW_STATUS:   (status required - if not default)
       W4_FLOW_ID:       (actual ID to use)
    Optionally they can also have:
       W4_FLOW_L1: (data) .. W4_FLOW_L9: (data)
       These are extra fields for the LISTs in addition to the first line of data.
       They can be defaulted using parameters 'flow-default-1' etc

Plus the usual suspects for FipSeq - such as fixed: partial: combie: option: repeat: style: replace: newdate: etc
(please go to http://www.fingerpost.co.uk and look for FipSeq)

Ordinary incoming files are checked for FIP header fields:

W4_TOP:
    name of template file to add before the data of the file. The full path should be specified.
    default: none
W4_BOTTOM:
    name of template file to add after the data of the file. The full path should be specified.
    default: none
W4_HTML_IN_LIST:
    This flag will NOT strip any HTML in the List file.
    Normally all tags - HTML, SGML or XML - are stripped for the list. Nor are they counted in the 'chunks' for a list.
    ** Please label all Pictures this way, ie in sys/USERS :
       w4reupix= DP:localhost DQ:2w4 DC:\SC W4_HTML_IN_LIST:
W4_TOP_LIST:
    name of file to add before the List Entry. The full path should be specified.
    default: none
W4_BOTTOM_LIST:
    name of template file to add after the List Entry. The full path should be specified.
    default: none
W4_CHRSET: (chrset)
    Used with -C utf8 to flag files which are already UTF8 and so need no conversion.
    This changes both fiphdrs and the abstract.
    use W4_ABSTRACT_CHRSET: utf8 to change/flag the Abstract only
    use W4_FIPHDR_CHRSET: utf8 to change/flag the FipHdr only
    The chrset can be blank or utf8
W4_ABSTRACT: (FipSeq)
    Replacement for the abstract in the List and Search from the data in this FipHdr.
    The abstract is normally the first bit of text OR the entry-abstract:(entryname) for that service.
W4_ABSTRACT_FILE: (FullPathName)
    Replacement for the abstract in the List and Search from the contents of this file.
    The abstract is normally the first bit of text OR the entry-abstract:(entryname) for that service.
W4_ABSTRACT_UNWRAP: yes/no
    unwrap/do not unwrap the abstract for this file
    default: no
W4_ABSTRACT_ZAP_XML: yes/no
    remove any XML tags from the abstract
    default: no
W4_LIST_DATE: (yyyymmdd)
    Force the List/Search date to be this
    (default is current system time when the file hits the input folder)
W4_BLOB_TYPE:
W4_BLOB_THUMB:
W4_BLOB_VIEW:
W4_BLOB_PLAY:

FipHdr fields used include:
    WM: Mime Type
    WZ: Xchg to use when reading the file.
    WI: IP address of the host creating this
    DS: Supercede this file if it already exists
        default: yes
    XD: DO NOT Supercede this file if it already exists
        default: yes
    WB: if the mimetype is NOT text, use this as replacement text for the list
    WN: filename
    WQ: subpath (the top path is assumed as /fip/data)
    WL: all the lists this file is in, semicolon separated
    WV: all the lists, space separated - for displaying
    WD: all the list DELTAS
    WG: all the list GROUPS
    WJ: Julian day of this file
    WH: Date of this file
    WC: Category
    WP: Priority
    WK: Headline
    WW: No of words (added 07y1)
    W$: No of chrs (added 07y1)

For AUDIT messages, incoming files are checked for FIP header fields:
    WA: audit file logon
    WT: Time and date of audit
    WY: audit message
    WN: (From Data) filename
    WQ: (From Data) subpath (the top path is assumed as /fip/data)
    WL: (From Data) all the lists this file is in, comma separated
    WV: (From Data) all the lists, space separated - for displaying
    WD: (From Data) all the list DELTAS
    WJ: (From Data) Julian day of this file
    WH: (From Data) Date of this file

For DELETE messages, incoming files are checked for FIP header fields:
    WX: Security checksum for this file
    WA: logon of the delete person
    WT: Time and date of delete
    WN: (From Data) filename
    WQ: (From Data) subpath (the top path is assumed as /fip/data)
    WL: (From Data) all the lists this file is in, comma separated
    WV: (From Data) all the lists, space separated - for displaying
    WD: (From Data) all the list DELTAS (semicolon separated)
    WJ: (From Data) Julian day of this file
    WH: (From Data) Date of this file

For Flow messages, ipw4 will ADD the following FipHdr fields:
    WR: Duid
    WF: 1st line of text
    (Section and Status are implicit in the Flow system and are NOT carried in FipHdr fields)

(unused are WE, WO, WS)

IPW4 uses the following environment variables:
    FIP_W4_defEQ   default queue                   default: general
    FIP_W4_LINE    default line length for \$L     def: 80
    FIP_W4_WORD    default word length for \$W     def: 6
    \$2 is the second line of text
    .. \$9 is the ninth line of text

Input switches (all optional):
    -0 : Use Old Version 0 format files
         default: current version
    -9 : run in Speedy mode
         default: no
    -a : alert file if not the default, which is
         no publications specified : tables/w4/ALERT
         publications specified    : tables/w4/ALERT_PUBLICATION
    -c : check this queue or file exists before writing files
         (for NFS and other mounted queues - see CHKEXISTS above)
         default: no
    -C : convert list entry characters to ..
         default: unconverted
         -C utf8  convert to utf8
    -d : Output Drive (WINNT only)
         default: drive with Fip on
         This is overridden by the 'outputdrive' keyword.
    -D : name of a done queue for input files after processing.
         If this does not start with a '/', it is assumed to be under /fip/spool.
         default: files are deleted
    -f : default flow path
         default: /fip/data/flow
    -F : default no of flow sub queues (before 07r was 256).
         default: 100
    -g : do NOT make search Group lists
         default: do
    -l : log all files
         default: do NOT log
    -L : do NOT log files
         default: do NOT log
    -m : UNIX file mask - input to umask for file creation.
         default is that set for the starting logon (normally 'fip')
         Please remember this is input as an octal number, eg -m 640 reflects 'rw-r-----' access
    -N : use the next/previous flags
         default: do not
    -o : Output path name
         default for Version 0 : /fip/spool/w4data
         If this does not start with a '/', it is assumed to be under /fip/spool.
         default for other versions : /fip/data/w4/
    -q : queue to scan
         default: 2w4
    -Q : keep quiet if the queue for the incoming file does not exist or there are too many duplicates.
         default: no
    -r : reindex - just reindex incoming (resent) files. Do not add to the lists, and do not add the files either.
         default: no
    -R : reindex - just reindex incoming (resent) files. Do not add to the lists.
         default: no
    -s : using external Search
         default: fip search with a search Group list too
    -S : using external Search
         default: fip search WITHOUT a search Group list too
    -t : sleep time between scans
         default: 1 sec
    -T : name of search tickers file
         default: none
    -u : default owner for ALL files.
         default: that of 'ip'
         This may be overridden by the 'owner' parameter.
    -V : version
         default: 8
         0 - html lists
         5 - audit in list
         8 - filesize in lists
    -X : No Search file nor Index file required
         default: fip search
    -z : default parameter file
         default: tables/w4/W4
    -v : print version number and exit

---------- Example ----------
pub:herald
pub:times
pub:sunday

; Text at start of file - Put time stamp and cross references at the end of the file
filebefore:/fip/web/setup/w4.file.top
; Text at end of file
fileafter:/fip/web/setup/w4.file.bottom

; The aim is to have a cross reference to a file in a directory below this level,
; with the SU as the name of the directory where stories are saved
entry:default <DT><!-- \$U CAT:\XC PRI:\XP --><a href="/fip-cgi/pick_showlist.pl?Fipid=91251948919514&file=19981201_rtr/reu4052.0502.html" TARGET="wirecopy_window"> <IMG SRC=/fip-pages/gifs/crush.gif width=10 height=10 border=0> </a><A HREF="/fip-cgi/wir_readfile.pl?Fipid=##FIPID##&file=\WQ/\WN" TARGET="wirecopy_window">\WK</A>\s<FONT SIZE=-1 FACE="Helvetica" COLOR="red">(\s\$D \$M \$Y,\s\$H:\$N<!-- ##@@ -->\s)</FONT><BR>

; Run Verity index program afterwards
script:/bin/echo "/fip/data/w4/files/\WQ/\WN" > /fip/spool/2verity/\WN

; Actual lists
list:ALL_WIRES_HER maxitems:0 maxsize:300 ticker-items:100 ticker-size:25 pub:herald
list:ALL_WIRES_SUN maxitems:0 maxsize:300 ticker-items:100 ticker-size:25 pub:sunday
list:AP_ADVISORIES_HER maxitems:0 maxsize:300 ticker-items:100 ticker-size:25 pub:herald

; ----------------------------------------------------------------------
; Associated Press/Press Association/Reuters
dest:all_wires list:ALL_WIRES_HER,ALL_WIRES_SUN
;
; Associated Press
;
dest:ap_advisories list:AP_ADVISORIES_HER,AP_ADVISORIES_SUN,AP_ADVISORIES_ET
; NO Financial for Evening Times
dest:ap_financial list:AP_FINANCIAL_HER,AP_FINANCIAL_SUN

audit-msg:Read by \WA at \WT

----------------------------------------------------------------------
Notes

- Installation

Do you need to run UTF8 ???

SYSTEM
    ipw4 -l -N -C utf8 -T sticker
USERS
    - just text needs
      w4cp1251 DP:localhost DQ:2w4 W4_ABSTRACT_DC:W4ABS DC:\SC CX:PREW4 W4_ABSTRACT_CHRSET:UTF8
    - fiphdr and text
      w4cp1251 DP:localhost DQ:2w4 W4_ABSTRACT_DC:W4ABS DC:\SC CX:PREW4 W4_CHRSET:UTF8
    - xchg (SC)2W4ABS
      ; W4 Abstract - Russian (CP1251) to UTF8
      ;
      ; Default character set
      c:isoascii
      z:chghdr:IH,HK
      z:convert-fiphdr:utf8,map
      z:unicode-map:CP1251.TXT
      ; Convert to UTF-8
      z:convert-to-utf8

- Tuning points

For the OLD system (-0 input switch or pre version 05 of ipw4), if you have any Lists which will either
    have more than 0.5 mb of data at the end of the day, or
    have more than 3 or 4 items per minute,
then use the 'refresh' parameter in the list to put the last x secs worth of data into the cache file.
This does not affect searching or anything else - but it really speeds up processing.
In particular this covers wires like Dow Jones, Bloomberg, Business Wire and Bridge/KR.
This is not applicable for ipw4 05+, as the new lists are in chunks of 100 files.
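
The (SC)2W4ABS xchg in the installation notes maps CP1251 (Russian) abstracts to UTF-8, while W4_CHRSET / W4_ABSTRACT_CHRSET: utf8 flag data that is already UTF-8. When checking sample files outside Fip, the same byte conversion can be reproduced with Python's built-in cp1251 codec. This is a sketch only; the helper names are hypothetical and it does not replace the xchg or its FipHdr handling.

```python
def cp1251_to_utf8(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode Windows-1251 (CP1251) bytes as UTF-8.

    errors is passed to bytes.decode(); use "replace" to substitute U+FFFD
    for any byte with no CP1251 mapping instead of raising UnicodeDecodeError.
    """
    return data.decode("cp1251", errors=errors).encode("utf-8")


def abstract_to_utf8(data: bytes, fiphdr: dict) -> bytes:
    """Hypothetical helper: leave the abstract alone when a FipHdr flag says it is UTF-8."""
    for key in ("W4_ABSTRACT_CHRSET", "W4_CHRSET"):
        if fiphdr.get(key, "").strip().lower() == "utf8":
            return data  # already UTF-8, so no conversion is needed
    return cp1251_to_utf8(data)
```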
- examples for the SYSTEM

Using Glimpse as the Search - including Groups (or Collated)
    w4 local ipw4 -l -s
Using Verity as the Search - excluding Groups (or Collated)
    w4 local ipw4 -l -S
Using Fip Search - with Groups
    w4 local ipw4 -l
Using Fip Search - withOUT Groups
    w4 local ipw4 -l -g

-- What if it is not text in the incoming file? Check a couple of areas:
    1.1 FipHdr field WM does NOT start 'text'
    1.2 and there is NO FipHdr marker W4_TEXT_REPLACEMENT
        Nothing will be put out - except the contents of an optional FipHdr field WB
    - and/or
    2.1 add FipHdr field W4_LIST_ABSTRACT with the text/html to
    - and/or
    3.1 match the entry-abstract

----------------------------------------------------------------------
Version Control
;7z48 25jul02 added flow (do not use versions 7a or 7b) (7d for WR)
;h 13may03 audit on other sys was broken.
;i 10jun03 flow - added delete BEFORE adding search link
;j 08aug03 bugette with large files.
;k-o 23apr04 bugette with Audit...
;p-q 05sep04 zippy and timing stats
;r-u 03feb05 added -F for flow queues 256 -> 100
;v-w 18apr05 added flow-balance-group
;x1 23sep07 buggette in filename - not always unique!
;y1 19mar08 redid search meta to allow for FipSearch too/add WW and W$
;2-3 16may08 added -C utf8 and chrmap
;z4 6jun08 added next/prv 'np' and -N
;5-6 bugette in search WW/W$
;8 added default-maxsize
;9 18nov08 added W4_CHRSET
;10 28nov08 bugette with utf8
;11-12 entry-abstract added
;13-14 29dec08 added W4_DATE
;15 bugette with utf8
;16-18 added eolnList and unwrapAbs
;19-24 added W4_ABSTRACT_FILE/UNWRAP/ZAP_XML/CHRSET
;25 added missing-list: to hold files not in any other list
;26 minor bugette if zero length file
;27-28 added hdr-hash and bugette with size of W$
;30 30apr14 file-trace
;31 buffer sizes -> STDBUF
;32 added zap-xml-abstract
;33 allow-flow:1 for multi-instance
;34 12may15 made search-entry variable
;35 added SX tracking !
;36-40 better NPseqno - and max items for Ticker ; added hdr-passthru
;41-43 10may20 better blobs and abstracts
;44 npSeqno MUST be in list as 1 file can be in multiple Lists
;45 -m reworked as an OCTAL number
;46 zap the x/w4/abs file better
;47 13sep23 added locking
;48 24apr24 BUGette with Sticker
;06l 14feb01 added mimetypes with different entries
;a/b 23feb01 added WG: for groups as fiphdr field in file and -X
;c/d/e 26feb01 added W4_HTML_IN_LIST:
;f 08may01 maint:none
;g/h/i 10sep01 testforlist not catching NOT fields (XC#ABC or XC!=ABC)
;j/k 18mar02 added syndication stuff to version 1
;l 14may02 added log-unmatched-files
;05g 09aug00 version 2 lists -cdef
;b added -S and w4index for external searches
;c 17oct00 bugette for txt64 for BIG (>64k) files
;d 02nov00 added Groups plus -Reindex
;e 14nov00 added metadata-for-source, audit balanced.
;f 26nov00 added -s and addGroupSearch
;g 14feb01 cleanup

(copyright) 2024 and previous years FingerPost Ltd.