ipw4
This program generates w4 structures - lists and files - for the w4
browser-based tasting system.
Each incoming file is compared against the parameter file and added to each
directory found.
The text is left unaltered as it assumes 'ipxchg' has already cleaned it up.
A single file can be in none, one or many lists.
A single copy of the file is maintained for all the lists so that when it is
copied/exported, the audit message is inserted at all relevant points in all
relevant lists.
If more than one publication is specified, then a copy of the file is made for
each publication. In this case the audit message is restricted to those lists
belonging to that publication.
To decide which lists a file should be in, each destination in the Fip
destination field 'DU' is compared to all the entries of the 'dest' parameters.
Then the same is done for any 'testforlist' parameters.
Note there MUST always be a DU field - even if you are only using
'testforlist'.
The Parameter file is in tables/w4 and, by default, is called W4. The syntax is
the normal Fip style :
; comment
dest: define which lists an entry is inserted for each destination
eg dest:w4arte list:KULTUR,ARTE
Each list must be defined by a 'list:'
parameter as below.
These are fixed lists. eg
dest:w4soccer list:SOCCER_SUNDAY,SPORT_SUNDAY
list:SOCCER_SUNDAY maxitems:200
list:SPORT_SUNDAY
.. or a specified FipSeq.
dest:w4client list:\DA
list:SUBLIST
default-list-parameters:SUBLIST
If you do not use 'default-list', any non-matching file is ignored.
Case is ignored for the names of the 'list' and 'pub'
There can be multiple lists separated by a comma.
The same 'list' can be defined on several 'dest' lines but only one entry
will be made.
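The 'dest' rules above can be sketched in Python. This is only an illustration of the described behaviour, not the ipw4 implementation: the function name lists_for_destinations, the DEST_TABLE layout and the choice to fold list names to upper case are all assumptions made for the example, and FipSeq list names such as \DA are not handled.

```python
def lists_for_destinations(destinations, dest_table):
    """Return the lists a file belongs to, in first-seen order, without duplicates.

    destinations: destinations taken from the file's DU field (already split).
    dest_table:   maps a 'dest' name to the list names on its 'dest:' line.
    """
    seen = set()
    result = []
    for dest in destinations:
        # 'dest' names are compared case-sensitively.
        for name in dest_table.get(dest, []):
            key = name.upper()  # case is ignored for list names
            if key not in seen:  # the same list gets only one entry
                seen.add(key)
                result.append(key)
    return result


# Hypothetical table built from two of the 'dest:' lines shown above.
DEST_TABLE = {
    "w4arte": ["KULTUR", "ARTE"],
    "w4soccer": ["SOCCER_SUNDAY", "SPORT_SUNDAY"],
}

print(lists_for_destinations(["w4arte", "w4soccer", "w4arte"], DEST_TABLE))
```

A destination that is not in the table simply contributes no lists, which mirrors the statement that a non-matching file is ignored unless a default list is configured.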
testforlist: define lists for one or more FipHdr tests.
Syntax is
testforlist:(list1,list2,..) (FipHdr)=(test) (FipHdr)#(test)
There can be one or more lists separated by a comma (no spaces)
There can be one or more tests, each of which can be either equal '='
or not equal '#' (not equal can also be '!=' )
For the test a single wildcard '*' can be added at the end.
To test for a blank field (or a field which does not exist),
use double quotes : XY="" ZZ#""
eg: testforlist:AFX_SPORTS SU=afx XC=s* XC#sdd
Both the FipHdr and the Test fields can be FipSeq .. eg
; Check if the source is 'epd'
; should be the XA field XA:epd
; BUT also XA:/AFP-SX77, so repeat on punctuation
; if XA does NOT exist or there is no data, chk SU
repeat:Q1 XA,,1,#x
repeat:Q2 XA,,2,#x
combie:QA Q1|Q2|SU
testforlist:epd QA=epd\$d
Note that 'testforlist' and 'dest' can be equivalent, except that the test
for 'testforlist' is case INsensitive while the test for 'dest' is case sensitive.
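As a rough sketch of how the 'testforlist' tests read, the Python below evaluates a test string against a dictionary of FipHdr fields. It is not the ipw4 code: the names eval_test and file_matches are invented for the example, FipSeq expansion is left out, and it assumes that every test on the line must pass for the file to match.

```python
import re

# FIELD, then '=', '#' or '!=', then the test value.
_TEST_RE = re.compile(r"^([^=#!]+)(!=|=|#)(.*)$")


def eval_test(test, header):
    """Evaluate one test such as 'XC=s*' or 'ZZ#""' against the header dict."""
    m = _TEST_RE.match(test)
    if m is None:
        raise ValueError("not a valid test: %r" % test)
    field, op, pattern = m.groups()
    # The comparison for 'testforlist' is case insensitive.
    value = header.get(field, "").lower()
    pattern = pattern.lower()
    if pattern == '""':
        hit = value == ""  # blank or missing field
    elif pattern.endswith("*"):
        hit = value.startswith(pattern[:-1])  # single trailing wildcard
    else:
        hit = value == pattern
    return hit if op == "=" else not hit


def file_matches(tests, header):
    """True if all space-separated tests pass (an assumption of this sketch)."""
    return all(eval_test(t, header) for t in tests.split())


print(file_matches("SU=afx XC=s* XC#sdd", {"SU": "AFX", "XC": "sport"}))
```

With the header shown, all three tests from the AFX_SPORTS example pass; changing XC to 'sdd' makes the last test fail.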
list: define the size of a list and optional ticker.
Sub Parameters are :
maxitems maximum number of items
Specify zero to mean all items
default is all files.
maxsize maximum no of chrs of text per item
default is 1000 bytes
ticker-items maximum no of items for the ticker
if not specified, there is no ticker
ticker-size max no of chrs per ticker item
default is 60 bytes
refresh (optional) refresh the main list only
once every X secs. Normally the
main LIST is refreshed on every new file
pub (optional) publication name
It restricts audits to a single publication.
pub:sunday
entry (optional) name of a specific entry if not the default.
group (optional) Group List name
Use this to build collated lists of several sub lists.
eg group:ALL_SPORT
The item is also put in this LIST
The GROUP list is specified as an ordinary List but may/may not have any
'dests' or 'testforlists' pointing to it.
It must be specified BEFORE any other list refers to it.
maint:500 (optional) Trim the Top List to this number of items
NONE - implying no maintenance - ie all items will be
left in the main top list - ** Only use this with
extreme care as the list can get very big !!
MIDNIGHT or 0 (default) - trim the top list to start at midnight
(number of items from 1 to 3000) - just that !
eg list:MOTOR maxitems:0 maxsize:300 ticker-items:100 ticker-size:60
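To make the sub parameters and their defaults concrete, here is a small Python sketch that reads a 'list:' line into a dictionary. It is an illustration only: parse_list_line and LIST_DEFAULTS are hypothetical names, "all items" is represented as 0, "no ticker" as None, and values containing spaces (such as an HTML 'entry') are not handled.

```python
# Defaults as documented above; the representation is an assumption.
LIST_DEFAULTS = {
    "maxitems": 0,         # zero means all items
    "maxsize": 1000,       # chrs of text per item
    "ticker-items": None,  # no ticker unless specified
    "ticker-size": 60,     # chrs per ticker item
}


def parse_list_line(line):
    """Parse 'list:NAME key:value ...' into a dict of sub parameters."""
    words = line.split()
    if not words:
        raise ValueError("empty line")
    key, _, name = words[0].partition(":")
    if key != "list" or not name:
        raise ValueError("not a list line: %r" % line)
    params = dict(LIST_DEFAULTS)
    params["name"] = name
    for word in words[1:]:
        sub, _, value = word.partition(":")
        # Numeric values become integers; others (eg maint:NONE) stay as text.
        params[sub] = int(value) if value.isdigit() else value
    return params


print(parse_list_line("list:MOTOR maxitems:0 maxsize:300 ticker-items:100 ticker-size:60"))
```

A bare line such as list:SPORT_SUNDAY yields only the defaults, so that list has no ticker.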
pub: define publication - optional, use only for multi-pub sites.
The same parameter is added to each 'list' line which
means that any audit message will be restricted to that pub
eg pub:sunday
before: text to add at the top of the data file.
after: text to add at the bottom of the data file.
filebefore: file to add at the top of the data file.
fileafter: file to add at the bottom of the data file.
entry: List entry for each file in HTML with FipSeq.
This is the directory line in the LIST.
Special care should be taken if you need to change from the default as
certain key fields are required for Audit and Search.
These include the '<!-- @@## -->' and the first '<br>'.
If more than one entry is specified (up to 100 are allowed), the first (ie top
of file) is considered the default.
Syntax: entry:(name) (HTML in FipSeq)
see below for an example
entry-abstract: ditto for the abstract part of the list (ie the bit underneath
the clickable link to the data)
search-entry: ditto for the search entry which defaults to
"\\WQ/\\WN|\\$U|\\WK|\\WM|"
ticker-entry: The List entry for Tickers.
This is generally fairly short with no or few comments to reduce the size of
each ticker list.
script: Run a script after the file has been written.
log: Item log entry if not default.
folder: name of a sub-folder under /fip/data/w4 for this list.
This should be used for your own scripts as the standard
Fip w4 does not normally track folders.
default-maxsize:(number of bytes) default is 1000 bytes
metadata-for-source: Define the MetaData for a particular source.
Do NOT put a tab in or NL or CR.
syntax is metadata-for-source:(agencyName) (Meta Strings)
Default the 'default-metadata' keyword
or 'pri=\WP cat=\WC' for non-fip search
and '\WP \WC' for fip search
eg metadata-for-source:WIRE2 sender=\XU ref=\XR
The Headline (\WK), Source (\SU) and Filename (\SN) are always added
automatically and do not need to be specified.
default-metadata: Define default search metadata
Default meta is 'pri=\WP cat=\WC' for non-fip search
and '\WP \WC' for fip search
Other less often used parameters :
missing-list: (name of list)
If the file is NOT in any other list, it is added to this one. default: file is ignored
default-list-parameters: (name of list)
Use this 'list' for all default parameters NOT specified.
syndication-list: (FipSeq containing name of client)
default-list-for-syndication: (name of list)
Any file that matches a 'dest' but the 'list' is not specified
uses the parameters as specified by this 'list' but is named
as in the 'dest' line.
ie if DA:biggles and DU:w4planes :
syndication-list:\DA
list:heros maxitems:0
default-list-for-syndication:heros
so the list will be called BIGGLES with no check on the number of items.
use-hour-folders:
Where there is masses of data, store the files in hour folders
in order to improve disk access.
number: default number system - octal,decimal or hex.
chkexists: for NFS or NT mapped drives, a check-file to make sure the
drive is valid
outputdrive: (NT only) drive letter for data
audit-msg: Html string to replace the default audit message
default: <font color=\"green\">Fetched by \\WA at \\WT<br></font>
audit-text: Html string to replace the default audit text point
output-filename: change the output filename
supercede: files with the same name are normally replaced
no-supercede: files with the same name are normally replaced
use this to create a new file every time.
owner: Unix only, logon of the owner of the files if not yours
archive: Archive the file in log/data
NewSU: FipHdr field for source if NOT 'SU'
wild: wild string chr for matching if not the default '*'
singlewild: wild single chr for matching if not the default '?'
hostname: Name of this host if not that booted from (for IP address)
log-unmatched-files: If a file is NOT in any list - log it with a !ox flag
allow-deletes: Allow Delete tokens to zap files and list items - default:no
balance-store: (FipSeq) for name of Balance group default: none
Balance all incoming files to this group
chrmap: (old 8 bit chr) (replacement 8 bit chr) default:no
chrmap:\236\243
for FipHdr fields only
list-end-of-line: String (in FipSeq) to flag an end-of-line in a List or
Search
default is none - all endoflines are translated to a space.
Take care not to reuse a special chr which you are using to flag something
else
In particular the text-marker which is usually a TAB or a NL/CR which are
end-of-item.
default-unwrap-abstract: yes/no
if the abstract text is wrapped - at 64 chrs for example - use this to put the
list-end-of-line marker at the end of a para.
Each file is checked for the optional FipHdr field W4_ABSTRACT_UNWRAP: yes/no
which, if found, will override the default.
zap-xml-abstract: yes/no
Zap all xml <p>, <br> etc in the abstract default: just zap the < and >
Each file is checked for the optional FipHdr field W4_ABSTRACT_ZAP_XML:
yes/no which, if found, will override the default.
hdr-hash:\005
A single chr (usually a control chr - \005 or \035) to use internally in
place of a hash '#' in the FipHdr
default is 035
hdr-passthru:\023
A single chr (usually a control chr eg \023) to use as a placeholder for
another chr (usually a hash)
Normally this is used in conjunction with web/setup/(block).setup and
w4_readfile.pl : passthru:\023#
default is 000 indicating NO passthru chr
allow-flow: (version9) Allow data to input into the Fip Web Flow system
version 0 for pre 2014 multi-instance mods (default)
version 1 for multi-instance
flow-default-section: default section default: fip
flow-default-status: default status default: Input
flow-unique-id: FipSeq for generating the unique-id if there is not a
W4_FLOW_ID
default:\\WR
flow-ext: File extension for files default: fip
Do Not add the '.'
This should match any filemapping on the client side
for flow_edit.pl or flow_read.pl
flow-balance: Balance Group for all data files default:none
Files should have one or more of the FipHdr fields :
W4_FLOW:
This flag is needed to signal the file is part of a flow.
no parameters required
W4_FLOW_SECTION: (section name required - if not default)
W4_FLOW_STATUS: (status required - if not default)
W4_FLOW_ID:(actual ID to use)
Optionally they can also have :
W4_FLOW_L1: (data)
..
W4_FLOW_L9: (data)
These are extra fields for the LISTs in addition to the first line of data.
They can be defaulted using parameters 'flow-default-1' etc
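The way the flow header fields fall back to their defaults can be sketched as follows. This is a hedged illustration, not ipw4 itself: the function name flow_fields is hypothetical, the header is modelled as a plain dictionary, and the default unique-id (the FipSeq \\WR) is approximated by simply reading a WR field if one is present.

```python
def flow_fields(header, default_section="fip", default_status="Input"):
    """Return (section, status, flow_id) for a flow file, or None otherwise."""
    # W4_FLOW is only a flag: its presence matters, not its value.
    if "W4_FLOW" not in header:
        return None
    section = header.get("W4_FLOW_SECTION") or default_section
    status = header.get("W4_FLOW_STATUS") or default_status
    # Assumption: fall back to WR when there is no W4_FLOW_ID.
    flow_id = header.get("W4_FLOW_ID") or header.get("WR", "")
    return section, status, flow_id


print(flow_fields({"W4_FLOW": "", "W4_FLOW_SECTION": "sport", "WR": "zzz"}))
```

A file without the W4_FLOW flag returns None here, reflecting that the flag is needed to signal that the file is part of a flow.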
Plus the usual suspects for FipSeq - such as fixed: partial: combie: option:
repeat: style: replace: newdate: etc (pls link to http://www.fingerpost.co.uk
and look for FipSeq )
Ordinary incoming files are checked for FIP header fields :
W4_TOP: name of template file to add before the data of the file.
The full path should be specified.
default: none
W4_BOTTOM: name of template file to add after the data of the file.
The full path should be specified.
default: none
W4_HTML_IN_LIST: This flag will NOT strip any HTML in the List file
Normally all tags - HTML, SGML or XML are stripped for the list
Nor are they counted in the 'chunks' for a list.
** Please label all Pictures this way : ie in sys/USERS
w4reupix= DP:localhost DQ:2w4 DC:\SC W4_HTML_IN_LIST:
W4_TOP_LIST: name of file to add before the List Entry.
The full path should be specified.
default: none
W4_BOTTOM_LIST: name of template file to add after the List Entry.
The full path should be specified.
default: none
W4_CHRSET: (chrset) Used with -C utf8 to flag files which are already UTF8 and
so need no conversion
This changes both fiphdrs and the abstract
use W4_ABSTRACT_CHRSET: utf8 to change/flag the Abstract only
use W4_FIPHDR_CHRSET: utf8 to change/flag the FipHdr only
The chrset can be blank or utf8
W4_ABSTRACT: (FipSeq)
Replacement for the abstract in the List and Search from the data in this
FipHdr
which is normally the first bit of text OR the entry-abstract:(entryname)
for that service
W4_ABSTRACT_FILE: (FullPathName)
Replacement for the abstract in the List and Search from the contents of
this file
which is normally the first bit of text OR the entry-abstract:(entryname)
for that service
W4_ABSTRACT_UNWRAP: yes/no
unwrap/ do not unwrap the abstract for this file
default: no
W4_ABSTRACT_ZAP_XML: yes/no
remove any XML tags from the abstract
default: no
W4_LIST_DATE: (yyyymmdd)
Force the List/Search date to be this
(default is current system time when the file hits the input folder)
Hdr fields for non-text processing - normally added by w4_process_pix.pl or
w4_process_blob.pl
W4_BLOB_TYPE: mime type image/jpeg
W4_BLOB_THUMB: path of thumbnail (under /fip/web/pages)
W4_BLOB_VIEW: path of view file (under /fip/web/pages)
W4_BLOB_PLAY: (ditto)
W4_BLOB_HIRES: path of hires file
W4_BLOB_TEXT: path of text file (under /fip/web/pages)
W4_BLOB_HEAD: Head to use
W4_BLOB_SLUG: Slug to use
W4_BLOB_XCHG: Xchg to use
W4_BLOB_DATE: UTC date to be used for ALL folders/files
W4_BLOB_TIME: UTC time to be used for ALL folders/files
W4_VERSION_DATE: UTC time to be used for ALL folders/files
W4_CHRSET
W4_ABSTRACT_CHRSET
W4_FIPHDR_CHRSET
FIP_COST: (0-9) or X to ignore
FIP_TRUST: (0-9) or X to ignore
FIP_AI: (0-9) or X to ignore
FIP_SPAM: (0-9) or X to ignore
FIP_ADULT: (0-9) or X to ignore
Flag on a scale of 1-no problem to 9-DO NOT USE / 0-unknown X-ignore
(uses setup/W4_META.LOOKUP and W4_FEEDS.LOOKUP on display)
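The flag scale above can be read with a few lines of Python. This is a sketch under stated assumptions rather than the real lookup: read_flag and describe_flag are invented names, a missing field is treated here like 'X' (ignore), and the wording for the middle values is a placeholder because the actual labels come from the LOOKUP files.

```python
FLAG_FIELDS = ("FIP_COST", "FIP_TRUST", "FIP_AI", "FIP_SPAM", "FIP_ADULT")


def read_flag(header, field):
    """Return the flag as an int 0-9, or None if it is to be ignored."""
    raw = header.get(field, "X").strip().upper()  # assumption: missing = ignore
    if raw == "X":
        return None
    if len(raw) == 1 and raw in "0123456789":
        return int(raw)
    raise ValueError("%s must be 0-9 or X, got %r" % (field, raw))


def describe_flag(value):
    """Turn a flag value into text; only 0, 1 and 9 are named in the scale."""
    if value is None:
        return "ignore"
    if value == 0:
        return "unknown"
    if value == 1:
        return "no problem"
    if value == 9:
        return "DO NOT USE"
    return "level %d" % value  # placeholder wording for 2-8


header = {"FIP_SPAM": "9", "FIP_TRUST": "1", "FIP_AI": "x"}
for field in FLAG_FIELDS:
    print(field, describe_flag(read_flag(header, field)))
```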
FipHdr fields used include :
WM: Mime Type
WZ: Xchg to use when reading the file.
WI: IP address of the host creating this
DS: Supercede this file if it already exists default: yes
XD: DO NOT Supercede this file if it already exists default: yes
WB: if the mimetype is NOT text, use this as replacement text
for the list
WN: filename
WQ: subpath (the top path is assumed as /fip/data)
WL: all the lists this file is in, semicolon separated
WV: all the lists, space separated - for displaying
WD: all the list DELTAS
WG: all the list GROUPS
WJ: Julian day of this file
WH: Date of this file
WC: Category
WP: Priority
WK: Headline
WW: No of words (added 07y1)
W$: No of chrs (added 07y1)
For AUDIT messages, incoming files are checked for FIP header fields :
WA: audit file logon
WT: Time and date of audit
WY: audit message
WN: (From Data) filename
WQ: (From Data) subpath (the top path is assumed as /fip/data)
WL: (From Data) all the lists this file is in, comma separated
WV: (From Data) all the lists, space separated - for displaying
WD: (From Data) all the list DELTAS
WJ: (From Data) Julian day of this file
WH: (From Data) Date of this file
For DELETE messages, incoming files are checked for FIP header fields :
WX: Security checksum for this file
WA: logon of the delete person
WT: Time and date of delete
WN: (From Data) filename
WQ: (From Data) subpath (the top path is assumed as /fip/data)
WL: (From Data) all the lists this file is in, comma separated
WV: (From Data) all the lists, space separated - for displaying
WD: (From Data) all the list DELTAS (semicolon separated)
WJ: (From Data) Julian day of this file
WH: (From Data) Date of this file
For Flow messages, ipw4 will ADD the following FipHdr fields :
WR Duid
WF 1stline of text
(Section and Status are implicit in the Flow system and are NOT carried in
FipHdr fields)
(unused are WE, WO, WS)
IPW4 uses the following environment variables :
FIP_W4_defEQ default queue default: general
FIP_W4_LINE default line length for \$L def: 80
FIP_W4_WORD default word length for \$W def: 6
\$2 is the second line of text
..
\$9 is the ninth line of text
Input switches (all optional) :
-0 : Use Old Version 0 format files default: current version
-9 : run in Speedy mode default: no
-a : alert file if not the default which is
no publications specified : tables/w4/ALERT
publications specified : tables/w4/ALERT_PUBLICATION
-c : check this queue or file exists before writing files
(for NFS and other mounted queues
- see CHKEXISTS above) default: no
-C : convert list entry characters to .. default: unconverted
-C utf8 convert to utf8
-d : Output Drive (WINNT only) default: drive with Fip on
This is overridden by the 'outputdrive' keyword.
-D : name of a done queue for input files after processing.
If this does not start with a '/', it is assumed to be under /fip/spool.
default: files are deleted
-f : default flow path default: /fip/data/flow
-F : default no of flow sub queues (before 07r was 256). default: 100
-g : do NOT make search Group lists default: do
-l : log all files default: do NOT log
-L : do NOT log files default: do NOT log
-m : UNIX file mask - input to umask for file creation.
default is that set for the starting logon (normally 'fip')
Pls remember this is input as an octal number
eg -m 640 reflects 'rw-r-----' access
-N : use the next/previous flags default: do not
-o : Output path name
default for Version 0 : /fip/spool/w4data
If this does not start with a '/', it is assumed to be under /fip/spool.
default for other versions : /fip/data/w4/
-q : queue to scan default: 2w4
-Q : keep quiet if the queue for the incoming file does not exist
or there are too many duplicates. default:no
-r : reindex - just reindex incoming (resent) files.
do not add to the lists. default: no
do not add the files either.
-R : reindex - just reindex incoming (resent) files.
do not add to the lists. default: no
-s : using external Search default: fip search
with a search Group list too
-S : using external Search default: fip search
WITHOUT a search Group list too
-t : sleep time between scans default: 1 sec
-T : name of search tickers file default: none
-u : default owner for ALL files. default: that of 'ip'
This may be overridden by the 'owner' parameter.
-V : version default: 8
0 - html lists
5 - audit in list
8 - filesize in lists
-X : No Search file nor Index file required default: fip search
-z : default parameter file default: tables/w4/W4
-Z : default 2nd/BreakingItems parameter file default: none
-v : print version number and exit
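Two of the switch conventions above, the octal value given to -m and the rule that a -D or -o name not starting with '/' is taken to be under /fip/spool, can be illustrated in Python. The helper names mode_to_symbolic and resolve_queue are hypothetical and the code is only a sketch of the documented rules, not part of ipw4.

```python
def mode_to_symbolic(octal_text):
    """Show the access string for an octal value, eg '640' -> 'rw-r-----'."""
    mode = int(octal_text, 8)  # the value is read as an OCTAL number
    out = []
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0o7
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)


def resolve_queue(name, base="/fip/spool"):
    """Names not starting with '/' are assumed to be under /fip/spool."""
    if name.startswith("/"):
        return name
    return base + "/" + name


print(mode_to_symbolic("640"))
print(resolve_queue("w4done"))
```

The queue name w4done is made up for the example; an absolute path such as /fip/data/w4 would be returned unchanged.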
---------- Example ----------
pub:herald
pub:times
pub:sunday
; Text at start of file - Put time stamp and cross references at the end of the
file
filebefore:/fip/web/setup/w4.file.top
; Text at end of file
fileafter:/fip/web/setup/w4.file.bottom
; The aim is to have a cross reference to a file in a directory below this
level,
; with the SU as the name of the directory where stories are saved
entry:default <DT><!-- \$U CAT:\XC PRI:\XP --><a
href="/fip-cgi/pick_showlist.pl?Fipid=91251948919514&file=19981201_rtr/reu4052.0502.html"
TARGET="wirecopy_window"> <IMG SRC=/fip-pages/gifs/crush.gif width=10 height=10
border=0> </a><A HREF="/fip-cgi/wir_readfile.pl?Fipid=##FIPID##&file=\WQ/\WN"
TARGET="wirecopy_window">\WK</A>\s<FONT SIZE=-1 FACE="Helvetica"
COLOR="red">(\s\$D \$M \$Y,\s\$H:\$N<!-- ##@@ -->\s)</FONT><BR>
; Run Verity index program afterwards
script:/bin/echo "/fip/data/w4/files/\WQ/\WN" > /fip/spool/2verity/\WN
; Actual lists
list:ALL_WIRES_HER maxitems:0 maxsize:300 ticker-items:100 ticker-size:25
pub:herald
list:ALL_WIRES_SUN maxitems:0 maxsize:300 ticker-items:100 ticker-size:25
pub:sunday
list:AP_ADVISORIES_HER maxitems:0 maxsize:300 ticker-items:100 ticker-size:25
pub:herald
; ----------------------------------------------------------------------
; Associated Press/Press Association/Reuters
dest:all_wires list:ALL_WIRES_HER,ALL_WIRES_SUN
;
; Associated Press
;
dest:ap_advisories list:AP_ADVISORIES_HER,AP_ADVISORIES_SUN,AP_ADVISORIES_ET
; NO Financial for Evening Times
dest:ap_financial list:AP_FINANCIAL_HER,AP_FINANCIAL_SUN
audit-msg:Read by \WA at \WT
----------------------------------------------------------------------
Notes
- Installation
Do you need to run UTF8 ???
SYSTEM - ipw4 -l -N -C utf8 -T sticker
USERS
- just text needs
w4cp1251 DP:localhost DQ:2w4 W4_ABSTRACT_DC:W4ABS DC:\SC CX:PREW4
W4_ABSTRACT_CHRSET:UTF8
- fiphdr and text
w4cp1251 DP:localhost DQ:2w4 W4_ABSTRACT_DC:W4ABS DC:\SC CX:PREW4
W4_CHRSET:UTF8
- xchg
(SC)2W4ABS
; W4 Abstract - Russian (CP1251) to UTF8
;
; Default character set
c:isoascii
z:chghdr:IH,HK
z:convert-fiphdr:utf8,map
z:unicode-map:CP1251.TXT
; Convert to UTF-8
z:convert-to-utf8
++ TUNING POINTS ++
- BALANCE
Using 'balance-store', all incoming data is balanced - data and audits
.. which means no LISTS need to be balanced
- examples for the SYSTEM
Using Glimpse as the Search - including Groups (or Collated)
w4 local ipw4 -l -s
Using Verity as the Search - Excluding Groups (or Collated)
w4 local ipw4 -l -S
Using Fip Search - with Groups
w4 local ipw4 -l
Using Fip Search - withOUT Groups
w4 local ipw4 -l -g
-- What if it is not text in the incoming file
- check a couple of areas
1.1 FipHdr field WM does NOT start 'text'
1.2 and there is NO fiphdr marker W4_TEXT_REPLACEMENT
Nothing will be put out - except the contents of an optional fiphdr field
WB
- and/or 2.1 add FipHdr field W4_LIST_ABSTRACT with the text/html to
- and/or 3.1 match the entry-abstract
----------------------------------------------------------------------
Version Control
;7z54 25jul02 added flow (do not use versions 7a or 7b) (7d for WR)
;h 13may03 audit on other sys was broken.
;i 10jun03 flow - added delete BEFORE adding search link
;j 08aug03 bugette with large files.
;k-o 23apr04 bugette with Audit...
;p-q 05sep04 zippy and timing stats
;r-u 03feb05 added -F for flow queues 256 -> 100
;v-w 18apr05 added flow-balance-group
;x1 23sep07 buggette in filename - not always unique!
;y1 19mar08 redid search meta to allow for FipSearch too/add WW and W$
;2-3 16may08 added -C utf8 and chrmap
;z4 6jun08 added next/prv 'np' and -N ;5-6 bugette in search WW/W$
;8 added default-maxsize ;9 18nov08 added W4_CHRSET
;10 28nov08 bugette with utf8 ;11-12 entry-abstract added
;13-14 29dec08 added W4_DATE ; 15 bugette with utf8
;16-18 added eolnList and unwrapAbs
;19-24 added W4_ABSTRACT_FILE/UNWRAP/ZAP_XML/CHRSET
;25 added missing-list: to hold files not in any other list ;26 minor bugette
if zero length file
;27-28 added hdr-hash and bugette with size of W$
;30 30apr14 file-trace ;31 buffer sizes -> STDBUF ;32 added zap-xml-abstract
;33 allow-flow:1 for multi-instance
;34 12may15 made search-entry variable ;35 added SX tracking !
;36-40 better NPseqno - and max items for Ticker ; added hdr-passthru
;41-43 10may20 better blobs and abstracts ;44 npSeqno MUST be in list as
1file can be in multiple Lists
;45 -m reworked as an OCTAL number
;46 zap the x/w4/abs file better
;47 13sep23 added locking
;48 24apr24 BUGette with Sticker
;49 21oct24 BUGette with size of SH and WK
;50 25jan25 added W4_BLOB_TEXT HEAD SLUG DATE
;51 3mar25 balance-store is now strparse
;52-54 18mar25 added FIP_COST etc and -Z 2nd param file ;54 buglette -
duplicate entries in same list
;06l 14feb01 added mimetypes with different entries
;a/b 23feb01 added WG: for groups as fiphdr field in file and -X
;c/d/e 26feb01 added W4_HTML_IN_LIST:
;f 08may01 maint:none
;g/h/i 10sep01 testforlist not catching NOT fields (XC#ABC or XC!=ABC)
;j/k 18mar02 added syndication stuff to version 1
;l 14may02 added log-unmatched-files
;05g 09aug00 version 2 lists -cdef
;b added -S and w4index for external searches
;c 17oct00 bugette for txt64 for BIG (>64k) files
;d 02nov00 added Groups plus -Reindex
;e 14nov00 added metadata-for-source, audit balanced.
;f 26nov00 added -s and addGroupSearch
;g 14feb01 cleanup
(copyright) 2025 and previous years FingerPost Ltd.