Outstanding featurette/bug with tag:a strip:none - misses the end tag
NOTE : split sequence numbers start at one.

ipxml/sgml/wml/html/newsml/json

This program is used to convert into, out of and between different tagged
format files such as XML or SGML or variants like NITF, NewsML, XHTML, WML
or HTML. It can also be used to pull apart JSON files for the same effect.

Generally it can be used to convert things like :
    - NewsML <-> IPTC 7901 or ANPA 1312
    - NITF <-> plain ascii
    - XML <-> HTML
    - SGML <-> NITF
    - SGML <-> plain ascii
    - WML <-> ascii
    - HTML/NITF tables -> inline markup for Quark, InDesign or other
      Editorial systems

Data can be extracted from the SGML tags or attributes and formatted into
text eg.
    - convert and/or replace the data within a tag
    - plain ascii files -> XML possibly using FipHdr fields to create
      tagged data

Definitions and Glossary
    tag         something between '<' and '>' eg. <BODY>
                usually ending with tagend. eg. <LOCATION>Hollywood</LOCATION>
    data        non-tag information eg. "Hollywood" in the above example
    attribute   sub field/data within a tag
                eg. <LOCATION ID="996" PLACE="Hollywood">
    NITF        News Industry Text Format as put together by IPTC and NAA.
    XML,HTML    Much simplified sub-set of SGML for WWW. - see www.w3.org
    JsonTag     "tagname":"datavalue" - see www.json.org

It scans its input directory and each file is processed according to a
parameter file specified either as the default or as the DY: FipHdr field.

Two types of processing are possible
    - strip or modify tag, attributes and/or data
    - extract data or attribute-data and stuff in a FipHdr field which can
      then be used to replace the top of the file or used by a subsequent
      program.

There is also a question of where to send the output file as this, by
default, is put in spool/2go for IPWHEEL to distribute. So it needs a
Destination(s) or DU FipHdr field. This is added by either :
    - If there is a DX FipHdr field in the input file, that is used.
    - If not, the keyword 'dest' is used in the parameter file.
    - If that is not specified either, it is sent to 'woops' the Intercept
      queue.
    - You may also specify it from the incoming data or attribute-data using
      the 'fiphdr' keyword. In this case the contents of DX, 'dest' or
      'woops' will be the default if there is no data.

IPXML may be used to convert XML tables to plain formatted text or in-line
markup such as Quark.

The parameter file in tables/sgml defaults to SGML and has the keywords:

tag:(sgml tag name) (optional subkeywords)
    Process a Start or End tag as follows :
    start:(FipSeq)
        optional string to replace the tag
    end:(FipSeq)
        optional string to replace the end tag ie. </location>
    strip:(tag|attribute|zap|everything|data|end|none)
        optional strip all or part of the tag and its associated data
        tag         All information between '<' and '>' is ignored.
                    This will also zap the end tag if there is one.
        attribute   all attributes are ignored; tag and data preserved.
        zap         All information - tag, attrib and data is zapped to
                    the next tag.
        everything  Same as 'zap' but lower tags are always zapped too.
        data        All data for this tag is ignored; tag and attrib
                    preserved
        end         Zap everything, including all other tags until and
                    including the end tag : </NAME> unless any other tags
                    are specified as NOT being stripped.
        none        Preserve everything (default)
    keepattribute: (optional FipSeq)
        Used during strip to keep all the attribute data.
        Any data after the keyword is added before and after the attribute :
            tag:ds start:** end:-- strip:tag keepattribute:=
            <ds num="1.5" ver="orig">oinky</ds>
        gives
            **=1.5==orig=oinky--
        As the optional data will be checked against the mapping tables,
        please make sure it is what you want it to be.
    endkeepattribute: (optional FipSeq)
        Same as KeepAttribute: (above) except the data is ONLY added after
        the attribute and before data.
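The keepattribute example above can be traced by hand with a small Python sketch. This is not how the program is implemented; the function name render_tag and its arguments are hypothetical, and it only covers the single case shown in the text (strip:tag with keepattribute on one simple element).

```python
import re

def render_tag(element, start, end, keep):
    """Emulate 'strip:tag' plus 'keepattribute:' for one simple element.

    Assumption: each attribute value is wrapped in the keepattribute
    string, which is what the documented example output suggests.
    """
    match = re.fullmatch(
        r'<(\w+)((?:\s+[\w-]+="[^"]*")*)\s*>(.*)</\1>', element, re.S)
    if match is None:
        raise ValueError("not a simple <tag ...>data</tag> element")
    values = re.findall(r'[\w-]+="([^"]*)"', match.group(2))
    data = match.group(3)
    kept = "".join(keep + value + keep for value in values)
    return start + kept + data + end

print(render_tag('<ds num="1.5" ver="orig">oinky</ds>', "**", "--", "="))
# prints **=1.5==orig=oinky--
```

The printed string matches the result quoted for 'tag:ds start:** end:-- strip:tag keepattribute:='.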
    att: (attribute name)
        used with keepattribute: - use when only one attribute is required
            tag:content strip:tag att:content-role start:[fip- keepattribute:| endkeepattribute:-fip]
            <content content-ref="c00000002" content-role="urn:x-hoho:content-role:INTRO" auto-generated="false">
        generates :
            [fip-urn:x-hoho:content-role:INTRO-fip]
        which can then be mangled by ipxchg or other at a later stage
    upper:  force the field uppercase
    lower:  force the field lowercase
        Note that these two conversions only change data up to the next tag
        or end tag (ignoring <P>) which may not be the end of this tag.
    list-fiphdr:P3
        If converting OrderedLists <ol> or unordereds <ul>, this is the
        FipHdr field containing the item number.
            tag:ul strip:tag start:<FipUL> list-fiphdr:P6
            tag:ol strip:tag start:<FipOL> list-fiphdr:P6
            tag:li strip:tag start:"\n \P6"
        The actual string used in the Unordered list can be changed from
        a '*' using the parameter 'unordered-list-chr:+'
    fiphdridx:
        use a link-Fiphdr (see below) to extract some FipHdr data referenced by
            tag:A strip:tag end:(\R7) fiphdridx:a@href=R7

    Note when specifying the tag, do NOT specify either the presy/endy
    ie the '<' or '>'.
        eg tag:location start:[ModeBold] end:[ql]\n strip:tag
    There is a special case for a comment <!-- This is a comment -->, where
    the 'end' subkeyword specifies the end of the comment.

fiphdr:(2-letter code) (optional subkeywords)
    Either
        tagdata:(name of tag)
            specify the tag name which contains the data required.
    Or  tagattrib:(name of tag),(name of attribute)
    Or  tagattribute:(name of tag),(name of attribute)
            specify the tag name and the attribute name which contains the
            data required.
    Or  data: (FipSeq)
            general data to add to a FipHdr field.
            NOTE you MUST double-quote (top and tail) any data field that
            includes spaces :
                fiphdr:AB data:"aaa bbb ccc"
    Or  text:
            Stuff the first part of text into this hdr field
            This searches for the <TEXT> tag. If not found, the top of data
            is used.
            default length is 100 chrs unless you change with a 'max:1024'
            (see below)

    For any of the fiphdr-tag* options, subkeywords are 'dup', 'max',
    'upper', 'lower'
    continue:
        allow this fiphdr to continue and include lower level tags
    dup:(optional separator)
        Flag that this field may be duplicated. Duplicate fields are
        separated with a space unless a separator chr is also specified.
        For 'dup' to work correctly, each tag or attribute to be accessed
        is stuffed into one fiphdr line only. Each occurrence of the
        duplicated tag MUST follow sequentially with no other tags
        intervening
    incdup:
        A second method of handling duplicate tags or tag/attributes is to
        create a new FipHdr field by incrementing the second letter of the
        FipHdr name
            eg fiphdr:J6 tag:DEST incdup:
        the first FipHdr will be 'J6'
        the second 'J7'
        the third 'J8' etc
        So the idea is to start with 'J0' (zero) if under 10 duplicates are
        possible or 'JA' if 26.
    maxdup: (max number of duplicates allowed for this field)
        default: no limit for 'dup', 26 for 'incdup'
        Use this to limit the number of entries in a duplicated field.
    max: (max number of chrs in this FipHdr field)
        limit the size of the data to a fixed amount
            max:25
        Note there is no default except the absolute maximum is 1023
    mandatory:
        This FipHdr MUST be added - so if it is NOT found in the data,
        a FipHdr will be added for the last
    upper:  force the field uppercase
    lower:  force the field lowercase
        Normally these take the concept of lower and uppercase chrs from
        the LOCALE of the system you are running on. These can be
        supplemented by the 'locale' and 'extralocale:' keywords below.
    key: and key2:
        Some XML variants reuse structures and it is the contents of an
        attribute which describes what the data really is.
        In NewsML for example there can be multiple TopicSets with the
        attribute 'Scheme' on the 'FormalName' tag which varies.
        Use 'key' to define which one.
            eg fiphdr:PP tag:FormalName dup: key:TopicSet/Topic/FormalName/Scheme="Internal MetaCodes"
        See below for more comments. For use with multiple structures you
        MUST specify at least the tag and attribute in the key.
        There can be up to two 'key's for each 'fiphdr' - see below for an
        example where 2 keys are necessary for NewsML Topics.
    index: (Tag@attribute)
        Create an internal FipHdr for use with this index for outputting
        with tag/fiphdridx above
            fiphdr:R7 tag:FormalName dup: key:FormalName@Scheme="Ticker" index:Topic@Duid

    For fiphdr/tagdata there is an additional keyword of
    'attribute-is-data:'. This forces any information in attributes in any
    lower tags to be treated as data.

    As some FipHdr fields have distinct meanings - SN, DU, DP etc - please
    use 2 letter codes starting N or Q.
        eg fiphdr:NA tagdata:itemid dup:+
    get the data from each <ITEMID> field. If there is more than one, they
    are separated by a '+'.

    general examples
        fiphdr:PN data:\SN max:6
        fiphdr:HT data:"This is the old HS =\HS="
        fiphdr:DI tagdata:brodtext max:200

Other keywords :

data-format:JSON / XML
    default is XML
start-text-tag: (tag)
    Tag signifying the beginning of text data for 1st line (etc) of text
    (\$1, \$t etc)
    The default is 'TEXT' but is often defined as 'BODY' :
        start-text-tag:BODY
    or for NITF, the body.content tag
        start-text-tag:body.content
pinhdr:
pindata:
    The <P> Paragraph tag is handled separately from other tags as it is
    often 'neutral' and should not alter the current processing.
    Use these two keywords to define what to do with the start and end 'P'
    in either a FipHdr field or in the data part:
        pinhdr: start:~ end:\s
        pindata: start:\n end:\n
    'start:' being the string output in place of a <P>
    'end:' being the string output in place of </P>
    Note that CR NL etc are not valid characters in the FipHdr - if you do
    need them use another unique chr and use 'ipxchg' to convert at a later
    stage.
    Defaults for pinhdr: start:\s end:\s
    Defaults for pindata: start:\n end:\n
dest: (one or more Fip Destinations separated by space or '+')
    This can be overridden by the DX: FipHdr field.
    Note that all destinations MUST be in the tables/sys/USERS file.
    As per normal case is important, so ZAPME and zapme are 2 different
    destinations.
        eg. dest:logcopy+outsgml
stripfiphdr:
    do NOT copy the existing FipHdr of the input file onto the output.
    Normally the existing FipHdr is preserved in the output file.
nofiphdr:
    do NOT add a FipHdr to the output file.
    Any new FipHdr keywords are added without the tilde NL top and bottom.
zapfiphdrfields: (List of FipHdr fields to zap)
    Delete all occurrences of the FipHdr fields specified.
    This is ONLY valid where the FipHdr from the input file is retained
    for the output. In this case it is normal to zap :
        zapfiphdrfields:XZ,XS,CX,DC,SZ,CQ,CP,XP
addhdr-file: (fullpath/filename in FipSeq)      default: none
    Extra, optional FipHdr information held in an external file
addhdr-script: (script in FipSeq)               default: none
    Extra, optional FipHdr information generated by an external program
    or script
        addhdr-script:/fip/local/find_iim.pl \EP/\EN > \E3
    Temporarily, 3 FipHdr fields are available for the script :
        \EP holds the input folder
        \EN holds the input filename
        \E3 holds the name of a TMP file to create that will be read for
            the list.
extra-fiphdr: (FipSeq)                          default: none
    Extra, optional FipHdr information - note this overrides the -h switch
use-sx: or use-external-file:
    if there is an SX FipHdr field with a path to the data file, use that
    rather than the data in the input file.
filename: (FipSeq)
    New filename for the output file.
supercede: or overwrite:
    Where 'filename' has been specified, if there is already a file with
    that name in the output queue, it is deleted first.
script: (path and name)
    Script to run AFTER processing.
    The output filename and path is added to the script before running.
    Care must be taken NOT to run a script on a file that normally is
    written to a spooled queue. For example, the default output queue is
    'spool/2go' where program 'ipwheel' may have already processed the file
    (and possibly deleted it) before the script has had time to function.
    So it is normal to specify a holding queue, not used by any other
    program, as 'outque:'.
    The script must therefore delete the file after use OR delete them all
    in the nightly maintenance - 'zapfiplog'
    Note also that the script is called only once at the end of the file.
    Use split-script: to run on each split (if using splits).
outque:
    Output Queue for the output file.
    This can be a FipHdr or FipSeq - which gets resolved at output (so can
    be conditional on metadata which has been sourced from the current file)
    This defaults to the '-o' input switch which defaults to spool/2go.
    If the first chr is NOT a '/', it is assumed under spool.
    By default outque is used in preference to -o, UNLESS the -V switch is
    on, where -o is used over outque.
doneque:
    Done Queue for the raw input file.
    This defaults to the '-d' input switch which has no default.
    If the first chr is NOT a '/', it is assumed under spool.
    This can be in FipSeq - which gets resolved at output (so can be
    conditional on metadata which has been sourced from the current file)
before: (FipSeq)
    String to parse and add at the top of the file.
after: (FipSeq)
    String to parse and add at the end of the file.
beffile: (Path/filename)
    Contents of a file in FipSeq to parse and add at the top of the file
    (after 'before')
aftfile: (Path/filename)
    Contents of a file in FipSeq to parse and add at the bottom of the file
    (before 'after')
number:octal|dec|hex
    In FipSeq, make all escaped numbers Octal, Dec or Hex.
    default is octal
log:
    Custom log line for the Fip Item log in FipSeq
    default is name of the parameter file (DF) and filename (SN)
archive: (Archive Name)
    Archive all incoming raw data using this parameter file.
    The 'archive Name' can be FipSeq.
    This adds the file to the normal Fip archives in /fip/log/data
    It should be purged using 'ipmaint'.
        eg archive: \SU
        or combie:QS SU|NS,rawdata
           archive:\QS
    ie Use the contents of FipHdr SU, if not there, NS, if not there just
    use the word 'rawdata'.
striptags:
    Strip all tags EXCEPT those specifically stated using the 'tag' keyword.
default-strip: (tag|attribute|zap|everything|data|end|none)
    default strip all or part of the tag and its associated data
    (see strip: above for descriptions)
ignore-non-json-data:
ignore-non-xml-data:
    If there is any text or data BEFORE the start of the XML document or
    any after the end of the last End Tag, it is stripped.
    For JSON this is all data before the first '{' or '[' and after the
    last (matching) '}' or ']'.
    Normally it is preserved and output.
locale:(valid locale)
    Change the locale from the System Locale to this
    The locale MUST be valid !
        locale:dk
extralocale: (2chr combinations)
    For changing uppercase to lower and vice versa, we can add to the
    normal locale by specifying a series of 2-letter pairs.
    The lowercase chr is 1st then the upper, then a separator or space.
        eg extralocale:aA,bB,cC,dD,\212\232,\213\237
    Normal a-z/A-Z are included by default : in the example above they are
    shown only to give an idea of syntax
chr:(octal/dec/hex number):(FipSeq string)
hdrchr:(octal/dec/hex number):(FipSeq string)
txtchr:(octal/dec/hex number):(FipSeq string)
    Replace this character with the string - usually an Sgml escaped chr.
    USE THIS TO REPLACE SINGLE CHRS WITH SGML CHRS (ie opposite of
    'sgmlchr:' below).
    This can be a printable chr or an escaped number. The number is
    octal/dec/hex depending on the preceding 'number' keyword (if any).
        eg chr:\313:&pound;
           chr:<:&lt;
           hdrchr:^:*
    Note that the ';' is part of the string and NOT a comment as it does
    NOT start the line.
    hdrchr works on new FipHdr fields only.
    txtchr works on data and when data is taken from a FipHdr field and
    added to the data part of a tag.
    chr works on both data and new FipHdr fields.
    NOTE from 19g51, the hash '#' is ONLY converted in NEW FipHdr fields -
    NOT in the original FipHdr if it exists.
eoln:
    Convert Line Ends (ie CR and/or NLs) from the outbound feed.
    SGML should be terminated CR NL :
        eoln:\r\n
    for HTML (default) the EndOfLine is NL only :
        eoln:\n
    for NO eoln, specify NO subparameter :
        eoln:
    The subparameter can be any valid FipSeq.
    (SGML uses the term 'RE' (record end) for Carriage Return CR and 'RS'
    (record start) for LineFeed NL.)
    Note that, unless using the 'preserve-multiple-eolns', you should map
    eoln to something unique like eoln:<mypara> as normally CR NLs are
    reduced to a single End Of Line.
preserve-multiple-eolns:
    Normally multiple end-of-lines are stripped as they are meaningless in
    the XML world. Use this to preserve them!
preserve-top-spaces:
    Do NOT strip all spaces and blank lines at the top of the output file.
preserve-padding-spaces:
    Do NOT strip all spaces and blank lines at the beginning of each tag.
strip-multiple-spaces:
    Strip all multiple spaces and blank lines inside each tag.
allow-presy-in-tag:
    In XML/HTML etc, reserved chrs like '<' or '>' cannot appear inside the
    attribute data of a tag - they must be encoded like &lt; etc.
    Use this where there might be some non-conforming stuff.
    However the drawback here is that they MUST be inside dbl qtes
        ie <meta ds="hello<p>ooo">
convert-CDATA-sections:
        convert-CDATA-sections:no       - no, don't ! (default)
        convert-CDATA-sections:yes      - yes pls and zap the '<![CDATA[' and ']]>'
        convert-CDATA-sections:preserve - yes pls and leave the '<![CDATA[' and ']]>'
    Normally a CDATA section like :
        <![CDATA[ Vongerful Vondafool C&oe;penh&areing;gen <99thisIsAnon-compliant XMLtag> ]]>
    is considered a single, raw string of XML/SGML data. And all the tags
    and entities (like &lt;) are not changed either.
    Use this parameter to convert them.
    Note that you should use this option CAREFULLY if any tag in the CDATA
    is the same as a tag in the main envelope.
    See below for more comments.
sgmlhdrchr: (FipSeq string) : (FipSeq Chr or String)
sgmltxtchr: (FipSeq string) : (FipSeq Chr or String)
sgmlchr: (FipSeq string) : (FipSeq Chr or String)
    Translate Sgml escaped chr back into a single chr or a string.
    USE THIS TO REPLACE SGML CHRS WITH A CHR OR A STRING (ie opposite of
    'chr:' above)
    Sgml escaped chrs always start with a '&' and end with a ';' :
    "&gt;", "&copyright;"
    Note that case of both parameters IS important - These two are
    different :
        sgmlchr:Oring:<CapOring>
        sgmlchr:oring:<smallOring>
    This will take &XXXX; and translate it.
        eg. sgmlchr:lt:<
            sgmlchr:oumlaut:\202
            sgmlchr:Utilde:{tildeU}
    sgmlhdrchr works on new FipHdr fields only.
    sgmltxtchr works on data and when data is taken from a FipHdr field and
    added to the data part of a tag.
    sgmlchr works on both data and new FipHdr fields.
    NOTE that if the input is any NITF, XML or HTML feed and the output is
    just plain text, then you almost always need :
        sgmlchr:lt:<
        sgmlchr:gt:>
        sgmlchr:amp:&
        sgmlchr:apos:"
    BUT you will want to preserve them / leave them alone if the output is
    the same or another NITF, XML or HTML flavour.
unicodelist: (dec or hex number) : (list of single FipSeq Chrs)
    Starting at the number, fill in the map of SINGLE character
    replacements in sequential order
    For any map which is MORE than a single chr, use a '*' (or the value of
    the convert-unmatched-unicodes: parameter) and then use unicodechr:
    further down the parameter file.
    eg :
        ; Map Unicode Latin2s chrs to plain Ascii ... use a star for unmatched (or will match later)
        unicodelist:x100:AaAaAaCcCcCcCcDd
        unicodelist:x110:DdEeEeEeEeEeGgGg
        unicodelist:x120:GgGgHhHhIiIiIiIi
        unicodelist:x130:Ii**JjKkkLlLlLlL
        unicodelist:x140:lLlNnNnNnnNnOoOo
        unicodelist:x150:Oo**RrRrRrSsSsSs
        unicodelist:x160:SsTtTtTtUuUuUuUu
        unicodelist:x170:UuUuWwYyYZzZzZzf
        ; NOTE hex 132, 133, 152 and 153 are mapped to '*' as they need more than a single chr
        ;; ..
        ;; so then we replace them properly
        unicodechr:x132:IJ
        unicodechr:x133:ij
        unicodechr:x152:OE
        unicodechr:x153:oe
unicodechr: (dec or hex number ) : (FipSeq Chr or String)
    For all unicode chrs which are >= 256 (x100), you can specify a map to
    a single chr or a string.
    The chr can also be specified as hex with a preceding 'x'
    Commonly used ones are :
        ; trademark
        unicodechr:x2122:(tm)
        unicodechr:8194:\s
        unicodechr:8195:\s
        unicodechr:8201:\s
        unicodechr:8211:-
        unicodechr:8212:_
        unicodechr:8216:'
        unicodechr:8217:'
        unicodechr:8220:"
        unicodechr:8221:"
        unicodechr:8249:<<
        unicodechr:8250:>>
        ; euro in a table
        unicodechr:8364:EUR
        ; fractions 1/3 .. 1/5 .. 1/6 .. 1/8 ... 7/8
        unicodechr:x2153:\s1/3\s
        unicodechr:x2154:\s2/3\s
        unicodechr:x2155:\s1/5\s
        unicodechr:x2156:\s2/5\s
        unicodechr:x2157:\s3/5\s
        unicodechr:x2158:\s4/5\s
        unicodechr:x2159:\s1/6\s
        unicodechr:x215A:\s5/6\s
        unicodechr:x215B:\s1/8\s
        unicodechr:x215C:\s3/8\s
        unicodechr:x215D:\s5/8\s
        unicodechr:x215E:\s7/8\s
        ; ByteOrder ?? x.feff d.65279 o.177377
        unicodechr:65279:\s
convert-unmatched-unicodes: (FipSeq Chr)
    Single chr to represent a unicode chr which is NOT latin1 and NOT
    matched in 'unicodechr'
    default: '*'
    Normally these will be mapped to '*'.
    To pass-thru all unmatcheds, use :
        convert-unmatched-unicodes:passthru
hdr-strip-between: start:(FipSeq Chr) end: (FipSeq Chr)
    Where the 1st 9 lines of text are used in FipSeq using \$1 etc, use
    this to replace any tags with a space.
    Normally the following would be used :
        hdr-strip-between: start:< end:>
    But if you have mapped the start/end tags to other chrs in 'ipxchg'
    (possibly to control the tags and replace later with 'txtchr') eg
        ; for lines used in FipSeq - like 'before' and 'after'
        hdr-strip-between: start:\201 end:\202
        ; for text lines - Convert back from 201 202 <>
        txtchr:\201:<
        txtchr:\202:>
sgmlchr-file:(filename)
    Use this to pull in a standard XML Entity file such as found at
    http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
    See also the note on utf-8 below.
    Each line has an entry of :
        <!ENTITY hearts "&#9829;"> <!-- black heart suit = valentine, U+2665 ISOpub -->
convert-all-other-entities:
    This flag will automatically convert all entities NOT covered by
    chr/hdrchr/txtchr.
    If the entity is a number &#710;, it is converted to one or more bytes
    If the entity is a name &euro;, it is converted to '(euro)'
raw-data-type:ascii/utf8/utf16
    default: ascii for 8bit chrs
case-sensitive-tags:YES/NO
    In SGML and variants - HTML, early variants of NITF - the tag names are
    case INsensitive - ie <BODY> is the same as <Body> == <body>
    Ignoring case is the default for 'ipxml'
    BUT XML tags nowadays are case-SENSITIVE. So if you need to distinguish
    them, use 'case-sensitive-tags:yes' to turn this ON.
    Our general view is that no sane person would run tags with the same
    name but with different case - but then we are not the experts !
    ** This must be specified at the TOP of the parameter file BEFORE any
       fiphdr: or tag: !!
unordered-list-chr: (chr or string)
    This changes the actual string used in the Unordered list.
    Default is "*".
replace-fiphdr-tilde: (FipSeq chr)
    If a Tilde is found in a fiphdr field, replace it with this chr -
    default 0376
replace-fiphdr-eoln: (FipSeq chr)
    If an end-of-line (<p>, <br>, CR, NL or CRNL) is found in a fiphdr
    field, replace with this chr - default is SPC
alt-param-file:(text) (param file name)
        alt-param-file:<AlertML alertml.fip
add-EQ:
    add the input folder as an extra FipHdr field EQ
cont-chr: (FipSeq chr)                          default: 021 (DC1)
    Single chr to be used internally for flagging Continuation FipHdrs
    (ie for fiphdr:AB tag:hoho continue:)
    Use this if a 021 (hex 11) chr is valid data.
cont-zap-chr: (FipSeq chr)                      default: 022 (DC2)
    Single chr to be used internally for flagging Continuation FipHdrs
    (ie for fiphdr:AB tag:hoho continue:)
    Use this if a 022 (hex 12) chr is valid data.
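The two rules given under convert-all-other-entities (a numbered entity becomes one or more bytes, a named entity becomes the name in brackets) can be sketched in Python as follows. The function name is hypothetical, and the use of UTF-8 for the output bytes is an assumption made for the illustration; the text does not say which encoding the program writes.

```python
import re

# Matches &#123; (decimal), &#x1F; (hex) or &name; style entities.
ENTITY = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def convert_other_entities(text):
    """Numbered entities become the character itself, names become '(name)'."""
    def replace(match):
        body = match.group(1)
        if body[0] == "#":
            if body[1] in "xX":
                return chr(int(body[2:], 16))   # hex form, eg &#x2C6;
            return chr(int(body[1:]))           # decimal form, eg &#710;
        return "(" + body + ")"                 # named form, eg &euro;
    return ENTITY.sub(replace, text)

converted = convert_other_entities("5 &euro; &#710;")
# Assumption: encode as UTF-8 to see the "one or more bytes".
print(converted.encode("utf-8"))
```

In this sketch any entity already handled by chr/hdrchr/txtchr would have to be removed before the call, since the flag only applies to entities those keywords do not cover.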
max-total-fiphdr-size: (total)                  default: 32k chrs
    Max size of all FipHdr fields
max-single-fiphdr-size: (total)                 default: 4000 chrs
    Max size of a single FipHdr field
    This overrides the -F input switch
wrap-lines: (no of chrs) : (fipSeq)             default: no wrapping
    Wrap text lines (but NOT plain text tables if processing tables) to
    this line length and insert the string
        ; wrap NONtabular stuff at 80
        wrap-lines:80:<fipWRAP>\n
ignore-xml-in-wrap:no or (number)               default: no
    ignore any xml in the calculations for the linelength
    the number can be the amount to add for each xml tag - generally 0
        ; dont add anything for XML
        ignore-xml-in-wrap:0
abstract-size: (number)
abstract-fiphdr: (2 letter FipHdr code)
abstract-msg: (message in FipSeq)
stop-after-abstract:yes/no
    Create an abstract/first part of text when the derived data is equal
    to or exceeds the abstract-size.
    If a file is smaller than the size, only a single, complete file is
    output
    Default - no abstract at all.
    The fiphdr is used to flag if the file is the Abstract or the Main
    Stop flag can be used to NOT continue with the Main file (default is
    both files)
    The optional msg is inserted at the bottom of the text of an abstracted
    file.
        abstract-msg:\n\n***Abstract finishes, pls view original for remaining text***\n
never-log:
    Never log files (overrides the -l/-L switches)
always-log:
    Always log files (overrides the -l/-L switches)
log-level: (number)
    10 - default
    20 - log all tables

Input parameters (all optional) are :
    Either -1 : filename for a single shot              default: spooled
                often this flag is used with -S (newname) to create a file
                called (newname) in spool/formsave for the DataFormatting
                module
    or     -i : spooled input queue to scan             default: spool/2sgml
    or     -I : scan input queue and                    default: spooled
                stop after the last file has been processed.
    -o : output queue                                   default: spool/2go
    -d : done queue for original raw data               default: none - input deleted
    -D : display tags                                   default: no
         use this ONLY when running '-1' single shot to display all tags,
         attributes and levels and their data. ie use to debug/tune.
    -F : default max size of a single FipHdr            default: 4000
    -h : optional extra FipHdr string to add            default: none
    -JSON : data is in json                             default: xml unless overridden in the parameter file
    -l : log every new file pls                         default: do NOT log
    -L : log every new file pls with times              default: do NOT log
    -Q : quiet flag - do NOT flag minor errors          default: do
    -S : save this file in the save area                default: spooled output
         with the following name
            eg : -S "#SN:\XK#PP:\PP"
         use this for DataFormats (same switch as ipformat)
    -t : scan time for the directory                    default: 2 secs
    -T : different folder under /fip/tables             default: sgml
         This should only be used when upgrading and you need to run
         2 ipsgmls
    -V : use the content of the -o input switch         default: use outque if it exists
         for the outque
    -w : file wait time for files arriving              default: none
         across a network (for NFS, make about 10 secs)
    -W : allow DataFormat output filenames              default: no
    -XML : data is in xml                               default: xml unless overridden in the parameter file
    -z : name of the default parameter file             default: tables/sgml/SGML
    -Z : force the parameter file to this               default: DY or -z name
    -v : print version no and exit

**************** Notes ***********************

**** For Debugging, you can manually run the program with the -1 Single
     shot switch with the -D to display all the tags in an input file
        CMD>ipsgml -1testfile -D -zNewsML.fip -otestfolder | more
     This will create a file in /fip/spool/testfolder

**** Rarely will you want 'sgmlchr:' and 'chr:' in the same parameter file -
     chr converts single chrs to sgml chrs and sgmlchr converts them back !
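The note above, that 'chr:' and 'sgmlchr:' work in opposite directions, can be pictured with two small lookup tables in Python. This is only an illustration under stated assumptions: the table contents mirror entries such as 'sgmlchr:lt:<' and 'chr:<:&lt;', and the function names are hypothetical, not part of the program.

```python
import re

# Hypothetical table in the style of 'sgmlchr:lt:<' entries.
SGMLCHR = {"lt": "<", "gt": ">", "amp": "&"}
# The reverse table, in the style of 'chr:<:&lt;' entries.
CHR = {char: "&" + name + ";" for name, char in SGMLCHR.items()}

def apply_sgmlchr(text):
    """Replace each known &name; with its single character."""
    return re.sub(r"&(\w+);",
                  lambda m: SGMLCHR.get(m.group(1), m.group(0)), text)

def apply_chr(text):
    """Replace each mapped single character with its escaped form."""
    return "".join(CHR.get(c, c) for c in text)

escaped = apply_chr("a < b & c")
print(escaped)                 # a &lt; b &amp; c
print(apply_sgmlchr(escaped))  # a < b & c
```

Applying one table after the other returns the original text, which is why having both kinds of entry for the same characters in one parameter file rarely achieves anything.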
**** 'sgmlchrs' are done first BEFORE 'chrs'
     Then any Upper/Lower case conversion

**** So if you have a 'before', 'after' string (or files) with embedded
     SGML tags BUT still need to catch chrs '<' and '>' :
        1. in the 'before' string, chg all < to { and > to }
            eg before:{!DOCTYPE abc.dtd}\n
        2. change < and > using txtchr
            eg txtchr:<:&lt;
               txtchr:>:&gt;
        3. change { and } using txtchr
            eg txtchr:{:<
               txtchr:}:>

**** Extra FipHdr fields are available to use :
        Z1 is the size in bytes of the data part of the document
           (ignoring before, after, beffile and aftfile).
        Z2 is the size in bytes of the data of the document ie ignoring
           tags (ignoring before, after, beffile and aftfile).
     if you are using Z1 and Z2 already, populate 2 other fields by :
        newZ1: 2 letter code replacing Z1
            eg newZ1:VT
        will put the sizes in FipHdr fields VT and VU.

**** Extra System Variables are :
        \$1 first line of text
        ...
        \$9 ninth line of text

**** NULs (characters of binary zero) are stripped from the output file.
     So a parameter like the following will have no effect at all !
        tag:ds start:\000

**** Current Limitations are :
     No more than 2 million tags may be specified.

**** If there is NO FipHdr or the SN field (which should be the name of the
     file) is missing, the original filename is used as the SN. Any hashes
     ('#') in this created SN field are changed to hex.9d/oct.235/dec.157

**** Program change - from version 14+, please use
     'preserve-multiple-eolns' to keep ALL the end of lines of non-xml data.

**** CDATA fields
     Note that an XML CDATA field is specified as tag named '![CDATA' -
     ie without the trailing '['.

**** Splitting files
     For SGML/XML files that contain multiple 'things', there is a means of
     splitting these either into discrete files or into a single file with
     a Splitter string/tag and FipHdr pertaining to just that file.
     Eg You might need to split off each ARTICLE from the following
     structure BUT still retaining the Page info
        <PAGE>
            some relevant page info
            <ARTICLE>
                some relevant article info
            </ARTICLE>
            <ARTICLE>
                second relevant article info
            </ARTICLE>
        </PAGE>
     Where a single output file with one of many 'splits' is required, use
     the following parameters :

split-on-tag: (tag)
split-on-endtag: (tag)
split-on-tagattribute: (tag),(attribute)
    Create a split on this tag or tag/attribute
    The split is put BEFORE the start or AFTER the end tag depending on the
    option chosen.
split-on-level: (number)
    While you can NOT specify trees for 'split-on-tag' (or tagatt), you may
    specify the level at which the split MUST take place.
    So that if you have multiple levels of embedded tags - like NewsML
    NewsComponents for example, use this to decide which level.
    eg : If you have NewsML/NewsEnvelope/NewsComponent/NewsComponent
        use split-on-level:4 to split ONLY on the 4th level, not the 3rd.
    use the -D input switch to show levels for a single file.
    This parameter has nothing to do with cooking.
stop-on-tag: (tag)
stop-on-endtag: (tag)
stop-on-tagattribute: (tag),(attribute)
stop-on-level: (number)
    ditto - but stop processing
splitter-string: (FipSeq)
    Where a single output file is required, this is placed in the data to
    signal the start of a new bit; FipHdr follows.
        splitter-string:********** BRS DOCUMENT START *************
    default is "\n<FIP-SPLIT>"
new-file-on-split: (FipHdrField for Seqno)
    Instead of putting all the splits in one file with a <FIP-SPLIT>
    between, this option creates a completely new file.
    The FipHdr specified will contain the sequence number of this file
    from 1.
        new-file-on-split:NZ
split-total-fiphdr: (FipHdrField for blank or the total number of splits)
    The last file split will have the total number of splits in this FipHdr
    field; all others will have it blank
    eg if there are 28 files split from a single input and
    'split-total-fiphdr:AB' is specified ..
    the first 27 will have 'AB:' and the last, 28th 'AB:28'
split-on-no-data:
    Normally only if the previous element had data will it be ended and the
    next file started.
    Use this flag to force a split EVERY time the split criteria is met,
    ignoring if there was any data.
split-script: (path and name)
    Script to run AFTER processing this file

    - splits and checking for missing items
    Sometimes feeds cannot count. A single file with 25 items might number
    them sequentially - but skip a couple, which can seriously blow any
    downstream tracking (fip or external) which is checking the Item Number
    before releasing all.
    ipsgml can use a template to insert dummy items in place.
split-missing-dataFH: 2 letter FipHdr field
split-missing-template: (name of template file - contents will be FipSeq)
split-missing-output: full path and filename in FipSeq
split-missing-extra: Any extra FipHdr metadata to be added to the item
split-missing-log: log string
    eg
        ; compare the Fip split seqno IS, to the total no of items according to SEC, AX (not BQ which is the SEQUENCE)
        split-missing-dataFH:AX
        ; if there is a missing, fill this template
        ; ...put the filler seqno in IS
        split-missing-template:/fip/tables/edsys/SEC_EDGAR_FILLER.template
        split-missing-extra:#XX:addextra#
        ; .. move to here
        split-missing-output:/fip/spool/2tracker\V0_\F0\$o/SECfiller.su.\SU.an.\AN.seq.\IS.max.\AX.cu.\CU.\$h\$n\$b.\$z.\$v.fip
        ; extra logging of split
        split-missing-log:maxseq.bq.\BQ type.\QY date.\QD feed.\F0 id.\AN ..
table-width-fiphdr: (FipHdr field)
    This FipHdr will contain the maximum width of the table.
        eg table-width-fiphdr:AB
table-width-minimum: (width)
    If 'table-width-fiphdr' is specified, make it a minimum of this.
	def. none

strip-trailing-table-spaces:no/yes
	If there are any spaces at the end of a table row, delete them (default)

NOTE that if you are running splits, then you PROBABLY want to keep the FipHdr. This is because there is often a chunk of metadata BEFORE the split which needs to be saved for EACH split - and it has probably been stuffed in the FipHdr.

**** Multiple specified Structures

NewsML TopicSets and other multiple specified structures

Considering a structure like :

	<TopicSet FormalName="Companies">
	  <Topic Duid="T00001">
	    <TopicType FormalName="Company"/>
	    <FormalName Scheme="Listed Companies">PNOK.L</FormalName>
	    <FormalName Scheme="Nasdaq codes">PNOOK</FormalName>
	    <Description>Pocket Nook Corp</Description>
	  </Topic>
	  <Topic Duid="T00002">
	    <TopicType FormalName="Company"/>
	    <FormalName Scheme="Listed Companies">FIP.L</FormalName>
	    <FormalName Scheme="Nasdaq codes">DRIVL</FormalName>
	    <Description>Mega Fip Corp</Description>
	  </Topic>
	</TopicSet>

	; get the Listed Coys and use '+' as a separator
	fiphdr:YC tag:TopicSet/Topic/FormalName dup:+ key:TopicSet/Topic/FormalName/Scheme="Listed Companies"
	; get the Nasdaq codes and use '*' as a separator
	fiphdr:YN tag:TopicSet/Topic/FormalName dup:* key:TopicSet/Topic/FormalName/Scheme="Nasdaq codes"
	; use U1, U2 etc as holders of the descriptions
	fiphdr:U1 tag:TopicSet/Topic/Description incdup:

would give new FipHdr fields of

	YC:PNOK.L+FIP.L
	YN:PNOOK*DRIVL
	U1:Pocket Nook Corp
	U2:Mega Fip Corp

**** Interpreting Tables

IPXML may be used to convert XML tables to plain formatted text or in-line markup such as Quark.

The two main, and exclusive, uses are
1. format table rows into plain text rows where the columns line up.
2. add inline markup dependent on the table and the row. This inline markup can be anything - Quark Tags, CCI, Atex, MediaSystem Justif, InDesign etc.

A note of caution - IPXML will format tables (and tables within tables) with up to 108 (was 62 until version 19g3) columns in each row.
Any more - use the data formatting package.

For Lining-up-columns, it spaces out all the columns to the maximum in the table. If there is an 'align' attribute, then the data is aligned according to that. Otherwise the first column is flush LEFT and the rest flush RIGHT. This can be overridden by
	default-tab-align:left/right/center

How does it work ?

Data for each row is held as FipHdr fields (usually UA-UZ then U0-9 then VA-VZ). At the end of the row, it is output as a row using a FipSeq line which defaults to :
	(spc) \UA (spc) (spc) \UB (spc) (spc) ..... \r\n
for the number of columns in that table.

This output can be replaced by using either the 'default-class' parameter or the 'class' attribute on a 'TABLE' tag.

So if there is a <TABLE class="soccer-score">, then a file in tables/sgml/class/SOCCER-SCORE should contain one or more of the following keywords :
	table-start:[font=HelveticaBold][pointsize=16]SOCCER SCORE[quad]\n
	table-end:[quad]Data Supplied by Fippies.[quad]\n
	table-row:[font=Helvetica][tab][bold]\UA[roman][tab]\UB[tab]\UD[quad]\n

The table-start is produced BEFORE the table, the table-end after, while each row has the table-row applied.

Note that in the above example we missed out the third field \UC - there is nothing to stop you rearranging the fields and NOT specifying the data. Also you may use the lovely FipSeq 'partial', 'combie', 'unique' etc to play around with the data.

If you do NOT specify a complete output line with table-row (or thead-row), there are parameters for adjusting the look :
	column-gap: (FipSeq string)
	row-start: (FipSeq string)
	horiz-rule: (FipSeq chr)
These allow you to specify the actual chrs that will start a table data line, the gap between each column and the character or string to use if an <HR> occurs in the table.

eg Start each line with a (hyphen) (space), the gap is 4 spaces and horiz rules are multiple '+'.
	column-gap:\s\s\s\s
	row-start:-\s
	horiz-rule:+

Keywords in the main parameter file

format-tables:
	This is necessary to flag that the tables need formatting.
default-class:(default-class)
	name of a file in tables/sgml/class holding Styles for outputting each row.
line-up-columns:
	This flags that the data will be space padded to line-up the columns.
column-gap: (FipSeq string)	default is 2 spaces
row-start: (FipSeq string)	default is 1 space
row-end: (FipSeq string)	default is NL
horiz-rule: (FipSeq Chr)	default is '-'
bullet: (FipSeq Chr)	default is '*'
newUA: 2 letter code replacing UA as the first column of a row.
	Both must be a letter and the first cannot be 'Z'. The second will always be 'A'.
fiphdr-for-table: (FipSeq string)	default: none
	Extra FipHdr to add if there is a table in the data.
split-tables-and-text: (FipHdr)
	Add Marker in text Or create NEW file on tables/text transition.
split-tables-into-files:
	Use this to split the incoming file into discrete files for tables and non-tables.
	The default is NO, ie just add the <FipSplitTables> string.
	For files - A new file is created on start and end of table and the FipHdr is used to hold the Sequence number of this take.
fiphdr-for-text: (FipSeq string)	default: none
	Extra FipHdr to add if this subfile is a text element. This is ONLY if the 'split-tables-and-text' is specified.
use-pi-widths:yes
pi-colwidths:IDNtableColWidth
	This expects a PI tag with colwidths eg
		<?FingerPost IDNtableColWidth="10 10 10 20" ?>
	Default no
wrap-table-cells:no/yes/(number)
	This has 2 purposes -
	- with a number (40 or over) : optimum col width of a table (ie don't squeeze too much !)
	- or Make this NO to automatically calculate the max width of each column and space out accordingly.
	Default is YES.
max-col-width: (number)
	Force the colwidth to be a max of this number. Default: no max.
	If there are 3 numbers, the 1st is the max width of the 1st col, the 2nd is the max width of the 2nd and the 3rd is the max width of all subsequent cols
	eg make the first col a max of 30 chrs and all others 20
		max-col-width:30,20
interpret-style:idx/bizwir/html
	Interpret some css attributes if there are any (currently just alignment)
	Parameter 'idx' states they are HTML Tidy styles which are numeric from 1-n
	Parameter 'html' looks for ordinary html 'left', 'right', 'center' (note - both 'class=' and 'style=' are checked)
		<td class="bold32 spc33 right">
	Parameter 'bizwir' looks for BusinessWire classes 'bwtextalign...'
		<td class="bwcellpaddingleft0 bwverticalalignbottom bwtextalignleft bwsinglebottomborder">
		or
		<td class="bwpadl0 bwnowrap bwpadr0 bwvertalignb bwalignl bwsinglebottom">
max-cols-per-row: (number up to and including 108)
	default is a max of 108 columns per row (was 62 until version 19g3)
	Use this to allow up to 108
	The data in columns that exceed this is ignored
default-tab-align:left/right/center

In the CLASS file

table-start: (FipSeq)
table-end: (FipSeq)
table-row: (FipSeq)
	also same three for THEAD, TBODY and TFOOT. Eg:
thead-start: (FipSeq)
tbody-row: (FipSeq)

----------------------------------------------------------------------------
Version Control
;19g64 04jul11 redid endtags ;3 bugette with > 26 columns ;4 -V added ;5 woops endtags/tables ;6 added filter ;7-8 valgrind cleanups ;9 20feb12 added a 2nd column for max-col-width (ie 2->3) and cater for UTF8 chrs in a table ;10-15 6mar12 strip 'formsave/' off the front of -S is default output and added -W ;16-19 1oct12 outque is now FipSeq and upto 50 speedy outques ;18 bugette in trimming v.large FipHdrs ;20 15feb13 bugette - endtags and strip:everything ignored.
;21 4mar13 added unicodelist ;22 22apr13 bug if rowspan in last table ;23-27 26apr13 bugette with EndTags and trees ;28 29apr14 added file-trace ;29 23mar17 added no-log/always-log ; 30 minor ;31-34 4sep17 added data-format:JSON ;35 allow optional doneque ;36 allow class OR style for <td> alignment ;37-38 16dec17 reset pi-widths between tables (only allowed for a single table) ;39 cleanup ;40 json issuette in commonxml ;41-42 27sep18 allow fiphdr/key to be tagData not just tagAtt ;43 cleanups for speedy ;44 9feb19 added default-tab-align ;45-46 28mar19 bugette combination of split and output-raw-data ;47 12jul19 outque parsed ;48-49 8nov19 better handling of Json [] ;49 minor ;50-51 18mar20 hdrchr - if HASH/#, the original FipHdr is NOT changed - only any new fields ;52-53 9jul20 added -JSON and -XML as input variables ;54-55 minorbugette ;56 maxFipHdr->limit from 4000 ;57 11oct22 added split-total-fiphdr:AB ;58 4nov22 added split-on-level and output-raw-tag for Json and zap the last split file if data=0 ;59-60 28dec22 added fiphdr:.. 
max-dup as trim + issuette output-raw-data as last tag + splitSeqno start 1 ;61-62 17jul23 added split-missings ;63 8nov23 bugette issuette output-raw-data as last tag ;64 23apr24 default-strip tuning ;019f41 17may06 bugette in EndTags when strip:none ;a-c 21sep06 added new Xmlinternals TagSpecial (b nasty bug in 19a) (c CDATA quirk) ;d 14aug07 tweak to trees ;e1-22 20sep07 more on plain text tables - added Rowspan properly (;19 added addhdr-script ;20 added -T ; 21 bugette DX and 'dest' were swopped) ;23 2apr08 bugged in wrap ;24 9may08 added log-split ;25-26WINNT + key bugettes ;e27-35 23jun08 for strip:everthing and end tags and tables with no rows (35 utf8 bugette) ;e36-37 22oct08 added -F and max-single-fiphdr-size ;e38-40 27oct08 Bizwir - sup and inf added (plus start table strip bugette) ;e41 15dec08 added split-on-endtag: ; 42 internal-tuning wrap buffersize ;f1-3 01feb09 made Tag structure variable to cope with files > 2million tags ;f5 27feb09 added stop-on-tag/att/endtag stop-on-level ;f6-14 20mar09 added abstract-fiphdr and abstract-size ;9 bugette ;15 maxStyles->3000 ;16 rework levels in common ;17-24 bugette for FipHdrs > 64k (commonxml too) ;19 21oct09 minor check on abstract ;23-24 redid styles ;26-27 19jan10 added embedded tables ;28-29 22feb10 colspans with no col ;30 21mar10 bizwir class names change ;31-33 3jul10 bugette in very large spanned columns and in styles and added -h and extra-fiphdr: ;34 13aug10 wrinkle - table with ONLY colspans - and preserve utf8/16 if output-data-type=raw-data-type ;35-36 23aug10 added convert-unmatched-unicodes:pass-thru ;37-40 31jan11 added use-sx and Style for LSE fix/fast ;41 11may11 added default-strip: ;018z 21apr04 ;a-b more tables2text cleanups ;c woops split-script went missing ;d-e 28apr04 added tag:XX strip:everything ;f-h 01jun04 bugette in R-dualRics and Unlisteds ;i 30jun04 bugette in tab2txt ;j-k 02jul04 added maxdup: ;l-p 14jul04 (protect isspace with 0200) plus tables-if no PI, use wrap ;q-r 
02sep04 added no-break for <> in wrap_cell ;s 17sep04 speedy ;t-u 04nov04 Rtrs-UL now in roychk ;v-z 05oct05 strip:none was missing the end tags ;017z 29may03 added eoln-in-fiphdr plus alt-param-file ;b-d 05jun03 2 bugettes - minor ;e 10jun03 no-data: added to fiphdr; ;f 20jun03 table Priority ;g 30jun03 PI-FingerPost IDNColWidth added ;h 17jul03 make list-fiphdr visible (ie leave in FipHdr, not zap) ;i 21aug03 bugette in specials ;j-m 31oct03 timings and very big FipHdrs ;n 26nov03 added parseable doneque ;o-q 12jan04 bugette in wrap cells and slim_fiphdr/max-fiphdr-size ;r-s 04mar04 added FipHdr EQ (input queue) on 'add-EQ' and bugettes in special RORIGIN2 and added cont-chr/cont-zap-chr ;t-u 13mar04 allow lead/trailing spaces in continuation tags ;w-z 06apr04 zap any sundry tags inside a table - for now. ;016z 23jun02 preceeding and trailing blank lines can be tables when splitting..... ;a 10jul02 bugette in splitting tables ;b 18sep02 added -D and -S for ipformat compatibility ;c/d/e 17oct02 added TableSplit string ;f 31oct02 added single quotes too ;g/h 25nov02 cleanup tables and ignore # in linkhdr for reference ;i 04dec02 added style-CharWidth for tables and Lists ;i/j/k/l 12dec02 BUG with large files and allow continuations ;m-w 19dec02 added row-end plus bugette in get_duid/links ;x 25apr03 added -P ;y-z 14may03 remove trailing spaces from a table line and replaceTilde ;015z 10oct01 added table processing and added convert-all-other-entities: last end-tag was NOT being handled correctly ;c 15nov01 bugette with last tag if PRE ;d 19nov01 added preserve-padding-spaces ;e 21nov01 added sgmlchr-file ;f/g 22nov01 added convert-to-utf-8 and bugette - spaces before/after attrib values ;h 03dec01 bugette with duplicate fiphdr fields ;i 11dec01 cleanedup splits and added split-on-no-data ;j 28dec01 tables cleanup ;k 10jan02 added split-script and more on tables ;l 17jan02 more on tables plus endtags not correct on continuations plus handling DOCTYPE attributes 
better plus handling Comments redone ;m 22jan02 allow trees for fiphdr:AA tag:a/b as well as tagatt ;n/o/p 28jan02 added 'fiphdr-for-table' and 'split-on-level' ;r/s 09mar02 added -I ;t/u 16apr02 order of ending file is now Check Dups, then Mandatorys then Standing ;v/w/x 22apr02 bugette in line-up-cols with keepattributes and allowPresyInTag ;y/z 27may02 added 2nd key and link on fiphdr ;014j 16nov99 sort_out_tags ;a incdup now starts at A not B and new seqno_it ;b 28apr00 added levels/end/standAlone in sort_out_tags ;c/d 27apr01 added preserve-multiple-eolns ;e/f 31may01 added CDATA and PI-processInds and new-file-on-split ;g/h 25aug01 bugs ! - continuation text and keepattribute and added locale ;j 03oct01 added ignore-non-xml-data and redid splitters (copyright) 2024 and previous years FingerPost Ltd.