ipsgml

Outstanding featurette/bug with tag:a strip:none - misses the end tag
NOTE : split sequence numbers start at one.

    ipxml/sgml/wml/html/newsml/json

This program is used to convert into, out of and between different tagged
format files such as XML or SGML or variants like NITF, NewsML, XHTML, WML or
HTML.

It can also be used to pull apart JSON files for the same effect.

Generally it can be used to convert things like :
    - NewsML    <-> IPTC 7901 or ANPA 1312
    - NITF      <-> plain ascii
    - XML       <-> HTML
    - SGML      <-> NITF
    - SGML      <-> plain ascii
    - WML       <-> ascii
    - HTML/NITF tables -> inline markup for Quark, InDesign or other Editorial
systems

Data can be extracted from the SGML tags or attributes and formatted into text
eg.
    - convert and/or replace the data within a tag
    - plain ascii files -> XML possibly using FipHdr fields to create tagged data

Definitions and Glossary
    tag something between '<' and '>' eg. <BODY>
        usually ending with tagend. eg. <LOCATION>Hollywood</LOCATION>
    data    non-tag information     eg. "Hollywood" in the above example
    attribute - sub field/data within a tag eg. <LOCATION ID="996"
PLACE="Hollywood">
    NITF    News Industry Text Format as put together by IPTC and NAA.
    XML,HTML Much simplified sub-set of SGML for WWW. - see www.w3.org
    JsonTag "tagname":"datavalue" - see www.json.org

It scans its input directory and each file is processed according to a
parameter file specified either as the default or as the DY: FipHdr field.

Two types of processing are possible
    - strip or modify tag, attributes and/or data
    - extract data or attribute-data and stuff in a FipHdr field which can then be
used to replace the top of the file or used by a subsequent program.

There is also a question of where to send the output file as this, by default,
is put in spool/2go for IPWHEEL to distribute. So it needs a Destination(s) or
DU FipHdr field. This is added by either :
    - If there is a DX FipHdr field in the input file, that is used.
    - If not, the keyword 'dest' is used in the parameter file.
    - If that is not specified either, it is sent to 'woops' the Intercept queue.
    - You may also specify it from the incoming data or attribute-data using the
'fiphdr' keyword.
In this case the contents of DX, 'dest' or 'woops' will be the default if there
is no data.
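
The order of precedence above can be pictured with a small sketch. It is only an illustration of the rules as described here, not the program's own code; the function name and its arguments are hypothetical.

```python
def resolve_destination(fiphdr, dest_keyword=None, extracted=None):
    """Pick the destination for an output file (illustrative only).

    fiphdr       -- dict of FipHdr fields found in the input file
    dest_keyword -- value of 'dest' in the parameter file, if any
    extracted    -- value taken from data/attribute-data via 'fiphdr', if any
    """
    if extracted:                # data or attribute-data found in the file
        return extracted
    if fiphdr.get("DX"):         # DX field in the input FipHdr
        return fiphdr["DX"]
    if dest_keyword:             # 'dest' keyword in the parameter file
        return dest_keyword
    return "woops"               # the Intercept queue


print(resolve_destination({"DX": "outsgml"}, dest_keyword="logcopy"))  # outsgml
print(resolve_destination({}, dest_keyword="logcopy"))                 # logcopy
print(resolve_destination({}))                                         # woops
```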

IPXML may be used to convert XML tables to plain formatted text or in-line
markup such as Quark.

The parameter file in tables/sgml defaults to SGML and has the keywords:
    tag:(sgml tag name) (optional subkeywords)
        Process a Start or End tag as follows :
        start:(FipSeq)
            optional string to replace the tag
        end:(FipSeq)
            optional string to replace the end tag ie. </location>
        strip:(tag|attribute|zap|everything|data|end|none)
            optional strip all or part of the tag and its associated data
            tag All information between '<' and '>' is ignored.
                This will also zap the end tag if there is one.
            attribute all attributes are ignored; tag and data preserved.
            zap All information - tag, attrib and data is zapped to the next tag.
            everything  Same as 'zap' but lower tags are always zapped too.
            data    All data for this tag is ignored; tag and attrib preserved
            end Zap everything, including all other tags, up to and including the end tag
                </NAME>, unless any other tags are specified as NOT being stripped.
            none    Preserve everything (default)
        keepattribute: (optional FipSeq)
            Used during strip to keep all the attribute data. Any
            data after the keyword is added before and after the attribute :
                tag:ds  start:** end:-- strip:tag keepattribute:=
                <ds num="1.5" ver="orig">oinky</ds>
            gives   **=1.5==orig=oinky--
            As the optional data will be checked against the mapping tables,
            please make sure it is what you want it to be.
        endkeepattribute: (optional FipSeq)
            Same as KeepAttribute: (above) except the data is ONLY added after the
attribute
            and before data.
        att: (attribute name)
            used with keepattribute: - use when only one attribute is required
            tag:content strip:tag   att:content-role start:[fip- keepattribute:|
endkeepattribute:-fip]
            <content content-ref="c00000002"
content-role="urn:x-hoho:content-role:INTRO" auto-generated="false">
            generates : [fip-urn:x-hoho:content-role:INTRO-fip]
            which can then be mangled by ipxchg or other at a later stage
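
To make the 'keepattribute' example for tag:ds above concrete, here is a rough Python sketch that reproduces its output. It is an assumption-laden illustration (the function name, the regular expressions and the single-element input are all hypothetical) and it does not model 'endkeepattribute' or 'att'.

```python
import re

# name="value" pairs inside a start tag (double-quoted values only)
ATTR_RE = re.compile(r'([\w.:-]+)\s*=\s*"([^"]*)"')


def strip_tag_keep_attributes(element, start, end, keep):
    """Mimic 'strip:tag' with 'start', 'end' and 'keepattribute' on one element."""
    m = re.fullmatch(r'<(\w+)([^>]*)>(.*)</\1>', element, re.S)
    if m is None:
        raise ValueError("expected a single <tag ...>data</tag> element")
    attrs, data = m.group(2), m.group(3)
    # the keepattribute string is added before and after each attribute value
    kept = "".join(keep + value + keep for _name, value in ATTR_RE.findall(attrs))
    return start + kept + data + end


# tag:ds  start:** end:-- strip:tag keepattribute:=
print(strip_tag_keep_attributes('<ds num="1.5" ver="orig">oinky</ds>', "**", "--", "="))
# prints **=1.5==orig=oinky--
```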

        upper:  force the field uppercase
        lower:  force the field lowercase
            Note that these two conversions only change data up to the next
            tag or end tag (ignoring <P>) which may not be the end of this tag.
        list-fiphdr:P3 If converting OrderedLists <ol> or unordereds <ul>, this is
the FipHdr field containing the item number.
        tag:ul        strip:tag    start:<FipUL>  list-fiphdr:P6
        tag:ol        strip:tag    start:<FipOL>   list-fiphdr:P6
        tag:li        strip:tag    start:"\n   \P6"
            The actual string used in the Unordered list can be changed from a '*' using
the parameter 'unordered-list-chr:+'

        fiphdridx: use a link-Fiphdr (see below) to extract some FipHdr data
referenced by
            tag:A       strip:tag   end:(\R7) fiphdridx:a@href=R7

        Note when specifying the tag, do NOT specify either the presy/endy ie the '<'
or '>'.
        eg  tag:location    start:[ModeBold]    end:[ql]\n  strip:tag
        There is a special case for a comment <!-- This is a comment -->, where
        the 'end' subkeyword specifies the end of the comment.

    fiphdr:(2-letter code)  (optional subkeywords)
        Either  tagdata:(name of tag)
                specify the tag name which contains the data required.
        Or  tagattrib:(name of tag),(name of attribute)
        Or  tagattribute:(name of tag),(name of attribute)
                specify the tag name and the attribute name which contains the data
required.
        Or  data: (FipSeq)
                general data to add to a FipHdr field.
                NOTE you MUST dblqte top and tail any data field that includes spaces :
fiphdr:AB   data:"aaa bbb ccc"
        Or  text:
                Stuff the first part of text into this hdr field
                This searches for the <TEXT> tag. If not found, the top of data is used.
                default length is 100 chrs unless you change
                with a 'max:1024' (see below)

        For any of the fiphdr-tag* options, subkeywords are 'dup', 'max',
        'upper', 'lower'

        continue: allow this fiphdr to continue and include lower level tags

        dup:(optional separator)
            Flag that this field may be duplicated. Duplicate fields are separated
            with a space unless a separator chr is also specified.
            For 'dup' to work correctly, each tag or attribute to be accessed is
            stuffed into one fiphdr line only.
            Each occurrence of the duplicated tag MUST follow sequentially with
            no other tags intervening
        incdup:
            A second method of handling duplicate tags or tag/attributes is to
            create a new FipHdr field by incrementing the second letter of the FipHdr
name
            eg  fiphdr:J6   tag:DEST    incdup:
                the first FipHdr will be    'J6'
                the second          'J7'
                the third           'J8'
                etc
            So the idea is to start with 'J0' (zero) if under 10
            duplicates are possible or 'JA' if 26.
        maxdup: (max number of duplicates allowed for this field)
            default: no limit for 'dup', 26 for 'incdup'
            Use this to limit the number of entries in a duplicated field.
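
The 'incdup' naming scheme can be sketched as below. This is only an illustration of the description above: the function is hypothetical, and because the text does not say what follows '9' or 'Z', the sketch refuses to go past them rather than guess.

```python
def incdup_names(first, count):
    """List the FipHdr names 'incdup' would use, starting from 'first'."""
    if len(first) != 2:
        raise ValueError("a FipHdr code is 2 characters")
    prefix, start = first[0], first[1]
    names = []
    for i in range(count):
        second = chr(ord(start) + i)      # increment the second letter
        if not second.isalnum():          # ran off the end of 0-9 or A-Z
            raise ValueError("too many duplicates starting from " + first)
        names.append(prefix + second)
    return names


print(incdup_names("J6", 3))        # ['J6', 'J7', 'J8']
print(len(incdup_names("JA", 26)))  # 26
```

Starting at 'J0' leaves room for 10 names and 'JA' for 26, which matches the advice above.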

        max: (max number of chrs in  this FipHdr field)
            limit the size of the data to a fixed amount
            max:25
            Note there is no default except the absolute maximum is 1023

        mandatory:
            This FipHdr MUST be added - so if it is NOT found in the data, a FipHdr will
be added for the last

        upper:  force the field uppercase
        lower:  force the field lowercase
            Normally these take the concept of lower and uppercase chrs
            from the LOCALE of the system you are running on. These can
            be supplemented by the 'locale' and 'extralocale:'
            keywords below.
        key: and key2:  Some XML variants reuse structures and it is the contents of
an attribute which describes what the data really is.
            In NewsML for example there can be multiple TopicSets with the attribute
'Scheme' on the 'FormalName' tag which varies. Use 'key' to define which one.
            eg
            fiphdr:PP tag:FormalName dup: key:TopicSet/Topic/FormalName/Scheme="Internal
MetaCodes"

            See below for more comments. For use with multiple structures you MUST
            specify at least the tag and attribute in the key.

            There can be up to two 'key's for each 'fiphdr' - see below for an example
            where 2 keys are necessary for NewsML Topics.

        index: (Tag@attribute)
            Create an internal FipHdr for use with this index for outputting with
tag/fiphdridx above
            fiphdr:R7   tag:FormalName  dup:    key:FormalName@Scheme="Ticker"
index:Topic@Duid

        For fiphdr/tagdata there is an additional keyword of 'attribute-is-data:'.
        This forces any information in attributes in any lower tags to be treated
        as data.

        As some FipHdr fields have distinct meanings - SN, DU, DP etc - please use
        2 letter codes starting N or Q.
            eg  fiphdr:NA   tagdata:itemid  dup:+
            get the data from each <ITEMID> field. If there is more than one,
            they are separated by a '+'.

        general examples
            fiphdr:PN   data:\SN    max:6
            fiphdr:HT   data:"This is the old HS =\HS="
            fiphdr:DI   tagdata:brodtext    max:200
Other keywords :
    data-format:JSON / XML      default is XML

    start-text-tag: (tag)
        Tag signifying the beginning of text data for 1st line (etc) of text (\$1, \$t
etc)
        The default is 'TEXT' but is often defined as 'BODY' :
            start-text-tag:BODY
        or for NITF, the body.content tag
            start-text-tag:body.content

    pinhdr:
    pindata:The <P> Paragraph tag is handled separately from other tags as it is
        often 'neutral' and should not alter the current processing.
        Use these two keywords to define what to do with the start and end 'P' in
        either a FipHdr field or in the data part:
        pinhdr:     start:~ end:\s
        pindata:    start:\n    end:\n
            'start:' being the string output in place of a <P>
            'end:' being the string output in place of </P>
        Note that CR NL etc are not valid characters in the FipHdr - if you do need
        them use another unique chr and use 'ipxchg' to convert at a later stage.
        Defaults for pinhdr:    start:\s    end:\s
        Defaults for pindata:   start:\n    end:\n

    dest: (one or more Fip Destinations separated by space or '+')
        This can be overridden by the DX: FipHdr field.
        Note that all destinations MUST be in the tables/sys/USERS file.
        As usual, case is important, so ZAPME and zapme are 2 different
destinations.
        eg. dest:logcopy+outsgml.
    stripfiphdr:    do NOT copy the existing FipHdr of the input file onto the
output.
            Normally the existing FipHdr is preserved in the output file.
    nofiphdr:   do NOT add a FipHdr to the output file.
            Any new FipHdr keywords are added without the tilde NL top and bottom.
    zapfiphdrfields: (List of FipHdr fields to zap)
        Delete all occurrences of the FipHdr fields specified.
        This is ONLY valid where the FipHdr from the input file is retained for the
output.
        In this case it is normal to zap :
            zapfiphdrfields:XZ,XS,CX,DC,SZ,CQ,CP,XP
    addhdr-file: (fullpath/filename in FipSeq)      default: none
        Extra, optional FipHdr information held in an external file
    addhdr-script: (script in FipSeq)           default: none
        Extra, optional FipHdr information generated by an external program or script
        addhdr-script:/fip/local/find_iim.pl \EP/\EN > \E3
        Temporarily, 3 FipHdr fields are available for the script :
            \EP holds the input folder
            \EN holds the input filename
            \E3 holds the name of a TMP file to create that will be read for the list.
    extra-fiphdr: (FipSeq)                  default: none
        Extra, optional FipHdr information - note this overrides the -h switch

    use-sx:
or  use-external-file:
        if there is an SX FipHdr field with a path to the data file, use that rather
than the data in the input file.

    filename: (FipSeq)  New filename for the output file name.
    supercede:
or  overwrite:  Where 'filename' has been specified, if there is already a file
            with that name in the output queue, it is deleted first.
    script: (path and name) Script to run AFTER processing.
            The output filename and path is added to the script before running.
            Care must be taken NOT to run a script on a file that normally is written to
a spooled queue.
            For example, the default output queue is 'spool/2go' where program 'ipwheel'
may have already processed the file (and
            possibly deleted it) before the script has had time to function. So it is
normal to specify a holding queue, not used by any other program as 'outque:'
            The script must therefore delete the file after use OR delete them all in
the nightly maintenance - 'zapfiplog'
            Note also that the script is called only once, at the end of the file. Use
split-script: to run on each split (if using splits).
    outque:     Output Queue for the output file.
            This can be a FipHdr or FipSeq - which gets resolved at output (so can be
conditional on metadata which has been sourced from the current file)
            This defaults to the '-o' input switch which defaults to spool/2go.
            If the first chr is NOT a '/', it is assumed under spool.
            By default outque is used in preference to -o,
            UNLESS the -V switch is on, in which case -o is used over outque.
    doneque:    Done Queue for the raw input file.
            This defaults to the '-d' input switch which has no default.
            If the first chr is NOT a '/', it is assumed under spool.
            This can be in FipSeq - which gets resolved at output (so can be conditional
on metadata which has been sourced from the current file)
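
The queue rules for 'outque' can be summarised in a short sketch. It assumes the spool area is /fip/spool and that the -o switch follows the same leading-'/' rule as 'outque'; both are assumptions, and the function names are hypothetical.

```python
import posixpath

SPOOL = "/fip/spool"   # assumed location of the spool area


def resolve_queue(name, spool=SPOOL):
    """A queue name not starting with '/' is taken to be under spool."""
    if name.startswith("/"):
        return name
    return posixpath.join(spool, name)


def choose_outque(outque_keyword, o_switch="2go", v_switch=False):
    """'outque' wins over -o unless -V is on or no 'outque' is given."""
    if v_switch or not outque_keyword:
        return resolve_queue(o_switch)
    return resolve_queue(outque_keyword)


print(choose_outque("hold"))                           # /fip/spool/hold
print(choose_outque("/data/out"))                      # /data/out
print(choose_outque("hold", "testfolder", True))       # /fip/spool/testfolder
print(choose_outque(None))                             # /fip/spool/2go
```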

    before: (FipSeq)    String to parse and add at the top of the file.
    after: (FipSeq)     String to parse and add at the end of the file.
    beffile: (Path/filename) Contents of a file in FipSeq to parse and add at the
                top of the file (after 'before')
    aftfile: (Path/filename) Contents of a file in FipSeq to parse and add at the
                bottom of the file (before 'after')
    number:octal|dec|hex    In FipSeq, make all escaped numbers Octal, Dec or Hex.
                default is octal
    log:    Custom log line for the Fip Item log in FipSeq
        default is name of the parameter file (DF) and filename (SN)
    archive: (Archive Name) Archive all incoming raw data using this
        parameter file. The 'archive Name' can be FipSeq.
        This adds the file to the normal Fip archives in /fip/log/data
        It should be purged using 'ipmaint'.
        eg  archive: \SU
        or  combie:QS   SU|NS,rawdata
            archive:\QS
        ie Use the contents of FipHdr SU, if not there, NS, if not there
        just use the word 'rawdata'.

    striptags: Strip all tags EXCEPT those specifically stated using the 'tag'
keyword.

    default-strip: (tag|attribute|zap|everything|data|end|none)
        default strip all or part of the tag and its associated data
        (see strip: above for descriptions)

    ignore-non-json-data:
    ignore-non-xml-data: If there is any text or data BEFORE the start of the XML
document or any after the end of the last End Tag, it is stripped.
        For JSON this is all data before the first '{' or '[' and after the last
(matching) '}' or ']'.
        Normally it is preserved and output.

    locale:(valid locale)
        Change the locale from the System Locale to this
        The locale MUST be valid !
            locale:dk
    extralocale: (2chr combinations)
        For changing uppercase to lower and vice versa, we can add to the
        normal locale by specifying a series of 2-chr pairs.
        In each pair the lowercase chr is 1st then the upper, followed by a
        separator or space.
        eg  extralocale:aA,bB,cC,dD,\212\232,\213\237
        Normal a-z/A-Z are mapped by default : in the example above they are included
        to give an idea of syntax
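
As an illustration of the 'extralocale' syntax, the sketch below turns such a list into two lookup tables. It is hypothetical code, not the program's parser; it assumes escaped numbers are 3-digit octal (the default for 'number') and handles no other FipSeq escapes.

```python
import re

# one token is either a 3-digit octal escape like \212 or a single chr
_TOKEN = re.compile(r"\\([0-7]{3})|(.)", re.S)


def parse_extralocale(value):
    """Return (to_upper, to_lower) tables built from lower/upper pairs."""
    to_upper, to_lower = {}, {}
    for pair in re.split(r"[,\s]+", value.strip()):
        if not pair:
            continue
        chars = [chr(int(octal, 8)) if octal else ch
                 for octal, ch in _TOKEN.findall(pair)]
        if len(chars) != 2:
            raise ValueError("expected lower then upper in: " + pair)
        lower, upper = chars            # lowercase chr first, then upper
        to_upper[ord(lower)] = upper
        to_lower[ord(upper)] = lower
    return to_upper, to_lower


up, low = parse_extralocale(r"aA,bB,\212\232")
print("ab".translate(up))    # AB
print("AB".translate(low))   # ab
```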

    chr:(octal/dec/hex number):(FipSeq string)
    hdrchr:(octal/dec/hex number):(FipSeq string)
    txtchr:(octal/dec/hex number):(FipSeq string)
        Replace this character with the string - usually an Sgml escaped chr.
        USE THIS TO REPLACE SINGLE CHRS WITH SGML CHRS (ie opposite of 'sgmlchr:'
below).
        This can be a printable chr or an escaped number. The number is
        octal/dec/hex depending on the preceding 'number' keyword (if any).
        eg  chr:\313:&pound;
            chr:<:&lt;
            hdrchr:^:*
            Note that the ';' is part of the string and NOT a comment as it does NOT
start the line.
        hdrchr  works on new FipHdr fields only.
        txtchr  works on data and when data is taken from a FipHdr field and
            added to the data part of a tag.
        chr works on both data and new FipHdr fields.
        NOTE from 19g51, the hash '#' is ONLY converted in NEW FipHdr fields - NOT in
the original FipHdr if it exists.

    eoln:   Convert Line Ends (ie CR and/or NLs) from the outbound feed.
        SGML should be terminated CR NL :           eoln:\r\n
        for HTML (default) the EndOfLine is NL only :       eoln:\n
        for NO eoln, specify NO subparameter :          eoln:
        The subparameter can be any valid FipSeq.
        (SGML uses the term 'RE' (record end) for Carriage Return CR and
        'RS' (record start) for LineFeed NL.)
        Note that, unless using the 'preserve-multiple-eolns', you should map
        eoln to something unique like eoln:<mypara> as normally CR NLs are reduced to
        a single End Of Line.
    preserve-multiple-eolns:
        Normally multiple end-of-lines are stripped as they
        are meaningless in the XML world. Use this to preserve them!
    preserve-top-spaces:
        Do NOT strip all spaces and blank lines at the top of the output file.
    preserve-padding-spaces:
        Do NOT strip all spaces and blank lines at the beginning of each tag.
    strip-multiple-spaces:
        Strip all multiple spaces and blank lines inside each tag.
    allow-presy-in-tag: In XML/HTML etc, reserved chrs like '<' or '>' cannot
appear inside
        the attribute data of a tag - they must be encoded like &lt; etc.
        Use this where there might be some non-conforming stuff. However the
        drawback here is that they MUST be inside dbl qtes ie <meta ds="helle<p>ooo"
    convert-CDATA-sections:
        convert-CDATA-sections:no   - no dont ! (default)
        convert-CDATA-sections:yes  - yes pls and zap the '<![CDATA[' and ']]>'
        convert-CDATA-sections:preserve - yes pls and leave the '<![CDATA[' and ']]>'
        Normally a CDATA section like :
        <![CDATA[ Vongerful Vondafool C&oe;penh&areing;gen <99thisIsAnon-compliant
XMLtag> ]]>
        is considered a single, raw string of XML/SGML data. And all the tags and
        entities (like &lt;) are not changed either. Use this parameter
        to convert them.
        Note that you should use this option CAREFULLY if any tag in the CDATA
        is the same as a tag in the main envelope. See below for more comments.

    sgmlhdrchr: (FipSeq string) : (FipSeq Chr or String)
    sgmltxtchr: (FipSeq string) : (FipSeq Chr or String)
    sgmlchr: (FipSeq string) : (FipSeq Chr or String)
        Translate Sgml escaped chr back into a single chr or a string.
        USE THIS TO REPLACE SGML CHRS WITH A CHR OR A STRING (ie opposite of 'chr:'
above)
        Sgml escaped chrs always start with a '&' and end with a ';' : "&gt;",
"&copyright;"
        Note that case of both parameters IS important  - These two are different :
            sgmlchr:Oring:<CapOring>
            sgmlchr:oring:<smallOring>
        This will take &XXXX; and translate it.
        eg. sgmlchr:lt:<
            sgmlchr:oumlaut:\202
            sgmlchr:Utilde:{tildeU}
        sgmlhdrchr  works on new FipHdr fields only.
        sgmltxtchr  works on data and when data is taken from a FipHdr field and
                added to the data part of a tag.
        sgmlchr     works on both data and new FipHdr fields.
        NOTE that if the input is any NITF, XML or HTML feed and the output
        is just plain text, then you almost always need :
            sgmlchr:lt:<
            sgmlchr:gt:>
            sgmlchr:amp:&
            sgmlchr:apos:'
        BUT you will want to preserve them /leave them alone if the output is
        the same or another NITF, XML or HTML flavour.

    unicodelist: (dec or hex number) : (list of single FipSeq Chrs)
        Starting at the number, fill in the map of SINGLE character replacements in
sequential order
        For any map which is MORE than a single chr, use a '*' (or the value of the 
convert-unmatched-unicodes: parameter)
        and then use unicodechr: further down the parameter file.
        eg :
            ; Map Unicode Latin2s chrs to plain Ascii ... use a star for unmatched (or
will match later)
            unicodelist:x100:AaAaAaCcCcCcCcDd
            unicodelist:x110:DdEeEeEeEeEeGgGg
            unicodelist:x120:GgGgHhHhIiIiIiIi
            unicodelist:x130:Ii**JjKkkLlLlLlL
            unicodelist:x140:lLlNnNnNnnNnOoOo
            unicodelist:x150:Oo**RrRrRrSsSsSs
            unicodelist:x160:SsTtTtTtUuUuUuUu
            unicodelist:x170:UuUuWwYyYZzZzZzf
            ; NOTE hex 132, 133, 152 and 153 are mapped to '*' as they need more than a
single chr
            ;; .. so then we replace them properly
            unicodechr:x132:IJ
            unicodechr:x133:ij
            unicodechr:x152:OE
            unicodechr:x153:oe
    unicodechr: (dec or hex number ) : (FipSeq Chr or String)
        For all unicode chrs which are >= 256 (x100), you can specify a map to a
single chr or a string.
        The chr can also be specified as hex with a preceding 'x'
        Commonly used ones are :
            ; trademark
            unicodechr:x2122:(tm)
            unicodechr:8194:\s
            unicodechr:8195:\s
            unicodechr:8201:\s
            unicodechr:8211:-
            unicodechr:8212:_
            unicodechr:8216:'
            unicodechr:8217:'
            unicodechr:8220:"
            unicodechr:8221:"
            unicodechr:8249:<<
            unicodechr:8250:>>
            ; euro in a table
            unicodechr:8364:EUR
            ; fractions 1/3 .. 1/5 .. 1/6 .. 1/8 ... 7/8
            unicodechr:x2153:\s1/3\s
            unicodechr:x2154:\s2/3\s
            unicodechr:x2155:\s1/5\s
            unicodechr:x2156:\s2/5\s
            unicodechr:x2157:\s3/5\s
            unicodechr:x2158:\s4/5\s
            unicodechr:x2159:\s1/6\s
            unicodechr:x215A:\s5/6\s
            unicodechr:x215B:\s1/8\s
            unicodechr:x215C:\s3/8\s
            unicodechr:x215D:\s5/8\s
            unicodechr:x215E:\s7/8\s
            ; ByteOrder ?? x.feff d.65279 o.177377
            unicodechr:65279:\s
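
How 'unicodelist' and 'unicodechr' combine can be sketched as follows. This is a hypothetical illustration of the rules described above (sequential fill from the start number, later 'unicodechr' lines replacing the '*' placeholders); it does not expand FipSeq escapes such as \s.

```python
def parse_number(text):
    """Decimal by default, hex when preceded by 'x'."""
    if text.startswith(("x", "X")):
        return int(text[1:], 16)
    return int(text)


def build_unicode_map(lines):
    """Build a code point -> replacement table from parameter lines."""
    table = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith(";"):
            continue                              # blank line or comment
        keyword, number, value = line.strip().split(":", 2)
        code = parse_number(number)
        if keyword == "unicodelist":
            for offset, ch in enumerate(value):   # fill sequentially
                table[code + offset] = ch
        elif keyword == "unicodechr":
            table[code] = value                   # may be a whole string
    return table


table = build_unicode_map([
    "unicodelist:x130:Ii**JjKkkLlLlLlL",
    "unicodechr:x132:IJ",
    "unicodechr:x133:ij",
    "unicodechr:8364:EUR",
])
print("\u0132s \u20ac5".translate(table))   # IJs EUR5
```
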
    convert-unmatched-unicodes: (FipSeq Chr)
        Single chr to represent a unicode chr which is NOT latin1 and NOT matched in
'unicodechr'
        default: '*'
        Normally these will be mapped to '*'.
        To pass-thru all unmatcheds, use : convert-unmatched-unicodes:passthru

    hdr-strip-between: start:(FipSeq Chr) end: (FipSeq Chr)
        Where the 1st 9 lines of text are used in FipSeq using \$1 etc,
        use this to replace any tags with a space.
        Normally the following would be used :
            hdr-strip-between:  start:<  end:>
        But if you have mapped the start/end tags to other chrs in 'ipxchg'
        (possibly to control the tags and replace later with 'txtchr')
        eg  ; for lines used in FipSeq - like 'before'and 'after'
            hdr-strip-between:  start:\201  end:\202
            ; for text lines - Convert back from 201 202 <>
            txtchr:\201:<
            txtchr:\202:>
    sgmlchr-file:(filename)
        Use this to pull in a standard XML Entity file such as found at
            http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
        See also the note on utf-8 below.
        Each line has an entry of :
            <!ENTITY hearts   "&#9829;">
                <!-- black heart suit = valentine, U+2665 ISOpub -->
    convert-all-other-entities: This flag will automatically convert all
        entities NOT covered by chr/hdrchr/txtchr.
        If the entity is a number   &#136;, it is converted to one or more bytes
        If the entity is a name     &euro;, it is converted to '(euro)'
    raw-data-type:ascii/utf8/utf16        default: ascii for 8bit chrs
    case-sensitive-tags:YES/NO
        In SGML and variants - HTML, early variants of NITF - the tag names are
        case INsensitive - ie <BODY> is the same as <Body> == <body>
        Ignoring case is the default for 'ipxml'
        BUT XML tags nowadays are case-SENSITIVE. So if you need to
        Our general view is that no sane person would run tags with the same name
        but with different case - but then we are not the experts !
        Use 'case-sensitive-tags:yes' to turn this ON.
    ** This must be specified at the TOP of the parameter file BEFORE any fiphdr:
or tag: !!

    unordered-list-chr: (chr or string)
        This changes the actual string used in the Unordered list.
        Default is "*".
    replace-fiphdr-tilde: (FipSeq chr)
        If a Tilde is found in a fiphdr field, replace it with this chr - default
0376
    replace-fiphdr-eoln: (FipSeq chr)
        If an end-of-line (<p>, <br> CR, NL or CRNL) is found in a fiphdr field,
replace with this chr - default is SPC
    alt-param-file:(text)   (param file name)
        alt-param-file:<AlertML      alertml.fip
    add-EQ:
        add the input folder as an extra FipHdr field EQ
    cont-chr: (FipSeq chr)          default: 021 (DC1)
        Single chr to be used internally for flagging Continuation
        FipHdrs (ie for fiphdr:AB   tag:hoho    continue:)
        Use this if a 021 (hex 11) chr is valid data.
    cont-zap-chr: (FipSeq chr)      default: 022 (DC2)
        Single chr to be used internally for flagging Continuation
        FipHdrs (ie for fiphdr:AB   tag:hoho    continue:)
        Use this if a 022 (hex 12) chr is valid data.
    max-total-fiphdr-size: (total)      default: 32k chrs
        Max size of all FipHdr fields
    max-single-fiphdr-size: (total)     default: 4000 chrs
        Max size of a single FipHdr field
        This overrides the -F input switch

    wrap-lines: (no of chrs) : (fipSeq) default: no wrapping
        Wrap text lines (but NOT plain text tables if processing tables) to this line
length and insert the string
        ; wrap NONtabular stuff at 80
        wrap-lines:80:<fipWRAP>\n
    ignore-xml-in-wrap:no or (number)   default: no
        ignore any xml in the calculations for the linelength
        the number can be the amount to add for each xml tag - generally 0
        ; dont add anything for XML
        ignore-xml-in-wrap:0
    abstract-size: (number)
    abstract-fiphdr: (2 letter FipHdr code)
    abstract-msg: (message in FipSeq)
    stop-after-abstract:yes/no
        Create an abstract/first part of text when the derived data equals or
exceeds the abstract-size.
        If a file is smaller than the size, only a single, complete file is output
        Default - no abstract at all.
        The fiphdr is used to flag if the file is the Abstract or the Main
        Stop flag can be used to Not continue with the Main file (default is both
files)
        The optional msg is inserted at the bottom of the text of an abstracted file.
            abstract-msg:\n\n***Abstract finishes, pls view original for remaining
text***\n
    never-log: Never log files (overrides the -l/-L switches)
    always-log: Always log files (overrides the -l/-L switches)
    log-level: (number)
        10 - default
        20 - log all tables

Input parameters (all optional) are :
Either
    -1 : filename for a single shot     default: spooled
        often this flag is used with -S (newname)
        to create a file called (newname) in spool/formsave
        for the DataFormatting module
or  -i : spooled input queue to scan    default: spool/2sgml
or  -I : scan input queue and       default: spooled
        stop after the last file has been processed.

    -o : output queue           default: spool/2go
    -d : done queue for original raw data   default: none-input deleted
    -D : display tags           default: no
        use this ONLY when running '-1' single shot to
        display all tags, attributes and levels and their data.
        ie use to debug/tune.
    -F : default max size of a single FipHdr    default: 4000
    -h : optional extra FipHdr string to add    default: none
    -JSON : data is in json         default: xml
        unless overridden in the parameter file
    -l : log every new file pls     default: do NOT log
    -L : log every new file pls with times      default: do NOT log
    -Q : quiet flag - do NOT flag minor errors  default: do
    -S : save this file in the save area    default: spooled output
        with the following name
        eg : -S "#SN:\XK#PP:\PP"
        use this for DataFormats (same switch as ipformat)
    -t : scan time for the directory    default: 2 secs
    -T : different folder under /fip/tables default: sgml
        This should only be used when upgrading and you need to run 2 ipsgmls
    -V : use the content of the -o input switch for the outque  default: use outque
if it exists
    -w : file wait time for files arriving  default: none
        across a network (for NFS, make about 10 secs)
    -W : allow DataFormat output filenames  default: no
    -XML : data is in xml           default: xml
        unless overridden in the parameter file
    -z : name of the default parameter file default: tables/sgml/SGML
    -Z : force the parameter file to this   default: DY or -z name
    -v : print version no and exit

**************** Notes ***********************

**** For Debugging, you can manually run the program with the -1 Single shot
switch with the -D to display all the tags in an input file
    CMD>ipsgml -1testfile -D -zNewsML.fip -otestfolder | more
    This will create a file in /fip/spool/testfolder

**** Rarely will you want 'sgmlchr:' and 'chr:' in the same parameter file -
chr converts single chrs to sgml chrs and sgmlchr converts them back !

**** 'sgmlchrs' are done first BEFORE 'chrs' Then any Upper/Lower case
conversion ****

So if you have a 'before', 'after' string (or files) with embedded SGML tags
BUT still need to catch chrs '<' and '>' :
    1. in the 'before' string, chg all < to { and > to }
        eg  before:{!DOCTYPE abc.dtd}\n
    2. change < and > using txtchr
        eg  txtchr:<:&lt;
            txtchr:>:&gt;
    3. change { and } using txtchr
        eg  txtchr:{:<
            txtchr:}:>
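
The three steps above can be imitated on a plain string. This is only a sketch of the effect of the txtchr lines, not of how ipsgml is implemented; the function name apply_txtchr is hypothetical. The rules are applied one after another in the order listed, which gives the same result as a single pass over the characters.

```python
def apply_txtchr(text, rules):
    """Apply single-character replacements in the order given."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


# Steps 2 and 3 from the note: escape real < and > first,
# then turn the { } placeholders back into tag delimiters.
rules = [("<", "&lt;"), (">", "&gt;"), ("{", "<"), ("}", ">")]

before = "{!DOCTYPE abc.dtd}\n"   # step 1: 'before' string written with { }
data = "if a < b"                 # data containing a real '<'
result = apply_txtchr(before + data, rules)
print(result)
```

The 'before' string comes out as a real tag while the '<' in the data is escaped.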

**** Extra FipHdr fields are available to use :
    Z1 is the size in bytes of the data part of the document (ignoring before,
after, beffile and aftfile).
    Z2 is the size in bytes of the data of the document ie ignoring tags (ignoring
before, after, beffile and aftfile).
    if you are using Z1 and Z2 already, populate 2 other fields by :
        newZ1: 2 letter code replacing Z1
    eg  newZ1:VT
        will put the sizes in FipHdr fields VT and VU.

**** Extra System Variables are :
    \$1 first line of text
        ...
    \$9 ninth line of text

**** NULs (characters of binary zero) are stripped from the output file.
    So a parameter like the following will have no effect at all !
        tag:ds  start:\000

**** Current Limitations are :
    No more than 2 million tags may be specified.

**** If there is NO FipHdr or the SN field (which should be the name of the
file) is missing, the original filename is used as the SN. Any hashes ('#') in
this created SN field are changed to hex.9d/oct.235/dec.157
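
As a rough illustration of the hash replacement, assuming the character is held as code point 0x9d (the actual byte encoding used by ipsgml is not stated here), and with make_sn as a hypothetical helper name:

```python
def make_sn(filename):
    """Build an SN value from a filename, replacing '#' with chr(0x9d)."""
    return filename.replace("#", "\x9d")


sn = make_sn("story#12.xml")
print(repr(sn))
```

The three notations in the note are the same value: 0x9d, 0o235 and 157.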

**** Program change - from version 14+, please use 'preserve-multiple-eolns' to
keep ALL the end of lines of non-xml data.

**** CDATA fields
Note that an XML CDATA field is specified as a tag named '![CDATA' - ie without
the trailing '['.

**** Splitting files

For SGML/XML files that contain multiple 'things', there is a means of
splitting these either into discrete files or into a single file with a
Splitter string/tag and FipHdr pertaining to just that file.

    Eg  You might need to split off each ARTICLE from the following structure BUT
still retaining the Page info
    <PAGE>
        some relevant page info
        <ARTICLE>
            some relevant article info
        </ARTICLE>
        <ARTICLE>
            second relevant article info
        </ARTICLE>
    </PAGE>

Where a single output file with one of many 'splits' is required, use the
following parameters :
    split-on-tag: (tag)
    split-on-endtag: (tag)
    split-on-tagattribute: (tag),(attribute)
        Create a split on this tag or tag/attribute
        The split is put BEFORE the start or AFTER the end tag depending on the
option chosen.
    split-on-level: (number)
        While you can NOT specify trees for 'split-on-tag' (or tagatt), you may
specify the level at which the split MUST take place.
        So that if you have multiple levels of embedded tags - like NewsML
NewsComponents for example, use this to decide which level.
        eg : If you have NewsML/NewsEnvelope/NewsComponent/NewsComponent
            use split-on-level:4 to split ONLY on the 4th level, not the 3rd.
            use the -D input switch to show levels for a single file.
            This parameter has nothing to do with cooking.
    stop-on-tag: (tag)
    stop-on-endtag: (tag)
    stop-on-tagattribute: (tag),(attribute)
    stop-on-level: (number)
        ditto - but stop processing
    splitter-string: (FipSeq)
        Where a single output file is required, this is placed in the data to signal
the start of a new bit; FipHdr follows.
        eg  splitter-string:********** BRS DOCUMENT START *************
        default is "\n<FIP-SPLIT>"
    new-file-on-split: (FipHdrField for Seqno)
        Instead of putting all the splits in one file with a <FIP-SPLIT> between,
this option creates a completely new file for each split.
        The FipHdr specified will contain the sequence number of this file from 1.
        new-file-on-split:NZ
    split-total-fiphdr: (FipHdrField for blank or the total number of splits)
        The last file split will have the total number of splits in this FipHdr
field; all others will have it blank
        eg if there are 28 files split from a single input and
'split-total-fiphdr:AB' is specified
            .. the first 27 will have 'AB:' and the last, 28th 'AB:28'

    split-on-no-data:
        Normally only if the previous element had data will it be ended and the next
file started. Use this flag to force a split EVERY time the split criteria is
met, ignoring if there was any data.
    split-script: (path and name)   Script to run AFTER processing this file
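
To make the split parameters above more concrete, here is a small Python sketch of what 'new-file-on-split:NZ' together with 'split-total-fiphdr:AB' would produce for the PAGE/ARTICLE structure. It is an illustration only, not ipsgml's code. The PAGEINFO tag, the XP field and the function name split_articles are hypothetical; the page text is wrapped in a tag just to keep the sample easy to parse.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<PAGE>
    <PAGEINFO>some relevant page info</PAGEINFO>
    <ARTICLE>some relevant article info</ARTICLE>
    <ARTICLE>second relevant article info</ARTICLE>
</PAGE>"""


def split_articles(xml_text, seq_field="NZ", total_field="AB"):
    """Return one (header, data) pair per ARTICLE, keeping the page info."""
    root = ET.fromstring(xml_text)
    # Metadata found before the split, kept for EACH split (hypothetical XP).
    page_info = root.findtext("PAGEINFO", default="").strip()
    articles = root.findall("ARTICLE")
    splits = []
    for seqno, article in enumerate(articles, start=1):  # seqnos start at 1
        is_last = seqno == len(articles)
        header = {
            "XP": page_info,
            seq_field: str(seqno),
            # Only the last split carries the total; the others stay blank.
            total_field: str(len(articles)) if is_last else "",
        }
        splits.append((header, (article.text or "").strip()))
    return splits


splits = split_articles(SAMPLE)
for header, data in splits:
    print(header, data)
```

Each split keeps the page metadata, which is why you would normally keep the FipHdr when splitting.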

- splits and checking for missing items
    Sometimes feeds cannot count.
    A single file with 25 items might number them sequentially - but skip a couple,
which can seriously blow any downstream tracking (fip or external) which is
checking the Item Number before releasing all.

    ipsgml can use a template to insert dummy items in place.
    split-missing-dataFH: 2 letter FipHdr field
    split-missing-template: (name of template file - contents will be FipSeq)
    split-missing-output: full path and filename in FipSeq
    split-missing-extra: Any extra FipHdr metadata to be added to the item
    split-missing-log: log string
eg
; compare the Fip split seqno IS, to the total no of items according to SEC, AX
; (not BQ which is the SEQUENCE)
split-missing-dataFH:AX

; if there is a missing item, fill this template
; ...put the filler seqno in IS
split-missing-template:/fip/tables/edsys/SEC_EDGAR_FILLER.template
split-missing-extra:#XX:addextra#

; .. move to here
split-missing-output:/fip/spool/2tracker\V0_\F0\$o/SECfiller.su.\SU.an.\AN.seq.\IS.max.\AX.cu.\CU.\$h\$n\$b.\$z.\$v.fip

; extra logging of split
split-missing-log:maxseq.bq.\BQ type.\QY date.\QD feed.\F0 id.\AN
..

    table-width-fiphdr: (FipHdr field)
        This FipHdr will contain the maximum width of the table.
        eg  table-width-fiphdr:AB
    table-width-minimum: (width)
        If 'table-width-fiphdr' is specified, make it a minimum of this. def. none
    strip-trailing-table-spaces:no/yes
        If there are any spaces at the end of a table row, delete them (default)

    NOTE that if you are running splits, then you PROBABLY want to keep the
FipHdr.
    This is because there is often a chunk of metadata BEFORE the split which
needs to be saved for EACH split
    - and it has probably been stuffed in the FipHdr.

**** Multiple specified Structures

NewsML TopicSets and other multiple specified structures
Considering a structure like :
    <TopicSet FormalName="Companies">
        <Topic Duid="T00001">
            <TopicType FormalName="Company"/>
            <FormalName Scheme="Listed Companies">PNOK.L</FormalName>
            <FormalName Scheme="Nasdaq codes">PNOOK</FormalName>
            <Description>Pocket Nook Corp</Description>
        </Topic>
        <Topic Duid="T00002">
            <TopicType FormalName="Company"/>
            <FormalName Scheme="Listed Companies">FIP.L</FormalName>
            <FormalName Scheme="Nasdaq codes">DRIVL</FormalName>
            <Description>Mega Fip Corp</Description>
        </Topic>
    </TopicSet>

; get the Listed Coys and use '+' as a separator
fiphdr:YC tag:TopicSet/Topic/FormalName dup:+
key:TopicSet/Topic/FormalName/Scheme="Listed Companies"
; get the Nasdaq codes and use '*' as a separator
fiphdr:YN tag:TopicSet/Topic/FormalName dup:*
key:TopicSet/Topic/FormalName/Scheme="Nasdaq codes"
; use U1, U2 etc as holders of the descriptions
fiphdr:U1 tag:TopicSet/Topic/Description incdup:
would give new FipHdr fields of
    YC:PNOK.L+FIP.L
    YN:PNOOK*DRIVL
    U1:Pocket Nook Corp
    U2:Mega Fip Corp
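
The same extraction can be sketched in Python to show what 'dup:', 'key:' and 'incdup:' are doing with the repeated tags. This only mirrors the result shown above using the standard ElementTree module; the helper name collect is hypothetical and nothing here reflects ipsgml's internals.

```python
import xml.etree.ElementTree as ET

TOPICSET = """<TopicSet FormalName="Companies">
    <Topic Duid="T00001">
        <TopicType FormalName="Company"/>
        <FormalName Scheme="Listed Companies">PNOK.L</FormalName>
        <FormalName Scheme="Nasdaq codes">PNOOK</FormalName>
        <Description>Pocket Nook Corp</Description>
    </Topic>
    <Topic Duid="T00002">
        <TopicType FormalName="Company"/>
        <FormalName Scheme="Listed Companies">FIP.L</FormalName>
        <FormalName Scheme="Nasdaq codes">DRIVL</FormalName>
        <Description>Mega Fip Corp</Description>
    </Topic>
</TopicSet>"""


def collect(root, scheme, sep):
    """Join the data of every FormalName whose Scheme matches (like dup:)."""
    values = [el.text for el in root.findall("Topic/FormalName")
              if el.get("Scheme") == scheme]
    return sep.join(values)


root = ET.fromstring(TOPICSET)
fiphdr = {
    "YC": collect(root, "Listed Companies", "+"),
    "YN": collect(root, "Nasdaq codes", "*"),
}
# Like incdup: each repeat goes into the next field name (U1, U2, ...).
for n, el in enumerate(root.findall("Topic/Description"), start=1):
    fiphdr[f"U{n}"] = el.text

for name, value in fiphdr.items():
    print(f"{name}:{value}")
```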

**** Interpreting Tables

IPXML may be used to convert XML tables to plain formatted text or in-line
markup such as Quark.

The two main, and exclusive, uses are
    1. format table rows into plain text rows where the columns line up.
    2. add inline markup dependent on the table and the row.
This inline markup can be anything - Quark Tags, CCI, Atex, MediaSystem Justif,
InDesign etc.
A note of caution - IPXML will format tables (and tables within tables) with up
to 108 (was 62 until version 19g3) columns per row. Any more - use the data
formatting package.

For Lining-up-columns, it spaces out all the columns to the maximum in the
table. If there is an 'align' attribute, then the data is aligned according to
that. Otherwise the first column is flush LEFT and the rest flush RIGHT.
This can be overridden by
    default-tab-align:left/right/center

How does it work ?

Data for each row is held as FipHdr fields (usually UA-UZ then U0-9 then
VA-VZ).
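
The field order given above (UA-UZ, then U0-9, then VA-VZ) can be listed with a short Python sketch. It covers only the documented sequence, which comes to 62 names and matches the column limit quoted for versions before 19g3. How the naming continues up to 108 columns is not stated here, so the sketch stops at VZ. The function name row_field_names is hypothetical.

```python
import string


def row_field_names():
    """Column FipHdr names in the documented order: UA-UZ, U0-U9, VA-VZ."""
    names = ["U" + c for c in string.ascii_uppercase]
    names += ["U" + d for d in string.digits]
    names += ["V" + c for c in string.ascii_uppercase]
    return names


names = row_field_names()
print(len(names), names[:3], names[26], names[36])
```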

At the end of the row, it is output as a row using a FipSeq line which defaults
to :
    (spc) \UA (spc) (spc) \UB (spc) (spc) ..... \r\n
for the number of columns in that table.

This output can be replaced by using either the 'default-class' parameter or
the 'class' attribute on a 'TABLE' tag.

So if there is a <TABLE class="soccer-score">, then a file in
tables/sgml/class/SOCCER-SCORE should contain one or more of the following
keywords :
    table-start:[font=HelveticaBold][pointsize=16]SOCCER SCORE[quad]\n
    table-end:[quad]Data Supplied by Fippies.[quad]\n
    table-row:[font=Helvetica][tab][bold]\UA[roman][tab]\UB[tab]\UD[quad]\n

The table-start is produced BEFORE the table, the table-end after, while each
row has the table-row applied.

Note that in the above example we missed out the third field \UC - there is
nothing to stop you rearranging the fields and NOT specifying the data.
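
A simplified Python sketch of how a table-row line is filled from the row's fields follows. Real FipSeq has far more syntax than this; the sketch only substitutes two-character field references such as \UA and leaves everything else (including the literal \n) untouched. The function name render_row and the sample data are made up for illustration.

```python
import re


def render_row(template, fields):
    """Replace \\XX field references with the row's data; unknown ones go blank."""
    return re.sub(r"\\([A-Z][A-Z0-9])",
                  lambda m: fields.get(m.group(1), ""),
                  template)


template = r"[font=Helvetica][tab][bold]\UA[roman][tab]\UB[tab]\UD[quad]\n"
# UC is present in the row but the template never asks for it.
fields = {"UA": "Arsenal", "UB": "2", "UC": "ignored", "UD": "1"}
line = render_row(template, fields)
print(line)
```

Because the template never mentions \UC, that column is simply dropped from the output.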

Also you may use the lovely FipSeq 'partial', 'combie', 'unique' etc to play
around with the data.

If you do NOT specify a complete output line with table-row (or thead-row),
there are parameters for adjusting the look :
    column-gap: (FipSeq string)
    row-start: (FipSeq string)
    horiz-rule: (FipSeq chr)
These allow you to specify the actual chrs that will start a table data line
and the gap between each column and the character or string to use if an <HR>
occurs in the table.
eg Start each line with a (hyphen) (space) and the gap is 4 spaces and
horiz rules are multiple '+'.
    column-gap:\s\s\s\s
    row-start:-\s
    horiz-rule:+
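
The lining-up behaviour can be sketched in Python as below, using the column-gap and row-start values from the example. It assumes no 'align' attributes, so the first column is flush LEFT and the rest flush RIGHT. Horizontal rules are left out because the width ipsgml gives them is not stated here. The function name line_up and the sample rows are hypothetical.

```python
def line_up(rows, column_gap="  ", row_start=" ", row_end="\n"):
    """Pad every column to the widest cell in that column across the table."""
    ncols = max(len(r) for r in rows)
    widths = [max(len(r[c]) for r in rows if c < len(r)) for c in range(ncols)]
    out = []
    for r in rows:
        cells = []
        for c, cell in enumerate(r):
            # First column flush left, all others flush right.
            cells.append(cell.ljust(widths[c]) if c == 0 else cell.rjust(widths[c]))
        out.append(row_start + column_gap.join(cells) + row_end)
    return "".join(out)


rows = [["Team", "W", "Pts"], ["Arsenal", "10", "31"], ["Spurs", "9", "28"]]
text = line_up(rows, column_gap="    ", row_start="- ")
print(text, end="")
```

Calling line_up(rows) with no extra arguments uses the documented defaults of a 1 space row-start and a 2 space column-gap.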

Keywords in the main parameter file
    format-tables:
        This is necessary to flag that the tables need formatting.
    default-class:(default-class)
        name of a file in tables/sgml/class holding Styles for outputting each row.
    line-up-columns:
        This flags that the data will be space padded to line-up the columns.
    column-gap: (FipSeq string)     default is 2 spaces
    row-start: (FipSeq string)      default is 1 space
    row-end: (FipSeq string)        default is NL
    horiz-rule: (FipSeq Chr)        default is '-'
    bullet: (FipSeq Chr)            default is '*'
    newUA: 2 letter code replacing UA as the first column of a row.
        Both must be a letter and the first cannot be 'Z'.
        The second will always be 'A'.
    fiphdr-for-table: (FipSeq string)   default: none
        Extra FipHdr to add if there is a table in the data.
    split-tables-and-text: (FipHdr)
        Add Marker in text Or create NEW file on tables/text transition.
    split-tables-into-files:
        use this to split the incoming file into discrete files for tables and
non-tables
            The default is NO to add the <FipSplitTables> string
        For files - A new file is created on start and end of table
        and the FipHdr is used to hold the Sequence number of this take.
    fiphdr-for-text:  (FipSeq string)   default: none
        Extra FipHdr to add if this subfile is a text element.
        This is ONLY if the 'split-tables-and-text' is specified.
    use-pi-widths:yes
    pi-colwidths:IDNtableColWidth
        This expects a PI tag with colwidths eg
            <?FingerPost IDNtableColWidth="10 10 10 20" ?>
        Default no
    wrap-table-cells:no/yes/(number)
        This has 2 purposes -
        - with a number (40 or over) : optimum col width of a table (ie don't squeeze
too much !)
        - or Make this NO to automatically calculate the max width of each column and
space out accordingly.
        Default is YES.
    max-col-width: (number)
        Force the colwidth to be a max of this number.  Default: no max.
        If there are 3 numbers, the 1st is the max width of the 1st col
                and the 2nd is max width of the 2nd
                and the 3rd is the max width of all subsequent cols
        ie make the first col a max of 30 chrs and all others 20
            max-col-width:30,20
    interpret-style:idx/bizwir/html
        Interpret some css attributes if there are any (currently just alignment)
        Parameter 'idx' states they are HTML Tidy styles which are numeric from 1-n
        Parameter 'html' looks for ordinary html 'left', 'right', 'center'
            (note - both 'class=' and 'style=' are checked)
            <td class="bold32 spc33 right">
        Parameter 'bizwir' looks for BusinessWire classes 'bwtextalign...'
            <td class="bwcellpaddingleft0  bwverticalalignbottom bwtextalignleft
bwsinglebottomborder">
        or  <td class="bwpadl0 bwnowrap bwpadr0  bwvertalignb bwalignl
bwsinglebottom">
    max-cols-per-row: (number up to and including 108)
        default is a max of 108 columns per row (was 62 until version 19g3)
        Use this to allow up to 108
        The data in columns that exceed this is ignored
    default-tab-align:left/right/center
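
Two of the keywords above take small lists of numbers, and a Python sketch may help show how they read. The first function pulls the widths out of a PI tag as used by 'pi-colwidths'; the second applies the 'max-col-width' rule where the last number covers all subsequent columns. Both function names are hypothetical and the parsing is an assumption based only on the examples given here.

```python
import re


def parse_pi_colwidths(pi_text, name="IDNtableColWidth"):
    """Return the space-separated widths held in the named PI attribute."""
    m = re.search(re.escape(name) + r'="([^"]*)"', pi_text)
    return [int(w) for w in m.group(1).split()] if m else []


def max_width_for_column(spec, col):
    """col is 1-based; the last number in spec applies to all later columns."""
    limits = [int(n) for n in spec.split(",")]
    return limits[min(col, len(limits)) - 1]


widths = parse_pi_colwidths('<?FingerPost IDNtableColWidth="10 10 10 20" ?>')
print(widths)
print([max_width_for_column("30,20", c) for c in (1, 2, 5)])
```

With 'max-col-width:30,20' the first column is capped at 30 and every other column at 20.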

In the CLASS file
    table-start: (FipSeq)
    table-end: (FipSeq)
    table-row: (FipSeq)
also same three for THEAD, TBODY and TFOOT. Eg:
    thead-start: (FipSeq)
    tbody-row: (FipSeq)

----------------------------------------------------------------------------

Version Control
;19g64  04jul11 redid endtags ;3 bugette with > 26 columns ;4 -V added ;5 woops
endtags/tables ;6 added filter
    ;7-8 valgrind cleanups
    ;9 20feb12 added a 2nd column for max-col-width (ie 2->3) and cater for UTF8
chrs in a table
    ;10-15  6mar12 strip 'formsave/' off the front of -S is default output and
added -W
    ;16-19  1oct12 outque is now FipSeq and upto 50 speedy outques ;18 bugette in
trimming v.large FipHdrs
    ;20 15feb13 bugette - endtags and strip:everything ignored.
    ;21  4mar13 added unicodelist ;22 22apr13 bug if rowspan in last table
    ;23-27 26apr13 bugette with EndTags and trees
    ;28 29apr14 added file-trace
    ;29 23mar17 added no-log/always-log ; 30 minor
    ;31-34 4sep17 added data-format:JSON ;35 allow optional doneque ;36 allow
class OR style for <td> alignment
    ;37-38 16dec17 reset pi-widths between tables (only allowed for a single
table)
    ;39 cleanup ;40 json issuette in commonxml
    ;41-42 27sep18 allow fiphdr/key to be tagData not just tagAtt
    ;43 cleanups for speedy
    ;44 9feb19 added default-tab-align
    ;45-46 28mar19 bugette combination of split and output-raw-data
    ;47 12jul19 outque parsed
    ;48-49  8nov19 better handling of Json [] ;49 minor
    ;50-51 18mar20 hdrchr - if HASH/#, the original FipHdr is NOT changed - only
any new fields
    ;52-53 9jul20 added -JSON and -XML as input variables ;54-55 minorbugette ;56
maxFipHdr->limit from 4000
    ;57 11oct22 added split-total-fiphdr:AB
    ;58  4nov22 added split-on-level and output-raw-tag for Json and zap the last
split file if data=0
    ;59-60 28dec22 added fiphdr:.. max-dup as trim + issuette output-raw-data as
last tag + splitSeqno start 1
    ;61-62 17jul23 added split-missings
    ;63  8nov23 bugette  issuette output-raw-data as last tag
    ;64 23apr24 default-strip tuning

;019f41 17may06 bugette in EndTags when strip:none
    ;a-c 21sep06 added new Xmlinternals TagSpecial (b nasty bug in 19a) (c CDATA
quirk)
    ;d 14aug07 tweak to trees
    ;e1-22 20sep07 more on plain text tables - added Rowspan properly
    (;19 added addhdr-script ;20 added -T ; 21 bugette DX and 'dest' were swopped)
    ;23 2apr08 bugged in wrap ;24 9may08 added log-split ;25-26WINNT + key
bugettes
    ;e27-35 23jun08 for strip:everything and end tags and tables with no rows (35
utf8 bugette)
    ;e36-37 22oct08 added -F and max-single-fiphdr-size
    ;e38-40 27oct08 Bizwir - sup and inf added (plus start table strip bugette)
    ;e41 15dec08 added split-on-endtag: ; 42 internal-tuning wrap buffersize
    ;f1-3 01feb09 made Tag structure variable to cope with files > 2million tags
    ;f5 27feb09 added stop-on-tag/att/endtag stop-on-level
    ;f6-14 20mar09 added abstract-fiphdr and abstract-size ;9 bugette ;15
maxStyles->3000
        ;16 rework levels in common ;17-24 bugette for FipHdrs > 64k (commonxml too)
        ;19 21oct09 minor check on abstract ;23-24 redid styles ;26-27 19jan10 added
embedded tables
        ;28-29 22feb10 colspans with no col
        ;30 21mar10 bizwir class names change
        ;31-33 3jul10 bugette in very large spanned columns and in styles and added
-h and extra-fiphdr:
        ;34 13aug10 wrinkle - table with ONLY colspans - and preserve utf8/16 if
output-data-type=raw-data-type
        ;35-36 23aug10 added convert-unmatched-unicodes:pass-thru
        ;37-40 31jan11 added use-sx and Style for LSE fix/fast
        ;41 11may11 added default-strip:
;018z   21apr04 ;a-b more tables2text cleanups
    ;c woops split-script went missing
    ;d-e 28apr04 added tag:XX strip:everything
    ;f-h 01jun04 bugette in R-dualRics and Unlisteds
    ;i 30jun04 bugette in tab2txt
    ;j-k 02jul04 added maxdup:
    ;l-p 14jul04 (protect isspace with 0200) plus tables-if no PI, use wrap
    ;q-r 02sep04 added no-break for <> in wrap_cell
    ;s 17sep04 speedy
    ;t-u 04nov04 Rtrs-UL now in roychk
    ;v-z 05oct05 strip:none was missing the end tags
;017z   29may03 added eoln-in-fiphdr plus alt-param-file
    ;b-d 05jun03 2 bugettes - minor
    ;e 10jun03 no-data: added to fiphdr;
    ;f 20jun03 table Priority
    ;g 30jun03 PI-FingerPost IDNColWidth added
    ;h 17jul03 make list-fiphdr visible (ie leave in FipHdr, not zap)
    ;i 21aug03 bugette in specials
    ;j-m 31oct03 timings and very big FipHdrs
    ;n 26nov03 added parseable doneque
    ;o-q 12jan04 bugette in wrap cells and slim_fiphdr/max-fiphdr-size
    ;r-s 04mar04 added FipHdr EQ (input queue) on 'add-EQ'
        and bugettes in special RORIGIN2 and added cont-chr/cont-zap-chr
    ;t-u 13mar04 allow lead/trailing spaces in continuation tags
    ;w-z 06apr04 zap any sundry tags inside a table - for now.
;016z   23jun02 preceding and trailing blank lines can be tables when
splitting.....
    ;a 10jul02 bugette in splitting tables
    ;b 18sep02 added -D and -S for ipformat compatibility
    ;c/d/e 17oct02 added TableSplit string
    ;f 31oct02 added single quotes too
    ;g/h 25nov02 cleanup tables and ignore # in linkhdr for reference
    ;i 04dec02 added style-CharWidth for tables and Lists
    ;i/j/k/l 12dec02 BUG with large files and allow continuations
    ;m-w 19dec02 added row-end plus bugette in get_duid/links
    ;x 25apr03 added -P
    ;y-z 14may03 remove trailing spaces from a table line and replaceTilde
;015z   10oct01 added table processing and added convert-all-other-entities:
    last end-tag was NOT being handled correctly
    ;c 15nov01 bugette with last tag if PRE
    ;d 19nov01 added preserve-padding-spaces
    ;e 21nov01 added sgmlchr-file
    ;f/g 22nov01 added convert-to-utf-8 and bugette - spaces before/after attrib
values
    ;h 03dec01 bugette with duplicate fiphdr fields
    ;i 11dec01 cleanedup splits and added split-on-no-data
    ;j 28dec01 tables cleanup
    ;k 10jan02 added split-script and more on tables
    ;l 17jan02 more on tables plus endtags not correct on continuations
        plus handling DOCTYPE attributes better
        plus handling Comments redone
    ;m 22jan02 allow trees for fiphdr:AA tag:a/b as well as tagatt
    ;n/o/p 28jan02 added 'fiphdr-for-table' and 'split-on-level'
    ;r/s 09mar02 added -I
    ;t/u 16apr02 order of ending file is now Check Dups, then Mandatorys then
Standing
    ;v/w/x 22apr02 bugette in line-up-cols with keepattributes and allowPresyInTag
    ;y/z 27may02 added 2nd key and link on fiphdr
;014j   16nov99 sort_out_tags
    ;a incdup now starts at A not B and new seqno_it
    ;b 28apr00 added levels/end/standAlone in sort_out_tags
    ;c/d 27apr01 added preserve-multiple-eolns
    ;e/f 31may01 added CDATA and PI-processInds and new-file-on-split
    ;g/h 25aug01 bugs ! - continuation text and keepattribute and added locale
    ;j 03oct01 added ignore-non-xml-data and redid splitters

(copyright) 2024 and previous years FingerPost Ltd.