Guide to the Data Formatting Module

---+ The Guide to the FIP Data Formatting Module

Relevant program documentation:

* form
* ipformd
* ipformat
* ipformbl
* ipformch
* ipfsel
* ipfprep
* ipfchk
* ipsetter

HOW TO FORMAT !

Time for Some Golden Rules:

  • 1: Study the INPUT file in detail.
    Try to get several input files as you need to know EXACTLY what variations
    there can be in the input.
    Try to get a specification for the input if it exists, as that can save hours trying to guess.

  • 2: Study the OUTPUT file in detail.
    Again a spec is useful.

  • 3: Talk to people about why they did it that way or why they want it that way.
    If you are trying to copy an existing format, check with the users what they REALLY need. Often things were done in a strange way because of system shortcomings or users’ personalities. Put a different person in charge and they may have different emphases. You may find that a subsequent system, written on the system you are delivering to, makes a good many changes that you could include in your file processing. Similarly you may find that a lot of manual work is done on the file which you could circumvent.

  • 4: Remember to eat Lunch.
    No system is worth missing food and drink.

Now the Steps :

1. Sort out Copy Flow for both testing and live running.

  • Do you need an ‘xchg’ before formatting ?
    It is always a good idea to rationalise the input file as far as possible
    – running a preliminary xchg may help.

  • Do you need an ‘xchg’ after formatting ? Again there may be some tidying up needed once the file is processed that is better
    suited to ipxchg than to ipformat.

  • What about special processing like needing to ‘sort’
    output ? Do you need to sort the copy inbound ? Do you need to resort it outbound ? Do you need to select specific records ?

  • Does it need multiple dataformats ? Occasionally you get a file that can be sorted one way, then formatted to, say, pick up items between consecutive records, then sorted another way and reformatted for output.

2. Map out what you want to format.

  • It is usually easier to make notes BEFORE starting to test.
  • Just a few notes is all that are required.

3. Setup and start testing the main Format.

  • It is often easier to copy an existing Text file from form/text, use it as a template and rework it to meet the new requirements. There are two ways of testing copy ‘offline’. At the Unix command line you can use the ‘form’ program to set yourself up a testing environment.
  • Alternatively you can use the form module in W4.

4. Work out the other parameters in the Copy Flow.

  • See the relevant section in this guide for an example of Copy Flow.

5. Release

  • Dual running over a number of days is always advised.

6. Check

It is always worth going back to the users and seeing how things can be improved for them. There may be things you have missed, or just things that they now realise you can do, that they have never thought to ask for before.

Pointers

  • If you need to sort the output file, use the ‘job’ parameter in the text/PROCESS file (see sections Copy Flow and [[Ipformd][ipformd]] in the Program Section).
  • If you need to select only certain records from the input file, matched against a standing list of keys, use [[Ipfsel][ipfsel]] (see description in the Program Section).
  • If you need to track that several input files have arrived over a period of time, flag any that are missing and process those that are there (for instance, a number of postscript files for an OPI), use [[Ipformch][ipformch]] (see description in the Program Section).
  • If the input file(s) are very dirty and inconsistent, clean them up beforehand with [[Ipfprep][ipfprep]] (see description in the Program Section).

USING ‘FORM’ TO TEST AND TUNE

There are two interfaces for testing and tuning your dataformats. ‘form’ is a fast, basic, command-line driven interface that runs under Unix. There is also a web based interface available through W4.

FORM, the user interface for the Data Formatting module, can be used to test
and tune a process ‘internally’ – without actually sending the output file anywhere nor deleting the input.
It can also be used to run an XCHG before and/or after the format.

Several tests can be run and their settings are held in a series of Settings files in tables/form/test.

Note that at the moment ‘form’ will NOT scan the PROCESS file and run any [[Ipformd][ipformd jobs]] (qv), as it is used purely for getting the parameter file in form/text 100% correct.

Firstly you choose an existing Settings file – if there is one.

Then you need to get the input file into the queue in spool you are using for testing.

If it is the first time you will need to ‘mkdir’ your test area. Generally one has been setup called ‘spool/test’.

If the input file is on diskette or on another system, just copy it over. If the file came from a wire service or dialup and is in an FIP Archive log, you can add a destination in the sys/USERS file and resend to it. Generally, if FIippies have been onsite, there is a default one already setup called ‘test’ which goes through ‘ipedsys’ to cleanup the filename BUT does not go through any ‘xchg’ :

	sys/USERS : test=	DP:com2  DQ:2edsys  EQ:test SC:no DC:no DF:testing

This uses edsys/TESTING to place the file in spool/test.

To get into the test bit of ‘form’, type ‘x’ at the form prompt. This will reveal firstly which Settings files are available which you can choose from.

The current settings are displayed before the list of options available.

All options are a single character (case-insensitive) followed by ‘enter’.

When changing settings, a ‘*’ can be used to list all the files in the relevant directory. Case is only sensitive for filenames.

root @ com_server2/ > form
form-com_server2:x
** Test FORM - Looking for Settings files
-- List of Files in the Queue : /fip/tables/form/test
total 2
-rw-rw-rw-  1 root		93 Mar 17 10:53 RESRAC
-rw-rw-rw-  1 root		94 Mar 17 19:52 RESTEST
** Hit Return to continue .. 

** Test FORM - Choose a Settings files (or return to ignore) :restest
** Test FORM - Existing settings are :
	Working Queue		: /fip/spool/test
	Input file		: rem727
	TEXT Param file		: RESTEST
	XCHG before		: PA2FORM
	XCHG after		 : PA2ATEX
	DU for Live Tests	: sunres


** Options are :
	Change Settings		: C
	List the Working Queue	: L
	Look at the Input File	: I
	 ..  .. the Output File	: O
	 ..  .. the BreakOut	: T
	Edit TEXT file		: E
	 ..  Xchg before file	: B
	 ..  Xchg after file	: A
	 ..  PROCESS for jobs	: P
	Help			: H
	Run a Test		: R
	Send Live Test		: S
	Quit			: Q : 

In (slightly) more detail, the options are :
Change Settings - change any of the settings shown above
List the Working Queue - ‘ls -l’ of the working queue
Look at the Input File - More, Dump, Edit or Tail the input, which MUST BE in the working queue
Look at the Output File - More, Dump, Edit or Tail the output, which is hidden in the temp queue
Look at the BreakOut file - More, Dump, Edit or Tail the breakout file, which is created by IPFORMAT and shows how the input is split into fields and records
Edit the TEXT parameter file - IPVI the parameter file in form/text
Edit the Xchg before file - IPVI the (optional) xchg file to be used BEFORE the format
Edit the Xchg after file - IPVI the (optional) xchg file to be used AFTER the format
Help - More this file !
Run a Test - run IPFORMAT and then look at the output file
Send Live Test - copy the test file and send it to the DU (destination) specified
Quit - No documentation available on this

In more more detail …..

-- Change Settings :
** Change Settings - Existing settings are :
	Working Queue		: /fip/spool/test
	Input file			: rem727
	TEXT Param file	 : RESTEST
	XCHG before		  : PA2FORM
	XCHG after			: PA2ATEX
	DU for Live Tests		 : sunres

  (At each prompt, Use '*' for list of files)
	Change Working Queue	 : W
	Change Input file		 : I
	Change TEXT Param file  : T
	Change XCHG before		: B
	Change XCHG after		 : A
	Change DU for Live Tests: D
	List the Working Queue  : L
	Quit			: Q or enter : 

Note that a ‘*’ ‘enter’ at any of the Change file lines will list the relevant queue before reprompting.

Type ‘-none-‘ (or in fact just ‘-‘) to set an optional field to ‘-none-‘.

Note that any reference to ‘jobs’ should be ignored in this version – I do.

Looking at any of the files :

** Look at file : rem727 : Options are :
	More	 : M or enter	- This does a 'cat -v' before to show ControlChr
	Dump	 : D		- Essentially an 'od -ab'
	Edit	 : E		- Using 'vi'
	Tail	 : T		- Last 20 lines of a 'cat -v'
	Quit	 : Q : 

Running a Test :

– A simple format with no xchgs –

** Running please wait ... 

	/fip/bin/ipformat -i /fip/spool/hoswrk/rem727 -p ATEXHORSES -o /fip/xFORM.XX.DEFAULT.XX -s -D -l -xo 

** Ok Done	 Hit Return to continue ..

This will now go directly into the Looking at Output file menu.

– A more complicated run with an xchg before and after –

** Running please wait ... 
	Saving the input in /fip/x/FORM_rem727
	ipxchg -1 /fip/x/FORM_rem727 -D PA2FORM -o /fip/x -F 
	** Ok : Xchg Before finished 
	/fip/bin/ipformat -i /fip/x/FORM_rem727 -p RESTEST -o /fip/x/FORM.XX.DEFAULT.XX -s -D -l -xo 
	** Ok : Format finished 
	ipxchg -1 /fip/x/FORM.XX.DEFAULT.XX -D PA2ATEX -o /fip/x -F 
	** Ok : Xchg After finished 
  Hit Return to continue ..

Again this will now go directly into the Looking at Output file menu.

Note that ‘form’ allows you to save the setting you have chosen in a Settings file, so that the next time you go into form it should display the settings from the last time.

Sending a Live Test :

This will copy the input test file and send it to the Destination (DU) called the ‘Live DU’ above.

It then tails the Item Log of that system, on the assumption that ‘ipformat’ will give a message when the file is through. Press Ctrl-C to stop and return to the main Form Test prompt.

*Prerequisites for sending a Live test (if that is possible) :*

In form, you need to specify :

  • an input file
  • a working queue
  • a destination (DU) which MUST be in the USERS file.
  • Also remember to check your SC/DCs for your xchg’s.
  • Also remember to add the selection lines in tables/form/PROCESS.

---

USING THE WEB BASED VERSION OF ‘FORM’ TO TEST AND TUNE

The web interface also allows you to run offline tests of your dataformat.
It is similar to form, but gives you slightly nicer views of the file (for
instance hidden characters, high characters and line endings are all
highlighted in red).

i) *You will need to have the form option enabled in your W4 logon*
This is normally done by adding the line

;;;;; Data Formatting
options:Data Formatting:/fip-pages/form/dftest.html:_blank

to either your W4 logon, or, more usually, to a template called by your W4
logon.

ii) This should be a fairly intuitive interface (feedback welcomed though)

You first select or create a test ….. a test is a definition of how you are
planning to test a file, so it will define the input file, any xchgs you plan
to run against it, and the format you plan to run.

 

The top section of the left hand pane will describe the parameters you
choose, and the links in the bottom section can be used to select a
sequence of xchgs, sorts and formats.
Having selected the parameter files to run

COPY FLOW

How do you route copy into, and out of, the data formatting module ?

– Using the normal FIP routings !

A simple example is the Horse Racing Cards which arrive via dialup modem from the course administrators :

| Stage | Program | Input Queue | Parameter tables and Remarks |
| Input | VWIRE | | wire/RM, where RM is the name of the incoming feed |
| Routing | IPROUTE | 2brouted | route/RM. Actual routing line is : 1 z="*HORSE RACING*" +horses ie make a 2nd copy of the file and send it to destination ‘horses’ |
| Dist’n | IPWHEEL | 2go | sys/USERS : horses= DP:localhost DQ:form SC:NO DC:NO ie process on whichever system it arrived on and send the file directly to spool/form with NO chr translation (ipxchg) |
| Format Selection | IPFORMD | form | form/PROCESS. Selection of the actual Format parameter file : DU=horses >atexhorses (under the comment line ‘; Selection criteria for Horses’) ie if the DU field is exactly equal to ‘horses’, do the ‘atexhorses’ job |
| Actual Processing | IPFORMAT | | form/text/ATEXHORSES. The original file is deleted at the end while the new, formatted file is sent back to ipwheel with a new destination (DU) of ‘atexhorses’ |
| Dist’n of output | IPWHEEL | 2go | sys/USERS : atexhorses= DP:atex1 DQ:junk-wir SC:HORSES DC:ATEX DF:DATAFORM ie send file to 2atex queue via ‘xchg’ |
| Chr Xchg | IPXCHG | xchg | xchg/HORSES2ATEX. Clean up the data |
| Send to Atex | IPGTWY | 2atex | gateway/DATAFORM. Send it to junk-wir |

EXAMPLE 2 : STOCKS : More complicated examples are for the various Stocks.

These large Hong Kong tables follow the same path as ‘horses’ except that the names of the parameter files are obviously different.

Selection in PROCESS is on Filename only.

| Format text file | form/text/HKSTOCKS |
| Output xchg | xchg/HKSTOCKS |

For the Regional Stocks, there are two different input formats (plus Manila, which is different but simpler) which need to be used to create almost the same output.

The Input variations are :


First type :

| RIC | DISPLAY NAME | LAST | TODAY’S HIGH | TODAY’S LOW | HISTORIC CLOSE |
| AAH.AS | ABN-AMRO HLDGS | 59.9 | 60 | 59.3 | 59.5 |
| ACHN.AS | ACF HOLDING | 35.7 | 35.7 | 35.5 | 35.5 |
| AEGN.AS | AEGON NV | 112.4 | 113 | 12.2 | 112.8 |

Second Type :

| *FASTCLOSE | RIC | 950301 | 1900 | KLS | |
| SECURITY | DATE | HIGH | LOW | LAST TRADE | PREVIOUS CLOSE |
| AYER HITAM TIN STK | 950301 | 4.100 | 3.960 | 4.100 | 3.960 |
| AYER HITAM PLANT STK | 950301 | 14.300 | 14.300 | 14.300 | 13.500 |
| ACIDCHEM STK | 950301 | 6.350 | 6.100 | 6.350 | 6.000 |

Manila Stocks :

| NAME | CLOSE | HIGH | LOW | PRECLOSE |
| A Soriano | 3.2 | 3.2 | 3.15 | 3.2 |
| A Soriano B | 3.2 | 3.2 | 3.2 | 3.15 |

There are two outputs which are identical EXCEPT London and New York prices are quoted in fractions NOT decimals and so they require a different Character Xchg.

The processing is more complicated than for ‘atexhorses’ as some of the names of the Stocks are changed AND the output is sorted alphabetically. For example ‘INTL BUS MACHINE’ needs to be ‘IBM’, but as ‘IBM’ it will start the ‘I’s and NOT appear after ‘Inland Steel’ as in the input feed.

So we use the ‘job:’ keyword in the PROCESS file to get IPFORMD to run a series of jobs rather than just start IPFORMAT as in the example above.

Look at the processing for London and New York :
While the selection remains similar to ‘atexhorses’ :

SN=LON.TXT	>fracstocks	; London

we select on the filename this time.

But at the top of the PROCESS file there are a series of parameters for the job ‘fracstocks’ :

;
; Job sequence for London and New York - Fractions
job:fracstocks  /bin/rm -f formsave/FRACSTOCKS*
job:fracstocks  /fip/bin/ipformat -p fracstocks -i $i -D -S FRACSTOCKS
job:fracstocks  /fip/bin/ipxchg -1 formsave/FRACSTOCKS -D fracstocks -F -o formsave
job:fracstocks  /bin/sort +0 -3 -o formsave/FRACSTOCKS.s formsave/FRACSTOCKS
job:fracstocks  /bin/mv formsave/FRACSTOCKS.s 2go/#SN:\SN#DU:atexstocks

So the copy flow for London Stocks is that the FORMAT stage is replaced by ALL these job lines in sequential order :

  • Remove any files starting FRACSTOCKS in formsave
  • Run IPFORMAT with the input file and parameter file FRACSTOCKS leaving the output in formsave/FRACSTOCKS
  • Run IPXCHG once on formsave/FRACSTOCKS overwriting the input with the xchged file. This is to get the correct StockNames eg:
    x/Hong Kong Telecm/HK Telecom
  • Sort formsave/FRACSTOCKS on the first three words, creating an output file called formsave/FRACSTOCKS.s
  • Move formsave/FRACSTOCKS.s to 2go with destination (DU) ‘atexstocks’ and preserving the original filename (SN)
  • Please refer to the documentation on IPFORMD in the programs section for more information on jobs.
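The reason the name change has to run before the sort can be seen in a small sketch. It is an illustration only: the rename table, the price figures and the helper names are made up, and Python's list comparison merely approximates what 'sort +0 -3' does with the first three whitespace-separated fields.

```python
# Hypothetical stand-in for the xchg rename rules (not the real fracstocks table).
RENAMES = {"INTL BUS MACHINE": "IBM"}


def rename(line):
    """Apply the first matching rename, as the xchg step would."""
    for old, new in RENAMES.items():
        if line.startswith(old):
            return new + line[len(old):]
    return line


def sort_key(line):
    """Approximate 'sort +0 -3': compare on the first three fields only."""
    return line.split()[:3]


def run_job(lines):
    """Rename first, then sort, mirroring the order of the job lines."""
    return sorted((rename(line) for line in lines), key=sort_key)


# Invented sample records in input-feed order.
formatted = [
    "Hewlett Packard 60 61 59",
    "Inland Steel 25.5 26 25",
    "INTL BUS MACHINE 90 91 89",
]

result = run_job(formatted)
print("\n".join(result))
```

Sorting before renaming would leave the IBM line after 'Inland Steel', which is exactly the problem the job sequence avoids.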

    One further note on the Stocks is that all the output files pass through the STOCKS2ATEX xchg.

    This is used to add column headers before certain Stock names. eg :

    x/Northern Elec/\n{M1Stock\rClose\rHigh\rLow\rPrev\r\n{M0Northern Elec

    4. PARAMETER FILE REFERENCE GUIDE

    This is the reference and hints section describing the main Parameter file used for processing.

    Overview

    Part 1. Define what the input file looks like plus a general section covering fixed information.

    Keywords:

    filtyp:

    recsep:

    reckey:

    recsiz:

    fldsep:

    fldkey:

    fldsiz:

    stripeol:

    number:

    startkey:

    keycasesens:

    wild:

    wchr:

    set:

    include:

    calc:

    fraction:

    base:

    date:

    style:

    partial:

    match:

    hdr:

    nohdr:

    name:

    chrset:

    before:

    after:

    Part 2. Output Section

    ---

    Overview

    Record Processing Lines

    This is flagged as beginning with the ‘output:’ keyword. It describes what processing should be done for each input record, using ‘r=X’ lines to describe the output.

    Record lines can have system variables, input fields, tests, builtin formatting etc.

    Each one of the keywords is described below.

    Each line is a self-contained item ending with a NewLine (Unix) or CR LF (PC) or CR (Mac). The text parameter file can be edited by any word-processor on any (normal) platform AS LONG AS the end result is a pure, raw ascii file with no Presentation or fancy graphics embedded.

    Comments are the usual semi-colon in front.

    	; comment

    Reserved Names

    The list of keywords is the list provided above plus a series of tests and builtins described below.

    Note that it is possible – but not advised – to override some of these keywords. So these names should be considered reserved. In addition a few other names have been reserved for future use :

    • blksep:
    • blkkey:
    • blksiz:

    The Structure of the Parameter File

    The text file is split into 2 main parts as described above. The OUTPUT section must be the second and is marked by the keyword ‘output:’ on a line on its own.

    By common consent, the first part starts with the definitions of the file, records and fields but this is NOT strictly necessary. The advice is – do whatever is easiest for you.

    Comments and the binary version of the Parameter file

    Comments are lines STARTING with a semi-colon. You can have millions of comment lines and, except for the first run, they will have no effect on run-time speeds.

    This is because ‘ipformat’ uses a compact, binary version of the Text Parameter file which is built automatically by ‘ipformbl’ every time you modify the Text version.

    The only time you need touch the binary versions (in tables/form/bin) is that they should be deleted during every software upgrade of the DF module.

    Comments are encouraged

    The Processing Loop is…

    The actual processing cycle is :

    For each input file hitting queue spool/form :

    • ‘ipformd’ will select the correct Parameter file using form/PROCESS
    • Normally this will mean starting ‘ipformat’ with the chosen file

    Once started by ‘ipformd’, ‘ipformat’ will go through the following steps :

    • Preprocessing
      - Create a Fip style header, unless not required with ‘nohdr’. This can be the standard one or can have extra fields added using ‘hdr’.
      - Add Data at the beginning of the output file if the ‘before’ keyword has been specified.
    • Processing input file
      - Split the input file into records and for each record :
        - Start at the ‘output’ section of the Parameter file
        - Check each record specification line. If it is for that record type, process it
        - Loop around for the next record
    • Postprocessing
      - Add Data at the end of the output file, after the processed data, if the ‘after’ keyword has been specified.
      - Create a Fip filename. This can be the standard one or can be replaced if the ‘name’ keyword is specified.
      - Send the file to spool/2go for ‘ipwheel’ to distribute, usually via ‘ipxchg’.

    That’s it !

    ---

    Syntax of Each Keyword

      Syntax for ‘filtyp’

      Syntax:

      	filtyp: (type).

      Where type is

      text

      – Ordinary text file with each record having a defined separator.

      fixed

      – fixed record sizes

      variable

      – variable record sizes

      If filtyp is f (fixed) or v (variable), you will need to specify the size (or, for variable, the maximum size) of each record.

      For most applications, filtyp:text means you also have to define the record separator, ‘recsep’.

      Syntax for ‘recsep’ and ‘fldsep’

      Syntax:

      	recsep: (FipSeq string)
      	fldsep: (FipSeq string)
      eg:		recsep:\036

      Normally the separator will be \n or \r\n for NewLine or Carriage Return NewLine.

      Note that if you just put ‘\n’, ‘ipformat’ will automatically take any combination of CR and NL.

      Syntax for ‘reckey’ and ‘fldkey’ – Define the record or field key

      This defines the type and size of either the record or the field key.

      Normally keys are positioned at the beginning of the record/field but optionally these can be at the end or at an offset from the beginning or end.

      Syntax:

      	reckey: (length) : (type) : (posn) : (EndChr) : (delY/N)
      	fldkey: (length) : (type) : (posn) : (EndChr) : (delY/N)

      where

      • length or size of key - this can be 0 for any length
      • type (optional) can be :
        a - alphabetic
        u - uppercase
        l - lowercase
        n - number
        p - printable
        s - space (or tab, CR, NL or FF)
        b - binary (ie anything)
        x - alphanumeric
        c - control (ie < 040 or >= 0177)
        z - anpa hdr field (ie alnum plus non-quad/format punct.)
        t - punctuation
      • posn (optional) is the offset of the key from the start of the zone. If negative, count from the end of the zone
      • endchr (optional) is a single chr which terminates the key

      The separator can be any punctuation chr as long as the same chr is used for each field. eg the following two are equal :

      	fldkey  3,n,,|
      	fldkey  3|n||\174

      Ie the end Chr is a pipe but as that is used as a separator, use the octal value

      What is a key ? The key is used really as a TYPE of PROCESSING flag for the output section.

      It can be a unique record key – such as a stock code – but if you have several thousand, it is going to be unwieldy specifying all of them.

      So generally we are trying to classify records into general types. For example, take a text file containing school results like :

      	School	Pinky High
      	Head	James Pinky
      	Pupil	Ramsay		Macdonald	31.6
      	Pupil	Thatcher	Margret		77.3
      	Pupil	U-Dones		Helen		99.3

      We can use the first field, which is alpha and variable length followed by a space.

      	reckey:0:a

      This can be signalled in the output section as :

      	r="school"	(do the school bit)
      	r="head"	(do the head bit)
      	r="pupil"	(do the pupil bit)
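As a rough illustration of how a 'reckey:0:a' key classifies the records above, here is a minimal Python sketch. It is not how ipformat is implemented; the function names are hypothetical, and lowering the key simply models the default case-insensitive matching.

```python
import string

# The sample file from the text, with tabs between fields.
SAMPLE = (
    "School\tPinky High\n"
    "Head\tJames Pinky\n"
    "Pupil\tRamsay\t\tMacdonald\t31.6\n"
    "Pupil\tThatcher\tMargret\t\t77.3\n"
)


def record_key(record):
    """Take the leading run of alphabetic chrs, like reckey:0:a (any length, alpha)."""
    key = ""
    for ch in record:
        if ch not in string.ascii_letters:
            break  # the first non-alpha chr (here a tab) ends the key
        key += ch
    return key.lower()  # keys match case-insensitively unless keycasesens:yes


def classify(text):
    """Pair each non-empty record with its key, ready for the r="..." lines."""
    return [(record_key(rec), rec) for rec in text.splitlines() if rec]


keys = [key for key, _ in classify(SAMPLE)]
print(keys)
```

Each key then selects the matching r="school", r="head" or r="pupil" line in the output section.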

      Syntax for ‘recsiz’, ‘fldsiz’ – Define the length of fixed size records & fields

      Syntax :

      	recsiz: (length)
      	fldsiz: (length)

      where length is the size of the record or field

      Syntax for ‘keycasesens’ – Field and Record keys can be upper AND lower

      Normally all keys – record or field – are considered to be case insensitive. So a key r = ‘aaa’ will pick up both AAA and aaa.

      Use this command to force the difference.

      Syntax :

      	keycasesens:yes

      Syntax for ‘number’ – Change the number system for specifying non-printable chrs

      When a non-printable chr is specified in the form ‘\012’ the ‘number’ system can be changed to decimal or hex.

      The default number system is octal.

      Syntax:

      		number:dec
      	or	number:hex
      	or	number:oct

      The change takes effect for all lines in the parameter file lower down until changed again by another ‘number’ keyword (Why you would specify different number systems for different parts, I have no idea).

      So a New Line chr will be

      		\012		octal
      		\010		decimal
      		\0a		hex
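The three spellings above all name the same chr value. A short Python check, using a hypothetical helper name, makes the arithmetic explicit:

```python
# Map the 'number' keyword values to their numeric base.
BASES = {"oct": 8, "dec": 10, "hex": 16}


def escape_value(digits, number="oct"):
    """Return the chr value of the digits after the backslash in the given system."""
    return int(digits, BASES[number])


newline_values = [
    escape_value("012", "oct"),
    escape_value("010", "dec"),
    escape_value("0a", "hex"),
]
print(newline_values)
```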

      Syntax for ‘wild’ – Allow wild card strings using a particular chr

      Syntax :

      	wild: (Chr to use to signify a wild string)
      	wild:*

      This allows a wild string to be used when specifying record keys.

      Note there is NO automatic wild string chr – you always have to specify it.

      For example:

      	wild:$

      Allows us to specify in the output section :

      	r=$	"This is done for all records" 

      Syntax for ‘wchr’ – Allow a wild card character using a particular chr

      Syntax :

      	wchr: (Chr to use to signify a wild character)
      	wchr:?

      This allows a wild character to be used when specifying record keys.

      Note there is NO automatic wild chr – you always have to specify it.

      For example :

      	wchr:?

      Allows us to specify in the output section :

      	r="abc?e"	"This is done for all records abc(something)e" 
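The difference between a wild string ('wild') and a wild character ('wchr') can be sketched in Python by translating a key specification into a regular expression. This is only an analogy under the assumption that a wild string matches any run of chrs and a wild character matches exactly one; the function names are hypothetical.

```python
import re


def key_pattern(spec, wild=None, wchr=None):
    """Compile a record-key spec, honouring only the wild chrs that were declared."""
    parts = []
    for ch in spec:
        if wild is not None and ch == wild:
            parts.append(".*")  # wild string: any number of chrs
        elif wchr is not None and ch == wchr:
            parts.append(".")   # wild character: exactly one chr
        else:
            parts.append(re.escape(ch))
    # Keys are case-insensitive by default (see keycasesens).
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def key_matches(spec, key, wild=None, wchr=None):
    """True if the whole key matches the spec."""
    return key_pattern(spec, wild, wchr).fullmatch(key) is not None


print(key_matches("$", "anything", wild="$"))     # wild:$ with r=$
print(key_matches("abc?e", "abcde", wchr="?"))    # wchr:? with r="abc?e"
print(key_matches("abc?e", "abcde"))              # no wchr declared: '?' is literal
```

The last call reflects the note above that there is no automatic wild chr: without a 'wchr' line the question mark only matches itself.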

      Syntax for ‘stripeol’ – Do NOT strip multiple blank records

      Syntax :

      	stripeol:no

      Where the ‘recsep’ is some combination of CR and NL, normally blank lines or multiple occurrences of CR and NL are stripped.

      This command is used to turn that option OFF and to treat all lines as valid records, even ones with no data.
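A minimal Python sketch of the two behaviours follows. It assumes that CR NL, a lone CR and a lone NL each count as one record separator, which is a guess at how the combinations are normalised, and the function name is hypothetical.

```python
import re


def split_records(data, stripeol=True):
    """Split on CR NL, CR or NL; drop empty records unless stripeol is off."""
    records = re.split(r"\r\n|\r|\n", data)
    if records and records[-1] == "":
        records.pop()  # a trailing separator does not start a new record
    if stripeol:
        records = [rec for rec in records if rec != ""]
    return records


data = "one\r\n\r\ntwo\n\n\nthree\n"
print(split_records(data))                   # default: blank records stripped
print(split_records(data, stripeol=False))   # stripeol:no keeps them
```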

      Syntax for ‘startkey’ – Force the record Key or type of the BEFORE-FIRST record

      Syntax :

      	startkey:yes

      This forces the key BEFORE the first record to be ‘x1594’ which can be used in the ‘ifprv’ test – if previous key.

      Syntax for ‘set’ – Short forms or format names

      Syntax :

      	set	(name)	(any fixed text) 
      	set	pagehdr	Stock<t>Close<tr>High<tr>Low<tr>Prev<qr>#\n

      Set lets us specify easy-to-remember names to reference strings

      Sets can NOT be split over several lines.

      All leading and trailing spaces are stripped. So use either double quotes or the ‘\s’ escape string to embed them. To specify a double quote, use the octal string.

      FipSeq strings are usable, but note that ‘set’ lines are parsed when the parameter file is changed/created, so any Variable data will be of that time and any FIP hdr data meaningless.

      	eg	set	timedate	\$d-\$m-\$y \$h:\$n

      will produce the date and time of when the parameter file was last changed.

      To get run time date/time, specify the same string in the output section record line NOT the ‘set’

      The name of the set – timedate in our example – is case-INsensitive. So it may be called as TIMEDATE or TimeDate or any other variation known to man.

      Syntax for ‘include’ – Include another file of instructions

      Syntax :

      	include	(filename)
      	include	quark.tags

      This will include another text file in tables/form/text with more Data Formatting commands.

      The filename is forced to UPPERCASE as normal so in the above case the file will be :

      	/fip/tables/form/text/QUARK.TAGS

      Generally include files will have commands such as ‘set’ and ‘calc’ which are common to a series of Data Formats, like a set of Quark styles for example.

      Normally ipformat realises a change has been made to the main text file and rebuilds its binary automatically. Note, however, that if you update only the include file, ipformat will NOT rebuild its binary, so you need to ‘touch’ all the other text files in form/text to get the new version :

      	ipt form/text
      	touch *

      Syntax for ‘calc’ – Define a Calculation or Choose output style for a number

      Syntax :

      	calc:(name)	(The calculation) 
      	calc:percent	100*(c1/c2)

      Define a calculation where cX are used as variables. The ‘name’ is that used in the ‘output’ section.

      Calculations can NOT be split over several lines.

      Variables are loaded at run-time using savnum, savbyte, savint, savswint, savlong or savswlong.

      The default precision for a calculation is 2 decimal places. This can be overridden using the syntax :

      	calc:percent:0	100*(c1/c2)

      where the ‘0’ after the name is the number of decimal places in the range 0-6.

      Calc can also be used to change the output format of a number by specifying the precision. If the raw data is in 4 dec places and you only want 2:

      	calc:dec2:2	c22

      and in the output section, load c22 from data – in this case field 3 :

      	savnum=22 f3

      Operators can be :
      • + plus
      • - minus
      • * multiply
      • / divide

      Take care when dividing by zero !

      Use round brackets to denote how the calculation should be worked out. Ie the deepest level is calculated first, and then the calculation is worked from left to right.

      For example :

      	(c1/c2)*(c3/(c5+c4))/100
      will run through the following order :
      	Step 1	(c5+c4)
      	Step 2	(c1/c2)
      	Step 3	(c3/result of Step 1)
      	Step 4	(result of Step 2 * result of Step 3)
      	Step 5	(result of Step 4 / 100)
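The step order and the precision setting can be checked with a small Python sketch. The variable values are invented, the function names are hypothetical, and the rounding used here is Python's own, which may differ from ipformat's at exact half-way values.

```python
def worked_example(c1, c2, c3, c4, c5):
    """Evaluate (c1/c2)*(c3/(c5+c4))/100 in the step order listed above."""
    step1 = c5 + c4        # deepest brackets first
    step2 = c1 / c2
    step3 = c3 / step1
    step4 = step2 * step3
    step5 = step4 / 100
    return step5


def format_calc(value, places=2):
    """Render a result with the given number of decimal places (default 2)."""
    return f"{value:.{places}f}"


# calc:percent    100*(c1/c2)   with c1=1, c2=3 (invented values)
print(format_calc(100 * (1 / 3)))       # default precision
# calc:percent:0  100*(c1/c2)
print(format_calc(100 * (1 / 3), 0))    # precision 0
print(worked_example(6, 3, 10, 2, 3))
```

A real parameter file would also need to guard against c2 being zero, as the warning above says.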

      Syntax for ‘fraction’ – Define a Fraction

      Syntax :

      	fraction:(name):(precision) (Output style WITH & WITHOUT fraction)

      where
      • name is the function to be used in the output
      • precision is the smallest denominator - 2, 4, 8, 16, 32, 64, 128 or 256 - default is eighths
      • the output style holds two parse strings, each wrapped in a separator chr (‘+’ in the first example below, but it can be any punctuation), in the order : separator, first style, separator, second style, separator
      • the first style is the parse string used if THERE IS A FRACTION and an integer
      • the second style is the parse string used if THERE IS ONLY AN INTEGER and NO FRACTION

      		fraction:star:16	+\ZI \ZD/\ZN+\ZI.0+
      		fraction:stox:32	|\ZI (\ZD-\ZN)|\ZI|

      Fraction takes a field, partial field, saved field or calculation and splits it into the following portions, which can then be used as normal FipSeq.

      • ZS is the sign + or –
      • ZI is the integer
      • ZD is the fraction amount
      • ZN is the fraction denominator

      Specify two styles – the first for if there is a non-zero fraction amount and the second if there is none.

      Syntax for ‘base’ – Define a Base number

      Syntax :

      	base:(name):(precision) (Output style WITH & WITHOUT fraction) 

      Base is exactly the same as fraction except base will NOT attempt to ‘reduce’ the fraction.

      • ie 2/4 is left as ZD=2, ZN=4 while fraction will force ZD=1 ZN=2
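The difference between ‘fraction’ and ‘base’ can be sketched in Python. This is an illustration under assumptions: the value is rounded to the nearest 1/precision, the function names are hypothetical, and the two output styles are hard-coded to mimic '+\ZI \ZD/\ZN+\ZI.0+'.

```python
from math import gcd


def split_number(value, precision=8, reduce=True):
    """Return (sign, integer, amount, denominator), loosely ZS, ZI, ZD and ZN."""
    sign = "-" if value < 0 else "+"
    units = round(abs(value) * precision)   # the value counted in 1/precision steps
    integer, amount = divmod(units, precision)
    denominator = precision
    if reduce and amount:
        common = gcd(amount, denominator)   # 'fraction' reduces, 'base' does not
        amount //= common
        denominator //= common
    return sign, integer, amount, denominator


def render(value, precision=8, reduce=True):
    """Pick the first style if there is a fraction amount, else the second."""
    _, zi, zd, zn = split_number(value, precision, reduce)
    return f"{zi} {zd}/{zn}" if zd else f"{zi}.0"


print(split_number(59.5, 4))                # like fraction: 2/4 reduced to 1/2
print(split_number(59.5, 4, reduce=False))  # like base: left as 2/4
print(render(59.5, 16), "|", render(60.0, 16))
```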

      Syntax for ‘date’ – Generate a date and/or time from a number

      Syntax :

      	date:(name):(data order)	(Output style)
      	date:mono:dm	 +Last \ZW was \ZD-\ZN-\ZZ+

      where mono is the name which is used in the output section

      The Order of the raw data is important. So we divide it up into a series of 0 or more 2-digit numbers (or space digit) whose order is specified using :

      • d – day — 1-31
      • m – month — 1-12
      • y – year — (last 2 digits only, must be after 1970)
      • c – century — 19 or 20 only
      • h – hour — 0-23
      • t – minute — 0-59
      • x – padding — ignore 1 or 2 numbers at this position. Only 1 padding is allowed.

      So for the 21st March 2010 :

      • if the incoming data is 21032010, the data order is dmcy
      • if it is 210310, the data order is dmy
      • if it is 032110, the data order is mdy

      Obviously only one format of data can be handled by a single ‘date’ but you can have as many ‘date’s as needed.

      Note that spaces, letters and punctuation are stripped and/or used as field delimiters.

      So the following are all equivalent :
      * 200896
      * 20th 8-August, 96
      * This day 20th August (the 8th month) in the Year 96

      The Output format is defined in normal FipSeq between two delimiters (‘+’ in our example above, but can be any punctuation except semicolon ‘;’).

      The new data is added to a series of extra FipHdr fields :

      • ZD – 2 digit day of month
      • ZM – 2 digit month
      • ZY – 2 digit year e.g. 92
      • ZZ – 4 digit year e.g. 1992
      • ZW – Day of week e.g. Monday, Tuesday etc
      • ZS – 3 character day of week e.g. Mon, Tue etc
      • ZN – Full name of month e.g. January, February
      • ZL – 3 character month e.g. Jan, Feb, Mar etc
      • ZJ – Julian day of year
      • ZH – Hour 00-23
      • ZI – Hour 00-12
      • ZT – Minute 00-59
      • ZP – am/pm
      • ZA – Week Number 00-53
      • ZB – Week Number 01-53
      • ZC – Day of the week 0-6

      (see also manual page for ‘strftime’ for slightly more information)

      Note that actual Day and Month names depend on the LOCALE of your shell/computer.

      The Default output format, if none is specified, is :

      	+\ZW, \ZD \ZN \ZZ+

      Which for (order dmcy), (data) 16111997 English LOCALE gives

      	Sunday, 16 November 1997
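As a cross-check of the default style, the same text can be produced with Python's `strftime`. The correspondence ZW=%A, ZD=%d, ZN=%B, ZZ=%Y is an assumption based on the field descriptions above, and `default_date_text` is a hypothetical helper, not part of ipformat.

```python
from datetime import date

# Assumed strftime equivalents of the default style  +\ZW, \ZD \ZN \ZZ+
DEFAULT_STYLE = "%A, %d %B %Y"

def default_date_text(day, month, year):
    """Format a date the way the default output style is described."""
    return date(year, month, day).strftime(DEFAULT_STYLE)

print(default_date_text(16, 11, 1997))
```

As with ipformat, the day and month names that come out depend on the locale in force when the code runs.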

      Note that if any information is NOT supplied, the run time date/time is used.

      Syntax for ‘partial’ – Further subdivide a record or field

      Syntax :

      	partial:(name)	(type):(length):(startchr):(endchr)

      where

      • type can be
        • a : alphabetic
        • u : upper case alpha
        • l : lower case alpha
        • n : numeric
        • t : punctuation
        • p : printable (generally Spc to ~)
        • s : space (tab, space, ff, cr, nl, lf)
        • z : anpa type string (alphanumeric plus hyphen)
        • b : binary
      • length can be zero for unlimited.
      • startchr and endchr are optional.

      There can be up to 100 partial fields as required.

      The contents of a Partial field are accessed by specifying pX where X is the sequential no from the start of the partial – ie first is p1, then p2 etc

      Syntax to partial a field in the output section record line is

      	(partial name) (field name)

      and then you can use any partial fields. For example:
      * To pull apart a date in the form : 26th March 1997

      		; comment		26  th		March	1997 
      		partial:pdate	s:0 n:2 a:0 s:0 a:0 s:0 n:4 

      In the output section, assuming field 3 of record type 99 contains the date, we can output just the year by :

      		r=99	pdate f3	p7

      will produce “1997”
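The effect of the `pdate` partial can be imitated in a few lines of Python. This is a sketch under assumptions: `split_partial` is a hypothetical name, only the `a`, `n` and `s` types are handled, startchr/endchr are ignored, and each partial is assumed to consume the longest run of matching characters up to its length.

```python
CLASSES = {
    "a": str.isalpha,   # alphabetic
    "n": str.isdigit,   # numeric
    "s": str.isspace,   # space
}

def split_partial(spec, text):
    """Split text into partial fields; spec is e.g. 's:0 n:2 a:0'."""
    fields, pos = [], 0
    for item in spec.split():
        kind, length = item.split(":")[:2]
        limit = int(length) or len(text)     # a length of 0 means unlimited
        start = pos
        while pos < len(text) and pos - start < limit and CLASSES[kind](text[pos]):
            pos += 1
        fields.append(text[start:pos])
    return fields

p = split_partial("s:0 n:2 a:0 s:0 a:0 s:0 n:4", "26th March 1997")
print(p[6])     # index 6 here corresponds to p7, as partials count from p1
```

Note that the first partial (`s:0`) is empty for this input because there are no leading spaces, which is why the year ends up in p7 rather than p6.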

      Syntax for ‘match’ – search and replace on a single field

      Syntax :

      	match:(matchname)	/(search string)/(replacement)
      	match:(matchname):c	/(search string)/(replacement)

      where

      • matchname is a unique alphabetic name
      • c (optional) is for case-SENSITIVE searches
      • the delimiters – ‘/’ in the above example – can be any punctuation as long as they are the same for that match line

      This is a localized search and replace or search and zap function which is applied ONLY to a record, field, partial field or save area.

      Normally the search is case-INSENSITIVE but it can be made case-sensitive with the ‘:c’ after the matchname.

      No wild chrs or wild strings are permissible at present.

      To specify in the output section :

      		(matchname) (zone)

      where

      • matchname is the SAME as specified in the ‘match’
      • zone is the record, field, partial or save area.

      Up to thirty matches can be specified for a single zone.

      For example :

      	match:jan	/1/January/ 
      	match:feb	/2/February/ 
      	match:mar	 /3/March/ 
      	etc .. etc 
      	match:dec	/12/December/

      In the output section – let’s pretend field 4 of record type 23 contains a number we want to translate to a month :

      	r=23	dec nov oct sep aug jul jun may apr mar feb jan f4 

      Note that dec, nov and oct are done first; otherwise, if ‘f4’ was “11”, the jan match would replace each “1” and give “JanuaryJanuary” !
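Why the order matters can be seen by imitating the twelve matches with plain string replacement in Python. This is only an illustration: `apply_matches` is a hypothetical helper and the real ‘match’ is case-insensitive, which makes no difference for digits.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def apply_matches(numbers, value):
    """Replace each month number with its name, in the order given."""
    for n in numbers:
        value = value.replace(str(n), MONTHS[n - 1])
    return value

high_first = range(12, 0, -1)   # dec nov oct ... jan, as in the example
low_first = range(1, 13)        # jan feb ... dec, the wrong way round

print(apply_matches(high_first, "11"))
print(apply_matches(low_first, "11"))
```

Applying the two-digit months first gives "November"; applying jan first turns each "1" into "January" before the nov match gets a chance.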

      ‘match’ complements the character xchg in IPXCHG. The difference is one of scope: ‘match’ is local and is applied ONLY to a single zone, whereas IPXCHG works on a whole file (or, using flags, on a selected paragraph, line or section of the file).

      Syntax for ‘style’ – Reformat a data zone

      Syntax :

      	 style: (name)	(a single printf conversion syntax)
      		  style:twonum	 %.02d
      • Uses printf, which is nasty but a standard (of sorts). Do a ‘man printf’ for fuller information if you need it.
      • Always starts with ‘%’.
      • If the expression does not end with an ‘s’ (‘d’ for integer, for example), then the string in the header field is first converted to that type.
      • Specify one and ONLY one expression (you cannot have %s%d%f) as only the first is used.
      • Do NOT use for fixed data, ONLY the conversion string.
      • Types are :
        • string : s
        • char : c
        • long : d,i,o,p,u,x,X
        • float : f,e,E,g,G
        • % : print a % !
        • type n is ignored ??

      Examples

      – to trim a string, use a dot

      : %.5s

      – To pad a string with spaces

      : %5s

      – To pad a string with spaces (left justified)

      : %-5s

      – To pad a number with leading zeros

      : %.06d
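Python's `%` operator follows the same printf conventions for these conversions, so the examples above can be tried out quickly. The helper `apply_style` is hypothetical; it mimics the rule that the string is converted to a number first when the conversion is not a string type.

```python
def apply_style(style, value):
    """Apply one printf conversion; numeric conversions convert the string first."""
    kind = style[-1]
    if kind in "diouxX":
        value = int(value)
    elif kind in "feEgG":
        value = float(value)
    return style % value

print(repr(apply_style("%.5s", "abcdefgh")))   # trim
print(repr(apply_style("%5s", "ab")))          # pad, right justified
print(repr(apply_style("%-5s", "ab")))         # pad, left justified
print(repr(apply_style("%.06d", "42")))        # leading zeros
```

This only covers the conversions Python shares with C printf; the `p` type listed above has no Python equivalent.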

      Syntax for ‘name’ – Overwrite the default name of the output file

      Syntax :

      	 name: (Fip Hdr strings)

      Remember that the ‘name’ is a list of Fip Hdr fields which will probably include the SN field which is the original name.

      The default name on output is :

      	#SN:(original filename)#DU:(name of the paramfile)s#SC:FORM#DF:FORM

      Where DU is forced lowercase.

      The ‘hdr’ and ‘name’ keywords have the same syntax and roughly the same use, so further information is given under ‘hdr’.

      Remember ‘name’ is done AFTER all processing of the data – it is the last thing done before the file is sent on for further distribution. So information gleaned from the input, perhaps left in Save Areas, can be used in the name. ‘hdr’ however is done FIRST, BEFORE any data is touched.

      Syntax for ‘hdr’ – Add extra fields to the FIP header of the output file

      Syntax :

      	 hdr: (Fip Hdr strings)

      Remember that although the specification MUST be kept on a single line, HASH may be used as a delimiter for Header fields.

      Note also that as the output file is a completely NEW file and has no physical connection to the input file, NO Fip Hdr fields are transferred from input to output UNLESS specified in the ‘hdr’ or ‘name’.

      Generally use ‘hdr’ to preserve the fields you need, such as the Source Header, because ‘name’ has a limit on the number of characters and on the type of characters (no meta characters, slashes ‘/’ etc). For example :

      	hdr:#SH:\SH#SN:\SN#DF:roger

      will transfer the SH and SN fields and force DF to roger.

      As ‘hdr’ is processed BEFORE the data, no information generated by IPFORMAT during processing is available. However the ‘name’ keyword is processed AFTER so such data may be added then.

      The default Fip Hdr put on an output file has the following fields :

      SU:form

      – ie Source is form

      HS:form_0_95-3-9_17:41:40_4_67

      – for tracking the file

      HT:794792500

      – date and time !!

      ‘hdr’ supplements but does not replace these.

      Other useful fields can be :

      • DF – output format for ipedsys, ipgtwy, ipout, ipprint etc. This must be in the ‘name’ parameter as by default it contains ‘DF:form’ eg :
        	DF:albert

        will pickup tables/print/ALBERT if ipprint is the program sending to the final destination.

      • CX – force the xchg name to be used to the contents of this field. eg
        	CX:STOCKS

        will override the SC2DC fields normally used for ‘ipxchg’.

      Syntax for ‘nohdr’ – do NOT add a Fip header to the output file

      Syntax :

      	nohdr:

      This is used when the output file is to be used immediately by a Unix program which does not understand the Fip Hdr – sort for example.

      Syntax for ‘chrset’ – force the SC or Source Character set

      Syntax :

      	chrset: (name)

      This fills in the SC: Fip Hdr field. The default is FORM.

      Syntax for ‘before’ – Insert data BEFORE any output from the input file.

      Syntax :

      	before: (Fip Seq strings and Record processing commands)

      Syntax for ‘after’ – Insert data AFTER ALL output.

      Syntax :

      	after: (Fip Seq strings and Record processing commands)

      Both ‘before’ and ‘after’ can have tests, builtins, contents of save areas etc, although obviously for ‘before’ most of these will still be empty.

    In the Output section – Record processing lines

    The actual data is processed and output using the Record processing lines.

    The Syntax of each line is that the first bit specifies which record type or key the rest of the line applies to :

    		r=(key)		(output)
    For example :
    	r=abc	"Fried fish starts " s1 spc f4 " and the rest ..."

    There can be multiple lines for the same record type. The following two lines will give the same result as the one above :

    		r=abc	"Fried fish starts " s1 
    		r=abc	spc f4 " and the rest ..."

    For lines where you want to process all records EXCEPT a particular record type/key, use the syntax ‘r#’ :

    		r#35	"Nobody wants record 35s !"

    How to specify you want to use, format and/or output zones ?

    Zone             Syntax                 Notes
    records          r3                     or r="abc" if not numeric
    fields           f99                    or f=Z if not numeric
    partial fields   p22                    always numeric
    save zones       s1                     always numeric
    flags            x199                   always numeric
    calculations     c4                     always numeric
    counters         z4                     always numeric
    blocks           b77                    always numeric
    set name         (name)                 specify name as in the ‘set’
    fixed text       " some fixed text "

    A ‘*’ can be used in certain cases to signify ‘ALL’ zones ie

    clrflag=*        clear all flags.
    f*               output all fields from this record.

    Use double quotes for alphabetic keys and those with embedded spaces.

    You should try not to use ‘sets’, ‘partial’s or ‘match’s with names in the form ‘z999’ where z is one of the single letters above and 999 is a number in the range 1-999.

    Note that blocks are ‘super records’ but should be ignored for now.

    Note that case is IGNORED in keys in the current version.

    When ‘ipformat’ finds a name in the record processing line, it does the following sequence :

    • check to see if it is an already specified constant ‘set’ name.
    • if not, is it a zone – record, field, partial or block (eg p44, f3)
    • if not, is it a builtin command (see below)
    • if not, is it a save zone, flag or counter (eg s1, x33, c7)
    • otherwise it is considered some FipSeq string and saved as such.

    Spaces, End-Of-Lines and Double Quotes in the output

    One common failing when putting together a new parameter file is to completely forget about spaces (or other separators) and end-of-lines (CR or NL or CR NL or whatever) in the output file.

    The point is – you have to specify them as NOTHING is implicit in the output file. There is no hidden magic which suddenly realises that you want an end-of-line when you need it. You have to state where and when you want them.

    Generally this will be done by either putting them as constant/’set’s or specifying them in the record processing line. The following are exactly the same :
    Either

    	set	spc	\s
    	set	ql	\n
    	output:
    	r="BIG"		f5 spc f3 spc f99 spc f5 ql
    Or
    	output:
    	r="BIG"		f5 \s f3 " " f99 \s f5 \n

    As a space can be specified either as ‘\s’ or inside double quotes, a double quote character itself has to be output by number, ie as octal \042.

    Builtins

    There are a number of builtin conversion routines for formatting zones – records, fields, save areas etc.

    These are called by placing the name of the conversion BEFORE the name of the zone eg :

    		zapspcextra p5
    

    which means :

    • ‘zap all leading, trailing and multiple spaces from partial field 5’

    A single zone can be subject to several builtins :

    zappunc zapspc caps f=Z
    

    which means :
    * take field “Z” and zap all punctuation and zap all spaces and force uppercase before outputting.

    Builtins for case conversion :
        caps            force zone uppercase
        lwrcase         force zone lowercase
        idicase         force zone idiot upper and lowercase
        upper1          force first letter of every word uppercase
        initial         only display first letter of each word followed by a full stop

    Builtins for removing spaces :
        zapspc          remove all spaces from zone
        zapspcextra     remove all leading, trailing and multiple spaces from zone
        zapspclead      remove all leading spaces from zone
        zapspctrail     remove all trailing spaces from zone

    Builtins for removing punctuation :
        zappunc         remove all punctuation from zone
        zappuncextra    remove all leading, trailing and multiple punctuation from zone
        zappunclead     remove all leading punctuation from zone
        zappunctrail    remove all trailing punctuation from zone

    Builtins for Counters :
        setctr          set a counter
        incctr          add one to a counter
        decctr          subtract one from a counter
        clrctr          clear a counter or set it to zero

    Builtins for Calculations :
        savnum          save a printable number in a variable
        savbyte         save a single byte in a variable
        savint          save a binary integer (2 bytes) in a variable
        savswint        save a binary integer (2 bytes swapped) in a variable
        savlong         save a binary long (4 bytes) in a variable
        savswlong       save a binary long (4 bytes swapped) in a variable

    Miscellaneous :
        strlen          returns the length of the string, which can be output, saved or tested
        zapleadzero     removes leading zeros from zone
        zapctl          remove all control characters from zone
        incfile         include standing file at this point

    		r=99	incfile /home/standing/ s4

        newfile         finish this file, send it and start another

    		r=abc	newfile

    If any more information is specified AFTER the ‘newfile’ on the record processing line, it will be added to the FIP Hdr unless ‘nohdr’ has been specified. eg:

    		r=abd	newfile	#DF:newform#QQ:\$Z

        log             log message in the Item Log
        continue        ignore all other tests for this record and continue with the next data record
        stop!           stop processing now. If there is an ‘after’ section it is done before the program finishes. (please note the exclamation mark !)
        reckey          output the actual record key. This is useful where wild cards are used for all records but you still need to output what the key was.

    Tests

    There is a further selection of tests which can be made on zones inside the data.

    These let you select processing even more finely, depending on the actual data. The rest of the line is processed if, and ONLY if, the test is true.

    Syntax for Tests

    	(ifxxx) (first string) (second string if required) 

    where strings can be fields, partials, saves or fixed text

    Actual tests can be :
    ifprv/ifnprv – test previous record type/key or not
    ifeq/ifne – test if 2 zones are equal or not
    ifgt/iflt – test if a zone is greater than or less than another
    ifflag/ifnflag – test if a flag is ON or OFF
    ifnul/ifnnul – test if a zone is empty or not
    ifspc/ifnspc – test if a zone only contains spaces or not
    ifalpha – test if a zone only contains letters a-Z or not
    ifnum – test if a zone only contains number/digits 0-9 or not
    ifcon/ifncon – test if a string is (not) found within another
    ifpunct – test if a zone only contains punctuation or not

    Note that sequence is important when comparing two fields that may be of different lengths, as ifeq is true when the whole of the first field matches the start of the second, ie :

    		1st=AAA		2nd=AAABC	will be true 
    		1st=AAABC	2nd=AAA		will be false 
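In other words, ifeq behaves like a "starts with" test on the second string. A minimal Python sketch of that reading (the function name `ifeq` is used here purely for illustration, and case handling is left out):

```python
def ifeq(first, second):
    """True when the whole of `first` matches the start of `second`."""
    return second.startswith(first)

print(ifeq("AAA", "AAABC"))
print(ifeq("AAABC", "AAA"))
```

So a fixed text such as "Class" placed first will match any field that merely starts with it, which is how the racing example later in this guide picks out its Class lines.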

    Example 1 :

    	r=24	ifprv r=35	"Last record was type 35 and this is 24"

    Only if the previous record type was “35” will the string be output

    Example 2 :

    	r=24	 f3 ifnul f3 " _ " x99

    For record type 24, output field 3 and if there was nothing in it, output a (spc) (dash) (spc). Flag 99 will also be set if there was nothing there.

    Example 3 : When using numeric data, please ensure that all extraneous characters are stripped from the zone before the test. In particular strip commas, plus signs, currency symbols etc.
    For example, if field 7 has data like p9300.0007 and save field 9 has 10,000 compare the two by :

    	match:mnop	/p//
    	match:mnocomma	/,//
    	output:
    	r=99	ifgt mnop mnocomma f7 mnop mnocomma s9		"Field 7 >  Save 9"

    Using Flags

    Flags are a really useful means for deciding type of processing to do – or NOT to do.

    Commands for setting, clearing and testing flags are :
        To set a flag               x999 where 999 is the flag number
        To clear a flag             clrflag=999
        To clear all flags          clrflag=*
        To test a flag is ON        ifflag x3 (rest of the commands on the line are done ONLY if the flag is ON)
        To test a flag is OFF       ifnflag x5 (rest of the commands on the line are done ONLY if the flag is OFF)

    For example, let’s use flag 3 to test if record type ‘abc’ has Richard, Helen or George in the first field. Print out ‘New name is (name) (newline)’ if it does :

    	r=abc	clrflag=3 
    	r=abc	ifeq "Richard" f1	x3 
    	r=abc	ifeq "Helen" f1		x3 
    	r=abc	ifeq "George" f1	x3 
    	r=abc	ifflag x3		"New name is " f1 nl 

    Save areas

    Save areas may be used to store strings – either in their original state or after conversion/formatting by other built-ins. The maximum save number is 299.

    Commands for setting, clearing and testing save areas are :

    To output a save area : s299 (where 299 is the save number)

    To clear a save area : clrsave=299

    To clear all saves : clrsave=*

    To save data in a save area : save=299 (string)

    	eg	save=1	f3 
    		save=5	caps f7

    saves the contents of field 7 in save area 5 AFTER forcing to uppercase.

    To append data to a save area : savcat=88 (string)

    	eg	save=77 "ABC"

    save zone 77 holds ABC

    		savcat=77 "DEF"

    save zone 77 now holds ABCDEF

    Save areas may be used in the normal ‘if’ tests, eg :

    	ifeq "AAA" s1	x88 

    if the contents of save area 1 start with “AAA”, set flag 88 ON.

    Using Counters

    Counters are integers (ie proper numbers with no decimals or fractions) in the range -32000 to +32000.

    They are signalled by ‘zX’ where X is a number.

    They can be used to count the number of occurrences of a record or field or even types of data and act accordingly.

    All counters are set to zero when the program starts. You can manipulate them in the Record processing lines by using the builtins :

    • incctr
    • decctr
    • setctr
    • clrctr

    For example, to add some random markup every 10th line of a record type AB using counter 26 :

    	r="AB"	incctr=26	ifeq 10 z26	clrctr=26	"[pt9][font99]"

    ie : For all records type AB, add 1 to counter 26, then test if ctr 26 is equal to 10; if so reset ctr 26 back to zero and output string ‘[pt9][font99]’.
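The same count-test-reset pattern, written out in Python to show when the markup would appear. This is an illustrative sketch only; `markup_positions` is a hypothetical name and the records are simply numbered from 1.

```python
def markup_positions(record_count, every=10):
    """Return the record numbers at which the markup string would be output."""
    counter = 0                          # all counters start at zero
    positions = []
    for number in range(1, record_count + 1):
        counter += 1                     # incctr=26
        if counter == every:             # ifeq 10 z26
            counter = 0                  # clrctr=26
            positions.append(number)     # "[pt9][font99]" is output here
    return positions

print(markup_positions(25))
```

For 25 records of type AB the markup is output at the 10th and 20th records.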


    The syntax for ‘setctr’ is

    setctr=99 345

    – set ctr 99 to a fixed number 345

    setctr=297 p3

    – set ctr 297 to the contents of partial field 3.

    In the second example, if the p3 is NOT a number, ctr 297 is set to zero. Also if p3 is a decimal number like ‘123.456’, only the main number is saved.
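The conversion rule for ‘setctr’ can be pictured as follows. This Python sketch assumes the rule works like reading the leading whole number of the string; `counter_value` is a hypothetical helper and the real program may treat edge cases (signs, leading spaces) differently.

```python
import re

def counter_value(text):
    """Leading whole number of text, or 0 if text does not start with a number."""
    m = re.match(r"\s*([+-]?\d+)", text)
    return int(m.group(1)) if m else 0

print(counter_value("345"))
print(counter_value("123.456"))
print(counter_value("abc"))
```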

    Using Calculations

    Calculations are defined in the first part of the parameter file and used in the record processing part :

    For example :

    	calc:mktcap	c1*c2
    	output:
    	r=BC	savnum=1 f5	savnum=2 f7	mktcap

    In this example we define ‘mktcap’ to be variables 1 and 2 multiplied together. Then in the output section, for record type BC, field 5 is saved in variable 1 and field 7 in variable 2 before we do the calculation and output the result.

    A quick word about BINARY numbers.

    Normally fields will hold printable data – such as in the example above – and we use the builtin ‘savnum’ to take that number for use in the calculation(s).

    However some data is already in a binary form. Use builtins ‘savbyte, savint, savswint, savlong and savswlong’ to load these numbers. Often these will be derived from a partial field using the ‘b’ for binary field type. eg:

    	partial:bindata	b:2 b:4 b:2 b:4

    What is a swapped integer or long ? Some computers – like the PDP-11 and most Intel 16+ bit chips – hold the data in reverse byte mode.

    – So if the data has been generated on a SPARC, an RS/6000 or a Mac, the data is ‘normal’ – use savint or savlong.

    – While data from PDP-11s or Intel based PCs could well need to be swapped.
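The difference between the normal and swapped builtins is simply the byte order used to read the number. A small Python illustration, assuming ‘normal’ means most significant byte first and that the values are unsigned (both are assumptions; `load_int` is a hypothetical helper):

```python
def load_int(raw, swapped=False):
    """Read an unsigned binary integer; swapped=True reverses the byte order."""
    return int.from_bytes(raw, "little" if swapped else "big")

two_bytes = bytes([0x01, 0x02])
print(load_int(two_bytes))                 # read as savint would
print(load_int(two_bytes, swapped=True))   # read as savswint would
```

If the numbers coming out of a binary field look wildly wrong, trying the swapped variant is usually the first thing to check.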

    Loading Variables :
    * Save a printable zone as a number variable – use ‘savnum’

    savnum=5 p4

    – save the contents of p4 as a number. So if p4 held the string ‘789’, c5 would be the number ‘789’.
    * Save a fixed number in a number variable – use ‘savnum’ again

    savnum=7 1234

    – loads the number ‘1234’ into c7.
    * Save the contents of a single byte – use ‘savbyte’

    savbyte=33	p7

    Note that the contents of the variables, c1, c2 etc are not amended by the calculation UNLESS you specifically save it, eg :

    	r=BC	savnum=1 f5	savnum=2 f7	savnum=3 mktcap

    will load c3 with the result of the ‘mktcap’ calculation.

    Examples of Builtins :
    STRLEN

    	; test the field 2 is greater than 44 chrs (ie 44 is less than strlen of f2)
    	r=HH	 ifnnul f2		 iflt 44 strlen f2		 "Big Field 2 here over 44 chrs long" \n
    	r=KK	 "Save Field for Name (s55) is " strlen s55 " chrs long"
    

    ZAPLEADZERO

    	; data - field 99 is 00000330303, field 101 is 00000000.00
    	r=3	  "This outputs 330303=" zapleadzero f99 ", while this is 0.00=" zapleadzero f101
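The two results quoted in the example suggest that leading zeros are removed only while another digit follows, so that a single zero is kept in front of a decimal point. A Python sketch of that behaviour (an inference from the example, not a statement of how ipformat is coded; `zap_lead_zero` is a hypothetical name):

```python
import re

def zap_lead_zero(text):
    """Strip leading zeros but always leave at least one digit."""
    return re.sub(r"^0+(?=\d)", "", text)

print(zap_lead_zero("00000330303"))
print(zap_lead_zero("00000000.00"))
```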
    

    Putting it all together – some examples

    EXAMPLE ONE

    ; file is variable text type
    filtyp:t
    ; each record is separated by CR NL 2 letter type
    recsep  \r\n
    ; There are NO fldsep - we will use partials
    ; There are NO reckey or fldkey - we will test strings for the type of processing
    ; allow wild cards
    wild:*
    ; 
    set	  qc		\004\n
    set	  topbit  \n{M2Processing Date :
    set	dash	" _ "
    
    ;Partial a Class line which contains the Class/Name/Length of race
    ; eg : Class 2 - ATV Anniversary Hcp. - 1000 M
    partial:pclass  p:0::\s s:0 n:0  s:0 t:1 p:0::- t:1 p:0
    
    ; localised matchs - search and replace
    match:mhcp	/(Hcp.)//
    match:mhcp2	/Hcp.//
    ; replace M with meters
    match:mmeters	?M?meters?
    ;
    ;******************** output section ************************
    output:
    ; Start by clearing flags 99 and 1 for each input record...
    r=*	clrflag=99 clrflag=1
    
    ; Now test for ONLY those lines which match our needs...
    ;| all  |if field1 start|partial field1 |if partial	 |set	 |set flag 99 on
    ;| recs |with Class	  |according using|field 3 is not|flag 1 |too
    ;|		|			 |pclass	 |empty	 |one	 |
    r=*	  ifeq "Class" f1 pclass f1		 ifnnul p3		 x1		x99
    
    ; Print out only the names of a new race - only process if flag 1 is ON
    ; Use flag x101 to output [rf3] for the FIRST race only - which is the 1st class
    r=*	  ifflag x1		 ifnflag x101	 [rf3]	x101
    ; partial f1 again using pclass, if partial field 6 is NOT empty, remove extra
    ;	spaces, Do the two search and Replaces and output followed by a 004 NL
    r=*	  ifflag x1		 pclass f1		 ifnnul p6 zapspcextra mhcp mhcp2 p6 qc
    ; remove extra spaces from partial 1 and output it, output partial 2 and 3, then
    ;	if partial field 8 is NOT empty, add (spc) (dash) (spc) etc
    r=*	  ifflag x1		 zapspcextra p1 p2 p3 ifnnul p8 dash mmeters zapspc p8
    r=*	  ifflag x1		 qc
    

    FipSeq

    Many keywords in the DF module can have variables as well as fixed text for parameters.

    These are generically called FipSeq strings and can be :

    		- Normal Ascii printable text : remember that leading and trailing spaces
    		are always trimmed so use double quotes to embed :
    		"	Some leading spaces and some trailing	 "
    		Also in the record specification ALL spaces between fields are
    		stripped; again use double quotes to embed or Unix escape chr \s
    		- Unix style escape chrs : backslash then lowercase chr :
    		Carriage return	CR  : \r
    		New Line	NL  : \n
    		Space		SPC : \s
    		Backspace	BS  : \b
    		Tab		TAB : \t
    		Backslash		 : \\
    		Form feed or Vertical Tab  FF or VT : \f
    		Wild chr (if specified) : \w
    		Hexadecimal number  :\x99
    		CR NL			 : \l
    		- Octal numbers : backslash and 3 digits zero padded : \001, \377
    		These can be decimal or hex by using the 'number:' keyword.
    		- Internal FIP header fields : backslash and 2 uppercase chrs :\SN, \DQ
    		to extract fields from the Source Header ( Fip field SH) use \X?
    		ie \XP for Priority.
    		- System variables :
    		\$D : day of month in 99 format
    		\$M : month in xxx format
    		\$I : month in 99 format
    		\$Y : year in 99 format
    		\$H : hour (99)
    		\$N : min (99)
    		\$B : sec (99)
    		\$J : julian date (3digits, Jan1 is 001)
    		\$S : 3 digit ascending sequence number
    		\$Z : 4 digit ascending sequence number
    		\$A : atex orig field (SOURCE;06/06,14:35)
    		\$C : number of chrs in file
    		\$W : number of words in file (IP_WORD_LEN)
    		\$R : Random letter
    		\$O : end optional text
    		\$X : strip trailing spaces of buffer so far
    
    Fip Header fields can be further manipulated using pseudo-fields :
    	fixed: QZ		 1234543
    	partial:QT		ST,3,2,U,<,>
    	combie:QZ		 ep|na,(000000)a
    	option:QT		 ep,11,7,s
    
    For fixed fields : 
    	fixed: QZ		 1234543 
    	ie If QZ is specified, replace with 1234543 
    	Syntax  fixed: [newfield]		 [tab/space]	  [fixed text] 
     
    For partial fields. An example : 
    	partial:QT		ST,3,2,U,<,> 
    	ie If QT, take ST header field posn 3 for 2 chrs, UPPERCASE. 
    	Syntax  partial: [newfield]	  [tab/space] 
    		[existing field] [comma] [startposn] 
    		[opt comma length] 
    		[opt comma processing] 
    		[opt comma start chr] 
    		[opt comma end chr] 
    	where : Start and Length start from 1 not 0. 
    	 Length can be zero or not defined for all characters in the field 
    	 Processing is U-uppercase, L-lower, N allow only numbers, P-printables 
    	 The Start Chr can be used to start the string. If there is also a 
    	 length then this length is FROM the Start Chr. 
    	 The End Chr can be used to end the string when it is of indefinite length.
     
    For combinations : 
    	combie:QZ		 ep|na,(0000000)a 
    	ie Use EP header field, if not there use NA field, if not use the 
    		fixed text '(0000000)a'. 
    	Syntax  combie: [newfield]		[tab/space] 
    		[existing field1] [|] [existing field2] 
    		[opt comma] [opt default fixed text] 
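The ‘partial’ and ‘combie’ pseudo-fields described above can be pictured with a short Python sketch, where a FIP header is modelled as a plain dictionary. Everything here is illustrative: the function names and sample header values are hypothetical, the start and end chr options are left out, and an empty header field is assumed to count as missing.

```python
def partial_field(hdr, source, start, length=0, processing=""):
    """Take `length` chrs of hdr[source] from 1-based `start` (0 = to the end)."""
    value = hdr.get(source, "")[start - 1:]
    if length:
        value = value[:length]
    if processing == "U":
        value = value.upper()
    elif processing == "L":
        value = value.lower()
    return value

def combie_field(hdr, first, second, default=""):
    """Use hdr[first], else hdr[second], else the fixed default text."""
    return hdr.get(first) or hdr.get(second) or default

hdr = {"ST": "abcdef", "NA": "smith"}                 # made-up header contents
print(partial_field(hdr, "ST", 3, 2, "U"))            # like  partial:QT  ST,3,2,U
print(combie_field(hdr, "EP", "NA", "(0000000)a"))    # like  combie:QZ  ep|na,(0000000)a
```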
     
    For optional fields (used in conjunction with the \$O flag): 
    	option:QT		 ep,11,7,s 
    	ie If EP header field exists and has a space in the 7th position, 
    		send this text else strip text until the \$O flag. 
    	Syntax  option: [newfield]		[tab/space] 
    		[newfield] [?] [existing field] [comma] [size] 
    		[opt comma] [opt posn of test chr] 
    		[opt comma] [opt posn to send remainder of fld] 
    	where size is minimum size of field. 
    	The send parameter will send contents of the field from that position 
    	onwards. If not present, the field is used ONLY as a test and NOT 
    	to send chrs.  Note that both size and test start from 1 not 0. 
    	A single chr can be tested to be non-space as in the example above. 
    	If either the size or the test is FALSE, all text and subsequent data 
    	whether fixed or variable (including more Optionals) is ignored until 
    	the EndOpt flag is met - '\$O' (see below). 
    
    

    Watch out using FipSeq strings in ‘set’s

    Note that ‘set’s are parsed when the parameter file is changed/created, so any variable data will be of that time and any FIP hdr data will be meaningless. eg:

    	set	timedate	\$d-\$m-\$y \$h:\$n
    

    will produce the date and time of when the parameter file was last changed.

    However FipSeq variables specified in record output lines will return run-time data eg :

    	r=*	"And now the date and time : \$d-\$m-\$y \$h:\$n"
    

    will produce date and time when that record was processed.

    Importance of your LOCALE

    Unix allows you to play around with character sets – called Locales – and this can have repercussions for the data formatting module.

    These are defined as part of the ENVIRONMENT.

    • look at the man pages for ‘setlocale’
    • check your own settings with ‘env | more’
      for LANG, LOCPATH etc.

    For any non-English environment, it is important to define :

    • What is an alphabetic chr ? – normally a-z, A-Z
      • remember all the accented characters
    • What is a control character ? – normally octal 0-037
      • sometimes these can also be octal 200-237.
    • What is punctuation ? – normally “,.!@#$%^&*()-_=+[]{};’:”<>
      • if you want to use ‘zappunc’, make sure.

    If you get it wrong, you may find that an accented chr you consider to be alphabetic is processed by ‘ipformat’ as a binary chr. So take care.

    Current Version

    Current version limits are

    	flags		- 300 allowed in the range 1-300
    	counters	- 300 allowed in the range 1-300
    	saves		- 300 allowed in the range 1-300
    	calculations	- 300 allowed in the range 1-300
    	partials	- 100 allowed in the range 1-100
    	matches		- 30 allowed for any one field
    	record length	- 64k maximum
    	fields in a record	- max 100
    	there can be up to 1000 'set's and other constants
    There is also an internal buffer for the binary of the parameter file, which is 16k - however most binaries are under 1k and the biggest seen so far is about 5k.
    	keysize must be less than 20 chrs
    	ipformat misses 1st record if the initial sep is missing
    	split keys are not allowed
    	all keys are case INSENSITIVE
    

    Save areas must be less than 16K each. In versions from 040, the program should handle many changes, additions etc. However if you do use buffers which are TOO big, an error message to that effect is logged in the Item Log and data MAY be ignored.

    Until modified, note that a clrsave=* will reset everything.

    POSTSCRIPT DRIVER - ipsetter

    Please see [[Ipsetter]]

    ------------------------------------------------------------------------------

    PROGRAM DETAILS

    This section covers the following programs :

    • form
    • ipformd
    • ipformat
    • ipformbl
    • ipformch
    • ipfsel
    • ipfprep
    • ipfchk

    form

    Manual interface to the data formatting package

    Allowable commands are :
        x       go into test mode
        l       look at log
        m       look at log - for 'form' items
        c       check crontab - for items about to go
        c all   check crontab - for ALL of root's crontab
        t       look at the individual Parameter file in tables/text/text and show the contents
        p       look at main form files in tables/form - PROCESS, SETTER, SETPAGE etc
        g       go auto
        h       help
        v       version
        q       quit

    -----------------------------------

    ipformd

    This is the daemon for data formats.

    It uses a parameter file to route and process incoming files. This parameter file defaults to tables/form/PROCESS.

    It first uses a selection table to decide what the job really is. As the list is top down, only the first valid selection is processed.

    The 'jobname' found is usually the name of a parameter file in tables/form/text.

    IPFORMD will automatically start IPFORMAT with parameters of the input file and the jobname/parameter file.

    However, optionally IPFORMD can be used to run a sequence of 'jobs' specified for a particular jobname.

    The syntax for the PROCESS file is :
    ; comment
    ; the following is a selection line
    (hdr field) = string [opt (tab) & (hdr) = string ...] (tab) >jobname (nl)
    job: (jobname) (program to run)
    trace: (jobname)
    test: (jobname)

    To describe the Selection syntax in detail :
    (hdr field) = string [opt (tab) & (hdr) = string ...] (tab) >jobname (nl)
    Each selection is on a single line. If necessary, multiple conditions
    can be specified with the '&' to 'and' them.
    The operation equal, '=', can also be NOT equal '!='.
    Source Header fields (in SH) are preceded by X, ie XC for category.
    A '*' is used as a wild card string; a '?' is a single wild card chr. To
    search for a string/chr embedded somewhere in a field, use a '*'
    before and after.
    If embedded spaces are needed in the string-to-be-searched, use an '*'.
    Note that the search string is case insensitive.
    Both the selection file and the main file are scanned completely, so
    that one file may be sent to none, one or several destinations
    according to the same or different criteria.
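
    The matching rules above can be sketched in Python. This is only an illustration of the wild cards, the '&' and the '=' / '!=' operators as described here, not ipformd's actual code. The function names wild_match and selection_matches, the dictionary holding the header fields, and the treatment of a missing header as an empty string are all assumptions made for the example.

```python
import re

def wild_match(pattern, value):
    """Match value against a pattern where '*' is any string and '?' is any
    single character. The comparison ignores case."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.IGNORECASE | re.DOTALL) is not None

def selection_matches(conditions, headers):
    """conditions is a list of (field, op, pattern) tuples that are 'and'ed
    together; op is '=' or '!='. A missing header is treated as an empty
    string (an assumption of this sketch)."""
    for field, op, pattern in conditions:
        hit = wild_match(pattern, headers.get(field, ""))
        if hit != (op == "="):
            return False
    return True

# Hypothetical header values for a sports file.
headers = {"XC": "Sport", "XK": "League Football results"}

# Equivalent of the selection:  XC=S* & XK=*Football*
print(selection_matches([("XC", "=", "S*"), ("XK", "=", "*Football*")], headers))
# Equivalent of the selection:  XC!=S*
print(selection_matches([("XC", "!=", "S*")], headers))
```

    With the sample headers, the first selection matches and the second does not.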

    For the 'job' parameter :
    - '$i' refers to the input file name (Note \$i is still the FIP
    System Variable 'month')
    - all queues and files are assumed to be under /fip/spool
    - Never assume, however, that the path environment has been set up, so we advise
    you to specify full pathnames for the programs.
    - all 'job' lines MUST precede the selection - ie be above.
    - FIP System variables and Header fields can be accessed.
    - there can be one or many or very many job lines.
    - any program can be run
    - if a script/program returns an error, it is logged in the Item log and
    further processing stops.
    If a 'job' exists for a jobname, ipformd will NOT run ipformat but will run what is specified - which may be ipformat of course.
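
    The behaviour of a job sequence (substitute '$i', run each program in turn, stop at the first error) can be sketched in Python as below. This is a rough model under stated assumptions, not how ipformd is implemented: expand_input, run_jobs and fake_run are hypothetical names, the commands are split on white space only, and FIP System variables and Header fields are not expanded at all.

```python
import re
import subprocess

def expand_input(command, input_file):
    """Replace '$i' with the input file name, leaving an escaped '\\$i' alone."""
    return re.sub(r"(?<!\\)\$i", lambda match: input_file, command)

def run_jobs(commands, input_file, run=subprocess.call, log=print):
    """Run each command in order and stop at the first non-zero exit status.
    Returns the number of commands that completed successfully."""
    completed = 0
    for command in commands:
        argv = expand_input(command, input_file).split()
        status = run(argv)
        if status != 0:
            log(f"error {status} from: {' '.join(argv)}")
            break
        completed += 1
    return completed

def fake_run(argv):
    # Stand-in for subprocess.call so the sketch runs anywhere:
    # pretend that ipxchg fails and everything else succeeds.
    return 1 if argv[0].endswith("ipxchg") else 0

jobs = [
    "/bin/rm -f formsave/NAGS*",
    "/fip/bin/ipformat -p geges -i $i -D -S NAGS",
    "/fip/bin/ipxchg -1 formsave/NAGS -D geges -F -o formsave",
    "/bin/sort +0 -3 -o formsave/NAGS.done formsave/NAGS",
]
print(run_jobs(jobs, "/fip/spool/form/geges", run=fake_run))
```

    Here the first two commands complete, the simulated ipxchg failure is logged, and the sort is never attempted.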

    The 'trace' parameter is used for setup, tuning and testing a new job. All it does is tell IPFORMD to log each line in the Item log. EG:
    trace:shares
    Trace MUST always be specified in the PROCESS file BEFORE the jobs (ie on a line nearer the top of the file) and all jobs must be before the selection lines.

    The 'test' parameter does the same as 'trace' BUT NONE of the programs are actually run. This allows you just to check syntax etc. EG:
    test:racecards
    Test MUST always be specified in the PROCESS file BEFORE the jobs (ie on a line nearer the top of the file) and all jobs must be before the selection lines.
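
    Because the ordering of 'trace', 'test', 'job' and selection lines matters, it can be worth checking a PROCESS file before it goes live. The following Python sketch is one possible checker, not a FIP tool: parse_line and check_order are hypothetical names, and the sketch assumes the ordering is enforced per jobname, which is inferred from the examples rather than stated.

```python
def parse_line(line):
    """Return (kind, jobname) where kind is 'trace', 'test', 'job' or
    'selection', or None for blank lines and ';' comments."""
    text = line.strip()
    if not text or text.startswith(";"):
        return None
    for keyword in ("trace", "test", "job"):
        if text.startswith(keyword + ":"):
            rest = text[len(keyword) + 1:].split()
            return keyword, (rest[0] if rest else "")
    if ">" in text:
        # A selection line ends with '>jobname'.
        return "selection", text.rsplit(">", 1)[1].strip()
    return None

def check_order(lines):
    """Return (line_number, message) pairs for lines that appear below a
    line they should precede: trace/test, then jobs, then selections."""
    rank = {"trace": 0, "test": 0, "job": 1, "selection": 2}
    highest = {}  # jobname -> highest rank seen so far
    problems = []
    for number, line in enumerate(lines, start=1):
        parsed = parse_line(line)
        if parsed is None:
            continue
        kind, jobname = parsed
        if rank[kind] < highest.get(jobname, 0):
            problems.append((number, f"'{kind}' for '{jobname}' is too low in the file"))
        highest[jobname] = max(rank[kind], highest.get(jobname, 0))
    return problems

process_file = """\
; shares : for both weekday and weekend
job:shares /fip/bin/ipformat -p dailyshares -i $i
trace:shares
SN=borsen* > shares
""".splitlines()

for number, message in check_order(process_file):
    print(number, message)
```

    In this sample the 'trace' on line 3 is reported, because it sits below a 'job' line for the same jobname.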

    First examples - simple jobs :
    ; Sports copy to Tranmere Rovers FC.
    XC=S* & XK=*Football* >tranmere
    EP=TAR. >tarmac
    RD=*Broken_Hill* >bhp

    where TRANMERE, TARMAC and BHP are all various parameter files in tables/form/text

    Second example of a sequence of jobs. Here, if the file name starts 'borsen', IPFORMAT is started twice (sequentially, serially - ie NOT at the same time) using two parameter files in tables/form/text : DAILYSHARES and WKENDSHARES.

    ; shares : for both weekday and weekend
    job:shares /fip/bin/ipformat -p dailyshares -i $i
    job:shares /fip/bin/ipformat -p wkendshares -i $i

    ; Selection for job called 'shares'
    SN=borsen* > shares

    Third example shows how to sort, xchg and basically really screw around.

    ; Reformat, sort and generally destroy the horses ...
    job:geges /bin/rm -f formsave/NAGS*
    job:geges /fip/bin/ipformat -p geges -i $i -D -S NAGS
    job:geges /fip/bin/ipxchg -1 formsave/NAGS -D geges -F -o formsave
    job:geges /bin/sort +0 -3 -o formsave/NAGS.done formsave/NAGS
    job:geges /bin/mv formsave/NAGS.done 2go/#SN:\SN#DU:nagsdone

    ; Selection for job called 'geges'
    SU=RACEWIRE & SN=horse* > geges

    Fourth Example is where the 'test' parameter is added to the 'geges' job in the Third example (Remember the 'test' must be specified on a line at the top of the PROCESS file BEFORE any 'jobs' for the same jobname) :
    test:geges
    Going into the MUI, ip, and doing a 'l' to list the log (or 'm' to more) gives:
    Sat Mar 4 11:52:30 ipformd !i : Incoming File : geges : : geges
    Sat Mar 4 11:52:30 ipformd !f : Test/NotRun : /fip/bin/ipformat -p geges -i /fip/spool/form/geges -D -S NAGS
    Sat Mar 4 11:52:30 ipformd !f : Test/NotRun : /fip/bin/ipxchg -1 formsave/NAGS -D geges -F -o formsave
    Sat Mar 4 11:52:30 ipformd !f : Test/NotRun : /bin/sort +0 -3 -o formsave/NAGS.done formsave/NAGS
    Sat Mar 4 11:52:30 ipformd !f : Test/NotRun : /bin/mv formsave/NAGS.done 2go/#SN:geges#DU:nagsdone

    Other Points worth noting (ish) ..
    Break out - If either the input parameter -x or a header field FZBO is
    present, the input file is 'broken apart' into blocks, records and fields. The
    resultant file is called (dest)_(SN) in spool/formtest where dest is as above
    and SN is the filename.

    If an incoming file matches none of the tests, it is deleted and an error logged.

    In the selection file, remember to specify the longer, more specific names first, because only the first matching selection is used. In the following example, job 'sunrac2' never gets processed as all files will be jobbed as 'sunrac'.

    XK=RAC* >sunrac
    XK=RACING* >sunrac2
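
    The effect of the ordering can be illustrated with a small Python sketch of a top-down, first-match scan. It is an illustration only: first_job is a hypothetical name, and Python's fnmatchcase is used as a stand-in for the FIP wild cards (it also understands '[...]' sets, which the selection syntax described here does not mention).

```python
from fnmatch import fnmatchcase

def first_job(selections, value):
    """Return the jobname of the first pattern that matches value, or None.
    selections is an ordered list of (pattern, jobname) pairs."""
    for pattern, jobname in selections:
        # Upper-case both sides so the match ignores case.
        if fnmatchcase(value.upper(), pattern.upper()):
            return jobname
    return None

wrong_order = [("RAC*", "sunrac"), ("RACING*", "sunrac2")]
right_order = [("RACING*", "sunrac2"), ("RAC*", "sunrac")]

print(first_job(wrong_order, "RACING NEWS"))   # sunrac
print(first_job(right_order, "RACING NEWS"))   # sunrac2
print(first_job(right_order, "RACECARDS"))     # sunrac
```

    With the short pattern first, 'RACING NEWS' is caught by 'RAC*' and never reaches 'sunrac2'. With the long pattern first, both jobs get the files intended for them.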

    Input parameters are (all optional) :
    -c : name of a queue into which copies of all incoming files
         are made.                                           default: no copies
    -f : file creep time                                     default: 0
    -i : queue to scan                                       default: spool/format
    -l : do NOT log every incoming file/destination          default: log
    -n : run the program at reduced priority                 default: nice 5
    -p : processing file to use                              default: tables/form/PROCESS
    -s : run files serially (ie one after the other)         default: parallel
    -t : scan time of directory                              default: 3 secs
    -T : Always trace jobs. This is the same as the 'trace' parameter
         used for setup, tuning and testing a new job. All it does
         is tell IPFORMD to log each line in the Item log.   default: no
    -x : debugging ON - ALL incoming files will be 'broken out' in formtest.
         Parameter is 'o'ctal, 'd'ecimal or 'h'ex.           default: off
    -z : calm down time                                      default: 5 secs
         To attempt to let ipformat finish one job before the next
    -v : display version number and exit.

    ipformat

    Ipformat is the key formatting program. It can read and split text, csv and
    xml files into records and fields and reassemble them using conditions and
    calculations.

    Please see [[Ipformat]]

    ipformbl

    Please see ipformbl

    IPFORMBL takes the text parameter file and builds a binary so that IPFORMAT can run faster.

    ipformch

    This is the checker for data formats. It is usually started by crontab.

    Please see Ipformch

    ipfsel

    This program is used to select and sort lines from incoming data files
    using a selection file.

    A parameter file is used to determine where the files should be sent and other
    parameters. This file is in tables/form/select and defaults to SELECTION. This
    may be overridden by the contents of the Fip Hdr field 'DF'.

    Please see ipfsel

    ipfprep

    Please see ipfprep for up to date information

    This program prepares incoming data to tweak it before IPFORMAT is run against it.

    ipfchk

    Please see ipfchk for up to date information

    This program scans a directory and checks the incoming files for CRC errors.

    If errors are found, the offending lines are stuffed into an error file and
    sent to an errdst, while the resultant file is flagged with the HE field as ERROR.

    ------------------------------------------------------------------------------
    -->

    © FingerPost Ltd. 1996 and beyond

    Examples

    * [[Dataformatting examples]]