webwire

FOR HTTPS/port 443, please use the 'webwiressl' version of this program.

Webwire automatically fetches pages of data from other people's web sites and
then sends those pages to your destination - usually the editorial system -
in the normal Fip fashion.

These can be updates of weather, financial data, sports results, backup for
Wire services if the satellite is down, graphics, software. In fact most
things.

It can be used either :
	- on a timed basis to get regular known pages.
	- on demand by sending a file into spool/webpoll with the FipHdr field DF set
to the parameter file required.

What it can do -
	- drill down links to several layers deep,
		optionally ignoring the data on the top levels.
	- select only certain links - either in XML, HTML, JSON or CSV
		- you set masks to filter which to get and which to ignore.
	- logon automatically to protected sites
		and save Cookie information for use in later accesses.
	- fill in standard form data to make on-demand searches.
	- strip or rework HTML tags to make the data more presentable.
		This is meant for reasonably simple pages while more complicated ones
		will be routed through 'ipsgml' and/or 'ipxchg'.
	- Use an external list of values to make several grabs to the same
site/page/script
		but varying the search data for each hit. eg to pull all the values of a
financial index. (This we call a 'values-file')
	- Grab an 'id' from a web service and then sequentially call all pages using
intermediate ids from the last to the new one.

What it cannot do -
	- play tunes.
	- run javascripts or any other applet type affairs. (yet..)
	- run FTP, GOPHER or whatever (for these and especially FTP, see program
'ipftp' and 'iptimer').

The current version is primarily for getting text data but can be used for
images etc if required.

There is a TUNING mode to be used for setting up a new link and trying to clean
up the relevant parameter file WITHOUT sending (possibly) live data to the
required destination.
	- This shows the data with escaped unprintables and '$' at the end of a line.
	- All links and forms are also displayed.
	- Any pages saved in Tuning mode are NOT sent to the normal output queue
(spool/2go) but are left in spool/webtest for future perusal and/or deletion.
	- To run, choose your parameter file in tables/webwire and run 'webwire'
manually in a window:
		webwire -T AUS.STOX | more		for prompt before calls
	or	webwire -A -T AUS.STOX | tee aussies	for no prompting

There are two (sometimes three) types of parameter file :
	1. Main Parameter file which sets up the polling of certain pages at set times
(if any).
	2. A Page Description file for each site/page accessed.
	3. Optional lookup file of values where you want to repetitively hit a site
changing certain values each time. (eg a sport site for several divisions or a
list of stox to get)

----- Main Parameter file -----

The syntax of the Main Parameter File - by default tables/webwire/XWEB :
	; comment line
	poll:(pram file)	day:(MTuWThFSaSu)	time:20:30	mustget:

In detail, the 'poll' keyword :
	Pram file is the name of the Page Description file - see below for its syntax
	day: 	Day of week to run the job :
			M	Monday
			Tu	Tuesday
			W	Wednesday
			Th	Thursday
			F	Friday
			Sa	Saturday
			Su	Sunday
			X	Every day.
			Z	Weekdays M-F.
			Case is NOT important.
			Commas (but NOT spaces) may be used to separate.
			Default is every day.
	either
	time: 	Time of the day	on 24 hour clock.	Default is 18:00.
	or
	every:	interval between grabs			Default: none
		every: (mins)	[optional:	start:(starttime) end:(endtime)]
		every:30	start:07:30	end:19:00
		The minimum interval is 1 min and maximum is 3 hours (ie every:180 mins)
		You may also specify in seconds using 'secs' or 'seconds'
		immediately after the number (with no spaces)
			every:10secs	start:09:30 end:09:50
eg:
	poll:AP		day:ALL		time:20:10
		Get the Page file tables/webwire/AP every day at 20:10
	poll:Forex	day:MTuWThF	time:16:30
	poll:Forex	day:MTuWThF	time:16:40
		Get the Page file tables/webwire/FOREX every week day at 16:30 and 16:40
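	poll:STOX	day:Z	every:30	start:07:30	end:19:00
		(a sketch - the file name STOX is illustrative) Get the Page file
		tables/webwire/STOX every 30 minutes on weekdays between 07:30 and 19:00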

There can be none or up to 200 polls in the main parameter file.
Note that the page is grabbed ONLY if the program is running.

----- Page Description Parameter files -----

The individual Page description parameter files are also in tables/webwire. The
syntax of these is :
	; comment start with a semi colon like this

MANDATORY
	url:	Full url of the page.				default: none
		There MUST be one and only one 'url:' specified.
		You can also specify the page, cgi and any subparameters.
		eg	url:www.fingerpost.co.uk
			url:www.big-press-org/sports/baseball/index.htm
			url:www.marketlook.co.uk/scripts/Summary.dll?HandleSummary

	dest:	Fip Destination for the files			default: WEBDATA
		This is the 'DU' FipHdr field as per the USERS file.
		eg	dest:w3saves
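	Putting the two mandatory keywords together, a minimal Page Description file
	is just (hostname and destination here are illustrative) :
		; tables/webwire/NEWS
		url:www.example-news.com/headlines/index.htm
		dest:EDSYS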

OPTIONAL:
	use-tls: no/yes
	use-ssl: no/yes
	use-https: no/yes
		Use Secure Sockets Layer (TLS/SSL) - also called HTTPS	default: no
		If the url starts 'https://....' then this command is NOT needed.
	ssl-method: (1,2,3,23,999)
		Version number to use for TLS/SSL		default: 999 for current default (2 or 3)
	ssl-password: (password)
	ssl-passwd: (password)				  default: none
		Optional password if the handshake requires a shared secret
	ssl-cert: (name of a PEM certificate file)		default: none
	ssl-root-cert: (name of a root PEM certificate file)	default: none
		Optional certificates - held in tables/ssl

	port:	Port number of the Remote Server.		default: 80
		This forces the port to be this if none is specified.
	nofiphdr: Do NOT add a Fip Hdr to the file.		default: FipHdr added
	source:	Fip Source of the files. (FipHdr 'SU').		default: XWEB
		Unless 'noarchive' is specified, all data files will be archived under this
name in log/data.
		This can be in FipSeq so that 'combie' can be used to set a default..
	noarchive: Do NOT archive these files in log/data.	default: archive
	maxlevel:3	Maximum no of levels to drill down.	default: 1
		Normally the URL you have requested is the data you want.
		However if that is an index page with links that may change,
		it may be these lower-level pages that are needed. 'maxlevel'
		states how many levels of link the actual data pages are.
		Default is 1 = do NOT drill down any of the links.
		Note that level 1 is the first page.
	ignorelevel: Used with 'maxlevel' where the information		def: no
		required is on a linked page and NOT on the first page;
		'ignorelevel' ignores (does not output) the pages on the
		listed intermediate levels.  Note that level 1 is the first page.
		eg	; ignore levels 1, 2, 4 and 6
			ignorelevel:1,2,4,6
	matchlinks: Only follow links which match this mask.	def: all links
		Used only if 'maxlevel' is greater than 1.
		There can be many 'matchlinks'.
		Use the '*' as a wild card string and '?' as a wild chr.
		eg	; get all links ENDING 'html'
			matchlinks:*html
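		These drill-down keywords are usually combined - a sketch (masks are
		illustrative) for an index page whose stories sit one level down :
			maxlevel:2
			ignorelevel:1
			matchlinks:*/stories/*.html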
	matchforms: Only process forms which match this mask.	default:no forms
		Used only if 'maxlevel' is greater than 1.
		There can be many 'matchforms'.
		Use the '*' as a wild card string and '?' as a wild chr.
			eg	; process only the 'getfile.asp' form
				matchforms:getfile.asp
	matchframes: Only follow frames which match this mask.	def: all frames
		Used only if 'maxlevel' is greater than 1.
		There can be many 'matchframes'.
		Use the '*' as a wild card string and '?' as a wild chr.
		eg	; follow only frames ENDING '.top'
			matchframes:*.top
	matchkeys: Only follow links which match this test.	def: all links
		Used only if 'maxlevel' is greater than 1.
		Used only with 'linktag' where an attribute MUST be set for the link
		to be valid.
		There can be many 'matchkeys'.
		Use the '*' as a wild card string and '?' as a wild chr.
		eg	; <hotel id=33 name="Fawlty Towers" url="http://www.ohnonotagain.com"
status="current" />
			linktag:hotel@url
			matchkeys:hotel@status=current
			matchkeys:hotel@status=ready
	match-case-sensitive: yes/no
		all matches and ignores can be case sensitive or insensitive
		DEFAULT changed 05u to INsensitive - previously sensitive.
	skip-links: Name of a file in /fip/fix/webwire holding names of links
		and forms already accessed; so that only new ones are tried.
			eg	skip-links:webwirelinks.$d
			default: none
	skip-details-tag: (tagname) extra details (such as a publishdate) to check
whether existing links have been updated
			see below on the section for RSS feeds
			default: none
	skip-purge-after: (hours) Number of hours to keep the skip entry
		default is 24.  You might want to tune this :
			make bigger if sites add/take off old material
			reduce the time if the same link is used for different data
	skip-save-data: (FipSeq field)
		Sometimes there is some data in the link which changes for every access -
such as a Cookie or SessionId
		eg the first access might get
			search.do;jsessionid=A9823A4622A23C10C4EC7F1825BF9E26.node1?messageId=268482
		and the second
			search.do;jsessionid=FCC18E9582E77C2AD9EFE6C68CA0F0A2.node1?messageId=268482
		But they both happen to be the same file - messageId=268482
		Use FipSeq to just get the data that contains ONLY the information you want
to save.
		Certain FipHdr fields hold relevant info:
			WX is the field marker '^'
			WS is the skip details tag (optional - see above)
			WT is the type - 'a'-anchor
			WL is the level no
			W$ is the actual link - anchor, form etc
			WH is the associated display text from an anchor tag
		In the above example :
			; split on the '?' - get the second field
			; (note the '?' needs a backslash in FipSeq)
			repeat:Q1	W$,\?,2
			; skip string is now 'messageId=268482'
			skip-save-data:Q1

	skip-balance-group: name of a balance group (in tables/sys/BALANCE) to
distribute
		the skip file when changed (see doc on 'ipbalan')
		This is often used where a second system could be used as a redundant server
		if the main system fails. (see also -B input switch)
	ignorelinks:	Of the Links found, skip any matching this mask. default: all
links
		Used only if 'maxlevel' is greater than 1.
		There can be many 'ignorelinks'.
		Use the '*' as a wild card string and '?' as a wild chr.
		eg	; ignore any links pointing at any 'netscape' or 'microsoft' site
			ignorelinks:*microsoft*
			ignorelinks:*netscape*
			; ignore any links requiring 'ftp:'
			ignorelinks:ftp://*
			; ignore any links to other sections
			ignorelinks:../*
			; ignore any links to any index
			ignorelinks:*index*
	httphdr: Extra lines of HTTP header you may need.	default: none
		Remember to add a NL at the end of each line.
		There can be multiple httphdr lines but pls remember to add a '\n' at the
		end of each one. (or you can try to force all on one httphdr line!)
		eg	httphdr:Authorization: Basic AbGtGgbhpdOkOTE=\n
			httphdr:User-Agent: Mozilla/4.0\n
			httphdr:Host: wibble.wobble.com\n
		see below for 'useful, common header lines'
		** ALL basic-authentication MUST BE HIGHER IN THE PARAMETER FILE THAN httphdr
OR proxy-logon
	basic-authentication: (fiphdr field) (logon:password)
		Build a FipHdr field with the BasicAuthentication formatted logon:password
		Pls remember to escape any funny chrs - like backslashes
		** ALL basic-authentication MUST BE HIGHER IN THE PARAMETER FILE THAN httphdr
OR proxy-logon
		eg	basic-authentication:BA DOMMY\zipple:Ardvark99
			httphdr:Authorization: Basic BA\n

	method:	POST/GET/DELETE/PUT etc				default: GET unless 'post:' is specified
		normally this is a single UPPERCASE action - with NO spaces.
	post:	 Post a Form					default: get url
		see below for processing a form using method=POST.
	filename: Filename for the output file in FipSeq.	default: WEB$Z
		If this does NOT start with a '/' it is left under the
		Output Queue as specified on startup (default spool/2go)
		eg	filename:AFP$d.$z
	striptags:(yes|no) Strip tags and attributes		default: no
	wild: (FipSeq)	Character used as a Wild String for	default: '*'
		'matchlinks/ignorelinks'.
		eg	wild:\377
	singlewild: (FipSeq) Character used as a single		default: '?'
		Wild chr for 'matchlinks/ignorelinks'.
		eg	singlewild:!
	number: (o|d|h)	Number system for FipSeq		default: octal
		octal, decimal or hexadecimal
		The following are all equivalent (each gives a space chr) :
			number:octal
			before:\40
			number:decimal
			before:\32
			number:hex
			before:\20
	before:	FipSeq String to add before any data.		default: none
	after:	FipSeq String to add after any data.		default: none
	script:	Script to run on the data of the incoming file.	default: none
	outque: Output folder (in FipSeq)			default: spool/2go
		This overrides both the default and the '-o' output switch
		except for Testing/Tuning mode where the file is forced
		to spool/webtest.
	log:	FipSeq custom logging for the item log.		default:SN SU EF : EH,EP
		This logs each Page grabbed
		Note that
			EH or ST	remote site host
			EP or SP	remote site port
			EN or SF or SG	remote site url		SG is the actual link, the others are the
link used to grab
			EF		parameter file used
		The default is that no incoming files are logged by webwire
	custom-log:	FipSeq custom logging for the item log.		default: none
		This can be used to log link details in a custom log
		/fip/log/webwire/(date)_(paramfile).fip
		eg custom-log:pnac.YN|date.YT|procdate.T7|taketime.T9|source.TU|take.TZ|head.TH
	log-https-errors:warn
		Any failure to go secure in an https connection is flagged as a warning
		The transmission is always aborted. This parameter affects only the logging.
		default:  !x for failures

	extra:
	extra-pre: Extra FipHdr fields to be added to the output file.  default: none
		To separate FipHdr fields, pls use a '#'.
		extra-pre is added as soon as the file is read - so may be used for
information in the URL
		extra is only used for any output file and is not used at all for any other
purpose.
		eg	extra:ZH:NYNZ#DI:Headline News#QZ:333
	tag:	FipSeq String to replace the start tag		default: none
		such as <H1>. There can be many 'tag's.
		eg	tag:P		{Para}\n
	endtag:	FipSeq String to replace the End tag		default: none
		such as </P>, </TITLE>. There can be many 'endtag's.
		eg	endtag:TITLE	\n
	getimages: Also get all the images
		By default all images - *.gif or *.jpeg are ignored.
	keep-alive: yes/no					default: no
		Just that !
	http-version: 1.0 or 1.1				default:1.0
	only-get-if-modified:	(FipSeq message if not found)	default: get
		This will check the remote server for the time the page was
		last modified. This does not work with old servers and some
		that are set to HTTP/1.0.
		If modified since, the page is read
		If not, the optional message is sent
		If there is no message, no data is sent - just a note in the item log
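		eg, with an illustrative message (the text is ours, not a fixed keyword) :
			only-get-if-modified:Page not modified since last grab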
	ignore-key:PHPSESSID
		When matching for skip files, ignore this key-value pair.
		see the section below on Repeat Offenders
	max-items: (number)					default: 0 for all
		Max number of items to grab per session
		Some sites only allow you to read 5 or 10 items before blocking you.
		Use this to creep under that total.
	pause-between-files: (secs)
		Gap/wait/pause between grabs	default is 5 for standalone, 1 for iptimer
		This is overridden by the -w input switch
	one-output-file: Put ALL data in a single output file.
		The default is one file per page/access
		Use this with 'values' to create a single output file.
		This ONLY uses the FipHdr of the first file if 'values' have been specified.
	end-of-document: Where a site is sending really, really crap HTML - or XML -
		use this to state what the last tag is.
		For no checking at all : end-of-document:
		Default:		end-of-document:</HTML>
		See below for a standard-fingerpost-rant on crap HTML.....
	end-of-cookie-page: end text which signifies the end of a logon or cookie page
		This is rarely changed.
		default is </HTML>
	connection-timeout: (secs)
	wait-end-timeout: (secs)
		For slow, busy sites, data - especially big files - may take a lot longer
		than normal to be retrieved. Use this to expand that time. Default is 120
		(it should be divisible by 5 for some arcane reason)
	pretend-301: (3 digit number)
	pretend-302: (3 digit number)
		Ignore redirects (HTTP return code 301 or 302) and assume they are this
		return code
		pretend-301:200
		this will take a 301 and save the data as though it were an incoming file.
	no-data: (FipSeq string in place of data)
		Do not get/send the data - just this string
	data-is-binary:(yes/no/maybe)
		Data files at the lowest level are binary or not
		default is to check for <?xml, Tiff, Jpeg, MsWord/Office, EPS and PDF
		automatically
			otherwise it is treated as text
	ignore-mime-if-binary: (yes/no)
		if yes = Strip the MimeHeader off binary files
		default is no to leave it on - so you know what the file really is !
	proxy-server: If using a proxy, these are the name and port to aim at.
	proxy-port:
	proxy-logon: This is the logon and password to get thru the firewall
		if required. The format is (logon) (colon) (password) and is
		converted to base 64.
		proxy-logon:Y2hyaXMuaHVnaGpvbmVzOnBhbnRoZXIK=

		** ALL basic-authentication MUST BE HIGHER IN THE PARAMETER FILE THAN httphdr
OR proxy-logon
		To generate use basic-authentication or:
			echo -n "logon:password" | sffb64 -i
		eg	echo -n "chris:sleekpanther" | sffb64 -i
		gives	Y2hyaXM6c2xlZWtwYW50aGVy
			proxy-logon:Y2hyaXM6c2xlZWtwYW50aGVy
	proxy-is-squid:yes/no	Is the proxy a Squid ?	default: no
	proxy-handshake:yes/no	Does the proxy need to say hello first ?	default: no
		If the proxy is a Squid, this MUST be NO

	logeachfile:(dest) Send a Success/failed msg to this destination
			for each file. There is no default. This log file is
			just a FipHdr with the following extra fields :
				DR-File Sent OK		DR:ok or DR:error
				DG-Will Retry later	DG:retrying, DG:stopped
				DT-Some message text	DT:No connection
			default: no log created.
		The text for the DR and DG can be in FipSeq and so can contain
		FipHdr and other variables. As they are FipHdr fields, please
		do NOT put NL, CR etc in the fields.
		Note that System Variable $q holds the time taken for transmission.
	DRgood:(text)	Message for the FipHdr field DR on a   successful tx
			default: ok
	DRbad: (text)	Message for the FipHdr field DR on a unsuccessful tx
			default: error
	DGcont:(text)	Message for the FipHdr field DG if, after an
			unsuccessful tx, another attempt will be made.
			default: retrying
	DGstop:(text)	Message for the FipHdr field DG if no further
			attempts will be made as the file was sent successfully
			or the maximum no of attempts has been tried.
			default: stopped
	fiphdr-for-logeachfile: (FipSeq) or
	msgeachfile:(FipSeq) Additional information to add to the FipHdr of the
			'logeachfile' or 'loglasterrfile' msg. This should be in FipHdr
			format and be in FipSeq. It can be used to pass FipHdr fields
			in the outgoing file into the log file.
			eg	msgeachfile:	DF:logdial\nSS:SS\n
			default: nothing added


To save the contents of a particular Tag or TagAttribute, use the 'fiphdr'
keyword :
	fiphdr:(FipHdr field)  (optional subkeywords)
		Either  tag:(name of tag)
				specify the tag name which contains the data required.
		Or	data:(FipSeq)
				for adding FipHdrs with standing data.
				fiphdr:TT	data:$e$y$i$d
				will create a FipHdr field TT with the current date in it
		Or	tag:(name of tag)@(name of attribute)
				specify the tag name and the attribute name which contains the data
required.
		Or there can also be a 'key' parameter for selecting the data ONLY if there
is Key attribute with its data equal to a certain string:
			eg: if the tag is <meta name="category" content="f"/>
				fiphdr:NC	tag:meta@content key:meta@name=category
				Get the contents of the content attribute of 'meta' where another attribute
called 'name' has the value 'category'
			or	fiphdr:NC	tag:meta	key:meta@name=category
			or	fiphdr:NC	tag:meta@name=category
				Get the data for the 'meta' tag that has an att 'name' = 'category'
			Double quotes around the Key Data are optional unless there are embedded
spaces. The Key Data can be in FipSeq.

		For any of the tag options, use 'dup' to flag duplicated fields.
			dup:(optional separator)
				This field may be duplicated. Duplicate fields are separated
				with a space unless a separator chr is also specified.

		Where there might be embedded tags inside the main tag, use 'repxml' to
specify a replace string
			repxml:(FipSeq)
			eg fiphdr:AL	tag:TD	repxml:+\s+
				and the data is <td>abc<br>efg<br>line3</td>
				will give	AL:abc+ +efg+ +line3

		As some FipHdr fields have distinct meanings - SN, DU, DP etc - please use
other 2 letter codes starting N or Q.
		In the current version of webwire, you CANNOT specify trees of tags ie
fiphdr:AA tag:entry/id.

	eg	fiphdr:NA	tag:itemid  dup:+
			get the data from each <ITEMID> field. If there is more than one,
			they are separated by a '+'.

	fiphdr-save:(FipSeq)
	fiphdr-file:(Filename in /fip/fix/webwire/fiphdr)
		This allows data to be stored as FipHdrs at the end of the session - and read
		at the beginning of the next
		So items like Sequence numbers and time-of-access can be passed between
attempts.
			; default name
			combie:QA	WA,default
			; save and possibly reuse the FipHdrs ....
			repeat:JQ	J1,+,1
			repeat:JD	J2,+,1
			fiphdr-save:BQ:JQ\nBD:JD\nXX:some comment\n
			fiphdr-file:websave_QA
		** This must be lower down the parameter file than any FipSeq if you are
using FipHdr fields as in the example above !
		There can be multiple 'fiphdr-file' - all of which are read as the parameter
file is read.
		But if there is a fiphdr-save, ONLY the last 'fiphdr-file' is stored to.

	fiphdr-on-all-levels:
		Add the FipHdr to each file on every level - default: no
	fiphdr-hash: (single chr in FipSeq)
		This will replace a Hash '#' in a FipHdr field (as Hashes are normally
end-of-fiphdr field)

	meta-to-save:(FipSeq)
	meta-save-file: (Filename)
	meta-save-on-tag: (tag name)
		This meta file is appended to on the End-of-tag specified (or end-of-file if
no tag specified)
			; save these fields to the lookup file
			meta-to-save:J3|J5|J6|J1|J4|$h:$n:$b\n
			meta-save-file:/fip/data/blob/$e$y$i$d/WA
			meta-save-on-tag:LINK
	reset-fiphdr-on-tag: (tagName)
		Trim the FipHdr - and extra, added fields - on the end of this tag to the
same position when the tag started
		This can be used in meta-save to make sure that FipHdr fields from one group
		of tags do not linger and are not used for the second or subsequent groups
		default: not used.
	grab-on-tag: (tagName)
	grab-on-endtag: (tagName)
		Any links should be grabbed at the start or end of this Tag
		default: all links are grabbed at the end of the page
		An extra parameter may be specified on the same line for level eg
		grab-on-endtag:VALUE	level:3
		grab-on-endtag:params/param/value/struct/member
	retry-404-max:3
	retry-404-gap:1
	retry-404-error:abort/ignore/move
	retry-404-queue:2go
	retry-404-fiphdr:#CE:300#DU:nextstage
		Retry links which return a 404 Not Found error. Max is the number of retries
and Gap is the pause in seconds between the retries
		Use this for those sites which are a bit slow to add the data files the links
point to.
		If the files really are not there - and you do NOT want to abort the
transmission - use 'retry-404-error:ignore' to continue with the next grab
		OR you can use retry-404-error:move and retry-404-queue:(queue in spool) and
retry-404-fiphdr:(FipSeq) to send an item
	retry-500-code:505
	retry-500-max:5
	retry-500-gap:1
	retry-500-error:abort/ignore/move
	retry-500-queue:2go
	retry-500-fiphdr:#CE:300#DU:nextstage
		Retry links which return this system error - code can be any 3 digit number
above 400.
		Max is the number of retries and Gap is the pause in seconds between the
retries
		Use this for those sites which are a bit slow to add the data files the links
point to.
		If the errors continue - and you do NOT want to abort the transmission - use
'retry-500-error:ignore' to continue with the next grab
		OR you can use retry-500-error:move and retry-500-queue:(queue in spool) and
retry-500-fiphdr:(FipSeq) to send an item

----- More Complex Sites -----

-- The links are not in the normal Anchor or Frame tags.
If the Site returns an XML feed rather than HTML, you can specify which tags
hold the contents you want to play with. There can be up to 10 tags
specified.
	linktag:(tagname)
or	linktag:(tagname)@(attribute)	(for version 05u onwards)
	linktag:TEXT
	linktag-2:Slavver
	linktag-3:Bone

or to imitate the defaults :
	linktag-1:a@href
	linktag-2:frames@src

- sites which return other data which is not xml - such as CSVs
	data-type:CSV	(can be CSV for comma sep format, JSON, PSV for Pipe sep, TXT)
	data-type-sep:|
	data-type-eoln:
	data-link-idx:2
		define the column containing the link to the data
	headline-link-idx:3
		define the column containing the headline
	skipdetails-link-idx:1
		define the column containing the skipdetails
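	As a sketch, if the site returned pipe-separated lines laid out as
	(skipdetails)|(link)|(headline), such as
		7866|/en/pressrelease.php?id=7866|Results out today
	the matching parameters would be :
		data-type:PSV
		skipdetails-link-idx:1
		data-link-idx:2
		headline-link-idx:3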

- RSS feeds
Sometimes a link can point to data which gets updated and there is a second tag
which gives either a unique-id or a date/time which you need to track for any
changes. Use the 'skip-details-tag' to specify the second tag - it is the
combination of the 'linktag' and 'skip-details-tag' which should be unique.
For general RSS 2.0 feeds, this can either be 'pubDate' or 'guid' :
	linktag:link
	skip-details-tag:pubDate
In RSS feeds there is often a fake 'link' at the top which is the channel.
Usually you do not want this one - often it is a URN not a real URL, so use
'matchlinks' or 'ignorelinks' to bypass it.
If more than one skip-details tag is needed, up to 9 'skip-details-tag-X' can
be specified.
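Putting that together, a hedged sketch of an RSS Page Description file (the
url and destination are illustrative) :
	url:http://www.example-news.com/rss.xml
	dest:EDSYS
	maxlevel:2
	ignorelevel:1
	linktag:link
	skip-details-tag:pubDate
	skip-links:rsslinks.$d
	; the channel 'link' at the top is often a URN, not a real URL
	matchlinks:http://*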

-- If the data in the link is not complete..
Use templates to slot data from a link into another call. This is again used
extensively for XML work - like soap.

It uses either just a template (in FipSeq so you can add Header Fields etc) or
a template AND a template file if there is a lot of data.
	level2template:/query.dll?src=QD
	level3template:/getFile.dll?file=W$
	level3template-file:soap-getfile.xml

There are 4 templates - for level2, 3, 4 and 5. 'maxlevel:' and 'ignorelevel:'
must always be used with these to specify which level you need the data from.

A levelXtemplate on its own will generate a GET.
To POST something, you will also have to specify a 'levelXdata: (FipSeq)' eg
	; level 3 - to get THAT file is always a POST of
	; FileManager1%24gvwFiles%24ctl03%24gvlnkName
	; .. using the different EVTVAL and VWSTAT
	level3template:/proximity/Admin/FileManager.aspx
	level3data:__EVENTTARGET=A2&__EVENTARGUMENT=&__VIEWSTATE=G8&__EVENTVALIDATION=G9
will force as
	POST /proximity/Admin/FileManager.aspx
with data filled in for fiphdrs A2, G8 and G9 eg

	__EVENTTARGET=FileManager1%24gvwFiles%24ctl03%24gvlnkName&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUJMjQyMzY1MzEX%3D&__EVENTVALIDATION=%2FwEWAwKVxPyBCgLc

Note there is no level1template as that is the same as the URL:.. BUT there is
a 'level1template-file' version. In this case the URL: should be just that.

The template-files are normally in /fip/tables/webwire. They are NOT forced to
uppercase.

The default Content-Type for POSTing data or forms,
'application/x-www-form-urlencoded', sometimes needs to be changed for
templates.
It can be changed with the 'levelXmime' parameter. For example, soap normally
likes a content-type of 'application/soap+xml':
	level1mime: application/soap+xml
unless you are Microsoft of course who usually/sometimes want
	level1mime: text/soap

The 'W$' in the example is because each link is put into a temporary FipHdr
field called W$ as it is being used. If the link data is too much or too
little, use FipSeq to chop/add/replace.
Eg 	if the data in the link is "nine:/rt/newsart/Id="z1jit4":text"
	And you want a link like
		/searchDB?database=nine&link=/rt/newsart/Id="z1jit4"&format=text
	use	repeat:R1	W$,:,1
		repeat:R2	W$,:,2
		repeat:R3	W$,:,3
		; if there is no 3rd field, use 'xml' instead
		combie:W4	R3,xml
		level2template:/searchDB?database=R1&link=R2&format=W4

----- Values -----
Values can be -
	- EITHER a file containing lines of values to be used to repeatedly grab data
for a single file.
		using values-file:(filename in tables/webwire)
	- OR a sequential number
		using values-seqno:(min value):(max value):(incremental value)
		plus	values-seqno-fiphdr-from: (FipHdr field containing the From seqno - ie
start grabbing from the NEXT id after this)
			values-seqno-fiphdr-to: (FipHdr field containing the To seqno - ie each
seqno until and INCLUDING this one)
	values-get-url:
	values-post-url:
	values-post-data:
		Fipseq to POST a form or GET a link from a line in the
		values file. See below for a description.
	values-sep: Separator chr for splitting fields in the values file.
		default is a pipe - '|'
	values-leave-spaces: Normally leading spaces are trimmed from each
		field in the values file. Use this to preserve them.
	values-parallel: (Number of Simultaneous Hits)
		For 'values' the default is to run the hits serially, one after the other has
finished. Use this to send out a number of hits
		at the same time which should reduce the total time by a large factor.
However, you should check with the remote and test
		what the number should be. For Apache sites for example, 8 is a common
default setting.
		eg	values-parallel: 10
	values-fiphdr: Normally FipHdr W1 will contain the first field of the values
		file, W2 the second etc.
		So data can be specified by W1
		Use this parameter to specify another field - ie if W1 is being used
		elsewhere.
		** Note that if you are using iptimer to start webwire running a values
		file, the Wx fields will be zapped in the output file.
		So in this case, always use 'values-fiphdr:' with a different FipHdr if
		you want to use the Values in iproute or another downstream program.
		eg	values-fiphdr:R1
	values-pause: (secs)
		Gap/wait/pause between Grabs using the next value	default is 0 for none
	zap-values-file: (yes/no)
		Delete the values file after it has been used.	default no
		Only files in /fip/x/ starting TMP.. can be deleted.
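For the sequential-number flavour, a sketch (assuming, as for a values file,
that the current number arrives in FipHdr W1; the script name is illustrative) :
	; grab ids 100 to 110
	values-seqno:100:110:1
	values-get-url:/getstory.dll?id=W1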

Note that in the FipHdr - unless the 'nofiphdr' keyword has been requested, the
following fields will be filled in :
	Day and time in the normal HH,HD,HY etc fields
	ST	host
	SP	port
	SF	url - path/filename being grabbed
	SG	url - path/filename which is the link
Where webwire is sitting on a scrolled queue (using -i), the folder name is in
EQ and the filename in EN (with all '#' replaced by the chr chosen by
'fiphdr-hash')

Input Parameters (all optional) :
either	-i : scrolled queue				default: no default
		This checks the folder and for each file, checks the FipHdr for 'DF' which is
used for the name of the parameter file to run against
		This allows a variety of parameter files to be run
or	-1 : Run a single time and exit			default: spool
		The parameter is the name of the individual parameter file in tables/webwire
(ie NOT The top or main parameter file)
or	-T : Tuning mode				default: spool
		Display links and data for the page requested. Runs only that page and then
exits.
		The parameter is the name of the individual parameter file in tables/webwire
(ie NOT The top or main parameter file)
	-A : In Tuning mode, do NOT prompt before searching a link	default: prompt
	-a : log the actual link of each access in the FipLog		default: no
		This can be quite a lot of logging if you are grabbing lots of files !
		But is quite useful when starting/adding a new feed.
	-B : default balance group for skip files			default: none
		(see skip-balance-group parameter)
	-C : warm restart for cookies			default: always ask for new cookies/logon
		ie do NOT re-logon if the previous session logged on and saved the cookie
		if any cookie is missing or has timed out, all cookies are wiped and webwire
needs to be re-run to logon and download.
	-d : done folder for -i scrolled queue		default: none
		This can be overwritten by the 'doneque:' parameter
	-D : display the Request and Response		default: do not
	-e : exit with the Result Code of the last grab.	default: normal program exit
		The Normal exit is 0 if ok, negative number if not
		With -e this will be 0 for ok, -1 for timeout, or 4XX/5XX for page
		errors.
	-F : do NOT add a FipHdr to the output file	default: do
		this can be overridden by the 'nofiphdr:no' parameter
	-h : extra FipHdr information			default: none
		This is in FipSeq and should normally be quoted
		Note this is the means that 'iptimer' sends variable information to webwire
		eg : -h"SN:hello#TC:200401031"
	-H : display the Request and Response in fancy HTML	default: do not
	-I : wire id						default: 0
		used to track which instance of a multi-webwire system a file arrived on and was logged by
	-k : ignore the Skip list (used mainly in tuning)	default: use skip-links:
	-K : Do NOT save or process any data, just build up a skip file.
		This can be used before putting sites into production so that
			all old links are ignored and only new links will be tracked.
		ie run 'webwire -1 (name) -K' once beforehand.
	-l : no logging to the FipLog except for errors	default: log all
	-L : log new files and errors to the FipLog	default: log all
	-N : path and filename of the output file 	default: fip standard
		use this to leave the file(s) in a non-std folder.
	-o : output queue in 'spool' 			default: spool/2go
		This can be overwritten by the 'outque' parameter
		This is ignored in Tuning mode.
	-O : force ALL output to this queue in 'spool' 	default: spool/2go
		This overwrites the 'outque' parameter
		This is ignored in Tuning mode.
	-s : generate statistics for bandwidth usage	default: no
		using Hour_group files
	-S : generate statistics for bandwidth usage	default: no
		using name of group_client files
	-t : track status				default: no
		this can be overridden by the parameter
			track-status:no
	-V : if using spool-a-folder (-i) then stop when it is empty	default: keep
spooling
	-w : Wait in seconds between accessing links.	default: 5
	-x : Proxy server host or IP address		default: none
	-X : Proxy server port				default: 80
	-y : Proxy logon				default: none
	-Y : Proxy server is Squid			default: no
	-z : parameter file in 'tables/webwire'.	default: XWEB
	-v : Print the version number and exit
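Some illustrative invocations (the parameter file name FOREX is an example) :
	webwire				poll using tables/webwire/XWEB until stopped
	webwire -1 FOREX -w 10		run the FOREX Page file once, waiting 10
					secs between links
	webwire -1 FOREX -K		prime the skip file before putting a new
					site into production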

---- Other Notes ----

-- Netiquette --

Pls note if you are grabbing data off another site, then you should contact the
webmaster of the remote and let them know. Certainly if you are accessing every
few seconds, then there is a good chance they will put you on some refuse list.
So it pays to be nice !

-- How to find out the actual url....

Sometimes it is quite difficult to find out the real path to use for the url.

Especially so for script-driven gets and puts.

NetScape or Iexploiter is invaluable in this case..
 - using either 'View Source' or 'History' normally gives the game away!

Snooping using tcpdump or windump
	0. Open a Terminal/Cmd window and start your browser - without hitting the
site yet
	1. Find out which interface
		tcpdump -D
	2. Leave tcpdump running in background
		On Mac OSX you will need to be sudo
		tcpdump -i1 -w remo.tdmp -X host www.remote.host

	3. On the browser, do the absolute minimum ..
		.. do a simple logon and grab one file using Firefox, Mozilla, IExp,
Safari etc
	4. CntrlC to stop tcpdump
	5. run tcpdump to show data
		tcpdump -r remo.tdmp > remo.fip
	6. call up remo.fip in an editor.


-- Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie Cookie

Cookies are neat but nasty.
If you already know the cookie you need, just make a file in /fip/fix/webwire
with the name of the cookie (case is important on Unix boxes) and slap in the
whole of that cookie which has the syntax
	(key)=(data)
ie	zumzum=hungryTummy

Before grabbing data pages we can attempt to logon to a box and get its cookies
!!
This uses from 1 to 9 GETs or POSTs

	add-cookie:C1; C2	Add the Cookie on to the end of the HTTP headers in this
form
	get-cookie-1:	Command to send to get a cookie or to logon.
	get-cookie-data-1: Optional data usually required for a POST
	get-cookie-http-1: more HTTP headers used ONLY for this GET/POST
	cookie-fiphdr-1: name of the cookie to use as a FipHdr field C1 to C9
			ie if there are several cookies returned but only one
			is needed, put the key as the cookie-fiphdr
			ie Set-Cookie: ABC=12345
				add-cookie:C1; perm=yes
				cookie-fiphdr-1:ABC
			will result in a Cookie: ABC=12345; perm=yes
			If you  want all the cookies to be saved, use '*'
				cookie-fiphdr-1:*
	cookie-ignore-redirect-1: just that
		Ignore any redirects (like 302 Moved).
	follow-cookie-redirect:	just that !
		ie if you get a 302 Moved Temporarily status Plus a Location from a
		cookie-request, use that rather than the 'url:..' specified.
			HTTP/1.1 302 Moved Temporarily$
			Date: Fri, 29 Oct 2010 00:17:19 GMT$
			Cache-Control: max-age=3$
			Location:
http://fippo.fip.fip/palio/html.run?_Instance=cms_csi&_PageID=1&_SessionID=1068051&_SessionKey=922432532&_CheckSum=328747502$

There can be up to 9 of these.
	eg	add-cookie:C1
		get-cookie-1:GET /
		get-cookie-2:POST /logon.pl
		get-cookie-data-2:logon=helpme&password=iamswimming

Rarely are any  get-cookie-http-1 fields needed as
	Host, Content-type, and Content-length are added automatically
	Referer is added if you have specified a 'referer'
		which you should if running 'http-version:1.1'
	Keep-alive is added if you specify 'keep-alive:yes'
	Others 'httphdr' fields should be specified as normal..

As a general rule, some Microsoft IIS sites (who else!) have problems if your
HTTP headers are in the wrong order. Basically, make sure your CONTENT* lines
are last.

Example 1
; ------------------------------------------------------
; we need to go and get a cookie for this service
; we will call it C1 - so the httphdr will be 'Cookie: (contents of C1)'
add-cookie:C1
; C1 will hold the contents of an incoming 'WebLogicSession=.....'
cookie-fiphdr-1:WebLogicSession
; this is the URL to hit (with parameters) to trigger the Cookie
get-cookie-1:GET /servlet/com.login.DispatchServlet?Login=&User=guest&Pwd=guest

Example 2
; ----------------------------------------------
; in this case we have 3 cookies C1, C2 and a fixed one 'b'
; C1 is SID=..
; C2 is ASP...=...
; add the fixed 'b=b' on the end
add-cookie:C1 ;C2 ;b=b
; just one grab at a cookie - and Logon and the same time
get-cookie-1:POST /login/Login.asp
; one logon string
get-cookie-data-1:u=%2Findex.asp%3F&l=letmein&p=ohpleaseplease&x=0&y=0
; ignore the 302 return - it is only trying to send us to index.asp
cookie-ignore-redirect-1:
; Save the two cookies as C1 and C2
cookie-fiphdr-1:SID
cookie-fiphdr-2:ASPSESSIONIDASDQCAAD

This will POST - ie pretend to be a filled out html FORM - the logon back.

Note that the cookie-data is 'URI escaped' ie if it is a special chr - like
/?&+ - and is in the data bit, you must use the '%xx' notation (where xx is
the HEX value). But hopefully you would have seen that in your tcpdump/snoop
anyway.
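For example, a password of 'oh&dear' would have to go into the cookie-data as
'oh%26dear' ('&' is hex 26).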

-- Proxies Proxies Proxies Proxies Proxies Proxies Proxies Proxies Proxies
Proxies

When running through a proxy server, you will need :
	1. hostname of the proxy server
	2. port number on the proxy server if it is NOT port 80
	3. (optionally) a logon and password
	4. Is the proxy SQUID ?
		If so headers are slightly different.

If this information is NOT available, normally you can find it easily from any
PC or Mac on the internal network using a browser like Netscape or IExplorer.

Start a NEW copy of either of these.  - It must be a new copy to check on
logons etc.

Under 'Preferences' or 'Internet Options' there should be a 'Connections'
section and under that, the host name or ip address plus host name of any proxy
used.

Note that often the main Fip server is NOT running DNS and will not be able to
resolve external hostnames, so the IP address must be used in this case.

Enter these values in the Fip parameter file as :
	proxy-server:195.13.83.99	(no default)
	proxy-port:412			(this defaults to port 80)

Use the Browser to attempt to access a web site outside the firewall - like
'www.fingerpost.co.uk'.

If you are asked for a password to get through, you will probably need to add a
'proxy-logon' parameter too unless the keeper of the Firewall has made a hole
through just for you.

The data for 'proxy-logon' is in base64 in the format (logon) (colon)
(password).

Use 'sffb64' to generate this string :
	On a Sparc	echo -n "chris:magicman" | sffb64 -i
	On Linux	echo "chris:magicman" | sffb64 -i
	On Winnt	type "chris:magicman" | sffb64 -i

	proxy-logon:Y2hyaXM6bWFnaWNtYW4=

The actual 'You need to Logon, Pal' message is a '407 Proxy Authentication
Required' message.

-- Repeat Offenders -----------------

Some sites add a session-id into each and every link. And this Id changes on
each access.

To 'webwire' this appears to be a new file and so it is grabbed every time -
falsely.

There is an 'ignore-key' command to isolate and ignore the relevant parameter.
eg Take a site like :
	url:http://www.fingerdong.com/
	matchlinks:*&news=yes&newsid=*
	ignorelevel:1

which returns links like
	/en/pressrelease.php?date=20080910&news=yes&PHPSESSID=11bf21&newsid=7866

If the value of PHPSESSID changes on each access, then you will get a copy of
newsid 7866 every time.

Use :
	ignore-key:PHPSESSID

Do NOT specify the '=' or '?' etc.

-- Others Others Others Others Others Others Others Others Others Others Others

--Where 'webwire' is used to drill down links, there is a wait of about 5
seconds between accesses which, hopefully, is enough time for other people to
use that server.

--Where a logon and password is requested as part of the Browser - ie a pop-up
from Netscape or IExplorer, NOT an HTML form - you will need to add a
'Authorization' line. This will be true if you get a message like :
	HTTP/1.0 999 Authorization failure
		... etc etc etc ...
	Assuming you know your logon and password :
	1. Use uuencode or sffb64 to generate a Base64 string
		echo -n "logon:passwd" | sffb64 -i
	2. Add an extra line to the parameter file with the result of the sffb64 line
using 'httphdr'.
		Syntax:	Authorization (colon) (spc) Basic (spc) (Base64 of logon:password)
(\n FipSeq for NL)
		Eg	httphdr:Authorization: Basic AbGtGgbhpdOkOTE=\n

-- Valid links are :
	- The HREF tag attribute in A for Anchor	<a href="www.fingerpost.co.uk">
	- The SRC  tag attribute in FRAME		<frame src="ax1000.html">
	- The URL in a META/Refresh			<META HTTP-EQUIV="Refresh" CONTENT="0;
url=go4thAndMulitply.com">

-- For 'matchlinks', the term LINK is the contents of the <a href="THISONE">,
NOT the associated text
	ie matchlinks:*boonies*
		will find	<a href="/rubbo/boonies/tunies.html">This is a Wonderful Page</a>
		BUT not		<a href="/tunies.html">This is the boonies Wonderful Page</a>

-- Note that 'ignorelinks' refers to both Links and Forms.

-- If you want to ignore all links and only get forms, use a weirdo name in
matchlinks
		matchlinks:gobbLedeGook9981

-- What are reasonable HTTP headers ?
1. If you are using HTTP Version 1.1, you MUST add a line in the headers which
specifies the actual host you are trying to access (ie the REMOTE hostname or
IP address):
	httphdr:Host: www.theirsite.com\n
or if DNS is a problem
	httphdr:Host: 123.456.789.012\n

2. Most servers would like to know what you are and what you can do - so lie !
	Try this for starters :
	httphdr:Accept: \52/\52\n
	httphdr:Accept-Language: en\n
	httphdr:User-Agent: Mozilla/4.0 (compatible; MSIE 4.01)\n
Note the syntax is httphdr:(Keyword) (colon) (space) (Parameter) (NL)
	Keyword is case-INsensitive
	There MUST be a Colon-Space between the Keyword and Parameter.
	The line MUST finish with a single NL (which webwire will handle correctly)
	as Double NLs mean end of header.


-- ValuesFile ValuesFile ValuesFile ValuesFile ValuesFile ValuesFile --

Take the case where you need to get the 10 foreign exchange rates every 20
minutes from a site like Yahoo.

The normal way would be to test using one forex rate and, when ready, just
duplicate that parameter file another 9 times, just changing the forex
name/search string in the 'url' or 'post'.

The classy way is to put all the search values (ie the bits that change) into
a single 'values-file' and reference them using FipHdr fields W1 to W9.

To Do this :
If the original url is :
  http://finance.yahoo.com/m5?a=1&s=USD&t=LAK

1. Create a values-file in /fip/tables/webwire - let's call it VALUES_4_FOREX
	This can have the normal Fip-style comments of ';' at the start of line
	;
	; Values file for Forex
	;
	USD|LAK
	USD|YEN
	USD|MYR
	; end of values file

2. In the WebWire parameter file - let's call it FOREX.
	;
	;	FoREX
	;
	port:8080
	url:http://finance.yahoo.com

	values-file:VALUES_4_FOREX

	values-get-url:/m5?a=1&s=W1&t=W2

... and let rip.....

Note that W1 is the first field, W2 the second etc. If you are already using W1
for something else, specify another FipHdr field to start on with the
'values-fiphdr' parameter.

Note that the FipHdr fields are useable for filename and other Fippy things.
  filename:Forex-W1-W2.fip

will give filenames (and/or FipHdr SN) for our example of
	Forex-USD-LAK.fip
	Forex-USD-YEN.fip
	Forex-USD-MYR.fip

-- Standard-FingerPost-Rant on bad HTML ----------------------
-- Using Webwire to pull off other file formats

Sometimes, 'webwire' seems to only grab part of a page and never returns
errors. Well, if you use a browser to look at the page and then 'View Source'
or 'View Frame Source', lo and behold there is probably a random </HTML> at
that point.

</HTML> is of course the End Tag of an HTML document. So we SHOULD stop there
really.

But a lot of web sites do not care how awful their stuff is - or maybe a
conversion program has been set up wrongly (a well-known news agency in New
York uses </html> in place of </image> to end pictures for example)

So use the keyword 'end-of-document' to track either nothing - just timeout -
or the REAL end of document.

If the data is NOT html - some XML variant for example - use 'end-of-document'
to track that.

By the way, did you know you can immunise yourself from fingerpost-rants; pls
contact the sales dept.

-- Wrinkles with Ports and RSS

Some RSS servers like to service the initial list from one port - but you have
to grab the data from another
	port:8080
	url:http://finance.yahoo.com

-------------------------------------------------------
Version Control
;005z34	10may05	hourly bandwidth stats files rather than per client
	;a 13may05 balance skiplists if changed
	;b-c 25jun05 added -M and -K
	;d-g 05aug05 added fiphdr:XX data:abcA3 and wait-end-timeout
	;h-k 04sep05 changed -x-X to force not default
	;l-m 07nov05 added 24hour+ skip files
	;n-p 25sep06 added ssl at last
	;q-t 17oct06 added skip-details-tag
	;u 29apr07 major change to linktag, added matchkeys and match-case-sensitive
	;v-w14 21may07 add rest of path if 3rd+ level and no starting '/' (w14 -
tweaks to stuff_cookie)
	;x1-6  8may08 added save-fiphdrs ;3 added -N newname ;6 bugette with VALUES
file and port != 80
		;7 added -e and -E errname ;8 balance fiphdr fields ;9 meta-files ;10-12
minor
		;13-14 note_balance_action ;15-16 spc in url ;17 added pretend-301:200 ;19
allow feed:
		;20-23 finally added basic-authentication: and redid ssl
		;24 bugette/modette - allow multiple spaces in mime headers
		;25 allow intergap of zero
		;26 bugette - save_metadata missing if one and only one found
		;27-29 25jun10 bugette when proxy is a Squid and host changes
	;y1-9 26jul10 added grab-on-tag/endtag (major release) ;10-11 6sep10 bugette
with 302-move and http://...
		;12-14 added matchlogon, bug (bg) with data-type:CSV, plus tom bug :
retry-404-max:3 retry-404-gap:1
		;15-17 14oct10 added skip-save-data and days:Z for weekdays
		;18 15nov10 added use-cookie-redirect: ; 19 able to parse VALUES-FILE: ;20
added nofiphdr
		;21-25 mess if too many 404 plus added -D and fiphdr-hash
		;26-27 16mar11 added repxml for fiphdr: / include fiphdr-file in start of
hdr..
		;28-29 31mar11 added zap-values-file:yes
		;30-32 poll.every secs bugette ;32 added need-proxy-cookie
		;33 6jul11 better skips handling now allow 15000 skips and zap olds with
different skipdetails
		;34 29jul11 added need-logon-token and cookie-host-X for rconnect
		;35-36 added dbl-dblqtes in links plus Bugette in Chunks and redid outque for
speedy
		;37-41 added CONNECT for proxy https plus started minitracking and sleep
between polls for XWEB
		;42 allow multiple spaces in custom tag link and added filter ;43
null_next_link added
		;43-45 added retry-404-error
	;z1-8 15mar12 added eventvalaidation and viewstate and level5* and json
		;9-10 allow multiple grabs, added level to grab-on-tag and matchlinks etc
		;11-12 redid 302 moved to handle full paths better ;13 ;14 bugettes -
proxy/do NOT output file for cookies
		;15 28feb13 tuning for level1template-file:
		;16  4apr13 bug in skips if no headline
		;17-23 11apr13 added trees, levels and keys to fiphdr:,  grab-on*tab: and
linktag:
		;24-28 17may13 added retry-500 kwds and better proxy handling ;27 added
level1mime and -I wireId
		;29-31 17mar14 added 404/500action=move, que and FipHdr ;31 modette-repxml
for all tags
		;32 14apr14 for custom logging ;33 4aug14 added -Z force DF ;34 bugette with
fiphdr.. key:
;004z	07jul04 tweaks...
	;b 01aug04 added fiphdr:....
	;c 10aug04 added levelXtemplate: where X is 2->4
	;d-k 01sep04 -9 speedy and timing stats (f-maxlevel and values bugette)
	;l-n 07oct04 added skps2, fixed one-file,
		fixed HTTP results with no messages
	;o 28oct04 redid skps2
	;p 01dec04 buglette with spaces in URLS - need to be stripped.
		plus lvl1file-lvl5file added
	;s 31dec04 added -x proxy-host, -X proxy-port plus -y/-Y
	;t-u 01feb05 added bandwidth-stats
	;v-w 19feb05 added -u testPid and -U singlelevel only and split into files
		plus bugette with Chunking
	;x-z 18apr05 added -O for rpt-offenders/small-diffs flag
;003z	15dec00 added one output file, tracking sents, only-get-if-modified
	;a 20dec00 added watch on XWEB
	;b/c 22jan01 allow hrefs to be NOT in dbl quotes
			plus added end-of-document
	;d/e 19mar01 started proxies
	;f 17sep01 proxies again
	;g 29oct01 proxies again
	;h 13dec01 minor mods - allow http:name:port in url and proxy
	;i 08jan02 values-fiphdr and bugette with values
	;j 08apr02 bug with one output file - core dump
	;k 01jan03 MACOSX
	;l-p 21jan04 added -h and allows secs for 'every'
		and 'no-data:'
	;q-u 08jun04 added matchlinks/ignorelinks/url and now FipSeq
	;u 27jun04 added -H html, -k ignore skipfile
	;w-z 30jun04 proxy-is-squid added
;002b	24oct00 added values-file
	; 06nov00 added 'every' and Chunks

(copyright) 2014 and previous years FingerPost Ltd.