ipxpdf

This program extracts elements (mainly text) from PDF files.

It uses a parameter file in tables/setup. This can be selected from the DF FipHdr field and defaults to XPDF.FIP

Keywords for the parameter file are :
    ;  comment line
    newname:  (FipSeq) name of the output file
        default: the same as input
        eg  replace:Q1 SN .pdf=""
            newname:\Q1.article.\$v.txt
    supercede:  yes/no  Overwrite the output if it exists
        default: do not overwrite
    outque:  output queue for the new file
        default: spool/2go or the -o switch
    doneque:  done queue for the old file
        default: none or the -d switch
    infoque:  queue for the hints/stats/info file
        default: none
    checkque:  if a PDF has errors (not readable, no pages etc), put the input file in this folder so it can be reviewed/checked manually
        default: none
    extra-fiphdr:  more fixed FipHdr fields to add to the file (before any new matched additions)
        default: none
    extra-fiphdr-file:  (File in tables/setup) Include the contents of this file in the FipHdr
        default: none
    script:  script to run against the new file
        default: none
        The full path/filename of the new file is added to the end of the command.
        eg  ; clean up some of the crap...
            script:/fip/bin/ipxchg -D xpdf_clean -1
    want-data:  yes/no/pdf  Rip apart the text from the PDF ?
        want-data:yes - flag metadata and render PDF to text (default)
        want-data:no  - flag metadata only and ignore all data
        want-data:pdf - flag metadata and preserve the PDF as data
        The default is YES for text (not PDF), but the default may be changed by the -D input switch.
    use-sx: or use-external-file:
        If there is an SX FipHdr field with a path to the data file, use that rather than the data in the input file.
    fiphdr-for-page-width:  (2 letter FipHdr field) Put the Page Width (Media or Crop Box) in this FipHdr zone
        default: ignored
    fiphdr-for-page-height:  (2 letter FipHdr field) Put the Page Height (Media or Crop Box) in this FipHdr zone
        default: ignored
    fiphdr-for-pdf-version:  (2 letter FipHdr field) Put the PDF version in this FipHdr zone
        default: ignored
    fiphdr-for-page-total:  (2 letter FipHdr field) Put the number of pages in the document in this FipHdr zone
        default: ignored
    fiphdr-for-docinfo-total:  (2 letter FipHdr field) Put the number of doc info elements in the document in this FipHdr zone
        default: ignored
    fiphdr-for-docinfo:  (2 letter FipHdr field) Put ALL the doc info elements in the document in this FipHdr zone
        default: ignored
        They are separated by a pipe, eg
        AB:Producer-DynaPDF 2.5.4.557|Creator-Asura Version 9.6 (SR 3)|OneVisionQueueName-Q229_WORKFLOW_2_PAIRSCORCERER|Title-HA-A-LEI-15-08-13-p012.eps|OneVisionDongleID-_9WXs9sImmNuhtq9|OneVisionCreationDate-D:20130813184434+01'00'|OneVisionProducer-OneVision PDFengine (Windows Build 21.066.S)|OneVisionCreator-Asura Version 9.6 (SR 3)|Author-asuraadmin|
    log-line:  extra logging information for the Fip log
        default: none
        Logging is done at the end of each page.
            EN is the filename
            EP is the path
            S1 MAY be the size
            S2 is the page number of the pages generated from this input file
    show-changes:  yes/all/no or a series of entries
        Show point size and font information inline: tags such as <font.Arial> <ptsize.8.04> are added.
        default: no
            no     - display nothing (default)
            all    - display all style changes
            font   - display font changes
            ptsize - display point size changes
            x      - display the x position of the line from the left
            y      - display the y position of the line from the bottom
    add-space-x:  NO or (number of chrs)
    added-space-chr:  (FipSeq single chr)
        default: SPC
        Where 'show-changes:no' is set, or the 'x' position is NOT displayed, add a space between blocks of text if the gap between them is >= (ptsize * add-space-x). This number can be smaller than 1.
        default: 1.0 (ie a space is added when the start of the next block, on the right of the line, is at least a single chr width from the end of the last block)
    max-body-ptsize:  (number in points)
        default: 15.0
    gutter-x:  (number in points)
        default: 6
        When reading DOWN, the approximate gutter between columns.
    min-col-x:  (number in points)
        default: 90.0
        When reading DOWN, the approximate column width used for grouping elements.
    read-direction:  down/across
        default: down
        If the text is in multiple columns across the page, should the columns be read DOWN (like a magazine page) or ACROSS (like a spreadsheet)?
    output-single-file:  no/yes
        default: no
        If yes, ignore the PDF page end and continue to write to the same output file.
    group-furniture:  yes/no  Group all Furniture items at the top of the output file
        default: no
        Furniture items are flagged by font (see below).
    group-headings:  yes/no  Group all Headings at the top of the output file
        default: no
        Headings are flagged by font (see below).
    symbol-font-default-char:*
    symbol-font-char:l<Bullet>
    symbol-font-char:L<bullet>
        If the font is flagged as a Symbol font (internal PDF setting), map the data to these strings.
    font:(Name) type:(type) min:(minPtSize) max:(maxPtSize)
        type can be body, head, caption or furniture
        eg  font:IdentikalSansRegular type:body
            font:AGBook-Stencil type:head min:14
            font:IdentikalSansBold type:head
            font:DIN-Regular type:caption
            font:MagistralA type:furniture
    force-single-caps:  yes/no
        By default single letters are forced to uppercase.
        Use this when massive letterspacing produces lots of single letters!

Where sections of FipHdr fields are required, or the output style must be changed, use the keywords fixed, partial, combie, optional, repeat, newdate and/or style (see the SysAdmin manual for more information).
They are normally specified as:
    fixed:QZ 1234543
    partial:QT ST,3,2,U,<,>
    combie:QY ep|na,(0000000)a
    option:QE ep,11,7,s
    repeat:QK XK,-,3  or  repeat:QP PK,,4,#X
    style:QS XN,%.03d

The FipHdr of the incoming file can also be used to change these settings:
    PDF_FIPHDR:(yes/no)  Add/Don't add the FipHdr to the output file
        default: add
    PDF_OUTQUE:(FipSeq)  Output folder to override the -o input switch
        default: /fip/spool/2go

Input Parameters are (all optional) :
    Either
    -1 : path/filename for single-shot processing
         default: spooled
         The input file is NOT deleted.
         If this does NOT start with a '/', it is assumed relative to the current path.
    Or
    -i : input queue
         default: spool/xpdf
         If this does NOT start with a '/', it is assumed under spool.
    -d : done folder for the input file in FipSeq
         default: none
         If this does NOT start with a '/', it is assumed under spool.
    -D : default for want-data: -D no, -D yes or -D pdf
         default: -D yes for text
    -L : do NOT log incoming files
         default: log
    -o : output queue
         default: spool/2go
         If this does NOT start with a '/', it is assumed under spool.
    -w : wait for files arriving across a network
         default: no wait
    -z : default parameter file in tables/setup
         default: tables/setup/XPDF.FIP
    -v : print version number and exit

---NOTES---
Version Control
;0z   10feb10  original version
;p    11sep13  added want-data and the 6 fiphdr-for.. keywords
;q-r  2nov13   added read-direction and single-output-file
;s-t  9apr14   added force-single-caps
;u-y  14may14  allow unlimited number of lines and added add-space-x
;z    15feb17  dyna v4

(copyright) 2024 and previous years FingerPost Ltd.
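For anyone writing a downstream script against ipxpdf output, the Python sketch below illustrates two of the documented rules: splitting the pipe-separated value written by fiphdr-for-docinfo, and the add-space-x gap test (add a space when gap >= ptsize * add-space-x). It is an illustration only, not FingerPost code. The function names parse_docinfo and needs_space are hypothetical, and splitting each doc info element at its FIRST hyphen is an assumption based on the sample value above.

```python
"""Hypothetical helpers for scripts that consume ipxpdf output.

Illustrative sketches only: they are not part of ipxpdf or any
FingerPost library.
"""
from typing import Dict, Optional


def parse_docinfo(value: str) -> Dict[str, str]:
    """Split a fiphdr-for-docinfo value into a {name: value} dict.

    `value` is the field content after the 2 letter FipHdr code
    (eg after 'AB:'). Elements are separated by '|'.
    ASSUMPTION: each name ends at the FIRST hyphen, because values such
    as a Title may themselves contain hyphens. Empty elements (eg from
    the trailing pipe) are skipped.
    """
    info: Dict[str, str] = {}
    for element in value.split("|"):
        if not element:
            continue  # skip the empty piece left by a trailing '|'
        name, _, data = element.partition("-")
        info[name] = data
    return info


def needs_space(gap: float, ptsize: float,
                add_space_x: Optional[float] = 1.0) -> bool:
    """Apply the add-space-x rule: True when gap >= ptsize * add_space_x.

    `gap` is the distance in points between the end of one text block
    and the start of the next on the same line.
    ASSUMPTION: None stands in for 'add-space-x:NO' (never add a space).
    """
    if add_space_x is None:
        return False
    return gap >= ptsize * add_space_x


if __name__ == "__main__":
    sample = ("Producer-DynaPDF 2.5.4.557|"
              "Title-HA-A-LEI-15-08-13-p012.eps|"
              "Author-asuraadmin|")
    info = parse_docinfo(sample)
    print(info["Title"])                # HA-A-LEI-15-08-13-p012.eps
    print(needs_space(9.0, 8.04))       # True: 9.0 >= 8.04 * 1.0
    print(needs_space(3.0, 8.04, 0.5))  # False: 3.0 < 4.02
```

Splitting on the first hyphen keeps values such as the sample Title intact; if a doc info name could itself contain a hyphen, this split would be wrong and a different rule would be needed.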