Converting a Document from XML to Print (PDF, PostScript, etc.)

 

XML-to-Print Procedures:

  1. These steps were carried out on Debian 3.0 (woody). Some non-free software (Java) was used.
         
  2. Write an XML document that uses a DocBook document type definition (DTD).
    1. XML:: Where to Begin? - original
    2. XML:: Where to Begin? - HTML

         
  3. Install XSL stylesheets package:
    apt-get install docbook-xsl-stylesheets
    Note: two packages have names beginning with "docbook-xsl", one with the "-stylesheets" suffix and one without. Install the one with the suffix.
  4. Install common Java package:
    apt-get install java-common
    See: /usr/share/doc/java-common/debian-java-faq/ch2.html
  5. Install Java runtime environment:
    1. To install blackdown.org's Java for Debian, edit /etc/apt/sources.list:
      deb ftp://metalab.unc.edu/pub/linux/devel/lang/java/blackdown.org/debian woody main
      deb ftp://metalab.unc.edu/pub/linux/devel/lang/java/blackdown.org/debian woody non-free
    2. Update the package lists: dselect update
    3. Install the runtime: apt-get install j2re1.3
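
The sources.list edit in step 5 can be scripted so that running it twice does not duplicate entries. This is only a minimal sketch, assuming a POSIX shell; the function name add_apt_source is hypothetical (it is not part of apt or dselect), and the file is passed as a parameter so the function can be tried on a scratch copy before it touches /etc/apt/sources.list.

```shell
#!/bin/sh
# Append a line to an APT sources file only if it is not already there.
# add_apt_source is a hypothetical helper name used for illustration.
add_apt_source() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Typical use (as root), followed by the update and install from step 5:
#   BASE=ftp://metalab.unc.edu/pub/linux/devel/lang/java/blackdown.org/debian
#   add_apt_source /etc/apt/sources.list "deb $BASE woody main"
#   add_apt_source /etc/apt/sources.list "deb $BASE woody non-free"
```

Keeping the file name as an argument is a deliberate choice: the same function can be exercised on a copy of sources.list first.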

         
  6. "... to get print, you need an XSLT engine...", such as XT, written by James Clark. The link "http://www.jclark.com/xml/xt.html" now points to "http://www.blnz.com/xt/index.html". Reference: /usr/share/doc/docbook-xsl-stylesheets/doc/index.html.
  7. Download the "source and binaries" archive: "http://www.blnz.com/xt/xt-20020426a-src.tgz". To use XT, you need "xt.jar" from that archive and a Java XML parser that supports SAX, such as XP.
  8. XP is available at "http://www.jclark.com/xml/xp/"; the download itself is "ftp://ftp.jclark.com/pub/xml/xp.zip".
  9. Change into a temporary directory (because I don't know how to fix a "java.io.FileNotFoundException" error) and do the following:
    1. Create a symbolic link to a test document: ln -s ~/Democracyu/postcast_msaccess_doc.xml ./test.xml
    2. Copy three JAR files to the temporary directory:
      • xp.jar
      • xt.jar
      • sax.jar
    3. Create the CLASSPATH environment variable:
      • CLASSPATH=xp.jar:xt.jar:sax.jar
      • export CLASSPATH
    4. Create another variable to shorten the Java command line:
      • FO=/usr/share/sgml/docbook/stylesheet/xsl/nwalsh/fo/docbook.xsl
    5. There must be a way to specify, inside the XML document, where the "dtd" directory is located. Lacking that knowledge, a work-around that avoids the "java.io.FileNotFoundException" error is to link to the directory:
      • ln -s /usr/share/sgml/docbook/dtd/ dtd
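
The setup in step 9 can be checked before Java is run at all. The sketch below, assuming a POSIX shell, builds the same CLASSPATH value from a directory and reports the first JAR that is missing. The function name build_xt_classpath is hypothetical; the JAR names and their order are those listed above.

```shell
#!/bin/sh
# Print a CLASSPATH for XT built from the three JAR files in directory $1
# (default: the current directory). Returns non-zero and names the first
# missing JAR on stderr. build_xt_classpath is a hypothetical helper name.
build_xt_classpath() {
    dir=${1:-.}
    path_list=
    for jar in xp.jar xt.jar sax.jar; do
        if [ ! -f "$dir/$jar" ]; then
            echo "missing $dir/$jar" >&2
            return 1
        fi
        # Join entries with ':' without a leading separator.
        path_list=${path_list:+$path_list:}$dir/$jar
    done
    printf '%s\n' "$path_list"
}

# Typical use inside the temporary directory:
#   CLASSPATH=$(build_xt_classpath .) && export CLASSPATH
#   FO=/usr/share/sgml/docbook/stylesheet/xsl/nwalsh/fo/docbook.xsl
#   [ -e dtd ] || ln -s /usr/share/sgml/docbook/dtd/ dtd
```

A missing JAR would otherwise only show up later as a Java class-loading error, so failing early here keeps the cause obvious.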
  10. In the temporary directory above, process the XML document with Java to create the FO file:
    java com.jclark.xsl.sax.Driver test.xml $FO >test.fo
         
  11. This is what you should see when the FO file is created successfully:
    file:/tmp/test.xml:1: Making portrait pages on USletter paper (8.5inx11in)
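
Steps 10 and 11 can be wrapped so that an empty result is reported instead of passing unnoticed. This is a sketch under assumptions: the function name xml_to_fo and the JAVA override variable are hypothetical conveniences, and whether XT sets a useful exit status on failure is not known here, which is why the size of the output file is checked as well. The class name and argument order are those from step 10.

```shell
#!/bin/sh
# Run XT on an XML file and fail if no FO output was produced.
# xml_to_fo and the JAVA override variable are hypothetical; the Java
# class and its arguments are the ones used in step 10.
xml_to_fo() {
    xml=$1
    stylesheet=$2
    out=$3
    "${JAVA:-java}" com.jclark.xsl.sax.Driver "$xml" "$stylesheet" > "$out" || return 1
    # An empty output file means the transformation did not succeed.
    [ -s "$out" ]
}

# Typical use, with CLASSPATH and FO set as in step 9:
#   xml_to_fo test.xml "$FO" test.fo && echo "wrote test.fo"
```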
         
  12. The FO file contains XSL formatting objects. It is no longer DocBook markup and it is not suitable for viewing directly; it is an abstract description of the document's content and layout that a formatter must convert into a printable format.
  13. Convert the FO file into a printable document with FOP, a formatting objects processor available for free from the Apache XML Project: http://xml.apache.org/dist/fop/fop-0.20.3-bin.tar.gz
    ... see Running FOP.
    1. JAVA_HOME=/usr
    2. export JAVA_HOME
    3. fop-0.20.3/fop.sh test.fo test.pdf
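
The FOP invocation in step 13 can be guarded by a quick check that the input really looks like XSL-FO, which may help narrow down errors. This is a minimal sketch: the function names looks_like_fo and fo_to_pdf and the FOP override variable are hypothetical, and the check assumes the stylesheet writes the root element with the conventional "fo:" prefix.

```shell
#!/bin/sh
# Succeed if the file appears to contain an fo:root element.
# Assumes the conventional "fo:" namespace prefix in the FO file.
looks_like_fo() {
    grep -q '<fo:root' "$1"
}

# Run FOP on an FO file after a sanity check. fo_to_pdf and the FOP
# override variable are hypothetical; the default script path and the
# JAVA_HOME value are the ones used in step 13.
fo_to_pdf() {
    fo=$1
    pdf=$2
    if ! looks_like_fo "$fo"; then
        echo "$fo does not look like XSL-FO" >&2
        return 1
    fi
    JAVA_HOME=${JAVA_HOME:-/usr}
    export JAVA_HOME
    "${FOP:-fop-0.20.3/fop.sh}" "$fo" "$pdf"
}

# Typical use:
#   fo_to_pdf test.fo test.pdf
```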
  14. The command above works with "character.fo", which comes with FOP, but unfortunately it fails with an error on "test.fo":
  15. Notes: