
Shining a LAMP on XML

Through a series of demonstrations and hands-on exercises, this one-day workshop teaches fundamental techniques for creating, managing, and disseminating XML. Participants who arrive with a basic understanding of what XML is will leave knowing how to validate, transform, index, and disseminate XML using open source software. Here is an outline of the day's specific topics:

  * Review of XML - The six rules for writing well-formed XML
  * Enumeration of selected DTDs and XML Schemas - XHTML, RDF, TEI, and others
  * Validation - Using xmllint to validate XML
  * Transformation - Using xsltproc to transform XML with XSL
  * Indexing and searching - Using swish-e to index and search XML
  * Dissemination - Using Linux, Apache, Perl/PHP, and MySQL to manage and disseminate content

This workshop could easily be called "Taking XML to the next level." It is primarily designed for people in libraries or other cultural heritage institutions such as museums or archives. Demonstrations and exercises involve full-text prose, images, various types of metadata, and standard approaches to dissemination such as plain ol' HTTP, OAI-PMH, and REST-ful Web Services. Participants are expected to know how to read and write well-formed XML files using their favorite plain text editor and Web browser. Some knowledge of Unix/Linux systems administration and some knowledge of programming are desirable but not necessary. Participants will receive a CD containing the workshop's handout and exercises, as well as the sample data and code.


I. Review of XML - Six simple rules + 1
  A. XML documents always have one and only one root element
  B. Element names are case-sensitive
  C. Elements are always closed
  D. Elements must be correctly nested
  E. Elements' attributes must always be quoted
  F. There are only five entities defined by default
  G. Use namespaces to eliminate vocabulary clashes
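
The rules above can be checked mechanically. Here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser (neither is part of the workshop's toolkit); the sample documents and the is_well_formed helper are made up for illustration. Each broken sample violates one of rules A through F.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as error:
        return False, str(error)


# Hypothetical samples; only the first one obeys all of the rules.
samples = {
    "well-formed": '<letter type="formal"><to>Reader</to></letter>',
    "A. two root elements": "<to>Reader</to><from>Writer</from>",
    "B. case mismatch": "<Letter></letter>",
    "C. unclosed element": "<letter><to>Reader</letter>",
    "D. bad nesting": "<letter><b><i>hi</b></i></letter>",
    "E. unquoted attribute": "<letter type=formal></letter>",
    "F. undefined entity": "<letter>&nbsp;</letter>",
}

if __name__ == "__main__":
    for label, text in samples.items():
        ok, message = is_well_formed(text)
        print(label, "->", "ok" if ok else message)
```

The parser only reports the first error it meets, so a badly broken file usually has to be fixed and re-checked a few times.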

II. Enumeration of selected DTDs and Schemas
  A. CIMI XML Schema for SPECTRUM
  B. DocBook
  C. EAD
  D. MARCXML
  E. METS
  F. MODS
  G. OEB
  H. RDF
  I. TEI
  J. XHTML
  K. VRA Core

III. Validating XML
  A. Valid XML goes beyond well-formed XML
  B. xmllint is a pretty good validation tool
  C. Exercises
     1. Validating against a SYSTEM DTD - validate against letter.dtd
     2. Validating against a PUBLIC DTD - validate a locally written XHTML file
     3. Validating against a schema
     4. Fixing a broken XML document by hand
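
The validation exercises boil down to a handful of xmllint invocations. Here is a rough sketch, assuming Python is used as a wrapper; the xmllint_command and validate helpers, and the file names letter.xml and letter.xsd, are hypothetical (only letter.dtd is named in the outline). The options used (--noout, --valid, --dtdvalid, --schema) are standard xmllint options.

```python
import shutil
import subprocess


def xmllint_command(xml_file, dtd=None, schema=None):
    """Build an xmllint command line that validates xml_file.

    With no dtd or schema given, the DTD named in the document's own
    DOCTYPE declaration (SYSTEM or PUBLIC) is used.
    """
    command = ["xmllint", "--noout"]
    if schema:
        command += ["--schema", schema]   # validate against a W3C XML Schema
    elif dtd:
        command += ["--dtdvalid", dtd]    # validate against an external DTD
    else:
        command.append("--valid")         # validate against the DOCTYPE's DTD
    command.append(xml_file)
    return command


def validate(xml_file, dtd=None, schema=None):
    """Return True or False for valid or invalid, None if xmllint is missing."""
    if shutil.which("xmllint") is None:
        return None
    result = subprocess.run(
        xmllint_command(xml_file, dtd=dtd, schema=schema),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(" ".join(xmllint_command("letter.xml")))
    print(" ".join(xmllint_command("letter.xml", dtd="letter.dtd")))
    print(" ".join(xmllint_command("letter.xml", schema="letter.xsd")))
```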

IV. Transforming XML
  A. XSL transforms XML documents into other things
  B. XSLT is an XML language for transforming documents - XSLT basics
     1. XPath
     2. Elements
        a. xsl:apply-templates
        b. xsl:attribute
        c. xsl:call-template
        d. xsl:choose
        e. xsl:for-each
        f. xsl:otherwise
        g. xsl:output
        h. xsl:param
        i. xsl:strip-space
        j. xsl:stylesheet
        k. xsl:template
        l. xsl:text
        m. xsl:value-of
        n. xsl:variable
        o. xsl:when
     3. Functions
        a. contains
        b. normalize-space
        c. translate
     4. Very useful character references
        a. tab (&#x9;)
        b. space (&#x20;)
        c. linefeed (&#xa;)
  C. xsltproc is one of the premier transformation tools
  D. Exercises
     1. Transforming MARC to XHTML
     2. Transforming MARC to MARCXML
     3. Transforming MARCXML to MODS (single to single)
     4. Transforming MARCXML to MODS, redux (single to many)
     5. Transforming MODS to SQL
     6. Transforming TEI to XHTML
     7. Fixing a broken XSLT file
     8. Writing your own XSLT file - MODS to XHTML
  E. XSLT as a part of modern browsers
  F. Exercises
     1. Displaying TEI files in your browser
     2. Displaying EAD files in your browser
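
To make the XSLT basics concrete, here is a small sketch that ties several of the pieces listed above together: xsl:stylesheet, xsl:output, xsl:template, xsl:for-each, xsl:value-of, xsl:text, the normalize-space function, and the linefeed reference. It assumes Python as a wrapper around xsltproc; the stylesheet, the sample document, and the transform helper are invented for illustration and are not the workshop's exercise files.

```python
import shutil
import subprocess
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# A hypothetical stylesheet: print every title on its own line as plain text.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="//title">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

# A hypothetical source document with untidy whitespace.
SOURCE = """<books>
  <title>  Walden </title>
  <title>Civil
     Disobedience</title>
</books>"""


def transform(stylesheet, source):
    """Run xsltproc on the given strings; return its output, or None if absent."""
    ET.fromstring(stylesheet)  # a stylesheet must itself be well-formed XML
    if shutil.which("xsltproc") is None:
        return None
    with tempfile.TemporaryDirectory() as folder:
        xsl_file = Path(folder) / "titles.xsl"
        xml_file = Path(folder) / "books.xml"
        xsl_file.write_text(stylesheet, encoding="utf-8")
        xml_file.write_text(source, encoding="utf-8")
        # Same as typing: xsltproc titles.xsl books.xml
        result = subprocess.run(
            ["xsltproc", str(xsl_file), str(xml_file)],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


if __name__ == "__main__":
    print(transform(STYLESHEET, SOURCE))
```

Because a stylesheet is just XML, the same well-formedness rules from the review apply to it, which is often the first thing to check when fixing a broken XSLT file.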

V. Indexing and searching XML
  A. Swish-e, an all-around indexer
  B. Exercises
     1. Indexing/searching XHTML
     2. Indexing/searching TEI files
     3. Indexing/searching MODS data
     4. Indexing/searching EAD data

VI. Shining a LAMP on XML
  A. XML and MySQL
     1. Using the mysqldump command with the --xml option
     2. Using the mysql command with the --xml option
  B. XML and Perl
     1. Transforming XML with XSLT in Perl
     2. Creating an author/title index
     3. Repairing broken XML files in batch
  C. XML and Apache - AxKit (Water Collection)
  D. CGI interfaces to swish-e indexes of XML files
     1. Simple XHTML files (xhtml.cgi)
     2. Extracting properties (mods.cgi)
     3. XML files with stylesheets (ead.cgi)
     4. Transforming XML to XHTML on the fly (tei.cgi)
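
As an illustration of the XML and MySQL topic, here is a sketch of reading the output of the mysql command's --xml option and turning it into a simple author/title index. It assumes Python rather than the Perl used in the workshop, and it assumes the resultset/row/field layout that current mysql clients emit (older clients may differ). The books table, its columns, and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of something like:
#   mysql --xml -e "SELECT author, title FROM books" library
SAMPLE = """<resultset statement="SELECT author, title FROM books">
  <row>
    <field name="author">Thoreau</field>
    <field name="title">Walden</field>
  </row>
  <row>
    <field name="author">Emerson</field>
    <field name="title">Nature</field>
  </row>
</resultset>"""


def rows_from_mysql_xml(text):
    """Turn each row element into a dictionary keyed on the field names."""
    root = ET.fromstring(text)
    return [
        {field.get("name"): field.text for field in row.findall("field")}
        for row in root.findall("row")
    ]


def author_title_index(rows):
    """Return 'author: title' lines sorted by author, then title."""
    pairs = sorted((row["author"], row["title"]) for row in rows)
    return [f"{author}: {title}" for author, title in pairs]


if __name__ == "__main__":
    for line in author_title_index(rows_from_mysql_xml(SAMPLE)):
        print(line)
```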
     

VII. Web Services
  A. Definition - REST-ful and SOAP-ful implementations
  B. OAI-PMH
  C. SRW/U
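
An OAI-PMH request is a REST-ful call: a base URL plus a verb and a few arguments in the query string. Here is a minimal sketch of building such requests, assuming Python; the oai_request helper, the base URL http://example.org/oai, and the record identifier are hypothetical, while the verbs (Identify, ListRecords, GetRecord) and the oai_dc metadata prefix come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb, and its arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    base = "http://example.org/oai"  # hypothetical repository
    print(oai_request(base, "Identify"))
    print(oai_request(base, "ListRecords", metadataPrefix="oai_dc"))
    print(oai_request(base, "GetRecord",
                      identifier="oai:example.org:1",
                      metadataPrefix="oai_dc"))
```

The response to each request is an XML document, so everything covered earlier in the day (validation, transformation, indexing) applies to what comes back.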

VIII. Hack Session

-- 
Eric Lease Morgan
circa July 5, 2004