Reverse detail from Kakelbont MS 1, a fifteenth-century French Psalter. This image is in the public domain. Daniel Paul O'Donnell

Forward to Navigation

Extracting a catalogue of element names from a collection of XML documents using XSLT 2.0

Posted: Sep 15, 2011 17:09;
Last Modified: May 23, 2012 19:05

---

We are trying to build a single stylesheet to work with the documents of two independent journals. In order to get a sense of the work involved, we wanted to create a catalogue of all elements used in the published articles. This means loading as input document directories’ worth of files and then going through extracting and sorting the elements across all the input documents.

Here’s the stylesheet that did it for us. It is probably not maximally optimised, but it currently does what we need. Any suggestions for improvements would be gratefully received.

Some notes:

  1. Our goal was to pre-build some templates for use in a stylesheet, so we formatted the elements names into xsl templates.
  2. Although you need to use this sheet with an input document, the input document is not actually transformed (the files we are extracting the element names from are loaded using the collection() function). So it doesn’t matter what the input document is as long as it is valid XML (we used the stylesheet itself)
<?xml version="1.0"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<!-- this output is because we are going to construct 
ready-made templates for each element -->
    <xsl:output method="text"/>

<!-- for pretty printing -->
    <xsl:variable name="newline">
        <xsl:text> 
        </xsl:text>
    </xsl:variable>

<!-- Load the files 
in the relevant directories -->
    <xsl:variable name="allFiles"
    select="collection(iri-to-uri('file:///some/path/to/the/directories?select=*.xml;
    recurse=yes'))"/>

<!-- Dump their content into a single big pile -->
    <xsl:variable name="everything">
        <xsl:copy-of select="$allFiles"/>
    </xsl:variable>

<!-- Build a key for all elements using their name -->
    <xsl:key name="elements" match="*" use="name()"/>

<!-- Match the root node of the input document
(since the files we are actually working on have been 
loaded using the using the collection() function, nothing 
is actually going to happen to this element) -->
    <xsl:template match="/">

       <!-- this is information required to turn the output into an 
              XSL stylesheet itself -->
        <xsl:text>&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            version="1.0"></xsl:text>
        <xsl:value-of select="$newline"/>
        <xsl:text>&lt;!--Summary of Elements --&gt;</xsl:text>
        <xsl:value-of select="$newline"/>
        <xsl:value-of select="$newline"/>

       <!-- this invokes the collection of all elements in all the files
       in the directory for further processing -->
        <xsl:for-each select="$everything">

           <!-- This makes sure we are dealing with the first named key -->
            <xsl:for-each 
            select="//*[generate-id(.)=generate-id(key('elements',name())[1])]">

               <!-- sort them -->
                <xsl:sort select="name()"/>

                <xsl:for-each select="key('elements', name())">

                   <!-- this makes sure that only the first instance 
                    of each element name is outputted -->
                    <xsl:if test="position()=1">
                        <xsl:text>&lt;xsl:template match="</xsl:text>
                        <xsl:value-of select="name()"/>
                        <xsl:text>"> </xsl:text>
                        <xsl:value-of select="$newline"/>
                        <xsl:text>&lt;!--</xsl:text>
                        <!-- this counts the remaining occurences -->
                        <xsl:value-of select="count(//*[name()=name(current())])"/>
                        <xsl:text> occurences</xsl:text>
                        <xsl:text>--&gt;</xsl:text>
                        <xsl:value-of select="$newline"/>
                        <xsl:text>&lt;/xsl:template></xsl:text>
                        <xsl:value-of select="$newline"/>
                        <xsl:value-of select="$newline"/>
                    </xsl:if>
                </xsl:for-each>
            </xsl:for-each>
        </xsl:for-each>
        <xsl:value-of select="$newline"/>
        <xsl:text>&lt;/xsl:stylesheet></xsl:text>
    </xsl:template>
</xsl:stylesheet>
----  

Commenting is closed for this article.

Back to content

Search my site

Sections

Current teaching

Recent changes to this site

Tags

anglo-saxon studies, caedmon, citation, citation practice, citations, composition, computers, digital humanities, digital pedagogy, exercises, grammar, history, moodle, old english, pedagogy, research, student employees, students, study tips, teaching, tips, tutorials, unessay, universities, university of lethbridge

See all...

Follow me on Twitter

At the dpod blog