Conventions, symbols, and encoding

Introduction

§ ii.1    The following sections describe conventions, symbols, and encoding practices used in Cædmon’s Hymn: a multimedia edition and archive. Editorial principles are discussed in a separate introductory chapter (Chapter 7: Editorial introductions).

§ ii.2    This chapter is divided into two major sections. The first, Conventions and symbols, describes the conventions, symbols, and abbreviations used in formatting the text for human readers (print and CD-ROM). The second, Encoding, describes the conventions used in the edition’s SGML (computer) markup (CD-ROM only), concentrating on the points at which these conventions diverge from the standards of the Text Encoding Initiative (TEI). This second section is intended for readers interested in the technical aspects of humanities computing and assumes a basic knowledge of the conventions of the TEI and of SGML or related languages.

Conventions and symbols

§ ii.3    Cædmon’s Hymn: a multimedia edition and archive uses a variety of symbols and abbreviations (Print and CD-ROM) and colour conventions (CD-ROM only) in its text, transcriptions, and editions.[1]

Typographical symbols and abbreviations

Manuscript sigla (Print and CD-ROM)

§ ii.4    The following sigla are used in Cædmon’s Hymn: a multimedia edition and archive to refer to individual manuscripts. On the CD-ROM, manuscripts containing a copy of the vernacular text of Cædmon’s Hymn are also colour-coded to indicate their affiliation: purple for members of the Northumbrian aelda recension; teal for members of the Northumbrian eordu recension; green for members of the West-Saxon eorðan recension; navy for members of the West-Saxon ylda recension; and olive for members of the West-Saxon eorðe recension. Other manuscript sigla are displayed in black type.

Linguistic and palaeographic symbols (Print and CD-ROM)

§ ii.5    The following symbols are used in linguistic and palaeographic contexts:

<graphemic>

Pointed brackets indicate that a text or character is being transcribed (and discussed) graphemically (see Crystal 1987, 194-195). In such contexts, ligatures are indicated using a plus sign (+). The ligature & (i.e. Latin “et,” English “and”), for example, would be transcribed graphemically <e+t>. Commas are used to separate forms in a list.

/phonemic/

Slanted brackets indicate that the enclosed text is being transcribed phonemically (see Crystal 1987, 160-161). Phonemic transcriptions use the IPA (International Phonetic Association 1999). Commas are used to separate individual phonemes or groups of phonemes in a list.

[phonetic]

Square brackets indicate that the enclosed text is being transcribed phonetically (see Crystal 1987, 152-159 and 160-161). Phonetic transcriptions use the IPA (International Phonetic Association 1999). Commas are used to separate individual phones or groups of phones in a list.

Diplomatic and editorial symbols (Print and CD-ROM)

§ ii.6    The following symbols are used in editions and transcriptions. They are also used in other contexts when editorial or diplomatically transcribed forms are cited. On the CD-ROM, these symbols are further enhanced through the use of colour: red is used to indicate the presence of a correction; teal is used to indicate text that has been (physically) damaged in some way.

deletion

Strikethrough indicates the physical deletion of text in a witness. Deletion may be by any method (underlining, punctum delens, erasure, overwriting, etc.). The precise method of deletion is usually indicated by a note in the transcription of the relevant witness. The deleted text is recorded whenever possible. If deleted text cannot be recovered, it is replaced by colons.

\addition/

Upward sloping brackets indicate that the enclosed text has been added above the manuscript line. If a caret was used, this is indicated: \addition/.

|addition|

Vertical brackets indicate that the enclosed text has been inserted between existing characters within the manuscript line. Insertion is distinguished from overwriting (i.e. the conversion of one character to another or the addition of a new character in the space left by a previously deleted form).

{addition}

Curly brackets indicate that the enclosed text has been added over some pre-existing form. This addition may involve the conversion of one letter to another (for example, the conversion of <o> to <d> by the addition of an ascender) or the addition of new text in the place of a previous erasure. The overwritten text is treated as a deletion.

/addition\

Downward sloping brackets indicate that the enclosed text has been added below the manuscript line.

addition| or |addition

A single vertical bar indicates that the text has been added at the beginning or end of a manuscript line. Text preceded by a single vertical bar has been added at the end of a manuscript line. Text followed by a single vertical bar has been added at the beginning of a manuscript line. Text between two vertical bars has been added “within the line” (i.e. between pre-existing letters or words).

damage

Underlining indicates that text has been damaged. When damaged text is unclear or illegible, additional symbols are used. On the CD-ROM, damaged text is also teal in colour.

unclear

Angle brackets indicate that the enclosed text is unclear for some physical reason (e.g. rubbing, flaking, staining, poorly executed script).

[supplied] or [emended]

Square brackets indicate that the enclosed text is being supplied or emended. “Supplied text” refers to the hypothetical restoration of original readings from a specific witness that have become lost or illegible due to some physical reason. “Emended text” refers to the replacement of legible text from extant witnesses by a modern editor or transcriber.

::

Colons represent text that is completely effaced or illegible. This symbol is not used in transcriptions and citations of the almost completely erased Bd; in these, the text has simply been supplied, in most cases on the basis of the forms in W.
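Most of the bracket conventions above amount to a simple lookup from the position of an addition to a pair of delimiters. The following Python sketch makes that mapping explicit. The position labels (`above`, `below`, `within`, `over`, `line_start`, `line_end`) are hypothetical names chosen for illustration and are not attribute values used in the edition's encoding. Strikethrough, underlining, and the brackets for unclear text depend on typography rather than plain characters, so they are left out.

```python
# Delimiters for added text, keyed by hypothetical position labels
# (these labels are illustrative, not the edition's attribute values).
ADDITION_DELIMITERS = {
    "above": ("\\", "/"),     # \addition/  added above the line
    "below": ("/", "\\"),     # /addition\  added below the line
    "within": ("|", "|"),     # |addition|  inserted between existing characters
    "over": ("{", "}"),       # {addition}  written over a pre-existing form
    "line_start": ("", "|"),  # addition|   added at the beginning of a line
    "line_end": ("|", ""),    # |addition   added at the end of a line
}

ILLEGIBLE = "::"  # stands for completely effaced or illegible text


def mark_addition(text: str, place: str) -> str:
    """Wrap added text in the delimiters used for its position."""
    opening, closing = ADDITION_DELIMITERS[place]
    return f"{opening}{text}{closing}"


def mark_supplied(text: str) -> str:
    """Wrap supplied or emended text in square brackets."""
    return f"[{text}]"


print("oór" + mark_addition("d", "above"))                   # oór\d/
print(mark_addition("a", "over"))                            # {a}
print("t" + mark_supplied("eo") + "d" + mark_supplied("e"))  # t[eo]d[e]
```

The first and last calls reproduce forms that the edition itself cites: the corrected O reading oór\d/ and the emended ylda reading t[eo]d[e].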

Textual apparatus symbols (CD-ROM only)

§ ii.7    As described in § 7.4, Cædmon’s Hymn: a multimedia edition and archive allows users of the CD-ROM to select from several views of its textual apparatus. Most of these views follow traditional, print-based conventions. The exception is the Analytic All witnesses view (see § 7.7), which uses the following typographical symbols:

[orthographic group]

In the textual apparatus, square brackets indicate that the enclosed form(s) belong to a single orthographic group. An orthographic group may consist of a single unique spelling or a group of identical spellings. Readings within a single orthographic group are always substantively identical and show identical potential significance.

(substantively identical group)

In the textual apparatus, parentheses indicate that the enclosed forms are substantively identical. Substantively identical groups always include one or more orthographic groups.

{identical potential significance}

In the textual apparatus, curly brackets indicate that the enclosed forms show identical potential significance. Groups showing identical potential significance always consist of one or more substantively identical groups.

These symbols are always nested. The first word of the Northumbrian aelda recension (Nu), for example, appears in orthographically identical forms in both surviving witnesses; in the Analytic All witnesses view, this agreement is represented as follows:

Nu] {([Nu M Nu P])}.

The first word of line 4b in the West-Saxon eorðan recension (or), on the other hand, shows a variety of orthographic, substantive, and (potentially) significant variants; in the Analytic All witnesses view, this variation is represented as follows:

or] {([or T1 or| N][oór O Pre-Correction])}{([ord| B1 ord Ca][oór\d/ O Post-Correction])}{([ær| To])}

I believe this system is unique to this edition. While it can seem overly complex at first, particularly for lemmas showing little or no variation, its ability to provide a concise overview of the poem’s entire textual history also makes it quite powerful in more complex examples. I have found it to be the most useful apparatus view in my own textual studies.
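Because the three kinds of group are strictly nested, an entry in the Analytic All witnesses view can be modelled as lists within lists and printed mechanically. The Python sketch below assumes that the editor has already made the grouping decisions: which spellings count as orthographically identical, which forms are substantively identical, and which share potential significance. It only reproduces the bracket notation, using the two entries quoted above (Nu and or) as data. The function and type names are illustrative.

```python
from typing import List, Tuple

# A reading pairs a transcribed form with its witness siglum, e.g. ("Nu", "M").
Reading = Tuple[str, str]
OrthographicGroup = List[Reading]           # printed as [ ... ]
SubstantiveGroup = List[OrthographicGroup]  # printed as ( ... )
SignificanceGroup = List[SubstantiveGroup]  # printed as { ... }


def format_orthographic(group: OrthographicGroup) -> str:
    return "[" + " ".join(f"{form} {siglum}" for form, siglum in group) + "]"


def format_substantive(group: SubstantiveGroup) -> str:
    return "(" + "".join(format_orthographic(g) for g in group) + ")"


def format_significance(group: SignificanceGroup) -> str:
    return "{" + "".join(format_substantive(g) for g in group) + "}"


def format_entry(lemma: str, groups: List[SignificanceGroup]) -> str:
    """Render a lemma and its nested groups in the analytic bracket notation."""
    return f"{lemma}] " + "".join(format_significance(g) for g in groups)


# Northumbrian aelda recension, first word: a single group at every level.
nu = [[[[("Nu", "M"), ("Nu", "P")]]]]

# West-Saxon eorðan recension, first word of line 4b.
or_ = [
    [[[("or", "T1"), ("or|", "N")], [("oór", "O Pre-Correction")]]],
    [[[("ord|", "B1"), ("ord", "Ca")], [("oór\\d/", "O Post-Correction")]]],
    [[[("ær|", "To")]]],
]

print(format_entry("Nu", nu))
print(format_entry("or", or_))
```

In this model the nesting rule is enforced by the data type itself: a form can only be reached through exactly one group at each of the three levels. This matches the statement that the symbols are always nested.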

Colour (CD-ROM only)

§ ii.8    Colour is used on the CD-ROM to indicate the presence of hyperlinks and to associate individual witnesses with specific recensions of Cædmon’s Hymn.

Blue

Blue text indicates the presence of an unclassified hyperlink. Unclassified hyperlinks include chapter and section cross-references, bibliographic cross-references, and links to notes or apparatus entries. The target of an unclassified hyperlink may be within the same document or in an external document. Hyperlinks change colour and become underlined whenever they are selected.

Purple

Purple text indicates an association with the Northumbrian aelda recension of Cædmon’s Hymn. The colour is used particularly with manuscript sigla in the introductory chapters and textual apparatus.

Teal

Teal text indicates an association with the Northumbrian eordu recension of Cædmon’s Hymn. The colour is used particularly with manuscript sigla in the introductory chapters and textual apparatus.

Teal is also used in diplomatic transcriptions to indicate text that is physically damaged or illegible.

Green

Green text indicates an association with the West-Saxon eorðan recension of Cædmon’s Hymn. The colour is used particularly with manuscript sigla in the introductory chapters and textual apparatus.

Olive

Olive text indicates an association with the West-Saxon eorðe recension of Cædmon’s Hymn. The colour is used particularly with manuscript sigla in the introductory chapters and textual apparatus.

Navy

Navy text indicates an association with the West-Saxon ylda recension of Cædmon’s Hymn. The colour is used particularly with manuscript sigla in the introductory chapters and textual apparatus.

Red

Red text indicates the presence of a correction.

Encoding

§ ii.9    Cædmon’s Hymn: a multimedia edition and archive is encoded in SGML following the TEI P4 standard for the encoding of text (Sperberg-McQueen and Burnard 2001).[2] Encoding is for the most part strictly conformant with this standard. The project uses one element not found in the P4 guidelines (<EMEND>), adjusts the content models of three elements (<CELL>, <CORR>, and <W>), and adds attributes to one element (<W>). The edition is also noteworthy for its use of three elements: the TEI elements <CORR> and <RDGGRP>, and the non-TEI element <EMEND>. A final subsection discusses the encoding of primary sources in this project.

New elements

§ ii.10    Cædmon’s Hymn: a multimedia edition and archive adds one element to the TEI standard: <EMEND>. The new element is identical in content, attributes, and syntax to the project’s modified form of the standard TEI element <CORR>. The main difference is conceptual. In this work, <CORR> is used to encode physical alterations to the text as it appears in a source document, while <EMEND> is used to encode alterations and corrections that do not involve physically altering a source document. An example of a correction is the scribal alteration of ƿero to ƿera in O. An example of an emendation is the replacement, in the critical text, of the nonsensical reading tida, found in all manuscripts of the West-Saxon ylda recension, with teode.

Changes in content model

§ ii.11    Cædmon’s Hymn: a multimedia edition and archive alters the content models of three elements: <CELL>, <CORR>, and <W>. In the case of <W>, this change also involves a change in attributes.

<CELL>

§ ii.12    The content model of <CELL> has been modified to allow specialPara content (specifically lines and paragraphs).

<CORR>

§ ii.13    The content model of <CORR> has been modified to exclude character data (#PCDATA); the project-specific element <EMEND> shares this model. This change is in keeping with the syntax for encoding corrections and emendations described below (Syntax).

<W>

§ ii.14    The content model of <W> has been modified to allow the inclusion of text-critical elements such as <DAMAGE>, <CORR>, and <UNCLEAR>. This allows the encoding of words in diplomatic transcriptions, in the textual apparatus, and in citations from individual manuscripts in the introductory chapters. <W> elements found in the main text of the critical editions are encoded in conformance with the TEI P4 content model. Two new attributes, basedOn and copiedFrom, have also been added to the element. These are discussed below.

Attributes

§ ii.15    The attributes basedOn and copiedFrom have been added to <W>. Both take IDREF values. basedOn is used to refer to the source of an editorially modified form. Modification may involve the removal of diplomatic information, changes in capitalisation, or similar minor adjustments. Most readings of the Northumbrian aelda recension, for example, are derived from M. The association between the editorial and witness readings is recorded by placing the ID of the relevant manuscript form on the basedOn attribute of the editorial reading. copiedFrom is used to indicate identity with a form used elsewhere in the project. Citations in the introductory chapters from individual witnesses, readings, or apparatus entries, for example, always carry the ID of the source reading on the copiedFrom attribute.
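These two attributes let software trace any editorial or cited form back to its source. The following sketch uses Python's standard `xml.etree.ElementTree` module. It assumes an XML rendering of the SGML with lower-case element names, the attribute names exactly as given above, and an `id` attribute on every referenced <W>. The identifiers in the sample (`M.1`, `ed.1`, `ch4.1`) are hypothetical. The function reports two kinds of problem: references to missing IDs, and any copiedFrom form whose text differs from its source. It checks text only for copiedFrom, because copiedFrom asserts identity while basedOn permits minor editorial changes.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_word_links(root: ET.Element) -> List[str]:
    """Report dangling basedOn/copiedFrom references and non-identical copies."""
    # Index every element carrying an id so IDREF values can be resolved.
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    problems = []
    for w in root.iter("w"):
        for attr in ("basedOn", "copiedFrom"):
            ref = w.get(attr)
            if ref is None:
                continue
            source = by_id.get(ref)
            if source is None:
                problems.append(f"{w.get('id')}: {attr} points to missing id {ref}")
            elif attr == "copiedFrom":
                # copiedFrom asserts identity, so the text must match exactly;
                # basedOn allows minor editorial changes and is not compared.
                if "".join(w.itertext()) != "".join(source.itertext()):
                    problems.append(f"{w.get('id')}: text differs from {ref}")
    return problems


sample = ET.fromstring(
    "<text>"
    '<w id="M.1">Nu</w>'                     # witness form (hypothetical id)
    '<w id="ed.1" basedOn="M.1">Nu</w>'      # editorial form derived from M
    '<w id="ch4.1" copiedFrom="M.1">Nu</w>'  # citation in an introductory chapter
    "</text>"
)
print(check_word_links(sample))  # []
```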

Syntax

Element Nesting

§ ii.16    Three elements in Cædmon’s Hymn: a multimedia edition and archive always contain nested elements: <CORR>, <EMEND>, and <RDGGRP>. In the case of the first two, this nesting involves a change in content model; in the case of the third, it makes mandatory one of the syntaxes the TEI already permits.

<CORR>, <EMEND>

§ ii.17    The TEI element <CORR> and its project-specific cousin <EMEND> are used in a fashion at variance with that suggested by the TEI guidelines. In Cædmon’s Hymn: a multimedia edition and archive, corrections and emendations are understood as processes by which an original form is altered by deletion and/or addition. For this reason, <CORR> and <EMEND> tags always contain at least one <ADD> or <DEL> element. Thus, for example, the O reading ƿero{a} is encoded as follows (attributes have been simplified for demonstration purposes):

&#x01BF;er<corr><del rend="overwriting">o</del><add place="overwriting">a</add></corr>

The emended West-Saxon ylda recension reading t[eo]d[e], similarly, is encoded as follows (attributes have been simplified for demonstration purposes):

t<emend><del>i</del><add>eo</add></emend>d<emend><del>a</del><add>e</add></emend>

§ ii.18    This usage is an adaptation of the recommendations for the encoding of substitution given in Sperberg-McQueen and Burnard 2001 § 18.1.5. It may indeed represent an improvement, since nesting <DEL> and <ADD> within <CORR> or <EMEND> makes their relationship explicit (cf. § 18.1.5) and anticipates some aspects of the proposed (P5) <CHOICE> element. A potential problem with this usage, however, is the change in scope. In the TEI guidelines, <CORR> is understood as a “mirror image” of <SIC> and as having approximately the same scope as <ADD> and <DEL>. By treating <CORR> as a process of addition and deletion rather than a state of correction, the modified syntax in this edition removes the equivalence with <SIC>. By nesting <ADD> and <DEL>, the project also requires the elements to differ in scope: <CORR> (and <EMEND>) describe a process of correction or emendation, while <ADD> and <DEL> describe an actual state of addition or deletion.
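Because every <CORR> or <EMEND> in this scheme wraps its <DEL> and <ADD> children, both the earlier and the later state of a form can be recovered mechanically. Dropping additions gives the original reading; dropping deletions gives the corrected or emended one. The Python sketch below illustrates this with the standard `xml.etree.ElementTree` module. It assumes an XML version of the two examples in § ii.17, with whitespace between tags removed and each example wrapped in a hypothetical <root> element so that it parses as a single document. The `pre`/`post` state labels are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def reading(elem: ET.Element, state: str) -> str:
    """Return the text of elem before ("pre") or after ("post") correction.

    <del> content survives only in the "pre" state and <add> content only in
    the "post" state; all other text, including tails, is kept in both.
    """
    parts = []
    skip = (elem.tag == "del" and state == "post") or (
        elem.tag == "add" and state == "pre"
    )
    if not skip:
        parts.append(elem.text or "")
        for child in elem:
            parts.append(reading(child, state))
    # The tail follows the element inside its parent, so it is never skipped.
    parts.append(elem.tail or "")
    return "".join(parts)


# Correction in O: wynn (U+01BF) + "er", with <o> overwritten by <a>.
corr = ET.fromstring(
    '<root>&#x01BF;er<corr><del rend="overwriting">o</del>'
    '<add place="overwriting">a</add></corr></root>'
)
# Emendation of the West-Saxon ylda reading tida.
emend = ET.fromstring(
    "<root>t<emend><del>i</del><add>eo</add></emend>"
    "d<emend><del>a</del><add>e</add></emend></root>"
)

print(reading(corr, "pre"), reading(corr, "post"))    # ƿero ƿera
print(reading(emend, "pre"), reading(emend, "post"))  # tida teode
```

The same traversal serves <CORR> and <EMEND> alike. This reflects the point made above that the two elements share one syntax and differ only in what they describe.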

Readings and reading groups

§ ii.19    With one exception, every <RDG> is nested three deep in <RDGGRP> elements. The exception is the parallel witness apparatus, where <RDG> elements are collected in a single <RDGGRP>. Each level of reading group corresponds to one kind of bracket in the Analytic All witnesses view discussed above. The apparatus for or discussed above is encoded in this way in the example file ex4b1.htm.
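The nesting rule lends itself to a simple automated check. The sketch below, again using Python's `xml.etree.ElementTree`, counts the <RDGGRP> ancestors of every <RDG> in an apparatus entry. It reports any reading that is not at the expected depth: three in general, one in the parallel witness apparatus. The sample entry is a hypothetical, heavily simplified reconstruction for the lemma Nu, not the edition's actual markup. It assumes XML element names `app`, `rdgGrp`, and `rdg`, and a `wit` attribute holding the siglum.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def rdg_depths(app: ET.Element) -> List[Tuple[str, int]]:
    """Return (witness, number of enclosing rdgGrp elements) for each rdg."""
    found = []

    def walk(elem: ET.Element, depth: int) -> None:
        for child in elem:
            if child.tag == "rdgGrp":
                walk(child, depth + 1)
            elif child.tag == "rdg":
                found.append((child.get("wit"), depth))
            else:
                walk(child, depth)  # other wrappers do not change the depth

    walk(app, 0)
    return found


def misnested(app: ET.Element, expected: int = 3) -> List[Tuple[str, int]]:
    """List readings whose nesting depth differs from the expected one."""
    return [(wit, depth) for wit, depth in rdg_depths(app) if depth != expected]


# Hypothetical, simplified entry for the lemma Nu.
entry = ET.fromstring(
    "<app><rdgGrp><rdgGrp><rdgGrp>"
    '<rdg wit="M">Nu</rdg><rdg wit="P">Nu</rdg>'
    "</rdgGrp></rdgGrp></rdgGrp></app>"
)
print(rdg_depths(entry))  # [('M', 3), ('P', 3)]
print(misnested(entry))   # []
```

For an entry in the parallel witness apparatus, the same check would be called with `expected=1`.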


Notes

[1]In keeping with accessibility guidelines, colour is used on the CD-ROM as an enhancement only: all information conveyed by colour is also encoded on the title attribute of elements in the XHTML output.

[2]The project was originally encoded according to the P3 standard. It has since been corrected and validated against the P4 DTD.