When everyone’s super… On gaming the system

tags: digital humanities, history, peer review, scholarly publishing, universities
note: first published on the dpod blog
Syndrome: Oh, I’m real. Real enough to defeat you! And I did it without your precious gifts, your oh-so-special powers. I’ll give them heroics. I’ll give them the most spectacular heroics the world has ever seen! And when I’m old and I’ve had my fun, I’ll sell my inventions so that everyone can have powers. Everyone can be super! And when everyone’s super… [chuckles evilly] no one will be.
Here’s a funny little story about how a highly specialised journal gamed journal impact measurements:
The Swiss journal Folia Phoniatrica et Logopaedica has a good reputation among voice researchers but, with an impact factor of 0.655 in 2007, publication in it was unlikely to bring honour or grant money to the authors’ institutions.
Now two investigators, one Dutch and one Czech, have taken on the system and fought back. They published a paper called ‘Reaction of Folia Phoniatrica et Logopaedica on the current trend of impact factor measures’ (H. K. Schutte and J. G. Švec Folia Phoniatr. Logop. 59, 281–285; 2007). This cited all the papers published in the journal in the previous two years. As ‘impact factor’ is defined as the number of citations to articles in a journal in the past two years, divided by the total number of papers published in that journal over the same period, their strategy dramatically increased Folia’s impact factor this year to 1.439.
In the ‘rehabilitation’ category, shared with 26 other journals, Folia jumped from position 22 to position 13.
—Tomáš Opatrný, “Playing the system to give low-impact journal more clout.” Nature 455, 167 (11 September 2008).
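The arithmetic behind the definition quoted above is worth sketching out (the counts below are hypothetical; Opatrný does not report the journal’s raw figures). The key point is that a single article citing every paper from the previous two years raises the impact factor by exactly 1.0, whatever the denominator happens to be:

```python
def impact_factor(citations, papers):
    """Citations this year to a journal's papers from the previous
    two years, divided by the number of papers published in those
    two years."""
    return citations / papers

# Hypothetical counts for illustration:
papers = 100      # papers published in the previous two years
external = 44     # citations from everyone else

before = impact_factor(external, papers)

# One article that cites every one of those papers adds exactly
# `papers` citations, so the impact factor rises by exactly 1.0:
after = impact_factor(external + papers, papers)

print(before, after)
```

Since the self-citing article adds one citation per paper in the denominator, the gain is always precisely one point, which is why a single paper was enough to more than double a sub-1.0 score.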
Assessing (and hence demonstrating) impact is a difficult but important problem in contemporary academia.
For most of the last century, university researchers have been evaluated on their ability to “write something and get it into print… ‘publish or perish’” (as Logan Wilson put it as early as 1942 in The Academic Man: A Study in the Sociology of a Profession, one of the first print citations of the term).
As you might expect, the development of a reward system built on publication led to a general increase in number of publications. Studies of science publication suggest a growth rate in the number of scientific articles and journals of between 2 and 5% per year since 1907 (a rate that leads to doubling roughly every 15 years). There is also evidence for a particularly marked rise in numbers after the 1950s.
This kind of growth vitiates the original point of the metric. If everybody publishes all the time, then the simple fact of publication is no longer sufficient as a proxy for excellence. You could count the sheer number of publications—a measure that is in fact widely used in popular contexts to imply productivity—were it not so obviously open to abuse: unless you institute some kind of control over the type and quality of publication, a system that simply counts publications will lead inevitably to an increase in number, and a corresponding decrease in quality, originality, and length.
It is perhaps for this reason that modern peer review systems began to be institutionalised in the course of the second half of the last century. In fact, while peer review is now probably understood to be the sine qua non of university research, and while it is possible to trace sporadic examples of activity resembling peer review back into the classical period, peer review in its modern form really only began to take shape in the period from the 1940s to the 1970s. Major scientific journals, including Science and The Journal of the American Medical Association, for example, began to make systematic use of external reviewers only in the 1940s, partially as an apparent response to the growing number and specialisation of submissions.
As you might expect, the peer review/reward system has itself been gamed. In the same way that a reward system built on counting publications leads inevitably to an increase in the number of publications, a reward system built on counting peer-reviewed publications leads, inevitably, to an increase in the number of peer-reviewed publications… and in the size and number of the journals that publish them.
Journal impact measurements are a controversial response to the unsurprising fact that peer review has also become an insufficient proxy for excellence. It is still relatively early days in this area (though less so in the natural sciences) and there is as yet no complete consensus as to how impact should be quantified. As a result, the measures can still take many forms, from lists of ranked journals, to citation counts, to circulation and aggregation statistics, to, in the case of on-line journals, even more difficult-to-interpret statistics such as bounce and exit rates.
Regardless of how the impact factor debate shakes out, however, it is only a matter of time until it too is gamed. Indeed, as the example of Folia Phoniatrica et Logopaedica suggests, it may not even be a matter of time. If you count citations, researchers will start ensuring they get cited. If you rank journals, they will ensure their journals fit your ranking criteria. If you privilege aggregation, the aggregators will be flooded with candidates for aggregation. And it is not clear that commercial understandings of good web analytics are really appropriate for scholarly and scientific publishing.
But the Folia Phoniatrica et Logopaedica example is also interesting because I’m not sure it is a bad thing. I can’t independently assess Opatrný’s claim that the journal is well respected though faring badly in impact measurements, but it wouldn’t surprise me if he were right. And the fact that a single researcher in a single article was able to more than double his journal’s impact score simply by citing every paper published in the journal in the previous two years leaves me… quite happy for him. I doubt there are many people who would consider the article cited by Opatrný to be in some way fraudulent. Instead, I suspect most of us consider it evidence (at best) that there are still some bugs in the system and (at worst) of a successful reductio ad absurdum–similar in a certain sense to Alan Sokal’s submission to Social Text.
None of this means that impact metrics are an intrinsically bad thing. Or that peer review isn’t good. Or that researchers shouldn’t be expected to publish. In fact, in many ways, the introduction of these various metrics, and the emphasis they receive in academia, is very good. Peer review has become almost fully institutionalised in the humanities in the course of my career. When I was a graduate student in the early 1990s, most journals I submitted to did not have a formal explanation of their review policies and many were probably not, strictly speaking, peer reviewed. But it was difficult to tell and nobody I knew even attempted to distinguish publications on their CVs on the basis of whether or not they were peer reviewed. We were taught to distinguish publications (and the primary metric was still number of publications) on the basis of genre: you separated reviews from encyclopedia entries from notes from lengthy articles. A review didn’t count for much, even if we could have shown it was peer reviewed, and a lengthy article in what “everybody knew” to be a top journal counted for a lot, whether it was peer reviewed or not.
By the time I was department chair, 10 years later, faculty members were presenting me with CVs that distinguished output on the basis of peer review status. In these cases, genre was less important than peer review status. Reviews that were peer-reviewed were listed above articles that weren’t, and journals began being quite explicit about their reviewing policies. The journal I helped found, Digital Medievalist, began from its first issue with what we described as “ostentatious peer review”: we named the referees who recommended acceptance on every article, partially as a way of borrowing their prestige for what we thought was, at the time, a fairly daring experiment in open access publication.
But we did this also because we thought (and think) that peer review is a good thing. My peer reviewed articles are, in almost every case, without a doubt better written and especially better and more carefully argued than my non-peer-reviewed articles. I’ve had stupid comments from referees (though none as stupid as seems to be the norm on grant applications), but there is only one case I can think of where I really couldn’t see how satisfying what the referee wanted wouldn’t improve things.
And the same is true for publication frequency. On the whole, my experience is that people who publish more (within a given discipline) also tend to publish better. I don’t publish too badly for somebody in my discipline. But most of the people who publish more than me in that same discipline are people I’d like to emulate. It is possible to game publication frequency; but on the whole, even the people who (I think) game it are among our most productive and most interesting scholars anyway: they’d still be interesting and productive even if they weren’t good at spinning material for one article into three.
So what does it mean that Schutte and Švec were able to game the impact measure of their journal with such apparent ease? And what should we say in response to the great uproar (much of it in my view well-founded) about the introduction of journal ranking lists by the ESF and Australian governments in recent years? Obviously some journals simply are better than others–more prestigious, better edited, more influential, containing more important papers. And it is difficult to see how frequency of citation is a bad thing, even if its absence is not necessarily evidence something is not good or not important. I would still rather have a heavily cited article in the PMLA than an article nobody read in a journal nobody has ever heard of.
Perhaps the most important thing is that it suggests, as Barbossa says to Miss Turner in Pirates of the Caribbean concerning the “Pirates’ Code,” that these kinds of metrics should really be considered “more what you’d call ‘guidelines’ than actual rules.” Journals (and articles) that have a high impact factor, lots of citations, and are heavily read are probably to be celebrated. But impact, citations, and subscription are not in themselves sufficient proxies for quality: we should expect equally good articles, journals, and scholars to exist with lower numbers in all these areas. And more importantly, we should expect that any quantifiable criteria we do establish will almost immediately be gamed by researchers in the field: most people with PhD-level research positions got where they are, after all, because they were pretty good at producing what examiners wanted to hear.
The real issue, then, is that metrics like “impact” or “peer review” or even “quantity” are attempts to use quantitative values as substitutes for qualitative assessment. The only real way of assessing quality is through qualitative assessment: that is to say, by assessing a work on its own merits in relation to the goals it sets itself in terms of audience, impact, and subject matter, including the reasonableness of these goals. An article by an author who is not famous, in an obscure field, in an on-line journal that has no subscribers and is not frequently cited, may or may not represent poor quality work–in much the same way as might a frequently cited article in a popular field, by a famous academic, in the journal of the main scholarly society in a discipline. What is (or should be) important to the assessor is how reasonably each author has defined his or her goals and how well the resulting work has done in relation to those goals.
And this is where academics’ ability to game any other system becomes a virtue. Since there is no single metric we can create that researchers as a group will not figure out how to exploit (and in short order), we should accept that we will simply never be able to propose a quantitative measurement for assessing intrinsic quality. What we can rely on, however, is that researchers will, on the whole, try to present their work in its best light. By asking researchers to explain how their work can best be assessed, and being willing to evaluate both that explanation and the degree to which the work meets the proposed criteria, we can find a way of comparatively evaluating excellence. Journals, articles, and researchers that define, and then meet or exceed, reasonable targets for their disciplines and types of work are excellent. Those that don’t, aren’t.
And in the meantime, we’ll develop far more innovative measurements of quality.
Posted: Wednesday May 23, 2012. 19:29.
Last modified: Wednesday May 23, 2012. 21:15.
An Anglo-Saxon Timeline

tags: anglo-saxon studies, computers, history, kings, medieval studies, students, study tips, timelines, xml
The following link is to an experiment in constructing a timeline of the Anglo-Saxon period: http://people.uleth.ca/~daniel.odonnell/Anglo-Saxon_Kings.xml It is very much a work in progress at the moment. The ultimate goal is a synoptic overview and index that will allow students to click on major events, persons, or cultural artefacts and see how they fit in with other milestones.
At the moment, the chart only includes Kings. And even then still in fairly rough fashion.
Posted: Sunday July 20, 2008. 10:39.
Last modified: Wednesday May 23, 2012. 19:00.
Back to the future: What digital editors can learn from print editorial practice.

tags: computers, database, digital humanities, editorial studies, history, queries, textual studies
A version of this essay was published in Literary and Linguistic Computing.
Digital Editing and Contemporary Textual Studies
The last decade or so has proven to be a heady time for editors of digital editions. With the maturation of the digital medium and its application to an ever increasing variety of cultural objects, digital scholars have been led to consider their theory and practice in fundamental terms (for a recent collection of essays, see Burnard, O’Keeffe, and Unsworth 2006). The questions they have asked have ranged from the nature of the editorial enterprise to issues of academic economics and politics; from problems of textual theory to questions of mise-en-page and navigation: What is an Edition? What kinds of objects can it contain? How should it be used? Must it be critical? Must it have a reading text? How should it be organised and displayed? Can intellectual responsibility be shared among editors and users? Can it be shared across generations of editors and users? While some of these questions clearly are related to earlier debates in print theory and practice, others involve aspects of the production of editions not relevant to or largely taken for granted by previous generations of print-based editors.
The answers that have developed to these questions at times have involved radical departures from earlier norms1. The flexibility inherent to the electronic medium, for example, has encouraged editors to produce editions that users can manipulate interactively, displaying or suppressing different types of readings, annotation, and editorial approaches, or even navigate in rudimentary three-dimensional virtual reality (e.g. Railton 1998-; Foys 2003; O’Donnell 2005a; Reed-Kline 2001; Ó Cróinín nd). The relatively low production, storage, and publication costs associated with digital publication, similarly, have encouraged the development of the archive as the de facto standard of the genre: users of digital editions now expect to have access to all the evidence used by the editors in the construction of their texts (assuming, indeed, that editors actually have provided some kind of mediated text): full text transcriptions, high-quality facsimiles of all known witnesses, and tools for building alternate views of the underlying data (e.g. Kiernan 1999/2003; Robinson 1996). There have been experiments in editing non-textual objects (Foys 2003; Reed-Kline 2001), in producing image-based editions of textual objects (Kiernan 1999/2003), and in recreating digitally aspects of the sensual experience users might have had in consulting the original objects (British Library nd). There have been editions that radically decenter the reading text (e.g. Robinson 1996), and editions that force users to consult their material using an editorially imposed conceit (Reed-Kline 2001). Even elements carried over from traditional print practice have come in for experimentation and redesign: the representation of annotation, glossaries, or textual variation, for example, is rarely the same in any two electronic editions, even in editions published by the same press (see O’Donnell 2005b, § 5)2.
Much of the impetus behind this theoretical and practical experimentation has come from developments in the wider field of textual and editorial scholarship, particularly the work of the book historians, new philologists, and social textual critics who came into prominence in the decade preceding the publication of the earliest modern digital editorial projects (e.g. McKenzie 1984/1999; McGann 1983/1992; Cerquiglini 1989; Nichols 1990; for a review see Greetham 1994, 339-343). Despite significant differences in emphasis and detail, these approaches are united by two main characteristics: a broad interest in the editorial representation of variance as a fundamental feature of textual production, transmission, and reception; and opposition to earlier, intentionalist, approaches that privileged the reconstruction of a hypothetical, usually single, authorial text over the many actual texts used and developed by historical authors, scribes, publishers, readers, and scholars. Working largely before the revolution in Humanities Computing brought on by the development of structural markup languages and the popularity of the Internet, these scholars nevertheless often expressed themselves in technological terms, calling for changes in the way editions were printed and organised (see, for example, the call for a loose leaf edition of Chaucer in Pearsall 1985) or pointing to the then largely incipient promise of the new digital media for representing texts as multiforms (e.g. McGann 1994; Shillingsburg 1996).
Digital Editing and Print Editorial Tradition
A second, complementary, impetus for this experimentation has been the sense that digital editorial practice is, or ought to be, fundamentally different from and even opposed to that of print. This view is found to a greater or lesser extent in both early speculative accounts of the coming revolution (e.g. McGann 1994; the essays collected in Finneran 1996 and Landow and Delaney 1993) and subsequent, more sober and experienced discussions of whether digital practice has lived up to its initial promise (e.g. Robinson 2004, 2005, 2006; Karlsson and Malm 2004). It is characterised both by a sense that many intellectual conventions found in print editions are at their root primarily technological in origin, and by a belief that the new digital media offer what is in effect a tabula rasa upon which digital editors can develop new and better editorial approaches and conventions to accommodate the problems raised by textual theorists of the 1980s and 1990s.
Of course in some cases, this sense that digital practice is different from print is justified. Organisational models such as the Intellectual Commons or Wiki have no easy equivalent in print publication (O’Donnell Forthcoming). Technological advances in our ability to produce, manipulate, and store images cheaply, likewise, have significantly changed what editors and users expect editions to tell them about the primary sources. The ability to present research interactively has opened up rhetorical possibilities for the representation of textual scholarship difficult or impossible to realise in the printed codex.
But the sense that digital practice is fundamentally different from print has also at times been more reactionary than revolutionary. If digital theorists have been quick to recognise the ways in which some aspects of print editorial theory and practice have been influenced by the technological limitations of the printed page, they have also at times been too quick to see other, more intellectually significant aspects of print practice as technological quirks. Textual criticism in its modern form has a history that is now nearly 450 years old (see Greetham 1994, 313); seen more broadly as a desire to produce “better” texts (however “better” is defined at the moment in question), it has a history stretching back to the end of the sixth century BCE and is “the most ancient of scholarly activities in the West” (Greetham 1994, 297). The development of the critical edition over this period has been as much an intellectual as a technological process. While the limitations of the printed page have undoubtedly dictated the form of many features of the traditional critical edition, centuries of refinement—by trial-and-error as well as outright invention—also have produced conventions that transcend the specific medium for which they were developed. In such cases, digital editors may be able to improve upon these conventions by recognising the (often unexpressed) underlying theory and taking advantage of the superior flexibility and interactivity of the digital medium to improve their representation.
The Critical Text in a Digital Age
Perhaps no area of traditional print editorial practice has come in for more practical and theoretical criticism than the provision of synthetic, stereotypically eclectic, reading texts3. Of course this criticism is not solely the result of developments in the digital medium: suspicion of claims to definitiveness and privilege is, after all, perhaps the most characteristic feature of post-structuralist literary theory. It is the case, however, that digital editors have taken to avoiding the critical text with a gusto that far outstrips that of their print colleagues. It is still not unusual to find a print edition with some kind of critical text; the provision of similarly critical texts in digital editions is far less common. While most digital projects do provide some kind of top-level reading text, few make any strong claims about this text’s definitiveness. More commonly, as in the early ground breaking editions of the Canterbury Tales Project (CTP), the intention of the guide text is, at best, to provide readers with some way of organising the diversity without making any direct claim to authority (Robinson nd):
We began… work [on the CTP] with the intention of trying to recreate a better reading text of the Canterbury Tales. As the work progressed, our aims have changed. Rather than trying to create a better reading text, we now see our aim as helping readers to read these many texts. Thus from what we provide, readers can read the transcripts, examine the manuscripts behind the transcripts, see what different readings are available at any one word, and determine the significance of a particular reading occurring in a particular group of manuscripts. Perhaps this aim is less grand than making a definitive text; but it may also be more useful.
There are some exceptions to this general tendency—both in the form of digital editions that are focussed around the provision of editorially mediated critical texts (e.g. McGillivray 1997; O’Donnell 2005a) and projects, such as the Piers Plowman Electronic Archive (PPEA), that hope ultimately to derive such texts from material collected in their archives. But even here I think it is fair to say that the provision of a synthetic critical text is not what most digital editors consider to be the really interesting thing about their projects. What distinguishes the computer from the codex and makes digital editing such an exciting enterprise is precisely the ability the new medium gives us for collecting, cataloguing, and navigating massive amounts of raw information: transcriptions of every witness, collations of every textual difference, facsimiles of every page of every primary source. Even when the ultimate goal is the production of a critically mediated text, the ability to archive remains distracting4.
In some areas of study, this emphasis on collection over synthesis is perhaps not a bad thing. Texts like Piers Plowman and the Canterbury Tales have such complex textual histories that they rarely have been archived in any form useful to the average scholar; in such cases, indeed, the historical tendency—seen from our post-structuralist perspective—has been towards over-synthesis. In these cases, the most popular previous print editions were put together by editors with strong ideas about the nature of the textual history and/or authorial intentions of the works in question. Their textual histories, too, have tended to be too complex for easy presentation in print format (e.g. Manley and Rickert 1940). Readers with only a passing interest in these texts’ textual history have been encouraged implicitly or explicitly to leave the question in the hands of experts.
The area in which I work, Old English textual studies, has not suffered from this tendency in recent memory, however. Editions of Old English texts historically have tended to be under- rather than over-determined, even in print (Sisam 1993; Lapidge 1994, 1991). In most cases, this is excused by the paucity of surviving witnesses. Most Old English poems (about 97% of the known canon) survive in unique manuscripts (O’Donnell 1996a; Jabbour 1968; Sisam 1953). Even when there is more primary material, Anglo-Saxon editors work in a culture that resists attempts at textual synthesis or interpretation, preferring parallel-text or single-witness manuscript editions whenever feasible and limiting editorial interpretation to the expansion of abbreviations, word-division, and metrical layout, or, in student editions, the occasional normalisation of unusual linguistic and orthographic features (Sisam 1953). One result of this is that print practice in Anglo-Saxon studies over the last century or so has anticipated to a great extent many of the aspects that in other periods distinguish digital editions from their print predecessors.
Cædmon’s Hymn: A Case Study
The scholarly history of Cædmon’s Hymn, a text I have recently edited for the Society of Early English and Norse Electronic Texts series (O’Donnell 2005a), is a perfect example of how this tendency manifests itself in Old English studies. Cædmon’s Hymn is the most textually complicated poem of the Anglo-Saxon period, and, for a variety of historical, literary, and scholarly reasons, among the most important: it is probably the first recorded example of sustained poetry in any Germanic language; it is the only Old English poem for which any detailed account of its contemporary reception survives; and it is found in four recensions and twenty-one medieval manuscripts, a textual history which can be matched in numbers, but not complexity, by only one other vernacular Anglo-Saxon poem (the most recent discussion of these issues is O’Donnell 2005a).
The poem also has been well studied. Semi-diplomatic transcriptions of all known witnesses were published in the 1930s (Dobbie 1937)5. Facsimiles of the earliest manuscripts of the poem (dating from the mid-eighth century) have been available from various sources since the beginning of the twentieth century (e.g. Dobiache-Rojdestvensky 1928) and were supplemented in the early 1990s by a complete collection of high-quality black and white photos of all witnesses in Fred C. Robinson and E.G. Stanley’s Old English Poems from Many Sources (1991). Articles and books on the poem’s transmission and textual history have appeared quite regularly for over a hundred years. The poem has been at the centre of most debates about the nature of textual transmission in Anglo-Saxon England since at least the 1950s. Taken together, the result of this activity has been the development of an editorial form and history that resembles contemporary digital practice in everything but its medium of production and dissemination. Indeed, in producing a lightly mediated, witness- and facsimile-based archive, constructed over a number of generations by independent groups of scholars, Cædmon’s Hymn textual criticism even anticipates several recent calls for the development of a new digital model for collective, multi-project and multi-generational editorial work (e.g. Ore 2004; Robinson 2005).
The print scholarly history of the poem anticipates contemporary digital practice in another way as well: until recently, Cædmon’s Hymn had never been the subject of a modern critical textual edition. The last century has seen the publication of a couple of student editions of the poem (e.g. Pope and Fulk 2001; Mitchell and Robinson 2001), and some specialised reconstructions of one of the more corrupt recensions (Cavill 2000, O’Donnell 1996b, Smith 1938/1978, Wuest 1906). But there have been no critical works in the last hundred years that have attempted to encapsulate and transmit in textual form what is actually known about the poem’s transmission and recensional history. The closest thing to a standard edition for most of this time has been a parallel text edition of the Hymn by Elliot Van Kirk Dobbie (1942). Unfortunately, in dividing this text into Northumbrian and West-Saxon dialectal recensions, Dobbie produced an edition that ignored his own previous and never renounced work demonstrating that such dialectal divisions were less important than other distinctions that cut across dialectal lines (Dobbie 1937)6.
The Edition as Repository of Expert Knowledge
The problem with this approach—to Cædmon’s Hymn or any other text—should be clear enough. On the one hand, the poem’s textual history is, by Anglo-Saxon standards, quite complex and the subject of intense debate by professional textual scholars. On the other, the failure until recently to provide any kind of critical text representing the various positions in the debate has all but hidden the significance of this research—and its implications for work on other aspects of the Hymn—from the general reader. Instead of being able to take advantage of the expert knowledge acquired by editors and textual scholars of the poem over the last hundred years, readers of Cædmon’s Hymn have been forced either to go back to the raw materials and construct their own texts over and over again or to rely on a standard edition that misrepresents its own editor’s considered views of the poem’s textual history.
This is not an efficient use of these readers’ time. As Kevin Kiernan has argued, the textual history of Cædmon’s Hymn is not a spectacle for casual observers (Kiernan 1990), and most people who come to study Cædmon’s Hymn are not interested in collating transcriptions, deciphering facsimiles, and weighing options for grouping the surviving witnesses. What they want is to study the poem’s sources and analogues, its composition and reception, its prosody, language, place in the canon, significance in the development of Anglo-Saxon Christianity, or usefulness as an index in discussions of the position of women in Anglo-Saxon society—that is, all the other things we do with texts when we are not studying their transmission. What these readers want—and certainly what I want when I consult an edition of a work I am studying for reasons other than its textual history—is a text that is accurate, readable, and hopefully based on clearly defined and well-explained criteria. They want, in other words, to be able to take advantage of the expert knowledge of those responsible for putting together the text they are consulting. If they don’t like what they see, or if the approach taken is not what they need for their research, then they may try to find an edition that is better suited to their particular needs. But they will not—except in extreme cases I suspect—actually want to duplicate the effort required to put together a top-quality edition.
The Efficiency of Print Editorial Tradition
The failure of the print editors of Cædmon’s Hymn over the last hundred years to provide a critical-editorial account of their actual knowledge of the poem is very much an exception that proves the rule. For in anticipating digital approaches to textual criticism and editorial practice, textual scholars of Cædmon’s Hymn have, ironically, done a much poorer job of supplying readers with information about their text than the majority of their print-based colleagues have done for texts in other periods.
This is because, as we shall see, the dissemination of expert knowledge is something print-based editors are generally very good at. At a conceptual level, the approaches print editors have developed over the last several hundred years to arranging editorial and bibliographic information in the critical edition form an almost textbook example of the parsimonious organisation of information about texts and witnesses. While there are technological and conventional limitations to the way this information can be used and presented in codex form, digital scholars would be hard pressed to come up with a theoretically more sophisticated or efficient organisation of the underlying data.
Normalisation and Relational Database Design
Demonstrating the efficiency of traditional print practice requires us to make a brief excursion into questions of relational database theory and design.7 In designing a relational database, the goal is to generate a set of relation schemas that allow us to store information without unnecessary redundancy but in a form that is easily retrievable (Silberschatz, Korth, and Sudarshan 2006, 263). The relational model organises information into two-dimensional tables, each row of which represents a relationship among associated bits of information. Complex data commonly requires the use of more than one set of relations or tables. The key thing is to avoid redundancy: in a well-designed relational database, no piece of information that logically follows from any other should appear more than once.8
The process used to eliminate redundancies and dependencies is known as normalisation. When data has been organised so that it is free of all such inefficiencies, it is usually said to be in third normal form. How one goes about doing this can be best seen through an example. The following is an invoice from a hypothetical book store (adapted from Krishna 1992, 32):
|Name:||Jane J. Smith|
|Address:||323 Fifteenth Street S., Lethbridge, Alberta T1K 5X3.|
|ISBN||Author||Title||Price||Quantity||Item total|
|0-670-03151-8||Pinker, Stephen||The Blank Slate: The Modern Denial of Human Nature||$35.00||1||$35.00|
|0-8122-3745-5||Burrus, Virginia||The Sex Lives of Saints: An Erotics of Ancient Hagiography||$25.00||2||$50.00|
|0-7136-0389-5||Dix, Dom Gregory||The Shape of the Liturgy||$55.00||1||$55.00|
|Grand total:||$140.00|
Describing the information in this case in relational terms is a three-step process. The first step involves identifying what it is that is to be included in the data model by extracting database field names from the document’s structure. In the following, parentheses are used to indicate information that can occur more than once on a single invoice:
Invoice: invoice_number, customer_id, customer_name, customer_address, (ISBN, author, title, price, quantity, item_total), grand_total
The second step involves extracting fields that contain repeating information and placing them in a separate table. In this case, the repeating information is the bibliographical detail about the actual books sold (ISBN, author, title, price, quantity, item_total). The connection between this new table and the Invoice table is made explicit through the addition of an invoice_number key that allows each book to be associated with a specific invoice:9
Invoice: invoice_number, customer_id, customer_name, customer_address, grand_total
Invoice_Item: invoice_number, ISBN, author, title, price, quantity, item_total
The final step involves removing functional dependencies within these two tables. In this database, for example, a book’s author, title, and price are functionally dependent on its ISBN: for each ISBN, there is only one possible author, title, and price. Likewise, each customer_id is associated with only one customer_name and customer_address. These dependencies are eliminated by placing the dependent material in two new tables, Customer and Book, which are linked to the rest of the data by the customer_id and ISBN keys respectively.
At this point the data is said to be in third normal form: we have four sets of relations, none of which can be broken down any further:
Invoice: invoice_number, customer_id, grand_total
Invoice_Item: invoice_number, ISBN, quantity, item_total
Customer: customer_id, customer_name, customer_address
Book: ISBN, author, title, price
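The four relations can be realised directly in SQL. The following is a minimal sketch using Python’s standard-library sqlite3 module; the schema and the three books follow the example above, while the invoice number (100), the customer_id, and the abbreviated book titles are invented for illustration:

```python
import sqlite3

# The four relations in third normal form, as derived above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY,
                       customer_name TEXT, customer_address TEXT);
CREATE TABLE Book     (ISBN TEXT PRIMARY KEY, author TEXT,
                       title TEXT, price REAL);
CREATE TABLE Invoice  (invoice_number INTEGER PRIMARY KEY,
                       customer_id INTEGER, grand_total REAL);
CREATE TABLE Invoice_Item (invoice_number INTEGER, ISBN TEXT,
                           quantity INTEGER, item_total REAL);
""")

# Each customer and each book is recorded exactly once, however many
# invoices or invoice lines refer to it.
conn.execute("INSERT INTO Customer VALUES (1, 'Jane J. Smith', "
             "'323 Fifteenth Street S., Lethbridge, Alberta T1K 5X3')")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", [
    ("0-670-03151-8", "Pinker, Stephen", "The Blank Slate", 35.00),
    ("0-8122-3745-5", "Burrus, Virginia", "The Sex Lives of Saints", 25.00),
    ("0-7136-0389-5", "Dix, Dom Gregory", "The Shape of the Liturgy", 55.00),
])
conn.execute("INSERT INTO Invoice VALUES (100, 1, 140.00)")
conn.executemany("INSERT INTO Invoice_Item VALUES (100, ?, ?, ?)", [
    ("0-670-03151-8", 1, 35.00),
    ("0-8122-3745-5", 2, 50.00),
    ("0-7136-0389-5", 1, 55.00),
])

# Joining the tables back together reconstructs the printed invoice.
rows = conn.execute("""
    SELECT b.ISBN, b.author, b.title, b.price, i.quantity, i.item_total
    FROM Invoice_Item i JOIN Book b ON i.ISBN = b.ISBN
    WHERE i.invoice_number = 100
    ORDER BY b.ISBN
""").fetchall()
for row in rows:
    print(row)
```

The original document (the invoice) is thus never stored as such: it is reassembled on demand by a query, which is precisely the property the relational model is designed to provide.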
Normalising Editorial Data
The normalisation process becomes interesting when one applies it to the type of information editors commonly collect about textual witnesses. The following, for example, is a simplified version of a sheet I used to record basic information about each manuscript witness to Cædmon’s Hymn:
|Shelf-Mark:||B1 Cambridge, Corpus Christi College 41|
|Scribe:||Second scribe of the main Old English text.|
|Location:||Copied as part of the main text of the Old English translation of the Historia ecclesiastica (p. 332 [f. 161v], line 6)|
|Recension:||West-Saxon eorðan recension|
|Text:||Nuweherigan sculon
heofonrices weard metodes mihte
&hismod ge þanc weorc wuldor godes|
From the point of view of the database designer, this sheet has what are essentially fields for the manuscript sigil, date, scribe, location, and, of course, the text of the poem in the witness itself, something that can be seen, on analogy with our book store invoice, as itself a repeating set of (largely implicit) information: manuscript forms, normalised readings, grammatical and lexical information, metrical position, relationship to canonical referencing systems, and the like.
As with the invoice from our hypothetical bookstore, it is possible to place this data in normal form. The first step, once again, is to extract the relevant relations from the manuscript sheet and, in this case, from the often unstated expert knowledge an editor typically brings to his or her task. This leads at the very least to the following set of relations:10
Manuscript: shelf_mark, date, scribe, location, (ms_instance, canonical_reading, dictionary_form, grammatical_information, translation)
Extracting the repeating information about individual readings leaves us with two tables linked by the key shelf_mark:
Manuscript: shelf_mark, date, scribe, location
Text: shelf_mark, ms_instance, canonical_reading, dictionary_form, grammatical_information, translation
And placing the material in third normal form generates at least one more:
Manuscript: shelf_mark, date, scribe, location
Text: shelf_mark, ms_instance, canonical_reading
Glossary: canonical_reading, dictionary_form, grammatical_information, translation
At this point, we have organised our data in its most efficient format. With the exception of the shelf_mark and canonical_reading keys, no piece of information is repeated in more than one table, and all functional dependencies have been eliminated. Of course, in real life there would be many more tables, and even then it would probably be impossible—and certainly not cost effective—to treat all editorial knowledge about a given text as normalisable data.
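The editorial relations can be sketched in the same way as the bookstore example, again using Python’s sqlite3. Everything below the schema is illustrative only: the date, the choice of readings from B1, and the glossary entries are simplified stand-ins for real editorial data, not material from the actual edition:

```python
import sqlite3

# The three editorial relations in third normal form.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manuscript (shelf_mark TEXT PRIMARY KEY,
                         date TEXT, scribe TEXT, location TEXT);
CREATE TABLE Text       (shelf_mark TEXT, ms_instance TEXT,
                         canonical_reading TEXT);
CREATE TABLE Glossary   (canonical_reading TEXT PRIMARY KEY,
                         dictionary_form TEXT,
                         grammatical_information TEXT, translation TEXT);
""")

# Illustrative data for the B1 witness (date and glosses are invented).
conn.execute("INSERT INTO Manuscript VALUES ('B1', 's. xi', "
             "'Second scribe of the main Old English text', "
             "'Main text of the OE Historia ecclesiastica, p. 332, line 6')")
conn.executemany("INSERT INTO Text VALUES ('B1', ?, ?)",
                 [("sculon", "sculon"), ("weard", "weard")])
conn.executemany("INSERT INTO Glossary VALUES (?, ?, ?, ?)", [
    ("sculon", "sculan", "verb, pres. ind. pl.", "must"),
    ("weard", "weard", "noun, masc. sg.", "guardian"),
])

# A single join rebuilds a glossed text of the witness: lexical and
# grammatical information is stored once, in the Glossary relation,
# however many witnesses attest the word.
glossed = conn.execute("""
    SELECT t.ms_instance, g.dictionary_form, g.translation
    FROM Text t JOIN Glossary g
      ON t.canonical_reading = g.canonical_reading
    WHERE t.shelf_mark = 'B1'
    ORDER BY t.ms_instance
""").fetchall()
for entry in glossed:
    print(entry)
```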
What is significant about this arrangement, however, is the extent to which our final set of tables reflects the traditional arrangement of information in a stereotypical print edition: a section up front with bibliographic (and other) information about the text and associated witnesses; a section in the middle relating manuscript readings to editorially privileged forms; and a section at the end containing abstract lexical and grammatical information about words in the text. Moreover, although familiarity and the use of narrative can obscure this fact in practice, much of the information contained in these traditional sections of a print edition actually is in implicitly tabular form: in structural terms, a glossary entry is best understood as the functional equivalent of a highly structured list or table row, with information presented in a fixed order from entry to entry. Bibliographical discussions, too, often consist of what are, in effect, highly structured lists that can easily be converted to tabular format: one cell for shelf-mark, another for related bibliography, provenance, contents, and the like.11
Database Views and the Critical Text
This analogy between the traditional arrangement of editorial matter in print editions and normalised data in a relational database seems to break down, however, in one key location: the representation of the abstract text. For while it is possible to see how the other sections of a print critical edition might be rendered in tabular form, the critical text itself—the place where editors present an actual reading as the result of their efforts—is not usually presented in anything resembling the non-hierarchical, tabular form a relational model would lead us to expect. In fact, the essential point of the editorial text—and indeed the reason it comes in for criticism from post-structuralists—is that it eliminates non-hierarchical choice. In constructing a reading text, print editors impose order on the mass of textual evidence by privileging individual readings at each collation point. All other forms—the material that would make up the Text table in a relational database—are either hidden from the reader or relegated, usually only as a sample, to small type at the bottom of the page in the critical apparatus. Although it is the defining feature of the print critical edition, the critical text itself would appear to be the only part that is not directly part of the underlying, and extremely efficient, relational data model developed by print editors through the centuries.
But this does not invalidate my larger argument, because we build databases precisely in order to acquire this ability to select and organise data. If the critical text in a print edition is not actually a database table, it is a database view—that is to say, a “window on the database through which data required for a particular user or application can be accessed” (Krishna 1992, 210). In computer database management systems, views are built by querying the underlying data and building new relations that contain one or more answers from the results. In print editorial practice, editors build critical texts by “querying” their knowledge of the textual data at each collation point in a way that produces a single editorial reading. In this understanding, a typical student edition of a medieval or classical text might be understood as a database view built on the query “select the manuscript or normalised reading at each collation point that most closely matches paradigmatic forms in standard primers.” A modern-spelling edition of Shakespeare can be understood as the view resulting from a database query that instructs the processor to replace Renaissance spellings for the selected forms with their modern equivalents. And an edition like the Kane-Donaldson Piers Plowman can be understood as a view built on the basis of a far more complex query derived from the editors’ research on metre, textual history, and scribal practice. Even editorial emendations are, in this sense, simply the result of a query that requests forms from an unstated “normalised/emended equivalent” column in the editors’ intellectual understanding of the underlying textual evidence: “select readings from the database according to criteria x; if the resulting form is problematic, substitute the form found in the normalised/emended_equivalent column.”12
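The view analogy can be made quite literal in SQL, where a critical text really can be declared as a stored query over the witness readings rather than as a table of its own. The following is a minimal sketch: the witness sigla and the aelda/eorðan readings are illustrative simplifications of the Hymn’s well-known recensional variation, and the Choice table stands in for whatever selection criteria an editor might apply:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Every witness reading at every collation point: the full evidence.
CREATE TABLE Reading (collation_point INTEGER, witness TEXT, form TEXT);

-- The editorial judgement: which witness to privilege at each point.
CREATE TABLE Choice  (collation_point INTEGER PRIMARY KEY, witness TEXT);

-- The critical text is not a fifth table but a view: a stored query
-- that selects exactly one reading per collation point.
CREATE VIEW Critical_Text AS
    SELECT r.collation_point, r.form
    FROM Reading r JOIN Choice c
      ON r.collation_point = c.collation_point
     AND r.witness = c.witness;
""")

conn.executemany("INSERT INTO Reading VALUES (?, ?, ?)", [
    (1, "M", "aelda"), (1, "B1", "eorðan"),
    (2, "M", "barnum"), (2, "B1", "bearnum"),
])
# A hypothetical editor privileging the B1 readings throughout.
conn.executemany("INSERT INTO Choice VALUES (?, ?)",
                 [(1, "B1"), (2, "B1")])

critical_text = " ".join(form for _, form in conn.execute(
    "SELECT collation_point, form FROM Critical_Text "
    "ORDER BY collation_point"))
print(critical_text)
```

Changing the contents of Choice (or replacing it with a more sophisticated selection query) produces a different critical text from the same unchanged evidence, which is exactly the relationship between editorial judgement and witness data described above.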
How Digital Editors can Improve on Print Practice
If this understanding of the critical text and its relationship to the data model underlying print critical practice is correct, then digital editors can almost certainly improve upon it. One obvious place to start might seem to lie in formalising and automating the process by which print editors process and query the data upon which their editions are based. Such an approach would have two main advantages: it would allow us to test others’ editorial approaches by modelling them programmatically; and it would allow us to take advantage of the inherent flexibility of the digital medium by providing users with access to a limitless number of critical texts of the same work. Where, for economic and technological reasons, print editions tend to offer readers only a single critical approach and text, digital editions could offer readers a series of possible approaches and texts built according to various selection criteria. In this approach, users would read texts either by building their own textual queries or by selecting pre-made queries that build views by dynamically modelling the decisions of others—a Kane-Donaldson view of Piers Plowman, perhaps, or a Gabler reading text view of Ulysses.
This is an area of research we should pursue, even though, in actual practice, we are still a long way from being able to build anything but the simplest of texts in this manner. Certain processes can, of course, be automated and even improved upon electronically—we can use computers to collate readings from different witnesses, derive manuscript stemmata, automatically normalise punctuation and spelling, and even model scribal performance (see Ciula 2005; O’Donnell 2005c). And it is easy to see how we might be able to build databases and queries so that we could model human editorial decisions in relatively simple cases—reproducing the flawed dialectal texts of Cædmon’s Hymn discussed above, perhaps, or building simple student editions of small poems.
Unfortunately, such conceptually simple tasks are still at the extreme outer limits of what it is currently possible, let alone economically reasonable, to do. Going beyond this and learning to automate higher-level critical decisions involving cultural, historical, or literary distinctions is beyond the realm of current database design and artificial intelligence, even for people working in fields vastly better funded than textual scholarship. Thus, while it would be a fairly trivial process to generate a reading text based on a single witness from an underlying relational database, automatically building a best-text edition—that is to say, an edition in which a single witness is singled out automatically for reproduction on the basis of some higher-level criteria—is still beyond our current capabilities. Automating other distinctions of the type made every day by human editors—distinguishing between good and bad scribes, assessing difficilior vs. facilior readings, or weighing competing evidence of authorial authorisation—belongs as yet to the realm of science fiction.13
This doesn’t let us off the hook, however. While we are still far from being able truly to automate our digital textual editions, we do need to find some way of incorporating expert knowledge into editions that are becoming ever more complex. The more evidence we cram into our digital editions, the harder it becomes for readers to make anything of them. No two witnesses to any text are equally reliable, authentic, or useful for all purposes at all times. In the absence of a system that can build custom editions in response to naïve queries—“build me a general interest text of Don Juan”, “eliminate unreliable scribes”, or even “build me a student edition”—digital editors still need to provide readers with explicit expert guidance as to how the at times conflicting data in their editions is to be assessed. In some cases, it is possible to use hierarchical and object-oriented data models to encode these human judgements so that they can be applied dynamically (see note 13 below). In other cases, digital editors, like their print predecessors, will simply have to build the critical texts of their editions the old-fashioned way, by hand, or run the risk of failing to pass on the expert knowledge they have built up over years of scholarly engagement with the primary sources.
It is here, however, that digital editors can improve theoretically and practically the most on traditional print practice. For if critical reading texts are, conceptually understood, the equivalent of query-derived database views, then there is no reason why readers of critical editions should not be able to entertain multiple views of the underlying data. Critical texts, in other words—as post-structuralist theory has told us all along—really are neither right nor wrong: they are simply views of a textual history constructed according to different, more or less explicit, selection criteria. In the print world, economic necessity and technological rigidity imposed constraints on the number of different views editors could reasonably present to their readers—and encouraged them, in pre-post-structuralist days, to see the production of a single definitive critical text as the primary purpose of their editions. Digital editors, on the other hand, have the advantage of a medium that much more easily allows the inclusion of multiple critical views, a technology in which the relationship between views and data is widely known and accepted, and a theoretical climate that encourages an attention to variance. If we are still far from the stage at which we can produce critical views of our data using dynamic searches, we are able even now to hard-code such views into our editions in unobtrusive and user-friendly ways.14 By taking advantage of the superior flexibility inherent in our technology and the existence of a formal theory that explains conceptually what print editors appear to have discovered by experience and tradition, we can improve upon print editorial practice by extending it to the point that it begins to subvert the very claims to definitiveness we now find so suspicious.
By being more like our print predecessors, by ensuring that our expert knowledge is carefully and systematically encoded in our texts, we can, ironically, use the digital medium to offer our readers a greater flexibility in how they use our work.
And so in the end, the future of digital editing may lie more in our past than we commonly like to consider. While digital editorial theory has tended to define its project largely in reaction to previous print practice, this approach underestimates both the strength of the foundation we have been given to build upon and the true significance of our new medium. For the exciting thing about digital editing is not that it can do everything differently, but rather that it can do some very important things better. Over the course of the last half millennium, print editorial practice has evolved an extremely efficient intellectual model for the organisation of information about texts and witnesses—even as, in the last fifty years, we have become increasingly suspicious of the claims to definitiveness this organisation was often taken to imply. As digital editors, we can improve upon the work of our predecessors by first of all recognising and formalising the intellectual strength of the traditional editorial model and secondly reconciling it to post-structuralist interest in variation and change by implementing it far more fully and flexibly than print editors themselves could ever imagine. The question we need to answer, then, is not whether we can do things differently but how doing things differently can improve on current practice. But we won’t be able to answer this question until we recognise what current practice already does very, very well.
Bart, Patricia R. 2006. Controlled experimental markup in a TEI-conformant setting. Digital Medievalist 2.1 <http://www.digitalmedievalist.org/article.cfm?RecID=10>.
British Library, nd. Turning the Pages. <http://www.bl.uk/onlinegallery/ttp/ttpbooks.html>.
Cavill, Paul. 2000. The manuscripts of Cædmon’s Hymn. Anglia 118: 499-530.
Cerquiglini, Bernard. 1989. Éloge de la variante: Histoire critique de la philologie. Paris: Éditions de Seuil.
Ciula, Arianna. 2005. Digital palaeography: Using the digital representation of medieval script to support palaeographic analysis. Digital Medievalist 1.1 <http://www.digitalmedievalist.org/article.cfm?RecID=2>
Dobbie, Elliott Van Kirk. 1937. The manuscripts of Cædmon’s Hymn and Bede’s Death Song with a critical text of the Epistola Cuthberti de obitu Bedæ. Columbia University Studies in English and Comparative Literature, 128. New York: Columbia University Press.
───, ed. 1942. The Anglo-Saxon minor poems. The Anglo-Saxon Poetic Records, a Collective Edition, 6. New York: Columbia University Press.
Dobiache-Rojdestvensky, O. 1928. Un manuscrit de Bède à Léningrad. Speculum 3: 314-21.
Finneran, Richard J., ed. 1996. The literary text in the digital age. Ann Arbor: University of Michigan Press.
Foys, Martin K., ed. 2003. The Bayeux Tapestry: Digital Edition. Leicester: SDE.
Greetham, D.C. 1994. Textual Scholarship. New York: Garland.
Jabbour, A. A. 1968. The memorial transmission of Old English poetry: a study of the extant parallel texts. Unpublished PhD dissertation, Duke University.
Karlsson, Lina and Linda Malm. 2004. Revolution or remediation? A study of electronic scholarly editions on the web. HumanIT 7: 1-46.
Kiernan, Kevin S. 1990. Reading Cædmon’s Hymn with someone else’s glosses. Representations 32: 157-74.
───, ed. 1999/2003. The electronic Beowulf. Second edition. London: British Library.
Krishna, S. 1992. Introduction to database and knowledge-base systems. Singapore: World Scientific.
Landow, George P. and Paul Delaney, eds. 1993. The digital word: text-based computing in the humanities. Cambridge, MA, MIT Press.
Lapidge, Michael. 1991. Textual criticism and the literature of Anglo-Saxon England. Bulletin of the John Rylands University Library. 73:17-45.
───. 1994. On the emendation of Old English texts. Pp. 53-67 in: D.G. Scragg and Paul Szarmach (ed.), The editing of Old English: Papers from the 1990 Manchester conference.
Manly, John M. and Edith Rickert. 1940. The text of the Canterbury tales. Chicago: University of Chicago Press.
McGann, Jerome J. 1983/1992. A critique of modern textual criticism. Charlottesville: University of Virginia Press.
───. 1994. Rationale of the hypertext. <http://www/iath.virginia.edu/public/jjm2f/rationale.htm>
McGillivray, Murray, ed. 1997. Geoffrey Chaucer’s Book of the Duchess: A hypertext edition. Calgary: University of Calgary Press.
McKenzie, D.F. 1984/1999. Bibliography and the sociology of texts. Cambridge: Cambridge University Press.
Mitchell, Bruce and Fred C. Robinson, eds. 2001. A guide to Old English. 6th ed. Oxford: Blackwell.
Nichols, Stephen G. Jr., ed. 1990. Speculum 65.
Ó Cróinín, Dáibhí. nd. The Foundations of Irish Culture AD 600-850. Website. <http://www.foundationsirishculture.ie/>.
O’Donnell, Daniel Paul. 1996a. Manuscript Variation in Multiple-Recension Old English Poetic Texts: The Technical Problem and Poetical Art. Unpubl. PhD Dissertation. Yale University.
───. 1996b. A Northumbrian version of “Cædmon’s Hymn” (eordu recension) in Brussels, Bibliothèque Royale MS 8245-57 ff. 62r2-v1: Identification, edition and filiation. Beda venerabilis: Historian, monk and Northumbrian, eds. L. A. J. R. Houwen and A. A. MacDonald. Mediaevalia Groningana, 19. 139-65. Groningen: Egbert Forsten.
───. 2005a. Cædmon’s Hymn: A multimedia study, edition, and archive. SEENET A.8. Cambridge: D.S. Brewer.
───. 2005b. O Captain! My Captain! Using Technology to Guide Readers Through an Electronic Edition. Heroic Age 8. <http://www.heroicage.org/issues/8/em.html>
───. 2005c. The ghost in the machine: Revisiting an old model for the dynamic generation of digital editions. HumanIT 8 (2005): 51-71.
───. Forthcoming. If I were “You”: How Academics Can Stop Worrying and Learn to Love “the Encyclopedia that Anyone Can Edit.” Heroic Age 10.
Ore, Espen S. 2004. Monkey Business—or What is an Edition? Literary and Linguistic Computing 19: 35-44.
Pearsall, Derek. 1985. Editing medieval texts. Pp. 92-106 in Textual criticism and literary interpretation. Ed. Jerome J. McGann. Chicago: U Chicago.
Pope, John C. and R. D. Fulk, eds. 2001. Eight Old English poems. 3rd ed. New York: W. W. Norton.
Railton, Stephen, ed. 1998-. Uncle Tom’s Cabin and American Culture. Charlottesville: University of Virginia. Institute for Advanced Technology in the Humanities. <http://www.iath.virginia.edu/utc/>.
Reed Kline, Naomi, ed. 2001. A Wheel of Memory: The Hereford Mappamundi. Ann Arbor: University of Michigan Press
Robinson, Fred C. and E. G. Stanley, eds. 1991. Old English verse texts from many sources: a comprehensive collection. Early English Manuscripts in Facsimile, 23. Copenhagen: Rosenkilde & Bagger.
Robinson, Peter. nd. New Methods of Editing, Exploring, and Reading the Canterbury Tales. <http://www.cta.dmu.ac.uk/projects/ctp/desc2.html>.
───, ed. 1996. The Wife of Bath’s Prologue on CD-ROM. Cambridge, Cambridge University Press.
───. 2004. Where are we with electronic scholarly editions, and where do we want to be? Jahrbuch für Computerphilologie Online at <http://computerphilologie.uni-muenchen.de/ejournal.html>. Also available in print: Jahrbuch für Computerphilologie. 123-143.
───. 2005. Current issues in making digital editions of medieval texts—or, do electronic scholarly editions have a future? Digital Medievalist 1.1 <http://www.digitalmedievalist.org/article.cfm?RecID=6>
───. 2006. The Canterbury Tales and other medieval texts. In Burnard, O’Brian O’Keefe, and Unsworth. New York: Modern Language Association of America.
Shillingsburg, Peter L. 1996. Electronic editions. Scholarly editing in the computer age: Theory and practice. Third edition.
Silberschatz, Avi, Hank Korth, and S. Sudarshan. 2006. Database system concepts. New York: McGraw-Hill.
Sisam, Kenneth. 1953. Studies in the history of Old English literature. Oxford: Clarendon Press.
Smith, A.H., ed. 1938/1978. Three Northumbrian poems: Cædmon’s Hymn, Bede’s Death Song, and the Leiden Riddle. With a bibliography compiled by M. J. Swanton. Revised ed. Exeter Medieval English Texts. Exeter: University of Exeter Press.
Wuest, Paul. 1906. Zwei neue Handschriften von Cædmons Hymnus. ZfdA 48: 205-26.
1 In a report covering most extant, web-based scholarly editions published in or before 2002, Lina Karlsson and Linda Malm suggest that most digital editors up to that point had made relatively little use of the medium’s distinguishing features: “The conclusion of the study is that web editions seem to reproduce features of the printed media and do not fulfil the potential of the Web to any larger extent” (2004 abstract).
2 As this list suggests, my primary experience with actual practice is with digital editions of medieval texts. Recent theoretical and practical discussions, however, suggest that little difference is to be found in electronic texts covering other periods.
3 Synthetic here is not quite synonymous with eclectic as used to describe the approach of the Greg-Bowers school of textual criticism. Traditionally, an eclectic text is a single, hypothetical, textual reconstruction (usually of the presumed authorial text) based on the assumption of divided authority. In this approach, a copy text is used to supply accidental details of spelling and punctuation and (usually) to serve as a default source for substantive readings that affect the meaning of the abstract artistic work. Readings from this copy text are then corrected by emendation or, preferably, from forms found in other historical witnesses. In this essay, synthetic is used to refer to a critical text that attempts to summarise in textual form an editorial position about an abstract work’s development at some point in its textual history. All eclectic texts are therefore synthetic, but not all synthetic texts are eclectic: a best text (single witness) edition is also synthetic if, as the name implies, an editorial claim is being made about the particular reliability, historical importance, or interest of the text as represented in the chosen witness. A diplomatic transcription, however, is not synthetic: the focus there is on reporting the details of a given witness as accurately as possible. For a primer on basic concepts in textual editing, excluding the concept of the synthetic text as discussed here, see Greetham 1994.
4 It is indeed significant that the PPEA —the most ambitious digital critical edition of a medieval text that I am aware of—is at this stage in its development publishing primarily as an archive: the development of critical texts of the A-, B-, and C-text traditions has been deferred until after the publication of individual edition/facsimiles of the known witnesses (Bart 2006).
5 Transcriptions, editions, facsimiles, and studies mentioned in this paragraph in many cases have been superseded by subsequent work; readers interested in the current state of Cædmon’s Hymn should begin with the bibliography in O’Donnell 2005a.
6 While there is reason to doubt the details of Dobbie’s recensional division, his fundamental conclusion that dialect did not play a crucial role in the poem’s textual development remains undisputed. For recent (competing) discussions of the Hymn’s transmission, see O’Donnell 2005a and Cavill 2000.
7 There are other types of databases, some of which are at times more suited to representation of information encoded in structural markup languages such as XML, and to the type of manipulation common in textual critical studies (see below, note 14). None of these other models, however, express information as parsimoniously as does the relational model (see Silberschatz, Korth, and Sudarshan 2006, 362-365).
8 This is a rough rather than a formal definition. Formally, a well-designed relational database normally should be in either third normal form or Boyce-Codd normal form (BCNF). A relation is said to be in third normal form when a) the domains of all attributes are atomic, and b) all non-key attributes are fully dependent on the key attributes (see Krishna 1992, 37). A relation R is said to be in BCNF if, whenever a non-trivial functional dependency X → A holds in R, X is a superkey for R (Krishna 1992, 38). Other normal forms exist for special kinds of dependencies (Silberschatz, Korth, and Sudarshan 2006, 293-298).
9 In actual fact, the model for a real bookstore invoice would be more complex, since the example here does not take into account the possibility that there might be more than one copy of any ISBN in stock. A real bookstore would need additional tables to allow it to keep track of inventory.
10 In actual practice, the model would be far more complex and include multiple levels of repeating information (words within lines and relationships to canonical reference systems, for example). This example also assumes that the word is the basic unit of collation; while this works well for most Old English poetry, it may not for other types of literature.
11 Of course, critical editions typically contain far more than bibliographic, textual, and lexical/grammatical information. This too can be modelled relationally, however, although it would be quixotic to attempt to account in this essay for the infinite range of possible material one might include in a critical edition. Thus cultural information about a given text or witness is functionally dependent on the specific text or witness in question. Interestingly, the more complex the argumentation becomes, the less complex the underlying data model appears to be: a biographical essay on a text’s author, for example, might take up but a single cell in one of our hypothetical tables.
12 The critical apparatus in most print and many digital editions is itself also usually a view of an implicit textual database, rather than the database itself. Although it usually is presented in quasi-tabular form, it rarely contains a complete accounting for every form in the text’s witness base.
13 This is not to say that it is impossible to use data modelling to account for these distinctions—simply that we are far from being able to derive them arbitrarily from two-dimensional relational databases, however complex. Other data models, such as hierarchical or object-oriented databases, can be used to build such distinctions into the data itself, though this by definition involves the application of expert knowledge. In O’Donnell 2005a, for example, the textual apparatus is encoded as a hierarchical database. This allows readers, in effect, to query the database, searching for relations pre-defined as significant, substantive, or orthographic by the editor. See O’Donnell 2005a, §§ ii.7, ii.19, 7.2-9.
14 In the case of my edition of Cædmon’s Hymn, this takes the form of multiple critical texts and apparatus: several reconstructions of the poem’s archetypal form, and various critical views of the poem’s five main recensions and collations. The criteria used to construct these views are indicated explicitly in the title of each page and explained in detail in the editorial introductions. The individual editions were extracted from an SGML-encoded text using stylesheets—in essence hard-wired database queries reflecting higher-level editorial decisions—but presented to the reader as a series of progressively abstract views. In keeping with the developing standard for digital textual editions, the edition also allows users direct access to the underlying transcriptions and facsimiles upon which it is based. The result is an edition that attempts to combine the best of the digital and print worlds: the archiving function common to most electronic editions (and traditionally the focus of Cædmon’s Hymn textual research in print) with the emphasis on the presentation of expert knowledge characteristic of traditional print editorial practice.
Posted: Friday February 9, 2007. 18:36.
Last modified: Wednesday May 23, 2012. 19:54.
If I were “You”: How Academics Can Stop Worrying and Learn to Love “the Encyclopedia that Anyone Can Edit”
tags: crowd sourcing, digital humanities, history, peer review, social networks, universities, wikipedia
Original Publication Information: Forthcoming in Heroic Age (2007). http://www.heroicage.org/.
Time Magazine and the Participatory Web
So now it is official: Time magazine thinks the Wikipedia is here to stay.
In its December 2006 issue, Time named “You” as its “Person of the Year” (Grossman 2006). But it didn’t really mean “you”—either the pronoun or the person reading this article. It meant “us”—members of the participatory web, the “Web 2.0,” the community behind YouTube, FaceBook, MySpace, WordPress,… and of course the Wikipedia.
In its citation, Time praised its person of the year “for seizing the reins of the global media, for founding and framing the new digital democracy, for working for nothing and beating the pros at their own game.” It suggested that the new web represented
an opportunity to build a new kind of international understanding, not politician to politician, great man to great man, but citizen to citizen, person to person.
Actually, as this suggests, Time didn’t really mean “us” either. At least not if by “us” we mean the professional scholars, journalists, authors, and television producers (that is to say the “pros”) who used to have more-or-less sole responsibility for producing the content “you” (that is to say students, readers, and audiences) consumed. In fact, as the citation makes clear, Time actually sees the new web as a case of “you” against “us”—a rebellion of the amateurs that has come at the expense of the traditional experts:
It’s a story about community and collaboration on a scale never seen before. It’s about the cosmic compendium of knowledge Wikipedia and the million-channel people’s network YouTube and the online metropolis MySpace. It’s about the many wresting power from the few and helping one another for nothing and how that will not only change the world, but also change the way the world changes.
This sense that the participatory web represents a storming of the informational Bastille is shared by many scholars in our dealings with the representative that most closely touches on our professional lives—the Wikipedia, “the encyclopedia that anyone can edit”. University instructors (and even whole departments) commonly forbid students from citing the Wikipedia in their work (Fung 2007). Praising it on an academic listserv is still a reliable way of provoking a fight. Wikipedia founder Jimmy Wales’s suggestion that college students should not cite encyclopaedias, including his own, as a source in their work is gleefully misrepresented in academic trade magazines and blogs (e.g. Wired Campus 2006).
And none of this is having any effect. Eighteen months ago, I had yet to see a citation from the Wikipedia in a student’s essay. This past term, it was rare to find a paper that did not cite it and several of my students asked for permission to research and write new entries for the Wikipedia instead of submitting traditional papers. Other elements of the participatory web mentioned by Time are proving equally successful: politicians, car companies, and Hollywood types now regularly publish material on YouTube or MySpace alongside or in preference to traditional media channels. This past summer, the story of LonelyGirl15 and her doomed relationship to DanielBeast on YouTube became what might be described as the first “hit series” to emerge from the new medium: it attracted millions of viewers on-line, was discussed in major newspapers, and, after it was revealed to be a “hoax” (it was scripted and produced using professional writers, actors, and technicians), its “star” made the requisite appearance on Jay Leno’s Tonight show (see LonelyGirl15).
Why the Participatory Web Works
The participatory web is growing so quickly in popularity because it is proving to be a remarkably robust model. Experiments with the Wikipedia have shown that deliberately planted false information can be corrected within hours (Read 2006). A widely cited comparison of selected articles in the Wikipedia and the Encyclopaedia Britannica by the journal Nature showed that the Wikipedia was far more accurate than many had suspected: in the forty-two articles surveyed, the Wikipedia was found to have an average of four mistakes per article to Britannica’s three (Giles 2005). In fact, even just Googling web pages can produce surprisingly useful research results—a recent study showed that diagnoses of difficult illnesses built by entering information about the symptoms into the search engine Google were accurate 60% of the time (Tang and Ng 2006). In some circumstances, the participatory web may actually prove to be more useful than older methods of professional content creation and dissemination: an article in the Washington Post recently discussed how the United States intelligence community is attempting to use blogs and wikis to improve the speed and quality of information reported to analysts, agents, and decision-makers (Ahrens 2006).
Why Don’t We Like It?
Given this popularity and evidence of effectiveness both as a channel of distribution and a source of reasonably accurate and self-correcting information, the academic community’s opposition to the Wikipedia may come at first as something of a surprise. What is it that makes “us” so dislike “you”?
One answer is that the Wikipedia and other manifestations of the participatory web do not fit very well with contemporary academic models of quality control and professional advancement. Professional academics today expect quality scholarship to be peer-reviewed and to contain a clear account of intellectual responsibility. Authorship attributions are now commonly found with forms of intellectual labour, such as book reviews and encyclopaedia entries, that were published without attribution as little as fifty years ago. Some scholarly journals are naming referees who recommend acceptance; readers for journals that have traditionally used anonymous reviews are frequently asking for their names to be revealed.
This emphasis on review and responsibility has obvious advantages. While peer-review is far from a perfect system—there have been enough hoaxes and frauds across the disciplines in the last twenty years to demonstrate its fallibility—it is surely better than self-publication: I imagine most scholars benefit most of the time from the comments of their readers. In my experience, the attention of good acquisitions and copy-editors invariably improves the quality of a final draft.
Moreover, peer-review and clear attribution have an important role in the academic economy: they are the main (and usually only) currency with which researchers are paid by the presses and journals that publish them. In professional academe, our worth as scholars depends very much on where our work appears. A long article in a top journal or a monograph published at a major university press is evidence that our research is regarded highly. Articles in lesser journals, or lesser forms of dissemination such as book reviews, conference papers, and encyclopaedia entries published under our names, are less important but can still be used as evidence of on-going professional activity (see, for example, Department of English, University of Colorado). While it is not quite money in the bank, this transference of prestige and recognition is an important element in most universities’ systems for determining rank and pay.
An article in the Wikipedia is not going to get anybody tenure. Because they are written collectively and published anonymously, Wikipedia articles do not highlight the specific intellectual contributions of individual contributors—although, in contrast to medical and scientific journals with their perennial problem of “co-authors” who lend names to articles without actually contributing any research (for a discussion of one example, see Bails 2006), it is possible to trace specific intellectual responsibility for all contributions to any entry in the Wikipedia using the history and compare features. And while the Wikipedia does have a formal certification process—articles can be submitted for “peer-review” and selected for “feature” status—this process is optional and not very selective: authors or readers nominate articles for peer-review and certification after they have already been published to the web and the reviewing body consists of simply those interested users who happen to notice that an article has been put forward for review and are willing to comment on the relevant discussion page (see Wikipedia: Peer Review). While this body might include respected experts in the field, it also certainly includes amateurs whose main interest is the Wikipedia itself. It also, almost equally certainly, includes people whose knowledge of the topic in question is ill-informed or belongs to the lunatic fringe.
Why We Can’t Do Anything About It
Given these objections, it is not surprising that some of us in the professional academic community are trying to come up with alternatives—sites that combine desirable aspects of the Wikipedia model (such as its openness to amateur participation) with other elements (such as expert review and editorial control) taken from the world of the professional academy. One new project that attempts to do this is the Citizendium, a project which, beginning as a fork (i.e. branch) of the original Wikipedia, intends to bring it under stricter editorial control: in this project, “Editors”—contributors with advanced degrees—are to be recruited to serve as area experts and help resolve disputes among contributors, while “Constables”—“a set of persons of mature judgment”—will be “specially empowered to enforce rules,… up to and including the ejection of participants from the project” (Citizendium 2006). Other, though far more specialised, attempts to merge the openness of wiki-based software with stricter editorial control and peer-review are also increasingly being proposed by scholarly projects and commercial scholarly publishers.
Few if any of these projects are likely to succeed all that well. While the addition of formal editorial control and an expert-based certification system brings their organisation more closely into line with traditional academic expectations, the economics remain suspect. On the one hand, such projects will find it difficult to generate enough prestige from their peer-review process to compete for the best efforts of professional scholars with more traditional, invitation-only, encyclopaedias such as the Britannica or collections published by the prestigious academic presses. On the other hand, they are also unlikely to be able to match the speed and breadth of content-development found at more free-wheeling, community-developed projects of the participatory web.
In fact, the Wikipedia itself is the successful offshoot of a failed project of exactly this sort. The ancestor of the Wikipedia was the Nupedia, an open-source (though non-wiki) project whose goal was to develop an on-line, peer-reviewed, and professionally written encyclopaedia (see History of Wikipedia, Nupedia, Wikipedia, and Sanger 2005). The editorial board was subject to strict review and most participants were expected to have a Ph.D. or equivalent. The review process involved seven steps: five analogous to those found in traditional academic publishing (assigning to an editor, finding a reader, submitting for review, copy-editing, and final pre-publication approval) and two borrowed from the world of open-source software (a public call for reviews, and a public round of copy-editing). Begun in March 2000, the project ultimately collapsed in September 2003 due to a lack of participation, slow time-to-publication, and conflicts between professional contributors and editors and members of the public in the open review and copy-editing parts of the review process. In its relatively brief existence, the project managed to approve only twenty-four peer-reviewed articles for publication. At its suspension, seventy-four were still in various stages of review. After the project as a whole was suspended, the successful articles were rolled into the Wikipedia. Relatively few can be found in their original form today.
The Wikipedia was originally established as a support structure for the Nupedia’s open processes—as a place where participants in the larger project could collaborate in the creation of material for the “official” project and contribute to its review and copy-editing. The wiki-based project was proposed on the Nupedia’s mailing list on January 2, 2001 and rejected almost immediately by participants for much the same reasons it is frowned upon by professional academics today. It was reestablished as a separate project with its own domain name by January 10. Almost immediately, it began to best its “mother” project: within a year the Wikipedia had published 20,000 articles and existed in eighteen different languages; by the Nupedia’s suspension in the fall of 2003, the Wikipedia had published 152,000 articles in English and was found in twenty-six different languages (Multilingual Statistics). By October 30, 2006, there were over 1.4 million articles in English alone.
The contrasting fates of the Nupedia and the Wikipedia illustrate the central problem that faces any attempt to impose traditional academic structures on projects designed for the participatory web: the strengths and weaknesses of wiki-based and traditional academic models are almost directly out of phase. The Wikipedia has been successful in its quest to develop a free, on-line encyclopaedia of breadth and accuracy comparable to that of its print-based competitors because the barrier to participation is so low. Because anybody can edit the Wikipedia, millions do. And it is their collective contribution of small amounts of effort that enables the growth and success of the overall project.
The Nupedia, on the other hand, failed because its use of traditional academic vetting procedures raised the bar to mass participation by amateurs but did not make the project significantly more attractive to professionals. Academics who need prestige and authorial credit for their professional lives are still going to find it difficult to use participation in the Nupedia (or, now, the Citizendium) on their CVs. Even in fields where collaboration is the norm, scholars need to be able to demonstrate intellectual leadership rather than mere participation. A listing as first author is far more valuable than second or third. And to professional academics, second or third author in a traditional venue is infinitely preferable to membership in a relatively undifferentiated list of contributors to an on-line encyclopaedia to which amateurs contribute. The most prestigious journals, presses, and encyclopaedias all enforce far higher standards of selectivity than the mere evidence of an earned Ph.D. suggested by the Nupedia or the “eligibility” for “a tenure track job” preferred by the Citizendium. No project that hopes to remain open to free collaboration by even a select group of well-informed amateurs or marginally qualified professionals is going to be able to compete directly with already existing, traditional publications for the best original work of professional scholarly researchers, no matter how stringent the review process. But by raising the bar against relatively casual participation by large numbers of amateurs, such projects also risk vitiating the “many hands make light work” principle that underlies the explosive success of the Wikipedia and similar participatory projects.
A New Model of Scholarship: The Wikipedia as Community Service
If I am correct in thinking that attempts to create alternatives to the Wikipedia by combining aspects of traditional academic selectivity and review with a wiki-based open collaboration model are doomed to failure, then the question becomes what “we” (the professional University teachers and researchers who are so suspicious of the original Wikipedia) are to do with what “you” (the amateurs who contribute most of the Wikipedia’s content) produce.
It is clear that we can’t ignore it: no matter what we say in our syllabi, students will continue to use the Wikipedia in their essays and projects—citing it if we allow them to do so, and plagiarising from it if we do not. Just as importantly, the Wikipedia is rapidly becoming the public’s main portal to the subjects we teach and research: popular journalists now regularly cite the Wikipedia in their work and the encyclopaedia commonly shows up on the first page of Google searches. While it may not be in any specific scholar’s individual professional interest to take time away from his or her refereed research in order to contribute to a project that provides so little prestige, it is clearly in our collective interest as a profession to make sure that our disciplines are well represented in the first source to which our students and the broader public turn when they want to find out something about the topics we actually research.
But perhaps this shows us the way forward. Perhaps what we need is to see the Wikipedia and similar participatory sites less as a threat to our way of doing things than as a way of making what we do more visible to the general public. The fictional romance between LonelyGirl15 and DanielBeast on YouTube did not threaten the makers of commercial television. But it did give prominence to a medium that makers of commercial television now use regularly to attract audiences to their professional content in the traditional media. In our case, the Wikipedia is less an alternative to traditional scholarship (except perhaps as this is represented in print encyclopaedias) than a complement—something that can be used to explain, show off, and broaden the appeal of the work we do in our professional lives.
In fact, the important thing about the Wikipedia is that it has been built almost entirely through the efforts of amateurs—that is to say, people who are not paid to conduct research in our disciplines but do so anyway because it is their hobby. While it can certainly be disheartening to see the occasional elementary mistake or outlandish theory in a Wikipedia entry, we should not ignore the fact that the entry itself exists because people were interested enough in what we do to try and imitate it in their spare time. Given the traditional lack of respect shown scholarly research by governments and funding agencies for much of the last century, we should be rejoicing in this demonstration of interest—in much the same way scientists judging a science fair are able to see past the many relatively trivial experiments on display and recognise the event’s importance as a representation of popular interest in what they do.
This recognition of the extent to which the Wikipedia has engaged the imagination of the general public and turned it to the amateur practice of scholarship suggests what I think may prove to be the best way of incorporating it into the lives of professional academics: since the Wikipedia appears unable to serve as a route to professional advancement for intrinsic reasons, perhaps we should begin to see contributions to it by professional scholars as a different type of activity altogether—as a form of community service to be performed by academics in much the same way lawyers are often expected to give back to the public through their pro bono work. A glance at almost any discussion page on the Wikipedia will show that the Wikipedians themselves are aware of the dangers posed to the enterprise by the inclusion of fringe theories, poor research, and contributions by people with insufficient disciplinary expertise. As certified experts who work daily with the secondary and primary research required to construct good Wikipedia entries, we are in a position to contribute to the construction of individual articles in a uniquely positive way by taking the time to help clean up and provide balance to entries in our professional areas of interest. In doing so, we can both materially improve the quality of the Wikipedia and demonstrate the importance of professional scholars to a public whose hobby touches very closely on the work we are paid to do—and whose taxes, by and large, support us.
And who knows, maybe “we” could even join “you” in accepting Time Magazine’s nomination for person of the year.
Ahrens, Frank 2006. “A Wikipedia Of Secrets.” Washington Post. Sunday, November 5: F07. Online edition, URL: http://www.washingtonpost.com/wp_dyn/content/article/2006/11/03/AR2006110302015.html
Bails, Jennifer. 2006. “Schatten’s hand in bogus paper detailed.” Pittsburgh Tribune-Review, January 11. http://www.pittsburghlive.com/x/tribune-review/trib/regional/s_412326.html
Bergstein, Brian. 2007. “Microsoft Offers Cash for Wikipedia Edit.” Washington Post, January 23. http://www.washingtonpost.com/wp-dyn/content/article/2007/01/23/AR2007012301025.html
Citizendium 2006. “Citizendium’s Fundamental Policies.” Citizendium (citation from version 1.4, October 11) http://www.citizendium.org/fundamentals.html
Department of English, University of Colorado. “Department of English guidelines for promotion.” Department Handbook. http://www.colostate.edu/Depts/English/handbook/guidepro.htm
Fung, Brian, 2007. “Wikipedia distresses History Department.” middleburycampus.com. Online. URL: http://media.www.middleburycampus.com/media/storage/paper446/news/2007/01/24/News/Wikipedia.Distresses.History.Department-2670081.shtml
Giles, Jim. 2005. “Internet encyclopaedias go head to head.” Nature 438: 900-901. http://www.nature.com/news/2005/051212/full/438900a.html
Grossman, Lev. 2006. “Time’s Person of the Year: You.” Time. Wednesday, Dec. 13. Online Edition. URL: http://www.time.com/time/magazine/article/0%2C9171%2C1569514%2C00.html .
History of Wikipedia. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=History_of_Wikipedia&oldid=104389205 (accessed January 31, 2007).
Lonelygirl15. Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Lonelygirl15&oldid=104136723 (accessed January 31, 2007).
Multilingual Statistics. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Wikipedia:Multilingual_statistics&oldid=97805501 (accessed February 2, 2007).
Nupedia. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Nupedia&oldid=103617050 (accessed January 31, 2007).
Read, Brock. 2006. “Can Wikipedia Ever Make the Grade?” The Chronicle of Higher Education, October 27. URL: http://chronicle.com/temp/reprint.php?%20id=z6xht2rj60kqmsl8tlq5ltqcshc5y93y
Sanger, Larry J. 2005. “The Early History of Nupedia and Wikipedia: A Memoir.” Part I http://features.slashdot.org/article.pl?sid=05/04/18/164213&tid=95&tid=149&tid=9 Part II: http://features.slashdot.org/article.pl?sid=05/04/19/1746205&tid=95.
Tang, Hangwi and Jennifer Hwee Kwoon Ng. 2006. “Googling for a diagnosis—use of Google as a diagnostic aid: internet based study.” BMJ 333(7570): 1143-1145. URL: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1676146
Wikipedia: Peer Review. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Wikipedia:Peer_review&oldid=104637689 (accessed January 31, 2007).
Wikipedia. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Wikipedia&oldid=104645649 (accessed January 31, 2007).
Wired Campus 2006. “Wikipedia Founder Discourages Academic Use of His Creation.” Chronicle of Higher Education. June 12. URL: http://chronicle.com/wiredcampus/article/1328/wikipedia-founder-discourages-academic-use-of-his-creation
Posted: Friday February 2, 2007. 22:07.
Last modified: Wednesday May 23, 2012. 19:56.
The Doomsday Machine, or, "If you build it, will they still come ten years from now?": What Medievalists working in digital media can do to ensure the longevity of their research
tags: computers, digital humanities, editorial studies, history, information machines, internet, sustainability
Original Publication Information: Heroic Age 7 (2004). http://www.heroicage.org/issues/7/ecolumn.html.
Yes, but the… whole point of the doomsday machine… is lost… if you keep it a secret!
It is, perhaps, the first urban myth of humanities computing: the Case of the Unreadable Doomsday Machine. In 1986, in celebration of the 900th anniversary of William the Conqueror’s original survey of his British territories, the British Broadcasting Corporation (BBC) commissioned a mammoth £2.5 million electronic successor to the Domesday Book. Stored on two 12 inch video laser discs and containing thousands of photographs, maps, texts, and moving images, the Domesday Project was intended to provide a high-tech picture of life in late 20th century Great Britain. The project’s content was reproduced in an innovative early virtual reality environment and engineered using some of the most advanced technology of its day, including specially designed computers, software, and laser disc readers (Finney 1986).
Despite its technical sophistication, however, the Domesday Project was a flop by almost any practical measure. The discs and specialized readers required for accessing the project’s content turned out to be too expensive for the state-funded schools and public libraries that comprised its intended market. The technology used in its production and presentation also never caught on outside the British government and school system: few other groups attempted to emulate the Domesday Project’s approach to collecting and preserving digital material, and no significant market emerged for the specialized computers and hardware necessary for its display (Finney 1986, McKie and Thorpe 2003). In the end, few of the more than one million people who contributed to the project were ever able to see the results of their effort.
The final indignity, however, came in March 2003 when, in a widely circulated story, the British newspaper The Observer reported that the discs had finally become “unreadable”:
16 years after it was created, the £2.5 million BBC Domesday Project has achieved an unexpected and unwelcome status: it is now unreadable.
The special computers developed to play the 12in video discs of text, photographs, maps and archive footage of British life are — quite simply — obsolete.
As a result, no one can access the reams of project information — equivalent to several sets of encyclopedias — that were assembled about the state of the nation in 1986. By contrast, the original Domesday Book — an inventory of eleventh-century England compiled in 1086 by Norman monks — is in fine condition in the Public Record Office, Kew, and can be accessed by anyone who can read and has the right credentials. ‘It is ironic, but the 15-year-old version is unreadable, while the ancient one is still perfectly usable,’ said computer expert Paul Wheatley. ‘We’re lucky Shakespeare didn’t write on an old PC.’ (McKie and Thorpe 2003)
In fact, the situation was not as dire as McKie and Thorpe suggest. For one thing, the project was never actually “unreadable,” only difficult to access: relatively clean copies of the original laser discs still survive, as do a few working examples of the original computer system and disc reader (Garfinkel 2003). For another, the project appears not to depend, ultimately, on the survival of its obsolete hardware. Less than ten months after the publication of the original story in The Observer, indeed, engineers at Camileon, a joint project of the Universities of Leeds and Michigan, were able to reproduce most if not quite all the material preserved on the original 12 inch discs using contemporary computer hardware and software (Camileon 2003a; Garfinkel 2003).
The Domesday Project’s recent history has some valuable, if still contested, lessons for librarians, archivists, and computer scientists (see for example the discussion thread to Garfinkel 2003; also Camileon 2003b). On the one hand, the fact that engineers seem to be on the verge of designing software that will allow for the complete recovery of the project’s original content and environment is encouraging. While it may not yet have proven itself to be as robust as King William’s original survey, the electronic Domesday Project does now at least appear to have been saved for the foreseeable future, even if “foreseeable” in this case may mean simply until the hardware and software supporting the current emulator themselves become obsolete.
On the other hand, however, it cannot be comforting to realise that the Domesday Project required such extensive and expensive restoration measures in the first place less than two decades after its original composition: the discs that the engineers at Camileon have devoted the last ten months to recovering have turned out to have less than 2% of the readable lifespan enjoyed by their eleventh-century predecessor. Even pulp novels and newspapers published on acidic paper at the beginning of the last century have proved more durable under similarly controlled conditions.1 While digital formats, viewed in the short term, do appear to offer a cheap method of preserving, cataloguing, and especially distributing copies of texts and other cultural material, their effectiveness and economic value as a means of long-term preservation have yet to be demonstrated completely.
These are, for the most part, issues for librarians, archivists, curators, computer scientists, and their associations: their solution will almost certainly demand resources, a level of technical knowledge, and perhaps most importantly, a degree of international cooperation far beyond that available to most individual humanities scholars (Keene 2003). In as much as they are responsible for the production of an increasing number of electronic texts and resources, however, humanities scholars do have an interest in ensuring that the physical record of their intellectual labour will outlast their careers. Fortunately there are also some specific lessons to be learned from the Domesday Project that are of immediate use to individual scholars in their day-to-day research and publication.
1. Do not write for specific hardware or software
Many of the preservation problems facing the Domesday Project stem from its heavy reliance on specific proprietary (and often customized) hardware and software. This reliance came about for largely historical reasons. The Domesday Project team was working on a multimedia project of unprecedented scope, before the Internet developed as a significant medium for the dissemination of data.2 In the absence of suitable commercial software and any real industry emphasis on inter-platform compatibility or international standards, they were forced to custom-build or commission most of their own hardware and software. The project was designed to be played from a specially designed Philips video-disc player and displayed using custom-built software that functioned best on a single operating platform: the BBC Master, a now obsolete computer system which, with the related BBC Model B, was at the time far more popular in schools and libraries in the United Kingdom than the competing Macintosh, IBM PC, or long-forgotten Acorn systems.3
With the rise of the internet and the development of well-defined international standard languages such as Standard Generalized Markup Language (SGML), HyperText Markup Language (HTML), eXtensible Markup Language (XML), and Hypermedia/Time-based Structuring Language (HyTime), few contemporary or future digital projects are likely to be as completely committed to a single specific hardware or software system as the Domesday Project. This does not mean, however, that the temptation to write for specific hardware or software has vanished entirely. Different operating systems allow designers to use different, often incompatible, shortcuts for processes such as referring to colour, assigning fonts, or referencing foreign characters (even something as simple as the Old English character thorn can be referred to in incompatible ways on Windows and Macintosh computers). The major internet browsers also all have proprietary extensions and idiosyncratic ways of understanding supposedly standard features of the major internet languages. It is very easy to fall into the trap of adapting one’s encoding to fit the possibilities offered by non-standard extensions, languages, and features of a specific piece of hardware or software.
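The thorn example is easy to demonstrate. The following Python sketch (illustrative only; the codec and entity names are the standard ones) shows that the character has no representation at all in the classic Macintosh encoding, a different single-byte representation under Windows, and stable, platform-independent representations as an HTML character reference or in UTF-8:

```python
import html

thorn = "\u00fe"  # LATIN SMALL LETTER THORN (þ)

# The standard HTML character reference resolves identically everywhere:
assert html.unescape("&thorn;") == thorn

# Platform-specific legacy encodings disagree about the character:
print(thorn.encode("cp1252"))      # Windows Latin encoding: single byte 0xFE
try:
    thorn.encode("mac_roman")      # classic Macintosh encoding: no thorn at all
except UnicodeEncodeError:
    print("thorn is not representable in MacRoman")

# A standard encoding such as UTF-8 sidesteps the problem entirely:
print(thorn.encode("utf-8"))       # the same two bytes on every platform
```

A document that relies on the standard character reference (or a standard encoding) rather than a platform byte value remains legible regardless of the system on which it is later opened.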
The very real dangers of obsolescence this carries with it can be demonstrated by the history of the Netscape layer element, a proprietary extension for positioning and animating blocks of content that was introduced with Netscape Communicator and promoted as part of that browser’s approach to dynamic HTML (Netscape Communications Corporation 1997; see Sahrmann 1998 for a contemporary introduction). The element was never adopted by the other major browsers or incorporated into the emerging international standards, and support for it was eventually dropped even from Netscape’s own products. As with the Domesday Project, however, projects that relied on these proprietary extensions for anything other than the most incidental effects were doomed to early obsolescence: the pages that depended on them now render incorrectly, or not at all, in modern standards-compliant browsers.
2. Maintain a distinction between content and presentation
A second factor promoting the early obsolescence of the Domesday Project was its emphasis on the close integration of content and presentation. The project was conceived of as a multimedia experience and its various components (text, video, maps, statistical information) often acquired meaning from their interaction, juxtaposition, sequencing, and superimposition (Finney 1986, “Using Domesday”; see also Camileon 2003b). In order to preserve the project as a coherent whole, indeed, engineers at Camileon have had to reproduce not only the project’s content but also the look and feel of the specific software environment in which it was intended to be searched and navigated (Camileon 2003b).
Here too the Domesday Project designers were largely victims of history. Their project was a pioneering experiment in multimedia organisation and presentation and put together in the virtual absence of now standard international languages for the design and dissemination of electronic documents and multimedia projects — many of which, indeed, were in their initial stages of development at the time the BBC project went to press.4
More importantly, however, these nascent international standards involved a break with the model of electronic document design and dissemination employed by the Domesday Project designers. Where the Domesday Project might be described as an information machine — a work in which content and presentation are so closely intertwined as to become a single entity — the new standards concentrated on establishing a theoretical separation between content and presentation (see Connolly 1994 for a useful discussion of the distinction between “programmable” and “static” document formats and their implications for document conversion and exchange). This allows both aspects of an electronic document to be described separately and, for the most part, in quite abstract terms which are then left open to interpretation by users in response to their specific needs and resources. It is this flexibility which helped in the initial popularization of the World Wide Web: document designers could present their material in a single standard format and, in contrast to the designers of the Domesday Project, be relatively certain that their work would remain accessible to users working with various software and hardware systems — whether this was the latest version of the new Mosaic browser or some other, slightly older and non-graphical interface like Lynx (see Berners-Lee 1989-1990 for an early summary of the advantages of multi-platform support and a comparison with early multimedia models such as that adopted by the Domesday Project). In recent years, this same flexibility has allowed project designers to accommodate the increasingly large demand for access to internet documents from users of (often very advanced) non-traditional devices: web-activated mobile phones, palm-sized digital assistants, and of course aural screen readers and Braille printers.
In theory, this flexibility also means that where engineers responsible for restoring the Domesday Project have been forced to emulate the original software in order to recreate the BBC designer’s work, future archivists will be able to restore current, standards-based, electronic projects by interpreting the accompanying description of their presentation in a way appropriate to their own contemporary technology. In some cases, indeed, this restoration may not even require the development of any actual computer software: a simple HTML document, properly encoded according to the strictest international standards, should in most cases be understandable to the naked eye even when read from a paper printout or text-only display.
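The recoverability of a strictly encoded HTML document can be illustrated with the Python standard library alone: stripping the markup leaves the readable text intact, with no need for the software environment in which the document was first displayed (the sample document below is invented for illustration):

```python
from html.parser import HTMLParser

# A hypothetical fragment of a standards-based edition:
doc = """<html><head><title>Sample edition</title></head>
<body><h1>C&aelig;dmon's Hymn</h1>
<p>Nu sculon herigean heofonrices weard</p></body></html>"""

class TextOnly(HTMLParser):
    """Collect the readable text, discarding all markup."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

p = TextOnly()
p.feed(doc)
print(" / ".join(p.parts))
```

Even the `&aelig;` character reference survives the process, because it is defined by the HTML standard itself rather than by any particular platform or display program.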
In practice, however, it is still easy to fall into the trap of integrating content and presentation. One common example involves the use of table elements for positioning unrelated or sequential text in parallel “columns” on browser screens (see Chisholm, Vanderheiden, et al. 2000, § 5). From a structural point of view, tables are a device for indicating relations among disparate pieces of information (mileage between various cities, postage prices for different sizes and classes of mail, etc.). Using tables to position columns, document designers imply in formal terms the existence of a logical association between bits of text found in the same row or column — even if the actual rationale for this association is primarily aesthetic. While the layout technique, which depends on the fact that all current graphic-enabled browsers display tables by default in approximately the same fashion, works well on desktop computers, the same trick can produce nonsensical text when rendered on the small screen of a mobile phone, printed by a Braille output device, or read aloud by an aural browser or screen-reader. Just as importantly, this technique too can lead to early obsolescence or other significant problems for future users. Designers of a linguistic corpus based on specific types of pre-existing electronic documents, for example, might be required to devote considerable manual effort to recognising and repairing content arbitrarily and improperly arranged in tabular format for aesthetic reasons.
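The consequence for linear readers can be simulated in a few lines of Python (the two “columns” of text are invented for illustration). A graphical browser shows two tidy columns, but any device that reads the table cell by cell, row by row, interleaves the unrelated texts:

```python
# Two unrelated texts forced into parallel "columns" of a layout table
# (the content is invented for illustration):
rows = [
    ("The Domesday survey was", "Site news: the archive"),
    ("ordered by King William", "was last updated in"),
    ("in 1085-86.",             "December 2003."),
]

# A linear reader (screen reader, Braille device, text-only browser)
# walks the table row by row, cell by cell:
print(" ".join(cell for row in rows for cell in row))
```

The printed result interleaves the two texts into nonsense, which is exactly what an aural browser would read aloud.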
3. Avoid unnecessary technical innovation
A final lesson to be learned from the early obsolescence of the Domesday Project involves the hidden costs of technical innovation. As a pioneering electronic document, the Domesday Project was in many ways an experiment in multimedia production, publication, and preservation. In the absence of obvious predecessors, its designers were forced to develop their own technology, organisational outlines, navigation techniques, and distribution plans (see Finney 1986 and Camileon 2003a for detailed descriptions). The fact that relatively few other projects adopted their proposed solutions to these problems — and that subsequent developments in the field led to a different focus in electronic document design and dissemination — only increased the speed of the project’s obsolescence and the cost and difficulty of its restoration and recovery.
Given the experimental status of this specific project, these were acceptable costs. The Domesday Project was never really intended as a true reference work in any usual sense of the word.5 Although it is full of information about mid-1980s Great Britain, for example, the project has never proved to be an indispensable resource for study of the period. While it was inspired by William the Conqueror’s great inventory of post-conquest Britain, the Domesday Project was, in the end, more an experiment in new media design than an attempt at collecting useful information for the operation of Mrs. Thatcher’s government.
We are now long past the day in which electronic projects can be considered interesting simply because they are electronic. Whether they are accessing a Z39.50 compliant library catalogue, consulting an electronic journal on JSTOR, or accessing an electronic text edition or manuscript facsimile published by an academic press, users of contemporary electronic projects by-and-large are now more likely to be interested in the quality and range of an electronic text’s intellectual content than the novelty of its display, organisation or technological features (Nielsen 2000). The tools, techniques, and languages available to producers of electronic projects, likewise, are now far more standardised and helpful than those available to those responsible for electronic incunabula such as the Domesday Project.
Unfortunately this does not mean that contemporary designers are entirely free of the dangers posed by technological experimentation. The exponential growth of the internet, the increasing emphasis on compliance with international standards, and the simple pace of technological change over the last decade all pose significant challenges to the small budgets and staff of many humanities computing projects. While large projects and well-funded universities can sometimes afford to hire specialized personnel to follow developments in computing design and implementation, freeing other specialists to work on content development, scholars working on digital projects in smaller groups, at less well-funded universities, or on their own often find themselves responsible for both the technological and intellectual components of their work. Anecdotal evidence suggests that such researchers find keeping up with the pace of technological change relatively difficult — particularly when it comes to discovering and implementing standard solutions to common technological problems (Baker, Foys, et al. 2003). If the designers of the Domesday Project courted early obsolescence because their pioneering status forced them to design unique technological solutions to previously unresolved problems, many contemporary humanities projects appear to run the same risk of obsolescence and incompatibility because their inability to easily discover and implement best practice encourages them to continuously invent new solutions to already solved problems (HATII and NINCH 2002, NINCH 2002-2003, Healey 2003, Baker, Foys, et al. 2003 and O’Donnell 2003).
This area of humanities computing has been perhaps the least well served by the developments of the last two decades. While technological changes and the development of well-designed international standards have increased opportunities for contemporary designers to avoid the problems which led to the Domesday Project’s early obsolescence, the absence of a robust system for sharing technological know-how among members of the relevant community has remained a significant impediment to the production of durable, standards-based projects. Fortunately, however, improvements are being made in this area as well. While mailing lists such as humanist-l and tei-l have long facilitated the exchange of information on aspects of electronic project design and implementation, several new initiatives have appeared over the last few years which are more directly aimed at encouraging humanities computing specialists to share their expertise and marshal their common interests. The Text Encoding Initiative (TEI) has recently established a number of Special Interest Groups (SIGs) aimed at establishing community practice in response to specific types of textual encoding problems. Since 1993, the National Initiative for a Networked Cultural Heritage (NINCH) has provided a forum for collaboration and development of best practice among directors and officers of major humanities computing projects. The recently established TAPoR project in Canada and the Arts and Humanities Data Service (AHDS) in the United Kingdom likewise seek to serve as national clearing houses for humanities computing education and tools.
Finally, and aimed more specifically at medievalists, the Digital Medievalist Project (of which I am currently director) is seeking funding to establish a “Community of Practice” for medievalists engaged in the production of digital resources, through which individual scholars and projects will be able to pool skills and practice acquired in the course of their research (see Baker, Foys, et al. 2003). Although we are still in the beginning stages, there is increasing evidence that humanities computing specialists are beginning to recognise the extent to which the discovery of standardised implementations and solutions to common technological problems is likely to provide as significant a boost to the durability of electronic resources as the development of standardised languages and client-side user agents in the late 1980s and early 1990s. We can only benefit from increased cooperation.
The Case of the Unreadable Doomsday Machine makes for good newspaper copy: it pits new technology against old in an information-age version of nineteenth-century races between the horse and the locomotive. Moreover, there is an undeniable irony to be found in the fact that King William’s eleventh-century parchment survey has thus far proven itself to be more durable than the BBC’s 1980s computer program.
But the difficulties faced by the Domesday Project and its conservators are neither necessarily intrinsic to the electronic medium nor necessarily evidence that scholars at work on digital humanities projects have backed the wrong horse in the information race. Many of the problems which led to the Domesday Project’s early obsolescence and expensive restoration can be traced to its experimental nature and the innovative position it occupies in the history of humanities computing. By paying close attention to its example, by learning from its mistakes, and by recognising the fundamental ways in which contemporary humanities computing projects differ from such digital incunabula, scholars can contribute greatly to the likelihood that their current projects will remain accessible long after their authors reach retirement age.
1 See the controversy between Baker 2002 and [Association of Research Libraries] 2001, both of whom agree that even very acidic newsprint can survive “several decades” in carefully controlled environments.
2 The first internet browser, “WorldWideWeb,” was finished by Tim Berners-Lee at CERN (Conseil Européen pour la Recherche Nucléaire) on Christmas Day 1990. The first popular consumer browser able to operate on personal computer systems was the National Center for Supercomputing Applications (NCSA) Mosaic (a precursor to Netscape), which appeared in 1993. See [livinginternet.com] 2003 and Cailliau 1995 for brief histories of the early browser systems. The first internet application, e-mail, was developed in the early 1970s ([www.almark.net] 2003); until the 1990s, its use was restricted largely to university researchers and the U.S. military.
3 Camileon 2003; see McMordie 2003 for a history of the Acorn platform.
4 SGML, the language from which HTML is derived, was developed in the late 1970s and early 1980s but not widely used until the mid-to-late 1980s ([SGML Users’ Group] 1990). HyTime, a multimedia standard, was approved in 1991 ([SGML SIGhyper] 1994).
5 This is the implication of Finney 1986, who stresses the project’s technically innovative nature, rather than its practical usefulness, throughout.
- Association of Research Libraries, Washington DC. “Q and A in response to Nicholson Baker’s Double Fold.” Web Page, September 4, 2001 [accessed December 29, 2003]. Available at: http://www.arl.org/preserv/bakerQA.html.
- Baker, Nicholson. Double fold: libraries and the assault on paper. New York: Vintage Books; 2002.
- Baker, Peter, Martin Foys, Murray McGillivray, Daniel Paul O’Donnell, Roberto Rosselli Del Turco and Elizabeth Solopova. “The Digital Medievalist Project: A Community of Practice for Medievalists working with digital media.” Web Page, September 15, 2003 [accessed December 29, 2003]. Available at: http://www.digitalmedievalist.org.
- Berners-Lee, Tim. “The original proposal of the WWW, HTMLized.” Web Page, 1989 [accessed December 29, 2003]. Available at: http://www.w3.org/History/1989/proposal.html.
- Cailliau, Robert. “A little history of the World Wide Web.” Web Page, 1995 [accessed December 29, 2003]. Available at: http://www.w3.org/History.html.
- Camileon (a). “BBC Domesday.” Web Page, 2003a [accessed December 29, 2003]. Available at: http://www.si.umich.edu/CAMILEON/domesday/domesday.html.
- Camileon (b). “Preserving BBC Domesday: Frequently Asked Questions.” Web Page, 2003b [accessed December 29, 2003]. Available at: http://www.si.umich.edu/CAMILEON/domesday/faq.html.
- Chisholm, Wendy, Gregg Vanderheiden, Ian Jacobs, [W3C], and [WAI]. “HTML Techniques for Web Content Accessibility: Guidelines 1.0.” Web Page, November 6, 2000 [accessed December 29, 2003]. Available at: http://www.w3.org/TR/2000/NOTE-WCAG10-HTML-TECHS-20001106/.
- Connolly, Daniel W. “Toward a formalism for communication on the Web.” Web Page, 1994 [accessed December 29, 2003]. Available at: http://www.w3.org/MarkUp/html-spec/html-essay.html.
- Finney, Andy. “The Domesday Project.” Web Page, 1986. [accessed December 29, 2003]. Available at: http://www.atsf.co.uk/dottext/domesday.html.
- Garfinkel, Simson. “The Myth of Doomed Data: The handwringing about obsolete formats is misguided. The digital files we create today will be around for a very, very long time.” Technology Review [On-Line Edition]. December 3, 2003 [accessed December 29, 2003]. Available at: http://www.technologyreview.com/articles/wo_garfinkel120303.asp?p=1.
- HATII and NINCH. “NINCH guide to good practice, version 1.0.” Web Page, 2002 [accessed December 29, 2003]. Available at: http://www.nyu.edu/its/humanities//ninchguide/.
- Healey, Antonette di Paolo. “The Dictionary of Old English: the next generation(s).” Unpublished lecture. ISAS Conference; Phoenix, AZ. 2003.
- Keene, Suzanne. “Now you see it, now you won’t.” Web Page, 2003 [accessed December 29, 2003]. Available at: http://www.suzannekeene.info/conserve/digipres/index.htm.
- livinginternet.com. “Web browser History.” Web Page [accessed December 29, 2003]. Available at: http://livinginternet.com/w/wi_browse.htm.
- McKie, Robin and Vanessa Thorpe. “Digital Domesday Book lasts 15 years not 1000.” The Observer [On-Line Edition]. March 3, 2003. [Accessed December 29, 2003]. Available at: http://observer.guardian.co.uk/uk_news/story/0,6903,661093,00.html
- McMordie, Robert. “Technical history of Acorn (version 0.6 beta).” Web Page [accessed December 29, 2003]. Available at: http://www.mcmordie.co.uk/acornhistory/index.shtml.
- Netscape Communications Corporation. “Dynamic HTML in Communicator.” Web Page, 1997 [accessed December 29, 2003]. Available at: http://developer.netscape.com/docs/manuals/communicator/dynhtml/layers3.htm.
- Nielsen, Jakob. Designing Web usability. Indianapolis: New Riders; 2000.
- NINCH. “Why does the cultural community need Best Practices?” Web Page, 2002 [accessed December 29, 2003]. Available at: http://www.ninch.org/programs/practice/why.html.
- O’Donnell, Daniel Paul. “Texts and the Single Scholar: Is the morning after worth the night before?” Unpublished lecture. Thirty-eighth International Congress on Medieval Studies, Kalamazoo, MI. May 8, 2003.
- Sahrmann, Josh. “An introduction to Netscape layers.” Web Page, May 16, 1998 [accessed December 29, 2003]. Available at: http://tech.irt.org/articles/js087/.
- SGML SIGhyper. “HyTime and SMDL – History.” Web Page, June 28, 1994 [accessed December 29, 2003]. Available at: http://www.sgmlsource.com/history/hthist.htm.
- SGML Users’ Group. “A Brief History of the Development of SGML.” Web Page, 1990 [accessed December 29, 2003]. Available at: http://www.sgmlsource.com/history/sgmlhist.htm.
- www.almark.net. “History of e-mail.” Web Page [accessed December 29, 2003]. Available at: http://www.almark.net/Internet/html/slide5.html.
Posted: Friday December 15, 2006. 13:44.
Last modified: Wednesday May 23, 2012. 20:08.
Disciplinary impact and technological obsolescence in digital medieval studies
tags: digital humanities, editorial studies, history, html, internet, sgml, sustainability, xml
Forthcoming in The Blackwell Companion to the Digital Humanities, ed. Susan Schreibman and Ray Siemens. 2007.
Final Draft. Do not quote without permission of the author.
In May 2004, I attended a lecture by Elizabeth Solopova at a workshop at the University of Calgary on the past and present of digital editions of medieval works.1 The lecture looked at various approaches to the digitisation of medieval literary texts and discussed a representative sample of the most significant digital editions of English medieval works then available: the Wife of Bath’s Prologue from the Canterbury Tales Project (Robinson and Blake 1996), Murray McGillivray’s Book of the Duchess (McGillivray 1997), Kevin Kiernan’s Electronic Beowulf (Kiernan 1999), and the first volume of the Piers Plowman Electronic Archive (Adams et al. 2000). Solopova herself is an experienced digital scholar and the editions she was discussing had been produced by several of the most prominent digital editors then active. The result was a master class in humanities computing: an in-depth look at mark-up, imaging, navigation and interface design, and editorial practice in four exemplary editions.
From my perspective in the audience, however, I was struck by two unintended lessons. The first was how easily digital editions can age: all of the CD-ROMs Solopova showed looked quite old fashioned to my 2004 eyes in the details of their presentation and organisation and only two, Kiernan’s Beowulf and McGillivray’s Book of the Duchess, loaded and displayed on the overhead screen with no difficulties or disabled features.
For the purposes of Solopova’s lecture these failures were not very serious: a few missing characters and a slightly gimpy display did not affect her discussion of the editions’ inner workings and indeed partially illustrated her point concerning the need to plan for inevitable technological obsolescence and change at all stages of edition design. For end-users consulting these editions in their studies or at a library, however, the problems might prove more significant: while well-designed and standards-based editions such as these can be updated in order to accommodate technological change, doing so requires skills that are beyond the technological capabilities of most humanities scholars; making the necessary changes almost certainly requires some post-publication investment on the part of the publisher and/or the original editors. Until such effort is made, the thought and care devoted by the original team to editorial organisation and the representation of textual detail presumably is being lost to subsequent generations of end users.
The second lesson I learned was that durability was not necessarily a function of age or technological sophistication. The editions that worked more-or-less as intended were from the middle of the group chronologically and employed less sophisticated technology than the two that had aged less well: they were encoded in relatively straightforward HTML (although Kiernan’s edition makes sophisticated use of Java and SGML for searching) and rendered using common commercial web browsers. The projects that functioned less successfully were encoded in SGML and were packaged with sophisticated custom fonts and specialised rendering technology: the Multidoc SGML browser in the case of the Piers Plowman Electronic Archive and the Dynatext display environment in the case of the Canterbury Tales Project. Both environments were extremely advanced for their day and allowed users to manipulate text in ways otherwise largely impossible before the development and widespread adoption of XML and XSL-enabled browsers.
Neither of these lessons seems very encouraging at first glance to medievalists engaged in or investigating the possibilities of using digital media for new projects. Like researchers in many humanities disciplines, medievalists tend to measure scholarly currency in terms of decades, not years or months. The standard study of the Old English poem Cædmon’s Hymn before my recent edition of the poem (O’Donnell 2005a) was published nearly 70 years ago. Reference works like Cappelli’s Dizionario di abbreviature latine ed italiane (first edition, 1899) or Ker’s Catalogue of manuscripts containing Anglo-Saxon (first edition, 1959) also commonly have venerable histories. In the case of the digital editions discussed above—especially those already showing evidence of technological obsolescence—it is an open question whether the scholarship they contain will be able to exert nearly the same long-term influence on their primary disciplines. Indeed, there is already some evidence that technological or rhetorical problems may be hindering the dissemination of at least some of these otherwise exemplary projects’ more important findings. Robinson, for example, reports that significant manuscript work by Daniel Mosser appearing in various editions of the Canterbury Tales Project is cited far less often than the importance of its findings warrants (Robinson 2005: § 11).
The lesson one should not draw from these and other pioneering digital editions, however, is that digital projects are inevitably doomed to early irrelevance and undeserved lack of disciplinary impact. The history of digital medieval scholarship extends back almost six decades to the beginnings of the Index Thomisticus by Roberto Busa in the mid 1940s (see Fraser 1998 for a brief history). Despite fundamental changes in focus, tools, and methods, projects completed during this time show enough variety to allow us to draw positive as well as negative lessons for future work. Some digital projects, such as the now more than thirty-year-old Dictionary of Old English (DOE), have proven themselves able to adapt to changing technology and have had an impact on their disciplines—and longevity—as great as the best scholarship developed and disseminated in print. Projects which have proven less able to avoid technological obsolescence have nevertheless also often had a great effect on our understanding of our disciplines, and, in the problems they have encountered, can also offer us some cautionary lessons (see Keene n.d. for a useful primer in conservation issues and digital texts).
Premature obsolescence: The failure of the Information Machine
Before discussing the positive lessons to be learned from digital medieval projects that have succeeded in avoiding technological obsolescence or looking ahead to examine trends that future digital projects will need to keep in mind, it is worthwhile considering the nature of the problems faced by digital medieval projects that have achieved more limited impact or aged more quickly than the intrinsic quality of their scholarship or relevance might otherwise warrant—although in discussing projects this way, it is important to realise that the authors of these often self-consciously experimental projects have not always aimed at achieving the standard we are using to judge their success: longevity and impact equal to that of major works of print-originated and disseminated scholarship in the principal medieval discipline.
In order to do so, however, we first need to distinguish among different types of obsolescence. One kind of obsolescence occurs when changes in computing hardware, software, or approach render a project’s content unusable without heroic efforts at recovery. The most famous example of this type is the Electronic Domesday Book, a project initiated by the BBC in celebration of the nine hundredth anniversary of King William’s original inventory of post-conquest Britain (Finney 1986-2006; see O’Donnell 2004 for a discussion). The shortcomings of this project have been widely reported: it was published on video disks that could only be read using a customised disk player; its software was designed to function on the BBC Master personal computer—a computer that at the time was more popular in schools and libraries in the United Kingdom than any competing system but is now hopelessly obsolete. Costing over £2.5 million, the project was designed to showcase technology that it was thought might prove useful to schools, governments, and museums interested in producing smaller projects using the same innovative virtual reality environment. Unfortunately, the hardware proved too expensive for most members of its intended market and very few people ended up seeing the final product. For sixteen years, the only way of accessing the project was via one of a dwindling number of the original computers and disk readers. More recently, after nearly a year of work by an international team of engineers, large parts of the project’s content have finally been converted for use on contemporary computer systems.
The Domesday project is a spectacular example of the most serious kind of technological obsolescence, but it is hardly unique. Most scholars now in their forties and fifties probably have disks lying around their studies containing information that is for all intents and purposes lost due to technological obsolescence—content written using word processors or personal computer database programmes that are no longer maintained, recorded on difficult-to-read media, or produced using computers or operating systems that ultimately lost out to more popular competitors. But the Domesday project did not become obsolete solely because it gambled on the wrong technology: many other digital projects of the time, some written for main-frame computers using languages and operating systems that are still widely understood, have suffered a similar obsolescence even though their content theoretically could be recovered more easily.
In fact the Domesday Book project also suffered from an obsolescence of approach—the result of a fundamental and still ongoing change in how medievalists and others working with digital media approach digitisation. Before the second half of the 1980s, digital projects were generally conceived of as information machines: programs in which content was understood to have little value outside of its immediate processing context. In such cases, the goal was understood to be the sharing of results rather than content. Sometimes, as in the case of the Domesday Book, the goal was the arrangement of underlying data in a specific (and closed) display environment; more commonly, the intended result was statistical information about language usage and authorship or the development of indices and concordances (see, for example, the table of contents in Patton and Holoien 1981, which consists entirely of database, concordance, and statistical projects). Regardless of the specific processing goal, this approach tended to see data as raw material rather than an end result.2 Collection and digitisation were done with an eye to the immediate needs of the processor, rather than the representation of intrinsic form and content. Information not required for the task at hand was ignored. Texts encoded for use with concordance or corpus software, for example, commonly ignored capitalisation, punctuation, or mise-en-page. Texts encoded for interactive display were structured in ways suited to the planned output (see for example the description of database organisation and video collection in Finney 1986-2006). What information was recorded was often indicated using ad hoc and poorly documented tokens and codes whose meaning can now be difficult or impossible to recover (see Cummings 2006).
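The fragility of ad hoc tokens can be sketched in a few lines of Python (the token scheme and the corpus line below are invented for illustration): recovery is a mechanical substitution while the project’s documentation survives, and guesswork once it is lost:

```python
# A hypothetical legacy corpus line using ad hoc tokens for special
# characters (the token scheme is invented for illustration):
legacy = "hw+at we gardena in geardagum +teodcyninga"

# Without the project's documentation these tokens are opaque; with it,
# decoding is a simple substitution:
TOKENS = {"+t": "þ", "+d": "ð", "+a": "æ"}

def decode(line: str) -> str:
    for token, char in TOKENS.items():
        line = line.replace(token, char)
    return line

print(decode(legacy))
```

The code itself is trivial; the preservation problem lies entirely in the table `TOKENS`, which in many early projects was never formally documented at all.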
The problem with this approach is that technology ages faster than information: data that require a specific processing context in order to be understood will become unintelligible far more rapidly than information that has been described as much as possible in its own terms without reference to a specific processing outcome. By organising and encoding their content so directly to suit the needs of a specific processor, information machines like the Domesday Project condemned themselves to relatively rapid technological obsolescence.
Content as end-product: Browser-based projects
The age of the information machine began to close with the development and popular acceptance of the first Internet browsers in the early 1990s. In an information machine, developers have great control over both their processor and how their data is encoded. They can alter their encoding to suit the needs of their processors and develop or customise processors to work with specific instances of data. Developers working with browsers, however, have far less control over either element: users interact with projects using their own software and require content prepared in ways compatible with their processor. This both makes it much more difficult for designers to produce predictable results of any sophistication and requires them to adhere to standard ways of describing common phenomena. It also changes the focus of project design: where once developers focussed on producing results, they now tend to concentrate instead on providing content.
This change in approach explains in large part the relative technological longevity of the projects by McGillivray and Kiernan. Both were developed during the initial wave of popular excitement at the commercialisation of the Internet. Both were designed to be used without modification by standard Internet web browsers operating on the end-users’ computer and written in standard languages using a standard character set recognised by all Internet browsers to this day. For this reason—and despite the fact that browsers available in the late 1990s were quite primitive by today’s standards—it seems very unlikely that either project in the foreseeable future will need anything like the same kind of intensive recovery effort required by the Domesday Project: modern browsers are still able to read early HTML-encoded pages and Java routines, and are likely to continue to do so, regardless of changes in operating system or hardware, as long as the Internet exists in its current form. Even in the unlikely event that technological changes render HTML-encoded documents unusable in our lifetime, conversion will not be difficult. HTML is a text-based language that can easily be transformed by any number of scripting languages. Since HTML-encoded files are in no way operating system or software dependent, future generations—in contrast to the engineers responsible for converting the Electronic Domesday Book—will be able to convert the projects by Kiernan and McGillivray to new formats without any need to reconstruct the original processing environment.
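The ease of such a conversion can be suggested with a brief sketch using nothing but a scripting language's standard library. The sample page below is invented for illustration—it is not taken from either edition—but the point holds for any early HTML document: no original hardware or software environment is needed to recover its text.

```python
# Minimal sketch: recovering the plain text of an early HTML-encoded
# page using only Python's standard library. The sample markup is
# illustrative, not drawn from the Kiernan or McGillivray editions.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the character data of an HTML document, ignoring tags.

    Character references such as &aelig; are converted to their Unicode
    equivalents by the parser before handle_data() is called.
    """
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)

# A hypothetical fragment of a late-1990s web edition.
legacy_page = (
    "<html><body><h1>Beowulf</h1>"
    "<p>Hw&aelig;t! We Gardena in geardagum ...</p>"
    "</body></html>"
)

extractor = TextExtractor()
extractor.feed(legacy_page)
extractor.close()
print(extractor.text())
```

Because the markup is plain text, the same few lines could just as easily rewrite the document into a newer format rather than strip it—precisely the kind of conversion that required a full reconstruction effort in the Domesday case.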
The separation of content from processor did not begin with the rise of Internet browsers. HTML, the language which made the development of such browsers possible, is itself derived from work on standardised structural mark-up languages in the 1960s through the 1980s. These languages, the most developed and widely used at the time being the Standard Generalized Markup Language (SGML), required developers to make a rigid distinction between a document’s content and its appearance. Content and structure were encoded according to the intrinsic nature of the information and the interests of the encoder using a suitable standard mark-up language. How this mark-up was to be used and understood was left up to the processor: in a web browser, the mark-up could be used to determine the text’s appearance on the screen; in a database program it might serve to delimit distinct fields. For documents encoded in early HTML (which used a small number of standard elements), the most common processor was the web browser, which formatted content for display for the most part without specific instructions from the content developer: having described a section of text using an appropriate HTML tag such as 〈i〉 (italic) or 〈b〉 (bold), developers were supposed for the most part to leave decisions about specific details of size, font, and position up to the relatively predictable internal stylesheets of the user’s browser (though of course many early webpages misused structural elements like 〈table〉 to encode appearance).
SGML was more sophisticated than HTML in that it described how mark-up systems were to be built rather than their specific content. This allowed developers to create custom sets of structural elements that more accurately reflected the qualities they wished to describe in the content they were encoding. SGML languages like DocBook were developed for the needs of technical and other publishers; the Text Encoding Initiative (TEI) produced a comprehensive set of structural elements suitable for the encoding of texts for use in scholarly environments. Unfortunately, however, this flexibility also made it difficult to share content with others. Having designed their own sets of structural elements, developers could not be certain their users would have access to software that knew how to process them.
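The difference in approach can be suggested with a short fragment in the spirit of TEI encoding. The elements used here (l, w, choice, orig, reg) are genuine TEI elements, but the fragment itself is invented for illustration and is not drawn from any of the editions discussed:

```xml
<!-- A hypothetical TEI-style transcription of a verse line: the
     encoder records a manuscript spelling alongside a regularised
     reading, without saying anything about how either should look. -->
<l n="1">
  <w>Nu</w>
  <w>
    <choice>
      <orig>scylun</orig>
      <reg>sculon</reg>
    </choice>
  </w>
  <w>hergan</w>
</l>
```

Nothing in the fragment dictates presentation: whether a browser italicises the regularised reading, a concordance indexes the original spelling, or a database treats each 〈w〉 as a field is left entirely to the processor.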
The result was a partial return to the model of the information machine: in order to ensure their work could be used, developers of SGML projects intended for wide distribution tended to package their projects with specific (usually proprietary) software, fonts, and processing instructions. While the theoretical separation of content and processor represented an improvement over that taken by previous generations of digital projects in that it treated content as having intrinsic value outside the immediate processing context, the practical need to supply users with special software capable of rendering or otherwise processing this content tended nevertheless to tie the projects’ immediate usefulness to the lifespan and weaknesses of the associated software. This is a less serious type of obsolescence, since rescuing information from projects that suffer from it involves nothing like the technological CPR required to recover the Domesday Project. But the fact that it must occur at all almost certainly limits these projects’ longevity and disciplinary impact. Users who must convert a project from one format to another or work with incomplete or partially broken rendering almost certainly are going to prefer texts and scholarship in more convenient formats.
XML, XSLT, Unicode, and related technologies
Developments of the last half-decade have largely eliminated the problem these pioneering SGML-based projects faced in distributing their projects to a general audience. The widespread adoption of XML, XSLT, Unicode, and similarly robust international standards on the Internet means that scholars developing new digital projects now can produce content using mark-up as flexible and sophisticated as anything possible in SGML without worrying that their users will lack the necessary software to display and otherwise process it. Just as the projects by Kiernan and McGillivray were able to avoid premature technological obsolescence by assuming users would make use of widely available Internet browsers, so too can designers of XML-based projects now increase their odds of avoiding early obsolescence by taking advantage of the ubiquity of the new generation of XML-, XSLT-, and Unicode-aware Internet clients3.
Tools and community support
The fact that these technologies have been so widely accepted in both industry and the scholarly world has other implications beyond making digital projects easier to distribute, however. The establishment of robust and stable standards for structural mark-up has also encouraged the development of a wide range of tools and organisations that also make such projects easier to develop.
Perhaps the most striking change lies in the development of tools. When I began my SGML-based edition of Cædmon’s Hymn in 1997, the only SGML-aware and TEI-compatible tools I had at my disposal were GNU Emacs, an open source text editor, and the Panorama and later Multidoc SGML browsers (such commercial tools and environments as were available were far beyond the budget of my one-scholar project). None of these were very user-friendly. GNU Emacs, though extremely powerful, was far more difficult to set up and operate than the word processors and spreadsheets I had been accustomed to using up to that point. The Panorama and Multidoc browsers used proprietary languages to interpret SGML that had relatively few experienced users and a very limited basis of support. There were other often quite sophisticated tools and other kinds of software available, including some—such as TACT, Collate, TUSTEP, and various specialised fonts like Peter Baker’s original Times Old English—that were aimed primarily at medievalists or developers of scholarly digital projects. Almost all of these, however, required users to encode their data in specific and almost invariably incompatible ways. Often, moreover, the tool itself was also intended for distribution to the end user—once again causing developers to run the risk of premature technological obsolescence.
Today, developers of new scholarly digital projects have access to a far wider range of general and specialised XML-aware tools. In addition to GNU Emacs—which remains a powerful editor and has become considerably easier to set up on most operating systems—there are a number of full-featured, easy-to-use, open source or relatively inexpensive commercial XML-aware editing environments available, including Oxygen, Serna, and Screem. There are also quite a number of well-designed tools aimed at solving more specialised problems in the production of scholarly projects. Several of these, such as Anastasia and Edition Production and Presentation Technology (EPPT), have been designed by medievalists. Others, such as the University of Victoria’s Image Markup Tool and other tools under development by the TAPoR project, have been developed by scholars in related disciplines.
More significantly, these tools avoid most of the problems associated with those of previous decades. All the tools mentioned in the previous paragraph (including the commercial tools) are XML-based and have built-in support for TEI XML, the standard structural markup language for scholarly projects (this is also true of TUSTEP, which has been updated continuously). This means both that they can often be used on the same underlying content and that developers can encode their text to reflect their interests or the nature of the primary source rather than to suit the requirements of a specific tool. In addition, almost all are aimed at the developer rather than the end user. With the exception of Anastasia and EPPT, both of which involve display environments, none of the tools mentioned above is intended for distribution with the final project. Although these tools—many of which are currently in the beta stage of development—ultimately will become obsolete, the fact that almost all are now standards compliant means that the content they produce almost certainly will survive far longer.
A second area in which the existence of stable and widely recognised standards has helped medievalists working with digital projects has been in the establishment of community-based support and development groups. Although Humanities Computing, like most other scholarly disciplines, has long had scholarly associations to represent the interests of their members and foster exchanges of information (e.g. Association for Literary and Linguistic Computing [ALLC]; Society for Digital Humanities / Société pour l‘étude des médias interactifs [SDH-SEMI]), the last half-decade has also seen the rise of a number of smaller formal and informal Communities of Practice aimed at establishing standards and providing technological assistance to scholars working in more narrowly defined disciplinary areas. Among the oldest of these are Humanist-l and the TEI—both of which pre-date the development of XML by a considerable period of time. Other community groups, usually more narrow in focus and generally formed after the development of XML, Unicode, and related technologies, include Menota (the MEdieval NOrdic Text Archive), publishers of the Menota handbook: Guidelines for the encoding of medieval Nordic primary sources; MUFI (Medieval Unicode Font Initiative), an organisation dedicated to the development of solutions to character encoding issues in the representation of characters in medieval Latin manuscripts; and the Digital Medievalist, a community of practice aimed at helping scholars meet the increasingly sophisticated demands faced by designers of contemporary digital projects, which organises a journal, wiki, and mailing list devoted to the establishment and publication of best practice in the production of digital medieval resources.
These tools and organisations have helped reduce considerably the technological burden placed on contemporary designers of digital resources. As Peter Robinson has argued, digital projects will not come completely into their own until “the tools and distribution… [are] such that any scholar with the disciplinary skills to make an edition in print can be assured he or she will have access to the tools and distribution necessary to make it in the electronic medium” (Robinson 2005: abstract). We are still a considerable way away from this ideal and in my view unlikely to reach it before a basic competence in Humanities computing technologies is seen as an essential research skill for our graduate and advanced undergraduate students. But we are also much farther along than we were even a half-decade ago. Developers considering a new digital project can begin now confident that they will be able to devote a far larger proportion of their time to working on disciplinary content—their scholarship and editorial work—than was possible even five years ago. They have access to tools that automate many jobs that used to require special technical know-how or support. The technology they are using is extremely popular and well-supported in the commercial and academic worlds. And, through communities of practice like the Text Encoding Initiative, Menota, and the Digital Medievalist Project, they have access to support from colleagues working on similar problems around the globe.
Future Trends: Editing non-textual objects
With the development and widespread adoption of XML, XSLT, Unicode, and related technologies, text-based digital medieval projects can be said to have emerged from the incunabula stage of their technological development. Although there remain one or two ongoing projects that have resisted incorporating these standards, there is no longer any serious question as to the basic technological underpinnings of new text-based digital projects. We are also beginning to see a practical consensus as to the basic generic expectations for the “Electronic edition”: such editions almost invariably include access to transcriptions and full colour facsimiles of all known primary sources, methods of comparing the texts of individual sources interactively, and, in most cases, some kind of guide, reading, or editorial text. There is still considerable difference in the details of interface (Rosselli Del Turco 2006), mise en moniteur, and approach to collation and recension. But on the whole, most developers and presumably a large number of users seem to have an increasingly strong sense of what a text-based digital edition should look like.
Image, Sound, and Animation: Return of the information machine?
Things are less clear when digital projects turn to non-textual material. While basic and widely accepted standards exist for the encoding of sounds and 2D and 3D graphics, there is far less agreement as to the standards that are to be used in presenting such material to the end user. As a result, editions of non-textual material often have more in common with the information machines of the 1980s than contemporary XML-based textual editions. Currently, most such projects appear to be built using Adobe’s proprietary Flash and Shockwave formats (e.g. Foys 2003; Reed Kline 2001). Gaming applications, 3D applications, and immersive environments use proprietary environments such as Flash and Unreal Engine or custom-designed software. In each case, the long-term durability and cross-platform operability of projects produced in these environments is tied to that of the software for which they are written. All of these formats require proprietary viewers, none of which are shipped as a standard part of most operating systems. As with the BBC Domesday Project, restoring content published in many of these formats ultimately may require restoration of the original hard- and software environment.
Using technology to guide the reader: Three examples4
Current editions of non-textual material resemble information machines in another way, as well: they tend to be over-designed. Because developers of such projects write for specific processors, they—like developers of information machines of the 1980s—are able to control the end-user’s experience with great precision. They can place objects in precise locations on the user’s screen, allow or prevent certain types of navigation, and animate common user tasks.
When handled well, such control can enhance contemporary users’ experience of the project. Martin Foys’s 2003 edition of the Bayeux Tapestry, for example, uses Flash animation to create a custom-designed browsing environment that allows the user to consult the Bayeux Tapestry as a medieval audience might—by moving back and forth apparently seamlessly along its 68-metre length. The opening screen shows a section from the facsimile above a plot-line that provides an overview of the Tapestry’s entire contents in a single screen. Users can navigate the Tapestry scene-by-scene using arrow buttons at the bottom left of the browser window, centimetre by centimetre using a slider on the plot-line, or by jumping directly to an arbitrary point on the tapestry by clicking on the plot-line at the desired location. Tools, background information, other facsimiles of the tapestry, scene synopses, and notes are accessed through buttons at the bottom left corner of the browser. The first three types of material are presented in a separate window when chosen; the last two appear under the edition’s plot-line. Additional utilities include a tool for making slideshows that allows users to reorder panels to suit their own needs.
If such control can enhance a project’s appearance, it can also get in the way—encouraging developers to include effects for their own sake, or to control end-users’ access to the underlying information unnecessarily. The British Library Turning the Pages series, for example, allows readers to mimic the action of turning pages in an otherwise straightforward photographic manuscript facsimile. When users click on the top or bottom corner of the manuscript page and drag the cursor to the opposite side of the book, they are presented with an animation showing the page being turned over. If they release the mouse button before the page has been pulled approximately 40% of the way across the visible page spread, virtual “gravity” takes over and the page falls back into its original position.
This is an amusing toy and well suited to its intended purpose as an “interactive program that allows museums and libraries to give members of the public access to precious books while keeping the originals safely under glass” (British Library n.d.). It comes, however, at a steep cost: the page-turning system uses an immense amount of memory and processing power—the British Library estimates up to 1 GB of RAM for high quality images on a stand-alone machine—and the underlying software used for the Internet presentation, Adobe Shockwave, is not licensed for use on all computer operating systems (oddly, the non-Shockwave Internet version uses Windows Media Player, another proprietary system that shares the same gaps in licensing). The requirement that users drag pages across the screen, moreover, makes paging through an edition unnecessarily time- and attention-consuming: having performed an action that indicates that they wish an event to occur (clicking on the page in question), users are then required to perform additional complex actions (holding the mouse button down while dragging the page across the screen) in order to effect the desired result. What was initially an amusing diversion rapidly becomes a major and unnecessary irritation.
More intellectually serious problems can arise as well. In A Wheel of Memory: The Hereford Mappamundi (Reed Kline 2001), Flash animation is used to control how the user experiences the edition’s content—allowing certain approaches and preventing others. Seeing the Mappamundi “as a conceit for the exploration of the medieval collective memory… using our own collective rota of knowledge, the CD-ROM” (§ I [audio]), the edition displays images from the map and associated documents in a custom-designed viewing area that is itself in part a rota. Editorial material is arranged as a series of chapters and thematically organised explorations of different medieval Worlds: World of the Animals, World of the Strange Races, World of Alexander the Great, etc. With the exception of four numbered chapters, the edition makes heavy use of the possibilities for non-linear browsing inherent in the digital medium to organise its more than 1000 text and image files.
Unfortunately, and despite its high production values and heavy reliance on a non-linear structural conceit, the edition itself is next-to-impossible to use or navigate in ways not anticipated by the project designers. Text and narration are keyed to specific elements of the map and edition and vanish if the user strays from the relevant hotspot: because of this close integration of text and image it is impossible to compare text written about one area of the map with a facsimile of another. The facsimile itself is also very difficult to study. The customised viewing area is of a fixed size (I estimate approximately 615×460 pixels), with more than half this surface given over to background and navigation: when the user chooses to view the whole map on screen, the four-foot-wide original is reproduced with a diameter of less than 350 pixels (approximately 1/10 actual size). Even then, it remains impossible to display the map in its entirety: in keeping with the project’s rota conceit, the facsimile viewing area is circular even though the Hereford map itself is pentagonal: try as I might, I have never been able to get a clear view of the border and image in the facsimile’s top corner.
Future standards for non-textual editions?
It is difficult to see at this point how scholarly editions involving non-textual material ultimately will evolve. Projects that work most impressively right now use proprietary software and viewers (and face an obvious danger of premature obsolescence as a result); projects that adhere to today’s non-proprietary standards for the display and manipulation of images, animation, and sound currently are in a situation analogous to that of the early SGML-based editions: on the one hand, their adherence to open standards presumably will help ensure their data is easily converted to more popular and better supported standards once these develop; on the other hand, the lack of current popular support means that such projects must supply their own processing software—which means tying their short term fate to the success and flexibility of a specific processor. Projects in this field will have emerged from the period of their technological infancy when designers can concentrate on their content, safe in the assumption that users will have easy access to appropriate standards-based processing software on their own computers.
Collaborative content development
The development of structural markup languages like HTML was crucial to the success of the Internet because it allowed for unnegotiated interaction between developers and users. Developers produce content assuming users will be able to process it; users access content assuming it will be suitable for use with their processors. Except when questions of copyright, confidentiality, or commerce intervene, contact between developers and users can be limited to little more than the purchase of a CD-ROM or transfer of files from server to browser.
The last few years have seen a movement towards applying this model to content development as well. Inspired by the availability of well-described and universally recognised encoding standards and encouraged no doubt by the success of the Wikipedia and the open source software movement, many projects now are looking for ways to provide for the addition and publication of user-contributed content or the incorporation of work by other scholars. Such contributions might take the form of notes and annotations, additional texts and essays, links to external resources, and corrections or revision of incorrect or outdated material.
An early, pre-wiki, model of this approach is the Online Reference Book for Medieval Studies (ORB). Founded in 1995 and run by a board of section editors, ORB provides a forum for the development and exchange of digital content by and for medievalists. Contributors range from senior scholars to graduate students and interested amateurs; their contributions belong to a wide variety of genres: encyclopaedia-like articles, electronic primary texts, on-line textbooks and monographs, sample syllabi, research guides, and resources for the non-specialist. Despite this, the project itself is administered much like a traditional print-based encyclopaedia: it is run by an editorial board that is responsible for soliciting, vetting, and editing contributions before they are published.
More recently, scholars have been exploring the possibilities of a different, unnegotiated approach to collaboration. One model is the Wikipedia—an on-line reference source that allows users to contribute and edit articles with little editorial oversight. This approach is frequently used on a smaller scale for the construction of more specialised reference works: the Digital Medievalist, for example, is using wiki software to build a community resource for medievalists who use digital media in their research, study, or teaching. Currently, the wiki contains descriptions of projects and publications, conference programmes, calls for papers, and advice on best practice in various technological areas.
Other groups, such as a number of projects at the Brown Virtual Humanities Lab, are working on the development of mechanisms by which members of the community can make more substantial contributions to the development of primary and secondary sources. In this case, users may apply for permission to contribute annotations to the textual database, discussing differences of opinion or evidence in an associated discussion forum (Armstrong and Zafrin 2005; Riva 2006).
A recent proposal by Espen Ore suggests an even more radical approach: the design of unnegotiated collaborative editions—i.e. projects that are built with the assumption that others will add to, edit, and revise the core editorial material: texts, introductory material, glossaries and apparatus (Ore 2004). In a similar approach, the Visionary Rood Project has proposed building its multi-object edition using an extensible architecture that will allow users to associate their own projects with others to form a matrix of interrelated objects, texts, and commentary (Karkov, O’Donnell, Rosselli Del Turco, et al 2006). Peter Robinson has recently proposed the development of tools that would allow this type of editorial collaboration to take place (Robinson 2005).
These approaches to collaboration are still very much in their earliest stages of development. While the technology already exists to enable such community participation in the development of intellectual content, questions of quality control, intellectual responsibility, and especially incentives for participation remain very much unsettled. Professional scholars traditionally achieve success—both institutionally and in terms of reputation—by the quality and amount of their research publications. Community-based collaborative projects do not easily fit into this model. Project directors cannot easily claim intellectual responsibility for the contributions of others to “their” projects—reducing their value in a profession in which monographs are still seen as a standard measure of influence and achievement. And the type of contributions open to most participants—annotations, brief commentary, and editorial work—are difficult to use in building a scholarly reputation: the time when a carefully researched entry on the Wikipedia or annotation to an on-line collaborative edition will help scholars who are beginning or building their careers is still a long way away (see O’Donnell 2006 who discusses a number of the economic issues involved in collaborative digital models).
Digital scholarship in Medieval Studies has long involved finding an accommodation between the new and the durable. On the one hand, technology has allowed scholars to do far more than was ever possible in print. It has allowed them to build bigger concordances and more comprehensive dictionaries, to compile detailed statistics about usage and dialectal spread, and to publish far more detailed collations, archives, and facsimiles. At the same time, however, the rapidly changing nature of this technology and its associated methods has brought with it the potential cost of premature obsolescence. While few projects, perhaps, have suffered quite so spectacularly as the BBC’s Domesday Book, many have suffered from an undeserved lack of attention or disciplinary impact due to technological problems. The emphasis on information as a raw material in the days before the development of structural mark-up languages often produced results of relatively narrow and short-term interest—often in the form of information machines that could not survive the obsolescence of their underlying technology without heroic and costly efforts at reconstruction. Even the development of early structural markup languages like SGML did not entirely solve this problem: while theoretically platform-independent and focussed on the development of content, SGML-based projects commonly required users to acquire specific and usually very specialised software for even the most basic processing and rendition.
Of the projects published in the initial years of the internet revolution, those that relied on the most widely supported technology and standards—HTML and the ubiquitous desktop Internet browsers—survived the best. The editions by Kiernan and McGillivray showcased by Solopova in her lecture that summer still function well—even if their user interfaces now look even more old-fashioned two years on.
In as much as the new XML and Unicode-based technologies combine the flexibility and sophistication of SGML with the broad support of early HTML, text-based medieval digital scholarship is now leaving its most experimental period. There remain economic and rhetorical issues surrounding the best ways of delivering different types of scholarly content to professional and popular audiences; but on the whole the question of the core technologies required has been settled definitively.
The new areas of experimentation in medieval digital studies involve editions of non-textual material and the development of new collaborative models of publication and project development. Here technology both has even more to offer the digital scholar and carries with it even greater risks. On the one hand, the great strides made in computer-based animation, gaming, and 3-D imaging in the commercial world offer projects the chance to deal with material never before subject to the kind of thorough presentation now possible. We already have marvellous editions of objects—maps, tapestries, two dimensional images—that allow the user to explore their subjects in ways impossible in print. In the near future we can expect to see a greater use of 3D and gaming technology in the treatment of sculpture, archaeological digs, and even entire cities. With the use of wikis and similar types of collaborative technologies, such projects may also be able to capture much more of the knowledge of the disciplinary experts who make up their audiences.
For projects dealing with non-textual objects, the risk is that the current necessity of relying on proprietary software intended for the much shorter-term needs of professional game designers and computer animators will lead to the same kind of premature and catastrophic obsolescence brought on by the equally-advanced-for-its-day Domesday Project. Sixteen years from now, animation design suites like Director (the authoring suite used for producing Shockwave files) and gaming engines like Unreal Engine (an authoring engine used to produce current generations of video games) are likely to be different from and perhaps incompatible with current versions in a way that XML authoring technologies and processors will not. While we can hope that reconstruction will not be as difficult as it proved to be in the case of the Domesday Project, it seems likely that few of today’s non-textual editions will still be working without problems at an equivalent point in their histories, two decades from now.
In the case of experimentation with collaborative software, the challenge is more economic and social than technological. In my experience, most professional scholars are initially extremely impressed by the possibilities offered by collaborative software like wikis and other forms of annotation engines—before almost immediately bumping up against the problems of prestige and quality control that currently make them infeasible as channels of high-level scholarly communication. Indeed, at one recent conference session I attended (on the future of collaborative software, no less!), the biggest laugh of the morning came when one of the speakers confessed to having devoted most of the previous month to researching and writing a long article for the Wikipedia on his particular specialism in Medieval Studies.
That current text-based digital editions seem likely to outlive the technology that produced them can be attributed to the pioneering efforts of the many scholars responsible for editions like those by Adams, Kiernan, McGillivray, and Robinson discussed by Solopova in her lecture. The current generation of scholars producing editions of non-textual objects and experimenting with collaborative forms of scholarship and publication are now filling a similar role. The solutions they are developing may or may not provide the final answers; but they will certainly provide a core of experimental practice upon which those answers will ultimately be built.
1 The focus of this chapter is on theoretical and historical problems that have affected digital scholarship in Medieval Studies in the past and are likely to continue to do so for the foreseeable future. Scholars seeking more specific advice on technological problems or best practice have access to numerous excellent Humanities Computing societies, mailing lists, and internet sites. For some specific suggestions, see the section “Community Support,” pp. 000-000, below. I thank Roberto Rosselli Del Turco for his help with this article.
2 Exceptions to this generalisation prove the rule: pre-Internet age projects, such as the Dictionary of Old English (DOE) or Project Gutenberg, that concentrated more on content than on processing have aged much better than those that concentrated on processing rather than content. Both the DOE and Project Gutenberg, for example, have successfully migrated to HTML and now XML. The first volume of the DOE was published on microfiche in 1986—the same year as the BBC’s Domesday Book; on-line and CD-ROM versions were subsequently produced with relatively little effort. Project Gutenberg began with ASCII text in 1971.
3 Not all developers of XML-encoded medieval projects have taken this approach. Some continue to write for specific browsers and operating systems (e.g. Muir 2004a); others have developed or are in the process of developing their own display environments (e.g. Anastasia, Elwood [see Duggan and Lyman 2005: Appendix]). The advantage of this approach, of course, is that—as with information machines like the BBC Domesday Book—developers acquire great control over the end user’s experience (see for example McGillivray 2006 on Muir 2004b); the trade-off, however, is likely to be unnecessarily rapid technological obsolescence or increased maintenance costs in the future.
4 The discussion in this section has been adapted with permission from a much longer version in O’Donnell 2005b.
References and Further Reading
Organisations and Support
- Digital Medievalist. An international web-based Community of Practice for medievalists working with digital media. Operates a mailing list, peer-reviewed journal, and Wiki [http://www.digitalmedievalist.org/].
- Humanist-l. An international electronic seminar on humanities computing and the digital humanities [http://www.princeton.edu/humanist/].
- MENOTA (MEdieval and NOrse Text Archive), publishers of the Menota handbook: Guidelines for the encoding of medieval Nordic primary sources [http://www.menota.org/]
- MUFI (Medieval Unicode Font Initiative), an organisation dedicated to the development of solutions to character encoding issues in the representation of characters in medieval Latin manuscripts [http://gandalf.aksis.uib.no/mufi/].
- TEI (Text Encoding Initiative). An international and interdisciplinary standard that enables libraries, museums, publishers, and individual scholars to represent a variety of literary and linguistic texts for online research, teaching, and preservation. Also operates a mailing list [http://www.tei-c.org/].
- Adams, Robert, Hoyt N. Duggan, Eric Eliason, Ralph Hanna III, John Price-Wilkin, and Thorlac Turville-Petre (2000). Corpus Christi College Oxford MS 201 (F) [CD-ROM]. Ann Arbor: University of Michigan Press.
- Armstrong, Guyda and Vika Zafrin (2005). “Towards the electronic Esposizioni: the challenges of the online commentary”. Digital Medievalist 1.1 [Online Journal]. http://www.digitalmedievalist.org/article.cfm?RecID=1.
- British Library Board (n.d.). “Turning the Pages: Welcome” [Webpage]. http://www.armadillosystems.com/ttp_commercial/home.htm.
- Cummings, James (2006). “Liturgy, Drama, and the Archive: Three conversions from legacy formats to TEI XML”. Digital Medievalist 2.1 [Online Journal]. http://www.digitalmedievalist.org/article.cfm?RecID=11.
- Duggan, Hoyt N. (2005). “A Progress Report on The Piers Plowman Electronic Archive”, with a contribution by Eugene W. Lyman. Digital Medievalist 1.1 [Online Journal]. http://www.digitalmedievalist.org/article.cfm?RecID=3.
- Finney, Andy (1986-2006). “The Domesday Project” [Website]. http://www.atsf.co.uk/dottext/domesday.html.
- Foys, Martin K. (2003). The Bayeux Tapestry: Digital Edition [CD-ROM]. Leicester: SDE.
- Fraser, Michael (1998). “The Electronic Text and the Future of the Codex I: The History of the Electronic Text” [Unpublished Lecture]. History of the Book Seminar, Oxford University. January 1998. http://users.ox.ac.uk/~mikef/pubs/hob_fraser_1998.html
- Karkov, Catherine, Daniel Paul O’Donnell, Roberto Rosselli Del Turco, James Graham, and Wendy Osborn (2006). “The Visionary Cross Project” [Webpage]. http://www.visionarycross.org/.
- Keene, Suzanne (n.d.). “Now You See It, Now You Won’t” [Webpage]. http://www.suzannekeene.info/conserve/digipres/index.htm.
- Kiernan, Kevin S. (1999). Electronic Beowulf [CD-ROM]. London: British Library.
- McGillivray, Murray (2006). [Review of Muir 2004b]. Digital Medievalist 2.1. http://www.digitalmedievalist.org/article.cfm?RecID=14
- McGillivray, Murray (1997). Geoffrey Chaucer’s Book of the Duchess: A Hypertext Edition [CD-ROM]. Calgary: University of Calgary Press.
- Muir, Bernard James (2004a). The Exeter anthology of Old English poetry: an edition of Exeter Dean and Chapter MS 3501. Rev. 2nd [CD-ROM] Edition. Exeter: Exeter University Press.
- Muir, Bernard James (2004b). A digital facsimile of Oxford, Bodleian Library MS. Junius 11. Software by Nick Kennedy. Bodleian Library Digital Texts 1. Oxford: Bodleian Library.
- O’Donnell, Daniel Paul (2006). “Why Should I Write for Your Wiki: Towards a New Economics of Academic Publishing.” Unpublished Lecture: “New Technologies and Renaissance Studies IV: Publication and New Forms of Collaboration”, 52nd Annual Meeting of the Renaissance Society of America, San Francisco CA. March 23.
- O’Donnell, Daniel Paul (2005a). Cædmon’s Hymn : A Multimedia Study, Archive and Edition. Society for early English and Norse electronic texts A.7. Cambridge and Rochester: D.S. Brewer in association with SEENET and the Medieval Academy.
- O’Donnell, Daniel Paul (2005b). “O Captain! My Captain! Using Technology to Guide Readers Through an Electronic Edition.” Heroic Age 8 [Online Journal]. http://www.mun.ca/mst/heroicage/issues/8/em.html.
- O’Donnell, Daniel Paul (2004). “The Doomsday Machine, or, ‘If you build it, will they still come ten years from now?’: What Medievalists working in digital media can do to ensure the longevity of their research.” Heroic Age 7 [Online Journal]. http://www.mun.ca/mst/heroicage/issues/7/ecolumn.html.
- Ore, Espen S. (2004). “Monkey Business—or What is an Edition?” Literary and Linguistic Computing 19: 35-4.
- Patton, Peter C. and Renee A. Holoien, ed. (1981). Computing in the Humanities. Lexington, Mass.: Lexington Books.
- Reed Kline, Naomi (2001). A Wheel of Memory: The Hereford Mappamundi [CD-ROM]. Ann Arbor: University of Michigan Press.
- Riva, Massimo (2006). “Online Resources for Collaborative Research: The Pico Project at Brown University”. Unpublished Lecture: “New Technologies and Renaissance Studies IV: Publication and New Forms of Collaboration”, 52nd Annual Meeting of the Renaissance Society of America, San Francisco CA. March 23.
- Robinson, Peter (2005). “Current Issues in Making Digital Editions of Medieval Texts—or, Do Electronic Scholarly Editions have a Future?” Digital Medievalist 1.1 [Online Journal]. http://www.digitalmedievalist.org/article.cfm?RecID=6.
- Robinson, Peter and N. F. Blake (1996). The Wife of Bath’s Prologue on CD-ROM. Canterbury Tales Project. Cambridge: Cambridge University Press.
- Rosselli Del Turco, Roberto. “After the Editing Is Done: Designing a Graphic User Interface for Digital Editions” [Unpublished lecture]. Delivered at: Session 640 “Digital Publication”, 41st International Congress on Medieval Studies, Western Michigan University, May 6.
Posted: Friday December 15, 2006. 13:17.
Last modified: Wednesday May 23, 2012. 20:09.