Daniel Paul O'Donnell

A First Law of Humanities Computing?

Posted: Jan 25, 2015 14:01;
Last Modified: Jun 03, 2020 10:06

---

The law

A little more than a decade ago, when I was working on my “electronic edition” of Cædmon’s Hymn, I developed a formulation that I have since come (only semi-jokingly) to consider something of a law about the use of computing in the Humanities (now updated!):

The application of computation to humanities problems inevitably requires an examination of first principles, including the fundamental social, political, and economic rationale for the underlying (analogue) activity: why you are doing what you are doing in the first place.

What I mean by this is that you can never just copy a technique from the pre-digital humanities into the digital space. If you try, you will inevitably find yourself, before long, thinking about fundamental questions of why, what, and how: why you want to do whatever it is you are doing, what it actually is that you are trying to accomplish, and how the thing you are trying to accomplish actually does what you think it does.


An example: The Critical Apparatus

This is perhaps easiest to see from some examples.

Cædmon’s Hymn

The first one I can think of, and the one that prompted my thinking on this in the first place, is the idea of a critical apparatus. The problem I was facing at the time was how to encode the textual apparatus for my digital edition of Cædmon’s Hymn. I had decided early on in the process that this was very much going to be a “born digital” edition—i.e. that it was going to take full advantage of the power of computation (such as was available to me at the time) and that my humanities research techniques and methods would be optimised for that milieu.

Amongst other things, this meant, for example, that I wanted to generate my edition entirely from the original transcriptions (see also a later discussion of this in Human IT 8.1 [2005]): if an edition was an evidence-based mediated representation of a primary text (a first principle I began to develop as I thought more about it), then presumably that meant that its critical, scholarly text was, in some way, a processed version of the evidence of its witnesses (i.e. the surviving documents that contain its text). In theory, at least, this should mean that one could develop critical readings by processing the transcriptions: selecting specific readings as lemmas and varying the level of witness-specific and/or abstract information one displayed about them.
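To make this concrete, here is a minimal sketch, in Python, of what “processing the transcriptions” might look like. The data, the witness sigla, and the majority-vote rule are all my own illustrative assumptions (a real edition would have to capture actual editorial judgement), but the point survives: the critical text falls out of a query over the witness evidence rather than being keyed in directly.

```python
from collections import Counter

# Invented miniature "witness archive": each word-position in the poem
# maps witness sigla to the form that witness attests.
transcriptions = {
    1: {"M": "heofonrices", "L": "heofonrices", "T1": "heofunrices"},
    2: {"M": "uard",        "L": "uard",        "T1": "weard"},
}

def critical_reading(attestations):
    """Derive one reading for a word-position from the witness evidence.
    Majority voting is a deliberately crude stand-in for the editorial
    decision-making a real edition would have to encode."""
    form, _count = Counter(attestations.values()).most_common(1)[0]
    return form

reading_text = [critical_reading(transcriptions[pos])
                for pos in sorted(transcriptions)]
print(" ".join(reading_text))  # -> heofonrices uard
```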

The technology was not there at the time (and indeed is not there now) to allow me to do this entirely: capturing editorial decision-making in algorithmic form would be an AI task of the highest order. And I was working at the time in Standard Generalized Markup Language (SGML), a pre-XML markup language that was not nearly as well supported as XML for even basic manipulations like textual transformations.

But there was one place where I thought I could experiment with this approach: the critical apparatus. Traditionally, a critical apparatus is the list of textual variants for individual words in a reading text that appears at the bottom of the page in research editions of literary and other works (an example, from an edition of Tennyson, can be found here). Because the print page is not dynamic, these apparatus were almost always selective: they would list only those readings the editors in question considered significant (usually described as “substantive”), organised (usually) in terms of families of textual witnesses. Since the digital screen is dynamic, and since I had a full set of detailed transcriptions, I thought that I could improve on this by building a dynamic apparatus, one that allowed the user to select different levels of granularity in viewing the variants—from a full parallel text of every single reading to a highly selective list of only those variants that affected sense or metre (a discussion of my ideas on this can be found in Literature Compass 7.2 [2010]).
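A sketch of how such a dynamic apparatus might work (Python again, with invented variant records and classifications): every recorded variant is tagged with the ways it differs from the lemma, and the reader’s chosen level of granularity is simply a filter applied to that one underlying list.

```python
# Invented variant records: each notes the witness, its reading, and the
# ways it differs from the lemma ("spelling", "sense", "metre").
variants = [
    {"witness": "T1", "lemma": "uard",  "reading": "weard",
     "differs": {"spelling"}},
    {"witness": "Ca", "lemma": "aelda", "reading": "eorðan",
     "differs": {"sense", "metre"}},
]

def apparatus(variants, substantive_only=False):
    """Return the variants to display at the reader's chosen granularity:
    everything (a full parallel text) or only those variants that affect
    sense or metre (the traditional selective print apparatus)."""
    if not substantive_only:
        return variants
    return [v for v in variants if v["differs"] & {"sense", "metre"}]

print(apparatus(variants, substantive_only=True))  # only aelda/eorðan shown
```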

And this is where what I’m calling the First Law of Humanities Computing came in. Because before I could automate this, I discovered that I needed to figure out what a critical apparatus actually was: how it was intellectually structured, what its hidden methodological premises were, and why it was where it was on the page (the results of this work can be found in Literary and Linguistic Computing 24.1 [2009]).

This is not something print editors ever really needed to concern themselves with. In print, the critical apparatus appears at the bottom of the page, probably above the footnotes, and its content is decided on a relatively ad hoc basis: you come up with some principle of organisation (what you do is entirely up to you and your reviewers) and then type the variants out. But more importantly, you are not a slave to your principle. If, when you type out the variants, you find one that is so interesting you’d like to show it, even though you normally wouldn’t, you are perfectly free to print it: an ad hoc decision like this has no effect on the rest of your apparatus (there is no algorithm pumping out the variants automatically) and the only danger is that your reader is somehow confused by the inconsistency. Even extremely methodological editions (such as the Kane and Donaldson edition of Piers Plowman, which developed a relatively algorithmic theory of usus scribendi to understand the variation of its tradition) are free to be ad hoc in print in a way that is not possible in a digital processing environment.

In contrast, my original idea of automating the apparatus forced me to think extremely fundamentally about the nature of the edition: about the relationship between the reading text and its textual evidence, between the reading text and the apparatus, and between the apparatus and the body of textual evidence upon which the apparatus is based. As I argued in the LLC article cited above, my conclusion was that both the reading text of an edition and its textual apparatus were, intellectually speaking, really processed views of an implicit textual database: in the case of the text, the view was the result of a query that was programmed to return a single reading for each word in the text (e.g. a student edition might be understood as a query that requested “the most sensible form that matches the paradigms in standard textbooks”); in the case of the apparatus, it was the result of a query that asked for “all readings that differ in terms of sense or metre,” perhaps, or “all readings that differ in terms of spelling from the manuscript I’ve identified as the lemma.”
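Since the claim is literally that text and apparatus are query results over an implicit database, it can be made explicit. In this sketch (using Python’s standard-library sqlite3; the schema, sigla, and readings are invented for illustration), a single table of witness readings yields the reading text under one query and the apparatus under another.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE readings (
    pos INTEGER,            -- word-position in the poem
    witness TEXT,           -- manuscript siglum
    form TEXT,              -- the attested reading
    is_lemma INTEGER,       -- 1 if the editor selects this form for the text
    affects_sense INTEGER   -- 1 if the variant changes sense or metre
)""")
con.executemany("INSERT INTO readings VALUES (?,?,?,?,?)", [
    (5, "T1", "aelda",  1, 0),
    (5, "Ca", "eorðan", 0, 1),
    (5, "B1", "ylda",   0, 0),
])

# The reading text: a query returning exactly one form per word-position.
text = con.execute(
    "SELECT form FROM readings WHERE is_lemma = 1 ORDER BY pos").fetchall()

# The apparatus: a query for all readings that differ in sense or metre.
apparatus = con.execute(
    "SELECT witness, form FROM readings "
    "WHERE is_lemma = 0 AND affects_sense = 1").fetchall()

print(text)       # [('aelda',)]
print(apparatus)  # [('Ca', 'eorðan')]
```

Different editions, on this view, are just different queries over the same underlying evidence.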

Other examples

For our purposes here, it doesn’t really matter if I am correct in my understanding of the nature of an edition (though I still find my arguments convincing). The point is that it was the application of computing that required me to think fundamentally about what a reading text and a critical apparatus actually are, and about their relationship to each other, in ways I never would have if I had been working in a print medium.

I think there are other examples of this law at work: John Unsworth’s paper on “Scholarly Primitives,” for example, seems to me to be a perfect example of its effect, as does Espen Ore’s “Monkey Business: or What is an Edition?”. I’m sure there are others, ranging farther afield than the ones I know of at the moment.

Origins

If I’m right about this “law,” then I suspect it must somehow be related to the concept from Media Studies of “remediation”: i.e. the process by which works are refashioned when they move from one medium to another. It isn’t, strictly speaking, an example of remediation (or at least not always), because it doesn’t always involve representation: Unsworth, after all, is talking about basic methodological principles.

It also must have to do with the contrast between the way digital and non-digital media function. In a non-digital world, we can work by analogy and intuition. The traditional critical edition and critical apparatus I was examining in developing my edition of Cædmon’s Hymn were the result of nearly 400 years of analogical and intuitive development. Although there are methodological studies of textual criticism (quite a lot, actually, particularly from the twentieth century), the actual form of the edition and apparatus developed far more organically: people learned by looking at other editions and imitating roughly what they saw and thought worked; at no point (that I am aware of, at least) did somebody sit down and actually design ex nihilo the processes and devices we have gradually come to accept as “the edition.”

In the digital world, however, we work by design and algorithm. It is possible to simply imitate print on the screen (this is, after all, what the PDF does). But in such cases, you are not really taking fundamental advantage of the power of the processor: a PDF is a little better than a print page in the sense that you can search it, but otherwise it doesn’t really improve on it very much. Once you decide, however, to really use the power of the processor—by designing an output format that can adapt to different shapes and sizes of displays, for example—you end up having to work far more deliberately than you ever had to in print. The modern web page, for example, involves some fundamental thinking about structure and appearance (and, more importantly, the separation of the two) that simply was never required in print.

Predictive value

If I’m right that there is a Law of Humanities Computing, then the law should also have predictive value. This is something only experience will show, but I think it does. Over the past couple of days, for example, the executive at Force11.org has been engaged in a discussion of scientific authorship and credit. Initially, this involved thinking about how one might compute this better—i.e. better indicate what people have done, increase the granularity of the scientific byline, etc. But as the discussion progressed, we also began to find ourselves discussing far more fundamental issues: “what is authorship in a scientific context?” “why do we care about attribution?” “how are bylines actually used in institutional contexts?”

Further work

This law began as something of a joke. I’ve been using it with my students for most of the last decade, but I referred to it in a research context for the first time probably the year before last, at the I Seminário Internacional em Humanidades Digitais no Brasil at the Universidade de São Paulo. I still can’t make up my mind if it is real or not… or if anybody else has already described or explained it.

I’d be really interested in further examples or alternate formulations!
