
Soup to nuts: A recent piece of my writing that technology allows you to follow from idea to completion.

Posted: Oct 27, 2016 17:10;
Last Modified: Jun 25, 2017 16:06

---

I was discussing writing and editing with a student the other day, and somehow the question of how I worked came up. As it turns out, I have a very recent example where you can pretty much follow the entire process from start to finish.

In showing all my work like this, I’m not making any claims about the quality of my own writing or the efficacy of my method. It is simply that, in this case, modern technology allows me to show the entire process I happened to use in writing a specific piece that people can also read in its final form. For some students, I suspect that’s useful.

If you are interested, here are the relevant links to my recent Globe and Mail Op-Ed on “preferred pronouns” and the entire history of its drafting (because I wrote it in Google Docs, you can follow the whole process from start to finish). To see the revision history, go to “File > See revision history” or use Ctrl+Alt+Shift+H.

It looks like it took me a little more than 12-14 hours to write, though I don’t remember how long I spent on the notes. It came in at about 800 words in its published form. 12-14 hours is a little long for me for an op-ed—I usually do them in about a day (so say 6 hours or so). But I found this one hard to write.

Tools that I used were the following:

Here are the different versions:


----  

Cædmon Citation Network - Week 14

Posted: Sep 02, 2016 18:09;
Last Modified: Sep 02, 2016 18:09

---

Hi all!

I spent this week putting information into the newly updated database. It works much faster than it did before, and is very intuitive to use. Dan mentioned that he would like to see some screenshots, so please enjoy the following images:

Here we see the front page of the database, with two text boxes, one for the Source and one for the Reference.

Options pop up after you begin typing, which makes adding sources and references super quick.

The Location box allows you to type the page number on which you found the reference in your source material (I simply type the number without any “p.” or “pg” preceding it), and the drop-down box allows you to choose whether the reference is a Text Quote, Text Mention, Scholarly Reference, or Other Reference.

Clicking on the “View Entries” link allows you to view all of the entries that you have made. They are listed from oldest to newest in one big list.
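For anyone curious about what a single entry amounts to as data, here is a purely hypothetical sketch in Python of the fields described above. It is only my illustration of the idea (the class and field names are mine), not the actual structure of the database Garret built:

```python
# A hypothetical sketch (mine, not the real database) of one entry in the form
# described above: Source, Reference, Location, and the reference type.
from dataclasses import dataclass


@dataclass
class Entry:
    source: str     # the article or book I am reading
    reference: str  # the work it cites (e.g. an edition of the Hymn)
    location: int   # page number, entered without any "p." or "pg" prefix
    ref_type: str   # "Text Quote", "Text Mention", "Scholarly Reference",
                    # or "Other Reference"


# Example entry: a scholarly reference found on page 12 of a made-up source
example = Entry(
    source="(hypothetical) article from our reading list",
    reference="(hypothetical) edition of Cædmon's Hymn",
    location=12,
    ref_type="Scholarly Reference",
)
print(example)
```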

So far I have had zero problems with the database; however, I have been coming across a few snags with regard to gathering references from the sources. To use this first article by Lenore Abraham as an example, it is not noted anywhere which edition of Bede’s “History of the English Church and People” she uses; she simply gives the title. I am not sure how to figure this out, but I feel that it is important to know, as the edition cited is the most important piece of information that we are attempting to gather. I am concerned that a lot of other articles might omit this information as well, but I suppose we shall see as the collection continues.

I was also curious as to whether or not we count the “about the author” blurbs when adding references. The beginnings of articles will occasionally list other pieces the author has published, and I am not sure whether or not to count these as references. My initial instinct was to ignore them, as they do not necessarily have anything to do with the article in question, and if they are important they will be cited again further on; however, I thought I would bring it up to be sure.

I am excited to continue collecting information. I will be back in Lethbridge for school on Tuesday, so I can start requesting inter-library loans again and keep our project rolling!

Until next week,

Colleen

----  

Cædmon Citation Network - Week 12+13

Posted: Aug 23, 2016 10:08;
Last Modified: Aug 23, 2016 10:08

---

Hi all!

Summer is winding to a close, and our project continues to progress. The database is working, and is currently being made faster for even easier use. Books and articles are still being collected and scanned, and I am trying to split my time between scanning sources and collecting data.

At our last meeting Dan and I went over the exact specifications for the references I am collecting. Information is sorted into four types:

Text Quotes (TQ)

Text Mentions (TM)

Scholarly References (SR)

Other References (OR)

Text Quotes and Text Mentions come from editions, facsimiles, translations, and manuscripts, and only refer to Cædmon’s Hymn itself. Quotes are direct quotations from the poem, while mentions are references to other editions.

Scholarly References will consist of references made to anything other than Cædmon’s own words. This can include books and articles about the hymn or other topics, as well as supplementary text from the editions of the hymn.

Other References is simply a catch-all category for anything that does not fit into the previous three categories.
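To keep the abbreviations straight, here is the same taxonomy written out as a small Python dictionary. This is purely illustrative shorthand on my part (the dictionary and function names are mine); in the real database the categories simply appear as drop-down options:

```python
# Illustrative shorthand only -- not part of the actual database.
REFERENCE_TYPES = {
    "TQ": "Text Quote: a direct quotation of Cædmon's Hymn itself",
    "TM": "Text Mention: a reference to the Hymn or another edition of it, "
          "without a direct quotation",
    "SR": "Scholarly Reference: a reference to anything other than Cædmon's "
          "own words (books, articles, supplementary text in editions)",
    "OR": "Other Reference: a catch-all for anything that does not fit the "
          "first three categories",
}


def describe(code: str) -> str:
    """Return the working definition for a reference-type code, e.g. 'TM'."""
    return REFERENCE_TYPES[code]


print(describe("TM"))
```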

Unfortunately I have been having laptop issues and had to reinstall the operating system on my computer, losing some programs in the process. I am not sure if this will affect my GLOBUS endpoint, but I will try transferring some files later to determine if I need to figure all of that out again.

My goals for this week are to scan the ILL books that I currently have checked out, transfer all the files I have scanned to GLOBUS, and fine-tune the way I collect my data from the books and articles. I have been finding that it is quicker to write down a large chunk of references on paper and then input the info into the database in one go. This may change as Garret makes the database quicker. The faster version of the database should be ready this week, but I do not currently have access to it, so today will be a scanning day. I also plan to request another chunk of ILL books and articles from the library.

For next week’s blog I hope to write a sort of how-to guide on collecting information from the sources and inputting the info into the database. As the new semester starts in two weeks, I will have less time to spend on the project and I believe Dan plans on hiring more students to help the collection go faster. The how-to guide should ensure that we are all collecting data in the same way, and should ease any confusion that might cause errors. As the semester progresses I and whoever else might be working on the project can go through the data collection at a steady pace, and I can continue to collect and scan the sources needed to complete the bibliography.

Things seem to be on track, and hopefully the transition into the new semester will be smooth!

Until next week,

Colleen

----  

Cædmon Citation Network - Week 11

Posted: Aug 05, 2016 13:08;
Last Modified: Aug 05, 2016 13:08

---

Hi all!

I have a very short blog post this week, as the week itself was very short. I spent the last few days collecting more sources, doing some scanning, and preparing to begin data collection.

The database should be up and running this weekend, meaning data collection can officially start next week. I will see Garret on Sunday and we will be able to do some test runs on the database to make sure it is working properly. We have discussed its functions over video conference several times throughout the week, and it seems to be coming along very well!

Next week I will be splitting my time between continuing to collect sources and beginning data collection, a suggestion made by Dan during our last meeting. This will allow us to understand any flaws in our collection system earlier on, rather than waiting for EVERY source to be collected and scanned before we try out our system. I am optimistic that it should all go well, and will report back at the end of next week!

Until then,

Colleen

----  

Cædmon Citation Network - Week 10

Posted: Jul 25, 2016 10:07;
Last Modified: Jul 25, 2016 10:07

---

Hi all,

It is week 10 already, and I feel like I am nowhere near where I thought I would be with regards to this project. While the list of sources we need for our data collection on Zotero is as complete as we can make it at present, not everything on the list has been collected yet. I was in high spirits at the beginning of last week, thinking that the collection of sources was nearly complete; however, I realised later on that I had missed a good chunk of the list. It turned out that I had some filters set that were omitting a portion of the 700-ish books and articles. To make a long story short, more collection is still needed!

This will mean more inter-library loan books will need to be ordered and scanned, and more articles will need to be transferred to the GLOBUS folder. Thankfully the book scanner is back up and running again! If it holds out it should make the process painless and a good deal quicker than scanning things on the photocopier.

My plan for this week is to:

- finish scanning the inter-library loan books I currently have checked out (there are about four left to scan)

- finish collecting EVERYTHING on the Zotero list, keeping track of how many inter-library loan books are due to come in so I can account for future scanning time.

- transfer all electronic copies of articles and books to the GLOBUS folder (the internet guy is FINALLY coming on Tuesday to hook up my apartment, so I can work on this every night starting tomorrow)

- And then, if by some miracle I finish everything before the end of the week, I will begin data collection.

To be quite honest, I have been very frustrated with myself and the fact that I have not begun the data collection sooner. I suppose that collecting and organising hundreds of articles just takes longer than I imagined. I really have been working at it steadily throughout the summer, trying to maintain a level of organisation that allows information on the project to transfer easily between myself, Dan, Garret, and anyone else that might happen to work on the project. Although scanning and photocopying seem like menial tasks, I think I need to remind myself that such tasks take time and are necessary to keep our project organised and moving forward.

I do hope, though, that if Dan has any concerns with the pace of the project he will let me know, as I do not want to drag things out way longer than he was expecting. The project IS moving forward, however slow it may have seemed the past couple of weeks. Although collecting and organising the sources is not the most exciting part of the job to write updates about, you all can be assured that it is almost complete and the data collection will begin very soon! I am very much looking forward to beginning this part of the project, seeing what we find, and facing the challenges that I am sure we will encounter.

Until later,

Colleen

----  

Cædmon Citation Network - Mini Update (Week 9)

Posted: Jul 22, 2016 12:07;
Last Modified: Jul 22, 2016 12:07

---

Hi all!

Just thought I would post a short update for you, as I was meant to have started reading and collecting data by this point. Unfortunately my efforts have been sabotaged by the library’s book scanner which has been refusing to work properly for me.

At the beginning of the week it worked beautifully for two batches of scanning, however on the third batch it kept kicking me out and deleting my work, saying that it did not have enough memory. The library staff was quick to look at it, but as the “book scanner expert” was not available that day, I had to wait for it to be fixed.

I busied myself with other work (it turns out that I was not quite finished collecting sources; there was a sizeable chunk that had escaped my notice!), and came back this morning with even more books to scan, but a new issue has arisen:

Now when I scan a batch, the images show up on the screen, but the scanner does not register them as having been scanned. The screen provides me with a page count, but no indication of how many megabytes have been scanned, so when I go to email the images it says “NO IMAGES SCANNED!”. The images have been scanned! I see them on there!

Anyway, the I.T. staff are on the case and will let me know when they get it working again. The scanner really does work wonderfully when it does work, and it is so much faster than a conventional scanner or photocopier. I will continue collecting sources today, and hopefully get a chance to use the scanner again before the library closes. I also plan to come in this weekend to try to catch up on the work that was lost throughout the week.

I feel bad that it is almost August and we are still not at the data collection point. Hopefully things will go a bit smoother once everything is scanned and organized!

Until next week,

Colleen

----  

Cædmon Citation Network - Week 9

Posted: Jul 18, 2016 09:07;
Last Modified: Jul 18, 2016 09:07

---

Hi all!

I finally get to start reading this week!!! While I am still not 100% complete in my sourcing of all the books and articles, it is looking as though I will definitely be able to start reading by Wednesday if not earlier.

I also have a bunch of books from inter-library loans that I need to scan portions of. That will be part of my job today.

The database will be ready this week as well. Garret says that there will be a few improvements that he will want to make, but I will be able to start using it this week. All the information that I collect will still be available as the database is upgraded.

You may have noticed that I have switched to blogging at the beginning of the week as opposed to the end. I have found that at this point it is more beneficial to me to post at the start of the week outlining some goals and then to add an update post sometime during the middle of the week. I am going to continue this model for the next while.

Until next time!

Colleen

----  

Cædmon Citation Network - Week 8

Posted: Jul 11, 2016 10:07;
Last Modified: Jul 11, 2016 10:07

---

Hello!

Just a quick blog post this morning to give you an update on what’s to come this week:

I am continuing to gather all of the articles/books needed for the project, and hope to complete the search this week. There may be a few inter-library loans that we will be waiting on, but I would like everything else to be ready to go!

Not all the articles will be accessible on GLOBUS right away, as the transfers do not work on the university network and I am currently living without internet at my apartment (The horror! The horror!). I will be transferring them when I can, as free wi-fi will allow.

This means that reading and data collection can start next week! The database should be good to go by then as well. It is all coming together!

Until Friday,

Colleen

----  

Cædmon Citation Network - Week 7.5

Posted: Jul 07, 2016 13:07;
Last Modified: Jul 07, 2016 13:07

---

Hi all!

You might have noticed that I forgot to blog last week… This is true and totally my fault. I moved into a new apartment, and in the process may have suffered a mild concussion. Oops! I have been keeping up with my work; however, because I was working at random times of the day and night, in chunks of a few hours each, I definitely forgot to blog! So here is my update from the last two weeks:

Unfortunately I don’t have much news to report. I have been going through our Zotero bibliography and collecting missing articles through online databases and inter-library loans. It is going well, but it is taking a bit of time.

GLOBUS is now working for me thanks to Gurpreet’s help figuring out what was going on. It was mainly a permissions issue. I can now access our group research folder, which is excellent.

The database is also on its way. I am expecting an update from Garret regarding that this weekend.

I will post another blog tomorrow to outline my goals for next week. In the meantime I will continue to pull articles to complete our pool of data.

Until tomorrow!

Colleen

----  

Cædmon Citation Network - Week 6

Posted: Jun 24, 2016 09:06;
Last Modified: Jun 24, 2016 09:06

---

Hi all!

This week I have been gathering sources for the pieces in our Cædmon bibliography. This is not a speedy task by any means! I admit that I have felt a bit impatient with myself and have been concerned that I should be at the point where I am gathering data by now, but I try to remind myself that it is important to make sure that we have a complete pool of sources from which to pull data, otherwise people could poke holes in our findings when we are all done. All of the proper experimental procedures that I learned way back in 7th grade science fair still apply here!

Dan gave me the key to the Digital Humanities lab on Monday, and I was able to go in and dig through Rachel’s drawer in the filing cabinet from last summer. I was excited to find that she had a ton of articles in there that simply need to be scanned. This will be time consuming, but worth it to have them all organized in the GLOBUS folder and accessible to everyone in our group. I am wondering whether, when I scan articles, there is a way for the PDFs of the scans to be grouped together, or whether each individual page will have to be put in order on the computer… I will have to see!

I was having trouble with GLOBUS yesterday, so I am meeting Gurpreet this afternoon to figure out what’s wrong. I updated to the new version of Windows a few days ago and it is causing my computer major hassles. I doubt that’s why I can’t get GLOBUS to work, but I would still like to blame Windows anyway.

My goals for next week are to have all of the articles from Rachel’s drawer scanned and transferred to GLOBUS, and for everything that we don’t have from the Cædmon bibliography to be requested or found on the internet. I will have to motor, but I think it is doable. The database should be ready for me to start reading/counting the following Monday, and from that point on I can read, count, and determine whether or not we will need extra students hired to help get these 700 articles read!

Until next week!

Colleen

----  

Cædmon Citation Network - Week 5

Posted: Jun 17, 2016 10:06;
Last Modified: Jun 17, 2016 10:06

---

Hi all!

Painfully short blog entry this week, I’m afraid. A lot has been accomplished this week, but there is not a lot to report.

The bibliography has been completed, with the final count being approximately 700 pieces of Cædmon scholarship. This number may increase or decrease as I read through the actual works. Some may have nothing to do with Cædmon (I erred on the side of having too much rather than too little), and others may point me in the direction of something I might have missed.

I have also begun to search out access to the pieces that make up the bibliography. This week I have been finding most things on JSTOR, but I expect that I will be requesting a lot of inter-library loans next week! Once I have found all I can find online I can start reading while I wait for the inter-library loans to come in. As the loans come in I will be splitting my days between scanning the loans and reading. (Note to self: locate that book scanner Dan told you about.)

The database to record what I find while I read is in the works as well. I should have an update from Garret early next week, so I will have more info on that in next week’s blog!

Until then!

Colleen

----  

Cædmon Citation Network - Week 4

Posted: Jun 11, 2016 10:06;
Last Modified: Jun 11, 2016 11:06

---

Hello!

This blog comes to you a day later than usual, as Friday’s work ended up taking a lot longer than I thought and I ran out of time! To be honest, this week was spent much like last week: checking our Zotero bibliography against other bibliographies of Cædmon scholarship.

I ended up re-doing a bit of my work from last week, as I learned in my meeting with Dan on Monday that our scope was a bit wider than I had previously thought. I was worried that I had not been considering certain entries in the various bibliographies to be “about Cædmon enough”, so I decided to go through the entries again and add some that I may have missed. It makes sense to add more rather than less, as I can simply remove an article from the list if I read it and realise it has nothing to do with Cædmon. At the moment our bibliography is almost complete, and we have nearly 700 entries!

What are we going to do with this giant list of articles and books? Well, firstly I have to acquire access to each entry, either via JSTOR, inter-library loans, or through one of our library’s other databases. Then I read through EVERYTHING and count each quote and mention of Cædmon and note which of the approximately sixty different editions of the Hymn are cited. We have also decided to try and note every other citation as well. For example if one article about “Cædmon’s Hymn” cites a book about the history of peanut butter sandwiches, I will take note of it, as there may be other pieces of Cædmon scholarship that also cite that book about the history of peanut butter sandwiches. It will be interesting to see if there are identifiable relationships between writing about Cædmon and seemingly unrelated topics – not peanut-butter-sandwich-history obviously, I just haven’t eaten breakfast yet so I am giving you a delicious example.

How am I going to keep track of all this? Good question! We will need a database that I can use to mark down each citation as I come across it in my reading. On Monday Dan and I discussed at length what we will need from this database, and how we would like it to work. At first we were hoping something on Google Forms would do the trick for us; however, we discovered as we talked that we need more control over our information than this tool would allow.

One problem emerged when we realised that among our gigantic list of 700 articles (and books, etc.) we would find certain works that were actually editions of the Hymn not included in our original list of editions. We would need a way to add such a piece to the Editions list… Several other concerns were raised as well, but to be honest I am finding them difficult to explain without drawing you all a little picture. (I should ask Dan how to add images to these blog posts!)
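To make the “add it to the Editions list” problem a little more concrete, here is a toy sketch in Python (using the built-in sqlite3 module) of the kind of flexibility we are asking for. It is only my illustration of the requirement, with table and function names I made up, not Garret’s actual design:

```python
# A toy illustration of the requirement: if a work on the reading list turns
# out to be an edition of the Hymn that is not yet in the Editions list, we
# need to be able to add it on the fly and then cite it like any other edition.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the sketch
conn.execute("CREATE TABLE editions (id INTEGER PRIMARY KEY, title TEXT UNIQUE)")
conn.execute("""
    CREATE TABLE citations (
        id INTEGER PRIMARY KEY,
        source TEXT,        -- the article or book being read
        edition_id INTEGER, -- which edition of the Hymn it cites
        page INTEGER,
        FOREIGN KEY (edition_id) REFERENCES editions (id)
    )
""")


def edition_id(title: str) -> int:
    """Return the id for an edition, adding it to the Editions list if it is new."""
    row = conn.execute("SELECT id FROM editions WHERE title = ?", (title,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO editions (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid


# A work we only discovered was an edition while reading can still be recorded:
new_id = edition_id("(hypothetical) newly discovered edition of the Hymn")
conn.execute(
    "INSERT INTO citations (source, edition_id, page) VALUES (?, ?, ?)",
    ("(hypothetical) article from the reading list", new_id, 12),
)
conn.commit()
```

The point is simply that the Editions list needs to be able to grow as I read, without anything already entered being lost.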

I mentioned at some point that I would pick the brain of my boyfriend, Garret Johnson, who has his degree in Computer Science from the University of Lethbridge and is my go-to person whenever I have a question about these sorts of things. Dan suggested that he could hire Garret to build our database if he would be willing, as someone with a programming background could probably produce what we need a lot faster than either Dan or I could by working on it ourselves. So that is our current plan! Garret will begin building us a database that will suit our needs, and my job for next week will be to start acquiring the 700 articles and books on our list. By the end of next week I am sure I will have thoroughly annoyed the librarians at school with the number of inter-library loans I will be requesting.

Until next week!

Colleen

----  

Cædmon Citation Network - Week 3

Posted: Jun 03, 2016 10:06;
Last Modified: Jun 03, 2016 10:06

---

Hi all!

Another short post this week, but I will try to make up for it by posting more than one blog next week as I get further and further into the project!

Most of this week was spent methodically checking our body of Cædmon scholarship against various databases (all listed in my previous post). I felt a bit bad that it was going so slowly, as I do not want to lollygag in my work at all. Several things seemed to make the task slower than I hoped, however.

First of all, when I started going through the lists I would try and find access to each article or book that was missing from our body of scholarship as I became aware of it. I soon abandoned this practice and decided that I would create a running list of what we are missing FIRST, and then find access to these pieces as my next step.

I also found that many of the works we are missing are in a foreign language, which made my search for them (before I resigned myself to simply creating a list) more difficult. I will need to ask Dan if we are including foreign-language articles in our data. If we are, I will also need to figure out how I am going to comb those articles for quotes later on. I suppose quotes of the original will be written in Old English, so those are simple enough to pick out, but paraphrasing of the poem in something like Italian or German might prove difficult.

And finally, I was a bit impeded by my own inability to figure out how to add new entries to our shared bibliography on Zotero. This is not a huge deal at the moment, as I eventually decided to create a running list of missing pieces before finding sources; however, I did waste quite a bit of time fighting with the program to add new entries before settling on this. This is something else that I’m sure Dan can help me with. I believe I might just need some sort of permission to add to the shared database.

In any case, next week should provide fodder for some more interesting blog posts! It will be like a mystery: The Search For the Missing Cædmon Articles…

Until then,

Colleen

----  

Cædmon Citation Network - The Return

Posted: May 19, 2016 10:05;
Last Modified: May 19, 2016 11:05

---

Hello, Readers of Dan’s Blog!

My name is Colleen Copland, and I am a student of Dan’s who will be working with him on the Cædmon Citation Network which he and Rachel Hanks began work on last summer. I will be blogging here weekly, and thought I’d use this first post to introduce myself and more-or-less explain the project as I understand it so far. I am still familiarizing myself with everything, so my descriptions may fall short of the actual scope of the project or they might be totally off-base altogether, but as I learn more I will let you know all the juicy details!

A little intro on myself: I am an undergraduate student at the University of Lethbridge, majoring in English and hoping to be accepted into the English/Language Arts Education program this fall (cross your fingers for me, internet!). I have taken three courses with Dan in the past two years: Medieval English, Intro to Old English, and Advanced Old English, in which we spent an entire semester reading Beowulf. Suffice it to say, I think Dan is a pretty excellent prof, and I am excited to work for him this summer so I can continue to learn from him!

The Cædmon Citation Network (also known as the Cædmon Bibliography Project and possibly a few other names – I will need to ask Dan if there is something he’d like me to call it officially) is a gathering of data on the citations of various editions of Cædmon’s Hymn. The project is interested in tracking how long it takes a new edition of a work to start being cited in studies of said work. Cædmon’s Hymn, since it is such a short piece, has been re-translated and re-published a great many times since 1644, which should allow us to notice some patterns in the way each new edition is cited.

The project is also interested in looking at the differences between the citing of digital editions of works as opposed to print editions. Many people assume that it takes longer for digital editions to begin being cited, but this project aims to suggest that they are actually cited more quickly. It will be interesting to see what the data shows us.

Where are we right now with regards to the project? Personally, I am getting oriented to the project’s goals and working to gain access to all of the excellent data collected by Rachel Hanks, who worked on the project last year – figuring out where everything left off and where Dan would like it to go this summer.

I am excited about gathering more information and will share it with you as I progress. It often seems that I gain a better understanding of a project when I explain what is happening to someone else, so I think this blog will be an excellent tool. It will also serve as a good record for Dan and me of what went on at different points during the project. Any questions you might have can be left in the comments section that I believe is located below this post…

Until next week,

Colleen

----  

If there's such a thing as "computing for humanists" is there also such a thing as "humanities for computer scientists?" On implementing interdisciplinarity in the Digital Humanities

Posted: Jul 16, 2015 17:07;
Last Modified: Jul 16, 2015 17:07

---

This is just a brief initial thought piece on a question I’ve been asking colleagues about, from whom I’ve not heard the answer I want.

The Digital Humanities is an interdisciplinary field that involves the intersection of computation and the humanities. That means, amongst other things, that neither computation nor humanities is primary to the discipline but both must be present in some way or another. In this way, the Digital Humanities is different from, say, the “History of Science” (History is primary) or “Cognitive approaches to cultural understanding” (Cognitive science is primary).

In actual practice, for most of its life as DH and in its earlier form, Humanities Computing, DH has been mostly the domain of humanists. The people have been located in Humanities departments and the projects have in many ways been developments from and extensions of humanities research. So while Digital Humanities, in terms of content, is more “digital + humanities” than “Humanities of the Digital” (except maybe in special cases like Critical Coding), it has, in institutional terms, largely been a space in which humanists do computing.

An important practical implication of this institutional construction is that we assume that the weaker part of most practitioners’ skill sets involves technology rather than humanities. Thus, if you attend a school like DHSI, you can take courses on programming and project management designed to improve the technological chops of postcolonial theorists; but you can’t take courses on Spivak designed to improve the PoCo skills of Computer Scientists. Likewise, we have a pretty good idea of what a course on “computing for the humanities” ought to look like, but, I suspect, a much poorer idea—or, as I’m discovering, perhaps no idea at all—of what a “humanities for the computing” ought to look like.

I think this is increasingly going to be a problem in our field as it becomes more and more prominent and attracts more and more interest from disciplines not originally involved in its development. We already have lots of projects involving Geographers, GLAM professionals, New Media specialists, and so on. It is no longer uncommon to find people who are interested in DH in traditional computer science departments and in software engineering. What training and development in humanities research skills do practitioners and students in those fields need in order to engage properly with our interdisciplinary subject? We’ve already learned that it is possible to teach those with a humanities background “enough” programming and computer skills to function well in our world; what is the equivalent for those who come from a technological background and need to learn “enough” humanities research and exposition skills to function the same way?

I’ve been asking colleagues about this for a while and what has surprised me is the extent to which people have said, in essence, that it is not possible to do this. I.e. that it is not possible to construct, teach, and especially set work and expectations for a course that aims to teach non-Humanists enough humanities research skills and knowledge to get by.

The basic question I’ve used has been the following: if I had an incoming graduate student for a Masters in DH who had a background in computer visualisation, say, or database programming, and who wanted to become a Digital Humanities researcher, how should my expectations for that person differ from those for somebody with a background in the Humanities in a course on, say, Post Colonial theory or the Nineteenth Century Novel? Does a graduate student with a technological background entering under such circumstances need to perform at the same level as a student with a more traditional humanities background? And if not, how do we handicap for the disciplinary difference in grading?

I’ve asked this question of people in English, Linguistics, Philosophy of Science, and so on—people who are for the most part used to working interdisciplinarily. And the surprising answer I’ve heard from every single person I’ve asked is that we can’t handicap such a student: i.e. there is no such thing as “good enough for a non-humanist” graduate level work in the Humanities. In fact, most people I’ve asked have gone further: not only is there no such thing as “good enough for a non-humanist” graduate level work, the standard we need to use in assessing such a student is actually “is this good enough for a graduate student in that same humanities domain?” I.e. the standard to use in assessing a graduate student with a background in computer science who is taking a humanities course is whether they meet the standard we’d expect of a student in the same course who had a background in the original discipline. So for a computer engineer to pass a graduate course in Wordsworth, they need to show they are as good as a more traditional grad student trained in literary studies.

There is a startling lack of reciprocity involved in this (imagine if the computer scientists who teach “programming for humanities students” started insisting our students perform to the same level of sophistication as computer science MSc students). But this all-or-nothing approach also seems to me to say something bad about either us (Humanists) as scholars and teachers or our domain (Humanities) as a research field. Is it really simply not possible to acquire (and get a grade for) a “working knowledge” level of awareness in the Humanities? Is it really unreasonable to allow somebody who can do DH because they have solid technical skills to be maybe even a little bit clunky or poor in their ability to formulate humanities arguments? Or are we all performance and no content?

Sometimes, I think some of us wonder if we actually do have skills and knowledge that others might want to possess. I once went to a colleague who is a textual critic to ask that person to teach a course on textual criticism to an excellent student from computer science. After I went through the student in question’s skills, the colleague asked “what am I going to be able to teach a student like that?” As if “textual criticism” wasn’t enough of an answer.

But we really do have skills and methods that others can use. I was once on an M.Sc. thesis committee in Computer Science for a student who was doing computational text summarisation. One day when we were looking at the results of a test run, we discovered that while most of the examples worked well, a couple did not. My Comp. Sci. colleagues suggested that we run another 10,000 texts through the machine to confirm the error rate. But when I looked at the actual examples that had thrown the problems, I discovered that they involved a particular kind of news story, such as you get describing hurricanes or military battles, in which the story begins with a narrative but ends with a list of the damaged, wounded, and dead: our machine was being thrown off by the change-up in form that was coming towards the end of what was, in essence, a specific genre of newspaper article. My colleagues from Computer Science were really surprised: they hadn’t realised that you could define genres like that or then use them to refine your example pool—or perhaps hadn’t thought through the extent to which the different broad sets of examples we were using (“Newspaper articles,” “Physics theses,” etc.) were only broad categories under which many different distinguishable sub-genres existed.

Presumably, in DH, we want students who are able to apply “humanities” ways of thinking to problems in a similar broad, rule-of-thumb way—or to understand other members of an interdisciplinary team when they are applying such methods to common problems we are all working on. What is surprising to me is that I’m not sure we have a method for actually teaching such skills to anybody who doesn’t want to become a traditional researcher in the traditional humanities. I don’t know myself what standards we do use in assessing work by such students. But there has to be something other than our current all-or-nothing approach.

----  

First thing we do, let's kill all the authors. On subverting an outmoded tradition (Force2015 talk)

Posted: Mar 01, 2015 17:03;
Last Modified: Oct 01, 2015 15:10

---

This is a rough approximation (with some esprit d’escalier) of my speaking script from my talk at the “Credit where Credit is Due” session at Force2015, January 13, 2015. We were asked to be controversial, so I tried to oblige.


Introduction

I’m not sure that this paper is going to introduce anything new to the discussion of authorship issues, perhaps just raise some reminders couched in the terms and methodology of a discipline that is only beginning to grapple with problems natural scientists have had to deal with for years. I’m also not going to solve anything, but rather walk through the origins of the problem and propose some potential avenues for change. But I’m also not going to be discussing tweaks or improvements to the system. Instead, I’m going to be arguing that our current author attribution system for scholarly and scientific publications is fundamentally broken and that the only route forward is sabotage.

Would we create scientific authors if they didn’t already exist?

The question I’d like to begin with is the following:

“If we didn’t have the concept of the scientific and scholarly author, would we create it?”

The answer, I think, is that we would not.

The International Committee of Medical Journal Editors’ definition of authorship vs. a traditional dictionary definition

This is because what we currently describe as a scientific author actually looks nothing like almost anything else we would describe using the term “author”—as you can see if we compare the definition of scientific authorship as described by the International Committee of Medical Journal Editors and a relatively standard definition of regular authorship taken from an online dictionary:

A typical dictionary definition: Author, n., A writer of a book, article, or document.


ICMJE definition: The ICMJE recommends that authorship be based on the following 4 criteria:

* Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; AND

* Drafting the work or revising it critically for important intellectual content; AND

* Final approval of the version to be published; AND

* Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

….

Contributors who meet fewer than all 4 of the above criteria for authorship should not be listed as authors, but they should be acknowledged. Examples of activities that alone (without other contributions) do not qualify a contributor for authorship are acquisition of funding; general supervision of a research group or general administrative support; and writing assistance, technical editing, language editing, and proofreading. Those whose contributions do not justify authorship may be acknowledged individually or together as a group under a single heading (e.g. “Clinical Investigators” or “Participating Investigators”), and their contributions should be specified (e.g., “served as scientific advisors,” “critically reviewed the study proposal,” “collected data,” “provided and cared for study patients”, “participated in writing or technical editing of the manuscript”). (Emphasis added).

In other words, while in the world outside scholarly and scientific communication we normally think of the author as the person who actually does the writing, in the world of research communication it is entirely possible to have writers who are not authors and authors who are not writers. And that, it seems to me, means we are fairly deep down the rabbit hole.

The nature of the problem

There has been a lot of excellent work on why our definition of authorship in research communication is the way it is, by Michel Foucault, Roger Chartier, Mario Biagioli, Mark Rose, and others (see especially Biagioli and Galison, eds., Scientific Authorship: Credit and Intellectual Property in Science). They have tied it to issues of authority, early intellectual property rights, aesthetics, and economics.

I like to think, however, that the problem really comes down to four main issues:

The inertia of words

The first major problem with scientific authorship, in my view at least, is that our practical definition is changing faster than our term’s connotative implications.

That is to say, while it is entirely possible for us to adapt and bend the term “author” to match our current scientific practice—even if that scientific practice results in such abnormal beasts as the “writer-who-is-not-an-author” and the “author-who-is-not-a-writer”—we cannot as easily let go of our term’s original connotations. Somewhere in the back of our minds, we still believe that authors should be writers, even if our heads and contemporary practice tell us that this is simply neither reasonable nor practical for our biggest projects in the age of Big Science.

We can see that this is so, indeed, if we read through the rest of the ICMJE definition, to get to the bit where they discuss how their definition of authorship should not be abused in order to unreasonably exclude participants who deserve credit for authorship by denying them opportunities to participate in the writing and editing of the article:

These authorship criteria are intended to reserve the status of authorship for those who deserve credit and can take responsibility for the work. The criteria are not intended for use as a means to disqualify colleagues from authorship who otherwise meet authorship criteria by denying them the opportunity to meet criterion #s 2 or 3. Therefore, all individuals who meet the first criterion should have the opportunity to participate in the review, drafting, and final approval of the manuscript. (Emphasis added).

There are two things significant about this proviso. The first is the nature of the abuse that the ICMJE is attempting to prevent: the case of somebody being denied authorship credit on the basis of the second and third criteria (i.e. because they were prevented from participating in the drafting of the article or were not given a veto over its contents) despite the fact that they met the requirements of the first and fourth criteria (i.e. made “substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work” and agreed to be held “accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved”). In other words, the ICMJE is worried that “the writing” might be used as a technicality to deny authorship to those who made otherwise major contributions to the science on which this writing reports.

But the second thing about this proviso is that it doesn’t protect against the opposite eventuality—that is to say that somebody who did participate in “the writing” might be unfairly denied authorship credit because they were prevented from making substantial contributions to the design or performance of the research or because they were not allowed to claim responsibility for the work. In other words, the ICMJE does not (for obvious reasons) think that preventing somebody from “designing the experiment” might be used as a technicality to deny somebody scientific credit. Or again in other words: while the ICMJE is prepared to accept that somebody could be deserving of authorship if they were unfairly denied access to the writing and editing, they don’t think the same thing about somebody whose only participation in the article was, well, “authorship” in the sense that everybody but academics understand the word. In scientific authorship, writing is a technicality in a way that participation in the actual experimental design and conduct is not.

The conservatism of aesthetics

This brings me to the second cause of our problems with the concept in research communication: the conservative nature of aesthetics. Because the connotations of the word are so strong, even if our practice has bent the definition almost completely out of shape, we also have a strong aesthetic feeling for where authorship attribution has to go in a scientific article: on the “byline,” between the title and the abstract or introduction. Indeed, a surprising amount of criticism of (and resistance to) “author inflation” rests on the simple idea that it looks ridiculous if you have eight hundred or a thousand authors between the title and the abstract of a scientific article—something that has affected even our opening keynote speaker, Chris Lintott, in his attempts to accurately credit the professional and citizen scientists who participated in his experiments with crowdsourcing.

Economic utility

The third reason why it has proven so difficult to let go of the idea that a scientific author must be a “writer” in some way has to do with economic utility. As the great historians of this topic have demonstrated, the decline in anonymous authorship came about in large part through the efforts of booksellers and publishers to create a mechanism for asserting copyright. If a work is anonymous, then it belongs to nobody (or everybody). If it has an author, then this author can alienate their rights to a publisher in exchange for money and their identity can be used subsequently both to brand the works in question (i.e. “another play from the great Shakespeare”) and identify fraudulent reproductions.

Although the situation in scholarship is not exactly analogous, the basic idea still pertains: naming the author of scientific works allows us to use such work as a mechanism for calculating economic value and reward and assigning scientific responsibility. In the specific case of academic researchers, authorship became something you could count and use to make comparisons between individual researchers (your worth, of course, rises with the number and prestigiousness of your articles) and something you could use to certify against fraudulent science.

Scalability

And finally, there is the issue of scalability. There have always been epistemological differences between creative/literary and research/scientific authorship. The one is an act of creation (you create rather than discover poems), while the other is an act of discovery (good scientists discover rather than create their results). But the importance of these differences was obscured when both types of authors collaborated in similar sized groups (i.e. of one or two at most). In the age of single author science and scholarship, it was easy to see an equivalence between “doing the science” and “being the writer of the paper” since both activities were usually performed by the same person and because individual experiments could be completely reported in at most a few individual papers.

But this equivalence does not scale. As experiments got bigger and more complex, as projects began to involve many more experiments and participants, and as individual papers increasingly began to report on smaller fragments of the total scientific output of a project, this rough equivalence between “person who wrote the paper” and “people who did the science” became increasingly untenable, and you end up with problems like the one the ICMJE proviso quoted above is trying to head off—the case of participants in a project being denied access to the byline of an article solely because they weren’t given a chance to wrangle a few words in the article reporting on their work.

Why words can hurt us

The point of all this is to show that the real cause of the “authorship crisis” is not the ever-increasing number of authors, but the fact that we are using the wrong word to describe what these people are and identify the quality we are trying to capture. And, unfortunately, that we are using a word that brings with it a lot of connotative and conceptual baggage that we simply cannot easily get rid of. While the term “author” (with all its connotative history) makes sense as a way of describing creative writers, for whom the actual act of composition is their primary activity, it simply does not work as a way of describing scientific authorship, for whom the act of composition is, ultimately, secondary to the research that precedes it. Poetry does not exist unless it is expressed in some form and its expression is in some sense at least coincidental with its composition (you can’t think of a poem without thinking of the words you will use to tell others about it). But while science ultimately requires communication, this communication cannot occur without prior activity: you can (and indeed probably should) do scientific research before you have decided on the precise words you are going to use to report on your results.

(And as a brief aside here, it is worth noting that science is not the only field of endeavour in which writing is secondary: the same is true, for example, in the legal or policy worlds where the actual writing is less important than the goals behind it, except that, in contrast to scientific credit systems, we don’t have any problem in these worlds in distinguishing between those who develop and those who merely draft legal bills or policy documents—there is no rule in parliament or congress that says that MPs or Senators can only be listed as the authors of a bill if they participated in its drafting).

So what to do?

This brings us to the problem of what to do. If the term “author” is bringing with it too strong a set of connotations to allow us to accurately capture what we want to capture (which is presumably participation in science rather than participation in typing), what can we do to change the current situation?

Accept that we don’t actually care who wrote the article

The first thing we need to do is accept that our current concept of scientific authorship is both an abuse of the term and brings with it far too much unhelpful baggage. That is to say, we need to recognise that we don’t actually care all that much about who wrote the scientific articles we read—beyond perhaps in the limited sense of making sure that those who did write out the results are rewarded for that writing. What we are actually trying to capture with our current credit/reward system is not participation in writing, but participation in communicated science.

This recognition is important, if for nothing else, in that it should free us of our aesthetic objections to long author lists. As long as we think that scientific authorship is actually about writing, then aesthetic objections to long authorship lists remain at least somewhat valid: it is indeed absurd to think that 800 people could possibly be responsible for the composition of a 6 page physics article. But if we stop thinking that what we are trying to capture is who wrote the article instead of who did the science reported on in the article, then the aesthetic objection becomes far less significant: if the people being credited are not actually authors, then we can stop thinking of their names as belonging on the byline; or we can stop thinking that the “byline” on a scientific article is in any way analogous to the byline on a newspaper article or novel.

Recognise that “authorship” is really just a special form of acknowledgement

Once we accept that scientific authorship systems are not actually about who wrote the article, it becomes easier to understand the next conceptual adjustment we need to make: recognising that “authorship” in a scientific sense is really just a special form of acknowledgement—or, in more concrete terms, that the “byline” in an article is really just an arbitrary, privileged, “above the fold” extension of the acknowledgements section.

You can see this if you compare the case of scientific authorship against that of poetry. Both books of poetry and scientific articles name authors and, commonly, have an acknowledgements section. The difference, however, is that where there is a clear epistemological difference between those mentioned in the byline and acknowledgements section in a book of poetry, there is (despite many attempts to develop one) no such clear demarcation in a scientific article. In a book of poetry, one will often find the author acknowledging the help of librarians who aided them in finding specific reference works, friends who hosted them while they were writing, thank yous to their agents and fellow poets for support, and so on. While these are all arguably people who helped the poet do his or her work, there is still a pretty clear distinction between helping a poet compose and actually composing yourself: nobody thinks the bartender at the poet’s favorite watering hole is actually an acknowledged coauthor of the poetry itself (well, not seriously, at least).

The people acknowledged in a scientific article, however, are, for the most part, those specifically responsible for conducting the science upon which the article is reporting: the people who did the calculations, who designed or ran the apparatus, who built the detectors, and so on. These are generally not people who had a purely social connection to the scientific work in the article but instead were directly responsible for its results. Our hypothetical poet would probably still have been able to compose poetry without the assistance of his or her agent. A scientific “author” would have nothing to write about if it were not for the people who helped make the discoveries in the first place.

This means, in turn, that the authorship-acknowledgements distinction in a scientific article is different from the similarly-named distinction in literary contexts. In contrast to the quite concrete distinction between the “person who composed the work” and “people who assisted the composer” we find in a literary work, in a scientific work, the distinction between “named as author” and “acknowledged as helper” is far more arbitrary, despite attempts such as those of the ICMJE to come up with discriminators. Instead of being able to make a clear binary distinction between those who have primary responsibility for a piece of science and those who merely assisted (as we are, in many ways, able to do in the case of literary authorship), what we are really doing in the case of scientific authors is attempting to determine the point on a scale of participation at which we decide to stop rewarding those who participated in our research. People to the left of the line we draw get their names put in the byline and are able to use the article on their CVs; people to the right of it get our best wishes, but little else, from their participation.

Understand that this conceptual problem is not amenable to tinkering

Since the problem with scientific authorship is conceptual—i.e. we are using the wrong criteria in attempting to determine credit—it is also not amenable to tinkering around the edges. Once you accept that an author might be somebody who doesn’t write and that a writer might not be an author, you are far beyond the power of any definitional tweak to save your system. Since the problem is the fact that we maintain an arbitrary distinction between those acknowledged on the “byline” and those acknowledged in the “acknowledgements,” reforms that improve the granularity of either without addressing the fundamental problem that we make the distinction at all are going to fail. Such reforms are attempts at refining the accounting for responsibility for “the article,” when what we really need is a system that recognises both that “the article” is only a secondary component of the scientific endeavour and that it is participation in reported science, not participation in the reporting of science, that our reward systems should be attempting to capture.

In fact, the only solution, in the end, is to stop using “authorship” as the primary measure of scientific participation. In the age of Big Science, the article is a better indication of the success of a project than the success of any individual within that project. We will only solve the issue of credit when we stop privileging authorship over participation.

Realise that there is no opportunity for external change

Although the problem is not amenable to tinkering, it is also true that it is not amenable to fiat. Because so much currently rides on authorship credit, we will almost certainly find it impossible to change the system formally in a top-down fashion. As advocates of Open Access discovered in the early years, change only comes when there is a critical mass that is comfortable with the innovation, but a critical mass only develops when the change itself is already understood to work within the current system. As various people have pointed out, Academia is very much a prestige economy and prestige markers are extremely resistant to change: scientists may want to publish in Open Access journals, but they need to publish in prestigious ones—and prestige seems, in large measure, to be a function of time and familiarity.

This is where the “sabotage” comes in. If you can’t change a system externally, then the only option left is to change it from within. And if the problem that we are facing with our current authorship systems is that they force us to make arbitrary distinctions among participants, then the solution is to refuse to make those distinctions. Since scientific authorship measures the wrong thing and excludes people who should be credited as participants solely on the relatively arbitrary grounds of whether they participated in the drafting of the article, the solution is to stop using “writing” as a criterion for authorship: in other words, move the line that distinguishes acknowledgements above the fold from those below to put all of the people whose participation made the science possible above. It is only when the byline becomes indistinguishable from the acknowledgements section that the system will have been modified to the point where we can begin to work on more granular systems of identifying (and rewarding) actual scientific participation. Because, as Syndrome argues in The Incredibles, “when everyone’s super, no one is!”

Conclusion

“Sabotage” is a strong word, but we are actually facing a pretty fundamental problem with our current attribution system. While equating “authorship” with “scientific productivity” made rough sense in the age of single-scientist experiments (and still does, to a large extent, in the current age of single scholar humanities research), the concept simply does not scale. It is difficult to apply to even moderately large collaborative teams and it is simply impossible to apply to the gigantic teams responsible for today’s biggest projects.

The reason for this, however, is that the concept is simply wrong. When we count authorship on scientific papers as part of our evaluation of an individual researcher, we are actually counting the wrong thing. We do not, on the whole, actually care that much whether a given scientist wrote the papers involved. What we are really attempting to capture is how productive and effective that scientist is as a participant in the science that is reflected in those papers—i.e. in the communication and discovery of communicable results. This does not mean that the article itself is irrelevant to science—you can’t have science without the communication of results. But it does mean that authorship of papers (authorship in the sense of “writing”) is no longer an adequate metric of scientific participation. The PI who conceptualised the project, the researchers who designed the equipment or methods, the people who calculated and reported the results—all of these are necessary to the production of good science whether or not they participated in actually typing or editing the articles in which this science is reported. Systems that fail to recognise this, such as that of the ICMJE with its fetishisation of, in essence, typing, are ultimately not going to solve a fundamental problem that has to do with the very term we use to describe the metric.

The answer to my question at the beginning is that we would not create the concept of the scientific author as a credit metric if it did not already exist. Now that it is causing serious trouble, it is time to kill it off.

----  

A "Thought Piece" on Digital Space as Simulation and the Loss of the Original

Posted: Feb 11, 2015 11:02;
Last Modified: Mar 04, 2015 05:03

---

A “Thought-Piece” on Digital Space as Simulation and the Loss of the Original: Final Paper for Dr. O’Donnell’s English 4400: Digital Humanities, Fall 2014

          In beginning to think about how I could integrate theory into my final project, I recalled Kim Brown, the DH Maker-Bus, and the way she spoke about how her workshops with children have prompted kids to ask “big questions”. It occurred to me that the way in which humanists approach their own work is often very dependent on the ways humanity and culture are defined. It also occurred to me that now, more than ever, humanity and technology are converging. In this paper I want to explore the ways technology and the digital are seen as “copies” of an “original”. Drawing on theories of post-humanism and post-modernism, I will discuss technology and the internet as simulation. This paper will examine technophobia in the humanities and look to Jean Baudrillard’s theories of simulacra, simulation, and the hyperreal, in an attempt to explain resistance to the digital and to technology in terms of scholarship, but also to examine the larger implications of the copy replacing the original. I will attempt to deconstruct the lamentation of the loss of an original to the simulations made possible by technology, and to consider how this affects understandings of things like research, the humanities, and humanity itself.

          To begin to deconstruct the lamentation of the loss of the original, and the resistance to the simulated, or to technology, in the humanities, I think it is important to discuss the theoretical Baudrillardian notion of simulacra. The internet can be seen as a hub of simulation. Sites like Facebook and Twitter, email, and Skype simulate physical forms of communication, and online shopping websites simulate the physical shopping experience. People have virtual relationships and pets, and can gamble, send money, publish, and donate to charity online. If one “goes shopping” online, did one really go shopping? The idea that “shopping” means anything other than physically going to a store is relatively new. With online shopping, the consumer is very much detached from any product, and uses simulations of money (debit, credit, or PayPal) in the privacy of their own home. The physicality is removed, and the process becomes much more abstract. However, the lack of physicality does not make it any less “valid”. Rather, the way shopping has traditionally been defined must be re-examined in the context of a hyperreal digital era. Researching online is no less valid than researching in a library. The idea, which I have heard put forward by some of my professors, that online research is easier or equates to less zealous or engaged students is supported only by the elevation of the original, the original in this sense being the physical book in the physical library. Whether information is in print or online, the idea that knowledge is easier to acquire, or less valuable, when it is digital seems to suggest that there is an obvious hierarchy in the value of media. The present (although not necessarily pervasive) fear of or resistance to digital spaces in the humanities can perhaps be explained by the notion that because there is so much virtual content and so many simulations online in a digital environment, the truth is elusive. I think this stems from the idea that the “real” truth exists as something physical, which has been authenticated simply by existing in a physical form, and that simulations (being further detached from “Reality”) distort and become further removed from truth. The internet can be understood in many ways as the epitome of simulation and the hyperreal. Baudrillard recognized the virtual world as a fourth level of simulacra, building off of his previous three levels: counterfeit, production, and the code-governed phase. “The counterfeit” (Baudrillard 50) level is in close proximity to the original; “production” is the reproduction of the original; and the code-governed phase is a much more abstract assemblage, rooted in signs and completely detached from the original. This third phase, the “code-governed” phase, refers primarily to language as code. For Baudrillard (and Derrida, Saussure, and others), language creates a distance from reality. In many ways, language is a tool used to simulate reality. In hyperreality, a space comprised primarily of “copies” (for the purpose of this paper, virtual, digital spaces at the third or fourth level of simulacra), the simulation often becomes the original. Digital hyperreality allows for interaction with the thing that is not present, the lost or displaced “original”.

          If we apply this understanding of simulation and hyperreality to online scholarship, research, reading, teaching, and interaction, the “original” is the physical. That is to say, texts contained in physical books, in physical spaces, are privileged as closer to nature (although not equated to nature, since text itself is simulation, or code). Digital spaces are consistently dismissed as a viable research option in the humanities. Digital text disrupts the lasting finality of print, and seems to threaten the sanctity of Truth by the very nature of its detachment from the physical or original. The act of moving my body from a home space to a study space authenticates my research; I conducted research because what I interacted with was real, in the molecular sense. Reading “From Modernism to Post-Modernism”, and holding the book in my hands, means I read the book. Did I still read the book if I read it online? Did I still “talk” to my professor if I sent an email? Am I still a person, a human in my entirety, if I have a digital eye?

          Many post-modern theorists, such as Fredric Jameson, saw simulation in terms of its artificiality, and that in itself carried the connotation of inferiority. For example, the consumption of non-artificial, non-simulated foods is praised and sought after. There is a desire to purify foods. More and more often, marketers carefully craft product information to exclude anything that is not natural, or not “originally” present in nature. Marketers include words such as “all natural, organic, home-made, home-grown, authentic”, and many products (food specifically) are advertised as “GMO-free, no artificial colors, flavors”. In a more abstract sense, consuming food which has deviated from an “original” is seen as inferior, despite the fact that studies such as A. L. Van Eenennaam and A. E. Young’s “Prevalence and impacts of genetically engineered feedstuffs on livestock populations” have concluded that “Numerous experimental studies have consistently revealed that the performance and health of GE-fed animals are comparable with those fed isogenic non-GE crop lines” (Van Eenennaam and Young). The mere fact that certain foods use technology threatens the sanctity of the original. In a similar way, technology is often demonized as a violation of biology. There are exceptions, and certainly the average person would not reprimand (in any explicit way) an elderly person with a pacemaker, or someone with prosthetic limbs. Figures like Donna Haraway posit that the definition of “human” is largely based on biological, anatomical qualities, such as DNA and naturally occurring physical features. Anatomically, an amputee with prosthetics does not qualify as a human under Wikipedia’s bio-centric definition of a human. Is a person with prosthetics 81% human and 19% machine? Where is this line drawn? At what point is a person too far removed from the “original” to be considered a human? If I am anatomically 75% machine, am I still a human? No, not based on the current definition. “In the post-human, there are no essential differences or absolute demarcations between bodily existence and computer simulation, cybernetic mechanism and biological organism, robot teleology and human goals” (Lenoir 204). The copy, or simulation, of body parts removes the cyborg from the current category of what it means to be human. However, this loss of the original allows for the production of emancipatory copies (Baudrillard). This can be seen in terms of the cyborg (a failing heart allows for its copy, a pacemaker). We can look at the integration of technology into the body as a step in the direction of the eradication of the distinction between original and copy. This has serious implications for humanity itself, in addition to the humanities as a discipline.

          Privileging only the original, natural, biological, and physical leaves no space for “the copy”, the simulation, or the hyperreal where the original fails or inconveniences. Baudrillard says: “the extinction of the original reference alone facilitates the general law of equivalences, that is to say, the very possibility of production” (52). This is particularly applicable to things like web-based journalism, scholarship, and communication… Baudrillard sees the loss of the original as emancipatory. He continues: “Through reproduction from one medium into another, the real becomes volatile, it becomes the allegory of death, but it also draws strength from its own destruction, becoming the real for its own sake, a fetishism of the lost object which is no longer the object of representation, but the ecstasy of denigration and its own ritual extermination: the hyperreal” (72). If we apply this to the notion of online research in English literature, or to the consumption of e-books, the real, that is, the physical, does draw strength from its destruction. Simulation through virtual mediums allows people to engage with content despite physical limitations. Murray McGillivray illustrated this perfectly in his talk. He discussed the nostalgia for the original to which Baudrillard alludes. He commented on how extraordinary the original manuscripts of medieval texts are in their physical state. But he also recognized and concluded that, for the average student, accessing these originals is simply not possible, for a number of reasons. The simulation of such a text allows other people to view its content. In turn, The Cotton Nero A.x Project, a simulation of a medieval manuscript, literally replaces the original for a reader such as myself. That is powerful, given that I will likely not see the physical manuscript in my lifetime.

          Digitized text and content may be an element of hyperreality insofar as a website containing a book is not the same thing as a book. However, digital simulations can be seen as emancipatory for a number of reasons. Provided information is open access, or free to view, any person with a device and access to Wi-Fi is able to view that document, regardless of time and place. (This does not take into account disadvantaged groups, or third world countries with little or no internet accessibility.) Multiple people can view the same thing simultaneously, freeing the content from the confines of a physical object, which cannot be viewed outside its “original” form; its copies can transcend space. While the internet and digital content are often blamed for distraction and poor performance, digital content has also proved to help people become more efficient. The physical book is nostalgic; it is comforting and personal, and often carries with it a sense of attachment, due to its physicality. I have heard professors and fellow students observe how students become easily distracted, how it is more difficult to sit down and read a book with unwavering concentration. Of course it becomes more difficult, since so much of how we operate within the world is now digital, instantaneous, and simulated. However, the emancipation Baudrillard alluded to can also be applied to the consumption of digital text. Instead of seeing digital content as a formidable “copy” and lamenting the loss of the original, we can look to technology which relies on our detachment from the physical as its selling point. Take “Spritz” for example. I use the “ReadMe” application, which is partnered with a developer called “Spritz”. Spritz’s whole software concept is to utilize the fast-paced, ever-changing, video-centric character of the internet and use it to help people read quickly. I downloaded the application “ReadMe”, which separates the words of a text and displays them individually, one by one, in order. The words are displayed individually, as fragments of digital text, but as the website points out, it is not practical for the average reader to read 500 words per minute. This format, however, allowed me to read 25% of a 200-page book in about half an hour, with arguably better comprehension than using the “original” method. Accessing books this way is even farther removed from their originals. But that is its precise advantage: the medium is advantageous, yes, but it makes reading less about the book and more about the words. As an English major I found this technology invaluable. “When reading, only around 20% of your time is spent processing content. The remaining 80% is spent physically moving your eyes from word to word and scanning for the next ORP. With Spritz we help you get all that time back” (Spritzinc.com). The original, in this case, can be seen as a hindrance because it is simply not as efficient, in my own experience. The copy in this instance is not a book at all. Baudrillard explains how digital environments, in a way, erode the thing(s) simulated in these digital spaces: “at this (virtual) level, the question of signs and their rational destinations, their ‘real’ and their ‘imaginary’ their repression, reversal, the illusions they form of what they silence or their parallel significations is completely effaced” (Baudrillard 57).
What Baudrillard is saying here is that the signifier, in this case virtual content, is so far removed from the original that the definition of the book itself is completely eroded. The e-book is no longer a book, the e-transfer is no longer a transfer in terms of its physical definition, and therefore reading a book online is not really “reading a book”. Reading an e-book is reading a simulation of a book: a copy of an original.

          The assertion that the copy does something to the original (Baudrillard 425) is true. In many cases the “copy”, or simulation, is superior to the original. The humanities are based on the study of humanity. History, philosophy, anthropology, and sociology all rely on a bio-centric understanding of humanity. There is a line that has been drawn as to what is considered human: the “original”, the natural. This biological understanding of humanity fared pretty well for centuries. Although there is much speculation amongst scholars as to what qualifies as a cyborg, the modern digital landscape has transformed most of the western world into cyborgs. The integration of technology onto and into our bodies is now more possible than ever. Baudrillard speculates: “Even today, there is a thriving nostalgia for the natural referent of the sign” (51). There is a sense of comfort in the “real”, following the historical assumption that for there to be real things there has to be a reliable, knowable system of production, and in a digital space this does not exist. In a digital age, I think it is necessary to re-assess how humanity itself is classified: to stretch the definition beyond the biological and to recognize that the natural, or biological, is not always superior.

          The idea that doing something in a virtual setting or digital space is almost as if it never happened is another theme Baudrillard closely explores in The Gulf War Did Not Take Place. Digital environments as simulations do something to the original. In the simulation’s instantaneousness, multiplicity, accessibility, and artificiality, the original becomes sacred and unknown in an overwhelming sea of homogeneous simulations. Baudrillard does see the loss of the original as potentially freeing, but also recognizes the effect that simulations have on the original. The Gulf War as people came to understand it through its simulation, or virtuality, in film is not the same as the original war. The simulation cannot be the thing it copies. It can replace the original in the sense that people only access an original through a copy, but it does not equate to the original in its totality. It might be crude to compare war and research, but the theoretical assertion holds: online, virtual research is not the same as researching or reading in its original bodily, physical sense. However, for those who viewed the Gulf War through television, that simulation became the original, in the viewer’s inability to access the original. The experiences are different and the same. Online research is not the dictionary definition of research, but in one’s strict engagement with the simulation of texts in a digital or virtual space, that new simulated experience becomes the original. The result is that one may not go to the library unless the simulation is not available, at which point one tries to access its original.

          The idea that research can occur as a purely visceral, mental experience (a simulation) fundamentally changes the definition of research. The simulatory nature of anything digital or technological fundamentally changes the definition of the thing it simulates. I think that in demonizing the simulation, you resist progress. Very broadly, resisting technology and the digital on the grounds that they are inferior to the physical or original means that, in many cases, progress or efficiency is delayed. Discrediting the power of virtual technology as a means to communicate because it does not carry the same nostalgia as face-to-face communication means that valuable virtual conversations, with Alex Gil for instance, would never have occurred. Similarly, professors requiring students to seek out an expected number of print resources in research can mean missing out on valuable virtual or digital research. I always find myself coming back to the concept of “big questions”. The topic of technophobia and resistance to the digital in some humanities spaces can be explored as a discussion of theory. Baudrillard’s work on simulacra and simulation has allowed me to explore the sometimes subordinate status of simulations and copies. My paper focused mostly on the loss of the original and the ways in which this can be seen as emancipatory, especially when we begin to consider the implications the digital and the simulated have for how humanities research is conducted, and for how the discipline itself is defined. These theories can be applied to the greater understanding of humanity. The merging of technology and humanity has led to massively complicated questions, not only about simulation and original in terms of research and scholarship, but about humanity in general. Where does machine begin and human end? And are cyborgs the new species? I don’t think I would have pushed myself to try to understand the boundaries of humanity and machinery in a post-human sense without this course. In this merging of technology into classrooms and bodies, it is clear that the definition of the original must be expanded to include its copies and simulations.

Works Cited

Baudrillard, Jean. “Symbolic Exchange and Death.” From Modernism to Postmodernism. Malden: Blackwell Publishing, 2003. 421-434. Print.

Baudrillard, Jean. Symbolic Exchange and Death. Theory, Culture & Society. Sage Publications Inc., 1993. Web. 2 Dec. 2014.

Baudrillard, Jean. The Gulf War Did Not Take Place. Bloomington, Indiana: Indiana University Press, 1991. Print.

Haraway, Donna. From Modernism to Postmodernism. Malden: Blackwell Publishing, 2003. 460-484. Print.

Horn, Eva. “Editor’s Introduction: ‘There Are No Media’.” (Abstract). Grey Room 29 (2007): 6-13. Web. 4 Dec. 2014.

Lenoir, Timothy. “Makeover: Writing the Body into the Posthuman Technoscape: Part One: Embracing the Posthuman.” (Excerpt). Configurations 10.2: 203–220. Web. 6 Dec. 2014.

Van Eenennaam, A. L., and A. E. Young. “Prevalence and Impacts of Genetically Engineered Feedstuffs on Livestock Populations.” American Society of Animal Science (2014): 1–61. Web. 7 Dec. 2014.



----  

The People’s Field: The Ethos of a Humanities-Centred Social Network

Posted: Feb 05, 2015 10:02;
Last Modified: Mar 04, 2015 05:03

---

Hello readers of Daniel Paul O’Donnell’s blog. My name is Megan and I am a former student of his, having completed (among others) his 2014 seminar on the Digital Humanities. The following is a paper I wrote for that class, which Dan has kindly offered to feature on his blog.

The inspiration for this essay comes from my experience as a musician, specifically a guitarist. It has always been possible — indeed, far more common anyway, I would think — to learn to play outside of a classroom setting. But the Web has given us something spectacular: huge social networking websites aiming to encompass all aspects of playing guitar, whether learning, teaching, critiquing, or making music with others. The education is there, and the community too, similar to the post-secondary experience. If non-academic music education can thrive online, why not the humanities?

I’m sure many of you are humanities people, and so I’m also sure you’ve thought about the State of the Humanities — their financial viability, their usefulness, their place in academia and in general public life. This essay is not so much an argument as it is a reflection, an appeal to all of us humanists to broaden the venue and audience of what we do. It is an appeal to think bigger: not what the humanities should look like just in academia, but what they could look like in the wider world. Humanists hate clichés, but I think one rings true in this case: if we love what we do, we have to set it free.

The People’s Field: The Ethos of a Humanities-Centred Social Network

In a 2010 article for the Chronicle of Higher Education, Frank Donoghue attempts to finally answer why the academic importance of the humanities is seemingly in permanent dispute:

The shift in the material base of the university leaves the humanities entirely out in the cold. Corporations don’t earmark donations for the humanities because our research culture is both self-contained and absurd. Essentially, we give the copyrights of our scholarly articles and monographs to university presses, and then buy them back, or demand that our libraries buy them back, at exorbitant markups. And then no one reads them. The current tenure system obliges us all to be producers of those things, but there are no consumers.

The public simply does not need humanities research the way it needs scientific or medical research – incest in Hamlet or the meaning of Finnegans Wake are still great questions worth pursuing, but no one’s life hinges on the resolution of Hamlet’s fraught relationship with his mother; people will and do, however, die of cancer and diabetes, and James Joyce cannot fry our brains if climate change does it first. For a field whose very name suggests a focus on all humankind, the humanities’ products are remarkably individualistic in scope, the pet projects of bookworms. Yet this is not to say these products have no value, nor is the decline of the humanities as an academic field indicative of a culture that no longer cares for literature, history, languages, or philosophy. Donoghue notes that

Intelligent popular novels continue to be written; the nonfiction of humanists who defy disciplinary affiliation . . . will still make best-seller lists; and brilliant independent films . . . will occasionally capture large public audiences. The survival of the humanities in academe, however, is a different story. The humanities will have a home somewhere in 2110, but it won’t be in universities. We need at least to entertain the possibility that the humanities don’t need academic institutions to survive, but actually do quite well on their own.

People will always enjoy creating and consuming literature; it is the demand for literary (and other humanities-centred) criticism that is constantly being called into question in modern academia. But if not in universities, whither that criticism? The answer lies in one of the most ubiquitous – and perhaps most important – technological developments in recent history: social media. The aim of this paper is twofold: first, to prove that social media, specifically social networking websites, are a viable way to build a consumer base for literary criticism; second, to provide an outline of the features of a theoretical humanities-centred social network and how it would operate. For simplicity’s sake, my project will focus primarily on only one aspect of the humanities, namely literary criticism, and it will admittedly be North American-centric in its analysis of the state of the humanities and assumptions of available technology.

First let us take a more in-depth look at what is seemingly Wrong with the humanities. Little academic research has gone into this topic (though Stuart Hall formally explores the disconnect between the less-than-concrete goals of the humanities and their potential for informing social activism in his pre-Web 2.0 “Emergence of Cultural Studies and the Crisis of the Humanities”). However, the last five years have provided a plethora of popular articles devoted to parsing out this perennial problem. Two key themes endure: for one, the typical argument that the humanities do not make employable graduates, thus turning the field into more of an economic burden than an aid (Sinclair); second, the more interesting idea of a disconnect between the public and humanist academia, causing the hoi polloi to distrust the humanities and therefore not value them. With regard to the first argument, Stefan Sinclair claims that, “the attacks on the humanities are bolstered by the underlying assumption that in this model [the “knowledge-based” economy] every department must rely solely on their own market revenues. Whether or not humanities departments would actually be viable in this model is up for debate, but commentators often assume this would not be the case.” David Lea attributes these assumptions to a shift from collegial to managerial principles in university governance; administrative and technology-centred spending has thus increased at the expense of cuts to the humanities (261). Sinclair obviously finds these assumptions and their resulting cuts unfair and unimaginative, and he is somewhat justified, in that no one seems to have given much thought to how to make the humanities a more profitable academic field. Yet that point brings us right back to Donoghue: the humanities are inherently insular, and their societal effects are markedly indirect compared to the immediate benefits of the natural sciences, technology, and medicine.

This self-contained nature brings us to the second key explanation of the humanities’ decline. Surprisingly, much of the popular criticism involves not the economics-centred points above, but the argument that the humanities have become inaccessible to the wider population. Mark Bauerlein recounts the myriad points made during the 2011 symposium “The Future of the Humanities,” and summarizes the perceived problem as the “neglect or inability or lack of desire . . . [of humanists] to speak directly to the public in a public language” (Bauerlein). One can easily take exception to this argument: scientific papers are just as incomprehensible – if not more so – to the average citizen. Academia is fundamentally esoteric. But the humanities differ greatly from the sciences in one key aspect, laid out by Steven Knapp:

An investment in their [art and literature’s] particularity and therefore in their history is what most deeply and importantly separates the objects and events studied by the humanities from the phenomena studied by the natural and even the social sciences. In science, what matters is not the irreplaceable particularity, the irreplaceable origin, of the phenomenon in question but instead its generalizability and therefore precisely the replaceability of its particular history. (Knapp)

In other words, the sciences are necessarily future-oriented; they are always looking for answers to improve upon current human knowledge, to make generalizations such as “climate change is caused by greenhouse gas emissions” and thus replace the old understandings. The humanities do not operate in this way. They are concerned with and motivated by “the pleasure human beings take in the particularity of lived experience . . . the pleasure human beings take in preserving and enjoying particular things” (Knapp). The subject matter of the humanities therefore belongs to the public in a way that of the sciences does not: the vast majority of us cannot learn to explain the physical world in Newton’s laws without at least some instruction, but most of us can read Hamlet and get something out of it, regardless of formal instruction in literary criticism. Knapp elaborates: “What matters to the public is Shakespeare, not the logic of theatrical representation. What matters is the story of America, not the ideological structure of American essentialism” (Knapp). North American academe’s current love affair with (in his opinion) deconstruction, Marxism, feminism, and post-colonialism has alienated the public: “Humanities professors disrespected great works, so naturally the public turned around and disrespected them” (Bauerlein). Of course, we can hardly blame humanities scholars for examining literature through these ideological lenses; to expect professors to never question the messages of canon texts and dominant cultural narratives is tyrannical. But we must respect the fact that art and literature belong to the people and therefore traditional readings remain valid as well.

Yet the pleasure-motivated, “particularity”-centred nature of literary criticism is also a point against the humanities in academia. Laurie Fendrich claims “the only way to justify studying the humanities is to abandon modern utilitarian arguments in favor of much older arguments about the end, or purpose of man. Yet Darwin, in firmly swatting down the idea that man has an end, makes returning Aristotle . . . difficult for most modern thinkers” (Fendrich). There is a kind of nobility in studying literature, but as reasoned above, the humanities simply do not provide the kind of progress-fuelling information that the STEM fields do. Fendrich further recalls the highly elite nature of the early university, where well-to-do young men would go to learn the classics, philosophy, and languages, becoming “knowledgeable” but ultimately “useless” – a place where they could spend their time before inheriting their prospective family wealth (Fendrich). Now that universities have opened up to the “common” people, the study of literature has opened as well; however, this democratization of education means that post-secondary institutions must prepare students whose various social classes necessitate they will spend their lives in the workforce, not luxury drawing rooms.

None of these commentators propose any real solutions to the humanities problem; in fact, they all admit that the purpose of the humanities will always be called into question in utilitarian modern (i.e. Western) society. So what can we do with them? The answer is difficult – perhaps ultimately irresolvable – and this paper’s scope can only propose a partial remedy. The humanities are so deeply entrenched in academia that it would be unreasonable to simply get rid of them altogether, at least in the foreseeable future. We must perhaps admit that studying literature in university is a privilege for those who need not worry about work once they graduate, and that those of us from the middle and working classes who take that route must deal with the consequences. But by examining the problems, we can at least parse out a partial remedy: the humanities are the people’s pleasure, and we must give them back to the people.

The answer lies in social media. David Lea, in addition to the shift from collegial to managerial values, blames the decline of the humanities partly on online learning, despite what he admits might be its “obvious financial advantages” (261). Providing the humanities with a physical space is expensive; moving them online would cut costs to university departments, though of course there remains the desire to teach the humanities in actual classrooms. Lea is therefore right to worry about the threat of online learning to the state of humanities education. But his observation also reveals a willingness among the public to transfer the education process to an online environment, and this willingness could be the saving grace of the humanities, the opportunity to bring them back to the people. My outline of a theoretical humanities-centred social network provides the crux of my argument.

First of all, social media can be split into five or six broad types. For our purposes, we will use Tim Grahl’s categories: 1. Social networking sites, where users create profiles to connect with others (e.g. Facebook); 2. Bookmarking sites, where users “save, organize, and manage links” (e.g. StumbleUpon); 3. Social news, where users share links with others and rate them (e.g. Reddit); 4. Media sharing, where users upload their own content, often accompanied by “additional social features, such as profiles, commenting, etc.” (e.g. YouTube); 5. Microblogging, “services that focus on short updates that are pushed out to anyone subscribed to receive the updates” (e.g. Twitter); and finally, 6. Blog comments and forums. Grahl further notes that many social media platforms incorporate features from multiple categories (Grahl). A humanities social network would primarily combine elements from categories 1 and 4, with elements of 2, 3, and 6.

The raison d’être of a humanities social network would be providing students, hobbyists, and professional academics with a space to upload and share their writing on various works of literature. Much like the media networking sites YouTube or DeviantArt, users would create a profile in which they would list their literary and critical interests: which authors and styles they admire, which critical theories they like to employ. All content they upload would be labelled with various tags indicating the topic and types of criticism used; these tags would make the content searchable by other users, who could search for writing on a particular topic, read other people’s work, and provide feedback or invite them to be “friends” in a similar vein to Facebook. Users could then follow their friends’ content, discover the writing of their friends’ friends, and in turn build a community of people whose primary connection to each other is their passion for literature. The perceived hierarchy between professor and student, or academic and layman, would not exist, encouraging a perception of literary criticism as a hobby, accessible to anyone with a favourite book and ideas they can support. Another important feature reinforcing this bond would be discussion forums, allowing users to have meaningful conversations about works of literature and critical theories outside of the context of a particular paper’s comment page. Again, these discussion forums would blur hierarchical lines and make literary criticism accessible.
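Purely as an illustration of the data model this paragraph imagines (it is not part of the original proposal, and every name below is hypothetical), a sketch of profiles, tagged papers, and tag-based search might look like this:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    interests: list[str] = field(default_factory=list)   # favourite authors, styles, theories
    friends: list[User] = field(default_factory=list)    # Facebook-style connections

@dataclass
class Paper:
    author: User
    title: str
    text: str
    tags: list[str] = field(default_factory=list)        # topic and critical-approach labels

def search_by_tag(papers: list[Paper], tag: str) -> list[Paper]:
    """Return every uploaded paper labelled with the given tag."""
    return [p for p in papers if tag in p.tags]

# Example: everything tagged "Hamlet", whether written by a professor or a hobbyist.
# hamlet_papers = search_by_tag(all_papers, "Hamlet")
```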

The success of social networks-with-a-purpose such as LinkedIn and Academia.edu sets a precedent for the branching out of a humanities social network. The website could easily split into a free version, and a premium version where established academics wishing to publish professionally can do so, creating a database similar to JSTOR or Project Muse. Like those two databases, postsecondary institutions could pay for the premium version to give their students access to these articles, and students would still be able to share their opinions on the article with other users. Integrating this social network into university online services could reduce the costs of paying for physical copies of journals; furthermore, it would serve to get students interacting with their peers both inside and outside their institution, as well as with people not enrolled in academia. Finally, incorporating elements of social media categories 2 and 3, students could save papers (from the academic premium version only, as saving papers from the free version would invite too many complications regarding plagiarism) and rate them for students writing on similar topics. The noble intentions of the academic humanities and the pleasure of the people would both be served.

In the face of a changing academic landscape, the humanities are increasingly perceived as too costly for institutions and too pretentious for everyday people. A social network focused on the humanities would help remedy that, fostering the perception that literary criticism is accessible, while providing a database with the potential to cut costs for libraries.

Works Cited

Bauerlein, Mark. “Oh the Humanities!” Weekly Standard. 16 May 2011.

Donoghue, Frank. “Can the Humanities Survive the 21st Century?” Chronicle of Higher Education. 05 Sept. 2010.

Fendrich, Laurie. “The Humanities Have No Purpose.” Chronicle of Higher Education. 20 Mar 2009.

Grahl, Tim. “The 6 Types of Social Media.” Out:think. Out:think Group.

Hall, Stuart. “The Emergence of Cultural Studies and the Crisis of the Humanities.” Humanities as Social Technology. 53. (1990): 11-23.

Knapp, Steven. “The Enduring Dilemma of the Humanities.” Phi Beta Kappa Society. Phi Beta Kappa Society, 29 Mar 2011.

Lea, David. “The Future of the Humanities in Today’s Financial Markets.” Educational Theory. 64.3 (2014): 261-83.

Sinclair, Stefan. “Confronting the Criticisms: A Survey of Attacks on the Humanities.” 4Humanities.org. The Digital Humanities Community, 09 Oct 2012.

----  

Four National and International talks by University of Lethbridge Digital Humanities students

Posted: Feb 02, 2015 12:02;
Last Modified: Mar 04, 2015 05:03

---

A quick catchup post: this semester is shaping up to be a blockbuster in terms of University of Lethbridge Digital Humanities students’ success in national and international refereed conferences.

The semester began strongly with Kayla Ueland’s presentation “Reconciling between novel and traditional ways to publish in the Social Sciences” at the Force 2015 conference in Oxford this past January. Ueland is a graduate student in Sociology and a Research Assistant in the Lethbridge Journal Incubator.

We have also just heard that four students and recent graduates of the University of Lethbridge’s Department of English have had papers accepted at the joint meeting of the Canadian Society for the Digital Humanities/Société canadienne des humanités numériques and the Association for Computers and the Humanities.

The students and their papers are:

Babalola Aiyegbusi is a recent graduate of the department’s M.A. programme (2014). Rawluk and Alexander are both fourth-year undergraduates. Singh is a first-year M.A. student.

----  

On the Road: Adventures in Public Digital Humanities (Kim Martin on the DH Maker bus at the University of Lethbridge)

Posted: Sep 26, 2014 16:09;
Last Modified: Sep 26, 2014 16:09

---

In 2013, Kim and two friends, Ryan Hunt and Beth Compton, purchased a 1991 school bus, which they have since converted into Ontario’s first mobile makerspace: the DH MakerBus (Makerbus.ca).

What started as a passion project quickly became an area of academic interest, and Kim now works to showcase the public benefits of humanities education in London and beyond. She is a co-lead on the Humanities Matters Bus Tour and is currently implementing a local chapter of 4Humanities in London, Ontario.

This paper discusses her experiences in establishing this project.

Speaker: Kim Martin, Library and Information Science, Western University
Date: Monday, September 29, 2014
Location: D-631
Time: 12:00 – 1:00 p.m.

----  

University of Lethbridge Tenure Track job: Postcolonial or Modernism, DH welcome (Deadline April 15)

Posted: Mar 17, 2014 18:03;
Last Modified: Mar 17, 2014 18:03

---

The Department of English at the University of Lethbridge invites applications for a probationary (tenure-track) position at the Assistant Professor rank to begin 1 July 2014, subject to budgetary approval. The position is in the area of Twentieth-Century Literature with specialization in either Post-Colonial Literature or Modernism.

Applicants should have a Ph.D. at or near completion and teaching experience at the university level. The University aspires to hire individuals who have demonstrated considerable potential for excellence in teaching, research and scholarship. New faculty members are eligible to apply for university funding in support of research and scholarly activities.

The position is open to all qualified applicants, although preference will be given to Canadian citizens and permanent residents of Canada. The University is an inclusive and equitable campus encouraging applications from qualified women and men including persons with disabilities, members of visible minorities and Aboriginal persons.

The Department of English is a dynamic unit committed to excellence in research and teaching with faculty members who represent a wide range of disciplinary interests.  Members of the department are involved in collaborative and interdisciplinary research initiatives within the University of Lethbridge and beyond.  The university houses the Institute for Child and Youth Studies (I-CYS), the Centre for Oral History and Tradition (COHT), and a new centre in Digital Humanities is currently under development.  The University of Lethbridge is the home of Global Outlook::Digital Humanities (globaloutlookdh.org) and the editorial offices of the scholarly journal Digital Studies/Le champ numérique. Students have the opportunity to have their writing published in the university’s Whetstone magazine and to participate in two annual student writing competitions.  The department is dedicated to ensuring the continued quality of its strong undergraduate program and its emerging graduate program.

Located in southern Alberta, near the Rocky Mountains, Lethbridge offers a sunny, dry climate that is agreeably mild for the prairies, excellent cultural and recreational amenities and attractive economic conditions. Founded in 1967, the University has an enrollment of over 8,000 students from around the world. Our student body has grown by 50 percent in the last 10 years, phenomenal growth among institutions in Canada. Despite this growth, we have remained true to who we are – student-focused, research-intensive, and grounded in liberal education. For more information about the University, please visit our web site at www.uleth.ca.

Applications should include a curriculum vitae, transcripts, outlines of courses previously taught, teaching evaluations, publication reprints or preprints, a statement of teaching philosophy and research interests, and three letters of reference. Send this material and arrange for letters to be mailed directly to:           
                             
Dr. Adam Carter, Chair
Department of English
The University of Lethbridge
4401 University Drive
Lethbridge, Alberta, T1K 3M4
Canada

Telephone: (403) 380-1894
Fax:  (403) 382-7191
Email: bev.garnett@uleth.ca

Consideration of completed applications will begin by April 15, 2014, and will continue until the position is filled.  

----  

A Review of “A Machine Learning Approach For Identification of Thesis and Conclusion Statements in Student Essays”

Posted: Jun 06, 2013 13:06;
Last Modified: Jun 06, 2013 13:06

---

I’ve become quite interested in the idea of machines grading papers ever since I read the New York Times article Dan posted in the group library: “New test for Computers: Grading Essays at the College level.” For now I am just going to concern myself with the article in my title, but I am working on a much larger piece which combines several scholarly articles, as well as a few editorials, on an educational issue that I feel will become increasingly relevant as technology expands: grading machines.

This article is interesting for several reasons, but mostly because it tests the abilities of human markers against machine markers, which is after all the most important issue when determining the efficacy and usefulness of these machines. Can these machines pick out those things that produce an effective piece of writing? The article defines what it means by effective writing, which I believe is an adequate but unfinished definition: “The literature in the teaching of writing suggests that invention, arrangement and revision in essay writing must be developed in order to produce effective writing. Stated in practical terms, students at all levels, elementary school through post-secondary education, can benefit from practice applications that give them an opportunity to work on discourse structure in essay writing.” I think we can mostly agree that if a machine can fulfill these requirements then, while imperfect, it is headed in the right direction.

So how well do the machines in this experiment perform these functions? Firstly, it is important to look at what it is the machines are being asked to do. In a broad sense they are being asked to identify the thesis and conclusion statements in a few hundred student essays. But the greater goal is to have them outperform a positional algorithm; this would show evidence that the machines can not only recognize specific examples input into them, but can also apply knowledge based on those examples.

The positional algorithm pertains to how a computer marks an essay based on length and position of words and paragraphs:

“Essay length is highly correlated with human or machine scores (i.e., the longer the essay, the higher the score). Similarly, the position of the text in an essay is highly related to particular discourse elements. Therefore, we computed a positional label for the thesis and conclusion discourse categories. The method outlined in Table II was used for computing baselines reported in a later section” (462). The computing baselines for the positional algorithm are as follows, where P = paragraph:

For thesis statements: (1) if the # of P is 3 or more, select all text in P 1 (the first paragraph), excluding the first sentence; (2) otherwise, if the # of P is 2 or more, select all text in the first P; (3) if the # of P is 1, select nothing.

For conclusion statements: (1) if the # of P is 3 or more, select all text in the final P; (2) otherwise, if the # of P is 2 or more, select all text in the final P; (3) if the # of P is 1, select nothing.
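For concreteness, here is a minimal sketch of how a positional baseline of this kind might be implemented. It is my own illustration rather than code from the article: the function name and the representation of an essay as a list of paragraphs (each a list of sentences) are assumptions.

```python
def positional_baseline(paragraphs, target):
    """Select candidate text for a discourse category using position alone.

    paragraphs: the essay, as a list of paragraphs, each a list of sentences.
    target: either "thesis" or "conclusion".
    Applies the cascading rules paraphrased above.
    """
    n = len(paragraphs)
    if n <= 1:
        return []                      # one-paragraph essays: select nothing
    if target == "thesis":
        if n >= 3:
            return paragraphs[0][1:]   # first paragraph, minus its first sentence
        return paragraphs[0]           # two paragraphs: the whole first paragraph
    if target == "conclusion":
        return paragraphs[-1]          # the whole final paragraph
    raise ValueError("target must be 'thesis' or 'conclusion'")
```

A learned, discourse-based system is only worth pursuing if it can beat a rule this simple, which is why the authors use it as their baseline.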

The Results: “the performance of both discourse-based systems exceeds that of the positional algorithm, with the exceptions of the topic-independent system, PIC, for identification of thesis statements” (465).

“For identification of conclusion statements, the topic-dependent and topic independent systems have overall higher agreement than for thesis statements” (465)

“Thesis statements are more difficult to model as is apparent when we compare system performance for thesis and conclusion statements” (465).

“Overall, the results in this study indicate that it is worth continuing research using machine learning approaches for this task, since they clearly outperform the positional baseline algorithm” (465).

The machines are better at identifying conclusion and thesis statements than the positional baseline algorithm, but they are not as effective as the human markers. However, the machines can do this much faster than the human markers, providing almost instant feedback. What we see here, I think, is that machines are helpful when we want to identify specific discourse elements related to writing: grammar, thesis and conclusion statements, punctuation, etc. Machines handle the mechanical aspects of writing quite well. What I have been finding, however, is that machines are notoriously poor at dealing with the creative aspects of writing, including subversion of writing rules.

My larger blog will focus on a synthesis of the creative and mechanical aspects of writing and the pros and cons of machine grading that goes along with those. Specifically, I will look at how a machine might deal with some of the more unusual writing pieces the unessay is likely to produce. Can a machine ever be relied upon to mark something that bends the rules of writing for a purpose?

Works Cited:

Burstein, Jill, and Daniel Marcu. “A Machine Learning Approach for Identification of Thesis and Conclusion Statements in Student Essays.” Computers and the Humanities 37.4 (2003): 455–467. JSTOR. Web. 31 May 2013

----  

When everyone’s super… On gaming the system

Posted: May 23, 2012 20:05;
Last Modified: May 23, 2012 21:05

---

note: first published on the dpod blog

Syndrome: Oh, I’m real. Real enough to defeat you! And I did it without your precious gifts, your oh-so-special powers. I’ll give them heroics. I’ll give them the most spectacular heroics the world has ever seen! And when I’m old and I’ve had my fun, I’ll sell my inventions so that everyone can have powers. Everyone can be super! And when everyone’s super… [chuckles evilly] no one will be.

The Incredibles

Here’s a funny little story about how a highly specialised journal gamed journal impact measurements:

The Swiss journal Folia Phoniatrica et Logopaedica has a good reputation among voice researchers but, with an impact factor of 0.655 in 2007, publication in it was unlikely to bring honour or grant money to the authors’ institutions.

Now two investigators, one Dutch and one Czech, have taken on the system and fought back. They published a paper called ‘Reaction of Folia Phoniatrica et Logopaedica on the current trend of impact factor measures’ (H. K. Schutte and J. G. Švec Folia Phoniatr. Logo.59, 281–285; 2007). This cited all the papers published in the journal in the previous two years. As ‘impact factor’ is defined as the number of citations to articles in a journal in the past two years, divided by the total number of papers published in that journal over the same period, their strategy dramatically increased Folia‘s impact factor this year to 1.439.

In the ‘rehabilitation’ category, shared with 26 other journals, Folia jumped from position 22 to position 13.

—Tomáš Opatrný, “Playing the system to give low-impact journal more clout.” Nature 455, 167 (11 September 2008).
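To make the arithmetic behind the definition explicit, here is a small sketch of the two-year impact factor as described in the passage above. This is my own illustration; the numbers are invented, not Folia’s actual counts.

```python
def two_year_impact_factor(citations_in_year, items_published_prev_two_years):
    """Citations received in a given year to articles from the previous two
    years, divided by the number of items published in those two years
    (the definition quoted above)."""
    return citations_in_year / items_published_prev_two_years

# Invented numbers, for illustration only.
items = 100              # papers the journal published in the two preceding years
citations = 66           # citations those papers receive in the counting year
print(two_year_impact_factor(citations, items))          # 0.66

# A single article that cites all 100 of those papers adds 100 citations,
# raising the ratio by exactly 1.0 for that counting year.
print(two_year_impact_factor(citations + items, items))  # 1.66
```

The jump Opatrný reports (0.655 to 1.439) is somewhat less than 1.0, presumably because the two figures are computed for different years, over different underlying citation counts.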

Assessing (and hence demonstrating) impact is a difficult but important problem in contemporary academia.

For most of the last century, university researchers have been evaluated on their ability to “write something and get it into print… ‘publish or perish’” (as Logan Wilson put it as early as 1942 in The Academic Man: A Study in the Sociology of a Profession, one of the first print citations of the term).

As you might expect, the development of a reward system built on publication led to a general increase in the number of publications. Studies of science publication suggest a growth rate in the number of scientific articles and journals of between 2 and 5% per year since 1907 (a rate that, at the upper end, leads to doubling roughly every 15 years). There is also evidence for a particularly marked rise in numbers after the 1950s.
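As a quick check on that parenthetical (the growth rates are the ones just cited; the arithmetic is mine), the doubling time for a constant annual growth rate r is:

```latex
T_{\text{double}} \;=\; \frac{\ln 2}{\ln(1+r)},
\qquad
T_{\text{double}}(0.05) \approx 14.2 \text{ years},
\qquad
T_{\text{double}}(0.02) \approx 35 \text{ years}.
```

So the “roughly every 15 years” figure corresponds to the upper end of the 2-5% range; at the lower end, doubling takes closer to 35 years.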

This kind of growth vitiates the original point of the metric. If everybody publishes all the time, then the simple fact of publication is no longer sufficient as a proxy for excellence. You could count the sheer number of publications—a measure that is in fact widely used in popular contexts to imply productivity—were it not so obviously open to abuse: unless you institute some kind of control over the type and quality of publication, a system that simply counts publications will lead inevitably to an increase in number, and a corresponding decrease in quality, originality, and length.

It is perhaps for this reason that modern peer review systems begin to be institutionalised in the course of the second half of the last century. In fact, while peer review is probably understood to be the sine qua non of university research, and while it is possible to trace sporadic examples of activity resembling peer review back into the classical period, peer review in its modern form really only begins to take shape in the period from the 1940s to the 1970s. Major scientific journals, including Science and The Journal of the American Medical Association, for example, begin to make systematic use of external reviewers only in the 1940s, partially as an apparent response to the growing number and specialisation of submissions.

As you might expect, the peer review/reward system has itself been gamed. In the same way a reward system built on counting publications leads inevitably to an increase in the number of publications, a reward system built on counting peer-reviewed publications leads, inevitably, to an increase in the number of peer-reviewed publications… and the size and number of the journals that publish them.

Journal impact measurements are a controversial response to the not-surprising fact that peer review has also become an insufficient proxy for excellence. It is still relatively early days in this area (though less so in the natural sciences) and there is as yet no complete consensus as to how impact should be quantified. As a result, the measures can still take many forms, from lists of ranked journals, to citation counts, to circulation and aggregation statistics, to, in the case of on-line journals, even more difficult-to-interpret statistics such as bounce and exit rates.

Regardless of how the impact factor debate settles out, however, it is only a matter of time until it too is gamed. Indeed, as the example of Folia Phoniatrica et Logopaedica suggests, it may not even be a matter of time. If you count citations, researchers will start ensuring they get cited. If you rank journals, they will ensure their journals fit your ranking criteria. If you privilege aggregation, the aggregators will be flooded with candidates for aggregation. And it is not clear that commercial understandings of good web analytics are really appropriate for scholarly and scientific publishing.

But the Folia Phoniatrica et Logopaedica example is also interesting because I’m not sure it is a bad thing. I can’t independently assess Opatrný’s claim that the journal is well respected though faring badly in impact measurements, but it wouldn’t surprise me if he was right. And the fact that two researchers in a single article were able to more than double their journal’s impact score simply by citing every paper published in the journal in the previous two years leaves me… quite happy for them. I doubt there are many people who would consider the article cited by Opatrný to be in some way fraudulent. Instead, I suspect most of us consider it evidence (at best) that there are still some bugs in the system and (at worst) of a successful reductio ad absurdum–similar in a certain sense to Alan Sokal’s submission to Social Text.

None of this means that impact metrics are an intrinsically bad thing. Or that peer review isn’t good. Or that researchers shouldn’t be expected to publish. In fact, in many ways, the introduction of these various metrics, and the emphasis they receive in academia, is very good. Peer review has become almost fully institutionalised in the humanities in the course of my career. When I was a graduate student in the early 1990s, most journals I submitted to did not have a formal explanation of their review policies and many were probably not, strictly speaking, peer reviewed. But it was difficult to tell and nobody I knew even attempted to distinguish publications on their CVs on the basis of whether or not they were peer reviewed. We were taught to distinguish publications (and the primary metric was still number of publications) on the basis of genre: you separated reviews from encyclopedia entries from notes from lengthy articles. A review didn’t count for much, even if we could have shown it was peer reviewed, and a lengthy article in what “everybody knew” to be a top journal counted for a lot, whether it was peer reviewed or not.

By the time I was department chair, 10 years later, faculty members were presenting me with CVs that distinguished output on the basis of peer review status. In these cases, genre was less important than peer review status. Reviews that were peer-reviewed were listed above articles that weren’t, and journals began being quite explicit about their reviewing policies. The journal I helped found, Digital Medievalist, began from its first issue with what we described as “ostentatious peer review”: we named the referees who recommended acceptance on every article, partially as a way of borrowing their prestige for what we thought was, at the time, a fairly daring experiment in open access publication.

But we did this also because we thought (and think) that peer review is a good thing. My peer reviewed articles are, in almost every case, without a doubt better written and especially better and more carefully argued than my non-peer-reviewed articles. I’ve had stupid comments from referees (though none as stupid as seems to be the norm on grant applications), but there is only one case I can think of where I really couldn’t see how satisfying what the referee wanted would improve things.

And the same is true for publication frequency. On the whole, my experience is that people who publish more (within a given discipline) also tend to publish better. I don’t publish too badly for somebody in my discipline. But most of the people who publish more than me in that same discipline are people I’d like to emulate. It is possible to game publication frequency; but on the whole, even the people who (I think) game it are among our most productive and most interesting scholars anyway: they’d still be interesting and productive even if they weren’t good at spinning material for one article into three.

So what does it mean that Schutte and Švec were able to game the impact measure of their journal with such apparent ease? And what should we say in response to the great uproar (much of it in my view well-founded) about the introduction of journal ranking lists by the ESF and the Australian government in recent years? Obviously some journals simply are better than others–more prestigious, better edited, more influential, containing more important papers. And it is difficult to see how frequency of citation is a bad thing, even if its absence is not necessarily evidence something is not good or not important. I would still rather have a heavily cited article in the PMLA than an article nobody read in a journal nobody has ever heard of.

Perhaps the most important lesson is that, as Barbossa says to Miss Turner in Pirates of the Caribbean concerning the “Pirates’ Code,” these kinds of metrics should really be considered “more what you’d call ‘guidelines’ than actual rules.” Journals (and articles) that have a high impact factor, lots of citations, and plenty of readers are probably to be celebrated. But impact, citations, and subscription are not in themselves sufficient proxies for quality: we should expect to find equally good articles, journals, and scholars with lower numbers in all these areas. And more importantly, we should expect to find that any quantifiable criteria we do establish will almost immediately be gamed by researchers in the field: most people with PhD-level research positions got where they are, after all, because they were pretty good at producing what examiners wanted to hear.

The real issue, then, is that metrics like “impact” or “peer review” or even “quantity” are attempts to use quantitative values as substitutes for qualitative assessment. The only real way of assessing quality is through qualitative assessment: that is to say, by assessing a work on its own merits in relation to the goals it sets itself in terms of audience, impact, and subject matter, including the reasonableness of those goals. An article by an author who is not famous, in an obscure field, in an on-line journal that has no subscribers and is not frequently cited, may or may not represent poor quality work–in much the same way as might a frequently cited article in a popular field by a famous academic in the journal of the main scholarly society in a discipline. What is (or should be) important to the assessor is how reasonably each author has defined his or her goals and how well the resulting work has done in relation to those goals.

And this is where academics’ ability to game any other system becomes a virtue. Since there is no single metric we can create that researchers as a group will not figure out how to exploit (and in short order), we should accept that we will simply never be able to propose a quantitative measurement for assessing intrinsic quality. What we can rely on, however, is that researchers will, on the whole, try to present their work in its best light. By asking researchers to explain how their work can best be assessed, and being willing to evaluate both that explanation and the degree to which the work meets the proposed criteria, we can find a way of comparatively evaluating excellence. Journals, articles, and researchers that define, then meet or exceed, reasonable targets for their disciplines and types of work are excellent. Those that don’t, aren’t.

And in the meantime, we’ll develop far more innovative measurements of quality.

----  

Extracting a catalogue of element names from a collection of XML documents using XSLT 2.0

Posted: Sep 15, 2011 17:09;
Last Modified: May 23, 2012 19:05

---

We are trying to build a single stylesheet to work with the documents of two independent journals. In order to get a sense of the work involved, we wanted to create a catalogue of all the elements used in the published articles. This means loading entire directories’ worth of files as input and then going through them, extracting and sorting the elements used across all the input documents.

Here’s the stylesheet that did it for us. It is probably not maximally optimised, but it currently does what we need. Any suggestions for improvements would be gratefully received.

Some notes:

  1. Our goal was to pre-build some templates for use in a stylesheet, so we formatted the element names as XSL templates.
  2. Although you need to use this sheet with an input document, the input document is not actually transformed (the files we are extracting the element names from are loaded using the collection() function). So it doesn’t matter what the input document is as long as it is valid XML (we used the stylesheet itself).
<?xml version="1.0"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<!-- this output is because we are going to construct 
ready-made templates for each element -->
    <xsl:output method="text"/>

<!-- for pretty printing -->
    <xsl:variable name="newline">
        <xsl:text>&#10;</xsl:text>
    </xsl:variable>

<!-- Load the files 
in the relevant directories -->
    <xsl:variable name="allFiles"
    select="collection(iri-to-uri('file:///some/path/to/the/directories?select=*.xml;
    recurse=yes'))"/>

<!-- Dump their content into a single big pile -->
    <xsl:variable name="everything">
        <xsl:copy-of select="$allFiles"/>
    </xsl:variable>

<!-- Build a key for all elements using their name -->
    <xsl:key name="elements" match="*" use="name()"/>

<!-- Match the root node of the input document
(since the files we are actually working on have been 
loaded using the collection() function, nothing 
is actually going to happen to this element) -->
    <xsl:template match="/">

       <!-- this is information required to turn the output into an 
              XSL stylesheet itself -->
        <xsl:text>&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            version="1.0"></xsl:text>
        <xsl:value-of select="$newline"/>
        <xsl:text>&lt;!--Summary of Elements --&gt;</xsl:text>
        <xsl:value-of select="$newline"/>
        <xsl:value-of select="$newline"/>

       <!-- this invokes the collection of all elements in all the files
       in the directory for further processing -->
        <xsl:for-each select="$everything">

           <!-- select only the first element bearing each distinct name
           (Muenchian grouping on name()) -->
            <xsl:for-each 
            select="//*[generate-id(.)=generate-id(key('elements',name())[1])]">

               <!-- sort them -->
                <xsl:sort select="name()"/>

                <xsl:for-each select="key('elements', name())">

                   <!-- this makes sure that only the first instance 
                    of each element name is outputted -->
                    <xsl:if test="position()=1">
                        <xsl:text>&lt;xsl:template match="</xsl:text>
                        <xsl:value-of select="name()"/>
                        <xsl:text>"> </xsl:text>
                        <xsl:value-of select="$newline"/>
                        <xsl:text>&lt;!--</xsl:text>
                        <!-- this counts the total occurrences of this element name -->
                        <xsl:value-of select="count(//*[name()=name(current())])"/>
                        <xsl:text> occurrences</xsl:text>
                        <xsl:text>--&gt;</xsl:text>
                        <xsl:value-of select="$newline"/>
                        <xsl:text>&lt;/xsl:template></xsl:text>
                        <xsl:value-of select="$newline"/>
                        <xsl:value-of select="$newline"/>
                    </xsl:if>
                </xsl:for-each>
            </xsl:for-each>
        </xsl:for-each>
        <xsl:value-of select="$newline"/>
        <xsl:text>&lt;/xsl:stylesheet></xsl:text>
    </xsl:template>
</xsl:stylesheet>
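In case it is useful to anyone trying this at home: a sheet like this can be run from the command line with an XSLT 2.0 processor such as Saxon. Assuming the stylesheet is saved as catalogue.xsl (the file name here is a placeholder of my own) and is used as its own input document, something along these lines should work with Saxon 9 HE:

java -jar saxon9he.jar -s:catalogue.xsl -xsl:catalogue.xsl -o:templates.xsl

The -s and -xsl options point at the same file only because, as noted above, the real input comes from the collection() call inside the stylesheet; the output lands in templates.xsl.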
----  

How to "clone" a test in Moodle 2.0

Posted: Mar 27, 2011 21:03;
Last Modified: May 23, 2012 19:05

---

Here’s how to clone a test in Moodle 2.0 (i.e. make an exact copy so that both appear in the course; this is useful for making practice tests or copying a basic test format so that it can be reused later in the course):

  1. Back up the test. Exclude all user data but include activities, blocks, and filters.
  2. Select “Restore.” Your backup should be listed under user private backups. Simply restore the file to create a second instance.
  3. Treat one of the instances as your clone: move it, edit it, change its titles and questions. It is a completely independent version of the original file.
----  

Organising Quizzes in Moodle 2.0

Posted: Mar 27, 2011 21:03;
Last Modified: May 23, 2012 19:05

---

Moodle 2.0 allows designers to divide quiz questions across multiple pages. But while this introduces great flexibility, it can be quite a cumbersome system to use at first. Here’s a method for making it more efficient:

  1. When you first build a test, put all questions on one page.
  2. Once you have the questions in the order you want, divide the test into different pages by selecting the last question for each page and choosing the “Begin new page after selected question” option.

This will cut down on your server calls (and hence time) immensely.

----  

Differences between Moodle and Blackboard/WebCT short answer questions

Posted: Mar 27, 2011 20:03;
Last Modified: May 23, 2012 19:05

---

There is an important difference between Moodle and Blackboard (WebCT) short answer questions that instructors should be aware of, namely that Moodle short answer questions allow only one answer field.

This means, for example, that you can’t easily import Blackboard questions of the type “Supply the part of speech, person, tense, and number for the following form.” In Blackboard, you can present the student with four blanks for them to fill in, each with a different answer. When these are imported into Moodle, the question is converted into a form in which there is a single blank that has four possible correct answers.

There are various ways of asking the same kinds of questions in Moodle. The easiest when you are dealing with imported questions is to ask for a single quality in each answer. So instead of one question asking for part of speech, person, tense, and number, you might have four different questions: one for part of speech, another for person, a third for tense, and a fourth for number.

A second way of asking this kind of question in Moodle is to use the embedded answer type. These are harder to write, but are arguably closer to the paper equivalent of the same type of question:

For the following Old English word supply the requested information:

clipode

Part of Speech: ____________
Tense: ____________
Number: ____________
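For what it is worth, here is a minimal sketch of what such a question might look like in Moodle’s embedded-answer (Cloze) syntax; the answers and weights below are placeholders of my own rather than the exact wording I use:

For the following Old English word supply the requested information: clipode
Part of Speech: {1:SHORTANSWER:=verb}
Tense: {1:SHORTANSWER:=past}
Number: {1:SHORTANSWER:=singular}

Each block in braces becomes a separate blank, the leading number is the weight given to that blank, and further acceptable answers can be listed inside the braces, separated by tildes.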

----  

Multiple Choice Questions in Moodle

Posted: Mar 27, 2011 18:03;
Last Modified: May 23, 2012 19:05

---

Here are some tips for the composition of Multiple Choice Questions in Moodle.

  1. If students are allowed to mark more than one option correct and you intend to include at least one question where none of the offered options are correct, include as a possible answer “None of the listed options.”
    1. Do not call it “none of the above” since if (as you normally should) you have selected “shuffle answers,” you have no guarantee that it will be the final answer in the sequence.
    2. You should include this option in all questions in the set (including those for which some of the options are correct) to avoid giving the answer away when it appears.
    3. When “none of the listed options” is not the right answer, it should be scored at -100%, to avoid a student hedging his or her bets by selecting it and all the other answers.
  2. If you anticipate having a question for which all the answers are correct, you do not need an “All of the listed answers” option, since selecting all of them will give students 100%.
  3. The correct options should be scored so they add up to 100%, of course!
  4. Incorrect options (other than “None of the listed options”) can be scored in a number of different ways (a worked example follows this list):
    1. So that the total for all incorrect options (except “None of the listed options”) is -100% (this stops a student hedging his or her bets by selecting all options); if you do not have a “None of the listed options” answer, you almost certainly should score this way.
    2. So that each incorrect option scores the negative of a correct answer’s value, regardless of whether all the incorrect answers add up to -100%. Use this if you don’t mind that a student selecting everything except a “None of the listed options” might end up with part marks.
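A hypothetical worked example of the first scheme: suppose a question has two correct options, three ordinary incorrect options, and a “None of the listed options” answer (which is wrong in this case). Score each correct option at 50% (2 × 50% = 100%), each ordinary incorrect option at -33.333% (3 × -33.333% ≈ -100%), and “None of the listed options” at -100%. A student who ticks every box then nets roughly 100% - 100% - 100%, which Moodle does not, in my experience, carry below zero for the question, while a student who picks only the two correct answers gets full marks.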
----  

How to build a randomised essay/translation question in Moodle 2.0

Posted: Mar 20, 2011 16:03;
Last Modified: May 23, 2012 19:05

---

In my courses I often use a question of the following format:

  1. Common introduction
  2. Two or more sample passages or questions requiring an essay response
  3. A common form field for the answer to the student’s choice from #2.

Here is an example:

Write a modern English translation of one of the following passages in Old English in the space provided below.

1. Hæfst þū ǣnige ġefēran?
2. Hwæt māre dēst þū? Hæfst þū ġīet māre tō dōnne?

[Essay answer box for translation].

The point of this format is to provide the student with a choice of topics. If students all write their essays or translations at the same time, you can build your choice of topics by hand and write them into a single question. The challenge comes if you want to be able to allow your students to write the test asynchronously, as is common with Learning Management Software. In such cases you want to be able to draw your essay topics or translation passages randomly from a test bank.

All the basic elements you would need to do this are available in Moodle, both 1.x and 2.0+. You can use the “description” question type to put in the general instructions at the beginning; you can use the essay format question to provide the answer box. And you can use Moodle’s ability to assign random questions to draw your topics or translation passage from your test bank.

But there are also some problems:

  1. Description questions are unnumbered, meaning your introduction will not start with the question number
  2. Although there was some discussion before the release of Moodle 2.0 about allowing description questions to be randomised, this appears not to have been implemented. All questions that can be randomised must have an action associated with them. This means that every topic or translation passage must ask the student to do something, and that each topic or translation will have a number of its own.

What I do is the following:

  1. I write the introduction as a description question (and just accept that it has no number assigned).
  2. I write my translation passage or topics as “true / false” questions. Each consists of the topic or passage, followed by the question “I am writing on this topic/passage…” as the prompt for a true/false answer.
  3. I use the essay topic question to provide the common answer box. Since you need to have some text in an essay question, I use an anodyne instruction like “Write your essay/translation in the following space” to fill out the question.
  4. I assign a grade value of 0 to the two random topic/passages and assign the full grade value of the question to the essay answer box. The result is not elegant, but it works (a rough sketch of how the questions might look follows below).
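If it helps, here is a sketch of roughly how the pieces in steps 1 to 3 might look in Moodle’s GIFT import format. The question names and wording are placeholders of my own, and the 0/full grade weighting described in step 4 is assigned when the questions are added to the quiz, not in the import file:

// The common introduction, as a description question (no answer braces)
::Intro:: Write a modern English translation of one of the following passages in Old English in the space provided below.

// Each passage as a true/false question (added to the quiz as a random question worth 0)
::Passage A:: Hæfst þū ǣnige ġefēran? I am writing on this passage. {T}

::Passage B:: Hwæt māre dēst þū? Hæfst þū ġīet māre tō dōnne? I am writing on this passage. {T}

// The common answer box, as an essay question (empty braces)
::Answer:: Write your translation in the following space. {}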
----  

How to setup a signup sheet in Moodle

Posted: Mar 15, 2011 14:03;
Last Modified: May 23, 2012 19:05

---

You can create a signup sheet for Moodle using the “Choice” activity.

A video showing how to do this can be found here: https://ctl.furman.edu/main/index.php?option=com_content&task=view&id=78&Itemid=90

In brief, however, here’s how to do it:

  1. Go to the section of your course in which you want the signup sheet to appear.
  2. With editing on, select the “Choice” activity.
  3. Fill in the title and description information.
  4. If you are restricting attendance, set the “Limit the number of responses allowed” option under “Limit” to “enabled.” Setting this allows you to specify how many people may choose any one option. If it is disabled, any number of participants may sign up for any particular session.
  5. Each “Option” represents an entry on the signup sheet. Write in the date and time (or anything else you require) in the “Option” field and, if you have enabled limits, the maximum number of participants for the entry in the “limit” field. If you need more than the standard five options, select “Add three more options” after you’ve filled in the first five.
----  

Schools of Schools of "Humanities Computing"

Posted: Feb 14, 2009 11:02;
Last Modified: May 23, 2012 19:05

---

When I went to Yale to begin my PhD in 1989, the English department—or perhaps just the graduate students, a group that tends to feel these things more strongly—was mourning the decline of the “Yale School”. New Historicism was the increasingly dominant critical approach at the time, and while it seemed that all the Deconstructionists had been at Yale, none of the major New Historicists were—Stephen Greenblatt got his PhD (and B.A. and M.A.) from Yale, but, like Michel Foucault, seems never to have held a faculty appointment there.

I was thinking of this sense of “school” yesterday, while I was attending the University of Alberta’s Humanities Computing Graduate School conference.

To call the Humanities Computing programme (HuCo) at the U of A a humanities computing school is a bit of a misnomer—the students’ interests range far more widely than the traditional definition of the application of technology to the study of humanistic topics. The conference programme included papers on the design of a commercial children’s game (albeit one tied to museums and libraries), the visualisation of pre- and co-requisites in an undergraduate programme planning application, the development of a controlled vocabulary for medical education, attempts to regulate cybercrime, the standardisation of library icons, a game for teaching Boolean searching, and approaches to analysing contributions to wikis by students and researchers.

There were some papers that might be considered more traditionally to belong to the domain of the “digital humanities”—I gave a talk on ontologies and scholarly editing, and there were papers on Heidegger and textual media in nineteenth-century India, a discussion of the computer as metaphor for cognition, and a paper on, amongst other things, fears of co-evolution in the popular American work of the German writer Heinrich Hauser. Significantly, though, these were given by people who were to a greater or lesser degree outside the current school: the two keynote speakers, a PhD student from the psychology department, and a graduate of the HuCo programme (albeit one of the lead organisers of the conference itself).

The interesting thing for me is not that so many papers were closer to informatics than to the digital humanities more narrowly defined—I have heard many papers from HuCo students and faculty in the traditional digital humanities. Rather it is the relative consistency of focus on analysis, design, and usability that tends to characterise the work of students from this school, a consistency that might well even allow us to speak of an Alberta “School” (in fact, this concentration, perhaps due to projects like TAPoR, and in earlier days, TACT, might even be something that could characterise a broader trend in Canadian digital humanities). The contrast in focus is certainly very striking, for example, when compared against a programme such as that of the Incontri di Filologia Digitale in Verona this past January, or even, in a way that I’m finding harder to define, most of the papers on digital topics presented at the MLA in San Francisco. I suspect this focus might also set the “Alberta School” apart from, for example, the focus of the Centre for Computing in the Humanities (CCH) at King’s College, London—a school that can also be considered a “School,” albeit of a fairly different kind, judging by the work of so many of its most prominent graduates, students, associates, and faculty.

I was discussing this with Geoffrey Rockwell, late of the New Media programme at McMaster, and now one of the newer senior members of the programme at the University of Alberta, and he suggested to me that this diversity of institutional approaches is a sign of strength in the discipline. And I think he is right. The problem when I went to graduate school in English in the early 1990s was the sense among the graduate students that the rise of other “Schools” was part of the reason for the decline of Yale’s. In the case of the Digital Humanities, however, this rise of “Schools” with important differences seems primarily to involve the development of complementary strengths.

----  

Byte me: Technological Education and the Humanities

Posted: Dec 20, 2008 14:12;
Last Modified: May 23, 2012 19:05

---

Note: Published in Heroic Age 12 (http://www.heroicage.org/issues/12/em.php)

I recently had a discussion with the head of a humanities organisation who wanted to move a website. The website was written using Coldfusion, a proprietary suite of server-based software that is used by developers for writing and publishing interactive web sites (Adobe nd). After some discussion of the pros and cons of moving the site, we turned to the question of the software.

Head of Humanities Organisation: We'd also like to change the software.
Me: I'm not sure that is wise unless you really have to: it will mean hiring somebody to port everything and you are likely to introduce new problems.
Head of Humanities Organisation: But I don't have Coldfusion on my computer.
Me: Coldfusion is software that runs on a server. You don't need it on your computer. You just need it on the server. Your techies handle that.
Head of Humanities Organisation: Yes, but I use a Mac.

I might be exaggerating here—I can't remember if the person really said they used a Mac. But the underlying confusion we faced in the conversation was very real: the person I was talking to did not seem at all to understand the distinction between a personal computer and a network server—the basic technology by which web pages are published and read.

This is not an isolated problem. In the last few years, I have been involved with a humanities organisation that distributes e-mail by cc:-list to its thirty-odd participants because some members believe their email system can't access listservs. I have had discussions with a scholar working on a very time-consuming web-based research project who was intent on inventing a custom method for indicating accents because they thought Unicode was too esoteric. I have helped another scholar who wrote an entire edition in a proprietary word-processor format and needed to recover the significance of the various coloured fonts and type faces he had used. And I have attended presentations by more than one project that intended to do all their development and archiving in layout-oriented HTML.

These examples all involve basic technological misunderstandings by people actively interested in pursuing digital projects of one kind or another. When you move outside this relatively small subgroup of humanities scholars, the level of technological awareness gets correspondingly lower. We all have colleagues who do not understand the difference between a blog and a mailing list, who don't know how web-pages are composed or published, who can't insert foreign characters into a word-processor document, and who are unable to back up or take other basic precautions concerning the security of their data.

Until very recently, this technological illiteracy has been excusable: humanities researchers and students, quite properly, concerned themselves primarily with their disciplinary work. The early Humanities Computing experts were working on topics, such as statistical analysis, the production of concordances, and building the back-ends for dictionaries, that were of no real interest to those who intended simply to access the final results of this work. Even after the personal computer replaced the typewriter, there was no real need for humanities scholars to understand technical details beyond such basics as turning a computer on and off and starting up their word-processor. The principal format for exchange and storage of scholarly information remained paper, and in the few areas where paper was superseded—such as the use of email to replace the memo—the technology involved was so widely used, so robust, and above all so useful and so well supported that there was no need to learn anything about it: if your email and word-processor weren't set up at the store when you bought a computer, you could expect this work to be done for you by the technicians at your place of employment or over the phone by the Help Desk at your Internet Service Provider: nothing about humanities scholars' use of the technology required special treatment or distinguished them from the University President, a lawyer in a one-person law office... or their grandparents.

In the last half-decade, this situation has changed dramatically. The principal exchange format for humanities research is no longer paper but the digital byte—albeit admittedly as represented in PDF and word-processor formats (which are intended ultimately for printing or uses similar to that for which we print documents). State agencies are beginning to require open digital access to publicly-funded research. At humanities conferences, an increasing number of sessions focus on digital project reports and applications. And as Peter Robinson has recently argued, it is rare to discover a new major humanities project that does not include a significant digital component as part of its plans (Robinson 2005). Indeed, some of the most interesting and exciting work in many fields is taking advantage of technology such as GPS, digital imaging, gaming, social networking, and multimedia digital libraries that was unheard of or still very experimental less than a decade ago.

That humanists are heavily engaged with technology should come, of course, as no real surprise. Humanities computing as a discipline can trace its origins back to the relatively early days of the computer, and a surprising number of the developments that led to the revolution in digital communication over the last decade were led by people with backgrounds in humanities research. The XML specification (XML is the computer language that underlies all sophisticated web-based applications, from your bank statement to Facebook) was edited under the supervision of C. Michael Sperberg-McQueen, who has a PhD in Comparative Literature from Stanford and was a lead editor of the Text Encoding Initiative (TEI) Guidelines, the long-standing standard for textual markup in the humanities, before he moved to the W3C (Sperberg-McQueen 2007). Michael Everson, the current registrar and a co-author of the Unicode standard for the representation of characters for use with computers, has an M.A. from UCLA in Indo-European linguistics and was a Fulbright Scholar in Irish at the University of Dublin (Evertype 2003-2006). David Megginson, who has also led committees at the W3C and was the principal developer of SAX, a very widely used processor for XML, has a PhD in Old English from the University of Toronto and was employed at the Dictionary of Old English and the University of Ottawa before moving to the private sector (Wikipedia Contributors 2008).

Just as importantly, the second generation of interactive web technology (the so-called "Web 2.0") is causing the general public to engage with exactly the type of questions we research. The Wikipedia has turned the writing of dusty old encyclopedias into a hobby much like ham-radio. The social networking site Second Life has seen the construction of virtual representations of museums and libraries. Placing images of a manuscript library or museum's holdings on the web is a sure way of increasing in-person traffic at the institution. The newest field for the study of such phenomena, Information Studies, is also one of the oldest: almost without exception, departments of Information Studies are housed in and are extensions of traditional library science programmes.

The result of this technological revolution is that very few active humanists can now truthfully say that they have absolutely no reason to understand the technology underlying their work. Whether we are board members of an academic society, working on a research project that is considering the pros and cons of on-line publication, instructors who need to publish lecture notes to the web, researchers who are searching JSTOR for secondary literature in our discipline, or the head of a humanities organisation that wants to move its web-site, we are all increasingly involved in circumstances that require us to make basic technological decisions. Is this software better than that? What are the long-term archival implications for storing digital information in format x vs. format y? Will users be able to make appropriate use of our digitally-published data? How do we ensure the quality of crowd-sourced contributions? Are we sure that the technology we are using will not become obsolete in an unacceptably short period of time? Will on-line publication destroy our journal's subscriber base?

The problem is that these are not always questions that we can "leave to the techies." It is true that many universities have excellent technical support and that there are many high-quality private contractors available who can help with basic technological implementation. And while the computer skills of our students are often over-rated, it is possible to train them to carry out many day-to-day technological tasks. But such assistance is only as good as the scholar who requests it. If the scholar who hires a student or asks for advice from their university's technical services does not know in broad terms what they want or what the minimum technological standards of their discipline are, they are likely to receive advice and help that is at best substandard and perhaps even counter-productive. Humanities researchers work on a time-scale and with archival standards far beyond those of the average client needing assistance with the average web-site or multimedia presentation. We all know of important print research in our disciplines that is still cited decades after the date of original publication. Not a few scholarly debates in the historical sciences have hinged on questions of whether a presentation of material adequately represents the "original" medium, function, or intention. Unless he or she has special training, a technician asked by a scholar to "build a website" for an editorial project may very well not understand the extent to which such questions require the use of different approaches to the composition, storage, and publication of data than those required to design and publish the athletic department's fall football schedule.

Even if your technical assistant is able to come up with a responsible solution for your request without direction from somebody who knows the current standards for Digital Humanities research in your discipline, the problem remains that such advice almost certainly would be reactive: the technician would be responding to your (perhaps naive) request for assistance, not thinking of new disciplinary questions that you might be able to ask if you knew more about the existing options. Might you be able to ask different questions by employing new or novel technology like GPS, serious gaming, or social networking? Can technology help you (or your users) see your results in a different way? Are there ways that your project could be integrated with other projects looking at similar types of material or using different technologies? Would your work benefit from distribution in some of the new publication styles like blogs or wikis? These are questions that require a strong grounding in the original humanistic discipline and a more-than-passing knowledge of current technology and digital genres. Many of us have students who know more than we do about on-line search engines; while we might hire such students to assist us in the compilation of our bibliographies, we would not let them set our research agendas or determine the contours of the projects we hire them to work on. Handing technological design of a major humanities research project over to a non-specialist university IT department or a student whose only claim to expertise is that they are better than you at instant messaging is no more responsible.

Fortunately, our home humanistic disciplines have had to deal with this kind of problem before. Many graduate, and even some undergraduate, departments require students to take courses in research methods, bibliography, or theory as part of their regular degree programmes. The goal of such courses is not necessarily to turn such students into librarians, textual scholars, or theorists—though I suppose we wouldn't complain if some of them discovered a previously unknown interest. Rather, it is to ensure that students have a background in such fundamental areas sufficient to allow them to conduct their own research without making basic mistakes or suffering unnecessary delays while they discover by trial-and-error things that might far more efficiently be taught to them upfront in the lecture hall.

In the case of technology, I believe we have now reached the stage where we need to be giving our students a similar grounding. We do not need to produce IT specialists—though it is true that a well-trained and knowledgeable Digital Humanities graduate has a combination of technological skills and experience with real-world problems and concepts that are very easily transferable to the private sector. But we do need to produce graduates who understand the technological world in which we now live—and, more importantly, how this technology can help them do better work in their home discipline.

The precise details of such an understanding will vary from discipline to discipline. Working as an Anglo-Saxonist and a textual critic in an English department, I will no doubt consider different skills and knowledge to be essential than I would if I were a church historian or theologian. But in its basic outlines such an orientation to the Digital Humanities probably need not vary too much from humanities department to humanities department. We simply should no longer be graduating students who do not know the basic history and nature of web technologies, what a database is and how it is designed and used, the importance of keeping content and processing distinct from each other, and the archival and maintenance issues involved in the development of robust digital standards like Unicode and the TEI Guidelines. Such students should be able to discuss the practical differences (and similarities) of print vs. web publication; they should be able to assess intelligently from a variety of different angles the pros and cons of different approaches to basic problems involving the digitisation of text, two and three-dimensional imaging, animation, and archival storage and cataloguing; and they should be acquainted with basic digital pedagogical tools (course management and testing software; essay management and plagiarism detection software) and the new digital genres and rhetorics (wikis, blogs, social networking sites, comment boards) that they are likely to be asked to consider in their future research and teaching.

Not all humanists need to become Digital Humanists. Indeed, in attending conferences in the last few years and observing the increasingly diverging interests and research questions pursued by those who identify themselves as "Digital Humanists" and those who define themselves primarily as traditional domain specialists, I am beginning to wonder if we are not seeing the beginnings of a split between "experimentalists" and "theorists" similar to that which exists today in some of the natural sciences. But just as theoretical and experimental scientists need to maintain some awareness of what each branch of their common larger discipline is doing if the field as a whole is to progress, so too must there remain an interaction between the traditional humanistic and digital humanistic domains if our larger fields are also going to continue to make the best use of the new tools and technologies available to us. As humanists, we are, unavoidably, making increasing use of digital media in our research and dissemination. If this work is to take the best advantage of these new tools and rhetorics—and not inadvertently harm our work by naively adopting techniques that are already known to represent poor practice—we need to start treating a basic knowledge of relevant digital technology and rhetorics as a core research skill in much the same way we currently treat bibliography and research methods.

Works Cited

Adobe. nd. "Adobe Coldfusion 8." http://www.adobe.com/products/coldfusion/

Evertype 2003-2006. "Evertype: About Michael Everson." http://www.evertype.com/misc/bio.html

Robinson, Peter. 2005. "Current issues in making digital editions of medieval texts—or, do electronic scholarly editions have a future?" Digital Medievalist 1.1 (2005): http://www.digitalmedievalist.org/journal/1.1/robinson/

Sperberg-McQueen, C. M. 2007. "C.M. Sperberg-McQueen Home Page." http://www.w3.org/People/cmsmcq/

Wikipedia contributors. 2008. "David Megginson." Wikipedia. http://en.wikipedia.org/w/index.php?title=David_Megginson&oldid=257685665

----  

Digital Plagiarism

Posted: Dec 15, 2008 13:12;
Last Modified: Mar 04, 2015 05:03

---

Essay and test management software

I have recently started using plagiarism detection software. Not so much for the ability to detect plagiarism as for the essay submission- and grading-management capabilities it offered. Years ago I moved all my examinations and tests from paper to course management software (WebCT originally, now Blackboard, and soon Moodle). I discovered in my first year using that software that simply delivering and correcting my tests on-line—i.e. without making any attempt to automate any aspect of the grading—reduced the time I spent marking exams by an immediate 50%: it turned out that I had been spending as much time handling tests (sorting, adding, copying grades, etc.) as I had marking them—more, in fact, if you included the in-class time lost to proctoring and returning corrected work to students.

I long wondered whether I could capture the same kind of efficiencies by automating my essay administration. Here too, I thought that I spent a lot of time handling paper rather than engaging with content. In this case, however, I was not sure I would be able to gain the same kind of time-saving. While I was sure that I could streamline my workflow, I was afraid that marking on screen might prove much less efficient than pen and paper—to the point perhaps of actually hurting the quality and speed of my essay-grading.

My experience this semester has been that my fears about lack of efficiency in the intellectual aspects of my correction were largely unfounded. And that my hopes for improving my administrative efficiency closely reflected the actual possibilities. The amount of time I spend handling a given set of essays has now dropped by approximately the expected 50%. While marking on screen is slower than marking with a pencil (a paper that used to take me 20 minutes to mark now will take 24 to 25 minutes), the difference is both smaller than I originally feared and more than compensated by the administrative time-savings, again including the class time freed up from the need to collect and redistribute papers.

Detecting plagiarism

Although I use it primarily for essay management, plagiarism detection software such as turnitin, the system I use, was, of course, originally designed to detect plagiarism—which means that I too can use it to check my students’ originality. The developers remind users that a lack of originality is not the same thing as plagiarism: plagiarism is a specific type of lack of originality and even good pieces of work will have numerous passages in common with other texts in the software’s database. Obvious examples of this include quotations from works under discussion and bibliographic entries. It is also quite common to see the occasional short phrase or clause flagged in otherwise original work, especially at the beginning of paragraphs or in passages introducing or linking quotations. Presumably there are only so many ways of saying “In Pride and Prejudice, Jane Austen writes…”. In shorter papers, in fact, it is not unusual to see non-plagiarised student papers with as much as 30%-40% of their content flagged initially as “non-original.”

Some students, however, actually do plagiarise—which I understand to mean the use of arguments, examples, or words of another as if they were one’s own. When marking by hand, I’ve generally considered this to be a relatively small problem. In twelve years at the University of Lethbridge, I’ve caught probably less than ten students whose work was significantly plagiarised. Obviously I’ve never been able to say whether this was because my methods for discovering such work were missing essays by more successful plagiarists or because the problem really wasn’t that significant. Using plagiarism detection software gave me the opportunity of checking how well I had been doing catching plagiarists the old fashioned way, when I was marking by hand.

To the extent that one semester’s data is a sufficient sample, my preliminary conclusions are that the problem of plagiarism, at least in my classes, seems to be more-or-less as insignificant as I thought it was when I graded by hand, and that my old method of discovering plagiarism (looking into things when a paper didn’t seem quite “right”) seemed to work.1 This past semester, I caught two people plagiarising. But neither of them had particularly high unoriginality scores: in both cases, I discovered the plagiarism after something in their essays seemed strange to me and caused me to go through the originality reports turnitin provides on each essay more carefully. I then went through the reports for every essay submitted by that class (a total of almost 200), to see if I had missed any essays that turnitin’s reports suggested might be plagiarised. None of the others showed the same kind of suspicious content that had led me to suspect the two I caught. So for me, at least, the “sniff test” remains apparently reliable.

How software improves on previous methods of detecting plagiarism

Even though it turns out that I apparently can still rely on my ability to discover plagiarism intuitively, there are two things about plagiarism detection software that do mark an improvement over previous methods of identifying such problems by hand. The first is how quickly such software lets instructors test their hunches. In the two cases I caught this semester, confirming my hunch took less than a minute: I simply clicked on the originality report and compared the highlighted passages until I discovered a couple that were clearly copied by the students without acknowledgement in ways that went beyond reasonable use, unconscious error, or unrealised intellectual debt. Working by hand would have required me to Google specific phrases from the paper one after the other and/or go to the library to find a print source for the offending passages. In the past it has often taken me hours to make a reasonable case against even quite obvious examples of plagiarism.

The second improvement brought on by plagiarism detection software lies in the type of misuse of sources it uncovers. Although I became suspicious about the originality of the two papers I caught this semester on my own rather than through the software’s originality report, the plagiarism I uncovered from the originality report was in both cases quite different from anything I have seen in the past. Instead of the wholesale copying from one or two sources I used to see occasionally when I marked by hand, the plagiarism I found this year with turnitin involved the much more subtle use of unacknowledged passages, quotations, and arguments at key moments in the students’ papers. In the old days, my students used to plagiarise with a shovel; these students were plagiarising with a scalpel. I’m not completely sure I would have been able to find the sources for at least some of this unacknowledged debt if I had been looking by hand.

A new kind of plagiarism

This is where my title comes in. It is of course entirely possible that students always have plagiarised in this way and that I (and many of my colleagues) simply have missed it because it is so hard to spot by hand. But I think that the plagiarism turnitin caught in these two essays this semester actually may represent a new kind of problem involving the misappropriation of sources in student work—a problem that has different origins, and may even involve more examples of honest mistake, than we used to see when students had to go to the library to steal their material. Having interviewed a number of students in the course of the semester, I am in fact fairly firmly convinced that what turnitin found is a symptom of new problems in genre and research methodology that are particular to the current generation of students—students who are undergoing their intellectual maturation as young adults in a digital culture that is quite different from that of even five years ago. What they were doing was still culpable—the great majority of my students were able to avoid misappropriating other people’s ideas in their essays. But new technologies, genres, and student approaches to note-taking are making it easier for members of the current generation to “fall into” plagiarism without meaning to in ways that previous generations of students would not. In the old days, you had to positively decide to plagiarise an essay by buying one off a friend or going to the library and actually typing out text that you were planning to present as your own. Nowadays, I suspect, students who plagiarise the way my two students did this semester do so because they haven’t taken steps to prevent it from happening.

Digital students, the essay, and the blog

The first thing to realise about how our students approach our assignments has to do with genre. For most (pre-digital) university instructors, the essay is self-evidently the way one engages with humanistic intellectual problems. It is what we were taught in school and practised at university. But more importantly, it was almost exclusively how argument and debate were conducted in the larger society. The important issues of the day were discussed in magazines and newspapers by journalists whose opinion pieces were also more-or-less similar to the type of work students were asked to do at the university: reasoned, original, and polished pieces of writing in which a single author demonstrated his or her competence by the focussed selection of argument and supporting evidence. The value of a good essay—at the university or in the newspaper—lay in the author’s ability to digest arguments and evidence and make them his or her own: find and assimilate the most important material into an original argument that taught the reader a new way of understanding the information and opinions of others.

For most contemporary students, however, the essay is neither the only nor the most obviously appropriate way of engaging with the world of ideas, politics, and culture. Far more common, certainly numerically and, increasingly, in influence, is the blog—and making a good blog can often involve skills that are antithetical to those of the traditional essay. While it is possible to publish essays using blog software, the point of blogs, increasingly, is less to digest facts and arguments than to accumulate and react to them. Political blogs—like Ed Morrissey’s Captain’s Quarters (now at Hot Air) or Dan Froomkin’s White House Watch—tend to consist of collections of material from other on-line sources interspersed with opinion. The skill an accomplished blogger brings to this type of material lies in the ability to select and organise these quotations. A good blog, unlike a good essay, builds its argument and topic through the artful arrangement and excerpting of usually verbatim passages from other people’s work—in much the same way that some types of music are based on the original use and combination of digitised sound samples from earlier recordings.

In other forums this method of “argument by quotation” is the norm: every video worth anything on YouTube has at least one response—a companion video where somebody else picks up on what the original contributor has done and answers back, usually with generous visual or verbal quotation. Professional examples include the various Barack Obama tributes that were a defining feature of the 2008 Democratic Primary in the U.S. (examples include the work of Obama Girl and will.i.am). But amateur examples are also extremely common—as was the case with the heavy amateur response to the question of whether the “lonelygirl15” series of 2006 was actually a professional production.

The real evidence of the evolving distinction between the essay and the blog as methods of argumentation and literary engagement, however, can be seen in the blogs that newspapers are increasingly asking their traditional opinion columnists to write. It is no longer enough to write essays about the news, though the continued existence and popularity of the (on-line and paper) newspaper column shows that there is still an important role for this kind of work. Newspapers (and presumably their readers) also now want columnists to document the process by which they gather the material they write about—creating a second channel in which they accumulate and react to facts and opinions alongside their more traditional essays. Among the older journalists, an example of this is Nicholas Kristof at the New York Times, who supplements his column with a blog and other interactive material about the subjects he feels most passionate about. In his column he digests evidence and makes arguments; in his blog he accumulates the raw material he uses to write his columns and presents it to others as part of a process of sharing his outrage.

In the case of our students, the problem this generic difference between the blog and the essay causes is magnified by the way they conduct their research. On the basis of my interviews, it appears to me that most of my first year students now conduct their research and compile their notes primarily by searching the Internet and, when they find an interesting site, copying and pasting large sections of verbatim quotation into their word processor. Often they include the URL of this material with the quotations; but because you can always find the source of a passage you are quoting from the Internet, it is easy for them to get sloppy. Once this accumulation of material is complete, they then start to add their own contribution to the collection, moving the passages they have collected around and interspersing them with their opinions, arguments, and transitions.

This is, of course, how bloggers, not essayists, work. Unfortunately, since we are asking them to write essays, the result if they are not careful is something that belongs to neither genre: it is not a good blog, because it is not spontaneous, dynamic, or interactive enough; and it is not a good traditional essay, because it is more pastiche than an original piece of writing that takes its reader in a new direction. The best students working this way do in the end manage to overcome the generic mismatch between their method of research and their ultimate output, producing something that is more controlled and intellectually original than a blog. But less good students, or good students working under mid- or end-of-term pressure, are almost unavoidably leaving themselves open to producing work that is, in a traditional sense at least, plagiarised—by forgetting to distinguish, perhaps even losing track of the distinction, between their own comments and opinions and those of others, or by collecting and responding exclusively to passages mentioned in the work of others rather than finding new and original passages that support their particular arguments.

This is still plagiarism: it is no more acceptable to misrepresent the words and ideas of others as your own in the blogging world than it is in the world of the traditional essay. And in fact it is more invidious than the older style of plagiarism that involved copying large chunks out of other people’s work: in the new, digital plagiarism, the unacknowledged debt tends to come in the few places that really matter in a good essay: the interesting thesis, the bold transition, the surprising piece of evidence that makes the work worth reading. Because it is so closely tied to new genres and research methods, however, this type of plagiarism may also have as much a cultural as a “criminal” motivation. In preventing it, instructors will need to take into account the now quite different ways of working and understanding intellectual argument that the current generation of students bring with them into the classroom.

Advice to the Digital Essayist

So how can the contemporary student avoid becoming a Digital Plagiarist?

The first thing to do is to realise the difference between the essay and the blog. When you write an essay, your reader is interested in your ability to digest facts and arguments and set your own argumentative agenda. A blog that did not allow itself to be driven by current events, incidents, and arguments in its field of endeavour—whether this is an event in the blogger’s personal life or the ebb and flow of an election campaign—would not be much of a blog. Essays are not bound by this constraint, however: they can be about things nobody is talking about and make arguments that don’t respond to anybody. Even when, as is more normal and probably better, essays do engage with previous arguments and topics that are the subject of some debate, the expectation is that the essayist will digest this evidence and these opinions and shape the result in ways that point the reader in new directions—not primarily to new sources, but rather to new claims and ideas that are not yet part of the current discourse.

The second thing to realise is just how dangerous the approach many students take to note-taking is in terms of inviting charges of plagiarism. In a world of Google, where text is data that can be found, aggregated, copied, and reworked with the greatest of ease, it is of course very tempting to take notes by quotation. When people worked with paper, pens, and typewriters, quotation was more difficult and time-consuming: when you had to type out quotations by hand, writing summaries and notes was far quicker. Nowadays, it is much easier and less time-consuming to quote something than it is to take notes: when you find an interesting point in an on-line source, it takes far fewer keystrokes (and less intellectual effort) to highlight, copy, and paste the actual verbatim text of the source into a file than it does to turn to the keyboard and compose a summary statement or note. And if you are used to reading blogs, you know that this method can be used to summarise even quite long and complex arguments.

There are two problems, however. The first is that this method encourages you to write like a blogger rather than an essayist: your notes are set up in a way that makes it easier to write around your quotations (linking, organising, and responding to them) than to digest what they are saying and produce a new argument that takes your reader in unexpected directions.

The second problem is that it is almost inevitable that you will end up accidentally incorporating the words and ideas of your sources in your essay without acknowledgement. It is easy, in reworking your material, to drop a set of quotation marks, or to start paraphrasing something and then end up editing it back into an almost verbatim quotation—without realising what you’ve done. And it is even easier to get sloppy in your initial note-taking—forgetting to put quotation marks around passages you’ve copied or losing the source URL. Once you add your own material to this collection of quotations in the file that will eventually become your essay, you will discover that it is almost impossible to remember or distinguish between what you have added and what you got from somebody else.

One way of solving this is to change the way you take notes, doing less quoting and more summarising. Doing this might even help you improve the originality of your essays by forcing you to internalise your evidence and arguments. But cutting and pasting from digital sources is so easy that you are unlikely ever to stop doing it completely—and even if you do, you are very likely to run into trouble again the moment you face the pressure of multiple competing deadlines.

A better approach is to develop protocols and practices that help you reduce the chances that your research method will cause you to commit unintentional plagiarism. In other words, you need to find a way of working that allows you to keep doing the most important part of what you currently do (and are going to continue to do no matter what your instructors say), but in a fashion that won’t lead you almost unavoidably into plagiarising from your sources at some point in your career.

Perhaps the single most important thing you can do in this regard is to establish a barrier between your research and your essay. In a blog, building your argument around long passages of text that you have cut and pasted into your own document is normal and accepted; in essay writing it isn’t. So when you come to write an essay, create two (or more) files: one for the copying and pasting you do as part of your research (or even better, one file for each source from which you copy and paste or make notes), and, most importantly, a separate file for writing your essay. In maintaining this separate file for your essay, you should establish a rule that nothing in this file is to be copied directly from an outside source. If you find something interesting in your research, you should copy this material into a research file; only if you decide to use it in your essay should you copy it from your research file into your essay file. In other words, your essay file is focussed on your work: in that file, the words and ideas of others appear only when you need them to support your already existing arguments.

An even stricter way of doing this is to establish a rule that nothing is ever pasted into your essay file: if you want to quote a passage in your text, you can decide that you will only type it out by hand. This has the advantage of discouraging you from over-quoting or building your essay around the words of others—something that is fine in a blog, but bad in an essay. If this rule sounds too austere and difficult to enforce, at least make it a rule that you paste nothing into your essay before you have composed the surrounding material—i.e. the paragraph in which the passage is to appear and the sentence that is supposed to introduce it. Many professional essayists, especially those who learned to write before there were word-processors, actually leave examples and supporting quotations out of their earliest drafts—using place holders like “{{put long quotation from p35 here}}” to represent the material they are planning to quote until they have their basic argument down.

Another thing you could try is finding digital tools that will make your current copy-and-paste approach to note-taking more valuable and less dangerous. In the pre-digital era, students often took notes on note cards or in small notebooks. They would read a source in the library with a note card or notebook in front of them. They would begin by writing basic bibliographic information on this card or notebook. Then, when they read something interesting, they would write a note on the card or in the notebook, quoting the source if they thought the wording was particularly noteworthy or apt. By the time they came to write their essays, they would have stacks of cards or a series of notebooks, one dedicated to each work or idea.

There are several ways of replicating (and improving on) this method digitally. One way is to use new word-processor files for each source: every time you discover a new source, start a new file in your word-processor, recording the basic information you need to find the source again (URL, title, author, etc.). Then start pasting in your quotations and making your notes in this file. When you are finished you give your file a descriptive name that will help you remember where it came from and save it.

Using your word-processor for this method will be cumbersome (you’ll spend a lot of time opening and closing files), difficult to use when you come to write (in a major essay you might end up with tens of files open on your desktop alongside the new file for your essay), and difficult to oversee (unless you have an excellent naming system, you will end up with a collection of research files with cryptic sounding names of which you have forgotten the significance). And if you can’t remember the specific source of a given quotation or fact, it will be hard to find later without special tools or opening and closing each file.

But other tools exist that allow you to implement this basic method more easily. Citation managers such as Endnote or Refworks, for example, tie notes to bibliographic entries. If you decide to try one of these, you start your entry for a new source (i.e. the equivalent of your paper notebook or note card) by entering it in the bibliographic software (this will also allow you to produce correctly formatted bibliographies and works-cited lists quickly and automatically later on, when you are ready to hand your paper in). You then use the “notes” section as the place for pasting quotations and adding comments and notes that you might want to reuse in your paper. There is no problem with naming files (your notes are all stored under the relevant bibliographic entry in a single database) or with moving between sources (you call up each source by its bibliographic reference), and in most cases you will be able to use a built-in search function to find passages in your notes if you forget which particular work you read them in.

Bibliographic databases and citation managers are great if all your notes revolve around material from text-based sources. But what if you also need to record observations, evidence, interviews, and the like that cannot easily be tied to specific references? In this case, the best tool may be a private wiki—for example at PbWiki (or if you are computer literate, and have access to a server, a private installation of MediaWiki, the software that runs the Wikipedia).

We tend to think of wikis as being primarily media for the new type of writing that characterises collaborative web applications like the Wikipedia or Facebook. In actual fact, however, wikis have a surprising amount in common with the notebooks or stacks of note cards students used to bring with them to the library. Unlike an entry in citation management software, wiki entry pages are largely free-form space on which you can record arbitrary types of information—a recipe, an image (more accurately a link to an image rather than the image itself), pasted text, bibliographic information, tables of numerical data, and your own annotations and comments on any of the above. As with an index card, you can return to your entry whenever you want in order to add or erase things (though a wiki entry, unlike an index card, preserves all your original material as well), or let others comment on it. And as with note cards you can shuffle and arrange them in various different ways depending on your needs—using the category feature, you can create groupings that collect all the pages you want to use in a given essay, or that refer to a specific source, or involve a particular topic. Of course, unlike note cards, which had to be sorted physically, wiki entries can simultaneously belong to more than one grouping; and because they are stored in a database, you can search your wiki automatically, looking for passages and ideas even if you don’t remember where you saw them.

However you decide to solve this problem, the most important thing is to avoid the habit which is most likely to lead you into (unintentionally) plagiarising from your sources: starting an essay by copying and pasting large passages of direct quotation into the file that you ultimately intend to submit to your instructor. In an essay, unlike a blog, the point is to hear what you have to say.


1 I now take back the claim that this is as insignificant as I thought. In the year-end papers, I found a surprisingly large number of papers with plagiarised passages in them (five or six out of sixty, with perhaps one or two doubtful cases). At the same time, a paper-by-paper review of the originality reports still seems to confirm that one can rely on one’s hunches—I’ve not yet found plagiarism in a paper that didn’t seem right as I was reading it. The larger number of hits is coming from the ability turnitin gives me to check my hunches more easily and quickly. The pattern I describe above of writing between large quotations and paraphrases still seems to hold true, however—as does the age or generational difference: my senior students are not nearly as likely to write essays like this.

----  

Back to the future: What digital editors can learn from print editorial practice.

Posted: Feb 09, 2007 18:02;
Last Modified: May 23, 2012 19:05

---

A version of this essay was published in Literary and Linguistic Computing.

Digital Editing and Contemporary Textual Studies

The last decade or so has proven to be a heady time for editors of digital editions. With the maturation of the digital medium and its application to an ever increasing variety of cultural objects, digital scholars have been led to consider their theory and practice in fundamental terms (for a recent collection of essays, see Burnard, O’Keeffe, and Unsworth 2006). The questions they have asked have ranged from the nature of the editorial enterprise to issues of academic economics and politics; from problems of textual theory to questions of mise-en-page and navigation: What is an Edition? What kinds of objects can it contain? How should it be used? Must it be critical? Must it have a reading text? How should it be organised and displayed? Can intellectual responsibility be shared among editors and users? Can it be shared across generations of editors and users? While some of these questions clearly are related to earlier debates in print theory and practice, others involve aspects of the production of editions not relevant to or largely taken for granted by previous generations of print-based editors.

The answers that have developed to these questions at times have involved radical departures from earlier norms1. The flexibility inherent to the electronic medium, for example, has encouraged editors to produce editions that users can manipulate interactively, displaying or suppressing different types of readings, annotation, and editorial approaches, or even navigate in rudimentary three-dimensional virtual reality (e.g. Railton 1998-; Foys 2003; O’Donnell 2005a; Reed-Kline 2001; Ó Cróinín nd). The relatively low production, storage, and publication costs associated with digital publication, similarly, have encouraged the development of the archive as the de facto standard of the genre: users of digital editions now expect to have access to all the evidence used by the editors in the construction of their texts (assuming, indeed, that editors actually have provided some kind of mediated text): full text transcriptions, high-quality facsimiles of all known witnesses, and tools for building alternate views of the underlying data (e.g. Kiernan 1999/2003; Robinson 1996). There have been experiments in editing non-textual objects (Foys 2003; Reed-Kline 2001), in producing image-based editions of textual objects (Kiernan 1999/2003), and in recreating digitally aspects of the sensual experience users might have had in consulting the original objects (British Library nd). There have been editions that radically decenter the reading text (e.g. Robinson 1996), and editions that force users to consult their material using an editorially imposed conceit (Reed-Kline 2001). Even elements carried over from traditional print practice have come in for experimentation and redesign: the representation of annotation, glossaries, or textual variation, for example, is rarely the same in any two electronic editions, even in editions published by the same press (see O’Donnell 2005b, § 5)2.

Much of the impetus behind this theoretical and practical experimentation has come from developments in the wider field of textual and editorial scholarship, particularly the work of the book historians, new philologists, and social textual critics who came into prominence in the decade preceding the publication of the earliest modern digital editorial projects (e.g. McKenzie 1984/1999; McGann 1983/1992; Cerquiglini 1989; Nicols 1990; for a review see Greetham 1994, 339-343). Despite significant differences in emphasis and detail, these approaches are united by two main characteristics: a broad interest in the editorial representation of variance as a fundamental feature of textual production, transmission, and reception; and opposition to earlier, intentionalist, approaches that privileged the reconstruction of a hypothetical, usually single, authorial text over the many actual texts used and developed by historical authors, scribes, publishers, readers, and scholars. Working largely before the revolution in Humanities Computing brought on by the development of structural markup languages and the popularity of the Internet, these scholars nevertheless often expressed themselves in technological terms, calling for changes in the way editions were printed and organised (see, for example, the call for a loose-leaf edition of Chaucer in Pearsall 1985) or pointing to the then largely incipient promise of the new digital media for representing texts as multiforms (e.g. McGann 1994; Shillingsburg 1996).

Digital Editing and Print Editorial Tradition

A second, complementary, impetus for this experimentation has been the sense that digital editorial practice is, or ought to be, fundamentally different from and even opposed to that of print. This view is found to a greater or lesser extent in both early speculative accounts of the coming revolution (e.g. McGann 1994; the essays collected in Finneran 1996 and Landow and Delaney 1993) and subsequent, more sober and experienced discussions of whether digital practice has lived up to its initial promise (e.g. Robinson 2004, 2005, 2006; Karlsson and Malm 2004). It is characterised by a sense both that many intellectual conventions found in print editions are at root primarily technological in origin and that the new digital media offer what is in effect a tabula rasa upon which digital editors can develop new and better editorial approaches and conventions to accommodate the problems raised by textual theorists of the 1980s and 1990s.

Of course in some cases, this sense that digital practice is different from print is justified. Organisational models such as the Intellectual Commons or Wiki have no easy equivalent in print publication (O’Donnell Forthcoming). Technological advances in our ability to produce, manipulate, and store images cheaply, likewise, have significantly changed what editors and users expect editions to tell them about the primary sources. The ability to present research interactively has opened up rhetorical possibilities for the representation of textual scholarship difficult or impossible to realise in the printed codex.

But the sense that digital practice is fundamentally different from print has also at times been more reactionary than revolutionary. If digital theorists have been quick to recognise the ways in which some aspects of print editorial theory and practice have been influenced by the technological limitations of the printed page, they have also at times been too quick to see other, more intellectually significant aspects of print practice as technological quirks. Textual criticism in its modern form has a history that is now nearly 450 years old (see Greetham 1994, 313); seen more broadly as a desire to produce “better” texts (however “better” is defined at the moment in question), it has a history stretching back to the end of the sixth century BCE and is “the most ancient of scholarly activities in the West” (Greetham 1994, 297). The development of the critical edition over this period has been as much an intellectual as a technological process. While the limitations of the printed page have undoubtedly dictated the form of many features of the traditional critical edition, centuries of refinement—by trial-and-error as well as outright invention—have also produced conventions that transcend the specific medium for which they were developed. In such cases, digital editors may be able to improve upon these conventions by recognising the (often unexpressed) underlying theory and taking advantage of the superior flexibility and interactivity of the digital medium to improve their representation.

The Critical Text in a Digital Age

Perhaps no area of traditional print editorial practice has come in for more practical and theoretical criticism than the provision of synthetic, stereotypically eclectic, reading texts3. Of course this criticism is not solely the result of developments in the digital medium: suspicion of claims to definitiveness and privilege is, after all, perhaps the most characteristic feature of post-structuralist literary theory. It is the case, however, that digital editors have taken to avoiding the critical text with a gusto that far outstrips that of their print colleagues. It is still not unusual to find a print edition with some kind of critical text; the provision of similarly critical texts in digital editions is far less common. While most digital projects do provide some kind of top-level reading text, few make any strong claims about this text’s definitiveness. More commonly, as in the early ground breaking editions of the Canterbury Tales Project (CTP), the intention of the guide text is, at best, to provide readers with some way of organising the diversity without making any direct claim to authority (Robinson nd):

We began… work [on the CTP] with the intention of trying to recreate a better reading text of the Canterbury Tales. As the work progressed, our aims have changed. Rather than trying to create a better reading text, we now see our aim as helping readers to read these many texts. Thus from what we provide, readers can read the transcripts, examine the manuscripts behind the transcripts, see what different readings are available at any one word, and determine the significance of a particular reading occurring in a particular group of manuscripts. Perhaps this aim is less grand than making a definitive text; but it may also be more useful.

There are some exceptions to this general tendency—both in the form of digital editions that are focussed around the provision of editorially mediated critical texts (e.g. McGillivray 1997; O’Donnell 2005a) and projects, such as the Piers Plowman Electronic Archive (PPEA), that hope ultimately to derive such texts from material collected in their archives. But even here I think it is fair to say that the provision of a synthetic critical text is not what most digital editors consider to be the really interesting thing about their projects. What distinguishes the computer from the codex and makes digital editing such an exciting enterprise is precisely the ability the new medium gives us for collecting, cataloguing, and navigating massive amounts of raw information: transcriptions of every witness, collations of every textual difference, facsimiles of every page of every primary source. Even when the ultimate goal is the production of a critically mediated text, the ability to archive remains distracting4.

In some areas of study, this emphasis on collection over synthesis is perhaps not a bad thing. Texts like Piers Plowman and the Canterbury Tales have such complex textual histories that they rarely have been archived in any form useful to the average scholar; in such cases, indeed, the historical tendency—seen from our post-structuralist perspective—has been towards over-synthesis. In these cases, the most popular previous print editions were put together by editors with strong ideas about the nature of the textual history and/or authorial intentions of the works in question. Their textual histories, too, have tended to be too complex for easy presentation in print format (e.g. Manly and Rickert 1940). Readers with only a passing interest in these texts’ textual history have been encouraged implicitly or explicitly to leave the question in the hands of experts.

The area in which I work, Old English textual studies, has not suffered from this tendency in recent memory, however. Editions of Old English texts historically have tended to be under- rather than over-determined, even in print (Sisam 1993; Lapidge 1994, 1991). In most cases, this is excused by the paucity of surviving witnesses. Most Old English poems (about 97% of the known canon) survive in unique manuscripts (O’Donnell 1996a; Jabbour 1968; Sisam 1953). Even when there is more primary material, Anglo-Saxon editors work in a culture that resists attempts at textual synthesis or interpretation, preferring parallel-text or single-witness manuscript editions whenever feasible and limiting editorial interpretation to the expansion of abbreviations, word-division, and metrical layout, or, in student editions, the occasional normalisation of unusual linguistic and orthographic features (Sisam 1953). One result of this is that print practice in Anglo-Saxon studies over the last century or so has anticipated to a great extent many of the aspects that in other periods distinguish digital editions from their print predecessors.

Cædmon’s Hymn: A Case Study

The scholarly history of Cædmon’s Hymn, a text I have recently edited for the Society of Early English and Norse Electronic Texts series (O’Donnell 2005a), is a perfect example of how this tendency manifests itself in Old English studies. Cædmon’s Hymn is the most textually complicated poem of the Anglo-Saxon period, and, for a variety of historical, literary, and scholarly reasons, among the most important: it is probably the first recorded example of sustained poetry in any Germanic language; it is the only Old English poem for which any detailed account of its contemporary reception survives; and it is found in four recensions and twenty-one medieval manuscripts, a textual history which can be matched in numbers, but not complexity, by only one other vernacular Anglo-Saxon poem (the most recent discussion of these issues is O’Donnell 2005a).

The poem also has been well studied. Semi-diplomatic transcriptions of all known witnesses were published in the 1930s (Dobbie 1937)5. Facsimiles of the earliest manuscripts of the poem (dating from the mid-eighth century) have been available from various sources since the beginning of the twentieth century (e.g. Dobiache-Rojdestvensky 1928) and were supplemented in the early 1990s by a complete collection of high quality black and white photos of all witnesses in Fred C. Robinson and E.G. Stanley’s Old English Verse Texts from Many Sources (1991). Articles and books on the poem’s transmission and textual history have appeared quite regularly for over a hundred years. The poem has been at the centre of most debates about the nature of textual transmission in Anglo-Saxon England since at least the 1950s. Taken together, the result of this activity has been the development of an editorial form and history that resembles contemporary digital practice in everything but its medium of production and dissemination. Indeed, in producing a lightly mediated, witness- and facsimile-based archive, constructed over a number of generations by independent groups of scholars, Cædmon’s Hymn textual criticism even anticipates several recent calls for the development of a new digital model for collective, multi-project and multi-generational editorial work (e.g. Ore 2004; Robinson 2005).

The print scholarly history of the poem anticipates contemporary digital practice in another way as well: until recently, Cædmon’s Hymn had never been the subject of a modern critical textual edition. The last century has seen the publication of a couple of student editions of the poem (e.g. Pope and Fulk 2001; Mitchell and Robinson 2001), and some specialised reconstructions of one of the more corrupt recensions (Cavill 2000, O’Donnell 1996b, Smith 1938/1978, Wuest 1906). But there have been no critical works in the last hundred years that have attempted to encapsulate and transmit in textual form what is actually known about the poem’s transmission and recensional history. The closest thing to a standard edition for most of this time has been a parallel text edition of the Hymn by Elliott Van Kirk Dobbie (1942). Unfortunately, in dividing this text into Northumbrian and West-Saxon dialectal recensions, Dobbie produced an edition that ignored his own previous and never renounced work demonstrating that such dialectal divisions were less important than other distinctions that cut across dialectal lines (Dobbie 1937)6.

The Edition as Repository of Expert Knowledge

The problem with this approach—to Cædmon’s Hymn or any other text—should be clear enough. On the one hand the poem’s textual history is, by Anglo-Saxon standards, quite complex and the subject of intense debate by professional textual scholars. On the other, the failure until recently to provide any kind of critical text representing the various positions in the debate has all but hidden the significance of this research—and its implications for work on other aspects of the Hymn—from the general reader. Instead of being able to take advantage of the expert knowledge acquired by editors and textual scholars of the poem over the last hundred years, readers of Cædmon’s Hymn have been forced either to go back to the raw materials and construct their own texts over and over again or to rely on a standard edition that misrepresents its own editor’s considered views of the poem’s textual history.

This is not an efficient use of these readers’ time. As Kevin Kiernan has argued, the textual history of Cædmon’s Hymn is not a spectacle for casual observers (Kiernan 1990), and most people who come to study Cædmon’s Hymn are not interested in collating transcriptions, deciphering facsimiles, and weighing options for grouping the surviving witnesses. What they want is to study the poem’s sources and analogues, its composition and reception, its prosody, language, place in the canon, significance in the development of Anglo-Saxon Christianity, or usefulness as an index in discussions of the position of women in Anglo-Saxon society—that is, all the other things we do with texts when we are not studying their transmission. What these readers want—and certainly what I want when I consult an edition of a work I am studying for reasons other than its textual history—is a text that is accurate, readable, and hopefully based on clearly defined and well-explained criteria. They want, in other words, to be able to take advantage of the expert knowledge of those responsible for putting together the text they are consulting. If they don’t like what they see, or if the approach taken is not what they need for their research, then they may try to find an edition that is better suited to their particular needs. But they will not—except in extreme cases I suspect—actually want to duplicate the effort required to put together a top-quality edition.

The Efficiency of Print Editorial Tradition

The failure of the print editors of Cædmon’s Hymn over the last hundred years to provide a critical-editorial account of their actual knowledge of the poem is very much an exception that proves the rule. For in anticipating digital approaches to textual criticism and editorial practice, textual scholars of Cædmon’s Hymn have, ironically, done a much poorer job of supplying readers with information about their text than the majority of their print-based colleagues have of other texts in other periods.

This is because, as we shall see, the dissemination of expert knowledge is something that print-based editors are generally very good at. At a conceptual level, print approaches developed over the last several hundred years to the arrangement of editorial and bibliographic information in the critical edition form an almost textbook example for the parsimonious organisation of information about texts and witnesses. While there are technological and conventional limitations to the way this information can be used and presented in codex form, digital scholars would be hard pressed to come up with a theoretically more sophisticated or efficient organisation for the underlying data.

Normalisation and Relational Database Design

Demonstrating the efficiency of traditional print practice requires us to make a brief excursion into questions of relational database theory and design7. In designing a relational database, the goal is to generate a set of relationship schemas that allow us to store information without unnecessary redundancy but in a form that is easily retrievable (Silberschatz, Korth, and Sudarshan 2006, 263). The relational model organises information into two-dimensional tables, each row of which represents a relationship among associated bits of information. Complex data commonly requires the use of more than one set of relations or tables. The key thing is to avoid complex redundancies: in a well designed relational database, no piece of information that logically follows from any other should appear more than once8.

The process used to eliminate redundancies and dependencies is known as normalisation. When data has been organised so that it is free of all such inefficiencies, it is usually said to be in third normal form. How one goes about doing this can be best seen through an example. The following is an invoice from a hypothetical book store (adapted from Krishna 1992, 32):

Invoice: JJSmith0001
Customer ID: JJS01
Name: Jane J. Smith
Address: 323 Fifteenth Street S., Lethbridge, Alberta T1K 5X3.
ISBN          | Author           | Title                                                      | Price  | Quantity | Item Total
0-670-03151-8 | Pinker, Stephen  | The Blank Slate: The Modern Denial of Human Nature         | $35.00 | 1        | $35.00
0-8122-3745-5 | Burrus, Virginia | The Sex Lives of Saints: An Erotics of Ancient Hagiography | $25.00 | 2        | $50.00
0-7136-0389-5 | Dix, Dom Gregory | The Shape of the Liturgy                                   | $55.00 | 1        | $55.00
Grand Total: $140.00

Describing the information in this case in relational terms is a three-step process. The first step involves identifying what is to be included in the data model by extracting database field names from the document’s structure. In the following, parentheses are used to indicate information that can occur more than once on a single invoice:

Invoice: invoice_number, customer_id, customer_name, customer_address, (ISBN, author, title, price, quantity, item_total), grand_total

The second step involves extracting fields that contain repeating information and placing them in a separate table. In this case, the repeating information involves bibliographical information about the actual books sold (ISBN, author, title, price, quantity, item_total). The connection between this new table and the invoice table is made explicit through the addition of an invoice_number key that allows each book to be associated with a specific invoice9:

Invoice: invoice_number, customer_id, customer_name, customer_address, grand_total

Invoice_Item: invoice_number, ISBN, author, title, price, quantity, item_total

The final step involves removing functional dependencies within these two tables. In this database, for example, information about a book’s author, title, and price is functionally dependent on its ISBN: for each ISBN, there is only one possible author, title, and price. Likewise customer_id is associated with only one customer_name and customer_address. These dependencies are eliminated by placing the dependent material in two new tables, Customer and Book, which are linked to the rest of the data by the customer_id and ISBN keys respectively.

At this point the data is said to be in third normal form: we have four sets of relations, none of which can be broken down any further:

Invoice: invoice_number, customer_id, grand_total

Invoice_Item: invoice_number, ISBN, quantity, item_total

Customer: customer_id, customer_name, customer_address

Book: ISBN, author, title, price
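
To make the result concrete, the same normalisation can be expressed directly in a database engine. The following is a minimal sketch, written in Python using the standard sqlite3 module: it simply builds the four relations above, loads the data from the hypothetical invoice, and then reconstructs the original document with a join. Nothing in it is specific to any real bookstore or edition; it is offered only as an illustration of the data model.

import sqlite3

# Build the four normalised relations (third normal form) derived above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (customer_id TEXT PRIMARY KEY, customer_name TEXT, customer_address TEXT);
CREATE TABLE Book (ISBN TEXT PRIMARY KEY, author TEXT, title TEXT, price REAL);
CREATE TABLE Invoice (invoice_number TEXT PRIMARY KEY, customer_id TEXT REFERENCES Customer, grand_total REAL);
CREATE TABLE Invoice_Item (invoice_number TEXT REFERENCES Invoice, ISBN TEXT REFERENCES Book, quantity INTEGER, item_total REAL);
""")

# The data from the hypothetical invoice, stored without redundancy.
con.execute("INSERT INTO Customer VALUES ('JJS01', 'Jane J. Smith', '323 Fifteenth Street S., Lethbridge, Alberta T1K 5X3')")
con.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", [
    ("0-670-03151-8", "Pinker, Stephen", "The Blank Slate: The Modern Denial of Human Nature", 35.00),
    ("0-8122-3745-5", "Burrus, Virginia", "The Sex Lives of Saints: An Erotics of Ancient Hagiography", 25.00),
    ("0-7136-0389-5", "Dix, Dom Gregory", "The Shape of the Liturgy", 55.00),
])
con.execute("INSERT INTO Invoice VALUES ('JJSmith0001', 'JJS01', 140.00)")
con.executemany("INSERT INTO Invoice_Item VALUES (?, ?, ?, ?)", [
    ("JJSmith0001", "0-670-03151-8", 1, 35.00),
    ("JJSmith0001", "0-8122-3745-5", 2, 50.00),
    ("JJSmith0001", "0-7136-0389-5", 1, 55.00),
])

# Because normalisation loses nothing, a join rebuilds the original invoice.
for row in con.execute("""
    SELECT i.invoice_number, c.customer_name, b.title, b.price, ii.quantity, ii.item_total
    FROM Invoice i
    JOIN Customer c ON c.customer_id = i.customer_id
    JOIN Invoice_Item ii ON ii.invoice_number = i.invoice_number
    JOIN Book b ON b.ISBN = ii.ISBN
"""):
    print(row)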

Normalising Editorial Data

The normalisation process becomes interesting when one applies it to the type of information editors commonly collect about textual witnesses. The following, for example, is a simplified version of a sheet I used to record basic information about each manuscript witness to Cædmon’s Hymn:

Shelf-Mark: B1 Cambridge, Corpus Christi College 41
Date: s. xi-1
Scribe: Second scribe of the main Old English text.
Location: Copied as part of the main text of the Old English translation of the Historia ecclesiastica (p. 332 [f. 161v], line 6)
Recension: West-Saxon eorðan recension
Text: Nuweherigan sculon

heofonrices weard metodes mihte

&hismod ge þanc weorc wuldor godes

[etc]

From the point of view of the database designer, this sheet has what are essentially fields for the manuscript sigil, date, scribe, location, and, of course, the text of the poem in the witness itself, something that can be seen, on analogy with our book store invoice, as itself a repeating set of (largely implicit) information: manuscript forms, normalised readings, grammatical and lexical information, metrical position, relationship to canonical referencing systems, and the like.

As with the invoice from our hypothetical bookstore, it is possible to place this data in normal form. The first step, once again, is to extract the relevant relations from the manuscript sheet and, in this case, the often unstated expert knowledge an editor typically brings to his or her task. This leads at the very least to the following set of relations10:

Manuscript: shelf_mark, date, scribe, location, (ms_instance, canonical_reading, dictionary_form, grammatical_information, translation)

Extracting the repeating information about individual readings leaves us with two tables linked by the key shelf_mark:

Manuscript: shelf_mark, date, scribe, location

Text: shelf_mark, ms_instance, canonical_reading, dictionary_form, grammatical_information, translation

And placing the material in third normal form generates at least one more:

Manuscript: shelf_mark, date, scribe, location

Text: shelf_mark, ms_instance, canonical_reading

Glossary: canonical_reading, dictionary_form, grammatical_information, translation

At this point, we have organised our data in its most efficient format. With the exception of the shelf_mark and canonical_reading keys, no piece of information is repeated in more than one table, and all functional dependencies have been eliminated. Of course in real life, there would be many more tables, and even then it would probably be impossible—and certainly not cost effective—to treat all editorial knowledge about a given text as normalisable data.
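
Expressed in the same terms as the bookstore example, this normalised editorial data might look something like the following sketch (Python and sqlite3 again). The table and column names are those of the relations above; the handful of sample readings and glosses are my own illustrative simplifications of the B1 sheet rather than data taken from the edition itself.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- One row per witness.
CREATE TABLE Manuscript (shelf_mark TEXT PRIMARY KEY, date TEXT, scribe TEXT, location TEXT);
-- One row per manuscript reading, keyed to the editorially privileged (canonical) form.
CREATE TABLE Text (shelf_mark TEXT REFERENCES Manuscript, ms_instance TEXT, canonical_reading TEXT);
-- Lexical and grammatical information depends only on the canonical reading.
CREATE TABLE Glossary (canonical_reading TEXT PRIMARY KEY, dictionary_form TEXT, grammatical_information TEXT, translation TEXT);
""")

# Illustrative rows only: word-division and glosses are simplified.
con.execute("INSERT INTO Manuscript VALUES ('B1', 's. xi-1', 'Second scribe of the main Old English text', 'Historia ecclesiastica, p. 332 [f. 161v], line 6')")
con.executemany("INSERT INTO Text VALUES (?, ?, ?)", [
    ("B1", "Nu", "nu"),
    ("B1", "we", "we"),
    ("B1", "herigan", "herian"),
    ("B1", "sculon", "sculon"),
])
con.executemany("INSERT INTO Glossary VALUES (?, ?, ?, ?)", [
    ("nu", "nu", "adverb", "now"),
    ("we", "we", "personal pronoun, nominative plural", "we"),
    ("herian", "herian", "verb, infinitive", "to praise"),
    ("sculon", "sculan", "auxiliary verb, present plural", "must"),
])

# Recover witness B1's readings together with their lexical information.
for row in con.execute("""
    SELECT t.ms_instance, g.dictionary_form, g.grammatical_information, g.translation
    FROM Text t JOIN Glossary g ON g.canonical_reading = t.canonical_reading
    WHERE t.shelf_mark = 'B1'
"""):
    print(row)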

What is significant about this arrangement, however, is the extent to which our final set of tables reflects the traditional arrangements of information in a stereotypical print edition: a section up front with bibliographic (and other) information about the text and associated witnesses; a section in the middle relating manuscript readings to editorially privileged forms; and a section at the end containing abstract lexical and grammatical information about words in the text. Moreover, although familiarity and the use of narrative can obscure this fact in practice, much of the information contained in these traditional sections of a print edition actually is in implicitly tabular form: in structural terms, a glossary is best understood as the functional equivalent of a highly structured list or table row, with information presented in a fixed order from entry to entry. Bibliographical discussions, too, often consist of what are, in effect, highly structured lists that can easily be converted to tabular format: one cell for shelf-mark, another for related bibliography, provenance, contents, and the like11.

Database Views and the Critical Text

This analogy between the traditional arrangement of editorial matter in print editions and normalised data in a relational database seems to break down, however, in one key location: the representation of the abstract text. For while it is possible to see how the other sections of a print critical edition might be rendered in tabular form, the critical text itself—the place where editors present an actual reading as a result of their efforts—is not usually presented in anything resembling the non-hierarchical, tabular form a relational model would lead us to expect. In fact, the essential point of the editorial text—and indeed the reason it comes in for criticism from post-structuralists—is that it eliminates non-hierarchical choice. In constructing a reading text, print editors impose order on the mass of textual evidence by privileging individual readings at each collation point. All other forms—the material that would make up the Text table in a relational database—are either hidden from the reader or relegated, and even then usually only as a sample, to appearance in small type at the bottom of the page in the critical apparatus. Although it is the defining feature of the print critical edition, the critical text itself would appear to be the only part that is not directly part of the underlying, and extremely efficient, relational data model developed by print editors through the centuries.

But this does not invalidate my larger argument, because we build databases precisely in order to acquire this ability to select and organise data. If the critical text in a print edition is not actually a database table, it is a database view—that is to say a “window on the database through which data required for a particular user or application can be accessed” (Krishna 1992, 210). In computer database management systems, views are built by querying the underlying data and building new relations that contain one or more answers from the results. In print editorial practice, editors build critical texts by “querying” their knowledge of textual data at each collation point in a way that produces a single editorial reading. In this understanding, a typical student edition of a medieval or classical text might be understood as a database view built on the query “select the manuscript or normalised reading at each collation point that most closely matches paradigmatic forms in standard primers.” A modern-spelling edition of Shakespeare can be understood as the view resulting from a database query that instructs the processor to replace Renaissance spellings for the selected forms with their modern equivalents. And an edition like the Kane-Donaldson Piers Plowman can be understood as a view built on basis of a far more complex query derived from the editors’ research on metre, textual history, and scribal practice. Even editorial emendations are, in this sense, simply the result of a query that requests forms from an unstated “normalised/emended equivalent” column in the editors’ intellectual understanding of the underlying textual evidence: “select readings from the database according to criteria x; if the resulting form is problematic, substitute the form found in the normalised/emended_equivalent column.”12.
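
A crude but concrete illustration of this idea follows below. In the sketch (Python and sqlite3 once more), the tables and their names are hypothetical rather than drawn from any real edition: a Readings table records every witness form at each collation point, a Witness_Priority table makes an editorial preference among witnesses explicit, and the “critical text” is nothing more than a view that selects, at each collation point, the form offered by the most highly ranked witness that attests it. A different editorial theory is simply a different ranking, or a differently written view.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical tables: every witness form at every collation point,
-- plus an explicit editorial ranking of the witnesses.
CREATE TABLE Readings (collation_point INTEGER, witness TEXT, form TEXT);
CREATE TABLE Witness_Priority (witness TEXT PRIMARY KEY, priority INTEGER);

-- The critical text as a database view: at each collation point, take the
-- form offered by the highest-ranked witness that attests that point.
CREATE VIEW Critical_Text AS
SELECT r.collation_point, r.witness, r.form
FROM Readings r
JOIN Witness_Priority p ON p.witness = r.witness
WHERE p.priority = (SELECT MIN(p2.priority)
                    FROM Readings r2
                    JOIN Witness_Priority p2 ON p2.witness = r2.witness
                    WHERE r2.collation_point = r.collation_point);
""")

# Toy data: two witnesses that agree at the first point and disagree at the second.
con.executemany("INSERT INTO Witness_Priority VALUES (?, ?)", [("M", 1), ("B1", 2)])
con.executemany("INSERT INTO Readings VALUES (?, ?, ?)", [
    (1, "M", "Nu"), (1, "B1", "Nu"),
    (2, "M", "scylun"), (2, "B1", "sculon"),
])

# Reading the critical text is simply a matter of querying the view.
for row in con.execute("SELECT * FROM Critical_Text ORDER BY collation_point"):
    print(row)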

How Digital Editors can Improve on Print Practice

If this understanding of the critical text and its relationship to the data model underlying print critical practice is correct, then digital editors can almost certainly improve upon it. One obvious place to start might seem to lie in formalising and automating the process by which print editors process and query the data upon which their editions are based. Such an approach, indeed, would have two main advantages: it would allow us to test others’ editorial approaches by modelling them programmatically; and it would allow us to take advantage of the inherent flexibility of the digital medium by providing users with access to limitless critical texts of the same work. Where, for economic and technological reasons, print editions tend to offer readers only a single critical approach and text, digital editions could now offer readers a series of possible approaches and texts built according to various selection criteria. In this approach, users would read texts either by building their own textual queries, or by selecting pre-made queries that build views by dynamically modelling the decisions of others—a Kane-Donaldson view of Piers Plowman, perhaps, or a Gabler reading text view of Ulysses.

This is an area of research we should pursue, even though, in actual practice, we are still a long way from being able to build anything but the simplest of texts in this manner. Certain processes can, of course, be automated and even improved upon electronically—we can use computers to collate readings from different witnesses, derive manuscript stemma, automatically normalise punctuation and spelling, and even model scribal performance (see Ciula 2005; O’Donnell 2005c). And it is easy to see how we might be able to build databases and queries so that we could model human editorial decisions in relatively simple cases—reproducing the flawed dialectal texts of Cædmon’s Hymn discussed above, perhaps, or building simple student editions of small poems.

Unfortunately, such conceptually simple tasks are still at the extreme outer limits of what it is currently possible, let alone economically reasonable, to do. Going beyond this and learning to automate higher-level critical decisions involving cultural, historical, or literary distinctions, is beyond the realm of current database design and artificial intelligence even for people working in fields vastly better funded than textual scholarship. Thus, while it would be a fairly trivial process to generate a reading text based on a single witness from an underlying relational database, building automatically a best text edition—that is to say, an edition in which a single witness is singled out automatically for reproduction on the basis of some higher-level criteria—is still beyond our current capabilities. Automating other distinctions of the type made every day by human editors—distinguishing between good and bad scribes, assessing difficilior vs. facilior readings, or weighing competing evidence of authorial authorisation—belong as yet to the realm of science fiction.13.

This doesn’t let us off the hook, however. For while we are still far from being able truly to automate our digital textual editions, we do need to find some way of incorporating expert knowledge into digital editions that are becoming ever more complex. The more evidence we cram into our digital editions, the harder it becomes for readers to make anything of them. No two witnesses to any text are equally reliable, authentic, or useful for all purposes at all times. In the absence of a system that can build custom editions in response to naïve queries—“build me a general interest text of Don Juan”, “eliminate unreliable scribes”, or even “build me a student edition”—digital editors still need to provide readers with explicit expert guidance as to how the at times conflicting data in their editions is to be assessed. In some cases, it is possible to use hierarchical and object-oriented data models to encode these human judgements so that they can be generated dynamically (see note 14 above). In other cases, digital editors, like their print predecessors, will simply have to build critical texts of their editions the old fashioned way, by hand, or run the risk of failing to pass on the expert knowledge they have built up over years of scholarly engagement with the primary sources.

It is here, however, that digital editors can improve theoretically and practically the most on traditional print practice. For if critical reading texts are, conceptually understood, the equivalent of query-derived database views, then there is no reason why readers of critical editions should not be able to entertain multiple views of the underlying data. Critical texts, in other words—as post-structuralist theory has told us all along—really are neither right nor wrong: they are simply views of a textual history constructed according to different, more or less explicit, selection criteria. In the print world, economic necessity and technological rigidity imposed constraints on the number of different views editors could reasonably present to their readers—and encouraged them in pre post-structuralist days to see the production of a single definitive critical text as the primary purpose of their editions. Digital editors, on the other hand, have the advantage of a medium that much more easily allows the inclusion of multiple critical views, a technology in which the relationship between views and data is widely known and accepted, and a theoretical climate that encourages an attention to variance. If we are still far from being at the stage in which we can produce critical views of our data using dynamic searches, we are able even now to hard-code such views into our editions in unobtrusive and user-friendly ways.14. By taking advantage of the superior flexibility inherent in our technology and the existence of a formal theory that now explains conceptually what print editors appear to have discovered by experience and tradition, we can improve upon print editorial practice by extending it to the point that it begins to subvert the very claims to definitiveness we now find so suspicious. By being more like our print predecessors, by ensuring that our expert knowledge is carefully and systematically encoded in our texts, we can, ironically, use the digital medium to offer our readers a greater flexibility in how they use our work.

Conclusion

And so in the end, the future of digital editing may lie more in our past than we commonly like to consider. While digital editorial theory has tended to define its project largely in reaction to previous print practice, this approach underestimates both the strength of the foundation we have been given to build upon and the true significance of our new medium. For the exciting thing about digital editing is not that it can do everything differently, but rather that it can do some very important things better. Over the course of the last half millennium, print editorial practice has evolved an extremely efficient intellectual model for the organisation of information about texts and witnesses—even as, in the last fifty years, we have become increasingly suspicious of the claims to definitiveness this organisation was often taken to imply. As digital editors, we can improve upon the work of our predecessors by first of all recognising and formalising the intellectual strength of the traditional editorial model and secondly reconciling it to post-structuralist interest in variation and change by implementing it far more fully and flexibly than print editors themselves could ever imagine. The question we need to answer, then, is not whether we can do things differently but how doing things differently can improve on current practice. But we won’t be able to answer this question until we recognise what current practice already does very very well.

Works Cited

Bart, Patricia R. 2006. Controlled experimental markup in a TEI-conformant setting. Digital Medievalist 2.1 <http://www.digitalmedievalist.org/article.cfm?RecID=10>.

British Library, nd. Turning the Pages. <http://www.bl.uk/onlinegallery/ttp/ttpbooks.html>.

Cavill, Paul. 2000. The manuscripts of Cædmon’s Hymn. Anglia 118: 499-530.

Cerquiglini, Bernard. 1989. Éloge de la variante: Histoire critique de la philologie. Paris: Éditions de Seuil.

Ciula, Arianna. 2005. Digital palaeography: Using the digital representation of medieval script to support palaeographic analysis. Digital Medievalist 1.1 <http://www.digitalmedievalist.org/article.cfm?RecID=2>

Dobbie, Elliott Van Kirk. 1937. The manuscripts of Cædmon’s Hymn and Bede’s Death Song with a critical text of the Epistola Cuthberti de obitu Bedæ. Columbia University Studies in English and Comparative Literature, 128. New York: Columbia University Press.

───, ed. 1942. The Anglo-Saxon minor poems. The Anglo-Saxon Poetic Records, a Collective Edition, 6. New York: Columbia University Press.

Dobiache-Rojdestvensky, O. 1928. Un manuscrit de Bède à Léningrad. Speculum 3: 314-21.

Finneran, Richard J., ed. 1996. The literary text in the digital age. Ann Arbor: University of Michigan Press.

Foys, Martin K., ed. 2003. The Bayeux Tapestry: Digital Edition. Leicester: SDE.

Greetham, D.C. 1994. Textual Scholarship. New York: Garland.

Jabbour, A. A. 1968. The memorial transmission of Old English poetry: a study of the extant parallel texts. Unpublished PhD dissertation, Duke University.

Karlsson, Lina and Linda Malm. 2004. Revolution or remediation? A study of electronic scholarly editions on the web. HumanIT 7: 1-46.

Kiernan, Kevin S. 1990. Reading Cædmon’s Hymn with someone else’s glosses. Representations 32: 157-74.

───, ed. 1999/2003. The electronic Beowulf. Second edition. London: British Library.

Krishna, S. 1992. Introduction to database and knowledge-base systems. Singapore: World Scientific.

Landow, George P. and Paul Delaney, eds. 1993. The digital word: text-based computing in the humanities. Cambridge, MA, MIT Press.

Lapidge, Michael. 1991. Textual criticism and the literature of Anglo-Saxon England. Bulletin of the John Rylands University Library. 73:17-45.

───. 1994. On the emendation of Old English texts. Pp. 53-67 in: D.G. Scragg and Paul Szarmach (ed.), The editing of Old English: Papers from the 1990 Manchester conference.

Manly, John M. and Edith Rickert. 1940. The text of the Canterbury tales. Chicago: University of Chicago Press.

McGann, Jerome J. 1983/1992. A critique of modern textual criticism. Charlottesville: University of Virginia Press.

───. 1994. Rationale of the hypertext. <http://www/iath.virginia.edu/public/jjm2f/rationale.htm>

McGillivray, Murray, ed. 1997. Geoffrey Chaucer’s Book of the Duchess: A hypertext edition. Calgary: University of Calgary Press.

McKenzie, D.F. 1984/1999. Bibliography and the sociology of texts. Cambridge: Cambridge University Press.

Mitchell, Bruce and Fred C. Robinson, eds. 2001. A guide to Old English. 6th ed. Oxford: Blackwell.

Nicols, Stephen G. Jr., ed. 1990. Speculum 65.

Ó Cróinín, Dáibhí. nd. The Foundations of Irish Culture AD 600-850. Website. <http://www.foundationsirishculture.ie/>.

O’Donnell, Daniel Paul. 1996a. Manuscript Variation in Multiple-Recension Old English Poetic Texts: The Technical Problem and Poetical Art. Unpubl. PhD Dissertation. Yale University.

───. 1996b. A Northumbrian version of “Cædmon’s Hymn” (eordu recension) in Brussels, Bibliothèque Royale MS 8245-57 ff. 62r2-v1: Identification, edition and filiation. Beda venerabilis: Historian, monk and Northumbrian, eds. L. A. J. R. Houwen and A. A. MacDonald. Mediaevalia Groningana, 19. 139-65. Groningen: Egbert Forsten.

───. 2005a. Cædmon’s Hymn: A multimedia study, edition, and archive. SEENET A.8. Cambridge: D.S. Brewer.

───. 2005b. O Captain! My Captain! Using Technology to Guide Readers Through an Electronic Edition. Heroic Age 8. <http://www.heroicage.org/issues/8/em.html>

───. 2005c. The ghost in the machine: Revisiting an old model for the dynamic generation of digital editions. HumanIT 8 (2005): 51-71.

───. Forthcoming. If I were “You”: How Academics Can Stop Worrying and Learn to Love “the Encyclopedia that Anyone Can Edit.” Heroic Age 10.

Ore, Espen S. 2004. Monkey Business—or What is an Edition? Literary and Linguistic Computing 19: 35-44.

Pearsall, Derek. 1985. Editing medieval texts. Pp. 92-106 in Textual criticism and literary interpretation. Ed. Jerome J. McGann. Chicago: University of Chicago Press.

Pope, John C. and R. D. Fulk, eds. 2001. Eight Old English poems. 3rd ed. New York: W. W. Norton.

Railton, Stephen, ed. 1998-. Uncle Tom’s Cabin and American Culture. Charlottesville: University of Virginia. Institute for Advanced Technology in the Humanities. <http://www.iath.virginia.edu/utc/>.

Reed Kline, Naomi, ed. 2001. A Wheel of Memory: The Hereford Mappamundi. Ann Arbor: University of Michigan Press.

Robinson, Fred C. and E. G. Stanley, eds. 1991. Old English verse texts from many sources: a comprehensive collection. Early English Manuscripts in Facsimile, 23. Copenhagen: Rosenkilde & Bagger.

Robinson, Peter. nd. New Methods of Editing, Exploring, and Reading the Canterbury Tales. <http://www.cta.dmu.ac.uk/projects/ctp/desc2.html>.

───, ed. 1996. The Wife of Bath’s Prologue on CD-ROM. Cambridge: Cambridge University Press.

───. 2004. Where are we with electronic scholarly editions, and where do we want to be? Jahrbuch für Computerphilologie Online at <http://computerphilologie.uni-muenchen.de/ejournal.html>. Also available in print: Jahrbuch für Computerphilologie. 123-143.

───. 2005. Current issues in making digital editions of medieval texts—or, do electronic scholarly editions have a future? Digital Medievalist 1.1 <http://www.digitalmedievalist.org/article.cfm?RecID=6>

───. 2006. The Canterbury Tales and other medieval texts. In Burnard, O’Brien O’Keeffe, and Unsworth. New York: Modern Language Association of America.

Shillingsburg, Peter L. 1996. Electronic editions. Scholarly editing in the computer age: Theory and practice. Third edition.

Silberschatz, Avi, Hank Korth, and S. Sudarshan. 2006. Database system concepts. New York: McGraw-Hill.

Sisam, Kenneth. 1953. Studies in the history of Old English literature. Oxford: Clarendon Press.

Smith, A.H., ed. 1938/1978. Three Northumbrian poems: Cædmon’s Hymn, Bede’s Death Song, and the Leiden Riddle. With a bibliography compiled by M. J. Swanton. Revised ed. Exeter Medieval English Texts. Exeter: University of Exeter Press.

Wuest, Paul. 1906. Zwei neue Handschriften von Cædmons Hymnus. ZfdA 48: 205-26.

Notes

1 In a report covering most extant, web-based scholarly editions published in or before 2002, Lina Karlsson and Linda Malm suggest that most digital editors up to that point had made relatively little use of the medium’s distinguishing features: “The conclusion of the study is that web editions seem to reproduce features of the printed media and do not fulfil the potential of the Web to any larger extent” (2004 abstract).

2 As this list suggests, my primary experience with actual practice is with digital editions of medieval texts. Recent theoretical and practical discussions, however, suggest that little difference is to be found in electronic texts covering other periods.

3 Synthetic here is not quite synonymous with eclectic as used to describe the approach of the Greg-Bowers school of textual criticism. Traditionally, an eclectic text is a single, hypothetical, textual reconstruction (usually of the presumed Authorial text) based on the assumption of divided authority. In this approach, a copy text is used to supply accidental details of spelling and punctuation and (usually) to serve as a default source for substantive readings that affect the meaning of the abstract artistic work. Readings from this copy text are then corrected by emendation or, preferably, from forms found in other historical witnesses. In this essay, synthetic is used to refer to a critical text that attempts to summarise in textual form an editorial position about an abstract work’s development at some point in its textual history. All eclectic texts are therefore synthetic, but not all synthetic texts are eclectic: a best text (single witness) edition is also synthetic if, as the name implies, an editorial claim is being made about the particular reliability, historical importance, or interest of the text as represented in the chosen witness. A diplomatic transcription, however, is not synthetic: the focus there is on reporting the details of a given witness as accurately as possible. For a primer on basic concepts in textual editing, excluding the concept of the synthetic text as discussed here, see Greetham 1994.

4 It is indeed significant that the PPEA —the most ambitious digital critical edition of a medieval text that I am aware of—is at this stage in its development publishing primarily as an archive: the development of critical texts of the A-, B-, and C-text traditions has been deferred until after the publication of individual edition/facsimiles of the known witnesses (Bart 2006).

5 Transcriptions, editions, facsimiles, and studies mentioned in this paragraph in many cases have been superseded by subsequent work; readers interested in the current state of Cædmon’s Hymn should begin with the bibliography in O’Donnell 2005a.

6 While there is reason to doubt the details of Dobbie’s recensional division, his fundamental conclusion that dialect did not play a crucial role in the poem’s textual development remains undisputed. For recent (competing) discussions of the Hymn’s transmission, see O’Donnell 2005a and Cavill 2000.

7 There are other types of databases, some of which are at times more suited to representation of information encoded in structural markup languages such as XML, and to the type of manipulation common in textual critical studies (see below, note 14). None of these other models, however, express information as parsimoniously as does the relational model (see Silberschatz, Korth, and Sudarshan 2006, 362-365).

8 This is a rough rather than a formal definition. Formally, a well-designed relational database normally should be in either third normal form or Boyce-Codd normal form (BCNF). A relation is said to be in third normal form when a) the domains of all attributes are atomic, and b) all non-key attributes are fully dependent on the key attributes (see Krishna 1992, 37). A relation R is said to be in BCNF if, whenever a non-trivial functional dependency X → A holds in R, X is a superkey for R (Krishna 1992, 38). Other normal forms exist for special kinds of dependencies (Silberschatz, Korth, Sudarshan 2006, 293-298).
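By way of illustration, consider a hypothetical witness-catalogue relation (my own example, not one drawn from Krishna 1992 or Silberschatz, Korth, and Sudarshan 2006):

% Hypothetical example only: a relation recording, for each witness siglum,
% the library that holds it and the city in which that library is located.
\[
R = \textit{Witness}(\textit{siglum}, \textit{library}, \textit{city}),
\qquad \textit{siglum} \rightarrow \textit{library},
\quad \textit{library} \rightarrow \textit{city}
\]
% R is not in BCNF (or 3NF): library -> city is a non-trivial dependency, but
% library is not a superkey and city is not part of any key. Decomposing R into
% Witness(siglum, library) and Location(library, city) yields two relations,
% each of which satisfies BCNF.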

9 In actual fact, the model for a real bookstore invoice would be more complex, since the example here does not take into account the possibility that there might be more than one copy of any ISBN in stock. A real bookstore would need additional tables to allow it to keep track of inventory.
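One way the additional tables might be sketched (the attribute names here are hypothetical and are not taken from the tables discussed in this essay):

% Hypothetical schema only: individual physical copies are recorded separately
% from titles, and an invoice line sells one particular copy.
\[
\begin{aligned}
&\textit{Book}(\underline{\textit{ISBN}}, \textit{title}, \textit{price})\\
&\textit{Copy}(\underline{\textit{copy\_id}}, \textit{ISBN}, \textit{condition})\\
&\textit{InvoiceLine}(\underline{\textit{invoice\_no}, \textit{copy\_id}})
\end{aligned}
\]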

10 In actual practice, the model would be far more complex and include multiple levels of repeating information (words within lines and relationships to canonical reference systems, for example). This example also assumes that the word is the basic unit of collation; while this works well for most Old English poetry, it may not for other types of literature.

11 Of course, critical editions typically contain far more than bibliographic, textual, and lexical/grammatical information. This too can be modelled relationally, although it would be quixotic to attempt, in this essay, to account for the infinite range of possible material one might include in a critical edition. Thus cultural information about a given text or witness is functionally dependent on the specific text or witness in question. Interestingly, the more complex the argumentation becomes, the less complex the underlying data model appears to be: a biographical essay on a text’s author, for example, might take up but a single cell in one of our hypothetical tables.

12 The critical apparatus in most print and many digital editions is itself also usually a view of an implicit textual database, rather than the database itself. Although it usually is presented in quasi-tabular form, it rarely contains a complete accounting for every form in the text’s witness base.

13 This is not to say that it is impossible to use data modelling to account for these distinctions—simply that we are far from being able to derive them arbitrarily from two-dimensional relational databases, however complex. Other data models, such as hierarchical or object-oriented databases, can be used to build such distinctions into the data itself, though this by definition involves the application of expert knowledge. In O’Donnell 2005a, for example, the textual apparatus is encoded as a hierarchical database. This allows readers, in effect, to query the database, searching for relations pre-defined as significant, substantive, or orthographic by the editor. See O’Donnell 2005a, §§ ii.7, ii.19, 7.2-9.
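A deliberately simplified sketch of the principle (the element and attribute names, sigla, and variant forms are invented for illustration and do not reproduce the actual markup of O’Donnell 2005a):

<!-- Hypothetical apparatus entry: each variant reading is classified by the
     editor, and that classification becomes queryable data. The sigla MS-A
     and MS-B and the variant forms are invented for this example. -->
<app from="ex.1a.2">
  <lem>sculon</lem>
  <rdg wit="MS-A" class="orthographic">sceolon</rdg>
  <rdg wit="MS-B" class="substantive">we sculon</rdg>
</app>
<!-- A reader interested only in substantive variation can then, in effect,
     query the apparatus with an expression such as the XPath
     //rdg[@class='substantive'] -->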

14 In the case of my edition of Cædmon’s Hymn, this takes the form of multiple critical texts and apparatus: several reconstructions of the poem’s archetypal form, and various critical views of the poem’s five main recensions and collations. The criteria used to construct these views are indicated explicitly in the title of each page and explained in detail in the editorial introductions. The individual editions were extracted from an SGML-encoded text using stylesheets—in essence hard-wired database queries reflecting higher-level editorial decisions—but presented to the reader as a series of progressively abstract views. In keeping with the developing standard for digital textual editions, the edition also allows users direct access to the underlying transcriptions and facsimiles upon which it is based. The result is an edition that attempts to combine the best of the digital and print worlds: the archiving function common to most electronic editions (and traditionally the focus of Cædmon’s Hymn textual research in print), with the emphasis on the presentation of expert knowledge characteristic of traditional print editorial practice.

----  

If I were “You”: How Academics Can Stop Worrying and Learn to Love “the Encyclopedia that Anyone Can Edit”

Posted: Feb 02, 2007 22:02;
Last Modified: May 23, 2012 19:05

---

Original Publication Information: Forthcoming in Heroic Age (2007). http://www.heroicage.org/.

Time Magazine and the Participatory Web

So now it is official: Time magazine thinks the Wikipedia is here to stay.

In its December 2006 issue, Time named “You” as its “Person of the Year” (Grossman 2006). But it didn’t really mean “you”—either the pronoun or the person reading this article. It meant “us”—members of the participatory web, the “Web 2.0,” the community behind YouTube, FaceBook, MySpace, WordPress,… and of course the Wikipedia.

In its citation, Time praised its person of the year “for seizing the reins of the global media, for founding and framing the new digital democracy, for working for nothing and beating the pros at their own game.” It suggested that the new web represented

an opportunity to build a new kind of international understanding, not politician to politician, great man to great man, but citizen to citizen, person to person.

Actually, as this suggests, Time didn’t really mean “us” either. At least not if by “us” we mean the professional scholars, journalists, authors, and television producers (that is to say the “pros”) who used to have more-or-less sole responsibility for producing the content “you” (that is to say students, readers, and audiences) consumed. In fact, as the citation makes clear, Time actually sees the new web as being really a case of “you” against “us”—a rebellion of the amateurs that has come at the expense of the traditional experts:

It’s a story about community and collaboration on a scale never seen before. It’s about the cosmic compendium of knowledge Wikipedia and the million-channel people’s network YouTube and the online metropolis MySpace. It’s about the many wresting power from the few and helping one another for nothing and how that will not only change the world, but also change the way the world changes.

Academic Resistance

This sense that the participatory web represents a storming of the informational Bastille is shared by many scholars in our dealings with the representative that most closely touches on our professional lives—the Wikipedia, “the encyclopedia that anyone can edit”. University instructors (and even whole departments) commonly forbid students from citing the Wikipedia in their work (Fung 2007). Praising it on an academic listserv is still a reliable way of provoking a fight. Wikipedia founder Jimmy Wales’s suggestion that college students should not cite encyclopaedias, including his own, as a source in their work is gleefully misrepresented in academic trade magazines and blogs (e.g. Wired Campus 2006).

And none of this is having any effect. Eighteen months ago, I had yet to see a citation from the Wikipedia in a student’s essay. This past term, it was rare to find a paper that did not cite it and several of my students asked for permission to research and write new entries for the Wikipedia instead of submitting traditional papers. Other elements of the participatory web mentioned by Time are proving equally successful: politicians, car companies, and Hollywood types now regularly publish material on YouTube or MySpace alongside or in preference to traditional media channels. This past summer, the story of LonelyGirl15 and her doomed relationship to DanielBeast on YouTube became what might be described as the first “hit series” to emerge from the new medium: it attracted millions of viewers on-line, was discussed in major newspapers, and, after it was revealed to be a “hoax” (it was scripted and produced using professional writers, actors, and technicians), its “star” made the requisite appearance on Jay Leno’s Tonight show (see LonelyGirl15).

Why the Participatory Web Works

The participatory web is growing so quickly in popularity because it is proving to be a remarkably robust model. Experiments with the Wikipedia have shown that deliberately planted false information can be corrected within hours (Read 2006). A widely cited comparison of select articles in the Wikipedia and the Encyclopaedia Britannica by the journal Nature showed that the Wikipedia was far more accurate than many had suspected: in the forty-two articles surveyed, the Wikipedia was found to have an average of four mistakes per article to Britannica’s three (Giles 2005). In fact, even just Googling web pages can produce surprisingly useful research results—a recent study showed that diagnoses of difficult illnesses built by entering information about the symptoms into the search engine Google were accurate 60% of the time (Tang and Hwee Kwoon Ng 2006). In some circumstances, the participatory web actually may prove to be more useful than older methods of professional content creation and dissemination: an article in the Washington Post recently discussed how the United States intelligence community is attempting to use blogs and wikis to improve the speed and quality of information reported to analysts, agents, and decision-makers (Ahrens 2006).

Why Don’t We Like It

Given this popularity and evidence of effectiveness both as a channel of distribution and a source of reasonably accurate and self-correcting information, the academic community’s opposition to the Wikipedia may come at first as something of a surprise. What is it that makes “us” so dislike “you”?

One answer is that the Wikipedia and other manifestations of the participatory web do not fit very well with contemporary academic models for quality control and professional advancement. Professional academics today expect quality scholarship to be peer-reviewed and contain a clear account of intellectual responsibility. Authorship attributions are commonly found with forms of intellectual labour, such as book reviews and encyclopaedia entries, that were published without attribution as little as fifty years ago. Some scholarly journals are naming referees who recommend acceptance; readers for journals that have traditionally used anonymous reviews are frequently asking for their names to be revealed.

This emphasis on review and responsibility has obvious advantages. While peer-review is far from a perfect system—there have been enough hoaxes and frauds across the disciplines in the last twenty years to demonstrate its fallibility—it is surely better than self-publication: I imagine most scholars benefit most of the time from the comments of their readers. In my experience, the attention of good acquisitions and copy-editors invariably improves the quality of a final draft.

Moreover, peer-review and clear attribution have an important role in the academic economy: they are the main (and usually only) currency with which researchers are paid by the presses and journals that publish them. In the professional academe, our worth as scholars depends very much on where our work appears. A long article in a top journal or a monograph published at a major University press is evidence that our research is regarded highly. Articles in lesser journals, or lesser forms of dissemination such as book reviews, conference papers, and encyclopaedia entries published under our names are less important but can still be used as evidence of on-going professional activity (see, for example, Department of English, University of Colorado [2007]). While it is not quite money in the bank, this transference of prestige and recognition is an important element in most universities’ systems for determining rank and pay.

An article in the Wikipedia is not going to get anybody tenure. Because they are written collectively and published anonymously, Wikipedia articles do not highlight the specific intellectual contributions of individual contributors—although, in contrast to medical and scientific journals with their perennial problem of “co-authors” who lend names to articles without actually contributing any research (for a discussion of one example, see Bails 2006), it is possible to trace specific intellectual responsibility for all contributions to any entry in the Wikipedia using the history and compare features. And while the Wikipedia does have a formal certification process—articles can be submitted for “peer-review” and selected for “feature” status—this process is optional and not very selective: authors or readers nominate articles for peer-review and certification after they have already been published to the web and the reviewing body consists of simply those interested users who happen to notice that an article has been put forward for review and are willing to comment on the relevant discussion page (see Wikipedia: Peer Review). While this body might include respected experts in the field, it also certainly includes amateurs whose main interest is the Wikipedia itself. It also, almost equally certainly, includes people whose knowledge of the topic in question is ill-informed or belongs to the lunatic fringe.

Why We Can’t Do Anything About It

Given these objections, it is not surprising that some of us in the professional academic community are trying to come up with some alternatives—sites that combine desirable aspects of the Wikipedia model (such as its openness to amateur participation) with other elements (such as expert review and editorial control) taken from the world of the professional academy. One new project that attempts to do this is the Citizendium, a project which, beginning as a fork (i.e. branch) of the original Wikipedia, intends to bring it under more strict editorial control: in this project, “Editors”—contributors with advanced degrees—are to be recruited to serve as area experts and help resolve disputes among contributors, while “Constables”—“a set of persons of mature judgment”—will be “specially empowered to enforce rules,… up to and including the ejection of participants from the project” (Citizendium 2006). Other, though far more specialised, attempts to merge the openness of wiki-based software with more strict editorial control and peer-review are also increasingly being proposed by scholarly projects and commercial scholarly publishers.

Few if any of these projects are likely to succeed all that well. While the addition of formal editorial control and an expert-based certification system brings their organisation more closely into line with traditional academic expectations, the economics remain suspect. On the one hand, such projects will find it difficult to generate enough prestige from their peer-review process to compete for the best efforts of professional scholars with more traditional, invitation-only, encyclopaedias such as the Britannica or collections published by the prestigious academic presses. On the other hand, they are also unlikely to be able to match the speed and breadth of content-development found at more free-wheeling, community-developed projects of the participatory web.

In fact, the Wikipedia itself is the successful offshoot of a failed project of exactly this sort. The ancestor of the Wikipedia was the Nupedia, an on-line, open-source (though non-wiki) project whose goal was to develop an on-line, peer-reviewed and professionally written encyclopaedia (see History of Wikipedia, Nupedia, Wikipedia, and Sanger 2005). The editorial board was subject to strict review and most participants were expected to have a Ph.D. or equivalent. The review process involved seven steps: five analogous to those found in traditional academic publishing (assigning to an editor, finding a reader, submitting for review, copy-editing, and final pre-publication approval) and two borrowed from the world of open source software (a public call for reviews, and a public round of copy-editing). Begun in March 2000, the project ultimately collapsed in September 2003 due to a lack of participation, slow time-to-publication, and conflicts between professional contributors and editors and members of the public in the open review and copy-editing parts of the review process. In its relatively brief existence, the project managed to approve only twenty-four peer-reviewed articles for publication. At its suspension, seventy-four were still in various stages of review. After the project as a whole was suspended, the successful articles were rolled into the Wikipedia. Relatively few can be found in their original form today.

The Wikipedia was originally established as a support structure for the Nupedia’s open processes—as a place where participants in the larger project could collaborate in the creation of material for the “official” project and contribute to their review and copy-editing. The wiki-based project was proposed on the Nupedia’s mailing list on January 2, 2001 and rejected almost immediately by participants for much the same reasons it is frowned upon by professional academics today. It was reestablished as a separate project with its own domain name by January 10. Almost immediately, it began to best its “mother” project: within a year the Wikipedia had published 20,000 articles and existed in 18 different languages; by the Nupedia’s suspension in the fall of 2003, the Wikipedia had published 152,000 articles in English and was found in twenty-six different languages (Multilingual Statistics). By October 30th, 2006, there were over 1.4 million articles in English alone.

The contrasting fates of the Nupedia and the Wikipedia illustrate the central problem that faces any attempt to impose traditional academic structures on projects designed for the participatory web: the strengths and weaknesses of wiki-based and traditional academic models are almost directly out of phase. The Wikipedia has been successful in its quest to develop a free, on-line encyclopaedia of breadth and accuracy comparable to that of its print-based competitors because the barrier to participation is so low. Because anybody can edit the Wikipedia, millions do. And it is their collective contribution of small amounts of effort that enables the growth and success of the overall project.

The Nupedia, on the other hand, failed because its use of traditional academic vetting procedures raised the bar to mass participation by amateurs but did not make the project significantly more attractive to professionals. Academics who need prestige and authorial credit for their professional lives are still going to find it difficult to use participation in the Nupedia (or, now, the Citizendium) on their CVs. Even in fields where collaboration is the norm, scholars need to be able to demonstrate intellectual leadership rather than mere participation. A listing as first author is far more valuable than second or third. And second or third author in a traditional venue is infinitely preferable to professional academics to membership in a relatively undifferentiated list of contributors to an on-line encyclopaedia to which amateurs contribute. The most prestigious journals, presses, and encyclopaedias all enforce far higher standards of selectivity than the mere evidence of an earned Ph.D. suggested by the Nupedia or the “eligibility” for “a tenure track job” preferred by the Citizendium. No project that hopes to remain open to free collaboration by even a select group of well-informed amateurs or the marginally qualified is going to be able to compete directly with already existing, traditional publications for the best original work of professional scholarly researchers, no matter how stringent the review process. But by raising the bar against relatively casual participation by large numbers of amateurs, such projects also risk vitiating the “many hands make light work” principle that underlies the explosive success of the Wikipedia and similar participatory projects.

A New Model of Scholarship: The Wikipedia as Community Service

If I am correct in thinking that attempts to create alternatives to the Wikipedia by combining aspects of traditional academic selectivity and review with a wiki-based open collaboration model are doomed to failure, then the question becomes what “we” (the professional University teachers and researchers who are so suspicious of the original Wikipedia) are to do with what “you” (the amateurs who contribute most of the Wikipedia’s content) produce.

It is clear that we can’t ignore it: no matter what we say in our syllabi, students will continue to use the Wikipedia in their essays and projects—citing it if we allow them to do so, and plagiarising from it if we do not. Just as importantly, the Wikipedia is rapidly becoming the public’s main portal to the subjects we teach and research: popular journalists now regularly cite the Wikipedia in their work and the encyclopaedia commonly shows up on the first page of Google searches. While it may not be in any specific scholar’s individual professional interest to take time away from his or her refereed research in order to contribute to a project that provides so little prestige, it is clearly in our collective interest as a profession to make sure that our disciplines are well represented in the first source to which our students and the broader public turn when they want to find out something about the topics we actually research.

But perhaps this shows us the way forward. Perhaps what we need is to see the Wikipedia and similar participatory sites less as a threat to our way of doing things than as a way of making what we do more visible to the general public. The fictional romance between LonelyGirl15 and DanielBeast on YouTube did not threaten the makers of commercial television. But it did give prominence to a medium that makers of commercial television now use regularly to attract audiences to their professional content in the traditional media. In our case, the Wikipedia is less an alternative to traditional scholarship (except perhaps as this is represented in print encyclopaedias) than it is a complement—something that can be used to explain, show off, and broaden the appeal of the work we do in our professional lives.

In fact, the important thing about the Wikipedia is that it has been built almost entirely through the efforts of amateurs—that is to say people who are not paid to conduct research in our disciplines but do so anyway because it is their hobby. While it can certainly be disheartening to see the occasional elementary mistake or outlandish theory in a Wikipedia entry, we should not ignore the fact that the entry itself exists because people were interested enough in what we do to try and imitate it in their spare time. Given the traditional lack of respect shown scholarly research by governments and funding agencies for much of the last century, we should be rejoicing in this demonstration of interest—in much the same way scientists judging a science fair are able to see past the many relatively trivial experiments on display and recognise the event’s importance as a representation of popular interest in what they do.

This recognition of the extent to which the Wikipedia has engaged the imagination of the general public and turned it to the amateur practice of scholarship suggests what I think may prove to be the best way of incorporating it into the lives of professional academics: since the Wikipedia appears unable to serve as a route to professional advancement for intrinsic reasons, perhaps we should begin to see contributions to it by professional scholars as a different type of activity altogether—as a form of community service to be performed by academics in much the same way lawyers are often expected to give back to the public through their pro bono work. A glance at almost any discussion page on the Wikipedia will show that the Wikipedians themselves are aware of the dangers posed to the enterprise by the inclusion of fringe theories, poor research, and contributions by people with insufficient disciplinary expertise. As certified experts who work daily with the secondary and primary research required to construct good Wikipedia entries, we are in a position to contribute to the construction of individual articles in a uniquely positive way by taking the time to help clean up and provide balance to entries in our professional areas of interest. In doing so, we can both materially improve the quality of the Wikipedia and demonstrate the importance of professional scholars to a public whose hobby touches very closely on the work we are paid to do—and whose taxes, by and large, support us.

And who knows, maybe “we” could even join “you” in accepting Time Magazine’s nomination for person of the year.

Works Cited

Ahrens, Frank. 2006. “A Wikipedia Of Secrets.” Washington Post. Sunday, November 5: F07. Online edition, URL: http://www.washingtonpost.com/wp_dyn/content/article/2006/11/03/AR2006110302015.html

Bails, Jennifer. 2006. “Schatten’s hand in bogus paper detailed.” Pittsburgh Tribune-Review, January 11. http://www.pittsburghlive.com/x/tribune-review/trib/regional/s_412326.html

Bergstein, Brian. 2007. “Microsoft Offers Cash for Wikipedia Edit.” Washington Post, January 23. http://www.washingtonpost.com/wp-dyn/content/article/2007/01/23/AR2007012301025.html

Citizendium 2006. “Citizendium’s Fundamental Policies.” Citizendium (citation from version 1.4, October 11) http://www.citizendium.org/fundamentals.html

Department of English, University of Colorado [2007]. “Department of English guidelines for promotion.” Department Handbook. http://www.colostate.edu/Depts/English/handbook/guidepro.htm

Fung, Brian. 2007. “Wikipedia distresses History Department.” middleburycampus.com. Online. URL: http://media.www.middleburycampus.com/media/storage/paper446/news/2007/01/24/News/Wikipedia.Distresses.History.Department-2670081.shtml

Giles, Jim. 2005. “Internet encyclopaedias go head to head.” news@nature.com. http://www.nature.com/news/2005/051212/full/438900a.html

Grossman, Lev. 2006. “Time’s Person of the Year: You.” Time. Wednesday, Dec. 13. Online Edition. URL: http://www.time.com/time/magazine/article/0%2C9171%2C1569514%2C00.html .

History of Wikipedia. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=History_of_Wikipedia&oldid=104389205 (accessed January 31, 2007).

Lonelygirl15. Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Lonelygirl15&oldid=104136723 (accessed January 31, 2007).

Multilingual Statistics. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Wikipedia:Multilingual_statistics&oldid=97805501 (accessed February 2, 2007).

Nupedia. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Nupedia&oldid=103617050 (accessed January 31, 2007).

Read, Brock. 2006. “Can Wikipedia Ever Make the Grade?” The Chronicle of Higher Education October 27. URL: http://chronicle.com/temp/reprint.php?%20id=z6xht2rj60kqmsl8tlq5ltqcshc5y93y

Sanger, Larry J. 2005. “The Early History of Nupedia and Wikipedia: A Memoir.” Part I http://features.slashdot.org/article.pl?sid=05/04/18/164213&tid=95&tid=149&tid=9 Part II: http://features.slashdot.org/article.pl?sid=05/04/19/1746205&tid=95.

Tang, Hangwi and Jennifer Hwee Kwoon Ng. 2006. “Googling for a diagnosis—use of Google as a diagnostic aid: internet based study.” BMJ 333(7570): 1143-1145. URL: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1676146

Wikipedia: Peer Review. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Wikipedia:Peer_review&oldid=104637689 (accessed January 31, 2007).

Wikipedia. Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Wikipedia&oldid=104645649 (accessed January 31, 2007).

Wired Campus 2006. “Wikipedia Founder Discourages Academic Use of His Creation.” Chronicle of Higher Education. June 12. URL: http://chronicle.com/wiredcampus/article/1328/wikipedia-founder-discourages-academic-use-of-his-creation

----  

The Ghost in the Machine: Revisiting an Old Model for the Dynamic Generation of Digital Editions

Posted: Dec 16, 2006 00:12;
Last Modified: May 23, 2012 20:05

---

First Published: HumanIT 8.1 (2005): 51-71. http://www.hb.se/bhs/ith/1-8/dpo.pdf

“The Electronic Cædmon’s Hymn Editorial Method” (1998)

In 1998, a few months into the preparation of my electronic edition of the Old English poem Cædmon’s Hymn (O’Donnell forthcoming), I published a brief prospectus on the “editorial method” I intended to follow in my future work (O’Donnell 1998). Less a true editorial method than a proposed workflow and list of specifications, the prospectus called for the development of an interactive edition-processor by which “users will […] be able to generate mediated (‘critical’) texts on the fly by choosing the editorial approach which best suits their individual research or study needs” (O’Donnell 1998, ¶ 1).

The heart of the prospectus was a diagram of the “Editorial Process Schema” I intended to follow (figure 1). The edition was to be based on TEI (P2) SGML-encoded diplomatic transcriptions of all twenty-one known witnesses to the poem. Its output was to consist of dynamically generated “HTML/XML” display texts that would allow users access to different views of the underlying textual data depending on their specific interests: e.g. editions containing reconstructions of archetypal texts, student texts based on witnesses showing the simplest vocabulary and grammar, “best text” editions of individual witnesses or recensions, etc. The production of these display texts was to be handled by a series of SGML “filters” or “virtual editions” that would be populated by the unspecified processor used to format and display the final output. [Begin p. 51]

Figure 1. Editorial Process Schema (O’Donnell 1998)

Goals

The initial impetus for this approach was practical. Although it is quite short, Cædmon’s Hymn has a relatively complex textual history for an Anglo-Saxon poem. Even in print, it has always been edited as a multitext. The standard print edition (Dobbie 1942) reproduces two editorial versions of the poem without commenting on their relative priority. Few other studies have managed to be even this decisive. Dobbie’s text was the last (before my forthcoming edition) to attempt to produce critical texts based on the entire manuscript tradition. Most editions before and since have concentrated on individual recensions or groups of witnesses1. Anticipating great difficulty in proof-reading an electronic edition that might have several editorial texts and multiple textual apparatus2, I was at this early stage keenly interested in reducing the opportunity for typographical error. A workflow that would allow me to generate a number of [Begin p. 52] different critical texts from a single set of diplomatic transcriptions without retyping was for this reason an early desideratum.

This convenience, however, was not to come at the expense of editorial content: a second important goal of my prospectus was to find an explicit home for the editor in what Murray McGillivray recently had described as a “post-critical” world (McGillivray 1994; see also Ross 1996; McGann 1997). In medieval English textual studies in 1998, indeed, this post-critical world seemed to be fast approaching: the first volume of the Canterbury Tales Project, with its revolutionary approach to electronic collation and stemmatics and a lightly-edited guide text, had been published two years earlier (Robinson 1996). Forthcoming publications from the Piers Plowman Electronic Archive (Adams et al. 2000) and Electronic Beowulf (Kiernan 1999) projects, similarly, promised a much heavier emphasis on the manuscript archive (and less interest in the critical text) than their more traditional predecessors. My initial work with the Cædmon’s Hymn manuscripts (e.g. O’Donnell 1996a; O’Donnell 1996b), however, had convinced me that there was a significant need in the case of this text for both user access to the witness archive and editorial guidance in the interpretation of this primary evidence – or, as Mats Dahlström later would point out, that the two approaches had complementary strengths and weaknesses:

The single editor’s authoritative control in the printed SE [Scholarly Edition], manifested in e.g. the versional prerogative, isn’t necessarily of a tyrannical nature. Conversely, the much spoken-of hypermedia database exhibiting all versions of a work, enabling the user to choose freely between them and to construct his or her “own” version or edition, presupposes a most highly competent user, and puts a rather heavy burden on him or her. Rather, this kind of ultra-eclectic archive can result in the user feeling disoriented and even lost in hyperspace. Where printed SE:s tend to bury rival versions deep down in the variant apparatuses, the document architecture of extreme hypertext SE:s, consequential to the very nature of digitally realised hypertext, threatens to bury the user deep among the mass of potential virtuality. (Dahlström 2000, 17) [Begin p. 53]

Keen as I was to spare myself some unnecessary typing, I did not want this saving to come at the expense of providing access to the “insights and competent judgement” (Dahlström 2000, 17) I hoped to acquire in several years’ close contact with the manuscript evidence. What I needed, in other words, was a system in which the computer would generate, but a human edit, the final display texts presented to the reader.

Theory

In order to accomplish these goals, the prospectus proposed splitting the editorial process into distinct phases: a transcription phase, in which human scholars recorded information about the text as it appeared in the primary sources (the “Witness Archive”); an editorial (“Filtering”) phase, in which a human editor designed a template by which a display text was to be produced from the available textual evidence (“Virtual Editions”); a processing phase, in which a computer applied these filters to the Witness Archive; and a presentation phase, in which the resultant output was presented to the reader. The first and second stages were to be the domains of the human editor; the third and fourth those of the computer. An important element of this approach was the assumption that the human editor, even in traditional print sources, functioned largely as a rules-based interpreter of textual data – or as I (in retrospect unfortunately) phrased it, could be “reduced to a set of programming instructions”3 – in much the same way as a database report extracts and formats specific information from the underlying data table of a database:

In my view, the editor of a critical edition is understood as being functionally equivalent to a filter separating the final reader from the uninterpreted data contained in the raw witnesses. Depending on the nature of the instructions this processor is given, different types of manipulation will occur in producing the final critical edition. An editor interested in producing a student edition of the poem, for example, can be understood to be manipulating the data according to the instructions “choose the easiest (most sensible) readings and ignore those which raise advanced textual problems”; an editor interested in producing the “original” text can be seen as a processor performing the instruction “choose readings from the earliest manuscript(s) when these are available [Begin p. 54] and sensible; emend or normalise readings as required”; and an editor interested in producing an edition of a specific dialectal version of a text is working to the instruction “choose readings from manuscripts belonging to dialect x; when these are not available, reconstruct or emend readings from other manuscripts, ensuring that they conform to the spelling rules of the dialect”. (O’Donnell 1998, ¶¶ 4 f.)
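To make the analogy concrete, here is a deliberately simplified sketch of what one such instruction might look like when written out as an explicit rule (expressed, anachronistically, in modern XSLT rather than the SGML tools available in 1998, and assuming per-word markup and file names along the lines of the transcription samples given later in this article):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical "editorial filter": for a given word slot (e.g. '1a.2'),
       prefer the reading of witness T1 and fall back on witness O where T1
       offers none. The file names and id conventions are assumptions made
       for the purposes of illustration. -->
  <xsl:template name="preferred-reading">
    <xsl:param name="slot"/>
    <xsl:variable name="t1"
        select="document('t1.xml')//seg[@type='MSWord'][substring-after(@id, 't1.') = $slot]"/>
    <xsl:variable name="o"
        select="document('o.xml')//seg[@type='MSWord'][substring-after(@id, 'o.') = $slot]"/>
    <xsl:choose>
      <xsl:when test="$t1"><xsl:value-of select="$t1"/></xsl:when>
      <xsl:otherwise><xsl:value-of select="$o"/></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>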

Advantages

From a theoretical perspective, the main advantage of this approach was that it provided an explicit location for the encoding of editorial knowledge – as distinct from textual information about primary sources, or formatting information about the final display. By separating the markup used to describe a text’s editorial form from that used to describe its physical manifestation in the witnesses, or its final appearance to the end user, this method made it easier in principle both to describe phenomena at a given level in intrinsically appropriate terms and to modify, reuse, or revise information at each level without necessarily having to alter other aspects of the edition design – in much the same way as the development of structural markup languages themselves had freed text encoders from worrying unduly about final display. Scholars working on a diplomatic transcription of a manuscript in this model would be able to describe its contents without having to ensure that their markup followed the same semantic conventions (or even DTD) as that used at the editorial or display levels.

Just as importantly, the approach was, in theory at least, infinitely extensible. Because it separated transcription from editorial activity, and because it attempted to encode editorial activity as a series of filters, users were, in principle, free to ignore, adapt, add to, or replace the work of the original editor. Scholars interested in statistical or corpus work might choose to work with raw SGML data collected in the witness archive; those interested in alternative editorial interpretations might wish to provide their own filters; those wishing to output the textual data to different media or using different display formats were free to adapt or substitute a different processor. Espen S. Ore recently has discussed how well-made and suitably-detailed transcriptions of source material might be used or adapted profitably by other scholars and projects as the basis [Begin p. 55] for derivative work (Ore 2004); from a theoretical perspective the “Editorial Method” proposed for use in Cædmon’s Hymn offered an early model for how such a process might be built into an edition’s design. Indeed, the method in principle allowed editors of new works to operate in the other direction as well: by building appropriate filters, editors of original electronic editions could attempt to model the editorial decisions of their print-based predecessors, or apply techniques developed for other texts to their own material4.

Implementation (1998)

Despite its theoretical attractiveness, the implementation of this model proved, in 1998, to be technically quite difficult. The main problem was access to technology capable of the type of filtering envisioned at the Virtual Edition level. In the original model, these “editions” were supposed to be able both to extract readings from multiple source documents (the individual witness transcriptions) and to translate their markup from the diplomatic encoding used in the original transcriptions to that required by the new context – as a reading used in the main text of a critical edition, say, or a form cited in an apparatus entry, textual note, or introductory paragraph. This type of transformation was not in and of itself impossible to carry out at the time: some SGML production environments and several computer languages (e.g. DSSSL or, more generally, Perl and other scripting languages) could be used to support most of what I wanted to do; in the days before XSL, however, such solutions were either very much cutting edge, or very expensive in time and/or resources. As a single scholar without a dedicated technical staff or funding to purchase commercial operating systems, I was unable to take full advantage of the relatively few transformation options then available.

The solution I hit upon instead involved dividing the transformation task into two distinct steps (extraction and translation) and adding an extra processing level between the witness and virtual edition levels in my original schema: [Begin p. 56]

Figure 2. Implemented Schema

Instead of acting as the locus of the transformation, the editorial filters in this revised model provided a context for text that had been previously extracted from the witness archive and transformed for use in such circumstances. The text these filters called upon was stored in a textual database as part of the project’s entity extension file (project.ent, see Sperberg-McQueen and Burnard 2004, § 3.3), and hence resident in the project DTD. The database itself was built by extracting marked-up readings from the original witness transcription files (using grep) and converting them (using macros and similar scripts) to entities that could be called by name anywhere in the project. Transformations involving a change in markup syntax or semantics (e.g. from a diplomatic-linguistic definition of a word in witness transcriptions to a syntactic and morphological definition in the edition files) also generally were performed in this DTD extension file. [Begin p. 57]

First two lines of a TEI SGML transcription of Cædmon’s Hymn witness T1:

<l id="t1.1" n="1">
 <seg type="MSWord" id="t1.1a.1">Nu<space extent="0"></seg>
 <seg type="MSWord" id="t1.1a.2"><damage type="stain" degree="moderate">sculon</damage><space></seg>
 <note id="t1.1a.3.n" type="transcription" target="t1.1a.2 t1.1a.4 t1.1b.1 t1.2b.3 t1.3a.1 t1.4a.1 t1.4a.2 t1.4b.1 t1.6a.1 t1.6a.2 t1.7b.1 t1.7b.2 t1.9b.2">&copyOft1.1a.2;…&copyOft1.9b.2;] Large stain obscures some text down inside (right) margin of p. 195 in facsimile. Most text is readable, however.</note>
 <seg type="MSWord" id="t1.1a.3"><damage type="stain" degree="moderate">herigean</damage><space></seg>
 <caesura>
 <seg type="MSWord" id="t1.1b.1"><damage type="stain" degree="light">he</damage>ofon<lb>rices<space></seg>
 <seg type="MSWord" id="t1.1b.2">&wynn;eard<space></seg>
</l>
<l id="t1.2" n="2">
 <seg type="MSWord" id="t1.2a.1">meotodes<space></seg>
 <seg type="MSWord" id="t1.2a.2">me<corr sic="u" cert="50%"><del rend="overwriting">u</del><add rend="overwriting" place="intralinear">a</add></corr>hte<space></seg>
 <note type="transcription" id="t1.2a.2.n" target="t1.2a.2" resp="dpod">&copyOft1.2a.2;] Corrected from <foreign>meuhte</foreign>?</note>
 <caesura>
 <seg type="MSWord" id="t1.2b.1">&tyronianNota;<space extent="0"></seg>
 <seg type="MSWord" id="t1.2b.2">his<space></seg>
 <seg type="MSWord" id="t1.2b.3"><damage type="stain" degree="severe"><unclear reason="stain in facsimile" cert="90%">mod</unclear></damage><damage type="stain" degree="moderate">geþanc</damage><space></seg>
 <note type="transcription" id="t1.2b.3.n" target="t1.2b.3">&copyOft1.2b.3;] <c>mod</c> obscured by stain in facsimile.</note>
</l>

Same text after conversion to entity format (information from the original l, w, caesura, and note elements is stored separately).

<!ENTITY t1.1a.1 'Nu<space type="wordBoundary" extent="0">'>
<!ENTITY t1.1a.2 'sc<damage type="stain" rend="beginning">ulon</damage><space type="wordBoundary" extent="1">'>

[Begin p. 58]

<!ENTITY t1.1a.3 '<damage type="stain" rend="middle">herıgean</damage><space type="wordBoundary" extent="1">'>
<!ENTITY t1.1b.1 '<damage type="stain" rend="end">heo</damage>fon<lb>rıces<space type="wordBoundary" extent="1">'>
<!ENTITY t1.1b.2 '&mswynn;eard<space type="wordBoundary" extent="1">'>
<!ENTITY t1.2a.1 'meotodes<space type="wordBoundary" extent="1">'>
<!ENTITY t1.2a.2 'me<damage type="stain" rend="complete">a</damage>hte<space type="wordBoundary" extent="1">'>
<!ENTITY t1.2b.1 '<abbr type="scribal" expan="ond/and/end">&tyronianNota;</abbr><expan type="scribal">ond</expan><space type="wordBoundary" extent="0">'>
<!ENTITY t1.2b.2 'hıs<space type="wordBoundary" extent="1">'>
<!ENTITY t1.2b.3 '<damage type="stain" rend="beginning"><unclear rend="complete">mod</unclear>geþanc</damage><space type="wordBoundary" extent="1">'>

Same text after conversion to editorial format for use in editions.

<!ENTITY ex.1a.1 'Nu'>
<!ENTITY ex.1a.2 'sculon'>
<!ENTITY ex.1a.3 'herigean'>
<!ENTITY ex.1b.1 'heofonrices'>
<!ENTITY ex.1b.2 '&edwynn;eard'>
<!ENTITY ex.2a.1 'meotodes'>
<!ENTITY ex.2a.2 'meahte'>
<!ENTITY ex.2b.1 'ond'>
<!ENTITY ex.2b.2 'his'>
<!ENTITY ex.2b.3 'modgeþanc'>

Citation from the text of T1 (bold) in an introductory chapter (simplified for demonstration purposes).

<p id="CH6.420" n="6.42">Old English <mentioned lang="ANG">swe</mentioned>, <mentioned lang="ANG">swæ</mentioned>, <mentioned lang="ANG">swa</mentioned> appears as <mentioned rend="postcorrection" lang="ANG">&t1.3b.1;</mentioned> (&carmsx; <mentioned rend="postcorrection" lang="ANG">&ar.3b.1;</mentioned>) in all West-Saxon witnesses of the poem on its sole occurrence in 3b. The expected West-Saxon development is <mentioned lang="ANG">swæ</mentioned>, found in early West-Saxon. As in most dialects, however, <mentioned lang="ANG">swa</mentioned> develops irregularly in the later period. [Begin p. 59] <mentioned lang="ANG">Swa</mentioned> is the usual late West-Saxon reflex (see &hogg1992;, § 3.25, n. 3).</p>

Citation from the text of T1 (bold) in a textual apparatus (simplified for demonstration purposes).

<app id="EX.1A.1.APP" n="1" from="EX.1A.1">
 <lem id="EX.1A.1.LEM" n="1a">&ex.1a.1;</lem>
 <rdggrp>
  <rdggrp>
   <rdggrp>
    <rdg id="T1.1A.1.RDG" wit="T1">&t1.1a.1;</rdg><wit><xptr doc="t1" from="T1.1A.1" n="T1" rend="eorthan"></wit>
    <rdg id="O.1A.1.RDG" wit="O (Pre-Correction)"><seg rend="precorrection">&o.1a.1;</seg></rdg><wit><xptr doc="o" from="O.1A.1" n="O (Pre-Correction)" rend="eorthan"></wit>
   </rdggrp>
  </rdggrp>
  <rdggrp>
   <rdggrp>
    <rdg id="N.1A.1.RDG" wit="N">&n.1a.1;</rdg><wit><xptr doc="n" from="N.1A.1" n="N" rend="eorthan"></wit>
   </rdggrp>
  </rdggrp>
 </rdggrp>
 <rdggrp>
  <rdggrp>
   <rdggrp>
    <rdg id="B1.1A.1.RDG" wit="B1">&b1.1a.1;&b1.1a.2;</rdg><wit><xptr doc="b1" from="B1.1A.1" n="B1" rend="eorthan"></wit>
    <rdg id="TO.1A.1.RDG" wit="To">&to.1a.1;&to.1a.2;</rdg><wit><xptr doc="to" from="TO.1A.1" n="To" rend="eorthan"></wit>
    <rdg sameas="O.1A.1.RDG" wit="O (Post-Correction)"><seg rend="postcorrection">&o.1a.1;&o.1a.2;</seg></rdg><wit><xptr doc="o" from="O.1A.1" n="O (Post-Correction)" rend="eorthan"></wit>
    <rdg id="CA.1A.1.RDG" wit="Ca">&ca.1a.1;&ca.1a.2;</rdg><wit><xptr doc="ca" from="CA.1A.1" n="Ca" rend="eorthan"></wit>
   </rdggrp>
  </rdggrp>
 </rdggrp>
</app>

[Begin p. 60]

Implementation (2005)

The solutions I developed in 1998 to the problem of SGML transformation are no longer of intrinsic interest to Humanities Computing specialists except, perhaps, from a historical perspective. With the publication of the first XSL draft in November 1999, and, especially, the subsequent rapid integration of XSL and XML into commercial and academic digital practice, editors soon had far more powerful languages and tools available to accomplish the same ends.

Where my solutions are valuable, however, is as proof-of-concept. By dividing the editorial process into distinct phases, I was able to achieve, albeit imperfectly, both my original goals: no Old English text from the primary witnesses was input more than once in my edition and I did to a certain extent find in the “Virtual Editions” an appropriate and explicit locus for the encoding of editorial information.

With the use of XSLT, however, it is possible to improve upon this approach in both practice and theory. In practical terms, XSLT functions and instructions such as document() and xsl:result-document eliminate the need for a pre-compiled textual database: scholars using XSLT today can work, as I originally had hoped to, directly with the original witness transcriptions, extracting readings, processing them, and outputting them to different display texts using a single language and processor – and indeed perhaps even a single set of stylesheets.
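A minimal sketch of what this looks like in practice (XSLT 2.0; the file names, element structure, and output layout are assumptions for illustration, following the transcription samples above rather than the edition’s actual files):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical list of witness transcription files. -->
  <xsl:variable name="witnesses" select="('t1.xml', 'o.xml', 'n.xml')"/>

  <xsl:template name="main" match="/">
    <xsl:for-each select="$witnesses">
      <xsl:variable name="transcription" select="document(.)"/>
      <!-- One output document per witness: a simple reading text built
           directly from the transcription, with no intermediate entity
           database. -->
      <xsl:result-document href="editions/{.}">
        <text>
          <xsl:for-each select="$transcription//seg[@type='MSWord']">
            <w id="{@id}"><xsl:value-of select="."/></w>
          </xsl:for-each>
        </text>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>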

In theoretical terms, moreover, the adoption of XSLT helps clarify an ambiguity in my original proposal. Because, in 1998, I saw the process of generating an edition largely as a question of translation from diplomatic to editorial encoding, my original model distinguished between the first two levels on largely semantic grounds. The Witness Archive was the level that was used to store primary readings from the poem’s manuscripts; the filter or Virtual Edition level was used to store everything else, from transformations necessary to translate witness readings into editorial forms to secondary textual content such as introductory chapters, glossary entries, and bibliography.

In XSLT terms, however, there is no significant reason for maintaining such a distinction: to the stylesheet, both types of content are simply raw material for the transformation. What this raw material is, where it came from, or who its author is, are irrelevant to the stylesheet’s task of [Begin p. 61] organisation, adaptation, interpretation, and re-presentation. While poor quality or poorly constructed data will affect the ultimate quality of its output, data composition and encoding remain, in the XSLT world, distinct operations from transformation.

This is significant because it helps us refine our theoretical model of the editorial process and further isolate the place where editorial intelligence is encoded in a digital edition. For organisation, adaptation, interpretation, and re-presentation are the defining tasks of the scholarly editor as much as they are that of the XSLT stylesheet. Change the way a standard set of textual data is interpreted, organised, adapted, or presented, and you change the nature of the final “edition”. Editions of literary works are usually based on very similar sets of primary data – there is only one Beowulf manuscript, after all, and even better attested works usually have a relatively small set of textually significant witnesses, editions, or recensions. What differences arise between modern editions of literary texts tend for the most part to hinge on the reinterpretation of existing evidence, rather than any real change in the available data5. In traditional editions, the evidence for this observation can be obscured by the fact that the “editor” also usually is responsible for much of the secondary textual content. That the observation is true, however, is demonstrated by emerging forms of digital editions in which the editorial function is largely distinct from that of content creation: multigenerational and derivative editions such as those discussed by Ore (2004), as well as interactive models such as that proposed by the Virtual Humanities Lab (e.g. Armstrong & Zafrin 2005), or examples in which users reinterpret data in already existing corpora or databases (e.g. Green 2005).

Taken together, this suggests that my 1998 model was correct in its division of the editorial process into distinct tasks, but imprecise in its understanding of the editorial function. [Begin p. 62]

Figure 3. Revised Schema

In the revised version, the original “Witness Archive” is now reconceived more generally as a collection of textual data used in the edition, regardless of source or type. This data is then organised, interpreted, adapted, and prepared for presentation using stylesheets (and perhaps other organisational tools) provided by an “editor” – regardless of whether this “editor” is the person responsible for assembling and/or authoring the original content, an invited collaborator, or even an end user. As in the original model, this reorganisation is then presented using an appropriate display medium.

Conclusion

Technical advances of the last eight years have greatly improved our ability to extract and manipulate textual data – and our ability to build editions in ways simply impossible in print. The model for the editorial [Begin p. 63] process proposed in O’Donnell (1998) represented an early attempt to understand how the new technology might affect the way editors work, and, more importantly, how this technology might be harnessed more efficiently. With suitable modifications to reflect our field’s growing sophistication, the model appears to have stood the test of time, and proven itself easily adapted to include approaches developed since its original publication. From my perspective, however, a real sign of strength is that it continues to satisfy my original two goals: it suggests a method for avoiding reinputting primary source documents, and it provides a description of the locus of editorial activity; in an increasingly collaborative and interactive scholarly world, it appears that the ghost in the machine may reside in the stylesheet.

Daniel Paul O’Donnell is an Associate Professor of English at the University of Lethbridge, Alberta, Canada. He is also director of the Digital Medievalist Project 〈http://www.digitalmedievalist.org/〉 and editor of Cædmon’s Hymn: A Multimedia Study, Edition, and Archive (D.S. Brewer, forthcoming 2005). His research interests include Old English poetry, Textual and Editorial Studies, Humanities Computing, and the History of the Book. E-mail: daniel.odonnell@uleth.ca Web page: http://people.uleth.ca/~daniel.odonnell/ [Begin p. 64]

Notes

1 A bibliography of studies and editions of Cædmon’s Hymn can be found in O’Donnell (forthcoming).

2 In the event, the final text of O’Donnell (forthcoming) has eight critical editions, all of which have several apparatus, and “semi-normalised” editions of all twenty-one witnesses.

3 This choice was unfortunate, as it seems that it led to my model being understood far more radically than I intended (e.g. in Dahlström 2000, 17, cited above). A perhaps better formulation would be that editors (print and digital) function in a manner analogous to (and perhaps reproducible in) programming instructions.

4 In practice, of course, this type of modelling would work best in the case of simple, linguistically oriented exemplars. It becomes increasingly difficult – though still theoretically possible – with more complex or highly eclectic editorial approaches. A rule-based replica of Kane and Donaldson (1988), for example, is probably possible only in theory.

5 While this obviously does not apply in those few cases in which editions are made after the discovery of significant new textual evidence, such discoveries are few and far between. Most editorial differences are the result of a reinterpretation of essentially similar sets of textual data.

[Begin p. 65]

References

[Begin p. 66]

[Begin p. 67]

Appendix: O’Donnell (1998)

The following is a reprint of O’Donnell (1998). It has been reformatted for publication, but is otherwise unchanged from the original text with the exception of closing brackets that were missing from some of the code examples in the original and that have been added here. The Editorial Schema diagram has been redrawn without any deliberate substantive alteration. The original low resolution version can be found at 〈http://people.uleth.ca/~daniel.odonnell/research/caedmon-job.html〉.

The Electronic Cædmon’s Hymn: Editorial Method

Daniel Paul O’Donnell

The Electronic Cædmon’s Hymn will be an archive-based, virtual critical edition. This means users will:

The following is a rough schema describing how the edition will work:

[Begin p. 68]

Figure 1.

This schema reflects my developing view of the editing process. The terms (Witness Level, Processor Level, etc.) are defined further below.

In my view, the editor of a critical edition is understood as being functionally equivalent to a filter separating the final reader from the uninterpreted data contained in the raw witnesses. Depending on the nature of the instructions this processor is given, different types of manipulation will occur in producing the final critical edition. An editor interested in producing a student edition of the poem, for example, can be understood to be manipulating the data according to the instruction “choose the easiest (most sensible) readings and ignore those which raise advanced textual problems”; an editor interested in producing the ‘original’ text can be seen as a processor performing the instruction “choose readings from the earliest manuscript(s) when these are available and sensible; emend or normalise readings as required”; and an editor interested in producing an edition of a specific dialectal version of a text is working to the [Begin p. 69] instruction “choose readings from manuscripts belonging to dialect x; when these are not available, reconstruct or emend readings from other manuscripts, ensuring that they conform to the spelling rules of the dialect”. If editors can be reduced to a set of programming instructions, then it ought to be possible, in an electronic edition, to automate the manipulations necessary to produce various kinds of critical texts. In the above schema, I have attempted to do so. Instead of producing a final interpretation of ‘the text’, I divide the editorial process into a series of discrete steps:

Because the critical edition is not seen as an actual text but rather as a simple view of the raw data, different textual approaches are understood as being complementary rather than competing. It is possible to have multiple ‘views’ coexisting within a single edition. Readers will be expected to choose the view most appropriate to the type of work they wish to do. For research requiring a reconstruction of the hypothetical ‘author’s original’, a ‘reconstruction filter’ might be applied; a student can apply the ‘student edition filter’ and get a readable simplified text; and the oral-formulaicist can apply the ‘single manuscript x filter’ and get a formatted edition of the readings of a single manuscript. Because different things are expected of the different levels, each layer has its own format and protocol. Because all layers are essential to the development of the text, all would be included on the CD-ROM containing the edition. Users could program their own filters at the filter level, or change the processing instructions to use other layouts or formats; they could also conduct statistical experiments and the like on the raw SGML texts in the witness archive or filter level as needed.

[Begin p. 70]

Witness Archive

The witness archive consists of facsimiles and diplomatic transcriptions of all relevant witnesses marked up in SGML (TEI) format. TEI is better for this initial stage of the mark-up because it is so verbose. Information completely unnecessary to formatting – linguistic, historical, metrical, etc. – can be included for use by search programs and for manipulation by other scholars.

The following is a sample from a marked-up transcription at the witness archive level:

<l id="ld.1" n="1">
 <w>Nu</w>
 <w>&wynn;e</w><space extent="0">
 <w>sceolan</w>
 <w>herian</w>
 <w><del type="underlined">herian</del></w>
 <caesura>
 <w>heo<lb><add hand="editorial" cert="90">f</add>on<space extent="1">rices</w>
 <w>&wynn;eard</w>.<space extent="0">
</l>

Virtual Editions

Virtual Editions are the filters that contain the editorial processing instructions. They are not so much texts in themselves as records of the intellectual processes by which a critical text interprets the underlying data contained in the witness archive. They are SGML (TEI) encoded documents which provide a map of which witness readings are to be used in which critical texts. For most readings in most types of editions, these instructions will consist of empty elements using the ‘sameAs’ and ‘copyOf’ attributes to indicate which witness is to provide a specific reading: e.g. <w copyOf="CaW2"></w>, where CaW2 is the identifier for the reading of a specific word from manuscript Ca. One of the advantages of this method is that it eliminates one potential source of error (cutting and pasting from the diplomatic transcriptions into the critical editions); it also allows for the near-instantaneous integration of new manuscript readings into the finished editions – changes in the witness transcriptions are automatically incorporated in the final texts via the filter.

[Begin p. 71]

In some cases, the elements will contain emendations or normalisation instructions: e.g. <w sameAs="CaW2">þa</w>. The following sample is from a virtual edition. It specifies that line 1 of this critical text is to be taken verbatim from manuscript ld (i.e. the text reproduced above):

<l id="Early.1" n="1" copyOf="ld.1"></l>
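A slightly fuller, invented fragment shows how the two mechanisms might combine within a single line; the line and word identifiers here are hypothetical and do not correspond to the edition’s actual sigla:

<l id="Student.1" n="1">
 <w copyOf="CaW1"></w>        <!-- reading taken verbatim from manuscript Ca -->
 <w copyOf="CaW2"></w>
 <w sameAs="CaW3">sculon</w>  <!-- normalised form supplied in place of the witness reading -->
 <w copyOf="CaW4"></w>
</l>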

Processing Level and Display Texts

The ‘Virtual Editions’ are a record of the decisions made by an editor in producing his or her text rather than a record of the text itself. Because they consist for the most part of references to specific readings in other files, the virtual editions will be next-to-unreadable to the human eye. Turning these instructions into readable, formatted text is the function of the next layer – in which the processing instructions implied by the virtual layer are carried out and final formatting is applied. This processing is done using a transformation-type processor – like Jade – in which the virtual text is filled in with actual readings from the witness archive, and these readings are then formatted with punctuation and capitalisation etc. as required. The final display text is HTML or XML. While this will involve a necessary loss of information – most TEI tags have nothing to do with formatting, few HTML tags have much to do with content – it is more than compensated for by the ability to include the bells and whistles which make a text useful to human readers: HTML browsers are as a rule better and more user-friendly than SGML browsers. Users who need to do computer analysis of the texts can always use the TEI-encoded witness transcriptions or virtual editions.
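By way of illustration only – the design described above calls for a DSSSL processor such as Jade, and the file and element names below are invented – a present-day XSLT equivalent of this processing level might resolve the line-level copyOf references roughly as follows:

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- the witness archive that the virtual edition points into -->
  <xsl:variable name="archive" select="document('witness-archive.xml')"/>

  <!-- a line in the virtual edition that copies a whole line from a witness -->
  <xsl:template match="l[@copyOf]">
    <p>
      <xsl:apply-templates select="$archive//l[@id = current()/@copyOf]/w"/>
    </p>
  </xsl:template>

  <!-- witness readings are emitted as plain text separated by spaces; real
       stylesheets would also handle deletions, additions, punctuation, and
       capitalisation at this point -->
  <xsl:template match="w">
    <xsl:value-of select="."/>
    <xsl:text> </xsl:text>
  </xsl:template>
</xsl:stylesheet>

The point is simply that the virtual edition supplies the map, the witness archive supplies the readings, and the processor does the splicing.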

Here is my guess as to how HTML would display the same line in the final edition (a critical apparatus would normally also be attached at this layer containing variant readings from other manuscripts [built up from the manuscript archive using the ‘copyOf’ attribute rather than by
cutting and pasting]; notes would discuss the various corrections etc. ignored in the reading text of this view):

<P>Nu we sceolan herian heofonrices weard</P>

----  

Why should I write for your Wiki? Towards an economics of collaborative scholarship.

Posted: Dec 15, 2006 17:12;
Last Modified: Jan 04, 2017 16:01

---

Originally presented at the conference of the Renaissance Society of America, San Francisco, CA, March 2006.

I’d like to begin today by telling you the story of how I came to write this paper. Ever since I was in high school, I have used a process called “constructive procrastination” to get things done. This system involves collecting a bunch of projects due at various times and then avoiding work on the one that is due right now by finishing something else instead. Or as my wife, who actually teaches this system, says: “if you want me to get your project done today, give me something more important to avoid working on.”

In this particular case, the important thing I wanted to avoid doing was this lecture. And the thing I did instead in order to avoid it was work on an article for the Wikipedia. Or rather—and to be honest, worse—work on revising an article I put up on the Wikipedia almost a year ago when I was trying to avoid working on an article on Fonts for the Digital Medievalist.

The goal of my procrastination this time was to get my entry recognised as a “Featured article”. A “Featured article” at the Wikipedia is one considered suitable for displaying on the site’s front page. Such articles are supposed to represent the very best of the encyclopaedia, and an unofficial policy, frequently cited by reviewers, restricts them to approximately 0.1% of the total database.

Getting an article recognised as a “Feature” turns out to be a remarkably difficult process. You nominate your work for consideration, at which point it is opened for review by the community at large. And they basically tell you to take it back and make it better. Very few articles seem to sail right through. The ones I saw on their way to featured status had all gone through the process at least once before.

In my case the reviewers didn’t like my referencing style, thought the writing was aimed at too specialised an audience, and generally wanted much more background detail. After two weeks of hard work, and about 100 edits, the article is now beginning to get good rather than lukewarm-to-negative reviews and seems on its way to getting recognition as a “feature”. I’m debating resubmitting next time I have something else to avoid doing.1

In addition to being surprisingly conscientious, the comments I received on my piece were also remarkably astute. Unbeknownst to the reviewers, indeed, they accurately reflected the article’s history. I first added the entry—which is on Cædmon, the Anglo-Saxon poet and subject of my recent book from Boydell and Brewer—last year, when, trying to avoid researching an entry on fonts for the Digital Medievalist, I decided to see how the Wikipedia handled something I knew something about. The original entries on Cædmon and his Hymn were quite inaccurate, and relied on very old sources; the one on Cædmon’s Hymn also made an odd claim about hidden paganism in the poem. In the interests of procrastination, I decided to correct the entry on Cædmon’s Hymn, and replace the account of the poet’s life with an entry I had just written for a specialist print encyclopaedia, The Sources of Anglo-Saxon Literary Culture. With my print editor’s permission, I quickly revised the entry I had submitted to him, cutting out unnecessarily detailed information and adding some background material, and pasted the results into the Wikipedia. There were a couple of minor problems—I forgot to remove some books I was no longer citing from the works cited list, and some of the italics and character encoding were messed up—but on the whole the article received good comments on its discussion page, and was left alone for the most part by other users. This is generally a good sign in the Wikipedia, and in fact a criterion for recognition as a featured article.

My entry for Cædmon’s Hymn didn’t fare as well: the author of the original piece kept reversing my edits until others recommended that the piece be merged with the larger Cædmon article. I never did finish my work for the wiki entry on Fonts that I was supposed to be researching for the Digital Medievalist… though I do have an article due at the beginning of May that I’d like to avoid.

I’ve told you the story of how I came to write this paper—or rather avoid doing so—because I think it illustrates several important things about the possibilities and challenges involved in creating an information commons.

Information commons are a relatively hot topic right now. These are collaborative environments in which content is developed and published interactively—by the users of the community for which it is intended. Such communities can take various forms, but the most common are probably blog farms, RSS servers, and Wikis, along with other types of collaborative tools such as Version Control Systems and annotation engines, and the more familiar chat rooms and email lists.

More importantly, such environments are beginning to become more popular in the scholarly world as well. A number of projects, such as STOA, the Digital Medievalist and Digital Classicist Projects, the Virtual Humanities Lab at Brown, and the Text Encoding Initiative are beginning to use tools like Wikis as a central part of their environment, or experiment with more sophisticated types of collaborative tools such as annotation and editing engines.

What my experience with the Wikipedia shows is that these commons can indeed end up with—if I say so myself—detailed and scholarly work of a relatively high standard. I don’t work for the Wikipedia after all, and I have—for whatever twisted psychological reasons—devoted a reasonable amount of time and expertise to contributing a thoroughly researched and referenced entry on my subject.

Moreover, my experience also shows that such communities can be collaborative in the best sense: my article is now much better suited for its intended audience—and better written, I think—as a result of the criticism I received from the Wikipedia reviewers after I nominated it for feature status.

And, a final positive point: it shows that such communities can be self-policing. The person who thought Cædmon was really praising Pagan gods in the original entry (a very non-standard view) was eventually outvoted and reined in by a consensus among other users. And to his credit he accepted this consensus and moved on.

But my experience also shows some of the difficulties involved in running a community of this sort:

First of all, my best and most careful work appeared only with the prospect of a reward. The reward is not great—neither my dean nor my colleagues are going to care if my article is selected as a “Feature.” But it was only once I decided to go for Feature status that I did the kind of detailed slogging that I normally do in my day-to-day research, and indeed had done in the print entry from which I revised my Wikipedia article.

Secondly, while I did contribute up-to-date scholarship to the Wikipedia, I didn’t do any research for the Wikipedia: I contributed my Cædmon article because I had something suitable lying around which I had already researched and written for a different purpose. Nobody—even the hobbyists who contribute most of the Wikipedia’s material—would put the kind of research I did into a piece written exclusively for it. If they did, it is highly doubtful that they would devote the kind of time to checking citation style and the like that print editors demand from professional scholars.

And finally, although the community is self-policing, it is not always safe to walk the streets at night: the person whose work I corrected did, after all, come back and undo my revisions. Even though he ultimately gave in to the consensus opinion of the users—and what if the consensus had been wrong?—his inaccuracies nevertheless did replace my corrections for a significant amount of time.

I am not the first person to notice these positive and negative aspects of the commons: having used wikis on a number of projects for a couple of years, I can tell you that the problem of quality control is the second thing most academics comment on when they are introduced to wiki software, after first expressing their admiration for the concept of a user-edited environment. But because these environments are becoming more popular in a scholarly context, it is worthwhile revisiting what are in my view the two most important organisational issues facing scholarly intellectual commons:

  1. How do you get knowledgeable people to contribute their best work?
  2. How do you prevent abuse, vandalism, and nonsense from the well-meaning but incompetent?

For the rest of this paper, I’m going to address these problems in a fairly speculative way. There are some practical steps we can take right now to find solutions to them, but it is also worthwhile thinking about how they might be solved given enough time and technical expertise. Indeed in some ways, my goal is to contribute to a debate in much the same way one contributes to the Wikipedia: throw something out there and hope that somebody can improve on it.

Although these are crucial problems for intellectual commons, they are by no means unique to them. The question of how you get good quality work in and keep the bad out is also central to the operation of peer-reviewed journals or, indeed, any kind of organised communication.

These are crucial problems for an intellectual commons, however, because, in its purest state, a commons has no gatekeeper: the Wikipedia is the encyclopaedia that “_anybody_ can edit” (emphasis added). That is what makes it so exciting but also causes all the problems. Traditionally, scholarly journals and academic presses (organisations that rarely pay their authors) have addressed this problem with a combination of carrots and sticks: they encourage people to contribute by providing enough prestige to make it worth their while to submit well researched articles, and they keep the bad stuff out by getting disciplinary referees to review the submitted articles before they are printed.

A true intellectual commons lacks both a system for providing rewards and one for preventing folly. Perhaps for this reason, most academic commons rely on some kind of gatekeeper: you need to be approved by a moderator if you want to join a mailing list; you need to submit a CV if you want to be able to annotate an online edition; you need to have your login approved by a Sysop if you want to contribute to a scholarly wiki. Even beyond this, such projects also usually engage in editorial control: spammers are cut off, trolls and flamers are banned, and wiki or annotation contributions are reviewed for quality by some central person or group.

These approaches are effective on the whole at preventing or mitigating abuse by unqualified or out-of-control people. They do, however, suffer from two main problems:

  1. They scale very badly: while a gatekeeper or moderator can vet or edit contributions from a small number of people, this gets progressively more difficult as the community expands.
  2. They represent a compromise on the thing that makes commons different and exciting in the first place: the possibility for unnegotiated collaboration and exchange.

Scaling is probably not an issue for most academic projects. Digital Medievalist is a relatively big project now, for example, and it is only approaching 250 members. Numbers like this are relatively easy to control. The costs one would incur in trying to develop an automatic vetting system for a market this size would vastly outweigh any future benefit.

Other disciplines, however, have been faced with this scaling problem—and managed to find partial solutions that in my opinion do a better job of maintaining the unnegotiated quality that makes successful commons what they are.

One commonly proposed solution is to rely on distributed moderation—or, in simple terms, to allow the users to police themselves. This has the advantage of being completely scalable—the number of moderators increases with the number of users. As we saw in my experience with the Wikipedia, moreover, this system actually can work: many (perhaps most) errors on the Wikipedia are corrected after a while and unqualified or insincere contributors often do get reined in.

But of course my experience with the Wikipedia also shows the problem with this approach. If everybody can be a moderator, then the unqualified can be as well. They can, as a result, replace good work with bad as easily as others can replace bad work with good.

A solution to this problem is to allow moderation only by respected members of the community. This is the system at Slashdot.org, a news service for technological news. There, contributors acquire a reputation based on others’ opinions of their contributions; those with high reputation scores are then added to a pool from which moderators are drawn each week (the system is actually much more complex, but the details are not important here).

Systems such as this tend to suffer from complexity: Slashdot also has meta-moderation and nobody seems very happy with anybody else even then. Moreover, members have a tendency both to game the system in order to increase their own reputations and to lower those of their “enemies”.

At Digital Medievalist, we have been thinking of a slightly different model of distributed moderation, which we describe as an apprenticeship model: in this solution, newcomers are assigned relatively limited editorial, moderation, and compositional powers. These powers then increase as one’s contributions are allowed to stand by other members of the community. Initially, one might be allowed only to correct typos; as people accept your corrections, you are allowed greater editorial powers—perhaps you can rewrite entire sections or contribute new articles. If, however, your contributions begin to be rolled back, your powers shrink accordingly: the idea is ultimately a version of the Peter Principle: you rise to the point at which you are perceived to become incompetent. The main difference is that we then try to push you back down a step to the last place in the hierarchy in which you knew what you were doing.

This method would require considerable software design, and so, currently, is outside our ability. It would have the advantage over the Slashdot method, however, both of scoring ‘reputation’ on the basis of the audience’s real behaviour (reducing an ‘enemy’s’ score requires you to take the time to reverse his or her edits) and of keeping track of reputation not by points (which encourage people to be competitive) but by permissions. A points system encourages people to ask themselves how much they are worth; a permissions system encourages them to take on extra responsibility.

Moderation systems are essentially negative: they exist to prevent people from messing things up. As I noted earlier, however, commons also have the positive problem of trying to encourage good work: the most strictly refereed journal in the world, as the PMLA discovered a few years back, is no good if nobody submits articles to be vetted.

This is an area in which scholarly projects seem to have devoted less attention. While most projects with commons-type environments have explicit moderation policies, few if any I have seen have explicit reward policies. They tend to have gatekeepers but no paymasters. Perhaps as a result most also seem to be the work of a very small number of people—even in the case of organisations with large numbers of members.

Once again, solutions for this problem can be found in other disciplines. The Open Source software movement, for example, relies on high quality contributions from volunteers. Open Source groups often reward significant contributors by treating work on the project as a form of “sweat equity” that allows them special privileges: eligibility for membership on the executive or board, for example, or voting rights or even basic membership.

A second solution commonly used is to give significant contributors some kind of token that sets them apart from others. This can be as valuable as the right to administer or moderate others (Slashdot), or as minor as extra stars beside your user name in the forum (Ubuntu).

Both of these solutions can be adapted to the scholarly commons. At Digital Medievalist, we are currently putting together new bylaws that will treat contributions as a condition of membership: people who contribute to the wiki, mailing list, or journal, will be eligible to assume responsibility as board members or officers of the project (I suspect giving away extra stars for good contributions might not be as effective in the academic world as it seems to be in the Open Source one—though given how important psychologically such distinctions are, perhaps they would). A second possibility for reward—albeit one fraught with difficulties—might be to award named authorship of material on which a contributor has worked: either by naming people at the top of the article or adding them as contributors to a colophon on the project as a whole.

The two easiest solutions to this problem of reward, however, are probably those used by the Wikipedia to get me to revise my article on Cædmon rather than work on this talk: offer special status for particularly well done work, and design the project as a whole so that it is a natural outlet for work that otherwise might not be used. At the Digital Medievalist, we already run a peer-reviewed journal alongside our wiki-based commons. A project with a different focus might certify certain articles in some way: as “refereed” vs. “open forum”, perhaps, and identify the authors and major contributors. Our project, moreover, is set up to provide a forum in which users can publish material they might otherwise find hard to use in furthering their careers: solutions to technical problems in digital humanities such as the development of stylesheets and databases, that are not commonly published by the major disciplinary journals.

The intellectual commons represents a new, purely digital approach to publication and the dissemination of scholarship. It is a model that cannot be replicated in print, and it is a model that many scholars feel intuitively at least will become a major force in the future of scholarly communication. In order for it to realise its potential, however, we must first find an economic model that encourages us to contribute our best work and provides for some kind of quality control—without sacrificing the very spontaneity that defines this new means of communication.

So why should I write for your Wiki? Until we answer this question, the Wiki will not live up to its full scholarly potential.

1 Update: The entry ultimately gained feature status.

----  

O Captain! My Captain! Using Technology to Guide Readers Through an Electronic Edition

Posted: Dec 15, 2006 16:12;
Last Modified: May 23, 2012 20:05

---

Original Publication Information: Heroic Age 8 (2005). http://www.heroicage.org/issues/8/em.html.

O CAPTAIN! my Captain! our fearful trip is done;
The ship has weather’d every rack, the prize we sought is won;
The port is near, the bells I hear, the people all exulting,
While follow eyes the steady keel, the vessel grim and daring:
  But O heart! heart! heart!
    O the bleeding drops of red,
      Where on the deck my Captain lies,
        Fallen cold and dead.

Walt Whitman, Leaves of Grass

Digital vs. Print editions

§1. Most theoretical discussions of electronic editing attribute two main advantages to the digital medium over print: interactivity and the ability to transcend the physical limitations of the page.1 From a production standpoint, printed books are static, linearly organised, and physically limited. With a few expensive or unwieldy exceptions, their content is bound in a fixed, unchangeable order, and required to fit on standard-sized, two-dimensional pages. Readers cannot customise the physical order in which information is presented to them, and authors are restricted in the type of material they can reproduce to that which can be presented within the physical confines of the printed page.2

§2. Electronic editions, in contrast, offer readers and authors far greater flexibility. Content can be reorganised on demand in response to changing user needs through the use of links, search programs, and other utilities. The physical limitations of the screen can be overcome in part through the intelligent use of scrolling, dynamically generated content, frames, and other conventions of the electronic medium. The ability to organise and present non-textual material, indeed, has expanded the scope of the edition itself: it is becoming increasingly possible to edit physical objects and intellectual concepts as easily as literary or historical texts.

§3. Not surprisingly, this greater flexibility has encouraged electronic editors to experiment with the conventions of their genre. As McGann has argued, the traditional print-based critical edition is a machine of knowledge (McGann 1995). Its conventions developed over several centuries in response to a complex interplay of intellectual pressures imposed by the requirements of its subject and technical pressures imposed by requirements of its form:

Scholarly editions comprise the most fundamental tools in literary studies. Their development came in response to the complexity of literary works, especially those that had evolved through a long historical process (as one sees in the bible, Homer, the plays of Shakespeare). To deal with these works, scholars invented an array of ingenious machines: facsimile editions, critical editions, editions with elaborate notes and contextual materials for clarifying a work’s meaning. The limits of the book determined the development of the structural forms of these different mechanisms; those limits also necessitated the periodic recreation of new editions as relevant materials appeared or disappeared, or as new interests arose.

With the elimination of (many) traditional constraints faced by their print predecessors, electronic editors have been free to reconceive the intellectual organisation of their work. The ability to construct electronic documents dynamically and interactively has allowed editors to reflect contemporary doubts about the validity of the definitive critical text. Cheap digital photography and the ability to include sound and video clips has encouraged them to provide far more contextual information than was ever possible in print. With the judicious use of animation, virtual reality, and other digital effects, electronic editions are now able to recreate the experience of medieval textuality in ways impossible to imagine in traditional print editions.

Print Convention vs. Electronic Innovation

§4. The increased freedom enjoyed by electronic editors has brought with it increased responsibility. Because they work in a well established and highly standardised tradition, print-based editors are able to take most organisational aspects of their editions for granted. With some minor variation, print-based textual editions are highly predictable in the elements they contain, the order in which these elements are arranged, and the way in which they are laid out on the page (for examples and facsimiles of the major types, see Greetham 1994). In print editions, the textual introduction always appears before the critical text; the apparatus criticus always appears at the bottom of the page or after the main editorial text; glossaries, when they appear, are part of the back matter; contextual information about witnesses or the literary background to the text appears in the introduction. Publishers commonly require these elements to be laid out in a house style; beginning editors can look up the required elements in one of several standard studies (e.g. Greetham 1994, West 1973, Willis 1972).

§5. No such standardisation exists for the electronic editor (Robinson 2005).3 Few if any publishing houses have a strong house style for electronic texts, and, apart from a sense that electronic editions should include high quality colour images of all known witnesses, there are, as yet, few required elements. Electronic editions have been published over the last several years without textual introductions (Kiernan 1999), without critical texts (Solopova 2000), without a traditional textual apparatus (De Smedt and Vanhoutte 2000), and without glossaries (Adams et al. 2000).4 There are few standards for mise en page: some editions attempt to fit as much as possible into a single frameset (Slade 2002); others require users to navigate between different documents or browser tabs (Stolz 2003). Facsimiles can appear within the browser window or in specialised imaging software (cf. McGillivray 1997 vs. Adams et al. 2000): there are as yet few universally observed standards for image resolution, post-processing, or file formats. User interfaces differ, almost invariably, from edition to edition, even among texts issued by the same project or press (cf. Bordalejo 2003 vs. Solopova 2000). Where readers of print editions can expect different texts to operate in an approximately similar fashion, readers approaching new electronic texts for the first time cannot expect their text’s operation to agree with that of other editions they have consulted.5

Technology for Technology’s Sake?

§6. The danger this freedom brings is the temptation towards novelty for novelty’s sake. Freed largely from the constraints of pre-existing convention, electronic editors can be tempted towards technological innovations that detract from the scholarly usefulness of their projects.

Turning the Pages (British Library)

§7. Some innovations can be more annoying than harmful. The British Library Turning the Pages series, for example, allows readers to mimic the action of turning pages in a manuscript facsimile (http://www.bl.uk/collections/treasures/digitisation4.html). When users click on the top or bottom corner of the manuscript page and drag the cursor to the opposite side of the book, they are presented with an animation showing the page being turned over. If they release the mouse button before the page has been pulled approximately 40% of the way across the visible page spread, virtual gravity takes over and the page falls back into its original position.

§8. This is an amusing animation, and well suited to its intended purpose as an interactive program that allows museums and libraries to give members of the public access to precious books while keeping the originals safely under glass ( http://www.armadillosystems.com/ttp_commercial/home.htm). Scholars interested in the texts as research objects, however, are likely to find the system less attractive. The page-turning system uses an immense amount of memory—the British Library estimates up to 1 GB of RAM for high quality images ( http://www.armadillosystems.com/ttp_commercial/techspec.htm)—and the requirement that users drag pages across the screen makes paging through an edition a time- and attention-consuming activity: having performed an action that indicates that they wish an event to occur (clicking on the page in question), users are then required to perform additional complex actions (holding the mouse button down while dragging the page across the screen) in order to effect the desired result. What was initially amusing rapidly becomes a major and unnecessary irritation.

A Wheel of Memory: The Hereford Mappamundi (Reed Kline 2001)

§9. Other innovations can be more harmful to the intellectual usefulness of a given project. A Wheel of Memory: The Hereford Mappamundi uses the Mappamundi as “a conceit for the exploration of the medieval collective memory… using our own collective rota of knowledge, the CD-ROM” (Reed Kline 2001, I audio). The edition has extremely high production values. It contains original music and professional narration. Images from the map6 and associated documents are displayed in a custom-designed viewing area that is itself in part a rota. Editorial material is arranged as a series of chapters and thematically organised explorations of different medieval Worlds: World of the Animals, World of the Strange Races, World of Alexander the Great, etc. With the exception of four numbered chapters, the edition makes heavy use of the possibilities for non-linear browsing inherent in the digital medium to organise its more than 1000 text and image files.

§10. In this case, however, the project’s innovative organisation and high production values are ultimately self-defeating. [Image: whole-map view of the Hereford Mappamundi (herefordwholemap.png)] Despite its heavy reliance on a non-linear structural conceit, the edition itself is next to impossible to use or navigate in ways not anticipated by the project designers. Text and narration are keyed to specific elements of the map and edition and vanish if the user strays from the relevant hotspot: because of this close integration of text and image, it is impossible to compare text written about one area of the map with a facsimile of another. The facsimile of the map itself is also very difficult to study. The customised viewing area is of a fixed size (I estimate approximately 615×460 pixels) with more than half this surface given over to background and navigation: when the user chooses to view the whole map on screen, the 4 foot wide original is reproduced with a diameter of less than 350 pixels (approximately 1/10 actual size). Even then, it remains impossible to display the map in its entirety: in keeping with the project’s rota conceit, the facsimile viewing area is circular, even though the Hereford map itself is pentagonal: try as I might, I was unable ever to get a clear view of the border and image in the facsimile’s top corner.

Using Technology to Transcend Print

§11. The problem with the British Library and Hereford editions is not that they use innovative technology to produce unconventional editions. Rather, it is that they use this innovative technology primarily for effect rather than as a means of contributing something essential to the presentation of the underlying artifact. In both cases this results in editions that are superficially attractive, but unsuited to repeated use or serious study.7 The British Library facsimiles lose nothing if the user turns off the “Turning the Pages” technology (indeed, in the on-line version, an accessibility option allows users precisely this possibility); leaving the technology on comes at the cost of usability and memory. In the case of the Hereford Mappamundi, the emphasis on the rota navigational conceit and the project’s high production values get in the way of the map itself: the use of the round viewing area and fixed-width browser actually prevents the user from exploring the entire map, while the close integration of text, narration, and images ironically binds readers more closely to the editor’s view of her material than would be possible in a print edition.

Bayeux Tapestry (Foys 2003)

§12. Appropriately used, innovative technology can create editions that transcend the possibilities of print, however. This can be seen in the third edition discussed in this paper, The Bayeux Tapestry: Digital Edition.8

§13. On the one hand, the Bayeux Tapestry edition uses technology in ways that, at first glance, seem very similar to the Hereford Mappamundi and British Library facsimiles. Like the Mappamundi project, the Bayeux edition has very high production values and is presented using a custom-designed user interface (indeed, the Hereford and Bayeux projects both use the same Macromedia presentation software). Like the British Library facsimiles, the Bayeux project uses technology to imitate the physical act of consulting the medieval artifact: users of the Bayeux Tapestry edition, like visitors to the Bayeux Tapestry itself, move along what appears to be a seamless presentation of the entire 68 metre long object.

§14. The difference between The Bayeux Tapestry: Digital Edition and the other two projects, however, is that in the Bayeux edition this technology plays an essential role in the representation of the underlying object. I am aware of no medieval manuscript that incorporates the act of turning the page into its artistic design; the Bayeux tapestry, however, was designed to be viewed as a single continuous document. By integrating hundreds of digital images into what behaves like a single facsimile, the Bayeux project allows users to consult the tapestry as its makers originally intended: moving fluidly from scene to scene and pausing every so often to examine individual panels or figures in greater detail.

§15. The organisation of the Bayeux edition is similarly well thought out. In contrast to the Hereford Mappamundi project, the Bayeux project is constructed around the object it reproduces. The opening screen shows a section from the facsimile (few screens would be able to accommodate the entire facsimile in reasonable detail) above a plot-line that provides an overview of the Tapestry’s entire contents in a single screen. Users can navigate the Tapestry scene-by-scene using arrow buttons at the bottom left of the browser window, centimetre by centimetre using a slider on the plot-line, or by jumping directly to an arbitrary point on the tapestry by clicking on the plot-line at the desired location. Tools, background information, other facsimiles of the tapestry, scene synopses, and notes are accessed through buttons at the bottom left corner of the browser. The first three types of material are presented in a separate window when chosen; the last two appear under the edition’s plot-line. Where the organisational conceit of the rota prevented users from accessing the entire Hereford map, the structure of the Bayeux edition encourages users to explore the entire length of the Tapestry.

§16. The Bayeux project also does its best to avoid imposing a particular structure on its editorial content. Where the Hereford project proved extremely difficult to navigate in ways not anticipated by its editor, The Bayeux Tapestry contains a slideshow utility that allows users to reorder elements of the edition to suit their own needs. While few readers perhaps will need to use this in their own study, the utility will prove of the greatest benefit to teachers and lecturers who wish to use the edition to illustrate their own work.

Conclusion

§17. The interactivity, flexibility, and sheer novelty of digital media bring with them great challenges for the electronic editor. Where scholars working in print can rely on centuries of precedent in designing their editions, those working in digital media still operate for the most part in the absence of any clear consensus as to even the most basic expectations of the genre. This technological freedom can, on the one hand, be extremely liberating: electronic editors can now produce editions of a much wider range of texts, artifacts, and concepts than was ever possible in print. At the same time, however, this freedom can also lead to the temptation of using technology for its own sake.

§18. The three projects discussed in this column have been produced by careful and technologically sophisticated researchers. The differences among them lie for the most part in the way they match their technological innovation to the needs of the objects they reproduce. The British Library and Hereford Mappamundi projects both suffer from an emphasis on the use of advanced technology for largely decorative purposes; both would be easier to use without much of their most superficially attractive technological features. The Bayeux Tapestry project, on the other hand, succeeds as an electronic text because it uses advanced technology that is well suited to its underlying object and allows it to be presented in a fashion difficult, if not impossible, in any other medium. Users of the British Library and Hereford facsimiles may find themselves wishing for a simpler presentation; few users of the Bayeux tapestry would wish that this edition had been published in book form.

Notes

1 This is a commonplace. For an influential discussion, see McGann 1995. Strictly speaking, print and digital/electronic in this discussion refer to accidentals of display rather than essential features of composition and storage. Texts composed and stored digitally can be displayed in print format, in which case they are subject to the same limitations as texts composed and stored entirely on paper. The importance of this distinction between composition and display is commonly missed in early theoretical discussions, which tend to concentrate exclusively on possibilities for on-screen display. In fact, as recent commercial and scholarly applications of xml are demonstrating, the real advantage of electronic composition and storage is reusability. Properly designed electronic texts can be published simultaneously in a number of different formats, allowing users to take advantage of the peculiar strengths and weaknesses of each. In my view, the most profound difference between electronic and print texts lies in the separation of content and presentation which makes this reuse of electronic texts possible.

2 It is easy to overemphasise the limitations of print and the flexibility of digital display. While books are for the most part physically static and two-dimensional (the main exceptions are books published as loose pages intended for storage in binders and picture books with three-dimensional figures), they are intellectually flexible: readers are free to reorganise them virtually by paging back and forth or using a table of contents or index to find and extract relevant information. In certain cases, e.g. dictionaries and encyclopedias, this intellectual flexibility is an essential feature of the genre. Not surprisingly, these genres were also among the first and most successful titles to be published in electronic format. Screens, for all their flexibility and interactivity, remain two-dimensional display devices subject to many of the same limitations as the printed page.

3 Robinson’s important article came to my attention after this column was in proof.

4 The observation that these examples are missing one or more traditional elements of a print edition is not intended as a criticism. Not all editions need all traditional parts, and, in several cases, the editorial approach used explicitly precludes the inclusion of the missing element. What the observation does demonstrate, however, is that no strong consensus exists as to what must appear in an electronic critical edition. The only thing common to all is the presence of facsimiles.

5 One possible objection to the above list of examples is that I am mixing editions produced using very different technologies over the greater part of a decade (a long time in humanities computing). This technical fluidity is one of the reasons for the lack of consensus among electronic editors, however. Since in most cases, moreover, the technology has aged before the editorial content (eight years is a relatively short time in medieval textual studies), the comparison is also valid from a user’s perspective: as a medievalist, I am as likely to want to consult the first disc in the Canterbury Tales Project as I am the most recent.

6 As is noted in the introduction to the edition, the facsimile reproduces a nineteenth-century copy of the Hereford map rather than the medieval Mappamundi itself. The images in the Bayeux disc discussed below are similarly based on facsimiles—albeit in this case photographs of the original tapestry.

7 This is less of a problem in the case of the British Library series, which presents itself primarily as an aid for the exhibition of manuscripts to the general public rather than a serious tool for professional scholars. The intended audience of the Mappamundi project is less certain: it is sold by a university press and seems to address itself to both scholars and students; much of its content, however, seems aimed at the high school level. The design flaws in both texts seem likely to discourage repeated use by scholars and members of the general public alike.

8 In the interests of full disclosure, readers should be aware that I am currently associated with Foys in several on-going projects. These projects began after the publication of Foys 2003, with which I am not associated in any way.

Works Cited

----  

The Doomsday Machine, or, "If you build it, will they still come ten years from now?": What Medievalists working in digital media can do to ensure the longevity of their research

Posted: Dec 15, 2006 13:12;
Last Modified: May 23, 2012 20:05

---

Original Publication Information: Heroic Age 7 (2004). http://www.heroicage.org/issues/7/ecolumn.html.

Yes, but the… whole point of the doomsday machine… is lost… if you keep it a secret!

Dr. Strangelove

It is, perhaps, the first urban myth of humanities computing: the Case of the Unreadable Doomsday Machine. In 1986, in celebration of the 900th anniversary of William the Conqueror’s original survey of his British territories, the British Broadcasting Corporation (BBC) commissioned a mammoth £2.5 million electronic successor to the Domesday Book. Stored on two 12 inch video laser discs and containing thousands of photographs, maps, texts, and moving images, the Domesday Project was intended to provide a high-tech picture of life in late 20th century Great Britain. The project’s content was reproduced in an innovative early virtual reality environment and engineered using some of the most advanced technology of its day, including specially designed computers, software, and laser disc readers (Finney 1986).

Despite its technical sophistication, however, the Domesday Project was a flop by almost any practical measure. The discs and specialized readers required for accessing the project’s content turned out to be too expensive for the state-funded schools and public libraries that comprised its intended market. The technology used in its production and presentation also never caught on outside the British government and school system: few other groups attempted to emulate the Domesday Project’s approach to collecting and preserving digital material, and no significant market emerged for the specialized computers and hardware necessary for its display (Finney 1986, McKie and Thorpe 2003). In the end, few of the more than one million people who contributed to the project were ever able to see the results of their effort.

The final indignity, however, came in March 2003 when, in a widely circulated story, the British newspaper The Observer reported that the discs had finally become “unreadable”:

16 years after it was created, the £2.5 million BBC Domesday Project has achieved an unexpected and unwelcome status: it is now unreadable.

The special computers developed to play the 12in video discs of text, photographs, maps and archive footage of British life are — quite simply — obsolete.

As a result, no one can access the reams of project information — equivalent to several sets of encyclopedias — that were assembled about the state of the nation in 1986. By contrast, the original Domesday Book — an inventory of eleventh-century England compiled in 1086 by Norman monks — is in fine condition in the Public Record Office, Kew, and can be accessed by anyone who can read and has the right credentials. ‘It is ironic, but the 15-year-old version is unreadable, while the ancient one is still perfectly usable,’ said computer expert Paul Wheatley. ‘We’re lucky Shakespeare didn’t write on an old PC.’ (McKie and Thorpe 2003)

In fact, the situation was not as dire as McKie and Thorpe suggest. For one thing, the project was never actually “unreadable,” only difficult to access: relatively clean copies of the original laser discs still survive, as do a few working examples of the original computer system and disc reader (Garfinkel 2003). For another, the project appears not to depend, ultimately, on the survival of its obsolete hardware. Less than ten months after the publication of the original story in The Observer, indeed, engineers at Camileon, a joint project of the Universities of Leeds and Michigan, were able to reproduce most if not quite all the material preserved on the original 12 inch discs using contemporary computer hardware and software (Camileon 2003a; Garfinkel 2003).

The Domesday Project’s recent history has some valuable, if still contested, lessons for librarians, archivists, and computer scientists (see for example the discussion thread to Garfinkel 2003; also Camileon 2003b). On the one hand, the fact that engineers seem to be on the verge of designing software that will allow for the complete recovery of the project’s original content and environment is encouraging. While it may not yet have proven itself to be as robust as King William’s original survey, the electronic Domesday Project now at least does appear to have been saved for the foreseeable future – even if “foreseeable” in this case may mean simply until the hardware and software supporting the current emulator itself becomes obsolete.

On the other hand, however, it cannot be comforting to realise that the Domesday Project required the adoption of such extensive and expensive restoration measures in the first place less than two decades after its original composition: the discs that the engineers at Camileon have devoted the last ten months to recovering have turned out to have less than 2% of the readable lifespan enjoyed by their eleventh-century predecessor. Even pulp novels and newspapers published on acidic paper at the beginning of the last century have proved more durable under similarly controlled conditions.1 While, viewed in the short term, digital formats do appear to offer a cheap method of preserving, cataloguing, and especially distributing copies of texts and other cultural material, their effectiveness and economic value as a means of long-term preservation have yet to be demonstrated completely.

These are, for the most part, issues for librarians, archivists, curators, computer scientists, and their associations: their solution will almost certainly demand resources, a level of technical knowledge, and perhaps most importantly, a degree of international cooperation far beyond that available to most individual humanities scholars (Keene 2003). In as much as they are responsible for the production of an increasing number of electronic texts and resources, however, humanities scholars do have an interest in ensuring that the physical record of their intellectual labour will outlast their careers. Fortunately there are also some specific lessons to be learned from the Domesday Project that are of immediate use to individual scholars in their day-to-day research and publication.

1. Do not write for specific hardware or software.

Many of the preservation problems facing the Domesday Project stem from its heavy reliance on specific proprietary (and often customized) hardware and software. This reliance came about for largely historical reasons. The Domesday Project team was working on a multimedia project of unprecedented scope, before the Internet developed as a significant medium for the dissemination of data.2 In the absence of suitable commercial software and any real industry emphasis on inter-platform compatibility or international standards, they were forced to custom-build or commission most of their own hardware and software. The project was designed to be played from a specially-designed Philips video-disc player and displayed using custom-built software that functioned best on a single operating platform: the BBC Master, a now obsolete computer system which, with the related BBC Model B, was at the time far more popular in schools and libraries in the United Kingdom than the competing Macintosh, IBM PC, or long forgotten Acorn systems.3

With the rise of the internet and the development of well-defined international standard languages such as Standard Generalized Markup Language (SGML), HyperText Markup Language (HTML), eXtensible Markup Language (XML), and Hypermedia/Time-based Structuring Language (HyTime), few contemporary or future digital projects are likely to be as completely committed to a single specific hardware or software system as the Domesday Project. This does not mean, however, that the temptation to write for specific hardware or software has vanished entirely. Different operating systems allow designers to use different, often incompatible, shortcuts for processes such as referring to colour, assigning fonts, or referencing foreign characters (even something as simple as the Old English character thorn can be referred to in incompatible ways on Windows and Macintosh computers). The major internet browsers also all have proprietary extensions and idiosyncratic ways of understanding supposedly standard features of the major internet languages. It is very easy to fall into the trap of adapting one’s encoding to fit the possibilities offered by non-standard extensions, languages, and features of a specific piece of hardware or software.
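
To make the thorn example concrete, here is a minimal, hypothetical sketch (not taken from any particular project): in a standards-based HTML or XML document the character is recorded as a Unicode code point or a numeric character reference, which any conforming browser or parser interprets identically, whatever the underlying operating system.

    <!-- Old English thorn (U+00FE) encoded in platform-independent ways. -->
    <!-- Both lines display identically in any standards-conforming browser. -->
    <p>&#xFE;&#xE6;t w&#xE6;s god cyning</p>   <!-- numeric character references -->
    <p>þæt wæs god cyning</p>                  <!-- literal UTF-8 characters -->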

The very real dangers of obsolescence this carries with it can be demonstrated by the history of the Netscape 〈layer〉 and 〈ilayer〉 tags. Introduced with the Netscape 4.0 browser in early 1997, the 〈layer〉 and 〈ilayer〉 tags were proprietary extensions of HTML that allowed internet designers to position different parts of their documents independently of one another on the screen: to superimpose one piece of a text over another, to place text over (or under) images, or to remove one section of a line from the main textual flow and place it elsewhere (Netscape Communications Corporation 1997). The possibilities this extension opened up were exciting. In addition to enlivening otherwise boring pages with fancy typographic effects, the 〈layer〉 and 〈ilayer〉 elements also allowed web designers to create implicit intellectual associations among otherwise disparate elements in a single document. For example, one could use these tags to create type facsimiles of manuscript abbreviations by superimposing their component parts or create annotated facsimile editions by placing textual notes or transcriptions over relevant manuscript images.

As with the Domesday Project, however, projects that relied on these proprietary extensions for anything other than the most incidental effects were doomed to early obsolescence: the 〈layer〉 and 〈ilayer〉 tags were never adopted by the other major browsers and, indeed, were dropped by Netscape itself in subsequent editions of its Navigator browser. Thus an annotated manuscript facsimile coded in mid 1997 to take advantage of the new Netscape 4.0 〈layer〉 and 〈ilayer〉 tags would, with the release of Netscape 5.0 at the end of 1999, already be obsolete. Users who wished to maintain the presumably intellectually significant implicit association between the designer’s notes and images in this hypothetical case would need either to maintain (or recreate) a working older version of the Netscape browser on their system (an increasingly difficult task as operating systems themselves go through subsequent alterations and improvements) or to convert the underlying files to a standard encoding.
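
By way of contrast, the same layering effect can be achieved today with standards-based CSS positioning, which all current browsers interpret in essentially the same way. The following is a hypothetical sketch (the image file name and overlay text are invented); in a real project the positioning rules would normally live in a separate stylesheet.

    <!-- Standards-based equivalent of the proprietary layering effect: CSS absolute -->
    <!-- positioning places a transcription over a manuscript facsimile image. -->
    <div style="position: relative;">
      <img src="folio-1r.jpg" alt="Manuscript facsimile, fol. 1r"/>
      <p style="position: absolute; top: 40px; left: 60px; margin: 0;">
        transcription of the passage placed over the image
      </p>
    </div>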

2. Maintain a distinction between content and presentation

A second factor promoting the early obsolescence of the Domesday Project was its emphasis on the close integration of content and presentation. The project was conceived of as a multimedia experience and its various components (text, video, maps, statistical information) often acquired meaning from their interaction, juxtaposition, sequencing, and superimposition (Finney 1986, “Using Domesday”; see also Camileon 2003b). In order to preserve the project as a coherent whole, indeed, engineers at Camileon have had to reproduce not only the project’s content but also the look and feel of the specific software environment in which it was intended to be searched and navigated (Camileon 2003b).

Here too the Domesday Project designers were largely victims of history. Their project was a pioneering experiment in multimedia organisation and presentation and put together in the virtual absence of now standard international languages for the design and dissemination of electronic documents and multimedia projects — many of which, indeed, were in their initial stages of development at the time the BBC project went to press.4

More importantly, however, these nascent international standards involved a break with the model of electronic document design and dissemination employed by the Domesday Project designers. Where the Domesday Project might be described as an information machine — a work in which content and presentation are so closely intertwined as to become a single entity — the new standards concentrated on establishing a theoretical separation between content and presentation (see Connolly 1994 for a useful discussion of the distinction between “programmable” and “static” document formats and their implications for document conversion and exchange). This allows both aspects of an electronic document to be described separately and, for the most part, in quite abstract terms which are then left open to interpretation by users in response to their specific needs and resources. It is this flexibility which helped in the initial popularization of the World Wide Web: document designers could present their material in a single standard format and, in contrast to the designers of the Domesday Project, be relatively certain that their work would remain accessible to users accessing it with various software and hardware systems — whether this was the latest version of the new Mosaic browser or some other, slightly older and non-graphical interface like Lynx (see Berners-Lee 1989-1990 for an early summary of the advantages of multi-platform support and a comparison with early multi-media models such as that adopted by the Domesday Project). In recent years, this same flexibility has allowed project designers to accommodate the increasingly large demand for access to internet documents from users of (often very advanced) non-traditional devices: web-activated mobile phones, palm-sized digital assistants, and of course aural screen readers and Braille printers.

In theory, this flexibility also means that where engineers responsible for restoring the Domesday Project have been forced to emulate the original software in order to recreate the BBC designer’s work, future archivists will be able to restore current, standards-based, electronic projects by interpreting the accompanying description of their presentation in a way appropriate to their own contemporary technology. In some cases, indeed, this restoration may not even require the development of any actual computer software: a simple HTML document, properly encoded according to the strictest international standards, should in most cases be understandable to the naked eye even when read from a paper printout or text-only display.
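
A minimal, hypothetical illustration of the principle (the file names are invented): the document below records structure only, while all presentational decisions are delegated to external stylesheets that a desktop browser, a printer, or an aural reader may apply, replace, or ignore. Stripped of the stylesheets altogether, the content remains legible to the naked eye.

    <!-- Content: structure only. The stylesheets named below contain nothing but -->
    <!-- presentation rules and can be rewritten without touching this document. -->
    <link rel="stylesheet" href="screen.css" media="screen"/>
    <link rel="stylesheet" href="print.css" media="print"/>
    <link rel="stylesheet" href="aural.css" media="aural"/>
    <h1>Sample Edition</h1>
    <p>The text of the edition itself, marked up by function rather than appearance.</p>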

In practice, however, it is still easy to fall into the trap of integrating content and presentation. One common example involves the use of table elements for positioning unrelated or sequential text in parallel “columns” on browser screens (see Chisholm, Vanderheiden, et al. 2000, § 5). From a structural point of view, tables are a device for indicating relations among disparate pieces of information (mileage between various cities, postage prices for different sizes and classes of mail, etc.). Using tables to position columns, document designers imply in formal terms the existence of a logical association between bits of text found in the same row or column — even if the actual rationale for this association is primarily aesthetic. While the layout technique, which depends on the fact that all current graphic-enabled browsers display tables by default in approximately the same fashion, works well on desktop computers, the same trick can produce nonsensical text when rendered on the small screen of a mobile phone, printed by a Braille output device, or read aloud by an aural browser or screen-reader. Just as importantly, this technique too can lead to early obsolescence or other significant problems for future users. Designers of a linguistic corpus based on specific types of pre-existing electronic documents, for example, might be required to devote considerable manual effort to recognising and repairing content arbitrarily and improperly arranged in tabular format for aesthetic reasons.
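
A small invented example makes the danger visible. In the first fragment a navigation menu and the edition text are placed in a single table row purely to produce visual columns; linearised by an aural browser or harvested by a corpus tool, the two unrelated blocks run together. The second fragment records the same material structurally and leaves any column arrangement to a stylesheet.

    <!-- Presentational misuse of a table: the only relation between the cells is visual. -->
    <table>
      <tr>
        <td><a href="intro.html">Introduction</a> <a href="text.html">Text</a></td>
        <td><p>Nu sculon herigean heofonrices weard ...</p></td>
      </tr>
    </table>

    <!-- Structural alternative: each block is labelled for what it is; a stylesheet -->
    <!-- (or a future processor) decides whether to display them side by side. -->
    <div class="navigation"><a href="intro.html">Introduction</a> <a href="text.html">Text</a></div>
    <div class="edition-text"><p>Nu sculon herigean heofonrices weard ...</p></div>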

3. Avoid unnecessary technical innovation

A final lesson to be learned from the early obsolescence of the Domesday Project involves the hidden costs of technical innovation. As a pioneering electronic document, the Domesday Project was in many ways an experiment in multimedia production, publication, and preservation. In the absence of obvious predecessors, its designers were forced to develop their own technology, organisational outlines, navigation techniques, and distribution plans (see Finney 1986 and Camileon 2003a for detailed descriptions). The fact that relatively few other projects adopted their proposed solutions to these problems — and that subsequent developments in the field led to a different focus in electronic document design and dissemination — only increased the speed of the project’s obsolescence and the cost and difficulty of its restoration and recovery.

Given the experimental status of this specific project, these were acceptable costs. The Domesday Project was never really intended as a true reference work in any usual sense of the word.5 Although it is full of information about mid-1980s Great Britain, for example, the project has never proved to be an indispensable resource for study of the period. While it was inspired by William the Conqueror’s great inventory of post-conquest Britain, the Domesday Project was, in the end, more an experiment in new media design than an attempt at collecting useful information for the operation of Mrs. Thatcher’s government.

We are now long past the day in which electronic projects can be considered interesting simply because they are electronic. Whether they are accessing a Z39.50 compliant library catalogue, consulting an electronic journal on JSTOR, or accessing an electronic text edition or manuscript facsimile published by an academic press, users of contemporary electronic projects by-and-large are now more likely to be interested in the quality and range of an electronic text’s intellectual content than the novelty of its display, organisation or technological features (Nielsen 2000). The tools, techniques, and languages available to producers of electronic projects, likewise, are now far more standardised and helpful than those available to those responsible for electronic incunabula such as the Domesday Project.

Unfortunately this does not mean that contemporary designers are entirely free of the dangers posed by technological experimentation. The exponential growth of the internet, the increasing emphasis on compliance with international standards, and the simple pace of technological change over the last decade all pose significant challenges to the small budgets and staff of many humanities computing projects. While large projects and well-funded universities can sometimes afford to hire specialized personnel to follow developments in computing design and implementation, freeing other specialists to work on content development, scholars working on digital projects in smaller groups, at less well-funded universities, or on their own often find themselves responsible for both the technological and intellectual components of their work. Anecdotal evidence suggests that such researchers find keeping up with the pace of technological change relatively difficult — particularly when it comes to discovering and implementing standard solutions to common technological problems (Baker, Foys, et al. 2003). If the designers of the Domesday Project courted early obsolescence because their pioneering status forced them to design unique technological solutions to previously unresolved problems, many contemporary humanities projects appear to run the same risk of obsolescence and incompatibility because their inability to easily discover and implement best practice encourages them to continuously invent new solutions to already solved problems (HATII and NINCH 2002, NINCH 2002-2003, Healey 2003, Baker, Foys, et al. 2003 and O’Donnell 2003).

This area of humanities computing has been perhaps the least well served by the developments of the last two decades. While technological changes and the development of well-designed international standards have increased opportunities for contemporary designers to avoid the problems which led to the Domesday Project’s early obsolescence, the absence of a robust system for sharing technological know-how among members of the relevant community has remained a significant impediment to the production of durable, standards-based projects. Fortunately, however, improvements are being made in this area as well. While mailing lists such as humanist-l and tei-l have long facilitated the exchange of information on aspects of electronic project design and implementation, several new initiatives have appeared over the last few years which are more directly aimed at encouraging humanities computing specialists to share their expertise and marshal their common interests. The Text Encoding Initiative (TEI) has recently established a number of Special Interest Groups (SIGs) aimed at establishing community practice in response to specific types of textual encoding problems. Since 1993, the National Initiative for a Networked Cultural Heritage (NINCH) has provided a forum for collaboration and development of best practice among directors and officers of major humanities computing projects. The recently established TAPoR project in Canada and the Arts and Humanities Data Service (AHDS) in the United Kingdom likewise seek to serve as national clearing houses for humanities computing education and tools. Finally, and aimed more specifically at medievalists, the Digital Medievalist Project (of which I am currently director) is seeking funding to establish a “Community of Practice” for medievalists engaged in the production of digital resources, through which individual scholars and projects will be able to pool skills and practice acquired in the course of their research (see Baker, Foys, et al. 2003). Although we are still in the beginning stages, there is increasing evidence that humanities computing specialists are beginning to recognise the extent to which the discovery of standardised implementations and solutions to common technological problems is likely to provide as significant a boost to the durability of electronic resources as the development of standardised languages and client-side user agents in the late 1980s and early 1990s. We can only benefit from increased cooperation.

The Case of the Unreadable Doomsday Machine makes for good newspaper copy: it pits new technology against old in an information-age version of nineteenth-century races between the horse and the locomotive. Moreover, there is an undeniable irony to be found in the fact that King William’s eleventh-century parchment survey has thus far proven itself to be more durable than the BBC’s 1980s computer program.

But the difficulties faced by the Domesday Project and its conservators are neither necessarily intrinsic to the electronic medium nor necessarily evidence that scholars at work on digital humanities projects have backed the wrong horse in the information race. Many of the problems which led to the Domesday Project’s early obsolescence and expensive restoration can be traced to its experimental nature and the innovative position it occupies in the history of humanities computing. By paying close attention to its example, by learning from its mistakes, and by recognising the ways in which contemporary humanities computing projects differ fundamentally from such digital incunabula, scholars can contribute greatly to the likelihood that their current projects will remain accessible long after their authors reach retirement age.

Notes

1 See the controversy between Baker 2002 and [Association of Research Libraries] 2001, both of whom agree that even very acidic newsprint can survive “several decades” in carefully controlled environments.

2 The first internet browser, “WorldWideWeb,” was finished by Tim Berners-Lee at CERN (Conseil Européen pour la Recherche Nucléaire) on Christmas Day 1990. The first popular consumer browser able to operate on personal computer systems was the National Center for Supercomputing Applications (NCSA) Mosaic (a precursor to Netscape), which appeared in 1993. See [livinginternet.com] 2003 and Cailliau 1995 for brief histories of the early browser systems. The first internet application, e-mail, was developed in the early 1970s ([www.almark.net] 2003); until the 1990s, its use was restricted largely to university researchers and the U.S. military.

3 Camileon 2003; See McMordie 2003 for a history of the Acorn platform.
4 SGML, the language from which HTML is derived, was developed in the late 1970s and early 1980s but not widely used until the mid-to-late 1980s ([SGML Users’ Group] 1990). HyTime, a multimedia standard, was approved in 1991 ([SGML SIGhyper] 1994).
5 This is the implication of Finney 1986, who stresses the project’s technically innovative nature, rather than its practical usefulness, throughout.

Reference List

----  

Disciplinary impact and technological obsolescence in digital medieval studies

Posted: Dec 15, 2006 13:12;
Last Modified: May 23, 2012 20:05

---

Daniel Paul O’Donnell
University of Lethbridge

Forthcoming in The Blackwell Companion to the Digital Humanities, ed. Susan Schreibman and Ray Siemens. 2007.

Final Draft. Do not quote without permission of the author.

In May 2004, I attended a lecture by Elizabeth Solopova at a workshop at the University of Calgary on the past and present of digital editions of medieval works1. The lecture looked at various approaches to the digitisation of medieval literary texts and discussed a representative sample of the most significant digital editions of English medieval works then available: the Wife of Bath’s Prologue from the Canterbury Tales Project (Robinson and Blake 1996), Murray McGillivray’s Book of the Duchess (McGillivray 1997), Kevin Kiernan’s Electronic Beowulf (Kiernan 1999), and the first volume of the Piers Plowman Electronic Archive (Adams et al. 2000). Solopova herself is an experienced digital scholar and the editions she was discussing had been produced by several of the most prominent digital editors then active. The result was a master class in humanities computing: an in-depth look at mark-up, imaging, navigation and interface design, and editorial practice in four exemplary editions.

From my perspective in the audience, however, I was struck by two unintended lessons. The first was how easily digital editions can age: all of the CD-ROMs Solopova showed looked quite old fashioned to my 2004 eyes in the details of their presentation and organisation and only two, Kiernan’s Beowulf and McGillivray’s Book of the Duchess, loaded and displayed on the overhead screen with no difficulties or disabled features.

For the purposes of Solopova’s lecture these failures were not very serious: a few missing characters and a slightly gimpy display did not affect her discussion of the editions’ inner workings and indeed partially illustrated her point concerning the need to plan for inevitable technological obsolescence and change at all stages of edition design. For end-users consulting these editions in their studies or at a library, however, the problems might prove more significant: while well-designed and standards-based editions such as these can be updated in order to accommodate technological change, doing so requires skills that are beyond the technological capabilities of most humanities scholars; making the necessary changes almost certainly requires some post-publication investment on the part of the publisher and/or the original editors. Until such effort is made, the thought and care devoted by the original team to editorial organisation and the representation of textual detail presumably is being lost to subsequent generations of end users.

The second lesson I learned was that durability was not necessarily a function of age or technological sophistication. The editions that worked more-or-less as intended were from the middle of the group chronologically and employed less sophisticated technology than the two that had aged less well: they were encoded in relatively straightforward HTML (although Kiernan’s edition makes sophisticated use of Java and SGML for searching) and rendered using common commercial web browsers. The projects that functioned less successfully were encoded in SGML and were packaged with sophisticated custom fonts and specialised rendering technology: the Multidoc SGML browser in the case of the Piers Plowman Electronic Archive and the Dynatext display environment in the case of the Canterbury Tales Project. Both environments were extremely advanced for their day and allowed users to manipulate text in ways otherwise largely impossible before the development and widespread adoption of XML and XSL-enabled browsers.

Neither of these lessons seems very encouraging at first glance to medievalists engaged in or investigating the possibilities of using digital media for new projects. Like researchers in many humanities disciplines, medievalists tend to measure scholarly currency in terms of decades, not years or months. The standard study of the Old English poem Cædmon’s Hymn before my recent edition of the poem (O’Donnell 2005a) was published nearly 70 years ago. Reference works like Cappelli’s Dizionario di abbreviature latine ed italiane (first edition, 1899) or Ker’s Catalogue of manuscripts containing Anglo-Saxon (first edition, 1959) also commonly have venerable histories. In the case of the digital editions discussed above—especially those already showing evidence of technological obsolescence—it is an open question whether the scholarship they contain will be able to exert nearly the same long-term influence on their primary disciplines. Indeed, there is already some evidence that technological or rhetorical problems may be hindering the dissemination of at least some of these otherwise exemplary projects’ more important findings. Robinson, for example, reports that significant manuscript work by Daniel Mosser appearing in various editions of the Canterbury Tales Project is cited far less often than the importance of its findings warrants (Robinson 2005: § 11).

The lesson one should not draw from these and other pioneering digital editions, however, is that digital projects are inevitably doomed to early irrelevance and undeserved lack of disciplinary impact. The history of digital medieval scholarship extends back almost six decades to the beginnings of the Index Thomisticus by Roberto Busa in the mid 1940s (see Fraser 1998 for a brief history). Despite fundamental changes in focus, tools, and methods, projects completed during this time show enough variety to allow us to draw positive as well as negative lessons for future work. Some digital projects, such as the now more than thirty-year-old Dictionary of Old English (DOE), have proven themselves able to adapt to changing technology and have had an impact on their disciplines—and longevity—as great as the best scholarship developed and disseminated in print. Projects which have proven less able to avoid technological obsolescence have nevertheless also often had a great effect on our understanding of our disciplines, and, in the problems they have encountered, can also offer us some cautionary lessons (see Keene n.d. for a useful primer in conservation issues and digital texts).

Premature obsolescence: The failure of the Information Machine

Before discussing the positive lessons to be learned from digital medieval projects that have succeeded in avoiding technological obsolescence or looking ahead to examine trends that future digital projects will need to keep in mind, it is worthwhile considering the nature of the problems faced by digital medieval projects that have achieved more limited impact or aged more quickly than the intrinsic quality of their scholarship or relevance might otherwise warrant—although in discussing projects this way, it is important to realise that the authors of these often self-consciously experimental projects have not always aimed at achieving the standard we are using to judge their success: longevity and impact equal to that of major works of print-originated and disseminated scholarship in the principal medieval discipline.

In order to do so, however, we first need to distinguish among different types of obsolescence. One kind of obsolescence occurs when changes in computing hardware, software, or approach render a project’s content unusable without heroic efforts at recovery. The most famous example of this type is the Electronic Domesday Book, a project initiated by the BBC in celebration of the nine hundredth anniversary of King William’s original inventory of post-conquest Britain (Finney 1986-2006; see O’Donnell 2004 for a discussion). The shortcomings of this project have been widely reported: it was published on video disks that could only be read using a customised disk player; its software was designed to function on the BBC Master personal computer—a computer that at the time was more popular in schools and libraries in the United Kingdom than any competing system but is now hopelessly obsolete. Costing over £2.5 million, the project was designed to showcase technology that it was thought might prove useful to schools, governments, and museums interested in producing smaller projects using the same innovative virtual reality environment. Unfortunately, the hardware proved too expensive for most members of its intended market and very few people ended up seeing the final product. For sixteen years, the only way of accessing the project was via one of a dwindling number of the original computers and disk readers. More recently, after nearly a year of work by an international team of engineers, large parts of the project’s content have finally been converted for use on contemporary computer systems.

The Domesday project is a spectacular example of the most serious kind of technological obsolescence, but it is hardly unique. Most scholars now in their forties and fifties probably have disks lying around their studies containing information that is for all intents and purposes lost due to technological obsolescence—content written using word processors or personal computer database programmes that are no longer maintained, recorded on difficult to read media, or produced using computers or operating systems that ultimately lost out to more popular competitors. But the Domesday project did not become obsolete solely because it gambled on the wrong technology: many other digital projects of the time, some written for main-frame computers using languages and operating systems that are still widely understood, have suffered a similar obsolescence even though their content theoretically could be recovered more easily.

In fact the Domesday Book project also suffered from an obsolescence of approach—the result of a fundamental and still ongoing change in how medievalists and others working with digital media approach digitisation. Before the second half of the 1980s, digital projects were generally conceived of as information machines—programs in which content was understood to have little value outside of its immediate processing context. In such cases, the goal was understood to be the sharing of results rather than content. Sometimes, as in the case of the Domesday Book, the goal was the arrangement of underlying data in a specific (and closed) display environment; more commonly, the intended result was statistical information about language usage and authorship or the development of indices and concordances (see for example, the table of contents in Patton and Holoien 1981, which consists entirely of database, concordance, and statistical projects). Regardless of the specific processing goal, this approach tended to see data as raw material rather than an end result2. Collection and digitisation were done with an eye to the immediate needs of the processor, rather than the representation of intrinsic form and content. Information not required for the task at hand was ignored. Texts encoded for use with concordance or corpus software, for example, commonly ignored capitalisation, punctuation, or mise-en-page. Texts encoded for interactive display were structured in ways suited to the planned output (see for example the description of database organisation and video collection in Finney 1986-2006). What information was recorded was often indicated using ad hoc and poorly documented tokens and codes whose meaning now can be difficult or impossible to recover (see Cummings 2006).
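
The difference can be illustrated with an invented example. The first line mimics the processor-oriented style of early corpus encoding: capitalisation and punctuation are discarded, and the page reference is reduced to a token whose meaning only the original concordance software documented. The second records the same information descriptively, in terms that remain intelligible without any particular piece of software.

    *P117 nu sculon herigean heofonrices weard

    <!-- the same line with the page break and verse line stated in self-describing mark-up -->
    <l n="1"><pb n="117"/>Nu sculon herigean heofonrices weard</l>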

The problem with this approach is that technology ages faster than information: data that require a specific processing context in order to be understood will become unintelligible far more rapidly than information that has been described as much as possible in its own terms without reference to a specific processing outcome. By organising and encoding their content so directly to suit the needs of a specific processor, information machines like the Domesday Project condemned themselves to relatively rapid technological obsolescence.

Content as end-product: Browser-based projects

The age of the information machine began to close with the development and popular acceptance of the first Internet browsers in the early 1990s. In an information machine, developers have great control over both their processor and how their data is encoded. They can alter their encoding to suit the needs of their processors and develop or customise processors to work with specific instances of data. Developers working with browsers, however, have far less control over either element: users interact with projects using their own software and require content prepared in ways compatible with their processor. This both makes it much more difficult for designers to produce predictable results of any sophistication and requires them to adhere to standard ways of describing common phenomena. It also changes the focus of project design: where once developers focussed on producing results, they now tend to concentrate instead on providing content.

This change in approach explains in large part the relative technological longevity of the projects by McGillivray and Kiernan. Both were developed during the initial wave of popular excitement at the commercialisation of the Internet. Both were designed to be used without modification by standard Internet web browsers operating on the end-users’ computer and written in standard languages using a standard character set recognised by all Internet browsers to this day. For this reason—and despite the fact that browsers available in the late 1990s were quite primitive by today’s standards—it seems very unlikely that either project in the foreseeable future will need anything like the same kind of intensive recovery effort required by the Domesday Project: modern browsers are still able to read early HTML-encoded pages and Java routines, and are likely to continue to do so, regardless of changes in operating system or hardware, as long as the Internet exists in its current form. Even in the unlikely event that technological changes render HTML-encoded documents unusable in our lifetime, conversion will not be difficult. HTML is a text-based language that can easily be transformed by any number of scripting languages. Since HTML-encoded files are in no way operating system or software dependent, future generations—in contrast to the engineers responsible for converting the Electronic Domesday Book—will be able to convert the projects by Kiernan and McGillivray to new formats without any need to reconstruct the original processing environment.

SGML-based editions

The separation of content from processor did not begin with the rise of Internet browsers. HTML, the language which made the development of such browsers possible, is itself derived from work on standardised structural mark-up languages in the 1960s through the 1980s. These languages, the most developed and widely used at the time being Standard Generalized Markup Language (SGML), required developers to make a rigid distinction between a document’s content and its appearance. Content and structure were encoded according to the intrinsic nature of the information and interests of the encoder using a suitable standard mark-up language. How this mark-up was to be used and understood was left up to the processor: in a web browser, the mark-up could be used to determine the text’s appearance on the screen; in a database program it might serve to divide it into distinct fields. For documents encoded in early HTML (which used a small number of standard elements), the most common processor was the web browser, which formatted content for display for the most part without specific instructions from the content developer: having described a section of text using an appropriate HTML tag as 〈i〉 (italic) or 〈b〉 (bold), developers were supposed to leave decisions about specific details of size, font, and position up to the relatively predictable internal stylesheets of the user’s browser (though of course many early webpages misused structural elements like 〈table〉 to encode appearance).

SGML was more sophisticated than HTML in that it described how mark-up systems were to be built rather than their specific content. This allowed developers to create custom sets of structural elements that more accurately reflected the qualities they wished to describe in the content they were encoding. SGML languages like DocBook were developed for the needs of technical and other publishers; the Text Encoding Initiative (TEI) produced a comprehensive set of structural elements suitable for the encoding of texts for use in scholarly environments. Unfortunately, however, this flexibility also made it difficult to share content with others. Having designed their own sets of structural elements, developers could not be certain their users would have access to software that knew how to process them.
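
A short hypothetical fragment, loosely following TEI conventions, shows what such scholar-defined structural encoding looks like: the elements record that the passage is a group of verse lines containing an abbreviation and its expansion, but say nothing about fonts, colours, or screen layout.

    <!-- A group of verse lines; the Tironian sign (U+204A) is recorded alongside -->
    <!-- its editorial expansion, leaving display decisions to the processor. -->
    <lg type="verse-paragraph">
      <l>Nu sculon herigean heofonrices weard</l>
      <l>meotodes meahte <choice><abbr>&#x204A;</abbr><expan>ond</expan></choice> his modgeþanc</l>
    </lg>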

The result was a partial return to the model of the information machine: in order to ensure their work could be used, developers of SGML projects intended for wide distribution tended to package their projects with specific (usually proprietary) software, fonts, and processing instructions. While the theoretical separation of content and processor represented an improvement over the approach taken by previous generations of digital projects in that it treated content as having intrinsic value outside the immediate processing context, the practical need to supply users with special software capable of rendering or otherwise processing this content tended nevertheless to tie the projects’ immediate usefulness to the lifespan and weaknesses of the associated software. This is a less serious type of obsolescence, since rescuing information from projects that suffer from it involves nothing like the technological CPR required to recover the Domesday Project. But the fact that it must occur at all almost certainly limits these projects’ longevity and disciplinary impact. Users who must convert a project from one format to another or work with incomplete or partially broken rendering almost certainly are going to prefer texts and scholarship in more convenient formats.

XML, XSLT, Unicode, and related technologies

Developments of the last half-decade have largely eliminated the problem these pioneering SGML-based projects faced in distributing their projects to a general audience. The widespread adoption of XML, XSLT, Unicode, and similarly robust international standards on the Internet means that scholars developing new digital projects now can produce content using mark-up as flexible and sophisticated as anything possible in SGML without worrying that their users will lack the necessary software to display and otherwise process it. Just as the projects by Kiernan and McGillivray were able to avoid premature technological obsolescence by assuming users would make use of widely available Internet browsers, so too designers of XML-based projects can now increase their odds of avoiding early obsolescence by taking advantage of the ubiquitousness of the new generation of XML-, XSLT-, and Unicode-aware Internet clients3.
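
A brief, hypothetical illustration of how such standards reach the end user (the element and class names are invented): an XSLT stylesheet is itself an XML document, and any standards-compliant XSLT processor, including those now built into mainstream browsers, can use it to turn structurally encoded verse lines into display HTML without any project-specific software.

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- turn a line group into an HTML division -->
      <xsl:template match="lg">
        <div class="stanza"><xsl:apply-templates/></div>
      </xsl:template>
      <!-- turn each verse line into an HTML paragraph -->
      <xsl:template match="l">
        <p class="verse-line"><xsl:apply-templates/></p>
      </xsl:template>
    </xsl:stylesheet>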

Tools and community support

The fact that these technologies have been so widely accepted in both industry and the scholarly world has other implications beyond making digital projects easier to distribute, however. The establishment of robust and stable standards for structural mark-up has also encouraged the development of a wide range of tools and organisations that also make such projects easier to develop.

Tools

Perhaps the most striking change lies in the development of tools. When I began my SGML-based edition of Cædmon’s Hymn in 1997, the only SGML-aware and TEI-compatible tools I had at my disposal were GNU-Emacs, an open source text editor, and the Panorama and later Multidoc SGML browsers (such commercial tools and environments as were available were far beyond the budget of my one-scholar project). None of these were very user friendly. GNU-Emacs, though extremely powerful, was far more difficult to set up and operate than the word processors, spreadsheets, and other programs I had been accustomed to using up to that point. The Panorama and Multidoc browsers used proprietary languages to interpret SGML that had relatively few experienced users and a very limited basis of support. There were other, often quite sophisticated, tools and kinds of software available, including some—such as TACT, Collate, TUSTEP, and various specialised fonts like Peter Baker’s original Times Old English—that were aimed primarily at medievalists or developers of scholarly digital projects. Almost all of these, however, required users to encode their data in specific and almost invariably incompatible ways. Often, moreover, the tool itself also was intended for distribution to the end user—once again causing developers to run the risk of premature technological obsolescence.

Today, developers of new scholarly digital projects have access to a far wider range of general and specialised XML-aware tools. In addition to GNU-Emacs—which remains a powerful editor and has become considerably more easy to set up on most operating systems—there are a number of full-featured, easy to use, open source or relatively inexpensive commercial XML-aware editing environments available including Oxygen, Serna, and Screem. There are also quite a number of well-designed tools aimed at solving more specialised problems in the production of scholarly projects. Several of these, such as Anastasia and Edition Production and Presentation Technology (EPPT), have been designed by medievalists. Others, such as the University of Victoria’s Image Markup Tool and other tools under development by the TAPoR project, have been developed by scholars in related disciplines.

More significantly, these tools avoid most of the problems associated with those of previous decades. All the tools mentioned in the previous paragraph (including the commercial tools) are XML-based and have built-in support for TEI XML, the standard structural markup language for scholarly projects (this is also true of TUSTEP, which has been updated continuously). This means both that they can often be used on the same underlying content and that developers can encode their text to reflect their interests or the nature of the primary source rather than to suit the requirements of a specific tool. In addition, almost all are aimed at the developer rather than the end user. With the exception of Anastasia and EPPT, which both involve display environments, none of the tools mentioned above is intended for distribution with the final project. Although these tools—many of which are currently in the beta stage of development—ultimately will become obsolete, the fact that almost all are now standards compliant means that the content they produce almost certainly will survive far longer.

Community support

A second area in which the existence of stable and widely recognised standards has helped medievalists working with digital projects has been in the establishment of community-based support and development groups. Although Humanities Computing, like most other scholarly disciplines, has long had scholarly associations to represent the interests of their members and foster exchanges of information (e.g. Association for Literary and Linguistic Computing [ALLC]; Society for Digital Humanities / Société pour l’étude des médias interactifs [SDH-SEMI]), the last half-decade has also seen the rise of a number of smaller formal and informal Communities of Practice aimed at establishing standards and providing technological assistance to scholars working in more narrowly defined disciplinary areas. Among the oldest of these are Humanist-l and the TEI—both of which pre-date the development of XML by a considerable period of time. Other community groups, usually more narrow in focus and generally formed after the development of XML, Unicode, and related technologies, include MENOTA (MEdieval and NOrse Text Archive), publishers of the Menota handbook: Guidelines for the encoding of medieval Nordic primary sources; MUFI (Medieval Unicode Font Initiative), an organisation dedicated to the development of solutions to character encoding issues in the representation of characters in medieval Latin manuscripts; and the Digital Medievalist, a community of practice aimed at helping scholars meet the increasingly sophisticated demands faced by designers of contemporary digital projects, which organises a journal, wiki, and mailing list devoted to the establishment and publication of best practice in the production of digital medieval resources.

These tools and organisations have helped reduce considerably the technological burden placed on contemporary designers of digital resources. As Peter Robinson has argued, digital projects will not come completely into their own until “the tools and distribution… [are] such that any scholar with the disciplinary skills to make an edition in print can be assured he or she will have access to the tools and distribution necessary to make it in the electronic medium” (Robinson 2005: abstract). We are still a considerable way away from this ideal and in my view unlikely to reach it before a basic competence in Humanities computing technologies is seen as an essential research skill for our graduate and advanced undergraduate students. But we are also much farther along than we were even a half-decade ago. Developers considering a new digital project can begin now confident that they will be able to devote a far larger proportion of their time to working on disciplinary content—their scholarship and editorial work—than was possible even five years ago. They have access to tools that automate many jobs that used to require special technical know-how or support. The technology they are using is extremely popular and well-supported in the commercial and academic worlds. And, through communities of practice like the Text Encoding Initiative, Menota, and the Digital Medievalist Project, they have access to support from colleagues working on similar problems around the globe.

Future Trends: Editing non-textual objects

With the development and widespread adoption of XML, XSLT, Unicode, and related technologies, text-based digital medieval projects can be said to have emerged from the incunabula stage of their technological development. Although there remain one or two ongoing projects that have resisted incorporating these standards, there is no longer any serious question as to the basic technological underpinnings of new text-based digital projects. We are also beginning to see a practical consensus as to the basic generic expectations for the “Electronic edition”: such editions almost invariably include access to transcriptions and full colour facsimiles of all known primary sources, methods of comparing the texts of individual sources interactively, and, in most cases, some kind of guide, reading, or editorial text. There is still considerable difference in the details of interface (Rosselli Del Turco 2006), mise en moniteur, and approach to collation and recension. But on the whole, most developers and presumably a large number of users seem to have an increasingly strong sense of what a text-based digital edition should look like.
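
To give one hedged illustration of how such expectations are now commonly met (the file and identifier names are invented): current TEI encoding lets an edition declare its page images once and point the transcription at them, without committing the project to any particular display software.

    <!-- The facsimile section declares each page image; the transcription links to it -->
    <!-- through the facs attribute, leaving comparison and display to the processor. -->
    <facsimile>
      <surface xml:id="fol-1r">
        <graphic url="images/fol-1r.jpg"/>
      </surface>
    </facsimile>
    <text>
      <body>
        <pb facs="#fol-1r"/>
        <l>Nu sculon herigean heofonrices weard</l>
      </body>
    </text>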

Image, Sound, and Animation: Return of the information machine?

Things are less clear when digital projects turn to non-textual material. While basic and widely accepted standards exist for the encoding of sounds and 2D and 3D graphics, there is far less agreement as to the standards that are to be used in presenting such material to the end user. As a result, editions of non-textual material often have more in common with the information machines of the 1980s than contemporary XML-based textual editions. Currently, most such projects appear to be built using Adobe’s proprietary Flash and Shockwave formats (e.g. Foys 2003; Reed Kline 2001). Gaming applications, 3D applications, and immersive environments use proprietary environments such as Flash and Unreal Engine or custom-designed software. In each case, the long-term durability and cross-platform operability of projects produced in these environments is tied to that of the software for which they are written. All of these formats require proprietary viewers, none of which are shipped as a standard part of most operating systems. As with the BBC Domesday Project, restoring content published in many of these formats ultimately may require restoration of the original hard- and software environment.

Using technology to guide the reader: Three examples4

Current editions of non-textual material resemble information machines in another way, as well: they tend to be over-designed. Because developers of such projects write for specific processors, they—like developers of information machines of the 1980s—are able to control the end-user’s experience with great precision. They can place objects in precise locations on the user’s screen, allow or prevent certain types of navigation, and animate common user tasks.

When handled well, such control can enhance contemporary users’ experience of the project. Martin Foys’s 2003 edition of the Bayeux Tapestry, for example, uses Flash animation to create a custom-designed browsing environment that allows the user to consult the Bayeux Tapestry as a medieval audience might—by moving back and forth apparently seamlessly along its 68 metre length. The opening screen shows a section from the facsimile above a plot-line that provides an overview of the Tapestry’s entire contents in a single screen. Users can navigate the Tapestry scene-by-scene using arrow buttons at the bottom left of the browser window, centimetre by centimetre using a slider on the plot-line, or by jumping directly to an arbitrary point on the tapestry by clicking on the plot-line at the desired location. Tools, background information, other facsimiles of the tapestry, scene synopses, and notes are accessed through buttons at the bottom left corner of the browser. The first three types of material are presented in a separate window when chosen; the last two appear under the edition’s plot-line. Additional utilities include a tool for making slideshows that allows users to reorder panels to suit their own needs.

If such control can enhance a project’s appearance, it can also get in the way—encouraging developers to include effects for their own sake, or to control end-users’ access to the underlying information unnecessarily. The British Library Turning the Pages series, for example, allows readers to mimic the action of turning pages in an otherwise straightforward photographic manuscript facsimile. When users click on the top or bottom corner of the manuscript page and drag the cursor to the opposite side of the book, they are presented with an animation showing the page being turned over. If they release the mouse button before the page has been pulled approximately 40% of the way across the visible page spread, virtual “gravity” takes over and the page falls back into its original position.

This is an amusing toy and well suited to its intended purpose as an “interactive program that allows museums and libraries to give members of the public access to precious books while keeping the originals safely under glass” (British Library n.d.). It comes, however, at a steep cost: the page-turning system uses an immense amount of memory and processing power—the British Library estimates up to 1 GB of RAM for high quality images on a stand alone machine—and the underlying software used for the Internet presentation, Adobe Shockwave, is not licensed for use on all computer operating systems (oddly, the non-Shockwave Internet version uses Windows Media Player, another proprietary system that shares the same gaps in licensing). The requirement that users drag pages across the screen, moreover, makes paging through an edition unnecessarily time- and attention-consuming: having performed an action that indicates that they wish an event to occur (clicking on the page in question), users are then required to perform additional complex actions (holding the mouse button down while dragging the page across the screen) in order to effect the desired result. What was initially an amusing diversion rapidly becomes a major and unnecessary irritation.

More intellectually serious problems can arise as well. In A Wheel of Memory: The Hereford Mappamundi (Reed Kline 2001), Flash animation is used to control how the user experiences the edition’s content—allowing certain approaches and preventing others. Seeing the Mappamundi “as a conceit for the exploration of the medieval collective memory… using our own collective rota of knowledge, the CD-ROM” (§ I [audio]), the edition displays images from the map and associated documents in a custom-designed viewing area that is itself in part a rota. Editorial material is arranged as a series of chapters and thematically organised explorations of different medieval Worlds: World of the Animals, World of the Strange Races, World of Alexander the Great, etc. With the exception of four numbered chapters, the edition makes heavy use of the possibilities for non-linear browsing inherent in the digital medium to organise its more than 1000 text and image files.

Unfortunately, and despite its high production values and heavy reliance on a non-linear structural conceit, the edition itself is next-to-impossible to use or navigate in ways not anticipated by the project designers. Text and narration are keyed to specific elements of the map and edition and vanish if the user strays from the relevant hotspot: because of this close integration of text and image it is impossible to compare text written about one area of the map with a facsimile of another. The facsimile itself is also very difficult to study. The customised viewing area is of a fixed size (I estimate approximately 615×460 pixels) with more than half this surface given over to background and navigation: when the user chooses to view the whole map on screen, the 4 foot wide original is reproduced with a diameter of less than 350 pixels (approximately 1/10 actual size). Even then, it remains impossible to display the map in its entirety: in keeping with the project’s rota conceit, the facsimile viewing area is circular even though the Hereford map itself is pentagonal: try as I might, I never have been able to get a clear view of the border and image in the facsimile’s top corner.

Future standards for non-textual editions?

It is difficult to see at this point how scholarly editions involving non-textual material ultimately will evolve. Projects that work most impressively right now use proprietary software and viewers (and face an obvious danger of premature obsolescence as a result); projects that adhere to today’s non-proprietary standards for the display and manipulation of images, animation, and sound currently are in a situation analogous to that of the early SGML-based editions: on the one hand, their adherence to open standards presumably will help ensure their data is easily converted to more popular and better supported standards once these develop; on the other hand, the lack of current popular support means that such projects must supply their own processing software—which means tying their short term fate to the success and flexibility of a specific processor. Projects in this field will have emerged from the period of their technological infancy when designers can concentrate on their content, safe in the assumption that users will have easy access to appropriate standards-based processing software on their own computers.

Collaborative content development

The development of structural markup languages like HTML was crucial to the success of the Internet because they allowed for unnegotiated interaction between developers and users. Developers produce content assuming users will be able to process it; users access content assuming it will be suitable for use with their processors. Except when questions of copyright, confidentiality, or commerce intervene, contact between developers and users can be limited to little more than the purchase of a CD-ROM or transfer of files from server to browser.

The last few years have seen a movement towards applying this model to content development as well. Inspired by the availability of well-described and universally recognised encoding standards and encouraged no doubt by the success of the Wikipedia and the open source software movement, many projects are now looking for ways to provide for the addition and publication of user-contributed content or the incorporation of work by other scholars. Such contributions might take the form of notes and annotations, additional texts and essays, links to external resources, and corrections or revision of incorrect or outdated material.

An early, pre-wiki model of this approach is the Online Reference Book for Medieval Studies (ORB). Founded in 1995 and run by a board of section editors, ORB provides a forum for the development and exchange of digital content by and for medievalists. Contributors range from senior scholars to graduate students and interested amateurs; their contributions belong to a wide variety of genres: encyclopaedia-like articles, electronic primary texts, on-line textbooks and monographs, sample syllabi, research guides, and resources for the non-specialist. Despite this, the project itself is administered much like a traditional print-based encyclopaedia: it is run by an editorial board that is responsible for soliciting, vetting, and editing contributions before they are published.

More recently, scholars have been exploring the possibilities of a different, unnegotiated approach to collaboration. One model is the Wikipedia—an on-line reference source that allows users to contribute and edit articles with little editorial oversight. This approach is frequently used on a smaller scale for the construction of more specialised reference works: the Digital Medievalist, for example, is using wiki software to build a community resource for medievalists who use digital media in their research, study, or teaching. Currently, the wiki contains descriptions of projects and publications, conference programmes, calls for papers, and advice on best practice in various technological areas.

Other groups, such as a number of projects at the Brown Virtual Humanities Lab, are working on mechanisms by which members of the community can make more substantial contributions to the development of primary and secondary sources. In this case, users may apply for permission to contribute annotations to the textual database, discussing differences of opinion or evidence in an associated discussion forum (Armstrong and Zafrin 2005; Riva 2006).

A recent proposal by Espen Ore suggests an even more radical approach: the design of unnegotiated collaborative editions—i.e. projects that are built with the assumption that others will add to, edit, and revise the core editorial material: texts, introductory material, glossaries, and apparatus (Ore 2004). In a similar vein, the Visionary Rood Project has proposed building its multi-object edition using an extensible architecture that will allow users to associate their own projects with others to form a matrix of interrelated objects, texts, and commentary (Karkov, O’Donnell, Rosselli Del Turco, et al. 2006). Peter Robinson has recently proposed the development of tools that would allow this type of editorial collaboration to take place (Robinson 2005).

These approaches to collaboration are still very much in their earliest stages of development. While the technology already exists to enable such community participation in the development of intellectual content, questions of quality control, intellectual responsibility, and especially incentives for participation remain very much unsettled. Professional scholars traditionally achieve success—both institutionally and in terms of reputation—through the quality and quantity of their research publications. Community-based collaborative projects do not fit easily into this model. Project directors cannot easily claim intellectual responsibility for the contributions of others to “their” projects—reducing their value in a profession in which monographs are still seen as a standard measure of influence and achievement. And the types of contribution open to most participants—annotations, brief commentary, and editorial work—are difficult to use in building a scholarly reputation: the time when a carefully researched entry in the Wikipedia or an annotation to an on-line collaborative edition will help scholars who are beginning or building their careers is still a long way away (see O’Donnell 2006, which discusses a number of the economic issues involved in collaborative digital models).

Conclusion

Digital scholarship in Medieval Studies has long involved finding an accommodation between the new and the durable. On the one hand, technology has allowed scholars to do far more than was ever possible in print. It has allowed them to build bigger concordances and more comprehensive dictionaries, to compile detailed statistics about usage and dialectal spread, and to publish far more detailed collations, archives, and facsimiles. At the same time, however, the rapidly changing nature of this technology and its associated methods has brought with it the potential cost of premature obsolescence. While few projects, perhaps, have suffered quite so spectacularly as the BBC’s Domesday Book, many have suffered from an undeserved lack of attention or disciplinary impact due to technological problems. The emphasis on information as a raw material in the days before the development of structural markup languages often produced results of relatively narrow and short-term interest—frequently in the form of information machines that could not survive the obsolescence of their underlying technology without heroic and costly efforts at reconstruction. Even the development of early structural markup languages like SGML did not entirely solve this problem: while theoretically platform-independent and focussed on the development of content, SGML-based projects commonly required users to acquire specific and usually very specialised software for even the most basic processing and rendition.

Of the projects published in the initial years of the Internet revolution, those that relied on the most widely supported technology and standards—HTML and the ubiquitous desktop Internet browsers—have survived the best. The editions by Kiernan and McGillivray showcased by Solopova in her lecture that summer still function well—even if their user interfaces now look even more old-fashioned two years on.

Inasmuch as the new XML- and Unicode-based technologies combine the flexibility and sophistication of SGML with the broad support of early HTML, text-based medieval digital scholarship is now leaving its most experimental period. There remain economic and rhetorical issues surrounding the best ways of delivering different types of scholarly content to professional and popular audiences; but on the whole the question of the core technologies required has been settled definitively.

The new areas of experimentation in medieval digital studies involve editions of non-textual material and the development of new collaborative models of publication and project development. Here technology has even more to offer the digital scholar, but it also carries even greater risks. On the one hand, the great strides made in computer-based animation, gaming, and 3-D imaging in the commercial world offer projects the chance to deal with material never before subject to the kind of thorough presentation now possible. We already have marvellous editions of objects—maps, tapestries, two-dimensional images—that allow the user to explore their subjects in ways impossible in print. In the near future we can expect to see greater use of 3D and gaming technology in the treatment of sculpture, archaeological digs, and even entire cities. With the use of wikis and similar collaborative technologies, such projects may also be able to capture much more of the knowledge of the disciplinary experts who make up their audiences.

For projects dealing with non-textual objects, the risk is that the current necessity of relying on proprietary software intended for the much shorter-term needs of professional game designers and computer animators will lead to the same kind of premature and catastrophic obsolescence brought on by the equally-advanced-for-its-day Domesday Project. Sixteen years from now, animation design suites like Director (the authoring suite used for producing Shockwave files) and gaming engines like the Unreal Engine (an authoring engine used to produce current generations of video games) are likely to be different from, and perhaps incompatible with, current versions in a way that XML authoring technologies and processors will not be. While we can hope that reconstruction will not be as difficult as it proved to be in the case of the Domesday Project, it seems likely that few of today’s non-textual editions will still be working without problems at an equivalent point in their histories, two decades from now.

In the case of experimentation with collaborative software, the challenge is more economic and social than technological. In my experience, most professional scholars are initially extremely impressed by the possibilities offered by collaborative software such as wikis and other annotation engines—before almost immediately bumping up against the problems of prestige and quality control that currently make them infeasible as channels of high-level scholarly communication. Indeed, at one recent conference session I attended (on the future of collaborative software, no less!), the biggest laugh of the morning came when one of the speakers confessed to having devoted most of the previous month to researching and writing a long article for the Wikipedia on his particular specialism in Medieval Studies.

That current text-based digital editions seem likely to outlive the technology that produced them can be attributed to the pioneering efforts of the many scholars responsible for editions like those by Adams, Kiernan, McGillivray, and Robinson discussed by Solopova in her lecture. The current generation of scholars producing editions of non-textual objects and experimenting with collaborative forms of scholarship and publication is now filling a similar role. The solutions they are developing may or may not provide the final answers; but they will certainly provide a core of experimental practice upon which those answers will ultimately be built.

Notes

1 The focus of this chapter is on theoretical and historical problems that have affected digital scholarship in Medieval Studies in the past and are likely to continue to do so for the foreseeable future. Scholars seeking more specific advice on technological problems or best practice have access to numerous excellent Humanities Computing societies, mailing lists, and internet sites. For some specific suggestions, see the section “Community Support,” pp. 000-000, below. I thank Roberto Rosselli Del Turco for his help with this article.

2 Exceptions to this generalisation prove the rule: pre-Internet-age projects that concentrated more on content than on processing, such as the Dictionary of Old English (DOE) or Project Gutenberg, have aged much better than those that concentrated on processing rather than content. Both the DOE and Project Gutenberg, for example, have successfully migrated to HTML and now XML. The first volume of the DOE was published on microfiche in 1986—the same year as the BBC’s Domesday Book; on-line and CD-ROM versions were subsequently produced with relatively little effort. Project Gutenberg began with plain ASCII text in 1971.

3 Not all developers of XML-encoded medieval projects have taken this approach. Some continue to write for specific browsers and operating systems (e.g. Muir 2004a); others have developed or are in the process of developing their own display environments (e.g. Anastasia, Elwood [see Duggan and Lyman 2005: Appendix]). The advantage of this approach, of course, is that—as with information machines like the BBC Domesday Book—developers acquire great control over the end user’s experience (see for example McGillivray 2006 on Muir 2004b); the trade-off, however, is likely to be unnecessarily rapid technological obsolescence or increased maintenance costs in the future.

4 The discussion in this section has been adapted with permission from a much longer version in O’Donnell 2005b.

References and Further Reading

Organisations and Support

Further reading

----  
