Daniel Paul O'Donnell

The bird in hand: Humanities research in the age of open data (Digital Science Report)

Posted: Oct 24, 2016 13:10;
Last Modified: Oct 24, 2016 13:10

---

Originally published as Daniel Paul O’Donnell. 2016. “The Bird in Hand: Humanities Research in the Age of Open Data.” In The State of Open Data: A Selection of Analyses and Articles about Open Data, Edited by Figshare, 34–35. Digital Science Report. London: Digital Science.


Traditionally, humanities scholars have resisted describing their raw material as
“data” 10.

Instead, they speak of “sources” and “readings.” “Primary sources” are the
texts, objects, and artifacts they study; “secondary sources” are the works
of other commentators used in their analyses; “readings” can be either the
arguments that represent the end product of their research or the extracts
and quotations they use for support.

These definitions are contextual. The primary source for one argument can be
the secondary source for another or, as in the case of a “critical edition” of a
historical text, simultaneously primary and secondary. Almost any document,
artifact or record of human activity can be a topic of study. Arguments proposing
previously unrecognized sources (“high school yearbooks, cookbooks, or wear
patterns in the floors of public places”) are valued acts of scholarship. 1

This resistance to “data” is a recognition of real differences in the way humanists
collect and use such material. In other domains, data are generated through
experiment, observation, and measurement. Darwin goes to the Galapagos
Islands, observes the finches, and fills notebooks with what he sees. His notes
(i.e. his “data”) “represent information in a formalized manner suitable for
communication, interpretation, or processing” 2. They are “the facts, numbers,
letters, and symbols that describe an object, idea, condition, situation, or other
factors” 3. Given the extent to which they are generated, it has been argued that
they might be described better as capta, “taken,” than data, “given”. 4

The material of humanities research traditionally is much more datum than
captum, finch than note. Since the humanities involve the study of the meaning
of human thought, culture, and history, such material typically involves other
people’s work. It is often unique and its interpretation is usually provisional,
depending on broader understandings of purpose, context and form that are
themselves open to analysis, argument and modification. In the humanities, we
more often end up debating why we think something is a finch than what we
can conclude from observing it.

Perhaps most telling is the fact that humanities sources, unlike scientific
data, are usually practically as well as theoretically non-rivalrous 5. Humanities
researchers rarely have an incentive (or capability) to prevent others from
accessing their raw material and entire research domains (e.g. Jane Austen
studies) can work for centuries from the same few primary sources. Priority
disputes that occur regularly in the sciences 6 are almost non-existent within
the humanities. 1

The digital age is changing one aspect of this traditional disciplinary difference.
Mass digitization and new tools make it possible to extract material
algorithmically from large numbers of cultural artifacts. Where researchers
used to be limited to sources in archives and libraries to which they had
physical access, digital archives and metadata now make it easier to work
across complete historical or geographic corpora: all surviving periodicals from
19th century England, for example, or every known pamphlet from the Civil
War. In the digital age, humanities resources can be capta as well as data.

Such changes allow for new types of research and improve the efficacy of some
traditional approaches. But they also raise existential questions about long-
standing practices. Traditionally, humanities researchers have tended to work
with details from a limited corpus to make larger arguments: “close readings” of
selected passages in a given text to produce larger interpretations of the work
as a whole; or of passages from a few selected works to support arguments
about larger events, movements or schools. In one famous but far from atypical
example, author Ian Watt uses readings from five novels and three authors as the
main primary sources in his discussion of the Rise of the Novel. 7

In the age of open data, it is tempting to see this as being, in essence, a small-
sample analysis lacking in statistical power. 8 But such data-centric criticism of
traditional humanities arguments can be a form of category error. Humanities
research is as a rule more about interpretation than solution. It is about why
you understand something the way you do rather than why something is
the way it is. It treats its sources as examples to support an argument rather
than phenomena to be observed in the service of a solution. While Watt’s title,
“The Rise of the Novel,” can be understood as implying a historical scope
that his sample cannot support, his subtitle, “Studies in Defoe, Richardson,
and Fielding,” shows that he actually was making an argument about the
interpretation of three canonical authors based on his understanding of
the novel’s early history – an understanding that by definition always will be
provisional and open to amendment.

The real challenge for the humanities in the age of digital open data is
recognizing the value of both types of sources: the material we can now
generate algorithmically at previously unimaginable scales and the continuing
value of the exemplary source or passage. As the raw material of humanities
research begins to acquire formal qualities associated with data in other fields,
the danger is going to be that we forget that our research requires us to be
sensitive to both object and observation, datum and captum, finch and note. In
asking ourselves what we can do with a million books 9, we need to remember
that we remain interested in the meaning of individual titles and passages.

Works cited

1 Borgman, Christine L. 2007. Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, Mass: MIT Press.

2 Consultative Committee for Space Data Systems. 2012. “Reference Model for an Open Archival Information System (OAIS).” CCSDS 650.0-M-2.
NASA. http://public.ccsds.org/publications/archive/650x0m2.pdf.

3 National Research Council. 1999. Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases. Washington:
National Academies Press. http://public.eblib.com/choice/publicfullrecord.aspx?p=3375284.

4 Jensen, H. E. 1950. “Editorial Note.” In Through Values to Social Interpretation: Essays on Social Contexts, Actions, Types, and Prospects, vii – xi.
Sociological Series. Duke University Press.

5 Kitchin, Rob. 2014. The Data Revolution. Thousand Oaks, CA: SAGE Publications Ltd.

6 Casadevall, Arturo, and Ferric C. Fang. 2012. “Winner Takes All.” Scientific American 307 (2): 13. doi:10.1038/scientificamerican0812-13.

7 Watt, Ian P. (1957) 1987. The Rise of the Novel: Studies in Defoe, Richardson, and Fielding. London: Hogarth.

8 Jockers, Matthew L. 2013. Macroanalysis : Digital Methods and Literary History. Urbana, IL: University of Illinois Press.

9 Crane, Gregory. 2006. “What Do You Do with a Million Books?” D-Lib Magazine 12 (3). doi:10.1045/march2006-crane.

10 Marche, Stephen. 2012. “Literature Is Not Data: Against Digital Humanities.” Los Angeles Review of Books, October. https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities/.

----  

Cædmon Citation Network - Week 12+13

Posted: Aug 23, 2016 10:08;
Last Modified: Aug 23, 2016 10:08

---

Hi all!

Summer is winding to a close, and our project continues to progress. The database is working, and is currently being made faster for even easier use. Books and articles are still being collected and scanned, and I am trying to split my time between scanning sources and collecting data.

At our last meeting Dan and I went over the exact specifications for the references I am collecting. Information is sorted into four types:

Text Quotes (TQ)

Text Mentions (TM)

Scholarly References (SR)

Other References (OR)

Text Quotes and Text Mentions come from editions, facsimiles, translations, and manuscripts, and only refer to Cædmon’s Hymn itself. Quotes are direct quotations from the poem, while mentions are references to other editions.

Scholarly References will consist of references made to anything other than Cædmon’s own words. This can include books and articles about the hymn or other topics, as well as supplementary text from the editions of the hymn.

Other References is simply a catch-all category for anything that does not fit into the previous three categories.
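
As a rough illustration only, the four categories above could be written down as a small Python sketch. This is not the project's actual database schema: the `classify` helper and its three yes/no flags are hypothetical names, and the decision order is just one reading of the definitions above.

```python
from collections import Counter
from enum import Enum


class ReferenceType(Enum):
    """The four reference categories, keyed by their abbreviations."""
    TEXT_QUOTE = "TQ"
    TEXT_MENTION = "TM"
    SCHOLARLY_REFERENCE = "SR"
    OTHER_REFERENCE = "OR"


def classify(refers_to_hymn: bool, is_direct_quote: bool,
             is_scholarly: bool) -> ReferenceType:
    """Hypothetical helper: pick a category from three yes/no answers.

    Assumption: references to the Hymn itself are checked first, and
    anything matching none of the first three types is a catch-all.
    """
    if refers_to_hymn:
        if is_direct_quote:
            return ReferenceType.TEXT_QUOTE
        return ReferenceType.TEXT_MENTION
    if is_scholarly:
        return ReferenceType.SCHOLARLY_REFERENCE
    return ReferenceType.OTHER_REFERENCE


# Invented example notes (not real project data), tallied by abbreviation.
notes = [
    classify(True, True, False),
    classify(True, False, False),
    classify(False, False, True),
    classify(False, False, False),
    classify(True, True, False),
]
tally = Counter(ref.value for ref in notes)
print(tally)
```

Keeping the abbreviations as the enum values means a tally like this uses the same short codes that appear in handwritten notes.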

Unfortunately I have been having laptop issues and had to reinstall the operating system on my computer, losing some programs in the process. I am not sure if this will affect my GLOBUS endpoint, but I will try transferring some files later to determine if I need to figure all of that out again.

My goals for this week are to scan the ILL books that I currently have checked out, transfer all the files I have scanned to GLOBUS, and fine-tune the way I collect my data from the books and articles. I have been finding that it is quicker to write down a large chunk of references on paper and then input the info into the database in one go. This may change as Garret makes the database quicker. The faster version of the database should be ready this week, but I do not currently have access to it, so today will be a scanning day. I also plan to request another chunk of ILL books and articles from the library.

For next week’s blog I hope to write a sort of how-to guide on collecting information from the sources and inputting the info into the database. As the new semester starts in two weeks, I will have less time to spend on the project and I believe Dan plans on hiring more students to help the collection go faster. The how-to guide should ensure that we are all collecting data in the same way, and should ease any confusion that might cause errors. As the semester progresses I and whoever else might be working on the project can go through the data collection at a steady pace, and I can continue to collect and scan the sources needed to complete the bibliography.

Things seem to be on track, and hopefully the transition into the new semester will be smooth!

Until next week,

Colleen

----  

Cædmon Citation Network - Week 10

Posted: Jul 25, 2016 10:07;
Last Modified: Jul 25, 2016 10:07

---

Hi all,

It is week 10 already, and I feel like I am nowhere near where I thought I would be with regards to this project. While the list on Zotero of the sources we need for our data collection is as complete as we can know at present, not everything on the list has been collected yet. I was in high spirits at the beginning of last week, thinking that the collection of sources was nearly complete; however, I realised later on that I had missed a good chunk of the list. It turned out that I had some filters set that were omitting a portion of the 700-ish books and articles. To make a long story short, more collection is still needed!

This will mean more inter-library loan books will need to be ordered and scanned, and more articles will need to be transferred to the GLOBUS folder. Thankfully the book scanner is back up and running again! If it holds out it should make the process painless and a good deal quicker than scanning things on the photocopier.

My plan for this week is to:

- finish scanning the inter-library loan books I currently have checked out (there are about four left to scan)

- finish collecting EVERYTHING on the Zotero list, keeping track of how many inter-library loan books are due to come in so I can account for future scanning time.

- transfer all electronic copies of articles and books to the GLOBUS folder (the internet guy is FINALLY coming on Tuesday to hook up my apartment, so I can work on this every night starting tomorrow)

- And then, if by some miracle I finish everything before the end of the week, I will begin data collection.

To be quite honest, I have been very frustrated with myself and the fact that I have not begun the data collection sooner. I suppose that collecting and organising hundreds of articles just takes longer than I imagined. I really have been working at it steadily throughout the summer, trying to maintain a level of organisation that allows information on the project to transfer easily between myself, Dan, Garret, and anyone else that might happen to work on the project. Although scanning and photocopying seem like menial tasks, I think I need to remind myself that such tasks take time and are necessary to keep our project organised and moving forward.

I do hope, though, that if Dan has any concerns with the pace of the project he will let me know, as I do not want to drag things out way longer than he was expecting. The project IS moving forward, however slow it may have seemed the past couple of weeks. Although collecting and organising the sources is not the most exciting part of the job to write updates about, you all can be assured that it is almost complete and the data collection will begin very soon! I am very much looking forward to beginning this part of the project, seeing what we find, and facing the challenges that I am sure we will encounter.

Until later,

Colleen

----  

Cædmon Citation Network - Week 9

Posted: Jul 18, 2016 09:07;
Last Modified: Jul 18, 2016 09:07

---

Hi all!

I finally get to start reading this week!!! While I am still not 100% complete in my sourcing of all the books and articles, it is looking as though I will definitely be able to start reading by Wednesday if not earlier.

I also have a bunch of books from inter-library loans that I need to scan portions of. That will be part of my job today.

The database will be ready this week as well. Garret says that there will be a few improvements that he will want to make, but I will be able to start using it this week. All the information that I collect will still be available as the database is upgraded.

You may have noticed that I have switched to blogging at the beginning of the week as opposed to the end. I have found that at this point it is more beneficial for me to post at the start of the week outlining some goals and then add an update post sometime during the middle of the week. I am going to continue this model for the next while.

Until next time!

Colleen

----  

Cædmon Citation Network - The Return

Posted: May 19, 2016 10:05;
Last Modified: May 19, 2016 11:05

---

Hello, Readers of Dan’s Blog!

My name is Colleen Copland, and I am a student of Dan’s who will be working with him on the Cædmon Citation Network which he and Rachel Hanks began work on last summer. I will be blogging here weekly, and thought I’d use this first post to introduce myself and more-or-less explain the project as I understand it so far. I am still familiarizing myself with everything, so my descriptions may fall short of the actual scope of the project or they might be totally off-base altogether, but as I learn more I will let you know all the juicy details!

Little intro on myself: I am an undergraduate student at the University of Lethbridge, majoring in English and hoping to be accepted into the English/Language Arts Education program this fall (cross your fingers for me, internet!). I have taken three courses with Dan in the past two years: Medieval English, Intro to Old English, and Advanced Old English, in which we spent an entire semester reading Beowulf. Suffice it to say I think Dan is a pretty excellent prof, and I am excited to work for him this summer so I can continue to learn from him!

The Cædmon Citation Network (also known as the Cædmon Bibliography Project and possibly a few other names – I will need to ask Dan if there is something he’d like me to call it officially) is a gathering of data on the citations of various editions of Cædmon’s Hymn. The project is interested in tracking how long it takes a new edition of a work to start being cited in studies of said work. Cædmon’s Hymn, since it is such a short piece, has been re-translated and re-published a great many times since 1644, which should allow us to notice some patterns in the way each new edition is cited.

The project is also interested in looking at the differences between the citing of digital editions of works as opposed to print editions. Many people assume that it takes longer for digital editions to begin being cited, but this project aims to suggest that they are actually cited more quickly. It will be interesting to see what the data shows us.
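
To make the idea of "how long it takes a new edition to start being cited" a little more concrete, here is a minimal Python sketch of one possible measure: the gap between an edition's publication year and the earliest later work that cites it. This is only an assumption about how the lag might be computed, not the project's actual method. The function names are hypothetical, and the sample editions are made-up placeholders rather than real data.

```python
from statistics import mean
from typing import Iterable, Optional


def first_citation_lag(edition_year: int,
                       citing_years: Iterable[int]) -> Optional[int]:
    """Years from publication to the earliest citation, or None if uncited.

    Citing years earlier than the edition itself are ignored as errors.
    """
    later = [year for year in citing_years if year >= edition_year]
    return min(later) - edition_year if later else None


def mean_lag_by_medium(editions):
    """Average first-citation lag per medium.

    `editions` is an iterable of (medium, edition_year, citing_years)
    tuples; editions that were never cited are left out of the average.
    """
    lags = {}
    for medium, year, citing in editions:
        lag = first_citation_lag(year, citing)
        if lag is not None:
            lags.setdefault(medium, []).append(lag)
    return {medium: mean(values) for medium, values in lags.items()}


# Made-up placeholder editions, NOT project data or results.
sample = [
    ("print", 1990, [1996, 1994]),
    ("print", 2000, [2002]),
    ("digital", 2005, [2006, 2010]),
    ("digital", 2008, []),
]
print(mean_lag_by_medium(sample))
```

Leaving uncited editions out of the average is a design choice: an edition that has not been cited yet has no lag to measure, and counting it as zero would make its medium look faster than it is.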

Where are we right now with regards to the project? Personally, I am becoming oriented with the project’s goals and working to gain access to all of the excellent data collected by Rachel Hanks who worked on the project last year – figuring out where everything was left off and where Dan would like it to go this summer.

I am excited about gathering more information and will share it with you as I progress. It often seems that I gain a better understanding of a project when I explain what is happening to someone else, so I think this blog will be an excellent tool. It will also serve as a good record of what went on at different points during the project for Dan and me. Any questions you might have can be left in the comments section that I believe is located below this post…

Until next week,

Colleen

----  

Essential computer tools and skills for humanities students

Posted: Nov 30, 2014 15:11;
Last Modified: Dec 27, 2014 22:12

---

The Digital Humanities is a hot new field within the Arts. Its practitioners are often at the forefront of developing new topics within ICT itself.

But what if you are not interested in the Digital Humanities? Or are interested in them, but don’t consider yourself particularly computer literate? What are the computer skills you need to thrive in the traditional humanities or get started in DH?

This is the first in what I hope will be a series of tutorials on basic computer skills and tools for students of the Humanities. It should be of use to those just beginning their undergraduate careers, to graduate students hoping to professionalise their research and study, and to researchers and teachers who have other things to do than follow the latest trends and software.

What kind of thing can I learn from this series?

The focus of this series is going to be on basic tools. It is going to assume you know nothing other than how to turn on a computer and get on the internet. It will make some recommendations about basic software, starting with such simple things as browsers. It will also cover some basic techniques: how to use styles in word processors, how to use a citation manager or spreadsheet.

How often will they appear?

I’m going to mark this as a special cluster in my blog (using a special tag, basic computer skills). But I’ll publish them irregularly, as the mood strikes and I have the time. I’m also hoping to get some guest authors involved, mostly students who have done presentations on these things in my classes.

What if I have an idea for a tutorial? What if I disagree with you?

If you have an idea for an article in this series, I’d love to hear from you. If you have already written something on a topic I’m covering and would like me to know about it or link to you, please let me know as well!

Articles in this series

The following are links to the other articles in this series. You can also find them using the tag basic computer skills.

----  

The Lethbridge Journal Incubator: A new business model for Open Access journal publication (Elsevier Labs Online Lectures February 18, 2014)

Posted: Feb 19, 2014 16:02;
Last Modified: Feb 19, 2014 16:02

---

The Lethbridge Journal Incubator: A new business model for Open Access journal publication by Daniel Paul O’Donnell with contributions from Gillian Ayers, Kelaine Devine, Heather Hobma, Jessica Ruzack, Sandra Cowen, Leona Jacobs, Wendy Merkeley, Rhys Stevens, Marinus Swanepoel, and Maxine Tedesco. Elsevier Labs Online Lectures February 18, 2014.

----  
