Thinking Big about Medieval Data

My hat’s off to Burnable Books’s 5-post series on “Medieval Studies in the Age of Big Data” that highlights the vibrancy of work that medievalists are doing with new technologies. The posts are peppered with countless links to cool projects, just as they are infused with a sensible balance of optimism/promise and pessimism/perils regarding the union of big (relatively speaking, of course) data and medieval studies. Yet, in a few important ways, I think the posts at times lost the medieval forest for the manuscript trees, or perhaps the flock of medieval studies for the skins of the sheep. Or something like that.

The posts rarely strayed far from manuscripts, yet it seems like some of the most important challenges facing medievalists in the 21st century have little to do with the manuscripts themselves. We need to think much more broadly about the future of medieval studies, its data, the community practice of producing and interacting with that data, and how training in medieval studies can become much more relevant both to students who go on to careers in a related field and to students who end up working in a totally different one.

The posts nicely complemented each other in the different ways they addressed the nature of digital texts, and what that means for how we read, edit, and interpret them. A quick summary (with quibbles): The first focuses on how medievalists have always had big data, a claim that embraces the magnitude of the multitudes of books that line library stacks, although it uses a definition of data (= physical books) that I do not find particularly helpful. The second tries to bridge the gap between material culture and digital data with a plea for more digitization, even if the responsibility of digitization may be inadvertently misplaced. The third addresses potential gains for paleographical study when coupled with OCR technologies, though it remains focused on individual research rather than community collaboration. The fourth wonders if interest in the digital will only increase our interest in (and lament for) the material culture of books that we will merely view in a flattened and lifeless facsimile. The last addresses an issue that’s near and dear to me: new critical editions and text editing techniques, including the nearly heretical suggestion of moving beyond Lachmann.

Below are 3 issues that I think must be considered in addition to those that the authors have raised.

Firstly, any calls for digitization must also include the challenge of creating new community practices around the process and products of digitization. Timothy Stinson’s post (II of the series) rightly suggests that more digitization is a crucial next step in creating big medieval data. But the post also implies (to me, anyway) that the future of digitization remains something of an institutional problem. While libraries and funding agencies have crucial roles to play, it is the community of practitioners that must take this upon themselves, in two ways: (1) by creating their own open archive of manuscript images, transcriptions, and metadata; and (2) by pressuring libraries that continue to enforce antiquated research practices into letting them do it. Libraries simply cannot be allowed to complain about limited resources to digitize artifacts, but then refuse the free labor that could help them do it, and make no effort to facilitate a willing and eager labor force.

Secondly, medievalists must embrace more modular forms (and forums) of communication that more appropriately fit the often highly specialized (but no less important or significant) work they do. One post questioned whether manuscript descriptions are still “print worthy”. The answer is clearly “no”, but that’s more a judgment of an outmoded medium than of the message itself. These descriptions are far too important to be bottlenecked by the expense and time required to physically print them. They are better suited to online publication, where they will be far more easily found and more widely used. As medievalists can be uniquely and sometimes oppressively limited in what they have access to (or can reasonably use in a project), forcing research into book and article forms is not necessarily a good idea anymore. Major studies and analyses can and should stay in physical monographs, at least for now. But as textual editing environments (many of which are mentioned throughout the posts) make more texts more accessible, much more of the pure descriptive and transcription work can be both accreted and accredited. We need to emphasize the value of connecting unusual manuscripts, not necessarily through expensive and labor-intensive TEI mark-up and canonical names, but through simple plain text descriptions that can be refined over time.

Following the prompt from the series title, many of the posts focused on the big of “big data”. However, I’m not sure that the posts adequately consider the nature of data. For each of these posts, data = literary texts. I certainly don’t dispute that texts can and should be considered as (potential) data, but we also need to take a much broader view of data and how to manage the various genres of data, whether demographic, economic, parish records, death registers, etc., and link them to each other. Thinking about data also means reconsidering (or beginning to consider) the processes by which we create, record, organize, publish, reuse, and link data, both manuscript content and corresponding metadata. Obviously these concerns are being fervently addressed in libraries and archives, and there is no need to reinvent the wheel. But even as data curators think about these issues and data generally, medievalists are not exactly the target audience they have in mind. We medievalists, therefore, must be actively working with each other and with data curators to make sure that the big data that seems tantalizingly close actually comes into reach. We might, for example, see to it that our sometimes unique needs are addressed in larger solutions for managing metadata that are often geared for modern and well-standardized data. And let’s be honest: we might never quite be satisfied. But for the reasons that all the authors writing for the series laid out, it’s certainly worth trying.

Lastly, another crucial change that the idea of medieval data must impress upon the community is the necessity of rethinking the way we train graduate students. Only Deborah McGrady’s post (IV of the series), which comments on experiences of teaching a particular course, mentions the word “student.” Yet our education practices must take a central place when rethinking scholarly practices, especially in a field like medieval studies that, despite the vibrancy of its scholars and their interesting literary and historical analyses, remains rather methodologically conservative. This does not imply a lack of interest in technology, or fear of adopting new tools and techniques. After all, humanities computing was invented to study Thomas Aquinas. But technological adoption, as one post lamented (without much hope for the future), has been little more than medievalists adapting tools to become more efficient at doing what we’ve always done.

More important than any new approach to (re)mediated manuscripts, and perhaps the most important shift in the age of big data, is to train both undergraduate and graduate students with a variety of careers in mind. For those who want to stay in medieval or manuscript studies, we should encourage them to think big, and to dream up historical, literary, and textual projects that they cannot possibly do alone. This isn’t to say that massively open online collaborations are all they should do, but it’s something they should keep in mind when they have the vocational stability to do so. For students who have other futures in mind (willingly or not), we can make more of the fact that the skills required to deal effectively with manuscripts transfer remarkably well to the digital world. Integrating technology into our courses (and, most importantly, our research agendas) helps map the medieval manuscript to virtual vellum. Manuscript studies, in the way that it has always been fundamentally about literary traditions, textual networks, and production circumstances, can be much more explicitly about information literacy in the digital age. But only if we take our own research agendas into new technological territory that forces us to map our manuscript sensibilities onto the new challenges of data. The real promise of bigger medieval data isn’t in its size, but in its connectedness, which presents innumerable challenges that students can take far beyond classroom walls.

As a complement to the 5 posts, I’d suggest that it’s not digitized manuscripts themselves that hold the key to big data. To get at seriously big data—or at least the potential for much bigger (digital and manipulable) data than we’ve had before—we need to blaze new trails across the whole terrain of our scholarly practices, and adapt our methods to technology as much as we’ve adapted technology to our methods.