Coding in the Humanities

One longstanding debate in the Digital Humanities has been the value of teaching programming skills in humanities courses. The main argument in favor: 21st-century humanists need skills to harness growing amounts of (digital) data. The main argument against: programming is too technical a skill, and its methodology is largely antithetical to why people go into the humanities.

On this issue I have remained on the fence for some time, but as I continue to experiment with various text mining projects, and continue to fiddle with my digital history course, I am now convinced that basic techniques for data manipulation should be taught as part of the humanities curriculum. Firstly, data manipulation is as fundamental as any other skill related to reading texts (broadly conceived), including sorting and organizing source material. Of course not all humanists use texts as their primary object of inquiry, but because textual sources often feature prominently (if not exclusively) in research, humanists in general have much to gain by learning how to manipulate the growing body of digital texts with simple but powerful tools. Secondly, not only is data manipulation not antithetical to humanistic research methodology, but it facilitates exactly what the humanities are about: embracing multiple perspectives and engaging with source material in multiple ways. As more and more sources become available online as data (not just as images), humanists need tools to manage and explore them. As Ann Blair has recently described of the early modern period, information overload is hardly a new problem. But if the problem of abundance worries the humanist who relies on project-delimiting scarcity, it is a problem to be embraced rather than avoided.

Stephen Ramsay’s provocative post on using the command line (and a follow-up) extols the freedom one gains from not being limited to any particular graphical interface. The command line is “faster, easier to understand, easier to integrate, more scalable, more portable, more sustainable, more consistent, and many, many times more flexible than even the most well-thought-out graphical apps.” I fully support Ramsay’s energetic and engaging plea to be more efficient and autonomous with our digital tools. (I often wonder why more people don’t learn simple keyboard shortcuts for things they do all the time…) But his comparison of the command line and graphical interfaces can sound like a replacement argument that may simply go too far for most humanists. Design matters, especially for new or infrequent tasks and for visual learners, and playing down design because the result is faster and simpler does not make processes more efficient in practical use.

Even if I disagree with the extent of Ramsay’s argument, his point about flexibility is spot on. Whether with the command line, as he argues, or with basic tools and techniques for manipulating texts, as I argue here, researchers gain much greater freedom of exploration. Humanists should not be limited to whatever texts are easily viewable, physically or digitally. Nor should they be limited to using only those texts that are digitally findable or available for download. But neither can one simply ignore such texts; combining approaches is obviously most powerful here. Furthermore, it is no secret that libraries and archives struggle mightily against budget (and many other) constraints to digitize and to make archival and textual data available to us. We cannot expect them also to provide and maintain comprehensive and intuitive interfaces for accessing and manipulating that data.

Is data manipulation really necessary, though? It’s as necessary as any other methodology we learn. Humanists spend inordinate amounts of time learning how to read texts and how to read between the lines in case their research brings them to certain kinds of sources. We learn how to search, how to identify and explore relevant contexts, and how to fairly extrapolate (or do I mean create?) evidence from the tiniest molecule in a primordial semiotic soup. This isn’t limited to literal texts. Images, art, film, games, and of course the conventional text: each of these has ‘textual’ challenges, but we are trained to deal with them to the extent they are relevant for our research interests. But what happens when we have more sources than we can really deal with by hand? Can we just zero in on the sections that are most relevant? What can we see, not with beautiful visualizations, but simply by reformatting a text file to highlight different aspects of it? It would be like reading 10,000 documents 50 different times, looking for something else each time…but in a few hours. Simple scripting gives us amazingly powerful tools for this, tools that complement methods we already use.
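To make the idea of reformatting a text to highlight one aspect of it a little more concrete, here is a minimal sketch of a keyword-in-context listing. It is only an illustration under stated assumptions: Python is my choice of language, and the function name, the document names, and the sample sentences are all hypothetical stand-ins for real source files.

```python
import re


def keyword_in_context(text, keyword, width=30):
    """Return every whole-word occurrence of keyword, bracketed,
    with up to `width` characters of context on either side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        left = text[start:match.start()]
        right = text[match.end():end]
        hits.append(f"{left}[{match.group(0)}]{right}")
    return hits


# Hypothetical sample documents; in practice these would be read from files.
documents = {
    "letter_01.txt": "The harvest failed again this year, and the village fears the winter.",
    "letter_02.txt": "We sold the grain early. The Harvest was better than anyone expected.",
}

for name, text in documents.items():
    for hit in keyword_in_context(text, "harvest", width=15):
        print(f"{name}: {hit}")
```

Changing the keyword and rerunning the loop is the scripted equivalent of rereading the same documents while looking for something else each time.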

This freedom to explore sources can be wickedly addictive. Recently, after offering some basic CSS training to a former student who was curious but knew nothing about it, I was reminded of one of the reasons that I’ve always enjoyed learning rudimentary scripting languages and why I’m going to teach them from now on: Even the tiniest ability to make a computer do what one wants rather than only what software allows is tremendously empowering. This experience alone could encourage more historians to take up new digital methodologies; the reason for incorporating scripting and manipulation techniques into courses isn’t necessarily to impart any particular technical skill. Learning any particular scripting language is far less important than taking steps to unlock exploratory potential that really allows you to dig into the new kinds of research questions that everyone has been promising for so many years. Unfortunately, it seems that the rhetoric about new possibilities is far more prominent in project grants than in humanities courses. Humanists get excited about complex tools that can do new things, but then lament that they cannot really use them in the way they need to.

Isn’t writing code just too technical for a historian? I could answer ‘no’, but in fact I reject the premise of the question: that somehow the level of difficulty of learning essential methodologies for source material could be a legitimate criterion for what counts as appropriate or necessary. But even if it’s not too technical, isn’t programming just a different kind of job than what the typical humanist does (or wants to do)? It is difficult to believe that someone who can learn to read, if not speak, several languages, decipher cryptic handwriting, analyze abstract concepts and synthesize hundreds if not thousands of complex documents is somehow fundamentally incapable of learning how to string together a handful of instructions that perform simple tasks like finding a certain string in a text file. As Ramsay points out, scripting languages are orders of magnitude simpler than the ones we use every day and should not be seen as only for elite power users.
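For a sense of scale, the handful of instructions needed to find a certain string in a text file might look something like the sketch below. It assumes Python, and the function name, the file name, and the sample text are hypothetical, written to a temporary folder only so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def find_lines(path, needle):
    """Return (line number, line) pairs for each line of the file that contains needle."""
    matches = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                matches.append((number, line.rstrip("\n")))
    return matches


# Hypothetical sample text, saved to a temporary file for the demonstration.
sample = (
    "Dear sir,\n"
    "The archive is closed on Mondays.\n"
    "Please write to the archive first.\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "sample.txt"
    path.write_text(sample, encoding="utf-8")
    for number, line in find_lines(path, "archive"):
        print(number, line)
```

The search here is case-sensitive and matches substrings, which is the simplest possible choice; a real project would likely want to decide both of those things deliberately.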

Although it may seem pedantic, a distinction between programming and scripting may be useful to lower the entry barrier. I am not attempting here to construct a rigid philosophical distinction that will hold up at all levels of scrutiny, only one that illustrates my point. Good scripting might be considered as the creative combination of simple and straightforward commands, even if the syntax can get a little ugly. Programming, on the other hand, might be considered as requiring a much higher level of sophistication and complexity, solving much more elaborate problems (security, accessibility, scalability, reusability) that take considerable design and coding experience to accomplish successfully. Of course scripting can get very complicated, and it can be considered a kind of programming. Scripting is like following a recipe, in many ways not unlike the various research methodologies we learn in order to read different kinds of texts. As one learns basic techniques, creative and exploratory potential increases exponentially.

I don’t mean to suggest that teaching programming to historians is a new idea. About three years ago, Bill Turkel and Alan MacEachern published online the “Programming Historian,” which introduced basic Python scripting to the historian. Perhaps they were ahead of their time, and historians could not see the value in learning what looks prima facie like a cryptic language. Maybe they simply didn’t consider themselves programmers. At any rate, the discussion about coding continues, and the idea is worth revisiting. [Importantly, The Programming Historian has taken on a new life.]

I have no desire to change the working habits of historians who enjoy whatever processes and tools allow them to be successful. But it seems that not teaching students the most basic of tools to query a large and ever-expanding subset of source/data does them a fundamental disservice by limiting the kinds of material they can use, the kinds of questions they can ask, and perhaps even the kinds of careers they can have. Whether it can be done well remains to be seen, but it seems both a necessary and fruitful undertaking.