Lessons from Teaching Historians How to Code
My inaugural version of a course on “programming for historians,” colloquially and better known as #clio3, finished a few weeks ago. As syllabi for teaching difficult technical skills to historians (or other humanists) remain scarce, I thought it might be worth sharing a few of the more important lessons learned. These suggestions mostly come directly from the hardworking, dedicated, and insightful students and the breadcrumbs they’ve left on the course blog. I have simply tried to flesh them out a bit here.
A good first few weeks of any course are usually crucial for its success, but the stakes are even higher for this kind of course that builds on itself so extensively from one week to the next. Some of the most important things that the students taught me had to do with adjusting the focus of the first few lessons.
Strong early focus on data manipulation/scrubbing techniques with standard command-line tools like vim, sed, and grep, and with regular expressions. As a class we had the right idea (the topic was scheduled for the second week), but it should have run at least two, if not three, weeks to build core skills and get data ready for the analysis and visualization lessons that come later. I envisioned that we’d eventually do a lot of data scrubbing with PHP and Python scripts, but in practice it simply took too long to learn how to do it effectively because many basic programming concepts hadn’t been covered yet. Besides, regular expressions, applied through one mechanism or another, are often the best tool for cleaning up text files, and they can always be integrated into scripts later for reuse. Another reason I didn’t include more of these lessons is that I thought they would be too specific to particular datasets and boring to everyone else. Even if that were true, it would have been valuable to spend more time early on general strategies and a few techniques for fixing systemic errors, so that it would be easier for students to learn what they needed throughout the semester (and hopefully blog about it).
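As a rough illustration of what "a few techniques for fixing systemic errors" might look like once regular expressions are folded into a script, here is a minimal Python sketch. The three error patterns (bracketed page markers, a lowercase "l" misread for the digit "1" at the start of a year, and runs of extra whitespace) and the function name `scrub_line` are hypothetical, chosen only for illustration; they were not part of the course materials, and real datasets will need their own patterns.

```python
import re

# Hypothetical patterns for illustration; adapt them to the errors in your own data.
PAGE_MARKER = re.compile(r"\[p\.\s*\d+\]")   # e.g. "[p. 12]"
YEAR_WITH_L = re.compile(r"\bl(\d{3})\b")    # e.g. "l863", where "l" should be "1"
EXTRA_SPACE = re.compile(r"\s+")             # any run of whitespace


def scrub_line(line: str) -> str:
    """Apply a few systematic fixes to one line of transcribed text."""
    line = PAGE_MARKER.sub("", line)        # drop stray page markers
    line = YEAR_WITH_L.sub(r"1\1", line)    # repair "l863" -> "1863"
    line = EXTRA_SPACE.sub(" ", line)       # collapse whitespace runs
    return line.strip()


if __name__ == "__main__":
    sample = "  Richmond,   l863 [p. 12]  was   quiet. "
    print(scrub_line(sample))  # prints "Richmond, 1863 was quiet."
```

Each substitution could just as well be run once from sed or inside vim. Keeping the substitutions in a small function is one way to make the same fixes repeatable across many files.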
Incorporate a code repository like GitHub into the course workflow right away. Since the course was already over-ambitious, code repositories seemed like something that could wait. But actually using GitHub really emphasizes the practice of collaboration and encourages thinking about working on larger projects even if the course is mostly about individual or small group projects. In our case, using gists proved invaluable for sharing code between everyone and staying organized, which can be deceptively difficult. In retrospect, it would have been better to formalize and standardize this process early on. Doing so would have made our communal efforts at debugging easier as well.
We needed more short assignments/challenges each week that clearly focused on developing certain skills or addressing clearly-defined problems. I figured that such assignments would be too frustrating even for highly capable students with a wide range of experience and interests and ongoing digital projects. In a future version, it may be worth trying larger group programming assignments that would effectively require students to be constantly sharing and updating code.
I should have given specific deadlines and concrete examples for students who needed to create their own datasets. Even before the course started, I encouraged students to accumulate data that would be useful to them, but I didn’t give them, or have for myself, a clear idea of what formats would be useful for later lessons/assignments on data mining or visualization. I should have emphasized more that the course was never about big data but rather extensible data (learning and reusing techniques for adding and managing data rather than analyzing gigabytes of it), so that no one felt they had to put together a lot of data to use for the course. It also would have been helpful to provide sample historical data for students whose own projects did not have any readily available, so that they could jump into assignments without feeling confused about how to use a specific technique.
It might be better to forgo a “final project,” which I assigned with hopes that it would serve as a rallying point for students’ work and would be an extension of their dissertations. But it became something to worry about in its own right, and somewhat discouraged playful exploration of technologies or techniques that obviously wouldn’t have been a part of it. This limitation imposes itself less at the beginning of the course, when everyone is optimistic about everything, but it becomes a much larger burden as the semester begins to wind down.
Instead, the class sentiment was that students should divide into groups according to their digital needs and interests. As a whole, students would systematically build up one or maybe a few larger projects over the course of the semester; small groups could be formed to tackle certain aspects of it. I contemplated doing this even this past semester, but thought it would be too frustrating for students to work on projects that were not their own. The students made the very important point that not being able to work on something immediately (many lessons were not immediately applicable to their own projects) requires time later to review and maybe relearn course material. This kind of approach would also make it imperative to introduce source code version control and best practices of social coding.
To end on a positive note, I’ll close with one thing that worked well and will continue next time: Having students create tutorials and present introductions to various topics. This proved valuable and useful not only as learning experiences for the presenter/blogger, but also in creating course resources that people were able to refer to and use throughout the semester. No one was expected to become an instant expert, but rather act as a guide for getting started with a new technique. Tackling unknown technologies was encouraged; I tried to make it very clear that part of the expected course work is to fail, and that when students got stuck they should stop and blog about it. Many of them eventually solved their own problem, but even if not, that’s fine. Explaining the problem helps other people in their own work and the act of writing it out helps the author as well. I hope the students are as proud of their work on the course blog as I am.