Exploring digital history with NLA’s Tim Sherratt

Posted 10 September 2013 - Feature stories

There are a host of online tools emerging that allow you to query data and share research in exciting and innovative ways. Tim Sherratt, the manager of Trove, is one of the developers and digital historians leading the way, with projects including Mapping Our Anzacs and Invisible Australians. Here, Tim looks at the tools worthy of every family historian’s attention.

I blame history for destroying my eyesight. As a young researcher I spent many hours hunched over microfilm readers, squinting and straining at blurry newspaper text in the hope of uncovering something relevant. Now a simple keyword search in Trove reveals thousands of useful articles. The nature of research has changed. The eyesight of future generations of historians is safe.

Trove, the National Library of Australia’s online discovery service, contains the full text of more than 60 million newspaper articles from 1803 onwards. It’s an incredibly rich resource. But there’s more. You can read millions of digitised books through Google Books or the Internet Archive. You can explore published research through ever-expanding digital repositories. Web-based collections of historic photographs are growing. Archives and museums are opening up their collection databases. What are we waiting for?

With this feast of digital goodness also come new challenges. The sheer volume of material available online can be overwhelming. How do we find, manage, use and interpret these digital riches? Fortunately, new tools and technologies are evolving as “digital history” confronts the unfamiliar challenges of abundance.

Beyond discovery
We all know how to search. Keyword searching quickly follows “point and click” when we first learn to navigate the web. But even this familiar technology can have powerful consequences. As more and more books and documents become available in forms that computers understand, keyword searching can take us deep into the content itself. Trove has liberated the text of millions of newspaper articles using optical character recognition (OCR) software. As a result, we can connect with the small stories and fragmentary details that lurk beneath the headlines. The names of ordinary people become points of access.

Discovery is only the first step, though. Once sources are online, we can start to explore them in different ways, looking beyond their individual content to find new patterns and contexts. It’s difficult to see trends in an ordinary list of search results. However, if you take those results and graph them over time, you can start to observe large-scale changes.

QueryPic is an online tool that lets you do just that. You feed it a word or phrase and it queries the Trove newspaper database, displaying the number of results year by year on a line graph. It’s very simple, but remarkably useful.
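The counting at the heart of a tool like QueryPic can be sketched in a few lines of Python. This is not QueryPic's own code: the sketch assumes you already have the publication dates of the matching articles (a real tool would more likely ask the search service for per-year totals than download every hit), and it prints a crude text chart in place of a line graph. The function names are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable
from datetime import date


def counts_by_year(article_dates: Iterable[date], start: int, end: int) -> dict[int, int]:
    """Tally matching articles per year, giving years with no hits a zero."""
    tally = Counter(d.year for d in article_dates)
    # Every year in the range gets an entry, so a graph drawn from it has no gaps.
    return {year: tally.get(year, 0) for year in range(start, end + 1)}


def text_chart(counts: dict[int, int]) -> str:
    """Render one row of '#' marks per year: a crude text stand-in for a line graph."""
    return "\n".join(f"{year} {'#' * n}".rstrip() for year, n in sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical publication dates of articles matching a search.
    hits = [date(1901, 12, 24), date(1901, 12, 25), date(1903, 12, 20)]
    print(text_chart(counts_by_year(hits, 1901, 1903)))
```

Filling empty years with zero matters: a year with no matches is itself information, and leaving it out would make the graph jump straight from 1901 to 1903.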

QueryPic, built by Tim Sherratt, charts the usage of "Santa Claus" versus "Father Christmas" in newspaper articles on Trove.

By zooming out and giving you an overview of your search results, QueryPic encourages you to explore. You can follow a hunch, refine a research question, or challenge your own assumptions. If you want to dive deeper, just click any point on the graph to retrieve the first 20 matching articles for that year.

QueryPic provides suggestions, not answers. The newspaper database is not complete. The text of the articles, extracted by OCR software, is often inaccurate. The Trove search engine is optimised for discovery rather than analysis, so results can sometimes be deceptive. One of the most important lessons when working with online historical resources is to take nothing for granted. What is it that I’m actually searching? What’s missing? How does this search interface actually work?

Digital DIY
A second important lesson is that you don’t have to just take what you’re given. QueryPic is a free service developed not by the National Library of Australia (NLA), but by me. I wanted it, so I built it. Digital history and, more broadly, the digital humanities are energised by an exciting and liberating do-it-yourself ethos. If a particular tool doesn’t exist, make it. If it doesn’t do quite what you want, change it. And then share your results.

In this case, I was able to build QueryPic using an application programming interface (API) provided by the NLA. An API allows computer programs to talk to each other. The Trove API delivers search results not as web pages, but in a form that other machines can easily understand and manipulate. APIs make the creation of new digital tools much easier. Perhaps you’d like to use QueryPic to search New Zealand newspapers? Well, now you can, because DigitalNZ has an API that searches the National Library of New Zealand’s Papers Past repository. All I had to do was plug it in.
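To make "plug it in" concrete, here is a minimal sketch of how one tool can sit on top of more than one API. The two JSON bodies and their field names are invented for illustration; they are not the real Trove or DigitalNZ formats, which you should check in each service's own documentation. The point is that machine-readable responses reduce each new source to one small parser.

```python
import json
from typing import Callable

# Hypothetical, simplified response bodies. Real APIs such as Trove's or
# DigitalNZ's use their own, richer JSON structures and field names.
SOURCE_A_RESPONSE = '{"records": [{"heading": "Santa Claus arrives", "issued": "1901-12-24"}]}'
SOURCE_B_RESPONSE = '{"results": [{"title": "Father Christmas", "published": "1902-12-20"}]}'


def years_from_source_a(body: str) -> list[int]:
    """Pull the year out of each record in the hypothetical source A format."""
    return [int(rec["issued"][:4]) for rec in json.loads(body)["records"]]


def years_from_source_b(body: str) -> list[int]:
    """Same job for the hypothetical source B format, which names things differently."""
    return [int(rec["published"][:4]) for rec in json.loads(body)["results"]]


# Supporting a new API means writing one small parser and registering it here;
# everything downstream (counting, graphing) works on plain lists of years.
PARSERS: dict[str, Callable[[str], list[int]]] = {
    "source_a": years_from_source_a,
    "source_b": years_from_source_b,
}

if __name__ == "__main__":
    for name, body in [("source_a", SOURCE_A_RESPONSE), ("source_b", SOURCE_B_RESPONSE)]:
        print(name, PARSERS[name](body))
```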

But the determined digital historian isn’t dependent on APIs either. There are other tools and technologies that help you gather and manipulate information from large cultural datasets.

The Invisible Australians project is an example. It aims to bring together information about the workings of the White Australia Policy from a range of sources, including the National Archives of Australia (NAA). As an experiment, I harvested around 12,000 digital images of certificates used in the administration of the policy from the NAA’s collection database. These certificates usually include portrait photographs, so I modified a facial detection script I found via Google to identify and extract the portraits from the certificates.
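The post doesn't say which facial detection script was adapted. Detectors such as OpenCV's Haar cascade classifier typically report each face as an (x, y, width, height) box, drawn tightly around the face. The sketch below covers only the step after detection: enlarging that box into a portrait-sized crop that stays inside the image. The 0.5 margin and all names are assumptions for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A rectangle in pixels, in the (x, y, width, height) form detectors commonly return."""
    x: int
    y: int
    w: int
    h: int


def portrait_box(face: Box, img_w: int, img_h: int, margin: float = 0.5) -> Box:
    """Grow a detected face box by `margin` times its size on each side
    (to take in hair, collar and shoulders), then clamp it to the image."""
    dx = int(face.w * margin)
    dy = int(face.h * margin)
    left = max(0, face.x - dx)
    top = max(0, face.y - dy)
    right = min(img_w, face.x + face.w + dx)
    bottom = min(img_h, face.y + face.h + dy)
    return Box(left, top, right - left, bottom - top)


if __name__ == "__main__":
    # Hypothetical detections on a 400 x 300 pixel certificate scan.
    print(portrait_box(Box(100, 80, 60, 60), 400, 300))  # Box(x=70, y=50, w=120, h=120)
    print(portrait_box(Box(10, 10, 60, 60), 400, 300))   # Box(x=0, y=0, w=100, h=100)
```

To cut the portrait out with Pillow, for example, the result maps onto `image.crop((b.x, b.y, b.x + b.w, b.y + b.h))`, since `crop` takes a (left, upper, right, lower) tuple.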

The faces of those affected by the White Australia Policy: Tim Sherratt's project, Invisible Australians, used digital images to build a patchwork of photographs.

What I ended up with was a wall of faces — a compelling and unsettling representation of the White Australia Policy in action. Instead of the records, it’s the people who are brought into focus.

Yes, it’s true — I was only able to do this because I know how to code (to write computer scripts and programs). But I taught myself and you can, too. Resources like The Programming Historian are a great way to get started.

In any case, you’ll often find that someone else has already done a lot of the hard work. Instead of starting from scratch you can modify their work and then, in turn, share your own additions. My changes to the facial detection script were picked up and improved by Chris McDowall at DigitalNZ. Chris created a portrait browser drawing on photos from Auckland Libraries called People From Another Time.

Of course, you don’t have to be able to code to explore the possibilities of digital history. Tools like QueryPic make it easy to get started. All you need is a sense of adventure and a willingness to experiment.

Capture, tag and release
Zotero, for example, is an open-source research manager that captures and organises your sources. It does more than just collect a set of bookmarks; with a single click it can harvest metadata from a wide range of publications, library catalogues and databases. If you’re researching newspapers in Trove you can easily grab article details and save them into your own research database along with a PDF copy and full text of the article.
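Zotero's one-click capture relies on site-specific "translators", so what it can grab varies from site to site. One widely used convention that generic metadata harvesters can read is a set of `citation_*` meta tags in a page's HTML. The sketch below is not Zotero's code and makes no claim about how Trove pages mark up their metadata; it only shows, with Python's standard library, how embedded metadata of this kind can be lifted from a page.

```python
from html.parser import HTMLParser
from typing import Optional


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_..." content="..."> tags from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, list[str]] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Some fields (e.g. citation_author) can repeat, so keep every value.
        if name.startswith("citation_") and content is not None:
            self.fields.setdefault(name, []).append(content)


def harvest_citation_meta(html: str) -> dict[str, list[str]]:
    """Return every citation_* field found in an HTML document."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    # A hypothetical page header, not copied from any real site.
    page = """<html><head>
    <meta name="citation_title" content="Christmas Eve in the City">
    <meta name="citation_date" content="1901/12/24">
    <meta name="description" content="Not a citation field">
    </head><body></body></html>"""
    print(harvest_citation_meta(page))
```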

Once you have your sources in Zotero, you can add notes and tags, share your finds and create public or private groups. It’s not just a tool; it’s a collaborative research environment. And yes, it’s free, built by digital historians for all to use, share and improve.

An instant exhibition
Beyond grappling with the growing abundance of online sources, digital history also offers new ways of presenting your research. Have you ever wanted to do something more with the collection of documents or photographs you’ve assembled? Why not create your own online exhibition using Omeka?

With Omeka you can organise collections, add metadata and weave narratives. But beyond this basic functionality, a growing library of plugins created by the Omeka community brings new possibilities for enhancing and exploring your content. There are plugins for enabling transcription projects, for geolocating objects, or for visualising them on a timeline. You can even automatically import items from Zotero.

You can either download and install Omeka on your own web server, or set up an Omeka account at www.omeka.net. Basic accounts are free, so there’s nothing to stop you having a go right now.

