On February 23rd, I attended a morning plenary session on Linked Open Data for Libraries, Archives, and Museums (LOD-LAM), and Prof. Gold asked that I post a few comments here. The session was co-organized by the Metropolitan New York Library Council (METRO), NYPL Labs, and NYU. I chose to attend the session for a few reasons:
1. it was free
2. it was at a time I could arrange child care
3. in library school I often heard murmurs about linked data, but, being in the special collections track, I didn’t actually spend any time learning about it
So, I was a little surprised when Prof. Gold asked me to report back because I didn’t immediately see the connection to DH. The session I attended was specifically geared towards cultural heritage organizations, which, one could say, house the raw materials of humanities research. As I soon learned, though, it is the idea of linked open data itself that is relevant to DH. It is a way of dismantling the silos of metadata that each institution has built so that our systems can communicate with each other. That last part is key. In order for systems (i.e., not people) to communicate, things need to be more uniformly identified.
In come the Uniform Resource Identifiers (URIs), which, when combined with the Resource Description Framework (RDF), account for the “linked” part of all this. URIs are like unique IDs for people/events/things and the relationships between them. They serve as the translators between different sets of data. When an institution releases its information as linked data, that information has been mapped to URIs, which allows other systems to link their own data to the same URIs. It sounds complicated, and there were a lot of scary flow charts up on the screen at various points, but it is really just an effort to translate idiosyncratic descriptions into consistent, shareable data using standardized reference points. It also provides a level of disambiguation, so that a search for Venus (the planet) can be linked to other information on Venus (the planet) instead of Venus (the tennis player) or Venus (the goddess). It is not hard to imagine the possibilities for research in such a world.
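For anyone who, like me, finds a small example easier to follow than a flow chart, here is a rough sketch of the Venus case in Python. It is only an illustration under my own assumptions: every URI below lives under example.org and is made up, the two "institutions" and their data are hypothetical, and plain tuples stand in for RDF triples (subject, predicate, object) rather than any real RDF library.

```python
# A toy model of linked data: each statement is a (subject, predicate, object)
# triple, and subjects and predicates are identified by URIs.
# All URIs are hypothetical placeholders under example.org.
VENUS_PLANET = "http://example.org/id/venus-planet"
VENUS_PLAYER = "http://example.org/id/venus-tennis-player"
VENUS_GODDESS = "http://example.org/id/venus-goddess"
SUN = "http://example.org/id/sun"

LABEL = "http://example.org/vocab/label"
TYPE = "http://example.org/vocab/type"
ORBITS = "http://example.org/vocab/orbits"

# Two hypothetical institutions publish their data independently.
museum_triples = {
    (VENUS_PLANET, LABEL, "Venus"),
    (VENUS_PLANET, TYPE, "planet"),
    (VENUS_GODDESS, LABEL, "Venus"),
    (VENUS_GODDESS, TYPE, "goddess"),
}
newspaper_triples = {
    (VENUS_PLAYER, LABEL, "Venus"),
    (VENUS_PLAYER, TYPE, "tennis player"),
    (VENUS_PLANET, ORBITS, SUN),  # same URI as the museum's planet
}


def merge(*datasets):
    """Combine datasets; shared URIs are what link them together."""
    return set().union(*datasets)


def find_by_label(triples, label):
    """Return every subject URI that carries the given text label."""
    return sorted(s for s, p, o in triples if p == LABEL and o == label)


def describe(triples, subject):
    """Return all (predicate, object) pairs known about one subject URI."""
    return sorted((p, o) for s, p, o in triples if s == subject)


merged = merge(museum_triples, newspaper_triples)

# The label "Venus" alone is ambiguous: it matches three different URIs.
candidates = find_by_label(merged, "Venus")

# Asking about one URI returns only facts about that one thing,
# drawn from both institutions.
planet_facts = describe(merged, VENUS_PLANET)
```

The point of the sketch is that the text label "Venus" matches three things, while the URI for the planet pulls together statements from both sources and nothing about the tennis player or the goddess.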
The event was interesting and insightful, although the presentations ran long and the panel discussion was cut short. After a brief intro to LOD-LAM by Corey Harper (Metadata Librarian at NYU and one of the organizers of the session), Rebecca Guenther walked us through controlled vocabularies and showed us some of the resources the Library of Congress has developed in this area. Todd Carter showed us the innovative way that his company, Tagasauris, crowdsourced metadata for Magnum Photos. He spoke of breaking down metadata creation into microtasks, such as asking a user to click whether a photo was taken during the day or at night. The real fun began when Evan Sandhaus gave us a tour of how the New York Times is using linked data, in which we got a chance to see some of these principles in action. Finally, we were treated to a demonstration of Google Refine as part of the Free Your Metadata presentation by Seth Van Hooland and Erik Mannens. The good folks at METRO have posted video, photos, and presentation slides from the event for anyone interested in seeing more.
Part of DH work involves turning textual information into machine-readable data, whether for data mining, geospatial analysis, or creating thematic archival collections. Texts are encoded so that computers can process them, allowing levels of analysis that we never before imagined. In a way, linked open data is the web equivalent of this. Early web pages consisted mainly of textual information, using HTML and CSS to organize and display that information. Many websites now draw on large datasets to function, and linked open data is a vision of how to make these datasets available and able to communicate with each other, to facilitate the development of better tools and provide a more robust and enjoyable web experience. If this all sounds rather utopian, you’ll find that the charge of over-hyping its transformational potential is a criticism that the LOD crowd has in common with DHers.
Any other thoughts on the relationship between linked open data and digital humanities? Please share! And feel free to correct me in any of my (admittedly novice) descriptions above.