Open Mapping Resources

Post about StreetEasy switching from Google Maps

https://plus.google.com/118383351194421484817/posts/foj5A1fURGt

One of the biggest tools mentioned is TileMill, put out by MapBox

http://mapbox.com/tilemill/

And OpenStreetMap

http://www.openstreetmap.org/

It looks like Foursquare very recently made a similar move, also using TileMill/MapBox tools

http://blog.foursquare.com/2012/02/29/foursquare-is-joining-the-openstreetmap-movement-say-hi-to-pretty-new-maps/

Posted in Uncategorized | 1 Comment

Graphs, Maps, and Trees / Project Idea

In the spirit of generating discussion I’d like to put an idea I had out there. I am sure it needs a lot of refinement, but I am curious whether other people see this as a feasible or informative line of work.

One of my observations on Moretti was that he seemed to start with a detailed understanding of the material (via close reading), and then generate his higher-level views based on patterns that he recognized. This may be a kind of distant reading, but it still requires a great deal of knowledge of, and insight into, the very particulars of the subject matter (who here can name the 30-odd categories of Victorian literature?). How could we go the other direction, and generate our insights without presupposing prior detailed knowledge?

My idea was to use word frequencies to date books. We could reasonably expect some words or phrases to occur more in certain time periods than in others, whether idioms, references to historical events, or new coinages. Instead of having an expert in a particular period generate a list of key words, we could take a sampling of writing (inside and outside the canon) from each period of interest, run it through a computer program, and build a statistical model of the frequencies with which words occur. Then, given a writing sample of unknown date, we could determine which period’s model its word frequencies most resemble, and conclude that the sample was likely written in that period.
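
To make the pipeline concrete, here is a minimal sketch in Python, under the assumption that we have plain-text samples sorted by period; the inline snippets and period labels below are toy stand-ins for real corpora. It builds a smoothed word-frequency model per period, then picks the period whose model gives an unknown sample the highest log-likelihood (essentially a naive Bayes classifier):

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def build_model(texts):
    """Pool word counts from all known samples of one period."""
    model = Counter()
    for text in texts:
        model.update(word_counts(text))
    return model

def log_likelihood(sample, model, vocab_size):
    """Score a sample under a period model, with add-one smoothing
    so a single unseen word doesn't zero out the probability."""
    total = sum(model.values())
    score = 0.0
    for word, n in word_counts(sample).items():
        score += n * math.log((model[word] + 1) / (total + vocab_size))
    return score

# Toy stand-ins for real sampled corpora, inside and outside the canon.
periods = {
    "1800-1850": build_model(["the carriage swept along the moor at dusk"]),
    "1950-2000": build_model(["the subway car rattled past the neon signs"]),
}
vocab = {w for m in periods.values() for w in m}

unknown = "a carriage waited on the moor"
best = max(periods, key=lambda p: log_likelihood(unknown, periods[p], len(vocab)))
print("Most likely period:", best)
```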

If this approach actually works, would it be possible to look at other quantitative aspects of writing, and to apply them to groupings other than the date written? There are a number of statistical models that look at the patterns of the words themselves (which words are more or less likely to follow others). Could we generate a model for something as specific as an individual writer, and determine whether a given sample is that writer’s work?
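
A hedged sketch of that sequence-based idea: a first-order Markov (bigram) model per writer, which scores an unknown sample by how probable its word-to-word transitions are under each writer’s model. The mini-corpora and attributions below are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def bigrams(text):
    """Split into lowercase words and pair each word with its successor."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))

def build_bigram_model(text):
    """For each word, count how often each other word follows it."""
    model = defaultdict(Counter)
    for w1, w2 in bigrams(text):
        model[w1][w2] += 1
    return model

def transition_score(sample, model, vocab_size):
    """Add-one-smoothed log-probability of the sample's word transitions."""
    score = 0.0
    for w1, w2 in bigrams(sample):
        total = sum(model[w1].values())
        score += math.log((model[w1][w2] + 1) / (total + vocab_size))
    return score

# Invented mini-corpora standing in for two writers' known works.
models = {
    "Writer A": build_bigram_model("it was the best of times it was the worst of times"),
    "Writer B": build_bigram_model("the sea was calm and the sky above the sea was grey"),
}

# Shared vocabulary size, used as the smoothing denominator.
vocab = set()
for m in models.values():
    for w1, followers in m.items():
        vocab.add(w1)
        vocab.update(followers)

unknown = "it was the best of the sea"
best = max(models, key=lambda w: transition_score(unknown, models[w], len(vocab)))
print("Most likely writer:", best)
```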

While this may seem far-fetched, these techniques have been extremely successful in creating spam filters for email: you take an unknown sample and try to determine whether it is more similar to known spam or to known “real” email. This is why spammers first started misspelling words (“viaagra”), then appended lists of random words to the bottom of their messages, and now paste in complete sentences taken from actual books. I will leave out the math for now, but two of the well-studied methods are Bayesian filtering and Markov modeling.

Hopefully I haven’t mortally offended anyone by proposing we treat the canon of western literature like spam . . .

Posted in Uncategorized | 4 Comments

Lev Manovich’s “Trending: The Promises & Challenges of Big Social Data”

Lev Manovich’s article in Debates in the Digital Humanities on big social data poses a challenge to DHers to consider the ways in which they (we) can begin to use huge data sets (e.g., Flickr photos or YouTube videos) to deepen and improve our humanities and social science research. LM draws an important distinction between “surface” and “deep” data. An example of the former is U.S. census data, which offers a once-a-decade, macro-level “snapshot” of the American population (though, I would suggest, when you look at census data disaggregated, i.e., on the original enumeration forms filled out by the census takers, you get something closer to “deep” data about individuals). He illustrates the latter with a psychologist’s engagement with an individual patient over time, which generates a full sense of an individual’s life (not exactly data we have access to, though). He goes on to suggest that the explosion in social media has blurred the boundaries and distinctions between deep and surface data sets.

That said, LM then offers four “objections” to the optimistic view that available social media data will usher in a brave new world of new research vistas. The first is the obvious fact that large social media companies (e.g., Google, Facebook) limit researchers’ access to their data. Second, LM cautions that we need to be careful about issues of authenticity when we read data from social networks, because so much of what individuals do on social media is performative and thus not necessarily an accurate depiction of their lives. Third, he rejects the current notion that we no longer have to choose between deep and surface data. LM reminds us that these types of data are, indeed, different, and that their uses and purposes can and should vary depending on the particular research questions researchers decide to ask. He also suggests that we not allow the sheer size and availability of large data sets to frame (or, even worse, limit) the kinds of humanities and social science research questions we choose to ask. And fourth, because many big data problems are technically and organizationally complex and thus difficult to solve, LM notes that it is hard for humanists to collaborate with computer scientists and other specialists across large disciplinary boundaries and silos.

LM notes that these four objections hardly exhaust the possible problems and limitations of big data access (he notes privacy concerns as one big area he hasn’t addressed). That said, he remains, in the conclusion to the piece, optimistic about what can still be done within the limited parameters he describes (“the possibilities are endless”), with the added qualification, “if you know some programming and data analytics and also are open to asking new types of questions about human beings, their social lives, their cultural expressions, and their experiences.” (473)

My final question after finishing LM’s very smart article was whether technical limitations on the part of researchers are the biggest problem for DH, or whether DHers simply aren’t yet asking the right “new types of questions.” What do you think?

Posted in Uncategorized | Tagged , , , | 1 Comment

Franco Moretti: Graphs, Maps, Trees

Moretti’s Graphs, Maps, Trees opens the door to endless possibilities. I really enjoyed it because it discusses the very things I have always found to be missing in literary criticism. The field is largely based on close reading, but why can’t we also explore the myriad historical, social, and political contexts of a book through data collection? Better analysis of these aspects would greatly enrich any book. Such cataloging and investigation also opens literature up to more than just the “canon,” as Moretti states. The interdisciplinary nature of his book is perhaps what frightens the traditional literary field, but it is precisely for this reason that his ideas are so novel and exciting.

One of the things I thought of while reading this book was that it would be interesting to see data on literary criticism. That, too, is highly representative of society and culture, and would be a nice supplement to a data analysis of books from the same era. For example, literary criticism on The Tempest has changed drastically over time, and each wave of criticism speaks to its time period. This is one of many ideas and, like I said, the possibilities are endless.

Posted in Uncategorized | 2 Comments

Graphing and mapping literature

Moretti presents a novel way of doing research. Even though I still believe that in some cases and in some departments, especially English, it’s hard to make academics see quantitative research as academic research, I’ve witnessed a new trend shaping up in the field and have even heard of dissertations that feed off of this very idea. So, it is happening.

Moretti challenges the present “narrow scope” practice of working with only a limited number of literary works (the canon) and viewing them as representative of a particular historical period, such as Victorian literature; that, he argues, is not enough. Widening that scope to make room for a larger share of the works published during a period would require a shift in the way we are taught to view literature, and even in how we read and appreciate it, but at the same time it would give the works that have slipped into oblivion a chance to resurface. I can see how this is possible with factual information, because once one enters the domain of close reading and interpretation, hence the world of tropes, it’s a slippery slope. This kind of research is more appropriate for something that can be checked and verified against an existing document. Moretti himself acknowledges the limitations of distant reading when he states, “it provides data, not interpretation.”

As someone who has to prepare for QPs and has been assigned a reading list to work with, I can see the necessity for such an approach. It’s similar to a reading group, where everyone reads a particular article and then there is a sharing session, because it’s practically impossible to read everything within the allotted time frame. The “close reading” approach is much more convenient in a culture where the pool of recognized writers is limited to “white men” and an extremely small number of women. Since literature has grown to embrace other cultures, genders, races, genres, and modes of presentation, it’s hard to keep up with every new book that is released. That’s why we specialize in a particular field. On the other hand, it would be good to have a more comprehensive view of what’s out there, or of what was out there at a particular point in time. In this sense, Moretti’s proposal presents a different way of looking at literature.

I have to say that this approach gives literature a new life. It no longer resides solely in the domain of the work of art. It opens the door for other disciplines to partake of the knowledge literature has to offer, if not necessarily its aesthetic appreciation. It lends literature a historical flair, which I think is essential in this day and age.

Posted in Uncategorized | Comments Off on Graphing and mapping literature

Visualizing Information. Project: “Mapping the Republic of Letters”.

Please take a look at the project “Mapping the Republic of Letters.” It started in 2008 and was made possible by a three-year Presidential Grant for Innovation in the Humanities from Stanford University. The project proposal came from people across different disciplines: History, Classics, Computer Science, and English, with the help of interactive research tools. Collaboration was extremely important in being able to “map” 300 years and 100,000 letters.  https://republicofletters.stanford.edu/

On the website they state that “Making connections and resolving ambiguities in the data is something that can only be done with the help of computing, but cannot be done by computing alone…The Web is our network for exchange of ideas. We share our work in progress with our core team of researchers and the larger community of collaborators through a private wiki and will begin sharing our process and progress publicly through this site. As an outcome of this project, we are committed to sharing our data and our analysis tools, whenever possible, open source and over the Web.” They recognize that the maps are not totally complete, because information has been lost and the methods of representation are imperfect, but visualizing information gives us a better context in which to understand the data.

Even though we know, when we look at a map, that it’s a representation of something, most people tend to think of it as a literal illustration of a place. What we don’t usually consider is that a map also involves editing data, leaving blank spaces, and prioritizing some information over other information. As Johanna Drucker (2011) said, “No visualization can be identical to what it represents.” In her article “Humanities Approaches to Graphical Display,” from Digital Humanities Quarterly, she points out the different dangers that humanists can encounter when visualizing data.

I think the project “Mapping the Republic of Letters” is an excellent example of collaboration, open source, and human-computer interaction in creating visual representations. I enjoyed reading about it and watching the videos. But the more I went into it, the more dangerous I found the use of maps as a tool.

Please take a look!

What do you guys think? Luckily we have a class with people from different disciplines: what do the librarians think about a project like this? What about those who are teachers or have a background in pedagogy? Would you use something like this for a class? Any ideas from the computer science student? I’m sorry if I’m forgetting the disciplines of other classmates. I think it’s very interesting how people critique visual information, especially because visual literacy is not part of the standard student curriculum. Sometimes I find interesting things like the project named above, and I would like to know what people from different backgrounds think about them…

Posted in Uncategorized | Tagged , | Comments Off on Visualizing Information. Project: “Mapping the Republic of Letters”.

Graphs, Maps, and Trees

As I was reading Franco Moretti’s book, the first thing that struck me was that I do visualize the geography of each novel or book I read that gives geographical references. A map naturally forms in my head to make sense of the events on the page. In fantasy books such as the LOTR trilogy, maps are extremely helpful for visualizing the world you are reading about in print; I wouldn’t have been able to tell the Shire apart from Mordor if I hadn’t looked at the maps. This is a juvenile example, but I think there is truth to the validity of maps in assisting our understanding and furthering our analysis of text. Moretti states that “graphs, maps, and trees place the literary field literally in front of our eyes, and show us how little we still know about it.”

After reading his book, I’m left with the question of whether the graphs, trees, and maps truly assist our understanding and further our inquiry. As a person from a humanities background, I often find myself glossing over maps, graphs, and trees and heading straight to the text. Even in this text, which is all about graphs, maps, and trees, I had to be diligent about actually looking at the illustrations. To be honest, I found his use of trees the most difficult to engage with. I understand that he is attempting to draw a connection between science, history, and literature (which makes sense in the context of studying DH), but I just did not find it interesting. This makes me wonder: do I fall into the category of a traditional humanist? Is my disinterest in visual descriptions of text limiting my engagement with a multi-dimensional text? Does my attraction strictly to text exemplify the dilemma currently occurring in the academy, with its resistance to expanding traditional notions of the humanities?

Posted in Uncategorized | 1 Comment

Digital Citizenship for the Next Generation

Thought people might find this interesting:

http://mindshift.kqed.org/2012/03/teaching-and-modeling-good-digital-citizenship/

Posted in Uncategorized | 1 Comment

Teaching the Digital Humanities

In section five, the majority of the authors state that the digital humanities has focused more on research than on teaching and pedagogy, and they suggest that pedagogy is the next step in the field’s transition into the academy. Luke Waltzer states that because the digital humanities values research and scholarship over teaching, learning, and curriculum development, it is hard to distinguish it from other academic disciplines. This is one hang-up in proving that the digital humanities is a unique field of study that deserves to be viewed as such, rather than as a subfield of the humanities. What seems to be in flux is how to develop the specific curricular design that would make it a distinct field of study.

Obviously the digital humanities is attempting, in some cases, to address this issue: our class, and the development of an entire certificate program in the CUNY system, are proof of such efforts. The digital humanities needs to take more steps like these in order to prove that it is a worthy and unique field that should be further developed. The NEH seems to be a roadblock to such advancement, because its funding goes to projects that do not focus on curriculum development. How are schools supposed to design departments and programs if there is no funding for curricular research and development? And is it important enough to the digital humanities to fight for these competitive grants? If, as Alexander Reid states, the next generation of digital humanists earning graduate degrees will need to be able to teach both technical facility and a critical pedagogical understanding of technologies, then they will need to be taught these things themselves. Do we expect them to self-teach? We must figure out how to support research on curriculum and pedagogy if we expect the academy to further fund and develop the digital humanities as a separate field of study.

Posted in Uncategorized | Comments Off on Teaching the Digital Humanities

LOD-LAM-NYC and the Digital Humanities: The Greenhorn Edition

On February 23rd, I attended a morning plenary session on Linked Open Data for Libraries, Archives, and Museums (LOD-LAM), and Prof. Gold asked that I post a few comments here.  The session was co-organized by the Metropolitan New York Library Council (METRO), NYPL Labs, and NYU.  I chose to attend the session for a few reasons:

1. it was free
2. it was at a time I could arrange child care
3. in library school I often heard murmurs about linked data, but, being in the special collections track, I never actually spent any time learning about it

So, I was a little surprised when Prof. Gold asked me to report back, because I didn’t immediately see the connection to DH.  The session I attended was specifically geared towards cultural heritage organizations, which, one could say, house the raw materials of humanities research.  As I soon learned, though, it is the idea of linked open data itself that is relevant to DH.  It is a way of dismantling the silos of metadata that each institution has built, so that our systems can communicate with each other.  That last part is key: in order for systems (i.e., not people) to communicate, things need to be more uniformly identified.

In come Uniform Resource Identifiers (URIs), which, when combined with the Resource Description Framework (RDF), account for the “linked” part of all this.  URIs are like unique IDs for people/events/things and the relationships between them, and they serve as the translators between different sets of data.  When an institution releases its information as linked data, that information has been mapped to URIs, which allows other systems to link their data to the same URIs.  It sounds complicated, and there were a lot of scary flow charts up on the screen at various points, but it is really just an effort to translate idiosyncratic descriptions into consistent, shareable data using standardized reference points.  It also provides a level of disambiguation, so that a search for Venus (the planet) can be linked to other information on Venus (the planet) instead of Venus (the tennis player) or Venus (the goddess).  It is not hard to imagine the possibilities for research in such a world.
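
For the curious, here is a minimal sketch of what those triples look like in practice, using Python’s rdflib library. The DBpedia URI for the planet is real, but the institutional namespace and the item being described are hypothetical.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDFS

# A URI is a globally unique name for a thing, not just a web address.
venus_planet = URIRef("http://dbpedia.org/resource/Venus")

# A hypothetical namespace for our own institution's records.
MY = Namespace("http://example.org/collection/")

g = Graph()

# RDF statements are subject-predicate-object triples.
g.add((MY.item42, RDFS.label, Literal("Engraving of the planet Venus")))
g.add((MY.item42, MY.depicts, venus_planet))

# owl:sameAs is how two datasets assert that their URIs name the same thing,
# disambiguating the planet from the tennis player or the goddess.
g.add((MY.venus, OWL.sameAs, venus_planet))

print(g.serialize(format="turtle"))
```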

The event was interesting and insightful, although the presentations ran long and the panel discussion was cut short.  After a brief intro to LODLAM by Corey Harper (Metadata Librarian at NYU and one of the organizers of the session), Rebecca Guenther walked us through controlled vocabularies and showed us some of the resources the Library of Congress has developed in this area.  Todd Carter showed us the innovative way that his company, Tagasauris, crowdsourced metadata for Magnum Photos.  He spoke of breaking down metadata creation into microtasks, such as asking a user to click whether a photo was taken in day or night.  The real fun began when Evan Sandhaus gave us a tour of how the New York Times is using linked data, in which we got a chance to see some of these principles in action.  Finally, we were treated to a demonstration of Google Refine as part of the Free Your Metadata presentation by Seth Van Hooland and Erik Mannens.  The good folks at METRO have posted video, photos and presentation slides from the event for anyone interested in seeing more.  

Part of DH work involves turning textual information into machine-readable data, whether for data mining, geospatial analysis, or creating thematic archival collections.  Texts are encoded to enable them to be processed by computers, allowing levels of analysis that we never before imagined.  In a way, linked open data is the web equivalent of this.  Early web pages consisted mainly of textual information, using HTML and CSS to organize and display it.  Many websites now draw on large datasets to function, and linked open data is a vision of how to make these datasets available and in communication with each other, to facilitate the development of better tools and provide a more robust and enjoyable web experience.  If this all sounds rather utopian, you’ll find that the charge of over-hyping its transformational potential is a criticism the LOD crowd shares with DHers.

Any other thoughts on the relationship between linked open data and digital humanities?  Please share!  And feel free to correct me in any of my (admittedly novice) descriptions above.

Posted in Uncategorized | Comments Off on LOD-LAM-NYC and the Digital Humanities: The Greenhorn Edition