- XSEDE helps digital humanists map early modern British social networks
- Support from XSEDE's ECSS enables management of a massive 200-year data mine
- National Endowment for the Humanities grant will open the project to the public
Most of us have heard of the Six Degrees of Kevin Bacon game, based on the 'six degrees of separation' concept, which posits that any two people on Earth are six or fewer acquaintance links apart.
Now, there's a similar game in town: Who knew whom in Renaissance Britain?
That is the question the project Six Degrees of Francis Bacon seeks to answer.
To model these social networks, researchers at Carnegie Mellon University (CMU) and Georgetown University created the digital humanities project, which mines large bodies of historical scholarship to see how often names are mentioned together.
"Our website allows scholars, students, and citizen humanists to improve the network — that is, add relationships to validate some of the inferences that we've made, and in many cases to reject some of the statistical inferences. This means that over time we get a more accurate representation of the social networks of the period," says Christopher Warren, associate professor of English at CMU.
In essence, Warren and his colleagues take the digitized history of scholarship and run it through algorithms to see how often any two names have been mentioned together. Machine learning then finds ways to model these past relationships. The hope is to find a model that accords with what they've learned through years of study and helps extend that knowledge to new networks.
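The article does not spell out the project's algorithms, but the first step it describes, counting how often two names are mentioned together, can be sketched in a few lines. The Python below is a minimal, hypothetical illustration, not the project's code: it assumes names are matched as exact token sequences, treats two mentions as "together" when they fall within an assumed `window` of tokens, and uses invented helper names (`mention_positions`, `cooccurrence_counts`). The machine-learning models Warren describes would sit on top of counts like these.

```python
from collections import Counter
from itertools import combinations


def mention_positions(tokens, names):
    """Return sorted (token_index, name) pairs for every exact match of a known name.

    Hypothetical helper: real name matching would need to handle spelling
    variants, titles, and ambiguous names, which this sketch ignores.
    """
    hits = []
    for name in names:
        parts = name.split()
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                hits.append((i, name))
    return sorted(hits)


def cooccurrence_counts(documents, names, window=50):
    """Count pairs of distinct names mentioned within `window` tokens of each other.

    Assumption: "near one another" means a fixed token distance; the window
    size is illustrative, not a value taken from the project.
    """
    counts = Counter()
    for text in documents:
        # Crude tokenization: drop commas and periods, split on whitespace.
        tokens = text.replace(",", " ").replace(".", " ").split()
        hits = mention_positions(tokens, names)
        for (pos_a, a), (pos_b, b) in combinations(hits, 2):
            if a != b and abs(pos_b - pos_a) <= window:
                # Sort the pair so (A, B) and (B, A) share one counter entry.
                counts[tuple(sorted((a, b)))] += 1
    return counts


if __name__ == "__main__":
    # Toy sentences invented for illustration; not drawn from the ODNB.
    docs = [
        "Francis Bacon corresponded with Tobie Matthew for many years.",
        "Tobie Matthew translated works by Francis Bacon into Italian.",
        "Ben Jonson praised Francis Bacon.",
    ]
    names = ["Francis Bacon", "Tobie Matthew", "Ben Jonson"]
    for pair, n in cooccurrence_counts(docs, names, window=10).most_common():
        print(pair, n)
```

On the toy input, the Bacon/Matthew pair is counted twice and the Jonson/Bacon pair once. A fixed token window is only one way to define "near"; sentence- or paragraph-level co-occurrence is a common alternative, and raw counts would still need statistical treatment before they say much about an actual relationship.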
"Once you employ computational techniques, you can start to assemble relationships at a much greater scale than a human could ever have in their head," says Christopher Warren.
Trying to understand the historical context of the major literary and artistic works and ideas that emerged in the 16th and 17th centuries is no easy feat. The 200-year period that brought us the Reformation and the scientific method also brought us Hamlet, calculus, and the microscope.
"The only way you can understand any of these things is by understanding the context from which they emerge. If we want to understand how we got something like Paradise Lost or the separation of church and state, it's going to require us to pay attention to who knew whom, how ideas spread, and how our modern world is, in crucial ways, a function of historical social networks," Warren says.
Not surprisingly, a project like this generates tons of data, so Warren and his colleagues became users of the Extreme Science and Engineering Discovery Environment (XSEDE) to help analyze the data and to expand their data sources.
To check the relationships they had inferred from their primary source, the Oxford Dictionary of National Biography (ODNB), they leaned on XSEDE's Extended Collaborative Support Services (ECSS) at the Texas Advanced Computing Center (TACC). David Walling, the ECSS expert at TACC, is helping them test whether the process they applied to the ODNB can be extended to other corpora, such as historical journals.
"If we look at a large corpus of journal articles and we ask how often names appear near one another, do we get a similar result as the ODNB or do we get something different?" Warren asks.
Now, with ECSS on board, they have 15,000 people in the database and on the order of 100 million possible relationships.
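The "100 million" figure is consistent with counting every unordered pair of people once. Reading "possible relationships" that way is our assumption, not something the article states, but under it the count for 15,000 people works out as:

```latex
\binom{15\,000}{2} = \frac{15\,000 \times 14\,999}{2} = 112\,492\,500 \approx 1.1 \times 10^{8}
```

Because the number of pairs grows roughly with the square of the number of people, each new batch of names adds far more candidate relationships than it adds people.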
"We couldn't develop the project in the direction that is most useful without ECSS," Warren says. "ECSS allowed us to extend our early work and move forward with it rather than spin our wheels. I can't say enough about the impact that the ECSS program has had for the project."
Although advanced computing is mostly used in the hard sciences, such as physics and chemistry, this project is an unusual collaboration between computer scientists and humanists.
"I'm not sure we would have become involved in XSEDE if it were not for ECSS," Warren says. "The collaborative support model was attractive because someone with my background and training was intimidated by the prospect of using supercomputers. Knowing that there was a process to get our team up to speed was incredibly influential in bringing us on board."
Walling is fairly new to this type of deep collaboration as well. "To me, the interesting parts of the project are the machine learning algorithms and the statistical analysis that goes into building these social networks from text documents. I'm excited to have the chance to dig deeper into the consulting roles of different groups, getting to see what people are actually doing with our systems."
Since the website went live in September 2015, there have been more than 50,000 hits and about 500 active users who have created accounts and are contributing to the picture of the past. In addition, this research project is being taught in classrooms across the United States, and has been the focus of several workshops.
With help from ECSS, the researchers have redesigned the website so that they can release the code to the broader scholarly community, allowing others to build similar networks for other time periods.
"We're doing a lot of documenting of the existing code to make it more user-friendly, helping anyone who might be interested in doing something similar," Warren concludes.