The Stanford Literary Lab
by Seth Winger
In 2004, English professors Matt Jockers and Franco Moretti teamed up to teach ENGLISH 366E: “Electronic Data, Literary Theory,” a graduate-level seminar on the burgeoning overlap between the study of literature and the field of digital humanities—combining traditional discussions of novels or plays with computational tools and analysis algorithms.
Only one student enrolled.
But Jockers and Moretti were undaunted. Confident that there was something to this new way of looking at classic works, they continued to break new ground in the marriage of written word and compiled code. Class sizes grew, and the seminars eventually drew enough interest to sustain projects beyond the limitations of the ten-week quarter. The Stanford Literary Lab, a loose collective of graduate students, lecturers, and professors with interests in computationally analyzing works of literature, grew out of one of Jockers and Moretti’s seminars in 2010. The first publication came in January of 2011: a pamphlet with the surprisingly honest title “Quantitative Formalism: An Experiment.”
And now, at the Lit Lab meetings in a seminar room on the top floor of Margaret Jacks Hall, enrollment is certainly not a problem. The room is filled to bursting, every seat plus extra folding chairs occupied. Jockers and Moretti have their choir; at last, it’s time to preach.
As befits a Stanford laboratory, a Lit Lab meeting certainly has the feel of a collaborative research effort. Sitting beside Stanford professors and grad students are visiting scholars from San Jose State and the University of Kansas, a professor of literature from France, and a doctoral student from Columbia University whose voice materializes out of a Skype-connected computer monitor in the corner. The whiteboards are covered in vertex-and-edge drawings of plays published in English, French, Danish, and ancient Greek. Jockers studies Irish-American literature, Moretti did his doctoral work in Rome, and the two are proud that their pamphlets have together been translated into more than a dozen languages, from Italian to Turkish.
Jockers and Moretti began the lab in part to explore Moretti’s idea of “distant reading”—essentially, literature analysis without any actual reading at all. You can leave that to the computer, which counts key words and phrases, clusters data, classifies it. These machine learning techniques can crunch through thousands—tens of thousands—of books in hours, reducing them to any of a number of key components for the enterprising literature professor to explicate. To open today’s meeting, Moretti explains the methodology of his most recent work on the networks and interactions between characters in dramas from Antigone to Macbeth. There are three main goals:
- Visualization, making the internal differentiations in a play perceivable, or even, at times, exaggerated;
- Articulation, a study of the morphology of the play and how its parts fit into a cohesive whole; and
- Semantics, or how to reconstruct the events of a play from the meanings of its words.
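To make the idea of a character network concrete, here is a minimal sketch in Python. It is not Moretti's method or data. It assumes, purely for illustration, that two characters are linked whenever they speak in the same scene, and the scene groupings in `SCENES` are invented toy data rather than the actual structure of any play. The names `build_network` and `degree` are likewise hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy data (an assumption, not taken from the play's real scenes):
# each set holds the characters who speak in one scene.
SCENES = [
    {"Hamlet", "Horatio", "Marcellus"},
    {"Hamlet", "Claudius", "Gertrude"},
    {"Hamlet", "Ophelia"},
    {"Claudius", "Laertes"},
    {"Hamlet", "Horatio"},
]


def build_network(scenes):
    """Return {character: {neighbor: number of scenes they share}}."""
    network = defaultdict(lambda: defaultdict(int))
    for scene in scenes:
        # Link every pair of characters who appear in the same scene.
        for a, b in combinations(sorted(scene), 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


def degree(network):
    """Count how many distinct characters each character is linked to."""
    return {name: len(neighbors) for name, neighbors in network.items()}


network = build_network(SCENES)
degrees = degree(network)

# The character with the most distinct links is the hub of the network.
most_connected = max(degrees, key=degrees.get)
print(most_connected, degrees[most_connected])
```

On this toy input the most connected character is Hamlet, with five links. Counting links only gives the skeleton of the network; which link counts as meaningful, and what the resulting shape says about the play, is still left to the reader.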
“But,” Moretti warns, standing in front of the room’s single large table, his deep, Christopher Lee voice echoing from behind a graying beard, “this process is not a machine.” There’s human analysis behind it. Moretti then spends the next two hours talking about “network theory,” “structural equivalence modeling,” and “anisotropic intensity of dialogue.”
Machine learning algorithms have found a new disciple from a novel discipline. Turing would be proud.
Work at the Lit Lab currently focuses on nineteenth-century and earlier works: texts that have fallen into the public domain and exist in a nicely curated digital corpus. But that is far from limiting. Moretti believes techniques like the Lit Lab’s can increase the sheer quantity of literature analyzed tenfold.
“It’s the same thing that’s happening in science under the label of ‘big data,’” adds Jockers, one of the English department’s Academic Technology Specialists in addition to serving as Moretti’s rock-climbing, rugby-playing, ultra-distance-running counterpart. “You couldn’t do this ten years ago. It’s opened up a whole new realm of questions and dissertation topics.”
And though the lab may seem to straddle the “techie” and “fuzzy” divide on campus—Jockers’ personal blog is equal parts Herman Melville and Latent Dirichlet Allocation—Moretti and Jockers know just where they stand.
“We’re not writing papers about algorithms,” Jockers says. “Our interest is about applying those algorithms to something new.” Can you use natural language processing to discern the sentiments expressed by characters? Can principal component analysis decipher changes in the structure of dramas over the form’s twenty-five-century history? Can machine learning teach us about the human condition?
Jockers and Moretti are among the first to admit that they don’t know, but they say the results are promising. The Lit Lab’s publications (“Pamphlets” 1, 2, and 3) are written with a candor and narrative style that are uncommon in all forms of academic writing, but especially so in the humanities. There’s an acceptance of failure in them, and a true chronicling of the research behind each pamphlet.
This hasn’t been universally embraced by the literary world, nor has their quantitative approach reached full maturity. One critical editorial ridiculed Moretti’s model for its ability to deduce that the protagonist of Hamlet was (wait for it) Hamlet. But that was a half-sentence sidebar in Moretti’s larger discussion of the relationships of Hamlet—though even he admits that the discussion eventually “drifted” back to the qualitative.
And aye, there’s the rub—he admits his model’s flaws. Moretti believes traditional literary analysis focuses too much on questions that have known answers. “Failure is the present state of your knowledge,” he says, and there’s nothing wrong with that. “In literary studies, this is not the tradition. Knowledge has been an all-or-nothing affair, and scholarship not incremental.”
“Our work tends to build on previous work,” says Jockers, meaning the models and techniques and insights will only get better. It’s a fairly scientific method.
Ultimately, the Lit Lab is built in emulation of the natural sciences, hence the “lab” title in spite of the Margaret Jacks address. So, too, are its methods of extracting meaning from its data. The notion of a “project” or an “experiment” is rare in the humanities, which center on papers and theses. The Lit Lab is attempting to create new paradigms in literary research, using established paradigms in science: exploration, hypothesis testing, model building.
Perhaps even more impressive, it’s doing this while running on passion alone. No grad student can come and get a doctorate in the Lit Lab; anyone working on a project is there because they want to work on that project in addition to their (funded) dissertation. But because of this, Jockers and Moretti can work with grad students from all over the country. They’re eager to bring in new students or colleagues and expose them to their methods; Moretti calls them “envoys,” who can spread his teachings around the country. And while Jockers and Moretti are mentors and teachers for everyone who comes through the lab, the two directors are primarily equals and collaborators.
One form of literary scholarship or another is not going to die out any time soon, but what Jockers and Moretti hope to prove is simple: that quantitatively studying a massive collection of works can illuminate the evolution of literature over long periods of time, or the change of themes over vast geographical distances, or even just something novel about the representation of social status in Hamlet—the same things traditional, qualitative literary analysis sets out to investigate, but now aided by new tools, new methods, new minds.
In his pamphlet on network theory, Moretti compares western works like Our Mutual Friend, by Charles Dickens, to eastern ones like The Story of the Stone, a Chinese novel. He found profound differences in the form of the stories, how their protagonists interact with the other characters: Dickens’ networks are symmetric, built on pairs of interacting characters, while Stone’s networks have no symmetry at all. In his conclusion, Moretti writes:
A different role for the protagonist, resulting from a different set of narrative relations: what networks make visible are the opposite foundations of novel-writing East and West. One day, after we add to these skeletons the layers of direction, weight, and semantics, those richer images will perhaps make us see different genres—tragedies and comedies; picaresque, gothic, Bildungsroman…—as different shapes; ideally, they may even make visible the micro-patterns out of which these larger network shapes emerge.
Moretti is palpably excited about the future of the field, and excited about a younger crop of scholars bringing “scientific imagination” to the discipline of textual interpretation. “We have to prove these new tools and new mind frame can produce literary scholarship as good or better than traditional scholarship,” says Moretti. “This is the generation that is going to change literary study.”
One pamphlet at a time, if necessary.