Arakelian 02/03/2014


Cooney, Charles, Glenn Roe, and Mark Olsen. "The Notion of the Textbase: Design and Use of Textbases in the Humanities." Literary Studies in the Digital Age: An Evolving Anthology. MLA Commons, 2013. n. pag. Web.


“Our goal in this essay is twofold. First, we define and describe a selection of humanities textbases, paying particular attention to the design principles that underlie their structure and inform their use. Then, keeping the collections we are most familiar with in mind—those built for text analysis—we outline the scholarly approaches to textbases that we have historically supported at ARTFL (Project for American and French Research on the Treasury of the French Language) and that will continue to inform our decisions as we make use of new algorithm-based analytic tools. Essentially, we argue that traditional modes of humanistic scholarship must be in the forefront of our minds as we build and improve on standard text-analysis tools like PhiloLogic, the word-search-and-retrieval software developed at ARTFL to query its textbases. The text-mining and machine-learning applications we have begun implementing will not entirely replace traditional philological approaches to text analysis. Instead, as textbases continue to grow, we believe these new tools can offer necessary alternatives to scholars beyond simple word search, allowing them to discover and explore unseen connections among texts, trace the evolution of ideas over large collections and historical periods, or identify the contextual and intertextual relations of individual texts to any number of other works.”


textbases, encoding, digital humanities corpora, tradition, algorithm, unsupervised machine learning, cluster

Key Cites

Buzzetti, Dino, and Jerome McGann. “Electronic Textual Editing: Critical Editing in a Digital Horizon.” Text Encoding Initiative. TEI, n.d. Web. 1 Dec. 2009. <__>.

Flanders, Julia. “Electronic Textual Editing: The Women Writers Project: A Digital Anthology.” Text Encoding Initiative. TEI, n.d. Web. 1 Dec. 2009. <__>.

Crucial Quotes

“Textbase, or textual database, is a term that denotes a coherent collection of semi- or unstructured digital documents. These documents, as textual artifacts, can come from literature, periodicals, historical or philosophical writings, legislative proceedings, or any other realm that produces written discourse” (1).

“The varied approaches to text encoding often reveal intellectual biases about the purpose of textbases and the ways in which computers and computer-assisted text analysis can best help scholarly activity in the humanities”

“The authority of tradition was by and large maintained during the eighteenth century, when it was recast as a form of knowledge that could be verified, most remarkably by the philosophes.”

Questions Raised by the Text

What shortcomings might large-scale text analysis present for scholars?