Distant Horizons: Digital Evidence and Literary Change

Just as a traveler crossing a continent won’t sense the curvature of the earth, one lifetime of reading can’t grasp the largest patterns organizing literary history. This is the guiding premise behind Distant Horizons, which uses the scope of data newly available to us through digital libraries to tackle previously elusive questions about literature. Ted Underwood shows how digital archives and statistical tools, rather than reducing words to numbers (as is often feared), can deepen our understanding of issues that have always been central to humanistic inquiry. Without denying the usefulness of time-honored approaches like close reading, narratology, or genre studies, Underwood argues that we also need to read the larger arcs of literary change that have remained hidden from us by their sheer scale. Using both close and distant reading to trace the differentiation of genres, transformation of gender roles, and surprising persistence of aesthetic judgment, Underwood shows how digital methods can bring into focus the larger landscape of literary history and add to the beauty and complexity we value in literature.

200 pages, Paperback

First published February 14, 2019

6 people are currently reading
88 people want to read

About the author

Ted Underwood

4 books · 3 followers
Ted Underwood is a professor in the School of Information Sciences and also holds an appointment with the Department of English in the College of Liberal Arts and Sciences. After writing two books that describe eighteenth- and nineteenth-century literature using familiar critical methods, he turned to new opportunities created by large digital libraries. Since that time, his research has explored literary patterns that become visible across long timelines, when we consider hundreds or thousands of books at once. He recently used machine learning, for instance, to trace the consolidation of detective fiction and science fiction as distinct genres, and to describe the shifting assumptions about gender revealed in literary characterization from 1780 to the present.

He has authored three books about literary history, Distant Horizons (The University of Chicago Press Books, 2019), Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (Stanford University Press, 2013), and The Work of the Sun: Literature, Science and Political Economy 1760-1860 (New York: Palgrave, 2005).

Ratings & Reviews

Community Reviews

5 stars: 17 (37%)
4 stars: 19 (42%)
3 stars: 8 (17%)
2 stars: 1 (2%)
1 star: 0 (0%)
Displaying 1 - 9 of 9 reviews
Yennie
85 reviews · 4 followers
December 25, 2019
Overall - I enjoyed this book! Prof. Underwood has a very pleasant literary style (which makes sense, I guess, since he is a lit prof). He writes in a way that is not difficult to understand or process, and I think that he is able to reach the "delicate balance" he mentions at the end of writing about a very interdisciplinary process/style while not getting too bogged down in the details. As a reader who is well versed in both historical ideas and computer science concepts, this book was quite a pleasure to read. The technical details of his methods were explained in a way that was easy to understand but also not too simplified.

He talks about his "long-timeline" and his "distant reading" methods without losing sight of the ACTUAL reason that he (and other literary and humanities scholars) are conducting research. These statistical and machine learning tools are just a means to an end - which is to say, to ask humanities questions. He stresses that we cannot and do not need to black box these machine learning algorithms, and he strives to "crack them open" in his inquiries. Also, he stresses that there is no need to use the most complicated statistical model out there. Sometimes, simplicity is good.
Michael
264 reviews · 57 followers
June 24, 2020

This is a good book that could have been great. Ted Underwood is a doyen of 'distant reading', and writes one of the most popular and interesting digital humanities blogs. His fans—me included—waited years for Distant Horizons. It is characteristically bold, modest, clearly written and interesting. Underwood has assembled several corpora of novels and poems, and makes intriguing arguments about how novelistic description, the genre-system of modern fiction, book reviewers' attitudes and the portrayal of gender have all changed over the last 200 years in Anglo-American literature. Throughout the book, he uses logistic regression to model the data, producing elegant graphs and (hopefully) reproducible results.

The book's missing greatness lies in the presentation of the argument. In essence, Underwood presents his evidence in a seriously incomplete and often ambiguous form. One example should suffice to indicate the problem. Underwood tries to show that his models are valid by quoting the accuracy: for example, he has trained one model that can distinguish detective stories from other fiction correctly 90% of the time. But this is not a sufficient strategy to validate a model. Was this accuracy figure calculated for all the data, or were validation and test sets extracted from the corpus prior to training? How significant is 90% accuracy given the size of the test set? (Achieving 90% accuracy on 1000 examples is better evidence the model works than achieving 90% accuracy on 6.) And what exactly does the model understand by 'detective stories'? At times, Underwood does peek under the hood, and there are a few instances throughout the book where he quotes a passage or two to illustrate the workings of the model. But really there are very few instances like this, and at the end of the book, I was left with the distinct impression that Underwood had withheld a huge amount of his work, and only thrown a few tidbits into the monograph.
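
The point about test-set size can be made concrete with a confidence interval. The sketch below is illustrative only and is not taken from the book: it uses the Wilson score interval (one common choice for a binomial proportion), and the counts 9 of 10 and 900 of 1000 are hypothetical stand-ins for "90% accuracy" on a small and a large test set.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half

# Hypothetical test sets: the same 90% accuracy, very different sample sizes.
for correct, total in [(9, 10), (900, 1000)]:
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total}: 95% interval roughly {low:.2f} to {high:.2f}")
```

On the small set the interval spans several tenths, so the same headline accuracy is compatible with a much weaker model; on the large set it is only a few percentage points wide.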

There was an exception to this: Chapter 4, on the 'Metamorphoses of Gender', was far more detailed. Underwood went into some detail about the strategies he used to validate his models, and analysed a whole series of examples to try and explain how his model related to the reality that it modelled. Even here, I would have liked more statistical tests, and more data tables, to make clear exactly what had been modelled and how, but overall the discussion was far more convincing.

Underwood concludes the book with a defence of 'distant reading' that perhaps explains why he adopted this style for the book. He expresses some anxiety that putting too many numbers into a work of literary history will turn literary colleagues away, and says at one point that a technical Appendix is probably the best place to put the really hard mathsy stuff. In my view, he has been misled by his fears. If he presented his statistical arguments in full, in their strongest light, and explained why his modelling and validation strategies worked in concrete detail, his arguments would be more convincing. To make room for this, he could significantly cut down the really quite abstract arguments that he makes in favour of distant reading. Even in the technical Appendix, he goes into very little detail at all about his methods, taking the opportunity to yet again make general arguments about why literary scholars should not be scared of statistics.

This remains an important book. The gender chapter in particular is a masterpiece, and the first chapter is one of the best defences of distant reading ever written. Underwood's incredible humility is also attractive, in a field where Wunderkinder often make extravagant claims about their digital research, and invent silly mystical-scientific names for their normally rather mundane methods. Underwood is a guy who really knows his stuff. He really cares about getting the right answer. I'm sure he's got it—I just wish he had shed a little of his humility, and been clearer about how he did so!

Schedex
54 reviews · 18 followers
March 29, 2023
Like the scientists of Jurassic Park, the first four chapters of this book have been “so preoccupied with whether or not they could” use numbers to learn something about literary history, that “they didn’t stop to think if they should.”
Generally, historical research is less risky than cloning dinosaurs. But applying numbers to the literary past, in particular, remains controversial enough that an analogy to Jurassic Park is not absurd.
Hobart Mariner
446 reviews · 15 followers
December 1, 2023
Using machine learning to identify trends in literature. Some of the conclusions he reaches (the rapid dwindling of female characters, especially) took me by surprise. Others, like the relative stability of a text classifier to sort things by genre, were less surprising, but well-argued. Maybe it would have been better to trim the projects which succeed mainly in recovering well-known ideas of literary history via statistical methods, but perhaps that's not feasible. Much time spent on methodological apology. "Here's a statistical model...now don't worry, I'm a good humanities scholar and I know that numbers aren't really objective...but hear me out." Repeat.

The critical product he envisions, writing in which interesting literary expertise, personal taste, and stylistic charm blend easily with various kinds of statistical modeling, certainly sounds appetizing. Yet the book itself, so caught up with defending the method and the digital humanities more broadly, doesn't provide a really strong example of what an essay in this form might look like. Wish he would just sin boldly, write that essay, and let haters go to hell...

Also wish there had been some more exploration of unsupervised methods, in particular clustering texts to see if any interesting latent sub-genres are out there. The final chapter is very frustrating. He cites unsupervised methods only as examples of methods that are naively assumed to be "objective" but come on! When you run a clustering algorithm, you don't have to consider that "delegating interpretation to a machine." It's just trying to minimize an objective function the same way that regression trainers are; the resulting clusters are up to the scholar to interpret, just as the coefficients in the regression models have been interpreted earlier in the book. (Now, it's entirely possible that those clusters are going to be difficult to interpret or opaque or otherwise useless -- it might even be likely! But they shouldn't be excluded out of hand.) Also, very minor nitpick, but it's unclear how "the mathematical sublime" is caught up in network diagrams.... These parting shots in the last chapter are frustrating, especially because in many places he seems to have a pretty sound grasp of the basics at least of regression and NLP. Would like to see how you could update this with the advent of transformer and LLM approaches.
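
To illustrate the claim that a clustering algorithm is only minimizing an objective function, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The choice of k-means is an assumption made for illustration, since no specific algorithm is named above, and the two-dimensional points are invented toy data, not features from any real corpus.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=10, seed=0):
    """Return (clusters, history); history holds the objective after each update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    history = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        # Objective: within-cluster sum of squared distances.
        history.append(sum(sq_dist(p, centroids[i])
                           for i, c in enumerate(clusters) for p in c))
    return clusters, history

# Invented toy data: two well-separated groups of points.
group_a = [(0, 0), (0, 1), (1, 0)]
group_b = [(10, 10), (10, 11), (11, 10)]
clusters, history = kmeans(group_a + group_b, k=2)
print(sorted(sorted(c) for c in clusters))
print(history)
```

The recorded objective never increases from one iteration to the next, and the algorithm returns only groupings; deciding whether those groupings mean anything remains the scholar's job.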

Peter Boot
287 reviews · 3 followers
January 7, 2022
Brilliant book. I should have read it sooner. It is very good not only for its interesting conclusions about (English-language) literary history, but above all for its method, which can be applied to many other questions. The way Underwood explains his method is also very illuminating.
It answers many questions I had been carrying around, for example how a text model with a thousand variables can be trained without overfitting. I am not sure I am happy with the underlying choice of predictive rather than explanatory models, but the book offers a nice reading list on that point.
What also becomes clear is how enormous an amount of preparatory work on the sources is needed to carry out these analyses. That work is partly done by infrastructure services (in the English-speaking world, for example, HathiTrust), but the researcher will also have to make a substantial contribution.
All in all, very inspiring and admirable.
David Eisler
Author of 1 book · 5 followers
July 22, 2019
Excellent introduction to the possibilities of quantitative literary history (or computational literary studies, or distant reading, or whatever you want to call it). Underwood's prose is clear, the case studies are interesting (and some surprising), and he does the yeoman's work for literary scholars interested in these techniques by including two illuminating chapters on data and methods. Underwood's intent seems to have been at least partly to use the book as a place for curious people to wade into the subject comfortably, ensuring that the results were front and center rather than the methods themselves.
Paul
832 reviews · 83 followers
April 9, 2021
This ended up being a fascinating book about combining quantitative methods like modeling and algorithms and other things I barely understand with the study of literature, which makes a lot more sense to me. Underwood uses computers to analyze the character traits of novels to explore genre boundaries, gender depictions, and elite reception. His findings are really interesting; the various chapters delving into the more theoretical questions surrounding digital humanities are ... less so.
Adina
328 reviews
August 1, 2021
A fascinating exploration of statistical analysis to illuminate longer-term trends in the history of literature. Underwood often seems overly apologetic about his approach in response to the passionate debates within the field of literary criticism. The tone is a bit bewildering to me given my feeling that it’s never a bad thing to look at old things in a new way.
Lawrence
680 reviews · 20 followers
October 4, 2020
I read slowly and took a LOT of notes, and I can tell I'll be returning to these ideas often!