Imagine a common movie scene: a hero confronts a villain. Captioning such a moment would at first glance seem as basic as transcribing the dialogue. But consider the choices: How do you convey the sarcasm in a comeback? Do you include a henchman’s muttering in the background? Does the villain emit a scream, a grunt, or a howl as he goes down? And how do you note a gunshot without spoiling the scene?
These are the choices closed captioners face every day. Captioners must decide whether and how to describe background noises, accents, laughter, musical cues, and even silences. When captioners describe a sound—or choose to ignore it—they are applying their own subjective interpretations to otherwise objective noises, creating meaning that does not necessarily exist in the soundtrack or the script.
Reading Sounds looks at closed-captioning as a potent source of meaning in rhetorical analysis. Through nine engrossing chapters, Sean Zdenek demonstrates how the choices captioners make affect the way deaf and hard of hearing viewers experience media. He draws on hundreds of real-life examples, as well as interviews with both professional captioners and regular viewers of closed captioning. Zdenek’s analysis is an engrossing look at how we make the audible visible, one that proves that better standards for closed captioning create a better entertainment experience for all viewers.
There are people who will find this book interesting, and there are people who won’t. And then there’s this:
The people in that intersection in the middle? Will find this book WILDLY FASCINATING.
That’s me. I’m people.
So, part of my day job involves producing a lot of multimedia materials, and captions and transcripts are a hill I regularly flop myself about on because they’re not just mandatory, they’re important. And there’s a distinction there.
Zdenek unpacks specifically how captions are important, and it boils down to two things:
1) The role of the captioner means that they are responsible for interpreting the sonic text of media and producing what amounts to a translation of sound (non-linear and layered) to text (linear and flattened);
2) As a captioner produces that translation, they make tons of decisions about what and how to interpret the sonic text, and that decision-making process includes the background, skill and biases of the captioner. The eventual translation a captioner produces necessarily affects the meaning of the text when the two are consumed together.
All of which boils down to asking: What do you caption in a text? (Speech, background noises, theme music, non-speech utterances) And how do you caption them?
It is important to note that despite the book’s jaunty title, this is way closer to a textbook than a pop culture or coffee table book. This book forces you to think, especially if you either consume or produce captions. It is dense. It is a lot. Pop culture is harnessed in service of academic theory.
But it's fascinating.
Last week, I produced a set of open captions (ones you can’t turn off) for this short video snippet for the book Stamped: Racism, Antiracism, and You. To some extent, it was relatively easy because the speaker is clear (and in fact has a lovely speaking voice) and she’s reciting a passage from a written text; the text itself provides a clear roadmap for how to write the captions.
And yet.
There is a moment towards the end of the snippet where I had to make a decision as to how to caption the speaker as she follows the directions in the text to: “Inhale… Hold it… Exhale.”
I had to decide whether and how to reproduce her non-speech sounds: an inhalation and an exhalation.
The open captions are burned in; I cannot change them without creating a new video (and in this case a new tweet). The choice I made of whether and how to reproduce that non-speech sound is irrevocable. (Thanks, internet.)
Every chapter of this book forced me to consider a new angle on captions and captioning, nearly always in a good way. How do you represent dialect? How do you express manner of speech? (Stuttering, muttering, stammering, etc.) When are the lyrics of music important for a viewer to grasp the full meaning the content creator intended?
Four stars because the chapter on “Caption Irony” sailed right past me. Was it supposed to be “caption fails”? Or “caption art”? Regardless, it made zero sense to me at all AND I THINK ABOUT ALL THIS QUITE A BIT.
Also there was a brief foray into acoustic phonetic theory that I disagreed with on a molecular level. (Hi. I used to work in an acoustic phonetics lab. Fight me on waveforms, y’all.)
But other than that, each chapter took me in a new and wondrous direction of pondering. The chapter on Futurama’s Hypnotoad is worth the price of admission alone (although I am still never going to watch Futurama).
Written in 2015, the book occupies an interesting nebulous space where DVDs were still de rigueur for movie watching but animated gifs had started putting in an appearance, making the author consider how those were (anonymously) assigned their captions. It is a trip.
I especially appreciated the final chapter, where the author envisions where captions could go next, in particular by using orthography and text effects to reproduce non-speech sounds on-screen. (So if a character is drunk, for instance, how far could you lean the caption text over, or make it sway, or subtly lighten it to convey the trailed-off nature of drunk speech?)
It’s also a really well-organized integration of online media in a print text. At regular intervals the author cites a very specific moment in a film or tv show, then gives you a clear URL to go look up that moment. I did this several times and enjoyed the text the better for it.