Mann here articulates a sense that I’ve felt for a while: contrary to popular belief, the boundaries of knowledge are much larger than what exists online. Particularly in the age of ChatGPT, where LLMs are trained almost exclusively on digitized content, it seems strange to think that a vast amount of knowledge and information exists completely outside of the open internet.
But there is. Mountains and mountains of print books have never been digitized. Neither have many maps, documents, government publications, manuscripts, and artifacts. Many academic papers, newspaper articles, and magazine articles are either locked behind subscription paywalls or have never been digitized in the first place.
All of this non-online content, plus closed online academic sources (journals, databases, etc.), is immensely helpful for research purposes. But none of it is immediately obvious when you hit the first page of Google search results or ask an LLM to summarize a field.
These materials are, however, housed and available in libraries. In this book, Mann shows how to navigate these storehouses of knowledge, opening you up to various methods that help you, as a researcher, answer particular types of questions, find sources, and understand the lay of the land in any field.
There’s a lot of really good material in here, plus endless lists of recommended sources for different niche fields. I’m obviously not a formal researcher, but I do have a few hobby projects I’ve been working on that require some level of research. I recently tried some of the techniques in this book and was blown away by how much more efficient they were than simple Google keyword searches.
Beyond the practical guidance, though, I think Mann makes a compelling defense of the need for libraries to exist: the role they have, and should have, in enabling informed research that could not be done in an exclusively online world.
It also made me think about the limitations of the current approaches to LLMs and the generalized AI that is all the rage these days. The contours of any trained intelligence will necessarily be shaped by the content it is given, and the largest boundary is going to be the digitized content that is available. It means the limits of any AI system are bound to the narratives, content, and information that can be found online.
But sometimes the information online is misguided or even downright incorrect. I’ve seen this myself in niche history articles I help edit on Wikipedia. This is compounded by the fact that most people will trust what they see online and then post about it, even if it’s incorrect, creating a content feedback loop. All of this creates conditions that exacerbate misguided knowledge by feeding it back into LLMs. With this context, trusting LLM systems on deep academic topics is difficult.
To be clear, this isn’t some anti-AI rant. I work in the field, and I think there are many wonderful use cases for it. But to expect generalized AI to be all-encompassing in its knowledge of the world is misguided. Many technologists fundamentally misunderstand how much exists that isn’t online. It shows how much we still need real humans who can navigate information correctly. And it shows how much we still need real libraries in an increasingly digital world.