In this fully updated second edition of the highly acclaimed Managing Gigabytes, authors Witten, Moffat, and Bell continue to provide unparalleled coverage of state-of-the-art techniques for compressing and indexing data. Whatever your field, if you work with large quantities of information, this book is essential reading--an authoritative theoretical resource and a practical guide to meeting the toughest storage and access challenges. It covers the latest developments in compression and indexing and their application on the Web and in digital libraries. It also details dozens of powerful techniques supported by mg, the authors' own system for compressing, storing, and retrieving text, images, and textual images. mg's source code is freely available on the Web.
The title will make you laugh in 2020, so why would I recommend this citation from stanford.edu/~backrub more than not only the gushing river of nonsense that is the last 5 years of arXiv.ml but even more than the comparatively solid books from 2010 and 2000 on statistics and ML?
Because
[[draft review. goodreads does not have a save function.]]
Much of what machine learning is about *isn’t* on-the-fly computation; it’s about storing good representations which then index
In an online world where HN has way too much mindshare, it’s relaxing to step back to the days of payphones, cassette voice mails, and yellow page directories.
This is probably a bit dated due to advances in the state of the art, but it is still a great introduction to the topic of document storage and search.
Compressing and Indexing Documents and Images by Ian H. Witten is an essential resource for anyone dealing with large data volumes. This comprehensive guide covers the principles, techniques, and methodologies of data compression and indexing. The book strikes a perfect balance between theory and practical applications, with clear explanations and helpful examples. Its attention to detail and coverage of both textual documents and image data make it a valuable asset for professionals in information management. Highly recommended for those seeking efficient data handling strategies.
This book was instrumental in my research and writing process for an article I recently produced on global gigabyte pricing. It provided me with the necessary insights and knowledge to analyze and understand the complexities of data compression and indexing, enabling me to draw meaningful conclusions. If you're interested, you can find the article at https://hellosafe.pt/telecomunicacoes...
A refresher on Huffman codes, bitmaps, indexing, and the compression of images and textual images. The book is a bit old (the authors are still concerned about gigabytes); nevertheless, many of its practices are still applicable today.