Search Engines: Information Retrieval in Practice is ideal for introductory information retrieval courses at the undergraduate and graduate level in computer science, information science and computer engineering departments. It is also a valuable tool for search engine and information retrieval professionals. Written by a leader in the field of information retrieval, Search Engines: Information Retrieval in Practice is designed to give undergraduate students the understanding and tools they need to evaluate, compare and modify search engines. Coverage of the underlying IR and mathematical models reinforces key concepts. The book’s numerous programming exercises make extensive use of Galago, a Java-based open source search engine.
A clearly written and well-structured introduction to the science and technology of search engines. The book covers all aspects of search engines, from crawling the web all the way to the ranking and presentation of the result list, in a clear, well-structured and rigorous fashion. The book includes the major algorithms and data structures, and explains the mathematical models of ranking results, categorization and clustering. Practical problems like character encoding, document formats and multilingual retrieval are addressed in sufficient detail. The focus of the book is on the standard approaches in full-text search engines, and the final chapters give brief introductions to related topics like audio, image and music retrieval and emerging fields like social search and peer-to-peer search. The book contains plenty of references to the latest research. The book can be highly recommended for students of computer science.
I originally read this in UARK's Information Retrieval course taught by Dr. Gauch, and it greatly accelerated my skills in different topics. Revisiting it in 2025, I find it is unfortunately a bit outdated in some areas due to the rise of LLMs.
This is an undergraduate-level textbook explaining how search engines work. It discusses, in reasonable detail, how a search engine crawls the Web, prepares the documents for indexing, creates and stores the indexes, and processes the queries to retrieve the documents. Other chapters discuss classifying the documents into clusters, social search in an online community, and retrieval that takes into account that documents have structure (they refer to entities and have dependencies between words) as opposed to being simple bags of words.
Possibly the only book on IR that really helped me understand the nature of the beast. It could have used some SOLR or Lucene instead of Galago, and Java isn't really my favorite programming language even though we use it on my search product. Nevertheless, it's a valuable book and you should digest it completely if you want to know how to make a search engine.