Imagine a priest who has never read the Bible. You would expect him to be filled with remorse, wouldn't you? I spent almost a decade in the data warehousing industry without having read the foundational book that started it for most people*: Bill Inmon's 'Building the Data Warehouse.' So I finally decided to put an end to my pangs of conscience. Was it worth it?
The author begins with the history of decision support systems, in order to show that data warehousing was made possible by technical advances and that it was an answer to specific problems of the industry. He goes on to define what a data warehouse (DWH) is and how it fits into what he calls the 'corporate information factory'. While doing so, he constantly stresses the difference between the DWH and the operational systems. The book continues with the development cycle of the DWH and its relation to data modeling. Fundamental subjects such as normalization, grain, metadata, events and data marts are introduced and explained. As the DWH is not only a concept but also a real application, a whole chapter is devoted to the requirements that any technology supporting the DWH must meet.
After laying the groundwork for data warehousing, the book expands on various subjects related to it. It tells the story of how the DWH ate the now-defunct Executive Information Systems. It also covers the relation to external data and gives recommendations on migrating to the DWH. The reader will also find a sketch of the relation between the DWH and the internet. This is followed by a chapter about how unstructured data fits into the DWH and another on the rapid growth of data. Close to the end we find chapters about justifying the investment in the DWH, compliance and the end-user community. The work closes with a design review checklist meant to help anyone willing to start their own DWH project.
An important thread in the book is the reaction to the star schema model proposed by Ralph Kimball, the other founding father of the business intelligence (BI) industry. In 'Building the Data Warehouse' the star schema model and Kimball's other ideas appear many times, and although they are mainly criticized, they also get a fair share of positive appraisal. There is even a whole chapter comparing Kimball's approach with the author's.
So is it a good book? Its first five chapters, where the author lays the basis of the data warehousing industry, are great. For example, it was eye-opening for me how the history of data storage naturally led to data warehousing. The definition of the data warehouse is another strong point: it still stands today, and so do the definitions of grain, metadata and other fundamental concepts. While reading the first part of the book, I was surprised by how much of my work during the last decade had been described here, especially in the parts devoted to the development of the DWH. The peculiar aspects of having to work with a broader picture of a company's data model while at the same time having to struggle with the realities of the data at hand are exactly the kind of stuff I have seen on many projects. The same goes for the iterative nature of development and of the end-user's demands, which Inmon aptly summed up as 'give me what I say I want, and then I can tell you what I really want.' Inmon accurately stresses the huge effort needed for ETL and integration, calling it '80 percent of the effort required'. So true. I also liked Inmon's treatment of Kimball's views. The book rarely distorts them and defends the author's point of view in a structured manner that can be roughly summarized as: 'The star schema is a modeling technique for data marts, not a data warehouse model.'
The book is far less successful when it tries to encompass a myriad of subjects related to data warehousing. It's worth mentioning that 'Building the Data Warehouse' has had four editions (1992, 1995, 2002 and 2005). The contents have grown significantly, especially between the 3rd (12 chapters, 412 pages) and 4th (19 chapters, 543 pages) editions. The author clearly wanted to keep the book up to date with the latest trends. In doing so he ended up adding weaker chapters about less interesting topics, like compliance, cost justification and a shallow classification of end-users. Those chapters add little value and, even worse, they dilute the strong message of the first part of the book. This leads to another major weak point: the book is too long. It is also filled with useless and simplistic drawings, which further increase its length. Time has also played its part, as some of the material is obsolete today, such as the inclusion of the pompous Zachman framework. In what illustrates the passing of time best, the word 'Amazon' appears only once. As a river.
All in all: this venerable classic was worth the time. The first 200 pages are compulsory reading for anyone working with data warehouses, along with the Kimball-Inmon comparison. That part is a cornerstone of the whole data warehousing industry, and this on its own makes it a good book. The rest is subpar and can be treated as optional or simply skipped.
I once had a conversation with a manager from a major consulting company in which he called the data warehouse a 'legacy system'. On the one hand, this means that it definitely isn't the hype of the moment and that new talent tends to go elsewhere. On the other hand, it means that the data warehouse has become an essential system. As Bill Inmon wrote in the book's fourth edition: 'data warehousing has achieved the status of conventional wisdom.'
*I am aware of Barry Devlin's article from 1988, but Inmon's book gained a much bigger audience.