Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data, and everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within existing systems (data warehousing systems). Taking a look at the larger picture into which Big Data fits gives the data scientist the necessary context for how the pieces of the puzzle should fit together. Most references on Big Data look at only one tiny part of a much larger whole. Until the data gathered can be put into an existing framework or architecture, it can’t be used to its full potential. Data Architecture: A Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist.
Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, W.H. Inmon and Daniel Linstedt define the importance of data architecture and how it can be used effectively to harness Big Data within existing systems. You’ll be able to:
Turn textual information into a form that can be analyzed by standard tools
Make the connection between analytics and Big Data
Understand how Big Data fits within an existing systems environment
Conduct analytics on repetitive and non-repetitive data

Discusses the value in Big Data that is often overlooked, non-repetitive data, and why there is significant business value in using it
Shows how to turn textual information into a form that can be analyzed by standard tools
Explains how Big Data fits within an existing systems environment
Presents new opportunities that are afforded by the advent of Big Data
Demystifies the murky waters of repetitive and non-repetitive data in Big Data
This book peaks early, where the author explains an "ideal" data architecture and all its components. He explains clearly why he thinks they all need to be there. There's also as good a definition of "big data" as I've seen anywhere.
But I gave up on the later parts of the book because it got repetitive, except for very narrow issues.
First of all, I cannot believe someone like Inmon, the father of the data warehouse, cannot draw a good diagram to illustrate his ideas. Or even hire someone to do it for him! Almost all of the diagrams seem so rushed. Many of them seem so pointless that I cannot understand why they were even included!
OK, rant over. Let's continue...
IMO, the most interesting part of the book was the "textual disambiguation" used to assign meaning to words extracted from emails and apply a context, but it was only an abstract illustration of the ideas.
Even though all the topics in this book are very important with many useful applications, there is a lack of examples and practical illustration of the ideas and techniques. Comparing this book to Kimball's The Data Warehouse Toolkit, I expected more practical examples, not silly abstract ill-drawn diagrams. It seems that the author leaves the practical/technical application of the ideas to the reader.
I wouldn't suggest this book to anyone. I would search for another book that illustrates the ideas in a more practical way.