Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you'll examine how to analyze data at scale to derive insights from large datasets efficiently.
Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you're not familiar with or prefer to focus on specific tasks, this reference is indispensable.
A good, very readable overview of Google's BigQuery product. More for BigQuery administrators than for BigQuery users/consumers, as a lot of pages are devoted to cost/billing, performance, security, internal architecture, and ETL (extract/transform/load) + ELT (extract/load/transform). But there are a lot of great details about BQ's specific SQL flavor, as well as BQ's differentiator from other columnar datastores: support for machine learning out of the box. And the authors write very well, with clear explanations and footnotes, adding in tips/suggestions, general notes, and warnings in the middle of the text to identify best practices and caveats to a method.
The current edition of this book will probably be somewhat out-of-date by the end of 2020 / early 2021. Even the book hints at that, with a lot of "as of this writing" and "by the time you read this" qualifiers. This isn't unexpected in a rapidly evolving cloud provider industry, particularly for Google Cloud as it attempts to catch up with Amazon Web Services and Microsoft Azure in market share.
There are a lot of best practices around integration with other Google Cloud Platform products, such as Cloud Storage, Dataflow, and AI Platform. Again, this is surely going to be incomplete soon, as GCP seems to be rapidly developing new products (even renaming some) within the ecosystem. Nonetheless, this is very useful information, providing best practices and multiple ways to solve common data engineering problems.
I wish the authors (and Google Cloud Platform's marketing team in general) made it very clear from the beginning that the typical transactional nature of databases is not what BigQuery is intended for. The book seemed to gloss over the fact that in order to create datasets in BigQuery, most businesses would need a robust transactional system to frequently write data to, and BigQuery isn't necessarily the place to do that. Now BigQuery certainly has write operations, and the book highlights some useful example DDL (data definition language) and DML (data manipulation language) statements and best practices. But as far as frequent DML updates go, BigQuery is certainly not optimized for that. In fact, BigQuery used to have an append-only structure and didn't even allow UPDATEs (no longer true, but significant enough to have been mentioned in the original Dremel paper). That history of the product's development is useful for a decision-maker to know when deciding whether to adopt BigQuery.
Instead, BigQuery is optimized for analytical processing, in order to run machine learning models quickly and directly on the data. That is certainly what the book touts as a competitive advantage for BigQuery against other relational databases. But I am not sure they made it clear upfront that this came at the cost of effectively discouraging frequent write operations against BQ. Instead, it's a footnote in the middle of the book.
Still, this is a great introduction to BigQuery, especially if you're looking to introduce BQ to your company in 2020 and will be your company's BQ administrator. It is at best a useful reference book for your company's BQ users, as some of the chapters will not apply to users who have read-only access.
Good book to understand the details of BigQuery but also data warehouses in general. The book provides examples that you can follow along with using the BigQuery public datasets. This is convenient because data warehouses tend to be complicated to play with.
I would suggest not necessarily reading the whole book, depending on your interests, because the chapters are varied and can appeal to different people (data scientists, analysts, architects, developers...).
But it is true that, whatever your role, you'll find some valuable chapters in it.
I'm giving this three stars; however, that is mostly a mismatch between my needs and what the book provides, and Goodreads ratings are inherently somewhat solipsistic. For another person I could see four or even five stars. But I wanted a deeper dive into underlying behavior and architecture (although what is there is good, it is all too brief) and could have happily dispensed with all the screenshots (which will be out of date shortly, or even now).
As noted by other reviewers, the book seems to have a system administrator bias. And while the authors are technical, the book shows signs of having been vetted to a painful degree by legal and marketing. But this appears to be a good overall introduction to the product. Even if you are going to be developing against BigQuery, it is useful to understand how it is administered. Actual use, of course, would also require using the online documentation for details, but that is as expected.
There are some interesting chapters about the architecture of Google's data warehouse, performance optimization, etc. At the same time, the book at times seems like a printout of the documentation.
It's a good introduction to the power of BigQuery. I knew about Google's cloud data warehouse and SQL engine and had tried a couple of things in previous years, but this book gave me a systematic approach: I learnt about its origins (it wasn't a formal project, as usually happens with the best ideas), how it evolved into the current solution, and its main features. As I use data more from the business/marketing side, I skipped the architecture and infrastructure chapters, but I read all the other ones because they have a good balance of theory, feature descriptions, and query examples. I've already forked the GitHub repository that accompanies the book and plan to explore many features that I found really interesting, like the geospatial and machine learning capabilities. In summary, even though it's written by Google employees, it's not like many other books that seem like an undercover ad, but more like a cookbook that helps you get on board and discover a lot of the new possibilities that this solution enables.
I learned about data structures, storage formats, and the underlying architecture that powers BigQuery's lightning-fast performance. The authors expertly explained complex concepts in a simple and accessible manner, making it easy for me to grasp the core principles.
Furthermore, the book explored the integration of BigQuery with other Google Cloud services, such as Dataflow and Dataproc, which was exactly what I was looking for.
This did fit my bill, hence the well-deserved five stars!
Served me well to get a first overview. It covers all aspects of the service, from basic and advanced query syntax, through its architecture and philosophy, down to data engineering and administration of the service. I read the roughly 80 percent of it that was somewhat relevant to my job, omitting the administration chapter.
The authors' writing is clear and understandable. I liked the chapters about BigQuery's architecture and advanced queries. I prefer this book to the official documentation.