Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale

Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you'll examine how to analyze data at scale to derive insights from large datasets efficiently.

Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you're not familiar with or prefer to focus on specific tasks, this reference is indispensable.

498 pages, Paperback

Published November 12, 2019

81 people are currently reading
167 people want to read

About the author

Valliappa Lakshmanan

25 books, 23 followers

Ratings & Reviews

Community Reviews

5 stars: 30 (38%)
4 stars: 35 (45%)
3 stars: 9 (11%)
2 stars: 2 (2%)
1 star: 1 (1%)
Displaying 1 - 10 of 10 reviews
Albert
35 reviews, 3 followers
February 14, 2020
A good, very readable overview of Google's BigQuery product. More for BigQuery administration than for BigQuery users/consumers, as a lot of pages are devoted to cost/billing, performance, security, internal architecture, and ETL (extract/transform/load) + ELT (extract/load/transform). But there are a lot of great details about BQ's specific SQL flavor, as well as BQ's differentiator from other columnar datastores to support machine learning out-of-the-box. And the authors write very well, with clear explanations and footnotes, adding in tips/suggestions, general notes, and warnings in the middle of the text to identify best practices and caveats to a method.

The current edition of this book will probably be somewhat out-of-date by the end of 2020 / early 2021. Even the book hints at that, with a lot of "as of this writing" and "by the time you read this" qualifiers. This isn't unexpected in a rapidly evolving cloud provider industry, particularly for Google Cloud as it attempts to catch up with Amazon Web Services and Microsoft Azure in market share.

There are a lot of best practices around integration with other Google Cloud Platform products, such as Cloud Storage, Dataflow, and AI Platform. Again, this is surely going to be incomplete soon, as GCP seems to be rapidly developing new products (even renaming some) within the ecosystem. Nonetheless, it is very useful information, providing best practices and multiple ways to solve common data engineering problems.

I wish the authors (and Google Cloud Platform's marketing team in general) made it very clear from the beginning that the typical transactional nature of databases is not what BigQuery is intended for. The book seemed to gloss over the fact that in order to create datasets in BigQuery, most businesses would need a robust transactional system to frequently write data to, and BigQuery isn't necessarily the place to do that. Now BigQuery certainly has write operations, and the book highlights some useful example DDL (data definition language) and DML (data manipulation language) statements and best practices. But as far as frequent DML updates go, BigQuery is certainly not optimized for that. In fact, BigQuery used to have an append-only structure and didn't even allow UPDATEs (no longer true, but significant enough to have been mentioned in the original Dremel paper). That development history is useful for a decision-maker to know when deciding whether to adopt BigQuery.

Instead, BigQuery is optimized for analytical processing, so that machine learning models can run quickly and directly on the data. That is certainly what the book touts as a competitive advantage for BigQuery against other relational databases. But I am not sure they made it clear upfront that this came at the cost of effectively discouraging frequent write operations against BQ. Instead it's a footnote in the middle of the book.

Still, this is a great introduction to BigQuery, especially if you're looking to introduce BQ to your company in 2020 and will be your company's BQ administrator. It is at best a useful reference book for your company's BQ users, as some of the chapters will not apply to users who have read-only access.
Francois D’Agostini
61 reviews, 12 followers
September 27, 2021
Good book to understand the details of BigQuery but also data warehouses in general. The book provides examples that you can follow along with using the BigQuery public datasets. This is convenient because data warehouses tend to be complicated to play with.

I would suggest not necessarily reading the whole book, depending on your interests, because the chapters are varied and appeal to different readers (data scientists, analysts, architects, developers...).

But it is true that, whatever your role, you'll find some valuable chapters in it.
Peter Aronson
401 reviews, 20 followers
August 17, 2020
I'm giving this three stars. That is mostly a mismatch between my needs and what the book provides, but Goodreads ratings are inherently somewhat solipsistic. For another person I could see four or even five stars. But I wanted a deeper dive into underlying behavior and architecture (although what is there is good, it is all too brief) and could have happily dispensed with all the screenshots (which will be out of date shortly, or even now).

As noted by other reviewers, the book seems to have a system administrator bias. And while the authors are technical, the book shows signs of having been vetted to a painful degree by legal and marketing. But this appears to be a good overall introduction to the product. Even if you are going to be developing against BigQuery, it is useful to understand how it is administered. Actual use, of course, would also require using the online documentation for details, but that is as expected.
Mikhail Filatov
388 reviews, 20 followers
September 17, 2020
There are some interesting chapters about the architecture of Google's data warehouse, performance optimization, etc. At the same time, the book at times seems like a printout of the documentation.
Pablo María Fernández
489 reviews, 21 followers
October 2, 2021
It's a good introduction to the power of BigQuery. I knew about Google's cloud data warehouse and SQL engine and had tried a couple of things in previous years, but this book gave me a systematic approach: I learnt about its origins (it wasn't a formal project, as usually happens with the best ideas), how it evolved into the current solution, and its main features. As I use data more from the business/marketing side, I skipped the architecture and infrastructure chapters, but I read all the other ones because they have a good balance of theory, feature descriptions, and query examples. I've already forked the GitHub repository that accompanies the book and plan to explore many features that I found really interesting, like the geospatial and machine learning capabilities. In summary, even though it's written by Google employees, it's not like many other books that seem like an undercover ad but more like a cookbook that helps you get on board and discover a lot of the new possibilities that this solution enables.
Sreena
Author, 11 books, 141 followers
June 21, 2023
I learned about data structures, storage formats, and the underlying architecture that powers BigQuery's lightning-fast performance. The authors expertly explained complex concepts in a simple and accessible manner, making it easy for me to grasp the core principles.

Furthermore, the book explored the integration of BigQuery with other Google Cloud services, such as Dataflow and Dataproc, which was exactly what I was looking for.

This did fit my bill, hence here goes the well deserved five stars!
9 reviews
November 11, 2021
Served me well to get a first overview. It covers all aspects of the service, from basic and advanced query syntax and its architecture and philosophy down to data engineering and administration of the service. I read the roughly 80 percent of it that was somewhat relevant to my job, omitting the administration chapter.
6 reviews
February 20, 2020
A great book, like the entire Definitive Guide series.
Denis Kotnik
64 reviews, 1 follower
January 29, 2023
The authors' writing is clear and understandable. I liked the chapters about BigQuery's architecture and advanced queries. I prefer this book to the official documentation.
14 reviews, 3 followers
December 13, 2019
This book was of great help with the Google Cloud Data Engineering Certification.