Rate this book

SQL for Data Analysis: Advanced Techniques for Transforming Data into Insights

Name: SQL for Data Analysis: Advanced Techniques for Transforming Data into Insights
Rating: 4.28 (7 reviews)
ISBN: 9781492088783

Cathy Tanimura

Rate this book

With the explosion of data, computing power, and cloud data warehouses, SQL has become an even more indispensable tool for the savvy analyst or data scientist. This practical book reveals new and hidden ways to improve your SQL skills, solve problems, and make the most of SQL as part of your workflow. You'll learn how to use both common and exotic SQL functions such as joins, window functions, subqueries, and regular expressions in new, innovative ways--as well as how to combine SQL techniques to accomplish your goals faster, with understandable code. If you work with SQL databases, this is a must-have reference.

GenresProgrammingNonfictionTechnologyComputer ScienceCoding

357 pages, Paperback

Published October 19, 2021

About the author

Cathy Tanimura

5 books

What do you think?

Rate this book

Friends & Following

Create a free account to discover what your friends think of this book!

Community Reviews

5 stars

30 (49%)

4 stars

19 (31%)

3 stars

11 (18%)

2 stars

1 (1%)

1 star

0 (0%)

Displaying 1 - 7 of 7 reviews

Alex

6 reviews3 followers

November 24, 2021

First of all, it should not be considered a beginner's book: neither DA nor SQL foundations are being covered, which is a good thing. If you're learning SQL, then go for Learning SQL by Alan Beaulieu and come back to this one later (especially since 3rd edition is available). If you're new into Data analysis, I would recommend going elsewhere unless you are going to work exclusively with SQL and not going to use R/Python/Scala or any other language of your choice. This thing delivers strictly on topic of DA in a modern settings, although it omits big data completely, for the good, if you ask me. Topics are broad and tools are evolving rapidly and chances are, you're already familiar with them and that's why you are here in the first place.

I'm intermediate SQL user and I was looking to expand my knowledge with more advanced tricks and to learn, how can I offload some of my data wrangling and aggregation to SQL instead of doing it in Pandas after. It's a worthy pursuit to make reporting easier and it allows faster and more stable pipelines, especially if you're going to work with things like Big Query, which is getting more and more popular. Hence, getting better at SQL is a must for anyone working with data now and there's no going around it.

As soon I've started the chapter 2, I was pleasantly surprised with leading commas. The fact that formatting tips are actually in chapter 8 and there was nothing about leading/trailing commas is a signal that this book won't hold your hand a lot. There will be datasets for any chapter following Chapter 2 and even the code to embed them into Postgres SQL, so the complicated parts will be reproducible. Snippets are also available as a single query file for selected chapter. I haven't tested neither scripts nor snippets since I was importing the csvs directly to locally hosted Spark and only looking for further Spark/ Big Query experience, but retyping queries from the book works fine so good (chapter 5).

In a nutshell, it's a book for already skilled analysts who are looking to get better with DA in SQL, no more and no less. It won't teach you to write a faster queries and won't help you if you're into DB management. You should approach this text with statistical experience, domain experience and ideally, real problems at hand. Your SQL background should include understanding of window functions, because they will be used a lot and it's beyond the scope of the book to cover them properly, which is actually for the best. Editing is solid, you're not jumping from one dataset to another while working on the same concept, index is 14 pages long and navigation is straightforward. From binning to anomaly detection, Cathy shows how to use SQL to upstream your calculations and to offload usual work from traditional data wrangling tools, while highlighting SQL limitations and strengths.

It worth mentioning, that another SQL book from O'Reily was updated recently: SQL Cookbook. There is some overlap between them, but they are different and you should not treat Cookbook as a substitute. Cookbook is much more index-oriented and not intended to be read from start to end, while this one covers entire DA process within SQL.

I will be covering more books on more in-depth SQL knowledge later.

Ben

2,753 reviews234 followers

August 11, 2022

SQL Fun

This was a fun book. There, I said it. SQL can be fun.

I use SQL daily, and this was a great reference towards using advanced SQL to get analytics insights.

Would highly recommend for SQL experts

4.8/5

coding computer-science computers

Senjin Hajrulahovic

55 reviews

February 9, 2024

Great book. Kept me engaged from start to finish.

Xin

12 reviews

October 17, 2025

Must read book to understand SQL

Maxim

33 reviews3 followers

April 29, 2024

The book is awesome, but I disagree with some parts.

The abuse of positional syntax in GROUP BY, IMO, is a code smell:


GROUP BY
 1,
 2,
 3

It leads error-prone and unreadable code. The author's argument is that it's more applicable for complex expressions in SELECT that you don't want to repeat in GROUP BY.

The following example shows that's not true, bc you can use aliases in GROUP BY too:


with txs(id, amount) as (
 values
 (1, 100),
 (2, 280),
 (3, 100)
)

select
 count(id) as count_id,
 sum(amount) as sum_amount,
 round(amount, -2) / 100 as bucket
from
 txs
group by
 bucket;

Another thing is bad examples of working with duplicates. The author shows a couple of techniques with GROUP BY and SELECT DISTINCT...

The proper Business Deduplication by row_number() is missing:


with customer(id, updated_ts) as (
 values
 (1, '2000-01-01'::date),
 (1, '2000-01-02'::date),
 (1, '2000-01-03'::date),
 (1, '2000-01-04'::date),
 (1, '2000-01-05'::date),
 (2, '2000-01-05'::date)
),

ranked as (
 select
 *,
 row_number() over (partition by id order by updated_ts desc) as rn
 from
 customer
)

-- Deduplicate.
select
 'Most Recent' as kind,
 *
from
 ranked
where
 rn = 1

union all

-- Show Duplicates.
select
 'Duplicate',
 *
from
 ranked
where
 1 < rn
;

It's much better, because:
1. You're able to easily select all the columns.
2. You're able to write complex deduplication logic compactly.

And another problem is abuse of subqueries which are less readable than CTEs. Subqueries lead to messy and hard to debug code.

Aside from that the book is awesome.

data-processing

Joe

4 reviews

May 9, 2022

Great introduction to all the ways that SQL can be used to get to basic analytics. A great review of SQL with a focus on actionable steps for day-to-day data analysis.