Hacks, Leaks, and Revelations: The Art of Analyzing Hacked and Leaked Data

Data-science investigations have brought journalism into the 21st century, and, guided by The Intercept’s infosec expert Micah Lee, this book is your blueprint for uncovering hidden secrets in hacked datasets.

In the current age of hacking and whistleblowing, the internet contains massive troves of leaked information. These complex datasets can be goldmines of revelations in the public interest, if you know how to access and analyze them. For investigative journalists, hacktivists, and amateur researchers alike, this book provides the technical expertise needed to find and transform unintelligible files into groundbreaking reports.

Guided by renowned investigative journalist and infosec expert Micah Lee, who helped secure Edward Snowden’s communications with the press, you’ll learn the tools, technologies, and programming basics needed to crack open and interrogate datasets freely available on the internet or your own private datasets obtained directly from sources. Each chapter features hands-on exercises using real hacked data from governments, companies, and political groups, as well as interesting nuggets from datasets that never made it into published stories. You’ll dig into hacked files from the BlueLeaks law enforcement records, analyze social-media traffic related to the 2021 attack on the U.S. Capitol, and get the exclusive story of privately leaked data from anti-vaccine group America’s Frontline Doctors. Along the way, you’ll

544 pages, Paperback

Published January 9, 2024

40 people are currently reading
316 people want to read

About the author

Micah Lee

1 book · 9 followers

Ratings & Reviews

Community Reviews

5 stars: 19 (59%)
4 stars: 7 (21%)
3 stars: 3 (9%)
2 stars: 1 (3%)
1 star: 2 (6%)
Displaying 1 - 8 of 8 reviews
Ben
2,737 reviews · 233 followers
June 14, 2024
Analyzing Hacked Data For Big Time Insights

This is an excellent, but scary book.

Essentially, it is a great guide on how to analyze and perform data mining and big data analysis on hacked password and leaked document data.

It was really interesting, and very pertinent to those in cybersecurity or programming.
It is also a relevant read for anyone working in hacking or in the analysis of private or illicit data sources.

Pretty intense.

Check it out if this sounds good to you.

4.5/5
Ken
28 reviews · 2 followers
February 13, 2024
Halfway done and whew!!!! It's so good in that it's relevant AND doesn't presume the reader brings anything other than interest in the subject matter. It's also very practical, but more than anything I think Micah's goal is for folks to understand before copying code/workflows.

Very thoughtfully done and I will try to update when done.
Ben Rothke
357 reviews · 52 followers
July 29, 2024
Some records last a very long time. Take Cy Young's record of 511 career baseball wins, set over 110 years ago. In football, the Tampa Bay Buccaneers had 26 consecutive losses 47 years ago.

However, records for the biggest data breaches tend to be broken far sooner. The Yahoo data breach of 2013 compromised over 3 billion user accounts. In August 2022, the registration data of 1.3 billion phones in Indonesia was posted on Breached Forums. That was particularly devastating as the SIM registrations included national identity numbers, phone numbers, names of telecommunications providers, and more.

Very few people can access and analyze the breach data. But in Hacks, Leaks, and Revelations: The Art of Analyzing Hacked and Leaked Data (No Starch Press), author Micah Lee has written a fascinating guide on how to analyze the data from several significant data breaches over the last decade. What the Hacking Exposed series does for penetration testing and hacking, Hacks, Leaks, and Revelations does for researching large sets of breached data.

One of the many challenges with analyzing the data from these breaches is the vast size of the data sets. Standard searching tools only sometimes work effectively when dealing with multi-terabyte data sets. Here, Lee shows the reader how to do that in deep technical detail.

After a high-level overview, the book shows the reader how to use the tools needed to perform the analysis. The author guides the reader through tools and languages such as Python, Aleph, and more.

He then shows the reader how to analyze famous breached data sets such as BlueLeaks (25 years of data from law enforcement comprising 270 gigabytes of internal intelligence, memos, reports, emails, and more), Oath Keepers (anti-government militia), Heritage Foundation (conservative think tank), and more.

In the book, Lee writes from firsthand experience. He is the former director of information security for The Intercept, where investigative reporter Glenn Greenwald disclosed many of Edward Snowden's NSA documents.

He is also a co-founder of the Freedom of the Press Foundation, known for its SecureDrop platform, which enables confidential and secure communication between journalists and their sources. This has proven to be a helpful platform for whistleblowers who want to share their information securely and anonymously.

The platform is needed as getting that information to journalists is not a trivial endeavor. As Glenn Greenwald writes in No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State, one of the challenges in initially vetting Snowden was getting the NSA documents.

This is a unique book in that it is a blend of deep technology, information security, privacy, politics, and more. At times, Lee's political leanings can get in the way. The George Floyd Jr. case, in which a police officer murdered him, is mentioned several times in the book. Lee sometimes seems to imply that the Floyd case is standard operating procedure in law enforcement, when in fact it was a very tragic anomaly.

Those who want to understand the high-level issues can skip (not that they should) the chapters on using the command line interface, exploring datasets in the terminal, and working with data in Python. The rest of the chapters provide interesting insights into manipulating and reviewing hacked data, which will only increase in the future.

Lee is a unique author with extremely deep technical knowledge who conveys information in a readable manner. Hacks, Leaks, and Revelations is one of the more unique information security books of recent memory and a fascinating read.
21 reviews · 2 followers
January 2, 2024
A lot has been written about data leaks and the information contained therein, but few books tell you how to analyze that data yourself. There is so much practical how-to information, explained in simple step-by-step terms, that even computer neophytes can quickly put it to use.
What is unique is that Lee will teach you the skills and techniques that he used to investigate these datasets, and readers can follow along and do their own analysis with this data and others such as emails from the far-right group Oath Keepers and the Heritage Foundation, and chat logs from the Russian ransomware group Conti. This is a book for budding data journalists, as well as for infosec specialists who are trying to harden their data infrastructure and prevent future leaks from happening.
Lee's book is really the syllabus for a graduate-level course in data journalism, and should be a handy reference for beginners and more experienced readers. If you are a software developer, most of his advice and examples will be familiar. But if you are an ordinary computer user, you can quickly gain a lot of knowledge and see how one tool works with another to build an investigation. As Lee says, "I hope you’ll use your skills to discover and publish secret revelations, and to make a positive impact on the world while you’re at it."
Mitch
Author · 1 book · 31 followers
April 11, 2024
There are more hackers and leakers than journalists who know how to interpret the data.

This is a well-structured and hands-on tutorial. Each chapter teaches different methods for analyzing leaks. It does so by saying, basically, "learn these three things and then use them to read leaked Oath Keepers email inboxes for 2021!" That was very motivational.

The author worked on the Snowden Leaks and is now part of Distributed Denial of Secrets, a WikiLeaks successor with better ethics and no sketchy ties to the Russian government. Throughout the book, you download leaks from DDoS.

The book itself costs $55 new (used copies are down to about $35 right now). I got it from the library via Inter Library Loan (ILL) for free, and scooped the ebook from libgen. Buy the book or donate to DDoS if you want to.

Note, you'll need a hard drive with 1 TB of free space (BlueLeaks alone takes up 0.3 TB, plus another 0.3 TB to unzip it). External drives are about $60 right now, but some of the work you do in the book will be very slow on an external. A few commands in this book took hours to run on an internal SSD.

If you have questions, leave a comment.
1 review · 2 followers
April 30, 2024
This book is an amazing introduction to a variety of programming topics for anyone interested in research or journalism. I would highly recommend this to anyone who has an interest in programming and wants to try a variety of things to see what they like. I'll have this as a reference on my desk for years to come.
1 review
May 13, 2025
Technically, the book is about average. The problem is when the author brings his politics into the discussion. His animosity for President Trump is very evident from the beginning and destroys his credibility as a "hacker".
If you're looking for a book related to hacking, try "Hacking: The Art of Exploitation" or the very well-written "The Cuckoo's Egg".