Learning Scrapy

This book covers the long-awaited Scrapy v1.0, which empowers you to extract useful data from virtually any source with very little effort. It starts off by explaining the fundamentals of the Scrapy framework, followed by a thorough description of how to extract data from any source, clean it up, and shape it to your requirements using Python and third-party APIs. Next you will be familiarised with the process of storing the scraped data in databases as well as search engines, and performing real-time analytics on it with Spark Streaming. By the end of this book, you will have perfected the art of scraping data for your applications with ease.

Dimitrios Kouzis-Loukas has over fifteen years of experience as a top-notch software developer. He uses his acquired knowledge and expertise to teach a wide range of audiences how to write great software as well. He studied and mastered several disciplines, including mathematics, physics, and microelectronics. His thorough understanding of these subjects helped him raise his standards beyond the scope of "pragmatic solutions." He knows that true solutions should be as certain as the laws of physics, as robust as ECC memories, and as universal as mathematics. Dimitrios now develops distributed, low-latency, high-availability systems using the latest datacenter technologies. He is language agnostic, yet has a slight preference for Python, C++, and Java. A firm believer in open source software and hardware, he hopes that his contributions will benefit individual communities as well as all of humanity.

272 pages, Kindle Edition

Published January 30, 2016

22 people are currently reading
32 people want to read

About the author

Dimitrios Kouzis-Loukas

2 books · 2 followers

Ratings & Reviews

Community Reviews

5 stars: 11 (31%)
4 stars: 10 (28%)
3 stars: 11 (31%)
2 stars: 3 (8%)
1 star: 0 (0%)
Displaying 1 - 5 of 5 reviews
212 reviews · 10 followers
May 21, 2018
This was a quick read for a hackathon at work, and maybe half of the book was useful to me. I think I understand Scrapy pretty well, now, which is a plus.

The major problem with this book is that it doesn't have a well-defined audience. Some of the book assumes you know nothing about programming at all, or about Python. Other parts assume you are running into distributed systems. It's very much all over the place. I wish that the book had followed a single, compelling case study the whole way through so that I could tie all of the skills together better in my mind.
Tim Tilberg
9 reviews
August 14, 2020
This book shows how much more powerful Scrapy is than just an HTML parser, which is what it's usually showcased as. Scrapy is a full-on enterprise scraping tool, and everything it handles outside of downloading pages and parsing HTML is what makes it powerful. This book shows you how robust and extensible Scrapy actually is -- it literally is the Rails of web scraping, and convinced me to switch from Ruby to Python for my enterprise scraping operation at work.

The book takes a practical project recipe approach to introducing the components of Scrapy, first showing you how to generically queue downloads and extract data, then going into moves like maintaining a session with a logged-in scraper, dynamically crawling based on URLs and selectors in a spreadsheet, writing your data to a DB, leveraging Redis, Spark, and doing it all using async APIs, and why you should do that (it's not obvious for a lot of folks).

The examples are in Python 2 and require some gymnastics to make work. In the middle, there's a huge chunk dedicated to explaining Twisted, the async reactor Scrapy is built on, and leveraging deferreds. I think these examples went into a lot of superfluous detail with poor variable/function names, and were hard to follow in the end.

In the end, I've recommended this book a dozen times. It is a must-read if you do enterprise-level scraping because it empowers you with a project-oriented scraping platform, giving you all the things not directly related to parsing a page. If you are just snatching a quick dataset, you don't need the extra tools of Scrapy and this is overkill.
Gene Ishchuk
240 reviews · 73 followers
May 2, 2020
I wasn't sure if I should give it a rating; it is 4 years old now, and lots of things have changed.
Part of it doesn't work as it's supposed to. I couldn't launch that Vagrant thingy, so I had to scrape my own site, as OLX.pl is banning scrapers (or just mine).
But overall it lays some basic foundations, like items, scrapers, pipelines, and even basic deployment information.
Even though it is quite dated, I got some good 101 out of it.
Helen Mary
183 reviews · 14 followers
March 8, 2018
It started out great. But it kind of tapered off midway and it became less engaging. I like the flowcharts and visuals of Scrapy’s features as a main takeaway. The case studies, however, could have been more compellingly written. If you need a big picture overview it’s good.
Hugo
63 reviews
August 26, 2022
Quick overview of Scrapy with end-to-end examples. Nice introductory book.