
Hadoop with Python

Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework.

Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.

Use the Python library Snakebite to access HDFS programmatically from within Python applications
Write MapReduce jobs in Python with mrjob, the Python MapReduce library
Extend Pig Latin with user-defined functions (UDFs) in Python
Use the Spark Python API (PySpark) to write Spark programs with Python
Learn how to use the Luigi Python workflow scheduler to manage MapReduce jobs and Pig scripts
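The MapReduce model behind the mrjob bullet above can be sketched in plain Python. This is a minimal local simulation under stated assumptions. It is not mrjob's API and not code from the book. The `mapper`, `reducer`, and `run_local` names are hypothetical, and the in-memory grouping step stands in for the shuffle that Hadoop performs across a cluster. In mrjob itself, the same two steps are written as `mapper` and `reducer` methods on an `MRJob` subclass.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map step (hypothetical helper): emit (word, 1) for each whitespace-separated word.
    for word in line.split():
        yield word.lower(), 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reduce step (hypothetical helper): sum every partial count emitted for one word.
    return word, sum(counts)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    # Local stand-in for Hadoop's shuffle: group all mapper output by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Call the reducer once per distinct key, in sorted key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    text = ["Hadoop with Python", "Python and Pig and Spark"]
    print(run_local(text))
```

Running the script prints `{'and': 2, 'hadoop': 1, 'pig': 1, 'python': 2, 'spark': 1, 'with': 1}`. On a real cluster the map and reduce calls run in parallel on different machines, which is why the mapper must not depend on any state outside the single line it receives.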

ebook

Published October 1, 2015




Community Reviews

5 stars: 4 (26%)
4 stars: 6 (40%)
3 stars: 5 (33%)
2 stars: 0 (0%)
1 star: 0 (0%)
Vytas (68 reviews), February 1, 2020
A content-rich Python book. I liked the chapter about Pig a lot.
