Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling

If you're like most R users, you have deep knowledge of and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.

Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.


Analyze, explore, transform, and visualize data in Apache Spark with R
Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
Perform analysis and modeling across many machines using distributed computing techniques
Use large-scale data from multiple sources and different formats with ease from within Spark
Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions

296 pages, ebook

Published October 7, 2019

7 people are currently reading
26 people want to read

About the author

Javier Luraschi

2 books, 1 follower

Ratings & Reviews

Community Reviews

5 stars: 6 (37%)
4 stars: 7 (43%)
3 stars: 1 (6%)
2 stars: 2 (12%)
1 star: 0 (0%)
Displaying 1 - 2 of 2 reviews
Wej
278 reviews, 8 followers
July 22, 2021
Spark is one of the latest hot big data technologies to have gained popularity among data professionals. In this book, the authors describe how Spark can be used in combination with R. The book covers a wide range of topics (e.g. settings, modelling, extensions), which should be enough to get you started using Spark with R (mostly via the sparklyr interface). The book is freely available online and is a rather easy read. At times I wasn't sure who the target audience is supposed to be, as some chapters would interest data scientists whereas others would be more relevant to data engineers and sysadmins. I guess the authors wanted to show a wide variety of Spark capabilities without going too deep into them.
Pritesh Shrivastava
80 reviews, 7 followers
June 19, 2020
A practical book that provides a good introduction to Apache Spark and the R package that provides an interface to it, sparklyr. I found it more useful than the other big data books I've read lately.