Explore the potential of Apache Spark and its ecosystem through real-world applications.
Key Features
A unique, practical guide with seven end-to-end projects demonstrating the power of Apache Spark
Shows readers how to perform real-time Big Data processing using different components of Apache Spark
Includes best practices and tips for getting the highest performance from a Big Data processing pipeline built with Apache Spark

Book Description
Apache Spark is one of the most popular Big Data tools, used today in industries ranging from e-commerce and entertainment to travel and retail. This book demonstrates how to leverage the capabilities of Apache Spark in practical projects based on real-world scenarios.
The book begins with a quick introduction to the components of the Spark ecosystem and then teaches readers how to use them in real-world scenarios. It demonstrates how to use components of the Apache Spark ecosystem, including Spark SQL, Spark Streaming, Spark MLlib, and PySpark, to build an efficient, end-to-end Big Data processing pipeline. The projects covered include sales forecasting using SparkR and a recommendation engine using PySpark. Readers will learn about libraries such as MLlib, Spark SQL, GraphX, and Spark Streaming, and will also be able to manage their own Big Data pipelines using Apache Spark.
By the end of the book, you will have mastered the aspects of Apache Spark covered here and be able to use them in your own Big Data projects.
What you will learn
Explore the Spark ecosystem and learn to deploy it on large-scale clusters
Perform basic Spark operations through an analysis of the MovieLens data
Learn how to do data analysis using Spark Streaming and Spark SQL
Understand how to predict flight delays with MLlib
Learn how to forecast sales with SparkR
Write PySpark code to build a recommendation engine

Who This Book Is For
This book is for Big Data professionals who want to master the features of Apache Spark and bring speed and ease of use to large-scale data processing tasks. A basic understanding of the Apache Spark ecosystem is sufficient to get the most out of this book.