This tutorial teaches everything you need to get started with Python programming for the fast-growing field of data analysis. Daniel Chen tightly links each new concept with easy-to-apply, relevant examples from modern data analysis. Unlike other beginner's books, this guide helps today's newcomers learn both Python and its popular Pandas data science toolset in the context of tasks they'll really want to perform. Following the proven Software Carpentry approach to teaching programming, Chen introduces each concept with a simple motivating example, slowly offering deeper insights and expanding your ability to handle concrete tasks. Each chapter is illuminated with a concept map: an intuitive visual index of what you'll learn -- and an easy way to refer back to what you've already learned. An extensive set of easy-to-read appendices helps you fill knowledge gaps wherever they may exist.
Coverage includes: setting up your Python and Pandas environment; getting started with Pandas dataframes; using dataframes to calculate and perform basic statistical tasks; plotting in Matplotlib; cleaning data, reshaping dataframes, handling missing values, working with dates, and more; building basic data analytics models; applying machine learning techniques, both supervised and unsupervised; creating reproducible documents using literate programming techniques.
I read this after reading "Python for Data Analysis" by Wes McKinney (creator of Pandas). This book actually made what I was reading from Wes stick. The examples here are closer to what you would encounter in a business setting when analyzing large-volume data sets. I have loaned it around the office, with positive feedback as well. The best chapters for me were the ones on Generalized Linear Models and Model Diagnostics. Again, plenty of really useful examples.
When the author says this book is for everyone, he really means it. If you have basic experience with Python, perhaps there is a better book out there for you. In most chapters, the author details key methods as they relate to dataframes and similar objects. But instead of showing the options in a methodical way, the author picks one or two that you may happen upon and uses them.
Frankly, the reader would benefit more from seeing pandas used in the context they want to use it in. For instance, you can look up Bryan Cafferky's YouTube series about using SQL with Python, or try "Python for Data Analysis", or even "Python for Excel".
As an aside, I was also quite perplexed by how the author chose to describe the statistical methods. For somebody who knows statistics, the descriptions are too trivial to add much value to the book. For somebody who doesn't know stats well, such as the aspiring data analyst: please take a stats course designed for stats or math majors. You will be so much better off than trying to learn boot camp statistics to be a data analyst.
The material is, in principle, not bad, but it is presented in a very chaotic order. The author jumps between sections without any logic. On top of that, the book contains a huge number of poorly chosen examples. For instance, to demonstrate pandas' ability to merge different databases, the author suggests downloading five files with a total size of 500 megabytes.
Good book if you want to learn data analysis the Python way. However, analysis is more advanced now than this book, so go for O'Reilly's data analysis with Python book, which I am going to read in a few days. I already bought it from Amazon for Rs. 1,800 (INR).