Practical Data Analysis

For small businesses, analyzing the information contained in their data with open source technology could be game-changing. All you need is some basic programming and mathematical skills.

In Detail

Plenty of small businesses face large amounts of data but lack the internal skills to support quantitative analysis. Understanding how to harness data analysis using the latest open source technology can help them provide better customer service, visualize customer needs, or gain fresh insights into the performance of previous products. Practical Data Analysis is ideal for home and small business users who want to slice and dice the data they have on hand with minimum hassle.

Practical Data Analysis is a hands-on guide to understanding the nature of your data and turning it into insight. It introduces machine learning techniques, social network analytics, and econometrics to help your clients get insights from the pool of data they have at hand. It also covers data preparation and processing for several kinds of data, such as text, images, graphs, documents, and time series.

Practical Data Analysis presents a detailed exploration of current work in data analysis through self-contained projects. First you will explore the basics of data preparation and transformation with OpenRefine. Then you will get started with exploratory data analysis using the D3.js visualization framework. You will also be introduced to machine learning techniques such as classification, regression, and clustering through practical projects such as spam classification, predicting gold prices, and finding clusters in your Facebook friends' network. You will learn how to solve problems in text classification, simulation, time series forecasting, social media, and MapReduce through detailed projects. Finally, you will work with large amounts of Twitter data, using MapReduce to perform a sentiment analysis implemented in Python and MongoDB.

Practical Data Analysis combines carefully selected algorithms and data scrubbing techniques that enable you to turn your data into insight.

Approach

Practical Data Analysis is a practical, step-by-step guide that empowers small businesses to manage and analyze their data and extract valuable information from it.

Who this book is written for

This book is for developers, small business users, and analysts who want to implement data analysis and visualization for their company in a practical way. You need no prior experience with data analysis or data processing; however, basic knowledge of programming, statistics, and linear algebra is assumed.

360 pages, Paperback

First published January 1, 2013

7 people are currently reading
61 people want to read

Community Reviews

5 stars: 2 (7%)
4 stars: 10 (37%)
3 stars: 13 (48%)
2 stars: 2 (7%)
1 star: 0 (0%)
Rob
Author of 2 books, 440 followers
December 9, 2013
I just finished reading Practical Data Analysis by Hector Cuesta (Packt Publishing, 2013) and overall, it is a pretty good overview that recommends some good tools. I would say that the book is a good place for someone to get started if they have no real experience performing these kinds of analyses, and though Cuesta doesn't go deep into the math behind it all, he isn't afraid to use the technical names for different formulae, which should make it easy for you to do your own follow-up research. [1]

Jeff Leek's Data Analysis on Coursera provides the lens through which I read this book. [2] That being said, I found myself doing a lot of comparing and contrasting between the two. For example, they both use practical, reasonably small "real world" sample problems to highlight specific analytical techniques and/or features of their chosen toolkits. However, whereas Leek's course focused exclusively on using R, Cuesta assembles his own all-star team of tools using Python [3] and D3.js. Perhaps it goes without saying, but there are pros and cons to each approach (e.g., Leek's "pure R" vs. Cuesta's "Python plus D3.js"), and I felt that it was best to consider them together.

Cuesta's approach with this book is to present a sample scenario in each chapter that introduces a class of problem, a solution to that problem, and his recommended toolkit. For example, chapter six creates a stock price simulation: it introduces simple simulation problems (especially for apparently stochastic data), time series data, and Monte Carlo methods, and then shows how to simulate the data using Python and visualize it in D3.js. Although the book is not strictly a "cookbook", the chapters very much feel like macro-level "recipes". There's quite a bit of code and some decent discussion around the concepts that govern the analytical model, and (true to the "practical" in the title) the emphasis is on the "how" and not the "why".
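To give a feel for what a chapter-six-style simulation might involve, here is a minimal sketch of a Monte Carlo random walk for a price series. It is not the book's code: the function names, the normally distributed step returns, and the default `drift` and `volatility` values are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import random


def simulate_price_path(start_price, n_steps, drift=0.0005, volatility=0.01, rng=None):
    """Return one simulated path as a list of n_steps + 1 prices.

    Assumption: each step multiplies the price by (1 + r), where r is drawn
    from a normal distribution with mean `drift` and std. dev. `volatility`.
    """
    if rng is None:
        rng = random.Random()
    prices = [start_price]
    for _ in range(n_steps):
        step_return = rng.gauss(drift, volatility)
        prices.append(prices[-1] * (1.0 + step_return))
    return prices


def monte_carlo_final_prices(start_price, n_steps, n_paths, seed=0):
    """Simulate n_paths independent paths and collect each path's final price."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    return [
        simulate_price_path(start_price, n_steps, rng=rng)[-1]
        for _ in range(n_paths)
    ]


if __name__ == "__main__":
    finals = monte_carlo_final_prices(start_price=100.0, n_steps=250, n_paths=1000)
    print("mean final price:", sum(finals) / len(finals))
    print("lowest / highest:", min(finals), max(finals))
```

The list returned by `simulate_price_path` could be written out as CSV or JSON and handed to a D3.js line chart, which is roughly the Python-plus-D3.js split described above.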

While I did not read the entire book cover-to-cover, I would definitely recommend it to anyone who wants an introduction to some basic data analysis techniques and tools. You'll get more out of this book if you have some base to compare it to -- e.g., some experience in R (academic or otherwise); and you'll get the most out of this book if you also have a solid foundation in the mathematics and/or statistics that underlie these analytical approaches.

Disclosure: I received an electronic copy of this book from the publisher in exchange for writing this review.

------

[1] As an aside, this seems to be par for the course for the “technical” data analysis books, blog posts, and MOOCs that I’ve encountered. That is to say, “the math” is touched on, but if you don’t already have a background in linear algebra (or whatever) then you’re going to wind up taking it on faith that support vector machines do what you need them to do.

[2] I wrote about my experience in Jeff Leek’s class in April of 2013. (See: “reflecting on Data Analysis”.)

[3] Both the Python standard library and a collection of libraries like mlpy and matplotlib.
Al-ahmadgaid Asaad
2 reviews
November 29, 2013
Practical Data Analysis is all about applications of statistical methodologies in computer science. I find it very useful since this was not taught in my statistics class. In college, we only practiced statistics in fields like sociology, psychology, agriculture, economics, chemistry, biology, industrial engineering, and many others, but not in computer science; we only dealt with computing when coding in R or SAS. Hal Varian once said that,



. . . we've got at least hundred statisticians on Google . . .



And I was curious about that. I mean, what are they doing at Google? What statistical tools do they use? This book gave me some idea. Hector Cuesta uses Dynamic Time Warping (DTW) to illustrate an image similarity search of the kind Google uses for searching images, treating the photo pixels as time series and comparing the distance between them; he classifies spam versus non-spam emails based on the subject line of the messages, demonstrating the Naïve Bayes algorithm for text classification (isn't that cool?); he also talks about Kernel Ridge Regression for predicting gold prices using time series, and Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) for dimensionality reduction; and the later chapters are all about "hacking", just as John D. Cook described in his review: hacking data from social networking sites like Facebook and Twitter, visualizing it with Gephi, and analyzing it.
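As a rough illustration of the DTW idea mentioned above, the following is a minimal sketch of the classic dynamic-programming formulation in plain Python. It is an assumption-laden toy, not the book's implementation: the function name `dtw_distance` and the use of absolute difference as the local cost are choices made here for illustration, and the inputs are simply two lists of numbers (for images, these might be flattened pixel or histogram values).

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    items of `a` with the first j items of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to an already used b item
                cost[i][j - 1],      # b[j-1] is matched to an already used a item
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


if __name__ == "__main__":
    # A stretched copy of the same shape aligns at no cost.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    # Sequences that differ everywhere accumulate cost along the alignment.
    print(dtw_distance([0, 0, 0], [1, 1, 1]))
```

The point of the warping is visible in the first call: a plain element-by-element comparison cannot even be applied to sequences of different lengths, while DTW lets one sequence "wait" for the other.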



Further, I also learned a new visualization tool, D3.js, and I am loving it now. The brief introduction to every topic is well explained, with no need to google for further reading, and the step-by-step programming procedure is easy to follow.


The Programming Language
In general, the programming languages used in this book are Python and JavaScript, but mostly Python. So it is an advantage if you have a general understanding of these languages.



Issues
One issue I found is the inconsistency of file names between the GitHub repository and the book itself, which can be confusing: pokemonByType.csv on GitHub is named sumPokemon.csv in the book. In Chapter 2, working with OpenRefine, the column names of the Excel data on GitHub are in a different language (I think Spanish), while in the book they are in English. Another issue is with the code: the D3.js charts in Chapter 3, such as the bar and pie charts, did not work on my machine. I am new to D3.js, so I was not able to fix them immediately, but I got a quick response after sending an issue to the author. He even said, "if I can help you in anything else don't hesitate to ask". So there is nothing to worry about; these are minor issues, mentioned just to caution you.



Conclusion
Overall, I recommend this book; it is worth reading.


Link to the book here: http://bit.ly/1co6hOZ
2 reviews
November 25, 2013
This book http://bit.ly/1co6hOZ gives a very practical introduction to data analysis. It covers a wide range of topics, including data visualization, text analysis (spam recognition, sentiment analysis), image analysis, social graph analysis, Bayes classification, SVM, etc. The examples are very practical and teach the reader how to use popular languages and libraries like d3.js, python3, nltk, mlpy, etc. to do basic data analysis.

The book is a great read for beginners. To read and fully appreciate it, no data analysis experience is required. The book provides an introduction to the very basic techniques. Some basic understanding of Python and JavaScript would be necessary, though.

What I like about this book is its hands-on style: while reading, you can easily get started with your first data analyses. The examples are very simple, the code is easy to read, and a very detailed appendix helps you install the tools used. This book is a great help for learning data analysis by doing.

What may be improved is precision. I found some grammar mistakes. Not so big a problem, but not perfect, either. For instance, reading sentences like "we will use Pillow due to its compatibility with Python 3.2 and can be downloaded ..." [p. 97] does hurt a little. More problematic is the section "Classifier accuracy" [p. 90]. It simply uses the ratio of correctly predicted emails as a measure of accuracy, although every discussion of classification accuracy must also include the ratios of false positives and false negatives.
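To make the point about accuracy concrete, here is a small sketch that reports the false positive and false negative rates alongside plain accuracy. The function names and the ten-message toy data are invented for illustration and are not taken from the book; the labels "spam" and "ham" are likewise an assumption.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def rates(counts):
    """Return accuracy plus false positive and false negative rates."""
    total = sum(counts.values())
    negatives = counts["fp"] + counts["tn"]
    positives = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total,
        "false_positive_rate": counts["fp"] / negatives if negatives else 0.0,
        "false_negative_rate": counts["fn"] / positives if positives else 0.0,
    }


if __name__ == "__main__":
    # Toy data (made up): nine legitimate messages and one spam message,
    # scored by a "classifier" that never flags anything as spam.
    actual = ["ham"] * 9 + ["spam"]
    predicted = ["ham"] * 10
    print(rates(confusion_counts(actual, predicted)))
```

On this toy data the do-nothing classifier reaches 90% accuracy while missing every spam message, which is exactly why accuracy alone says little when the classes are unbalanced.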

Overall, this book is a very practical introduction to data analysis. I can recommend it to beginners in this area.
600 reviews, 11 followers
March 11, 2014
A good book when you want an overview of data analysis without having prior experience. It doesn’t go deep into the topic, which I think is the biggest problem with this book. If you don’t know Python, it will be hard to follow and you will miss out on the examples. Despite this, the topic is really interesting and you will know where to look for more information.
Michael
24 reviews
June 22, 2014
liked it. not a lot of depth but lots of starter and environment validation.