Data Analysis with Open Source Tools [Paperback]

Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications. Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.

- Use graphics to describe data with one, two, or dozens of variables
- Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
- Mine data with computationally intensive methods such as simulation and clustering
- Make your conclusions understandable through reports, dashboards, and other metrics programs
- Understand financial calculations, including the time-value of money
- Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
- Become familiar with different open source programming environments for data analysis

"Finally, a concise reference for understanding how to conquer piles of data." -- Austin King, Senior Web Developer, Mozilla

"An indispensable text for aspiring data scientists." -- Michael E. Driscoll, CEO/Founder, Dataspora

About the Author

After previous careers in physics and software development, Philipp K. Janert currently provides consulting services for data analysis, algorithm development, and mathematical modeling. He has worked for small start-ups and in large corporate environments, both in the U.S. and overseas. He prefers simple solutions that work to complicated ones that don

556 pages, Paperback

First published January 1, 2010

76 people are currently reading
1718 people want to read

Ratings & Reviews

Community Reviews

5 stars: 120 (38%)
4 stars: 113 (36%)
3 stars: 63 (20%)
2 stars: 12 (3%)
1 star: 4 (1%)
Displaying 1 - 18 of 18 reviews
Louis
228 reviews, 32 followers
December 28, 2010
This is a book about how to think about data analysis, not only how to perform it. Like a good data analysis, Janert's book is about insight and comprehension, not computation. Because of this, it should be part of any analyst's bookshelf, set apart from all the books that merely teach tools and techniques.

The practice of data analysis can get a bad rap, especially from those who think that data analysis is only statistics. Most books on data analysis don't help, because they focus on the features of a particular tool, leading to the view that data analysis means following a recipe from a cookbook. This book subverts that view by being principally about how to think about data analysis, and by providing examples that use different tools (primarily R and Python, but he uses others as well).

Among other topics, Janert covers graphing, single and multi-variable analysis, probability, data modeling, statistics, simulation, component analysis, reporting, financial modeling and predictive analytics. In each section he starts by explaining the concepts, what they are for, and (just as important) what each topic is not. Working through the book, you get a sense not just of the what and how of the various tools and methods discussed, but also of why they are used and of some ways these techniques are misapplied.

Janert also illustrates the methods using several data analysis environments, principally R and Python (with NumPy, SciPy and Matplotlib), but also other tools such as Gnuplot and the GNU Scientific Library. What is helpful here is that the focus is on the techniques and capabilities needed in a tool, not on the tool itself. Instead of being a cheerleader for a particular tool, Janert discusses in his appendix the qualities that make environments such as Matlab, R and Python good for data analysis. However, this focus means that he does not teach any particular tool. If you want to learn how to use a particular tool for data analysis, you are better off getting a book on R or Python (or Matlab, Excel, etc.).

I received an electronic copy of this book as part of the O'Reilly Blogger program, but it was on my to-buy list even before I got it from them. The book page at O'Reilly.com is here: Data Analysis with Open Source Tools
Igor
109 reviews, 26 followers
April 6, 2016
The book focuses not on tools (mostly outdated by now) but on methods, skills and the underlying math, with many examples of applications in a real-life context. I really liked this approach, even though some equations were a bit too hard and some examples were too far from my area of interest.
John Orman
685 reviews, 32 followers
February 18, 2013
I used this book in my online Data Analysis class, in which we used the open source language R. The book also mentions Octave, a clone of Matlab. I used Octave for my online Machine Learning class.
Python and Java are also given a brief description.
The reader is also asked to investigate Perl, Ruby, regular expressions, databases, and Unix.

This book has some very good sections on graphical analysis of data. It also covers probability modeling and includes an interesting chapter on using statistics in mythbusting, and it shows how to mine data, perform simulations, and create predictive models.
Useful appendices cover scientific software tools, and the handling of data. An excellent review of using open source software to analyze and graph data.
Romain
930 reviews, 58 followers
May 15, 2020
The criticisms leveled at this book are of two kinds. The first concerns its structure, and even its content, which is unconventional for a book titled Data Analysis. It is true that one expects to follow a methodology and to be guided, and it must be admitted that this is not the case. If you are looking for that kind of book, I recommend diving into [Book: Practical Data Science with R], which is an excellent work squarely in that vein. This unconventional approach is not a problem; on the contrary, it helps open up one's thinking, to see things differently and, above all, simply to think. The book is also more theoretical and gets to the bottom of things: put another way, there is math, everything it puts forward is demonstrated, and the author strives to get two messages across:

- Keep it simple: back of envelope
- Understand what you are doing

And it is true that today (I have seen it with my own eyes) it is easy to forget these two fundamentals and to stuff complicated models with heaps of data, producing something that nobody can explain and that therefore contributes nothing: value = 0.

In fact, it seems he has put into this book a large part of his knowledge, his know-how, and the experience he gained as a consultant for large companies. This very rich body of experience addresses nearly every subject a consultant may need to use, and even goes beyond; I am thinking of the chapters devoted to simulation, modeling, and probability. It should also be said that for each subject he provides a selective bibliography for going further. The same goes for the tools, and this is where the second criticism comes in, namely that the tooling presented is somewhat dated. If better tools exist now, so much the better, and they still do not exempt one, it seems to me, from understanding what one is doing.

Finally, I would like to stress one last thing that tends to be overlooked in a technical book: the tone, the way of presenting things. After a few pages, and then on reading the book from cover to cover, you come into resonance with the author and his way of explaining. As a result, you understand things much better. His aim is not to dazzle with complex algorithms and techniques; on the contrary, he constantly strives to demystify and to return to fundamentals. To illustrate the tone, here is an excerpt from the chapter titled "What You Really Need to Know About Classical Statistics".

Basic classical statistics has always been somewhat of a mystery to me: a topic full of obscure notions such as t-tests and p-values, and confusing statements like "we fail to reject the null hypothesis" -- which I can read several times and still not know if it is saying yes, no, or maybe.

Originally published on my blog.
K. Permyakova
4 reviews
February 27, 2025
If you’re getting into data analysis and want to use open-source tools, this book is nice. It’s practical, hands-on, and explains concepts clearly without unnecessary fluff. Perfect for beginners who want to learn real-world data skills!
Earo
23 reviews
January 1, 2013
The author keeps placing emphasis on insights instead of numbers while working with data. The ultimate goal of data analysis is to understand how the system works, not to show off how proficient you are at math. That's the true spirit of professionalism. Some annoying jargon is well explained in plain terms. There are a few short sections on R.
158 reviews, 3 followers
August 7, 2012
This book focuses on methods and experience, using tools only to demonstrate the topic at hand.

Where many books already cover tools, this book covers what many don't: insights and experience. It treats many topics, with enough explanation of each method and where to use it.

Highly recommended
Donn Lee
393 reviews, 5 followers
October 12, 2016
I love most O'Reilly books and this one doesn't disappoint. There is plenty of great material in here for [aspiring] data scientists; very nicely chapterised with a nice mix of tools. Was doing a forecasting project and had this book constantly by my side for inspiration - wonderful reference.
Vuk Trifkovic
528 reviews, 55 followers
February 17, 2011
Very good. The focus is firmly on the methods, but with just enough tooling and practical data. I'd start from this, and then dive into specific toolsets, say R or some Python libs.
629 reviews, 11 followers
January 3, 2012
I haven't touched this for a while, so it seemed more appropriate to move it to the on-hold category. I was really intrigued by the sections that I did read though, so I plan to get back to it soon.
Andries Burger
22 reviews, 2 followers
May 5, 2011
Just started on this book. Have lots of data to turn into information. Some obscure, some blatant.

Will add to this review as I work my way through the book...
Matt Heavner
1,130 reviews, 14 followers
October 24, 2011
This is a good, thought-provoking read. It is a reminder of lots of techniques I've already "learned" -- but a great practical review and refresher.
160 reviews, 4 followers
April 4, 2012
Good refresher on data analysis for more detailed work
Sefa
56 reviews
December 7, 2021
A data analysis book that uses Python and R and focuses more on methods, presented in a way that is not math-heavy, than on implementation details.
Daniel
Author, 3 books, 38 followers
February 1, 2016
Pleasantly focussed on methods, so that the somewhat dated view on technologies doesn't hurt too much.