It’s a book for learning data science, machine learning, and data analysis, with plenty of examples and explanations on topics such as:
* Exploratory data analysis
* Data preparation
* Selecting the best variables
* Model performance
## Everything is related to everything
The book’s premise is “everything is related to everything”. This shows in the connections between its sections: for example, choosing the right data type for a variable can be related to how missing values are handled, and vice versa.
In addition, some technical examples are related to “real-life” situations as well as philosophical concepts. The ultimate goal is to simplify the learning journey.
## How is it organized?
It's a playbook full of data preparation recipes, using the open source R language.
There are two types of examples. Some teach general concepts in data analysis (such as information theory), while others show how to handle specific tasks, such as transforming missing values or choosing the correct data type, along with the implications of each choice. All of them use code that is easy to copy and paste.
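As a rough illustration of the second kind of example, here is a minimal sketch in base R. The `customers` data frame and its columns are made up for this illustration and are not taken from the book. Filling the gap with the median is only one possible choice, not a recommendation.

```r
# Hypothetical toy data: `age` has a missing value and
# `has_card` is a yes/no variable stored as plain text.
customers <- data.frame(
  age = c(25, NA, 40, 31),
  has_card = c("yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Replace the missing age with the median of the observed ages
# (an assumption for this sketch; other strategies may fit better).
median_age <- median(customers$age, na.rm = TRUE)
customers$age[is.na(customers$age)] <- median_age

# Choose a more suitable data type: store the yes/no text as a factor.
customers$has_card <- factor(customers$has_card, levels = c("no", "yes"))

# Inspect the resulting column types.
str(customers)
```

Even in this small case the two decisions interact: the way a missing value is filled depends on the type of the variable it belongs to.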
## Who is the target audience?
It’s aimed at two types of people:
1- Those who don’t want to code, or don’t know how, but want to get useful insights and add value as analysts on data projects.
2- Programmers and data scientists who work, or want to work, on machine learning projects.
All the R examples are well explained in code comments.
No math or statistical background is needed to understand it.
The book tries to be as tool-independent as possible. For example, the decision of what to do about missing or extreme values is the same whether we choose R, Python, or Julia. What changes is the how.
## Last words
Developing critical thinking, without taking any statement as the "true truth", is essential in this sea of books, courses, videos, and other technical learning material. This book is just another point of view on data science. Hope you like it.