Statistics for Data Science: Leverage the power of statistics for Data Analysis, Classification, Regression, Machine Learning, and Neural Networks

Get your statistics basics right before diving into the world of data science.

Data science is an ever-evolving field, which is growing in popularity at an exponential rate. Data science includes techniques and theories extracted from the fields of statistics, computer science, and, most importantly, machine learning, databases, data visualization, and so on. This book takes you through an entire journey of statistics, from knowing very little to becoming comfortable in using various statistical methods for data science tasks. It starts off with simple statistics and then moves on to statistical methods that are used in data science algorithms. The R programs for statistical computation are clearly explained along with their logic. You will come across various mathematical concepts, such as variance, standard deviation, probability, matrix calculations, and more. You will learn only what is required to implement statistics in data science tasks such as data cleaning, mining, and analysis. You will learn the statistical techniques required to perform tasks such as linear regression, regularization, model assessment, boosting, SVMs, and working with neural networks. By the end of the book, you will be comfortable with performing various statistical computations for data science programmatically.

A step-by-step, comprehensive guide with real-world examples.

This book is intended for developers who want to enter the field of data science and are looking for concise information on statistics, with the help of insightful programs and simple explanations. Some basic hands-on experience with R will be useful.

286 pages, Paperback

Published November 17, 2017

6 people are currently reading
20 people want to read

About the author

James D. Miller

29 books, 5 followers

Ratings & Reviews

Community Reviews

5 stars: 0 (0%)
4 stars: 0 (0%)
3 stars: 1 (25%)
2 stars: 3 (75%)
1 star: 0 (0%)
Displaying 1 - 2 of 2 reviews
BCS
218 reviews, 32 followers
March 16, 2018
Software applications of various kinds have long offered the ability for users to report upon the data that they have acquired and generated. More recently, and particularly with the advent of Big Data, users have begun to explore not only what is in their data sets, but also to ask why the results are as they are, how elements of the data sets are interrelated and whether historical data can be used to predict future values. Answering these types of questions has led to the emergence of the discipline of Data Science. While data science is somewhat vaguely defined, its practitioners typically hold an interdisciplinary skill set which may include elements of database analysis, programming, statistics and data visualisation.

This book aims to introduce software developers to some of the different tools and techniques typically used in data science. In order to help make the content relevant to the target audience, the author attempts to draw parallels between data science and database development.

The book is structured into twelve chapters. Strangely, given the book’s title, the discussion of statistical analysis only really starts at chapter five. The preceding chapters attempt to differentiate between the roles and responsibilities of database developers and those of data scientists. From chapter five onwards, the book introduces and discusses a variety of topics related to data science, including regression analysis, regularisation and boosting, as well as machine learning topics such as artificial neural networks and support vector machines. Where appropriate, examples are given in the R programming language, and the source code and data sets used within the book are available for download from the publisher’s website.

Unfortunately, however, I can find little to recommend this book. The writing style is poor and the structure often meanders. In places the text does not read as being authoritative, with abrupt departures from the author’s loose conversational style into a more scholarly tone suggesting that the content is perhaps being drawn from elsewhere. Sentences often include unnecessary parenthetical comments, in one case six in a single sentence, which do little to aid readability. Parallels drawn with the author’s perceived experience of the ‘data or database developer’ are often tenuous. Indeed, the chapter titles themselves are somewhat misleading. It is hard to imagine, for example, what ‘database progression to database regression’ might actually refer to.

Some examples which really need little explanation, such as the description of parallelism within artificial neural networks, are laboured and confusing. On other occasions, the text is simply wrong: for example, the output of a sigmoid function is incorrectly described as switching from zero to one based on some threshold. Frustratingly, some points of interest receive no explanation at all. For example, the outlier in a learning curve in the chapter on model assessment attracts no comment or explanation, the author preferring rather to explain how to export the graph as a PNG format file.
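
To make the sigmoid point concrete, here is a minimal sketch in Python. It is not taken from the book, whose examples are in R, and the function names `sigmoid` and `step` are illustrative assumptions. It contrasts the standard logistic sigmoid, which rises smoothly between zero and one, with a hard threshold function, which is what a description of "switching from zero to one" would actually correspond to.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: a smooth, strictly increasing curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def step(x: float, threshold: float = 0.0) -> float:
    """Hard threshold function, shown only for contrast with the sigmoid."""
    return 1.0 if x >= threshold else 0.0


# Compare the two functions at a few sample inputs.
for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  step={step(x):.0f}")
```

Under these definitions the sigmoid takes intermediate values such as 0.5 at x = 0, whereas the step function only ever returns 0 or 1.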

The book’s references are obscure and poorly cited and some parts of the text, such as the section on variance, do little more than direct the reader to Wikipedia.

The book is not helped by deficiencies in the editorial and production processes. There are spelling errors, repeated and nonsensical sentences and meaningless illustrations. Also, despite outward appearances, the first data set that is referred to in the text isn’t available in the accompanying download, the CSV file being merely a renamed copy of a text file containing the source code of the examples.

There is certainly some useful information in this book; however, for me the text felt like an early draft rather than a finished product. Rather than being helpful, the attempt to align concepts with those deemed to be familiar to the target readership often seemed awkward and, at times, patronising. There is no shortage of introductory books in the data science field and I would suggest that the time and effort required to read this book could more fruitfully be expended exploring other resources.

Review by Patrick Hill BSc(Hons) MSc PhD CEng MBCS CITP
Originally published: http://www.bcs.org/content/conWebDoc/...
Aniket Patil
525 reviews, 22 followers
February 21, 2019
I found this book good for understanding data science theory. It is okay for gaining some knowledge, but I was unable to learn or grasp everything written in it. You can even plot data by following the examples in the book, and the result looks the same as in the book. But that would be pure copy and paste; in terms of understanding, the book lacks both the depth and the ease that were required. The book clearly mentions that people experienced in this field or in software development will understand it quickly, and I am not among them.

Overall, an okay book, though not one for freshers.