Preface
This book, “A First Level Book to expedite Statistics through An Inquisitive approach”, is written as a first-level book for readers who are eager to enter the field of Data Science.
The first chapters are oriented towards building skill in the R programming language. They contain many tricky exercises to sharpen the reader's R skills, and they draw numerous examples from mathematics to enrich coding ability in R. For instance, the chapter on vectors includes examples from numerical integration, differentiation, vector product, scalar product, polynomial multiplication, operations on sets, descriptive statistics, Gini index computation, and computation of polynomial roots. Likewise, the chapter on matrices illustrates, using R, various matrix operations such as addition, subtraction, inverse, pseudo-inverse, multiplication, division, trace, determinant, eigenvalues, eigenvectors, Kronecker product, quadratic form, solution of simultaneous equations, covariance matrix calculation from raw data, mirror transformations, rotation matrices, sparse matrices, etc. Because of the chapters on vectors and matrices, this book is well suited for use in first-level linear algebra courses or related laboratories.
Separate chapters on factors, lists, and data frames are included to build an extensive skill set in the reader. The chapter on data frames also explains data import and export, as well as dplyr, a recent package for SQL-style data manipulation. The various forms of join operations that are common in data processing are explained with live examples.
Unlike many other programming languages, R derives much of its strength from its rich set of functions. The chapter on function development introduces the reader to writing functions. More than 100 functions are included in this chapter to build function development skills in the reader.
The chapter on the apply family of functions introduces the concepts needed to process data without using conventional looping constructs such as the while loop, the for loop, etc.
Regular expressions and string manipulation in R are explained in a separate chapter. Live web-based applications are an added bonus in this chapter.
The chapter on the R environment introduces the R system, the search path, package environments, function environments, etc., and sheds light on how the R system works.
As this is a first-level book that aims to give readers an easy entry into Data Science, we have confined the discussion to the user level. We therefore do not cover hard-core programming or software and package development. However, so that readers are not left with the impression that this is all there is to R, we introduce the object-oriented concepts that are used extensively in R in a short chapter.
For many data science problems, visualizing the data is the first step. R has had a rich set of plotting facilities from its beginning. This book includes two extensive chapters on visualization: the first covers the base plotting utilities, and the second explains the ggplot2 and ggvis packages.
Probability distributions, hypothesis testing, etc., are discussed in another extensive chapter. Both parametric and non-parametric testing methods are covered, along with their implementation in R.
Simulation has become very common in research studies, where random numbers are used extensively. A chapter on random numbers explains how to generate random numbers from various distributions such as the normal, Poisson, etc. We also discuss how to generate multivariate random numbers in R, which are essential for machine learning. Also, regression, multiple regression, Poisson regression, spline regression, survival analysis, etc