This book provides linguists with a statistical toolkit for exploring and analysing linguistic data. It employs R, a free software environment for statistical computing that is increasingly popular among linguists. How to Do Linguistics with R: Data exploration and statistical analysis is unique in its scope. It covers a wide range of classical and cutting-edge statistical methods, including different flavours of regression analysis and ANOVA, random forests and conditional inference trees. It also covers specifically linguistic approaches, among them Behavioural Profiles, Vector Space Models and various measures of association between words and constructions. The statistical topics are presented comprehensively but without excessive technical detail, and they are illustrated with linguistic case studies that answer non-trivial research questions. The book also demonstrates how to visualize linguistic data with attractive, informative graphs, using the popular ggplot2 system and Google visualization tools. This book has a companion
Natalia Levshina is a postdoctoral researcher in the ERC-funded project “Grammatical Universals”, headed by Martin Haspelmath. Since obtaining her PhD in Leuven under the supervision of Dirk Geeraerts and Dirk Speelman, she has worked in Jena, Marburg and Louvain-la-Neuve. Her main interests are usage-based linguistics, functional typology and language variation. She is a fan of big corpora, programming and statistics, and has recently published the textbook “How to Do Linguistics with R: Data exploration and statistical analysis” (2015).
This is one of the clearest books on how to tackle linguistic problems with R. I liked that it is driven by case studies (especially those on lexical semantics). If there is one point for improvement, it is that I would like more tidyverse solutions to the problems. Still, it is understandable that the book mostly uses base R, given its release in 2015. Maybe something for a second edition?