Doing data science is difficult. Projects are dynamic with requirements that change. Data arrives piecemeal, is replaced, contains flaws and comes from a variety of sources. New business rules are uncovered. Guerrilla Analytics shows you how to structure and manage your Data Science projects so you can focus on doing Data Science.
In this book, you will learn about:
The Guerrilla Analytics Principles: simple rules of thumb for maintaining data provenance across the entire analytics life cycle from data extraction, through analysis to reporting.
Reproducible, traceable analytics: how to design and implement work products and data pipelines that are reproducible, testable and stand up to external scrutiny.
Practice tips and war stories: 90 practice tips and 16 war stories based on real-world project challenges encountered in consulting, pre-sales and research.
Preparing for battle: how to set up your team's analytics environment in terms of tooling, skill sets, workflows and conventions.
Data gymnastics: over a dozen analytics patterns that your team will encounter again and again in projects.
A must-read if you do data analysis and have no experience working in software engineering teams. Many useful concepts from software engineering can boost analytics team performance, and that is mostly what this book is about. The book covers common data analytics project practices, but leans toward traditional collect-analyze-report projects. So, if you are looking for advice on how to organize work to build a software product centered on data science applications, or on other modern concepts such as analytics as a service, this book might not be the most complete guide. Nonetheless, the basics described there won't go anywhere and will apply to any kind of analytics-related project.
Super detailed guide for analytic projects. It has so many valuable tips that can potentially save you/your team/your project a ton of time, and avoid so many disasters.
Guerrilla Analytics - how to deliver analytics in the cut & thrust of business By Paul Laughlin · September 5, 2019
Despite the plethora of good books on topics like Data Visualisation, very few cover how to deliver Guerrilla Analytics.
By that, I mean beyond the theoretical ideals of Data Science textbooks. Beyond just the coding challenges of how to use R or Python to encode a question. Books that engage with real world challenges for analysts.
So, I am delighted to share one that I have discovered that does just that. A book based on practitioner experience. One that addresses the diverse challenges to delivering effective analytics in today’s changing businesses. The book is called “Guerrilla Analytics” and is written by Enda Ridge, who is Chief Data Scientist at Sainsbury’s. The subtitles of this book are “a practical approach to working with data” and “the savvy manager’s guide” and it delivers on both promises.
The need for Guerrilla Analytics
In Part 1 of this book, Enda introduces the need for such an approach to analytics & defines his terms. He paints an all too familiar picture to analytics leaders everywhere. From the broad sweep & ill-defined nature of analytics to the number of things that change. Outside the idealistic textbooks, in the real world, data, requirements & resources all change & too many processes are ill-defined. Add to that the constraints of limited time & toolsets, and it is no wonder that too little analytics is robust & repeatable. To directly address this issue, together with a number of other terms, Enda defines the term Guerrilla Analytics. His definition is: “Guerrilla Analytics is data analytics performed in a very dynamic project environment that presents the team with varied and frequent disruptions and constrains the team in terms of the resources they can bring to bear on their analytics problem.” Most analytics leaders I know can relate to that reality, whether or not they formally work in an Agile environment.
Enda goes on to outline, in useful detail, the risks and challenges that such an approach needs to address. Demonstrating why more than a general CRISP-DM approach is needed as a working methodology. As a foundation for the rest of this book he then explains the 7 principles of Guerrilla Analytics. These cover practical day-to-day decisions about storage, documentation, automation, auditable work, knowledge management & code design.
For this is not a book for just leaders, it’s much more a practical handbook for a whole analytics team.
Practicing across the workflow
In the next part, the reader is walked through how to apply this approach throughout their workflow. Starting with Data Extraction, Enda shows how his 7 principles can be applied to improve practice. Some of the examples get into very specific detail. But the themes and lessons learnt help avoid this becoming too technical or distracting. The diagram of the Guerrilla Analytics workflow is a useful map to this journey and would be a handy visual aide-mémoire to those seeking to work this way.
One of the real strengths of this book is how the author has peppered the text with two elements. First, illustrations to summarise his points. Secondly and most importantly, “war stories”. Real life examples of how things have gone wrong. These are so helpful in seeing the application to your work. The stages covered in detail are a useful checklist themselves:
Data Extraction
Data Receipt
Data Load
Coding
Consolidating Knowledge
Work Products
Reporting
Across these and the following section on Testing, Enda also shares Practice Tips (90 in total). These are another way of passing on practical leadership advice from experience of ‘hacks’ that work.
Testing Guerrilla Analytics
Rightly a focus that is identified as needed at all stages. To bring this to life for the reader, Enda focusses Part 3 on this topic. Here he shares both principles (like ‘establishing a testing culture’ and ‘test early’) as well as then getting into practicalities. That practical application covers 3 chapters on what testing needs to look like at stages of the workflow. For Data Testing, he shares 5 Cs of data quality to be tested. For testing code he shares a more detailed step-by-step testing guide than I’ve seen for analytics programming. For testing products, Enda returns to a useful 5 Cs of testing for these & the extra steps for testing statistical models.
Building the capability you need
In the final section of this book, Enda shares his guidance on how to build such a capability.
Echoing themes we have covered before on this blog, he has advice on People, Process & Technology. The chapter on People Capability stresses the importance of Softer Skills, Data Visualisation & Knowledge Management. These are in addition to the need for a number of technical skills, but he also echoes Martin’s call for an analytical attitude in people. The Process chapter supports arguments I have made for more focus on workflow & methodologies. The technology chapter is a useful overview of the technology elements needed for a working Data Manipulation Environment (DME). That latter term is one Enda is fond of using throughout this book & which he richly describes.
Finally, there are useful Closing Thoughts & supporting appendices to motivate you to get started. With a comprehensive index, this book works well not just for an initial quick read, but ongoing for reference.
Do you need Guerrilla Analytics for your team?
I hope that brief book review is helpful and gives you an idea what to expect from it. I recommend it as a manual for analysts & their leaders alike. In fact, I think it’s such a practical guide to the skills that analysts need to master in practice, that I use it with my university students. I teach a module of the MSc Data Science at University of South Wales and this is my recommended text. Hope it also helps you educate and develop your team. Please share your thoughts or recommendations, especially if you’ve read this book. Well done, Enda, this is a really positive contribution to the analytics community.
"Convention over configuration" Guerrilla Analytics sets out key principles and methodologies, with practical examples of how a data function can operate at a high pace, with high performance, in any industry. I didn't think you could have an agile analytics operation without the pain of a certain amount of chaos. I've been working in analytics for over 10 years in science research, consultancy and tech, and I wish I'd read this book when I was starting out. Learning these principles has increased my productivity manyfold.