Data has dramatically changed how our world works. From entertainment to politics, from technology to advertising and from science to the business world, understanding and using data is now one of the most transferable skills out there.
Learning how to work with data may seem intimidating or difficult but with Confident Data Skills you will be able to master the fundamentals and supercharge your professional abilities. This essential book covers data mining, preparing data, analysing data, communicating data, financial modelling, visualizing insights and presenting data through film making and dynamic simulations.
In-depth international case studies from a wide range of organizations, including Netflix, LinkedIn, Goodreads, Deep Blue, AlphaGo and Mike's Hard Lemonade Co., show successful data techniques in practice and inspire you to turn knowledge into innovation. Confident Data Skills also provides insightful guidance on how you can use data skills to enhance your employability and improve how your industry or company works. Expert author and instructor Kirill Eremenko is committed to making the complex simple and inspiring you to have the confidence to develop an understanding, adeptness and love of data.
How to master the basics of Data Science. A review from a Data Engineer.
If you ever wanted to figure out what a Data Scientist does on an average day, this book is an excellent start! Whether you want to graduate as a Data Scientist, transition your career into Data Science, or have to work alongside Data Scientists, this book will fill you in on the big picture and on the details of the role.
A little bit about myself: I have worked extensively as a Data Engineer. Nonetheless, there is not a single mention of Data Engineers in this book. At one point, the author says that the IT department and the Marketing department were like completely separate entities that had no reason to communicate with each other, and that the Data Scientist is there to bridge that gap. Although the book never uses the term Data Engineering, it does cover a task that Data Engineers also do: data preparation. In general, Data Scientists spend 80% of their time on data preparation, yet the book covers that topic in one small chapter, as opposed to two big chapters on data analysis. (Spending less than 10% of your time on data analysis is doable these days because many modern data science toolkits reduce the mundane work you had to do before, as long as you understand intuitively how those tools work.) This makes sense if we take into account that 80% of the data sources a Data Scientist uses are already prepared and cleaned by Data Engineers, while the other 20% has to be prepared by the Data Scientists themselves because of their unique project requirements. Yes, Data Scientists spend 80% of their time on data preparation in most companies. Data Scientists are not as efficient at data preparation as Data Engineers because they are generalists across many disciplines. If the databases were better prepared by the IT department, either by having more Data Engineers or by having Data Engineers collaborate more with Data Scientists, Data Scientists would spend less time on data preparation. This is exactly the case in big multinational corporations, where a Data Scientist spends 80% of the time on presentations instead of data preparation.
Another thing to take from this book is that the author tailors it to people who will use data science for practical purposes. Academia takes a completely different approach to the duties assigned to Data Scientists. In academia, you do not work on general problems whose context you have to understand before applying data analysis to your solutions. Instead, your time is mostly dedicated to researching and prototyping new algorithms that are more efficient and effective. Data Science is booming, not in academia but in businesses small and large. The responsibilities of a Data Scientist in a business differ from what you would encounter in academia. A lot of the Data Science material on the internet is tailored to people who want to work in an academic field. Fortunately, this book will show you the ropes to get to the other side of the road.
The main goal of the book is to teach you what a Data Scientist does. A Data Scientist is a generalist, and the job consists of many different types of tasks. The Data Science Toolkit provided in the book will help you master them all. By the end of the book, the author hopes you will be familiar with all the basics of what a Data Scientist does in the field, making you a confident practitioner when you become one or have to engage with one. Although these tasks apply to any business organization that advertises a Data Scientist position, keep in mind the structure of the organization. If there is a lack of data engineers, expect more data preparation on your end. If there is a lack of data literacy, expect to be more of a people person to get buy-in for your propositions. If the organization is small, expect to do more technical work (e.g. data analysis) due to the limited resources. If your organization is big, expect to do more presentations.
Regardless of what environment you will be in, the book does an excellent job of covering all the types of tasks a Data Scientist does. Your current role as a Data Scientist may not require you to be good at every type of task the book mentions, but if you move to another job environment, what you learn here may come in handy.
The first chapters of the book do not cover what a Data Scientist does; instead they argue why Data Science is becoming such an important emerging field within our economy. Although it sounds like a section that you could skip, I suggest that you do not. We live in times where many people treat Data Science as a fad. It is like living again in the early days of computers, when their practical applications for a small business were limited and their cost was impossible to afford. But times are changing. Collecting data and using it more effectively for the bottom line of your business, such as enhancing customer experience for higher retention rates, or finding patterns within your business that can be improved in order to reduce costs and offer cheaper prices to your customers, will give you the competitive advantage. It is no wonder that applying data in your organization decides whether you will disrupt others or get disrupted. I think we have already seen how that went for those who used data in their organizations compared with those who did not: Netflix, Amazon, and Uber are companies that started from nothing and changed the playing field in movies, e-commerce, and ride-hailing within a few years. With solid proof of how these companies took market share by making proper observations from the data they collected, there is no doubt that more companies will come to disrupt other categories within our economy. For that reason, it is best to keep reading those chapters to get a grip on the second phase of our information age. Yes, the information age has had two phases. The first phase was about how to collect, distribute, and share data effectively and efficiently. Making better storage devices and exchanging information over the internet at faster speeds were the first phase. Think of it more like the industrial revolution.
Now we live in times where we can tailor information to our personal needs. In other words, we are seeing a big economic swing like the one experienced during the commercial revolution. No wonder data scientist is becoming the sexiest job of the year. The book gives a lot of use cases of how companies use data within their organizations. It gives several examples for each type of need within Maslow's hierarchy of needs. These examples can later be used to persuade others to use data within their organization, or to explain the benefits of using data to a non-technical person in layman's terms.
After the book finishes its introduction to the topic, it goes through each step of the Data Science Toolkit. The toolkit consists of the following:
A. Identify the question: Teaches you to improve your interpersonal skills, as well as to fill in the gaps on the way to your destination. The problem these days is that business requirements are so fragmented that you have to communicate with a lot of stakeholders to understand the problem clearly. Hopefully, one day, all information will be better organized online so everybody is on the same page. But since that is not the reality for most organizations, you have to take the initiative to have a presence around stakeholders so you can lend an ear to their problems. The less technical the person is, or the less they know about how the business operates, the more you will have to fill in the gaps to the destination they want to reach. For that reason, you have to iterate with them many times until they can identify the right question to ask about their problem. Last but not least, you have to prioritize the problems you have received so that you bring the most positive business impact. There are two reasons to identify the problem in person: 1. So that stakeholders get more comfortable with you by knowing what you do and how you can help them, which means less politics down the road. 2. To learn things about the organization (how it operates, what goals it has prioritized) which you cannot find in the data alone. In either case, it teaches us that all business solutions should start from a top-down view in order not to stray onto a path that does not fit the business culture.
B. Data preparation: This is the technical part, where you get your hands dirty with data. What you learn here ranges from the various types of files you will be working with to removing and replacing records with incorrect data in such a way that it does not distort the model you will use later on. Although it sounds like a grind, it is critical to clean the data properly; otherwise the rest of your effort (data analysis, visualization, presentations) can ultimately lead to the wrong conclusions.
C. Data analysis: Dedicating two long chapters of the book to it, the author explains each method intuitively and in great detail. Not only that, he describes the business use cases each algorithm is often used for. It is like having a very nice professor help you grasp a concept you have never heard of before. He does a very good job of presenting those concepts in the book. Those concepts are still very hard to make stick in your head until you do some exercises or build some small projects with them, which the author recommends doing either by using free datasets that exist online or by taking one of his courses on his site, SuperDataScience. He splits data analysis into three topics, from easiest to hardest: problems where we have data and the categories are already given (classification), problems where we have data but do not know its categories (clustering), and problems where we do not have enough data collected (reinforcement learning). For classification, he discusses decision trees, random forest, K-nearest neighbors, Naive Bayes, and logistic regression. For clustering, he discusses K-means clustering and hierarchical clustering. For reinforcement learning, he discusses upper confidence bound and Thompson sampling and how much better they are than A/B testing.
D. Visualization: The author argues that visualizations are much better than numerical drill-down reports because visuals can give us more interpretations when observed. He gives a lot of recommendations on how to make your visuals simple and organized. This chapter stands out from the rest by devoting several pages to how to use the proper colors and charts in your visualizations.
E. Presentation: This may sound like a generic topic that you can find in any discipline other than Data Science, but it is still relevant in order to get others to buy in to your proposed solution. Otherwise, all the effort from the previous steps will have gone down the toilet. This chapter teaches you the tricks common presenters use to make their presentations engaging and comfortable. A lot of things will click, with plenty of "aha" moments, if you have sat through many presentations. Creating a structured story from all the steps you have gone through (from identifying the problem to the proposed solution), in the context of your audience, will surely make for a killer presentation.
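The cleaning idea in step B, replacing records with incorrect data without distorting the later model, can be sketched in a few lines of Python. This is only an illustrative sketch and is not taken from the book: the `clean_ages` function, the `age` field, the valid range of 0 to 120, and the choice of the median as the replacement value are all assumptions made for this example.

```python
from statistics import median


def clean_ages(records, low=0, high=120):
    """Replace missing or out-of-range ages with the median of the valid ages.

    Hypothetical helper: the field name, the valid range, and the use of the
    median are assumptions for illustration, not the book's method.
    """
    valid = [r["age"] for r in records
             if r["age"] is not None and low <= r["age"] <= high]
    fill = median(valid)  # the median is less sensitive to outliers than the mean

    cleaned = []
    for r in records:
        age = r["age"]
        if age is None or not (low <= age <= high):
            # Keep the record, but flag that its age was replaced.
            cleaned.append({**r, "age": fill, "age_was_imputed": True})
        else:
            cleaned.append({**r, "age_was_imputed": False})
    return cleaned


# Made-up sample data: one missing age and one impossible age.
records = [
    {"name": "A", "age": 34},
    {"name": "B", "age": None},
    {"name": "C", "age": 29},
    {"name": "D", "age": 410},
    {"name": "E", "age": 41},
]

cleaned = clean_ages(records)
for row in cleaned:
    print(row)
```

Flagging the replaced rows, instead of silently overwriting them, keeps the option open to exclude them later if the replacement turns out to skew the analysis.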
Overall, the book stands out for talking about not just one topic but five topics covering what a Data Scientist does. Identifying questions requires good verbal communication skills, and presentations require writing and visual skills. A Data Scientist in a business organization therefore differs from a Data Analyst in soft skills. Regardless, a Data Scientist needs to know all the technicalities of cleaning data properly and using the correct algorithms in the analysis.
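As one example of the algorithms mentioned in step C, here is a rough Python sketch of the upper confidence bound idea next to an even-split A/B test. It is not the book's code: it uses the standard UCB1 rule, and the three click rates, the number of rounds, the seed, and the function names `ucb1` and `even_split` are all made up for illustration.

```python
import math
import random


def ucb1(true_rates, rounds, seed=0):
    """Simulate UCB1: pick the arm with the highest optimistic estimate."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms  # how often each arm was shown
    wins = [0] * n_arms   # how often each arm produced a reward

    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # show every arm once before comparing them
        else:
            # Observed average plus an exploration bonus that shrinks
            # as an arm is shown more often.
            arm = max(
                range(n_arms),
                key=lambda a: wins[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, wins


def even_split(true_rates, rounds, seed=0):
    """Baseline in the style of an A/B test: show every arm equally often."""
    rng = random.Random(seed)
    total = 0
    for t in range(rounds):
        arm = t % len(true_rates)
        total += 1 if rng.random() < true_rates[arm] else 0
    return total


# Hypothetical click rates for three ad variants (assumed values).
rates = [0.05, 0.10, 0.40]
pulls, wins = ucb1(rates, rounds=5000)
baseline = even_split(rates, rounds=5000)

print("UCB1 pulls per variant:", pulls)
print("UCB1 total rewards:", sum(wins))
print("Even-split total rewards:", baseline)
```

The point of the comparison is that the even split keeps showing the weak variants for the whole experiment, while the bandit shifts traffic toward the best variant as evidence accumulates.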
The last chapter talks about the different careers you can start before becoming a data scientist: Business Analyst, Data Analyst, Data Scientist Manager. It also gives you tips to stand out from the rest, such as identifying your strengths, making connections, keeping your skills relevant, and sharing your work portfolio and interests online.
As a Data Engineer, I am happy that my deliverables have helped data analysts and data scientists in many ways. My first two years on the job were spent as a Report Developer, creating interfaces that let users get the subset of corporate data they need through the filter checkboxes and textboxes provided for each type of report. Data analysts create visualizations in their spreadsheets from the small data sets they gather this way. My later years were spent working on data warehouses and data marts, providing clean data that Data Scientists could use in their models. Throughout all these years, I had a blurry picture of what exactly they do within the organization, and this book helped tremendously in giving me a clearer picture of their roles. For these reasons, whether you want to work as a Data Scientist or get to know more about them, this book is a great place to get started.
Very good for getting a great sense of data science and the fundamentals of data analysis, which were very useful to me. I took plenty of notes that I will be able to put into practice.
I learnt a lot from Kirill as I also took one of his courses previously (will take a few more).
It is a great guide to understanding what it means to be a data analyst, how to become one, what they do and what their actual task is: to ensure that stakeholders understand the outcome of the analysis, the follow-up actions required and their costs. The author explains well that we need analytics everywhere, with case studies within Maslow's Hierarchy of Needs.
The book will guide you through how an analysis is performed (the whole process, generally), some analysis techniques, the intuition behind them, statistics, and visualization techniques (he also convinced me to appreciate the value of graphics), as a great analyst should not underestimate the importance of presentation skills. You will also find tips on where to start and what to expect in a job interview.
This is an unusual book for me. The book provides information for the reader to decide if they want to pursue a career in data science. The chapters lay out all the information the reader needs to make the decision and to help them make choices about pursuing education, applying and interviewing for a job, the first job, and expanding into a fulfilling career. If anyone is looking for a career change, this is one of the books you must read. Furthermore, if you have thought of getting into data science, this is the book for you. I've never seen a book spell out a career like this book does. I wish other careers would do this. It would make choosing a career a lot more efficient. Kudos to the author.
It started very well, but then got lost in meta language. The book has more paragraphs about writing the book and the author himself than about the subject of data science. The data science part is short and very, very rudimentary, and the vast majority of the chapters are aimed at juniors leaving university. In total the book could be reduced to half the pages, and I don't need statistics for that. There's a chapter on presentation skills, finding a job, ... I'm just missing the chapter on how to knot my tie. The SuperDataScience website is also referenced and quoted on every possible occasion. If you want a data science intro, then look for other, less written-by-a-guru alternatives.
Good overview for someone totally new to data, its possibilities, and/or data analysis. Written in a very accessible way. Includes a decent non-technical intro to algorithms and their appropriate applications.
Really does not get into much technical content in terms of in-depth examples or processes, coding or mathematical aspects of algorithms and statistics. That’s ok for what this book is - it’s to provide an overview of data skills so people can make more informed choices based on their careers and interests, and seek more information/training from there.
This book offers beginning-to-end insight into data science, from the gathering of questions and requirements at the start of a project to the presentation and dissemination of results at the end. While it could easily have gone into a lot of detail and complicated terminology, Confident Data Skills is incredibly accessible and discusses the topics in a manner which anyone interested in data science could understand, but without being patronising or too simplified. While it could have gone into more detail about the different kinds of algorithms used (it only covers a small number) and there were a few diagrams and images about colour that were in black and white, it provided a good introduction to, and general resource for, data science.
One of the best general overviews of what data science means. It explains the entire process, from how to structure a data analysis to how to present the results to stakeholders. It does not dive deep into the analytical methods, but it explains the concept, logic and process of the main ones, allowing you, once you have the big picture, to understand and study each of them independently. I'd recommend this book to almost anyone who has to deal with data: from someone who wants to become an analyst but does not know where to start, to a manager who wants to improve his/her analytical skills and manage a team of analysts, or an analyst who wants a different perspective on how the process is done.
Good content, great approach to thinking about data and potential impact. Easy to understand in most chapters and gave good foundational knowledge. Book was poorly edited though, and could use another revision as there were many errors and typos. Also the section on Maslow's Hierarchy of Needs seemed forced.
Provides a good overview for Managers, Data Scientists and Data Engineers. I would recommend the read to anyone interested in entering either field or who just wants an overview of the possibilities of data.
Had trouble understanding some of the algorithms. I wish there was more information here. Felt like there were some assumptions. Other than that, the book was exactly what I needed as a starting point into data.
This book is a good introduction for total n00bs to Data Science... Anyone with some prior knowledge and/or quantitative chops will want to look for something more advanced.
For anybody who doesn't have any knowledge about data science, this may be a decent book.
I'm one of those with no prior knowledge, and I found this book interesting to read and easy to understand.
If you like books with a lot of practical material, this one will be a poor fit. Instead, this book gives you a lot of explanation of data science in a theoretical way, with analogies that help you understand.
There is an introduction to how data science is applied in the real world. The book also explains the steps to landing a data science job.
Almost everyone, especially at nonprofits, needs to understand and work with data. This is an accessible text that may raise awareness of the growing field as well as the fact that you don’t need a master's degree in data science to become a data scientist.
This book takes a broad approach to introducing what a data professional does daily and I would recommend it to anyone looking to break into a data career. I really like the case studies the author included and found them to have helpful material for interviews. As a side note, I found it unnerving that the Oxford comma (do British publishers not care for it?!) was missing and some of the punctuation was off. pg 126 ...changes in a country's__against its GDP crime rate.
Kirill is great at explaining and at instilling the confidence needed to start a career in Data Science. The book also lays out in simple steps how anyone, including people who have never been in a technical field, can start and have a successful career in Data Science.
Excellent fundamental resource for data (and other) analytics. Checked this out at the library, but highly considering buying a copy for myself. Would highly recommend!