
Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning

Rating: 3.82 (8 reviews)
ISBN: 9781491963005

Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda

From news and speeches to informal chatter on social media, natural language is one of the richest and most underutilized sources of data. Not only does it come in a constant stream, always changing and adapting in context; it also contains information that is not conveyed by traditional data sources. The key to unlocking natural language lies in the creative application of text analytics. This practical book presents a data scientist’s approach to building language-aware products with applied machine learning.

You’ll learn robust, repeatable, and scalable techniques for text analysis with Python, including contextual and linguistic feature engineering, vectorization, classification, topic modeling, entity resolution, graph analysis, and visual steering. By the end of the book, you’ll be equipped with practical methods to solve any number of complex real-world problems.

- Preprocess and vectorize text into high-dimensional feature representations
- Perform document classification and topic modeling
- Steer the model selection process with visual diagnostics
- Extract key phrases, named entities, and graph structures to reason about data in text
- Build a dialog framework to enable chatbots and language-driven interaction
- Use Spark to scale processing power and neural networks to scale model complexity

Genres: Computer Science, Technology, Programming, Coding, Computers, Nonfiction

334 pages, Kindle Edition

Published June 11, 2018

87 people are currently reading

187 people want to read

About the author

Benjamin Bengfort

6 books, 8 followers

Benjamin Bengfort is an experienced data scientist and Python developer who has worked in military, industry, and academia for the past 8 years. He is currently pursuing his PhD in Computer Science at the University of Maryland, College Park, doing research in Metacognition and Natural Language Processing. He holds a Master's degree in Computer Science from North Dakota State University, where he taught undergraduate Computer Science courses. He is also an adjunct faculty member at Georgetown University, where he teaches Data Science and Analytics. Benjamin has been involved in two data science start-ups in the DC region, leveraging large-scale machine learning and Big Data techniques across a variety of applications. He has a deep appreciation for the combination of models and data for entrepreneurial effect, and he is currently building one of these start-ups into a more mature organization.

Community Reviews

5 stars: 19 (33%)
4 stars: 15 (26%)
3 stars: 16 (28%)
2 stars: 5 (8%)
1 star: 1 (1%)

Mahmoud Rabie

5 reviews, 6 followers

April 2, 2018

I had a really hard time reading the book. It contains too much text that I think could be shortened and presented in better ways. The book also doesn't present the code in a good format: just chunks of code written here and there, and you need to keep track of all these lines.

The book's code on GitHub still has a lot of issues and won't run without fixing them (the book is still in the early release phase).

Some chapters were complex for me and I didn't get much out of them (e.g., Context-Aware Text Analysis, Text Visualization).

Also, I was expecting the book to focus more on the analysis part, but I found that a big part of the book is spent on building corpus readers and other things not related to "analysis".

I was expecting the book to cover the feature extraction and vectorization parts in detail, with enough code samples.

Jevgenij

553 reviews, 14 followers

March 25, 2019

This book has a very good first half and a very bad second half. I'd prefer fewer code examples and more explanations of _why_, but the first chapters are still very approachable and explain things in simple terms with plenty of examples.
The second half is just overly complicated; many terms and strategies are left unexplained or barely explained.

Alexis Idlette-Wilson

8 reviews

August 10, 2018

Lots of code examples and detailed explanations of how analytics can solve a variety of business problems. Well written and a good reference.

Suhrob

503 reviews, 61 followers

March 3, 2019

I don't understand the high ratings here.

The book focuses mostly on old approaches: it is stuck mostly in NLTK, with only bits of spaCy and gensim.

Vector representations are mentioned only briefly. The author is sceptical about this new whipper-snapper technology called deep learning, and gives you only a few pages of the simplest Keras implementation.

If it was 2015 I would understand. This book is from 2018.5.

Also, it would be OK if the aim were to explain all the basics of tokenization, PoS tagging, lemmatization, etc., but everything is handled very superficially. You won't get much understanding here unless you've heard about these topics elsewhere.

Most of the space is dedicated to bending the NLTK and sklearn APIs to work together...

The "practical" examples seemed quite shallow, unfounded and unclear (I liked the gender analysis in the first part of the book. It is downhill from there).

I DID like the parallelization and Spark parts though! So at least something...

Theo

1 review

December 11, 2018

I find it really hard to follow the chapters. The code comes in bits and pieces, and it is really hard to put it together. The book puts great effort into its own data ingestion engine. Nothing wrong with that. But if you attempt to use or install it, that is when the trouble starts.

I find the content covered really good and wish the authors had made the code easier to follow and execute.

Joao António

18 reviews, 1 follower

January 18, 2022

A very, very dense book with a lot of great insights. I started it with a few scripts in mind and in production, and it is super interesting how much has changed since I started reading the book. It will stay in my library for consulting for a long time.
I would just recommend that they add more models. In the text classification chapter, the ones they recommend are neither the simplest nor the best; the choice of models just seems a bit random. Still a great book.

Craig Nicol

67 reviews

September 6, 2020

A great step-by-step overview of a variety of text analysis techniques, taking the reader from beginner through to complex analyses using Spark and scikit-learn.
