
David
David is on page 123 of 544
Jan 17, 2022 11:38AM
Natural Language Processing in Action

David’s Previous Updates

David
David is on page 153 of 544
Not absorbing as much of this past chapter. Feels like the authors are speaking to their peers at this point rather than to students. Could also be that there are some unlisted prerequisites I need to learn first, but I'm still absorbing a high enough percentage to keep going with this book.
Jan 18, 2022 10:31AM
Natural Language Processing in Action


David
David is on page 143 of 544
Jan 18, 2022 10:05AM
Natural Language Processing in Action


David
David is on page 134 of 544
Jan 18, 2022 08:46AM
Natural Language Processing in Action


David
David is on page 128 of 544
Jan 17, 2022 04:15PM
Natural Language Processing in Action


David
David is on page 115 of 544
Jan 16, 2022 10:30AM
Natural Language Processing in Action


David
David is on page 96 of 544
Deeper dive into inverse document frequency. Showed how to build a TF-IDF matrix (using scikit-learn). Listed several alternative algorithms for TF-IDF normalization and walked through one of them.
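
A minimal sketch of what a TF-IDF matrix holds, written with only the standard library rather than scikit-learn so the arithmetic is visible. It is not the book's code: the function name `tfidf_matrix`, the whitespace tokenizer, and the three toy documents are assumptions, and it uses the plain weighting tf × log(N / df). scikit-learn's `TfidfVectorizer` defaults to a smoothed IDF and L2-normalized rows, so its numbers will differ.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, matrix) with one row per document and one column per term.

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency is the count divided by the document length.
        matrix.append([counts[t] / len(tokens) * idf[t] for t in vocab])
    return vocab, matrix

# Toy corpus (hypothetical).
docs = ["the cat sat", "the dog sat", "the bird flew"]
vocab, matrix = tfidf_matrix(docs)
for doc, row in zip(docs, matrix):
    print(doc, [round(x, 3) for x in row])
```

A word that appears in every document ("the" here) gets an IDF of log(1) = 0, so its column is all zeros. That is the point of the IDF factor: ubiquitous words stop contributing.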
Jan 12, 2022 02:51PM
Natural Language Processing in Action


David
David is on page 86 of 544
Revisiting vector spaces in more detail, plotting two vectors and calculating cosine similarity between multiple vectors.
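
A minimal sketch of the cosine similarity calculation, assuming plain Python lists as vectors. The function name and the three example vectors are made up for illustration. The similarity is the dot product divided by the product of the two vector lengths.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical vectors: v2 points the same way as v1, v3 is perpendicular to both.
v1 = [1, 2, 0]
v2 = [2, 4, 0]
v3 = [0, 0, 3]

same_direction = cosine_similarity(v1, v2)  # close to 1
perpendicular = cosine_similarity(v1, v3)   # 0
print(same_direction, perpendicular)
```

Because the lengths are divided out, a long document and a short one with the same word proportions score as nearly identical.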

Also introduced Zipf's law: in a large enough document, a word's frequency is roughly inversely proportional to its rank. With distinct words sorted by descending count, the second most common word appears about half as often as the first, the third about a third as often, and so on.
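
A minimal sketch of how one might check Zipf's law against word counts. The token list below is contrived so that it fits the law exactly (real text only approximates it), and the helper names are assumptions. Under the law, the count at rank r is predicted to be about the top count divided by r.

```python
from collections import Counter

def rank_frequencies(tokens):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(tokens).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts, start=1)]

def zipf_prediction(top_count, rank):
    """Count that Zipf's law predicts for a given rank."""
    return top_count / rank

# Contrived tokens: 12 x "the", 6 x "of", 4 x "and", 3 x "to".
tokens = ("the " * 12 + "of " * 6 + "and " * 4 + "to " * 3).split()
table = rank_frequencies(tokens)
top_count = table[0][2]
for rank, word, count in table:
    print(rank, word, count, zipf_prediction(top_count, rank))
```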
Jan 11, 2022 12:56PM
Natural Language Processing in Action


David
David is on page 76 of 544
Sentiment analysis: rule-based vs machine learning (naive Bayes).
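
A minimal sketch contrasting the two approaches, not the book's implementation. The lexicon, the four training sentences, and all function names are hypothetical. The rule-based scorer sums hand-assigned word weights, while the naive Bayes classifier learns word counts per label from examples and applies add-one smoothing.

```python
import math
from collections import Counter

# Rule-based: a hand-written lexicon of word weights (hypothetical values).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}

def rule_based_score(text):
    """Sum the lexicon weights of the words in the text."""
    return sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())

def train_naive_bayes(labeled_docs):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Pick the label with the highest log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in doc_counts:
        logp = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][tok] + 1)
                                 / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Tiny made-up training set.
training = [
    ("great fun movie", "pos"),
    ("good story", "pos"),
    ("awful boring movie", "neg"),
    ("bad acting", "neg"),
]
model = train_naive_bayes(training)
print(rule_based_score("good fun"), predict(model, "good fun"))
```

The rule-based scorer only knows the words someone wrote into the lexicon, whereas the classifier picks up "fun" and "boring" from the labeled examples.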

Next, bag-of-words with term-frequency times inverse document frequency (TF-IDF) factored in.
Jan 09, 2022 10:43AM
Natural Language Processing in Action


David
David is on page 59 of 544
A formal introduction to stop words, and drilling into n-grams a little further. Text normalization techniques, like case folding and stemming.
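
A minimal sketch of those normalization steps chained together, assuming a regex tokenizer, a tiny made-up stop word list, and a deliberately crude suffix-stripping stemmer (real stemmers such as Porter's have many more rules). All names here are hypothetical.

```python
import re

# Hypothetical, very short stop word list.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in"}

def tokenize(text):
    """Case-fold, then pull out runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Strip one common suffix if enough of the word remains."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def ngrams(tokens, n=2):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("The Cats jumped in the garden")
content = remove_stop_words(tokens)
stems = [naive_stem(t) for t in content]
bigrams = ngrams(stems)
print(stems, bigrams)
```

Note that removing stop words before building n-grams makes "jump" and "garden" adjacent even though they were not neighbors in the original sentence, which is one reason the order of these steps matters.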
Jan 04, 2022 04:52PM
Natural Language Processing in Action


David
David is on page 48 of 544
A tour of ways to tokenize text (documents), from the simplest to increasingly complex and flexible ones.

Introduced pandas DataFrames and Series, and illustrated the value of condensing one-hot vectors into more compact structures (based on dictionaries). Showcased the value of bag-of-words vectors and word frequency vectors for reducing a large document to its essence.

Also introduced the dot product (aka inner product or scalar product), the operation that matrix multiplication is built from.
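
A minimal sketch of a dictionary-based bag-of-words vector and a dot product between two of them, using `collections.Counter` instead of pandas to keep it dependency-free. The function names, the whitespace tokenizer, and the two sentences are assumptions for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())

def dot(a, b):
    """Dot product of two sparse vectors stored as dicts.

    Only words present in both vectors contribute, because a missing
    word has an implicit count of zero.
    """
    return sum(a[k] * b[k] for k in a.keys() & b.keys())

doc1 = bag_of_words("the cat sat on the mat")
doc2 = bag_of_words("the dog sat on the log")
overlap = dot(doc1, doc2)
print(doc1, doc2, overlap)
```

Here "the" contributes 2 × 2, and "sat" and "on" contribute 1 × 1 each, so the dot product is 6. Storing only nonzero counts is the compact alternative to a row of one-hot vectors with one column per vocabulary word.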
Jan 02, 2022 12:26PM
Natural Language Processing in Action

