
Transformers for Natural Language Processing

Publisher's note: A new edition of this book is out now. It includes working with GPT-3 and comparing the results with other models, even more use cases such as causal language analysis and computer vision tasks, and an introduction to OpenAI's Codex.

The transformer architecture has proved to be revolutionary, outperforming the classical RNN and CNN models in use today. With an apply-as-you-learn approach, Transformers for Natural Language Processing investigates in detail deep learning for machine translation, speech-to-text, text-to-speech, language modeling, question answering, and many more NLP domains. The book takes you through NLP with Python and examines various eminent models and datasets within the transformer architecture, created by pioneers such as Google, Facebook, Microsoft, OpenAI, and Hugging Face.

The book trains you in three stages. The first stage introduces you to transformer architectures, starting with the original transformer before moving on to RoBERTa, BERT, and DistilBERT models; you will discover training methods for smaller transformers that can outperform GPT-3 in some cases. In the second stage, you will apply transformers for Natural Language Understanding (NLU) and Natural Language Generation (NLG). Finally, the third stage will help you grasp advanced language understanding techniques such as optimizing social network datasets and fake news identification.

By the end of this NLP book, you will understand transformers from a cognitive science perspective and be proficient in applying pretrained transformer models from tech giants to various datasets. Since the book does not teach basic programming, you must be familiar with neural networks, Python, PyTorch, and TensorFlow in order to learn their implementation with transformers.
Readers who can benefit the most from this book include experienced deep learning and NLP practitioners, and data analysts and data scientists who want to process the increasing amounts of language-driven data.

384 pages, Paperback

Published February 1, 2021

37 people are currently reading
149 people want to read

About the author

Denis Rothman

15 books · 12 followers

Ratings & Reviews



Community Reviews

5 stars: 19 (43%)
4 stars: 17 (38%)
3 stars: 5 (11%)
2 stars: 3 (6%)
1 star: 0 (0%)
Displaying 1 - 2 of 2 reviews
Raed
327 reviews · 122 followers
January 16, 2022
Transformers: THE STATE-OF-THE-ART
This book is for Data Scientists or Data Engineers, but this review is for Everyone

The story begins with a genius called Andrey Markov, who introduced the concept of random values and created a theory of stochastic processes. We know them in artificial intelligence (AI) as Markov Decision Processes (MDPs).

In 1948, Claude Shannon's The Mathematical Theory of Communication was published. Shannon cites Andrey Markov's theory multiple times when building his probabilistic approach to sequence modeling.

In 1950, Alan Turing published his seminal article, Computing Machinery and Intelligence. He based this article on machine intelligence on the immense success of his wartime work decrypting German messages. The expression "artificial intelligence" was first used by John McCarthy in 1956, yet Alan Turing was already implementing artificial intelligence in the 1940s to decode encrypted German messages.

In 1982, John Hopfield introduced Recurrent Neural Networks (RNNs). He was inspired by W.A. Little, who wrote The Existence of Persistent States in the Brain.


In the 1980s, Yann LeCun (a brave man who believed in himself) designed the multi-purpose Convolutional Neural Network (CNN). He applied CNNs to text sequences, and they have been widely used for sequence transduction and modeling as well.

After that, when AI models needed to analyze longer sequences that required an increasing amount of computing power, AI developers used more powerful machines and found ways to optimize gradients.


It seemed that nothing else could be done to make more progress. Thirty years passed this way. And then, in December 2017, came the Transformer, the incredible innovation that seems to have come from a distant planet. The Transformer swept everything away, producing impressive scores on standard datasets.
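The innovation the reviewer describes boils down to self-attention, which replaced recurrence entirely. A minimal sketch of scaled dot-product attention, the Transformer's core operation (the function name, shapes, and toy data here are illustrative, not taken from the book):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled to stabilize gradients
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys (subtract the row max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of all value vectors
    return weights @ V, weights

# Toy example: 3 tokens, embedding dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Because every token attends to every other token in one matrix multiplication, long-range dependencies no longer have to survive a step-by-step recurrent chain.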

AND WELCOME TO A NEW AGE OF ARTIFICIAL INTELLIGENCE 🧠

60 reviews · 5 followers
February 3, 2021
This book successfully filled those enormous gaps I had in my understanding of transformers, BERT, GPT-2, and GPT-3.

I really enjoyed the structural, organized, step-by-step approach to introducing base transformers, then BERT, RoBERTa, GPT-2 and GPT-3, downstream tasks with fine-tuning, T5 models, tokenizers, semantic labeling, possible optimizations, real-world use cases, and applications. Each chapter built off the previous ones. If a new model architecture was introduced, the differences between it and the base transformer architecture were highlighted and explained.

One of the most important takeaways for me was the overview of next steps: the potential for practical use and for building ideas and projects on top of transformers, including where to start, bottlenecks, and what options are available.
