416-Deep Learning-Ian Goodfellow-ML-2016
Barack
2024/04/28
"Deep Learning", was first published in 2016. It introduces the broad topic of deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives. Deep learning is a form of machine learning that enables computers to learn from experience and understand the world based on a hierarchy of concepts. Because computers gather knowledge from experience, human-computer operators do not need to formally specify all the knowledge the computer needs. Hierarchies of concepts allow computers to learn complex concepts by building them from simpler concepts. These hierarchical diagrams will have many layers.
It provides a mathematical and conceptual background covering relevant concepts in linear algebra, probability and information theory, numerical computation, and machine learning. It describes deep learning techniques used by industrial practitioners, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology, and surveys applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games. Finally, the book offers a research perspective covering theoretical topics such as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.
Ian Goodfellow was born in the United States in 1987 and studied at Stanford University and the Université de Montréal. He is an American computer scientist, engineer, and executive known for his work on artificial neural networks and deep learning. He previously served as a research scientist at Google Brain and as director of machine learning at Apple, and he has made several important contributions to the field of deep learning, including the invention of generative adversarial networks (GANs).
Table of Contents
1 Introduction
I Applied Math and Machine Learning Basics
2 Linear Algebra
3 Probability and Information Theory
4 Numerical Computation
5 Machine Learning Basics
II Deep Networks: Modern Practices
6 Deep Feedforward Networks
7 Regularization for Deep Learning
8 Optimization for Training Deep Models
9 Convolutional Networks
10 Sequence Modeling: Recurrent and Recursive Nets
11 Practical Methodology
12 Applications
III Deep Learning Research
13 Linear Factor Models
14 Autoencoders
15 Representation Learning
16 Structured Probabilistic Models for Deep Learning
17 Monte Carlo Methods
18 Confronting the Partition Function
19 Approximate Inference
20 Deep Generative Models
When studying the myths of various countries, we find that almost all of them describe the origin of humanity as an act of divine creation, and the gods usually create humans in their own image: they grant mankind their wisdom, but not their lifespan. When humans in turn began to play the role of creator, they seemed naturally to expect to create creations resembling themselves. For computer scientists and engineers, such goals might include designing software that simulates the human brain and hardware that resembles humans in appearance and capability. Tracing history backward, we find that people held this desire thousands of years ago; looking across the present, we find that even early in life, children's scribblings often focus on people as their most important subject.
Machines excel at tasks that can be clearly defined and formulated, such as chess and Go. Since the solutions to these problems are theoretically well specified, given sufficient hardware, storage space, and computing speed it is almost inevitable that machines will solve them. So tasks that humans find difficult are often actually easier for machines. Conversely, tasks that are simple for humans, especially perceptual and cognitive ones, appear hard for machines. These supposedly simple tasks rest on a large amount of life experience that has crystallized into intuitive judgment: when you see a face, you immediately recall the name and related memories, and when you hear words, you quickly convert them into the corresponding concepts and understand them. Although research in this area has made major breakthroughs since 2000, there is still a long way to go before strong or general artificial intelligence is achieved. In everyday life, simple common-sense behavior is generally not considered a sign of intelligence, whereas solving complex mathematical and physical problems is considered a sign of genius. In the world of machines, by contrast, software that can understand images and language and perform reading comprehension is considered the more intelligent. If humans one day gain a deeper understanding of how the brain processes language, images, and text, and can describe these abilities formally, there will probably be breakthroughs in machine capability in these intuitive domains.
Whether it is a machine or a human, recognizing an object requires capturing its characteristics. Humans capture features unconsciously and automatically, so we rarely think about the process; for machines, however, deciding which features to use is a very challenging problem. I am reminded of a story about Plato and Diogenes. Plato defined man as a "featherless biped," so Diogenes plucked a chicken and brought it to Plato. Whether or not the story is true, it shows how difficult it is to describe something accurately through a fixed, hand-specified set of characteristics. If the extracted features are incomplete or inaccurate, a machine judging by those features is prone to mistakes. For simple objects we may be able to hand-craft specific features for the machine to use, but for more complex problems this approach clearly does not work. We must therefore rely on the machine to learn a suitable way of extracting features on its own.
The basic idea of neural networks is to simplify a complex problem by decomposing it into multiple simpler sub-problems. The simple factors, the variables or features we choose, form the input layer, which is visible to the user. The output layer displays the final result, while the intermediate layers are usually invisible to the end user. In a neural network, the output of each layer becomes the input of the next layer, as in the sketch below. The depth of a neural network can be measured in two ways: one is the length of the longest path from input to output through the computational graph, and the other is the depth of the graph describing how concepts are built on top of one another. In practice, greater depth generally yields better results.
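Below is a minimal sketch of this layered structure in NumPy; the layer sizes, the ReLU activation, and the random weights are illustrative assumptions rather than anything prescribed by the book.

```python
import numpy as np

# Minimal feedforward sketch: each layer's output is the next layer's input.
# Layer sizes, ReLU activation, and random weights are illustrative choices.
rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]                     # input -> two hidden layers -> output
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Propagate x through every layer; hidden layers use ReLU."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:               # keep the output layer linear
            h = np.maximum(h, 0.0)
    return h

x = rng.normal(size=4)                         # one example with 4 input features
print(forward(x))                              # 3 output values
print("depth (weight layers):", len(weights))  # longest input-to-output path
```

Counting the depth as the number of weight layers here corresponds to measuring the longest path through the computational graph from input to output.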
The author introduces the basics of linear algebra at the beginning. I am increasingly aware of the importance of mathematics across subjects; looking back on the education I received, I now feel that any emphasis on mathematics was justified. However, while learning the material I lacked an intuitive feel for how it is applied, which to some extent weakened my appreciation of both its importance and its beauty. On the one hand, mathematics is important as the product of rational thinking; on the other hand, it has a natural beauty that is often ignored in mathematics education. In computer science, linear algebra and discrete mathematics are important mathematical foundations, and I took both courses as an undergraduate. Linear algebra mainly deals with continuous quantities, while discrete mathematics, as the name implies, focuses on discrete elements. Just as a pyramid is built step by step from its foundation stones, complex mathematical theories and applications are derived step by step from these basic concepts.
Probability theory is a branch of mathematics that plays a central role in computer science and programming. In essence, programming is a modeling activity: we build models to simulate and analyze real situations. In most cases a model is a simplification of reality, and in simplifying, some features of the real situation are inevitably lost; these missing features introduce a certain degree of distortion, and therefore uncertainty. To deal with this uncertainty we often need to design simple, effective rules by which the computer can make judgments, and in practice a simple but slightly flawed rule is often more useful than a rigorous but complex one. When an algorithm makes a choice, it usually selects the explanation or decision with the highest probability among the available options, which is essentially a search for a locally optimal solution. Large language models are one example of predicting output from input: the model infers likely outputs from its inputs, so the quality of the input directly affects the quality of the output.
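As a toy illustration of "pick the option with the highest probability" (not how a real language model is implemented), the sketch below scores a few hypothetical candidate words, turns the scores into probabilities with a softmax, and selects the most probable one.

```python
import numpy as np

# Hypothetical candidate words and model scores, used only for illustration.
candidates = ["cat", "dog", "car"]
scores = np.array([2.0, 1.0, 0.1])

def softmax(z):
    z = z - z.max()                            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

probs = softmax(scores)
best = candidates[int(np.argmax(probs))]       # choose the highest-probability option
print(dict(zip(candidates, probs.round(3))), "->", best)
```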
Numerical computation is an important field, not least because practical applications face a basic challenge: there are infinitely many real numbers but computer memory is finite. We therefore have to represent a nearly unlimited range of values with limited memory, and the resulting loss of information can cause problems such as underflow and overflow. Underflow occurs when numbers near zero are rounded down to zero, which can make functions such as division or the logarithm misbehave; a common remedy, and one I have run into in my own programming, is to clamp the input to a tiny positive value instead of exact zero. Overflow, on the other hand, occurs when a value grows toward positive or negative infinity, beyond the range the number format can represent. It is therefore very important when writing programs to handle these edge cases and avoid potential errors.
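The short sketch below, with made-up values, shows both failure modes in NumPy's float64 and the usual remedies: clamping to a small epsilon before taking a logarithm, and subtracting the maximum before exponentiating.

```python
import numpy as np

# Underflow: a value so close to zero that it rounds to 0.0, breaking log().
p = np.exp(-1000.0)                       # underflows to 0.0 in float64
print(p, np.log(p))                       # 0.0 and -inf (NumPy warns about log(0))

eps = 1e-12                               # clamp to a tiny positive value instead
print(np.log(max(p, eps)))                # finite, approximate result

# Overflow: exp() of a large score exceeds the float64 range and becomes inf.
scores = np.array([1000.0, 1001.0])
print(np.exp(scores))                     # [inf, inf] (NumPy warns about overflow)

# Remedy: shift by the maximum before exponentiating; the ratios are unchanged.
shifted = scores - scores.max()
print(np.exp(shifted) / np.exp(shifted).sum())   # a well-defined softmax
```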