Pranesh's Quotes

  • #1
    Brian Christian
    “Bayes’s Rule tells us that when it comes to making predictions based on limited evidence, few things are as important as having good priors—that is, a sense of the distribution from which we expect that evidence to have come. Good predictions thus begin with having good instincts about when we’re dealing with a normal distribution and when with a power-law distribution. As it turns out, Bayes’s Rule offers us a simple but dramatically different predictive rule of thumb for each.  …”
    Brian Christian, Algorithms to Live By: The Computer Science of Human Decisions
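
    A minimal sketch (not from the book) of how the prior's shape changes the prediction rule: conditioning prior samples on "the total is at least what we have seen so far", a roughly normal prior keeps the prediction near its mean, while a power-law prior makes the prediction scale multiplicatively with the observation. The distributions and parameters below are assumed purely for illustration.

      import numpy as np

      rng = np.random.default_rng(0)

      # Assumed priors: lifespans roughly normal, movie grosses roughly power-law (Pareto).
      lifespans = rng.normal(loc=76, scale=11, size=200_000)
      grosses = (rng.pareto(a=2.0, size=200_000) + 1) * 10   # minimum 10, heavy tail

      def predict_total(observed_so_far, prior_samples):
          """Posterior-mean guess of the total, given total >= observed_so_far."""
          consistent = prior_samples[prior_samples >= observed_so_far]
          return consistent.mean()

      # Normal prior: predictions hug the prior mean regardless of the observation.
      print(predict_total(20, lifespans), predict_total(60, lifespans))   # both near 76
      # Power-law prior: the best guess is roughly a constant multiple of the observation.
      print(predict_total(20, grosses), predict_total(60, grosses))       # roughly 2x each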

  • #2
    Peter Cawdron
    “It’s simple math, you know. Classic bell curve. Normal distribution. Most people are going to pile up in the middle, not going anywhere, but there’ll be a handful of outliers that go high and low, often for inexplicable reasons. Point is, even survivors don’t really understand why they survive—they just do.”
    Peter Cawdron, Losing Mars

  • #3
    Leonard Mlodinow
    “The normal distribution describes the manner in which many phenomena vary around a central value that represents their most probable outcome;”
    Leonard Mlodinow, The Drunkard's Walk: How Randomness Rules Our Lives

  • #4
    “Many of the most interesting phenomena that we have touched upon fall into this category, including the occurrence of disasters such as earthquakes, financial market crashes, and forest fires. All of these have fat-tail distributions with many more rare events, such as enormous earthquakes, large market crashes, and raging forest fires, than would have been predicted by assuming that they were random events following a classic Gaussian distribution.”
    Geoffrey West, Scale: The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life, in Organisms, Cities, Economies, and Companies
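
    A rough numerical companion to the quoted point, using a Student-t distribution as an assumed stand-in for a fat-tailed law: events far out in the tail are astronomically unlikely under a Gaussian but remain quite plausible under a fat tail.

      from scipy import stats

      # P(X > k) for a standard Gaussian versus a fat-tailed distribution
      # (Student-t with 3 degrees of freedom, chosen only for illustration).
      for k in (3, 5, 10):
          p_gauss = stats.norm.sf(k)      # survival function: P(X > k)
          p_fat = stats.t.sf(k, df=3)
          print(f"P(X > {k:2d}):  Gaussian {p_gauss:.1e}   fat-tailed {p_fat:.1e}")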

  • #5
    “Modern statistics is built on the idea of models — probability models in particular. […] The standard approach to any new problem is to identify the sources of variation, to describe those sources by probability distributions and then to use the model thus created to estimate, predict or test hypotheses about the undetermined parts of that model. […] A statistical model involves the identification of those elements of our problem which are subject to uncontrolled variation and a specification of that variation in terms of probability distributions. Therein lies the strength of the statistical approach and the source of many misunderstandings. Paradoxically, misunderstandings arise both from the lack of an adequate model and from over-reliance on a model. […] At one level is the failure to recognise that there are many aspects of a model which cannot be tested empirically. At a higher level is the failure to recognise that any model is, necessarily, an assumption in itself. The model is not the real world itself but a representation of that world as perceived by ourselves. This point is emphasised when, as may easily happen, two or more models make exactly the same predictions about the data. Even worse, two models may make predictions which are so close that no data we are ever likely to have can ever distinguish between them. […] All model-dependent inference is necessarily conditional on the model. This stricture needs, especially, to be borne in mind when using Bayesian methods. Such methods are totally model-dependent and thus all are vulnerable to this criticism. The problem can apparently be circumvented, of course, by embedding the model in a larger model in which any uncertainties are, themselves, expressed in probability distributions. However, in doing this we are embarking on a potentially infinite regress which quickly gets lost in a fog of uncertainty.”
    David J. Bartholomew, Unobserved Variables: Models and Misunderstandings

  • #6
    Pedro Domingos
    “The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each tribe has a set of core beliefs, and a particular problem that it cares most about. It has found a solution to that problem, based on ideas from its allied fields of science, and it has a master algorithm that embodies it. For symbolists, all intelligence can be reduced to manipulating symbols, in the same way that a mathematician solves equations by replacing expressions by other expressions. Symbolists understand that you can’t learn from scratch: you need some initial knowledge to go with the data. They’ve figured out how to incorporate preexisting knowledge into learning, and how to combine different pieces of knowledge on the fly in order to solve new problems. Their master algorithm is inverse deduction, which figures out what knowledge is missing in order to make a deduction go through, and then makes it as general as possible. For connectionists, learning is what the brain does, and so what we need to do is reverse engineer it. The brain learns by adjusting the strengths of connections between neurons, and the crucial problem is figuring out which connections are to blame for which errors and changing them accordingly. The connectionists’ master algorithm is backpropagation, which compares a system’s output with the desired one and then successively changes the connections in layer after layer of neurons so as to bring the output closer to what it should be. Evolutionaries believe that the mother of all learning is natural selection. If it made us, it can make anything, and all we need to do is simulate it on the computer. The key problem that evolutionaries solve is learning structure: not just adjusting parameters, like backpropagation does, but creating the brain that those adjustments can then fine-tune. The evolutionaries’ master algorithm is genetic programming, which mates and evolves computer programs in the same way that nature mates and evolves organisms. Bayesians are concerned above all with uncertainty. All learned knowledge is uncertain, and learning itself is a form of uncertain inference. The problem then becomes how to deal with noisy, incomplete, and even contradictory information without falling apart. The solution is probabilistic inference, and the master algorithm is Bayes’ theorem and its derivates. Bayes’ theorem tells us how to incorporate new evidence into our beliefs, and probabilistic inference algorithms do that as efficiently as possible. For analogizers, the key to learning is recognizing similarities between situations and thereby inferring other similarities. If two patients have similar symptoms, perhaps they have the same disease. The key problem is judging how similar two things are. The analogizers’ master algorithm is the support vector machine, which figures out which experiences to remember and how to combine them to make new predictions.”
    Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

  • #7
    Pedro Domingos
    “How do we learn? Is there a better way? What can we predict? Can we trust what we’ve learned? Rival schools of thought within machine learning have very different answers to these questions. The main ones are five in number, and we’ll devote a chapter to each. Symbolists view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic. Connectionists reverse engineer the brain and are inspired by neuroscience and physics. Evolutionaries simulate evolution on the computer and draw on genetics and evolutionary biology. Bayesians believe learning is a form of probabilistic inference and have their roots in statistics. Analogizers learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization.”
    Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

  • #8
    Pedro Domingos
    “Each of the five tribes of machine learning has its own master algorithm, a general-purpose learner that you can in principle use to discover knowledge from data in any domain. The symbolists’ master algorithm is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference, and the analogizers’ is the support vector machine.”
    Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

  • #9
    “In a different direction, the necessity to model the analysis of noisy incomplete sensory data not by logic but by Bayesian inference first came to the forefront in the robotics community with their use of Kalman filters.”
    Ulf Grenander, A Calculus of Ideas: A Mathematical Study of Human Thought
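
    A minimal one-dimensional Kalman filter sketch, with assumed noise settings, showing the kind of Bayesian update the quote refers to: each noisy measurement is blended with the current estimate in proportion to their relative uncertainties.

      import numpy as np

      def kalman_1d(measurements, process_var=1e-4, meas_var=0.25, x0=0.0, p0=1.0):
          """Scalar Kalman filter for a (nearly) constant hidden value.
          process_var and meas_var are assumed noise variances."""
          x, p = x0, p0
          estimates = []
          for z in measurements:
              p = p + process_var            # predict: uncertainty grows a little
              k = p / (p + meas_var)         # Kalman gain: trust data vs. prediction
              x = x + k * (z - x)            # pull the estimate toward the measurement
              p = (1 - k) * p                # posterior uncertainty shrinks
              estimates.append(x)
          return estimates

      rng = np.random.default_rng(1)
      noisy = 4.2 + rng.normal(scale=0.5, size=50)   # assumed true value 4.2
      print(kalman_1d(noisy)[-1])                    # close to 4.2 despite the noise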

  • #10
    Pedro Domingos
    “prior probability that the sun will rise, since it’s prior to seeing any evidence. It’s not based on counting the number of times the sun has risen on this planet in the past, because you weren’t there to see it; rather, it reflects your a priori beliefs about what will happen, based on your general knowledge of the universe. But now the stars start to fade, so your confidence that the sun does rise on this planet goes up, based on your experience on Earth. Your confidence is now a posterior probability, since it’s after seeing some evidence. The sky begins to lighten, and the posterior probability takes another leap. Finally, a sliver of the sun’s bright disk appears above the horizon and perhaps catches “the Sultan’s turret in a noose of light,” as in the opening verse of the Rubaiyat. Unless you’re hallucinating, it is now certain that the sun will rise. The crucial question is exactly how the posterior probability should evolve as you see more evidence. The answer is Bayes’ theorem. We can think of it in terms of cause and effect. Sunrise causes the stars to fade and the sky to lighten, but the latter is stronger evidence of daybreak, since the stars could fade in the middle of the night due to, say, fog rolling in. So the probability of sunrise should increase more after seeing the sky lighten than after seeing the stars fade. In mathematical notation, we say that P(sunrise | lightening-sky), the conditional probability of sunrise given that the sky is lightening, is greater than P(sunrise | fading-stars), its conditional probability given that the stars are fading. According to Bayes’ theorem, the more likely the effect is given the cause, the more likely the cause is given the effect: if P(lightening-sky | sunrise) is higher than P(fading-stars | sunrise), perhaps because some planets are far enough from their sun that the stars still shine after sunrise, then P(sunrise | lightening sky) is also higher than P(sunrise | fading-stars).”
    Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
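
    A minimal numerical sketch of the argument, with every probability below assumed purely for illustration: holding the prior fixed, the effect that is more diagnostic of the cause (harder to explain without it) yields the larger posterior.

      def posterior(prior, p_effect_given_cause, p_effect_given_no_cause):
          """Bayes' theorem for a binary cause: P(cause | effect)."""
          evidence = (p_effect_given_cause * prior
                      + p_effect_given_no_cause * (1 - prior))
          return p_effect_given_cause * prior / evidence

      prior_sunrise = 0.5   # assumed prior belief that the sun rises on this planet

      # Stars can fade without a sunrise (fog rolling in), so fading stars are weaker
      # evidence than a lightening sky; the likelihoods are assumed numbers.
      p_given_fading_stars = posterior(prior_sunrise, 0.9, 0.3)
      p_given_lightening_sky = posterior(prior_sunrise, 0.9, 0.05)

      print(p_given_fading_stars, p_given_lightening_sky)   # 0.75 vs ~0.95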

  • #11
    Jake VanderPlas
    “Due to the various pragmatic obstacles, it is rare for a mission-critical analysis to be done in the “fully Bayesian” manner, i.e., without the use of tried-and-true frequentist tools at the various stages. Philosophy and beauty aside, the reliability and efficiency of the underlying computations required by the Bayesian framework are the main practical issues. A central technical issue at the heart of this is that it is much easier to do optimization (reliably and efficiently) in high dimensions than it is to do integration in high dimensions. Thus the workhorse machine learning methods, while there are ongoing efforts to adapt them to the Bayesian framework, are almost all rooted in frequentist methods. A work-around is to perform MAP inference, which is optimization based.
    Most users of Bayesian estimation methods, in practice, are likely to use a mix of Bayesian and frequentist tools. The reverse is also true—frequentist data analysts, even if they stay formally within the frequentist framework, are often influenced by “Bayesian thinking,” referring to “priors” and “posteriors.” The most advisable position is probably to know both paradigms well, in order to make informed judgments about which tools to apply in which situations.”
    Jacob VanderPlas, Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data
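
    A small sketch of the work-around mentioned above: summarizing a posterior by optimization (MAP) rather than by integration. The coin-flip data and Beta prior below are assumed; conjugacy happens to give the exact posterior mean to compare against.

      from scipy import optimize, stats

      heads, flips = 7, 10          # assumed data
      a0, b0 = 2.0, 2.0             # assumed Beta prior on the coin's bias

      def neg_log_posterior(theta):
          t = theta[0]
          return -(stats.binom.logpmf(heads, flips, t) + stats.beta.logpdf(t, a0, b0))

      # MAP inference: a cheap, reliable one-dimensional optimization.
      map_estimate = optimize.minimize(neg_log_posterior, x0=[0.5],
                                       bounds=[(1e-6, 1 - 1e-6)]).x[0]

      # "Fully Bayesian" answer: here the integral has a closed form
      # (Beta-Binomial conjugacy); in general this is the hard, high-dimensional part.
      posterior_mean = (a0 + heads) / (a0 + b0 + flips)

      print(map_estimate, posterior_mean)   # ~0.667 vs ~0.643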

  • #12
    “Supervised learning algorithms typically require stationary features. The reason is that we need to map a previously unseen (unlabeled) observation to a collection of labeled examples, and infer from them the label of that new observation. If the features are not stationary, we cannot map the new observation to a large number of known examples. But stationarity does not ensure predictive power. Stationarity is a necessary, non-sufficient condition for the high performance of an ML algorithm. The problem is, there is a trade-off between stationarity and memory. We can always make a series more stationary through differentiation, but it will be at the cost of erasing some memory, which will defeat the forecasting purpose of the ML algorithm.”
    Marcos Lopez de Prado, Advances in Financial Machine Learning
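
    A small illustration of the quoted trade-off, using an assumed random-walk "price" series: integer differencing produces a stationary series of returns, but the level information (the memory) is essentially gone.

      import numpy as np

      rng = np.random.default_rng(42)
      prices = 100 + np.cumsum(rng.normal(scale=1.0, size=2000))   # non-stationary, full memory
      returns = np.diff(prices)                                    # stationary, little memory

      # Crude stationarity check: compare the two halves of each series.
      half = len(returns) // 2
      print(prices[:half].mean(), prices[half:].mean())    # levels drift apart
      print(returns[:half].mean(), returns[half:].mean())  # returns look alike

      # Crude memory check: correlation of the differenced series with the original level.
      print(np.corrcoef(prices[:-1], returns)[0, 1])       # near zero: memory erased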

  • #13
    James Gleick
    “When information is cheap, attention becomes expensive.”
    James Gleick, The Information: A History, a Theory, a Flood

  • #14
    James Gleick
    “It is not the amount of knowledge that makes a brain. It is not even the distribution of knowledge. It is the interconnectedness.”
    James Gleick, The Information: A History, a Theory, a Flood

  • #15
    “The first [method] I might speak about is simplification. Suppose that you are given a problem to solve, I don't care what kind of problem - a machine to design, or a physical theory to develop, or a mathematical theorem to prove or something of that kind - probably a very powerful approach to this is to attempt to eliminate everything from the problem except the essentials; that is, cut it down to size. Almost every problem that you come across is befuddled with all kinds of extraneous data of one sort or another; and if you can bring this problem down into the main issues, you can see more clearly what you are trying to do and perhaps find a solution. Now in so doing you may have stripped away the problem you're after. You may have simplified it to the point that it doesn't even resemble the problem that you started with; but very often if you can solve this simple problem, you can add refinements to the solution of this until you get back to the solution of the one you started with.”
    Claude Shannon

  • #16
    Pedro Domingos
    “The second simplest algorithm is: combine two bits. Claude Shannon, better known as the father of information theory, was the first to realize that what transistors are doing, as they switch on and off in response to other transistors, is reasoning. (That was his master’s thesis at MIT—the most important master’s thesis of all time.)”
    Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
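
    A trivial sketch of "combining two bits": the elementary Boolean operations that, per the quote, switching circuits carry out. The truth-table printout is only illustrative.

      # Elementary two-bit combinations (Boolean logic realized by switching circuits).
      def AND(a, b): return a & b
      def OR(a, b):  return a | b
      def XOR(a, b): return a ^ b

      for a in (0, 1):
          for b in (0, 1):
              print(a, b, "->", AND(a, b), OR(a, b), XOR(a, b))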

  • #17
    James Gleick
    “For Wiener, entropy was a measure of disorder; for Shannon, of uncertainty. Fundamentally, as they were realizing, these were the same.”
    James Gleick, The Information: A History, a Theory, a Flood
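
    A minimal sketch of the shared quantity: Shannon entropy as a measure of uncertainty, largest when every outcome is equally likely and zero when the outcome is certain.

      import math

      def entropy(probs):
          """Shannon entropy in bits: H = -sum(p * log2(p))."""
          return -sum(p * math.log2(p) for p in probs if p > 0)

      print(entropy([0.5, 0.5]))   # 1.0 bit: fair coin, maximal uncertainty
      print(entropy([0.9, 0.1]))   # ~0.47 bits: biased coin, less uncertain
      print(entropy([1.0, 0.0]))   # 0.0 bits: certainty, no uncertainty at all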

  • #18
    Jimmy Soni
    “Geniuses are the luckiest of mortals because what they must do is the same as what they most want to do and, even if their genius is unrecognized in their lifetime, the essential earthly reward is always theirs, the certainty that their work is good and will stand the test of time. One suspects that the geniuses will be least in the Kingdom of Heaven—if, indeed, they ever make it; they have had their reward.”
    Jimmy Soni, A Mind at Play: How Claude Shannon Invented the Information Age

  • #19
    Jimmy Soni
    “In one sense, the world seen through such eyes looks starkly unequal. “A very small percentage of the population produces the greatest proportion of the important ideas,” Shannon began, gesturing toward a rough graph of the distribution of intelligence. “There are some people if you shoot one idea into the brain, you will get a half an idea out. There are other people who are beyond this point at which they produce two ideas for each idea sent in. Those are the people beyond the knee of the curve.” He was not, he quickly added, claiming membership for himself in the mental aristocracy—he was talking about history’s limited supply of Newtons and Einsteins.”
    Jimmy Soni, A Mind at Play: How Claude Shannon Invented the Information Age

  • #20
    “Be wary, though, of the way news media use the word “significant,” because to statisticians it doesn’t mean “noteworthy.” In statistics, the word “significant” means that the results passed mathematical tests such as t-tests, chi-square tests, regression, and principal components analysis (there are hundreds). Statistical significance tests quantify how easily pure chance can explain the results. With a very large number of observations, even small differences that are trivial in magnitude can be beyond what our models of chance and randomness can explain. These tests don’t know what’s noteworthy and what’s not—that’s a human judgment.”
    Daniel J. Levitin, A Field Guide to Lies: Critical Thinking in the Information Age
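
    A small simulation of the quoted point, with assumed numbers: the same practically trivial difference in means sails past a t-test once the sample is large enough.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)
      tiny_effect = 0.02   # assumed difference, too small to matter in practice

      for n in (100, 10_000, 1_000_000):
          a = rng.normal(loc=0.0, scale=1.0, size=n)
          b = rng.normal(loc=tiny_effect, scale=1.0, size=n)
          t_stat, p_value = stats.ttest_ind(a, b)
          verdict = "significant" if p_value < 0.05 else "not significant"
          print(f"n={n:>9,}  p={p_value:.2e}  {verdict}")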

  • #21
    Stanisław Lem
    “Certainly not! I didn't build a machine to solve ridiculous crossword puzzles! That's hack work, not Great Art! Just give it a topic, any topic, as difficult as you like..."
    Klapaucius thought, and thought some more. Finally he nodded and said:
    "Very well. Let's have a love poem, lyrical, pastoral, and expressed in the language of pure mathematics. Tensor algebra mainly, with a little topology and higher calculus, if need be. But with feeling, you understand, and in the cybernetic spirit."
    "Love and tensor algebra?" Have you taken leave of your senses?" Trurl began, but stopped, for his electronic bard was already declaiming:

    Come, let us hasten to a higher plane,
    Where dyads tread the fairy fields of Venn,
    Their indices bedecked from one to n,
    Commingled in an endless Markov chain!

    Come, every frustum longs to be a cone,
    And every vector dreams of matrices.
    Hark to the gentle gradient of the breeze:
    It whispers of a more ergodic zone.

    In Riemann, Hilbert or in Banach space
    Let superscripts and subscripts go their ways.
    Our asymptotes no longer out of phase,
    We shall encounter, counting, face to face.

    I'll grant thee random access to my heart,
    Thou'lt tell me all the constants of thy love;
    And so we two shall all love's lemmas prove,
    And in our bound partition never part.

    For what did Cauchy know, or Christoffel,
    Or Fourier, or any Boole or Euler,
    Wielding their compasses, their pens and rulers,
    Of thy supernal sinusoidal spell?

    Cancel me not--for what then shall remain?
    Abscissas, some mantissas, modules, modes,
    A root or two, a torus and a node:
    The inverse of my verse, a null domain.

    Ellipse of bliss, converge, O lips divine!
    The product of our scalars is defined!
    Cyberiad draws nigh, and the skew mind
    Cuts capers like a happy haversine.

    I see the eigenvalue in thine eye,
    I hear the tender tensor in thy sigh.
    Bernoulli would have been content to die,
    Had he but known such a² cos 2φ!”
    Stanisław Lem, The Cyberiad

  • #22
    “Taking least squares is no longer optimal, and the very idea of ‘accuracy’ has to be rethought. This simple fact is as important as it is neglected. This problem is easily illustrated in the Logistic Map: given the correct mathematical formula and all the details of the noise model – random numbers with a bell-shaped distribution – using least squares to estimate α leads to systematic errors. This is not a question of too few data or insufficient computer power, it is the method that fails. We can compute the optimal least squares solution: its value for α is too small at all noise levels. This principled approach just does not apply to nonlinear models because the theorems behind the principle of least squares repeatedly assume bell-shaped distributions.”
    Leonard A. Smith, Chaos: A Very Short Introduction
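
    A rough sketch of the experiment described above, with an assumed noise level: iterate the Logistic Map, observe it through bell-shaped noise, and fit α by least squares. Because the noise also contaminates the regressor, the estimate tends to come out below the true value, in line with the quote.

      import numpy as np

      rng = np.random.default_rng(0)
      alpha_true, n = 4.0, 5000

      # Logistic Map: x_{t+1} = alpha * x_t * (1 - x_t)
      x = np.empty(n)
      x[0] = 0.3
      for t in range(n - 1):
          x[t + 1] = alpha_true * x[t] * (1 - x[t])

      # Observational noise with a bell-shaped (Gaussian) distribution, as in the quote.
      y = x + rng.normal(scale=0.05, size=n)

      # Naive least squares: regress y_{t+1} on y_t * (1 - y_t).
      u = y[:-1] * (1 - y[:-1])
      alpha_hat = (u @ y[1:]) / (u @ u)
      print(alpha_hat)   # systematically below 4.0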

  • #23
    Greg Egan
    “Imagine a universe entirely without structure, without shape, without connections. A cloud of microscopic events, like fragments of space-time … except that there is no space or time. What characterizes one point in space, for one instant? Just the values of the fundamental particle fields, just a handful of numbers. Now, take away all notions of position, arrangement, order, and what’s left? A cloud of random numbers.”
    Greg Egan, Permutation City

  • #24
    David Spiegelhalter
    “Probability theory naturally comes into play in what we shall call situation 1: When the data-point can be considered to be generated by some randomizing device, for example when throwing dice, flipping coins, or randomly allocating an individual to a medical treatment using a pseudo-random-number generator, and then recording the outcomes of their treatment. But in practice we may be faced with situation 2: When a pre-existing data-point is chosen by a randomizing device, say when selecting people to take part in a survey. And much of the time our data arises from situation 3: When there is no randomness at all, but we act as if the data-point were in fact generated by some random process, for example in interpreting the birth weight of our friend’s baby.”
    David Spiegelhalter, The Art of Statistics: Learning from Data

  • #25
    Pedro Domingos
    “Our search for the Master Algorithm is complicated, but also enlivened, by the rival schools of thought that exist within machine learning. The main ones are the symbolists, connectionists, evolutionaries, Bayesians, and analogizers. Each tribe has a set of core beliefs, and a particular problem that it cares most about. It has found a solution to that problem, based on ideas from its allied fields of science, and it has a master algorithm that embodies it.”
    Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

  • #26
    Pedro Domingos
    “Evolutionaries and connectionists have something important in common: they both design learning algorithms inspired by nature. But then they part ways. Evolutionaries focus on learning structure; to them, fine-tuning an evolved structure by optimizing parameters is of secondary importance. In contrast, connectionists prefer to take a simple, hand-coded structure with lots of connections and let weight learning do all the work. This is machine learning’s version of the nature versus nurture controversy, and there are good arguments on both sides.”
    Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

  • #27
    Brian Christian
    “If you want to be a good intuitive Bayesian—if you want to naturally make good predictions, without having to think about what kind of prediction rule is appropriate—you need to protect your priors. Counterintuitively, that might mean turning off the news.”
    Brian Christian, Algorithms to Live By: The Computer Science of Human Decisions

  • #28
    Jordan Ellenberg
    “In the Bayesian framework, how much you believe something after you see the evidence depends not just on what the evidence shows, but on how much you believed it to begin with.”
    Jordan Ellenberg, How Not to Be Wrong: The Power of Mathematical Thinking
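
    A minimal numerical illustration, with assumed numbers: two observers see the same evidence (the same likelihood ratio), but they start from different priors, so Bayes' rule leaves them with different posteriors.

      def update(prior, likelihood_ratio):
          """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
          posterior_odds = (prior / (1 - prior)) * likelihood_ratio
          return posterior_odds / (1 + posterior_odds)

      evidence_strength = 9.0   # assumed likelihood ratio of the observed evidence

      print(update(0.50, evidence_strength))   # open-minded prior -> 0.90
      print(update(0.05, evidence_strength))   # skeptical prior   -> ~0.32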

  • #29
    Pedro Domingos
    “For a Bayesian, in fact, there is no such thing as the truth; you have a prior distribution over hypotheses, after seeing the data it becomes the posterior distribution, as given by Bayes’ theorem, and that’s all.”
    Pedro Domingos, The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

  • #30
    Amit Ray
    “The beauty of quantum machine learning is that we do not need to depend on an algorithm like gradient descent or convex objective function. The objective function can be nonconvex or something else.”
    Amit Ray, Quantum Computing Algorithms for Artificial Intelligence


