Updates!
(1) My 8-year-old son asked me last week, “daddy, did you hear that GPT-5 is now out?” So yes, I’m indeed aware that GPT-5 is now out! I’ve just started playing around with it. For detailed reports on what’s changed and how impressive it is compared to previous models, see for example Zvi #1, #2, #3. Briefly, it looks like there are major reductions in hallucinations and sycophancy, and improvements in practical usefulness for coding and other tasks, even while the “raw intelligence” is unlikely to blow away someone who was already well-acquainted with o3, Opus 4, and other state-of-the-art models, the way ChatGPT and then GPT-4 blew away people who had no idea what was possible in late 2022 and early 2023. How impressive a result you see partly depends on which of several GPT-5 models your query gets routed to, something you don’t entirely control. Anyway, there’s grist here for the people who claim that progress toward AGI is slowing down, but also grist for the people who claim that it continues pretty much as expected within our post-ChatGPT reality!
(2) In other belated news, OpenAI and DeepMind (and then, other companies) announced that they achieved Gold Medal performance on the International Math Olympiad, by solving 5 of the 6 problems (there was one problem, the 6th and hardest, that all of the AIs struggled with). Most importantly, this means that I’ve won $100 from my friend Ernest Davis, an AI expert at NYU, who bet me that no AI would earn a Gold Medal at the International Math Olympiad by December 4, 2026. Even though I’m normally risk-averse and reluctant to take bets, I considered this one to be extremely safe, and indeed I won it with more than a year to spare.
(3) I’ve signed an open letter to OpenAI, along with many of my fellow former OpenAI employees as well as distinguished scientists and writers (Geoffrey Hinton, Stuart Russell, Sheldon Glashow, Sean Carroll, Matt Yglesias…), asking for more transparency about OpenAI’s continuing efforts to change its own structure. The questions basically ask OpenAI to declare, in writing, whether it has or hasn’t now completely abandoned the original nonprofit goals with which the organization was founded in 2015.
(4) At Lighthaven, the rationalist meeting space in Berkeley that I recently visited (and that our friend Cade Metz recently cast aspersions on in the New York Times), there’s going to be a writers’ residency called Inkhaven for the whole month of November. The idea—which I love—is that you either write a new blog post every day, or else you get asked to leave (while you also attend workshops, etc., to improve your writing skills). I’d attend myself for the month if teaching and family obligations didn’t conflict; someone standing over me with a whip to make me write is precisely what I need these days! As it is, I’m one of the three advisors to Inkhaven, along with Scott Alexander and Gwern, and I’ll be visiting for a long weekend to share my blogging wisdom, such as I have. Apply now if you’re interested!
(5) Alas, the Springer journal Frontiers of Computer Science has published a nonsense paper, entitled “SAT requires exhaustive search,” claiming to solve (or dissolve, or reframe, or something) the P versus NP problem. It looks indistinguishable from the stuff I used to get in my inbox every week—and now, in the ChatGPT era, get every day. That this was published indicates a total breakdown of the peer review process. Worse, when Eric Allender, Ryan Williams, and others notified the editors of this, asking for the paper to be retracted, the editor-in-chief declined to do so: see this guest post on Lance’s blog for a detailed account. As far as I’m concerned, Frontiers of Computer Science has now completely discredited itself as a journal; publication there means nothing more than publication in viXra. Minus 10 points for journals themselves as an institution, plus 10 points for just posting stuff online and letting it be filtered by experts who care.
(6) Uma Girish and Rocco Servedio released an arXiv preprint called Forrelation is Extremally Hard. Recall that, in the Forrelation problem, you’re given oracle access to two n-bit Boolean functions f and g, and asked to estimate the correlation between f and the Fourier transform of g. I introduced this problem in 2009, as a candidate for an oracle separation between BQP and the polynomial hierarchy—a conjecture that Ran Raz and Avishay Tal finally proved in 2018. What I never imagined was that Forrelation could lead to an oracle separation between EQP (that is, Exact Quantum Polynomial Time) and the polynomial hierarchy. For that, I thought you’d need to go back to the original Recursive Fourier Sampling problem of Bernstein and Vazirani. But Uma and Rocco show, using “bent Boolean functions” (get bent!) and totally contrary to my intuition, that the exact (zero-error) version of Forrelation is already classically hard, taking Ω(2^{n/4}) queries by any randomized algorithm. They leave open whether exact Forrelation needs ~Ω(2^{n/2}) randomized queries, which would match the upper bound, and also whether exact Forrelation is not in PH.
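For concreteness, here’s a minimal brute-force Python sketch of the quantity being estimated, namely Φ_{f,g} = 2^{-3n/2} Σ_{x,y} f(x) (-1)^{x·y} g(y) for f, g: {0,1}^n → {±1}. This is just the definition evaluated by exhaustive summation in 2^{2n} time, not anyone’s query-efficient algorithm:

```python
import itertools
import random

def forrelation(f, g, n):
    """Brute-force Forrelation quantity:
    Phi = 2^{-3n/2} * sum over x,y of f(x) * (-1)^{x.y} * g(y),
    where f, g map n-bit strings (as tuples) to {+1, -1}."""
    total = 0
    for x in itertools.product([0, 1], repeat=n):
        for y in itertools.product([0, 1], repeat=n):
            dot = sum(xi * yi for xi, yi in zip(x, y)) % 2
            total += f(x) * (-1) ** dot * g(y)
    return total / 2 ** (3 * n / 2)

# Example: random Boolean functions on n = 4 bits.
n = 4
table_f = {x: random.choice([1, -1]) for x in itertools.product([0, 1], repeat=n)}
table_g = {x: random.choice([1, -1]) for x in itertools.product([0, 1], repeat=n)}
print(forrelation(table_f.get, table_g.get, n))
```

Recall that a single quantum query yields an algorithm accepting with probability (1+Φ_{f,g})/2, which is what makes classical lower bounds like Ω(2^{n/4}) so striking.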
(7) The Google quantum group, to little fanfare, published a paper entitled Constructive interference at the edge of quantum ergodic dynamics. Here, they use their 103-qubit superconducting processor to measure Out-of-Time-Order Correlators (OTOCs) in a many-body scrambling process, and claim to get a verifiable speedup over the best classical methods. If true, this is a great step toward verifiable quantum supremacy for a useful task, for some definition of “useful.”
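For readers wanting intuition about what’s being measured: an OTOC like F(t) = ⟨W†(t) V† W(t) V⟩, with W(t) = e^{iHt} W e^{-iHt}, tracks how two initially commuting local operators fail to commute as the dynamics scramble information. Here’s a toy exact-diagonalization sketch for a small transverse-field Ising chain, evaluated in the maximally mixed state; needless to say, this bears no resemblance to Google’s 103-qubit circuits, it just unpacks the definition:

```python
import numpy as np
from scipy.linalg import expm
from functools import reduce

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on_site(op, site, n):
    """Embed a single-qubit operator at `site` in an n-qubit chain."""
    ops = [I2] * n
    ops[site] = op
    return reduce(np.kron, ops)

n = 5  # tiny toy chain (the real experiment used 103 qubits)
# Transverse-field Ising Hamiltonian: H = -sum_i Z_i Z_{i+1} - sum_i X_i
H = sum(-op_on_site(Z, i, n) @ op_on_site(Z, i + 1, n) for i in range(n - 1))
H += sum(-op_on_site(X, i, n) for i in range(n))

V = op_on_site(Z, 0, n)      # "butterfly" operator at one end
W = op_on_site(Z, n - 1, n)  # probe operator at the other end

def otoc(t):
    """F(t) = <W(t)† V† W(t) V> in the maximally mixed state,
    with W(t) = U† W U and U = exp(-iHt)."""
    U = expm(-1j * H * t)
    Wt = U.conj().T @ W @ U
    return np.trace(Wt.conj().T @ V.conj().T @ Wt @ V).real / 2 ** n

for t in [0.0, 1.0, 2.0, 3.0]:
    print(t, otoc(t))  # starts at 1, decays as information scrambles
```

At t = 0 the operators commute and F = 1; the decay of F(t) is the scrambling signature that the Google experiment measures at scales far beyond exact diagonalization.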
(8) Last night, on the arXiv, the team at USTC in China reported that it’s done Gaussian BosonSampling with 3,050 photons and 8,176 modes. They say that this achieves quantum supremacy, much more clearly than any previous BosonSampling demonstration, beating (for example) all existing simulations based on tensor network contraction. Needless to say, this still suffers from the central problem of all current sampling-based quantum supremacy experiments, namely the exponential time needed for direct classical verification of the outputs.
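To see where that exponentiality comes from: in the idealized photon-number-resolving setting, Gaussian BosonSampling output probabilities are, roughly speaking, proportional to squared hafnians of submatrices of a matrix determined by the experiment, and the hafnian sums over all perfect matchings. Here’s a brute-force sketch, purely to illustrate the (2m-1)!! blowup (obviously no one computes hafnians at 3,050 photons):

```python
import numpy as np

def hafnian(A):
    """Brute-force hafnian of a symmetric 2m x 2m matrix:
    the sum, over all perfect matchings of {0,...,2m-1}, of the
    products of matched entries. Runtime grows like (2m-1)!!."""
    n = A.shape[0]
    if n == 0:
        return 1.0
    # Match index 0 with each other index j, then recurse on the rest.
    rest = list(range(1, n))
    total = 0.0
    for k, j in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        total += A[0, j] * hafnian(A[np.ix_(remaining, remaining)])
    return total

# Example: for a 4x4 symmetric A,
# Haf(A) = A01*A23 + A02*A13 + A03*A12.
A = np.array([[0, 1, 2, 3],
              [1, 0, 4, 5],
              [2, 4, 0, 6],
              [3, 5, 6, 0]], dtype=float)
print(hafnian(A))  # 1*6 + 2*5 + 3*4 = 28
```

The best known exact hafnian algorithms improve substantially on this brute force but remain exponential in the photon number, which is exactly why directly checking the output probabilities of such an experiment is out of reach.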