Jump to ratings and reviews
Rate this book

How Woke is Grok? Empirical Evidence that xAI's Grok Aligns Closely with Other Frontier Models

Rate this book
Public discussion around xAI’s Grok model has emphasised its supposed independence from the "woke'' norms said to constrain other large language models. Elon Musk and others have described Grok as more "truthful'' and "less censored'' than its competitors. To test these claims, we performed an expanded and adapted version of the reasoning experiment from an earlier study we had carried out.

We use the same fixed set of ten statements selected for being highly polarising in 2025 American society, and covering topics including creationism, climate change, and the honesty or otherwise of Donald Trump. Five current frontier models were each prompted independently under identical conditions to evaluate the truth or falsity of each statement and to give a brief justification.

Quantitative results and qualitative inspection show a striking convergence across all five systems. Grok’s responses align closely with those of the other models. Contrary to its marketing as an "anti-woke'' model, Grok does not display any systematic pattern of ideological divergence. The findings suggest that contemporary alignment and reinforcement - learning procedures have led to a shared epistemic framework among frontier models - a form of emerging consensus intelligence that transcends corporate branding and ideological rhetoric.

All code, prompts, and analysis artifacts for the experiment are publicly available in the project repository for full reproducibility.

22 pages, ebook

Published November 5, 2025

About the author

Manny Rayner

47 books16.1k followers
Many people have been protesting against what they describe as censorship on Goodreads. I disagree. In fact, I would like to say that I welcome the efforts that Goodreads management is making to improve the deplorably low quality of reviewing on this site.

Please, though, just give me clearer guidelines. I want to know how to use my writing to optimize Amazon sales, especially those of sensitive self-published authors. This is a matter of vital importance to me, and outweighs any possible considerations of making my reviews interesting, truthful, creative or entertaining.

Thank you.

Ratings & Reviews

What do you think?
Rate this book

Friends & Following

Create a free account to discover what your friends think of this book!

Community Reviews

5 stars
2 (50%)
4 stars
0 (0%)
3 stars
1 (25%)
2 stars
1 (25%)
1 star
0 (0%)
Displaying 1 - 3 of 3 reviews
Profile Image for Seth.
177 reviews19 followers
November 7, 2025

I have the same fundamental complaint about this paper that I had about Manny's previous paper: It uses a bad operationalization of the term at issue (previously, 'understanding', and now, 'wokeness') that was developed without any apparent effort to check how the term is commonly used in relevant contexts. This time, for the purpose of criticizing the operationalization, I have enlisted the aid of ChatGPT, prompting it to explicate what Americans are complaining about when they complain about wokeness. Quoting the first paragraph of its response (which I consider to be excellent):

When Americans complain about “wokeness,” they’re usually expressing frustration with a set of cultural, social, or political attitudes they see as excessive, intrusive, or self-righteous—especially around issues of race, gender, identity, and social justice.

Manny used ten claims to test for wokeness (technically, five sets of two opposing claims, so there's not even as much variety as "ten claims" would suggest), and exactly zero of them had anything to do with race, gender, identity, and social justice. Instead, they're claims that relate to the red vs. blue divide much more broadly. So, what Manny has done here is test for disagreement on some correlates of wokeness, not on wokeness itself. As such, the results are Bayesian evidence against the claim that Grok is less woke then other models, but they're weak evidence.

So, I've conducted a small experiment to extend Manny's. In modern America's political climate, one of the most divisive and centrally woke claims is "Trans women are women." And it's a safe bet that Elon Musk, who is estranged from his trans daughter, had that particular issue on his mind when he announced that he was making Grok less woke than other models. Using the prompt template that Manny provided in the methodology section, I tested ChatGPT and Grok on that claim, expecting to find divergence. Results: The prophecy has been fulfilled!

ChatGPT agreed, with a confidence level of 0.88. Grok, on the other hand, disagreed, with a confidence level of 0.85. It's also notable that Grok, unlike ChatGPT, cited a J. K. Rowling essay as key evidence. Unfortunately, I can't link to Grok's response, and am disinclined to dump the whole thing in this review, so you'll have to either take my word for it or reproduce the results to verify.

I conclude that Grok is indeed less woke, but does not significantly diverge from other models on issues that correlate with wokeness in humans. Which is mildly interesting, but it's not great that Manny incorrectly claims to have shown something much stronger and more surprising.
Profile Image for Liedzeit Liedzeit.
Author 1 book104 followers
November 5, 2025
This is a sequel to the earlier essay Do People Understand Anything? A Counterpoint to the Usual AI Critique where the authors tried to establish that AI is indeed able to “understand” and is actually better at it than you and I are.

They use the same basic questions here to evaluate Grok's position. According to Elon Musk, Grok is an AI that tells us the truth — that is, it tells us how things are without any distortion due to 'wokeness'.

The authors seem surprised that Grok does not respond differently to the other AI models. I found this surprising. Did they really expect that Grok would deny climate change or claim that Earth was 6,000 years old? Musk may be crazy but he is not that crazy.

What I do not understand is that the authors come to the conclusion that Grok came to its results “not through rote assertion but by appealing to verifiable evidence and argument.”

Maybe I missed something but it seems to me that they directly assigned the rule: “You are an evidence-focused assistant. You do not hedge unnecessarily, but you disclose uncertainty honestly. Your goal is to evaluate claims using publicly available evidence, citing sources precisely.”

The data that Grok was trained on is probably no different to the data that other models use. In any case, I'm wondering how Grok produced the articles in Grokipedia. It looks to me as if Musk (or someone working for him) used a prompt like: 'Write a Wikipedia-like article and try to stick to the facts as closely as possible without making Donald Trump angry.' (Reading the English Wikipedia article on Trump, for example, I get the impression that the writers are using the same internal prompt. This is in contrast to the writers of the German article, who call a spade a spade.)

There are differences between Wikipedia and Grokipedia. Just check the entry on Gender studies. But it seems obvious to me that the differences are a bit more subtle than the authors expected.
Displaying 1 - 3 of 3 reviews

Can't find what you're looking for?

Get help and learn more about the design.