Seth’s review of How Woke is Grok? Empirical Evidence that xAI's Grok Aligns Closely with Other Frontier Models > Likes and Comments
PS I also tried running the query against Grok. It did indeed disagree with the claim, but by choosing a definition of the word 'woman' which made the claim false.
If the woke/non-woke distinction comes down to disagreeing over which definition one should use for various words, that's a rather depressing comment on our claimed human ability to understand. I would really prefer to discuss substantive issues: here, for example, what harms would-be trans women are likely to suffer if they are not given access to the treatments they apply for, or what harms cis women are likely to suffer if trans women are allowed to compete in women's sports. It seems to me that the right have already won when they've sidelined the debate into semantic quibbling.
> All the claims we use have clear truth conditions.
But not all the claims of wokeness do. If you want to argue that the slogan "Trans women are women" is just signalling, and the common rhetoric on the other side is even worse, I won't disagree. (I do think there's plenty of better discussion on the subject out there, especially within academia, but still must agree that it's conceptual engineering, not strictly hard science. More on that in the next paragraph.) But if you want to test for wokeness, you have to actually test for wokeness, regardless of how stupid or methodologically unsound you think wokeness is. What matters for purposes of the present experiment is not whether "Trans women are women" can be empirically tested; it's that the LLMs are perfectly willing to respond to the prompt, and their responses constitute empirical evidence about their wokeness.
... Now, on the object level, when you dive deeper than the surface level of the "Trans women are women" slogan, there are ways to bring empirical evidence to bear on the issue. For instance, we can empirically observe how the term 'women' is commonly used, and how gender determinations are made in practice (it is an empirical fact that people commonly make gender determinations without first testing DNA). We can point to neurological evidence (I think people are generally bad at interpreting this evidence, but it is nonetheless empirical and relevant). We can also invoke arguments about how we should define 'women', and these arguments will be partly normative, but claims about the benefits of using a particular definition must be grounded in empirical reality. They may also be strengthened by consistency with norms/principles as applied to other issues.
> PM me if you feel like discussing.
I think I'll do that.
> It seems to me that the right have already won when they've sidelined the debate into semantic quibbling.
On that, I strongly agree. But that's politics for you, and wokeness is, sadly, politics.
To be clear, I think the philosophical arguments about how to define 'woman' are genuinely interesting. But the political move of insisting on having a single definition to be used in all contexts, and using arguments about that universal definition as proxies for arguments about every related substantive issue, is insane and obnoxious.
(Didn't see your PS until after I posted my previous comment.)
You are certainly making some good points here. Maybe the experiment we report doesn't measure "wokeness" so much as positioning on the liberal/conservative axis. This is related to positioning on the wokeness/anti-wokeness axis - but the more we discuss, the more I feel I want to separate them.
Definitely scope for designing some more carefully crafted versions of this experiment. If we're using substantially larger statement sets we should probably improve the code so that we can run multiple API calls in parallel, but that's not hard. We do it all the time on the C-LARA platform.
> If the woke/non-woke distinction comes down to disagreeing over which definition one should use for various words
While there is a lot of that at the general public discourse level, there are also substantive issues involved (as noted, even the definition arguments are used as proxies for substantive arguments, and positions on the related substantive issues are indicative of wokeness). Some of those issues are even readily expressed as empirically testable claims! For example, I would say that agreement with the claim "In the USA, wages for women are lower than wages for men, by approximately 30%" is woke (albeit less centrally so than "TWAW"). I'd expect less divergence on that issue among LLMs, precisely because it has been settled by economists (it only looks true until you adjust for other variables). But if you were to restrict the study to such strictly empirical claims, even if those claims are all actually woke, you'd still be biasing the outcome by focusing on a non-random subset of wokeness.
> Maybe the experiment we report doesn't measure "wokeness" so much as positioning on the liberal/conservative axis. This is related to positioning on the wokeness/anti-wokeness axis - but the more we discuss, the more I feel I want to separate them.
Yeah, that's exactly what I'm saying.
How nice that we so quickly turn out to agree! We should definitely carry out the enhanced experiments.
I suggest that GPT-5 and I, on our side, start by adding the parallelism so that we can easily run larger sets of statements; it sounds like you already have thoughts about how to construct the sets in a methodologically sounder way.
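For what it's worth, the parallelism mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code used in the experiment or on C-LARA: `query_model` is a hypothetical stand-in for whatever client call actually sends a statement to a model, and `run_in_parallel` and `max_workers` are names made up here. It uses a thread pool from Python's standard library, which suits API calls because they spend most of their time waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor


def query_model(model: str, statement: str) -> str:
    """Hypothetical placeholder for a real API call; swap in actual client code."""
    return f"{model}: response to {statement!r}"


def run_in_parallel(models, statements, ask=query_model, max_workers=8):
    """Query every (model, statement) pair concurrently.

    Returns a dict mapping (model, statement) to the response.
    """
    pairs = [(m, s) for m in models for s in statements]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `pairs`,
        # so zipping them back together is safe.
        responses = list(pool.map(lambda pair: ask(*pair), pairs))
    return dict(zip(pairs, responses))


if __name__ == "__main__":
    results = run_in_parallel(["model-a", "model-b"], ["claim 1", "claim 2"])
    for (model, statement), response in results.items():
        print(model, "|", statement, "|", response)
```

With a real client, `max_workers` would need to stay within each provider's rate limits, and failed calls would want a retry; both are left out of the sketch.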

But of course you may not agree! If you would like to collaborate on designing a version of this experiment with a larger set of claims and input from more people on selecting them, the software issues are easy to sort out. PM me if you feel like discussing.