Counting On LLMs
- Ritvik Mandyam
- Dec 15
- 2 min read
I spend WAY too much time on Reddit, and recently, I've been noticing an interesting type of post on the ChatGPT subreddit: people are asking GPT 5.2 how many Rs are in "garlic".

Way back when (i.e. mid-2024), people used to do a similar trick where they asked GPT 4o how many Rs were in "strawberry", with much the same result - both these LLMs would regularly come back with the wrong answer (apparently, the Venn diagram of people who like to cook and people who like to mess with LLMs is a circle). People like to use this as proof that LLMs are stupid, but I think it's actually a little more interesting than that. See, I tried this test myself, and GPT 5.2 told me garlic has 2 Rs - but then I dug a little deeper. I asked it to actually WRITE OUT the individual letters in the word and then count how many times each letter appeared. The LLM immediately corrected itself and told me that "garlic" has only ONE R. Interesting, right? Why would an LLM give you two different answers to the same question?
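
(If you want to reproduce the test yourself, here's a rough sketch of the two-step version using the OpenAI Python SDK. The model name below is just a placeholder - swap in whichever model you want to poke at.)

```python
# Rough sketch of the two-step test, using the OpenAI Python SDK.
# The model name is a placeholder - use whatever you have access to.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send a single one-off prompt and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: ask directly. This is where models tend to guess.
print(ask('How many Rs are in "garlic"?'))

# Step 2: force the letter-by-letter path.
print(ask('Write out each letter of "garlic" on its own line, '
          'then count how many times each letter appears.'))
```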

The answer, I think, lies in what Anthropic has recently been calling "circuits". LLMs AREN'T magical arbiters of universal truth - they're HUGE collections of weights that have been tuned to give you an answer which "feels" right, which is why they're also notorious for occasionally producing output that is coherent but factually completely untrue. When you ask the LLM how many Rs are in garlic, it doesn't actually see the letters "G A R L I C" - it sees a handful of numbers (tokens) that represent the word "garlic". Now, garlic is a pretty common word - one the LLM has seen a bunch - so it doesn't go, "Let's break out the counting circuitry." No, it basically just... guesses. When you explicitly ASK it to write out the letters, though, it actually DOES bring out the counting circuitry and comes to the correct answer. In a way, it's like Daniel Kahneman's "System 1 and System 2" - turns out LLMs generally operate on System 1, too, but you can PROMPT them into activating System 2.
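
You can actually peek at the model's-eye view of "garlic" yourself. Here's a tiny sketch using the tiktoken library (one of the tokenizers OpenAI publishes) - the exact way the word gets chopped up depends on which encoding you pick, but the point stands either way: the input is a few integer IDs, not six individual letters.

```python
# What the model actually receives: token IDs, not letters.
# Assumes tiktoken is installed (pip install tiktoken); the encoding name
# here is one of OpenAI's published encodings - pick whichever you like.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("garlic")

print(tokens)                              # a short list of integer token IDs
print([enc.decode([t]) for t in tokens])   # the chunk of text each ID maps back to
```
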
This is also why paradigms like LLM-as-a-judge make sense, and why people are so obsessed with the idea of using an LLM to judge ITS OWN work - different circuits are activated when the LLM is CRITIQUING its output than when it's GENERATING it in the first place. Just make sure you start a new conversation first - LLMs are also prone to believing their own work is beyond reproach. I wonder where they get that from.
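
Here's roughly what that looks like in practice - a bare-bones LLM-as-a-judge sketch, again with the OpenAI Python SDK and a placeholder model name. The important bit is that the critique happens in a brand-new conversation, with no memory of having written the draft.

```python
# Bare-bones LLM-as-a-judge sketch (OpenAI Python SDK, placeholder model name).
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder

def complete(prompt: str) -> str:
    """One single-turn request - each call is its own fresh 'conversation'."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Conversation 1: generate a draft.
draft = complete("Explain in one paragraph why tokenization makes "
                 "letter-counting hard for LLMs.")

# Conversation 2: judge the draft. No shared history, so the model treats it
# as someone else's work instead of defending its own.
verdict = complete(
    "You are reviewing a paragraph written by someone else. "
    "Point out any factual errors or unsupported claims:\n\n" + draft
)
print(verdict)
```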



