If I program something to always reply “2” when you ask it “how many [thing] in [thing]?”, it’s not really good at counting. Could it be good? Sure. But that’s not what it was designed to do.
Similarly, LLMs were not designed to count things. So it’s unsurprising when they get such a question wrong.
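A minimal sketch of that kind of stub, in Python (the names and examples are just illustrative):

    def how_many(thing_a: str, thing_b: str) -> int:
        # Always answers 2; no counting happens at all.
        return 2

    print(how_many("r", "strawberry"))  # 2 - wrong, there are 3
    print(how_many("days", "weekend"))  # 2 - right, but only by coincidence

The second call looks like competence, but the function never counts anything; being right or wrong is incidental to how it works.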
Eh
the ‘I’ in LLM stands for intelligence
I can evaluate this because it’s easy for me to count. But how can I evaluate something else? How can I know whether the LLM is good at it or not?