• 0 Posts
  • 34 Comments
Joined 1 year ago
Cake day: June 11th, 2023



  • The problem is not them being random.

    They are not random, that’s the point. They’re entirely deterministic and very precise, and they aren’t hiding anything; they will give you the most likely (not blacklisted) sequence of characters to follow your input according to their model. What they won’t give you is information, except by accident.

    If they were random (hidden or not) they’d be harmless; no one would trust them any more than one of those eight ball toys, or your average horoscope.

    The issue is that they’re very not random, so much so that there’s no way to know if what they are saying bears any accidental semblance to the truth without fact-checking… and that very soon they’ll have replaced any feasible way to fact-check them, since all the supposed “facts” we’ll have access to will have been generated by LLMs trained on LLM-generated garbage.


  • If the models are random then we shouldn’t be trusting them to do anything, let alone serious applications.

    That’s not the reason we shouldn’t be using them for anything other than generating lorem ipsum-style text or dialogue for non-quest-critical NPCs in games.

    The reason is that, paraphrasing Neil Gaiman, LLMs don’t generate information; they generate information-shaped sentences.

    Specifically, an LLM takes a sequence of characters (not a word or a text; LLMs have no concept of words, or text, or anything else for that matter; they’re just an application of statistics to large volumes of character sequences, with no meaning or intelligence involved, artificial or not)… as I was saying, an LLM takes a sequence of characters, pushes it through its model, and outputs the sequence of characters most likely to follow it in the texts its model has been trained on (or rather, the most likely after discarding the ones its creators have labelled as politically incorrect).

    That’s all they do, and they’re excellent at it (or would be if it weren’t for the aforementioned filters), but that’ll never give you a cure for cancer unless there already was one in their training data.

    They take texts written by humans, shred them, and give you back their desiccated corpses, badly put back together, drained of any and all meaning or information, but looking very convincingly (until you fact-check them) like actually meaningful or informative texts.

    That is what makes them dangerous. That and the fact that the bastards selling them are marketing them for the jobs they’re least capable of doing, that is, providing reliable information.

    (And that’s while they can still be trained on meaningful and informative texts written by humans, inasmuch as anything found on Reddit, Facebook, or Xitter can be considered meaningful or informative. But given that a higher and higher percentage of the text on the internet is being generated by LLMs, soon enough it’ll be impossible to train new models on anything but 99% LLM-generated garbage, at which point the whole bubble will implode, as anyone who’s wasted time, paper, and toner playing with a photocopier, or anyone familiar with the phrase “garbage in, garbage out”, will already have realised… which is probably why the LLM peddlers are ignoring robots.txt and copyright laws in a desperate effort to scrape whatever’s left of the bottom of the barrel.)



  • There’s nothing resembling intelligence, general or not, in any autocorrect implementation so far, including LLMs.

    LLMs don’t make mistakes. If you think they do, you’re completely misunderstanding what LLMs are, how they work, and what they do (probably because of the aforementioned misinformation by LLM peddlers trying to equate them to intelligence, artificial or not).

    LLMs simply give you the most statistically likely word to follow a given text. Then they do it again, adding the word they generated in the previous cycle to the text. That’s all they do, they’re excellent at it, and they don’t make mistakes: the word they output will be the most statistically likely one, regardless of whether it makes sense or not (though attempts by their peddlers to keep them politically correct might cause them to discard the first several most likely words, leaving them able to output only a much less likely but hopefully politically correct word, which might seem like a mistake to the user).

    You seem to be assuming that LLMs are trained on knowledge. They’re not. They’re trained on text. They have no idea what the text means (they don’t even have anything to have ideas with), and they don’t care (nor do they have any more ability to care than a desk lamp).

    They have a model of what words (meaning sequences of characters, not concepts with any actual meaning) may come after certain others, they push the input sequence of meaningless characters through that model, and out comes the most statistically likely meaningless sequence of characters to follow said text. That’s all.

    Paraphrasing Neil Gaiman, “LLMs don’t produce information. They produce information-shaped sentences.”

    They produce the desiccated corpses of the texts they were fed, shredded and put back together, drained of any actual information but indistinguishable enough from texts containing actual information to give the illusion of also containing it.

    They’re great as an alternative to lorem ipsum, or possibly as speech generators for non-quest-critical NPCs in games, but they’re extremely dangerous for anything else, especially the uses LLM peddlers are peddling them for.
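The generate-a-word-then-repeat loop described in the comment above can be sketched roughly as follows. This is a toy illustration under loudly stated assumptions: a hypothetical bigram word counter (`train_bigrams`) stands in for the actual neural model, it counts whole words rather than the subword tokens real models work with, and the hypothetical `generate` always takes the single most frequent follower (greedy decoding), skipping anything in a hypothetical `blocked` set to mimic the filtering mentioned above.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Toy stand-in for a model: count which words follow each word in the text."""
    words = corpus.split()
    model: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def generate(
    model: dict[str, Counter],
    prompt: str,
    steps: int,
    blocked: frozenset[str] = frozenset(),
) -> str:
    """Greedy loop: append the most frequent allowed follower, then repeat."""
    words = prompt.split()
    for _ in range(steps):
        followers = model.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything in the training text
        # most_common() sorts by count (ties keep first-seen order); blocked
        # words are skipped, so a less likely word may be emitted instead
        allowed = [w for w, _ in followers.most_common() if w not in blocked]
        if not allowed:
            break
        words.append(allowed[0])  # feed the generated word back in as input
    return " ".join(words)


if __name__ == "__main__":
    toy_model = train_bigrams("the cat sat on the mat the cat ate the fish")
    print(generate(toy_model, "the", 4))
    print(generate(toy_model, "the", 4, frozenset({"cat"})))
```

With that toy corpus, `generate(toy_model, "the", 4)` returns `"the cat sat on the"`, while blocking `"cat"` makes it fall back to the next most frequent follower and return `"the mat the mat the"`. Ties are broken by first occurrence because `Counter.most_common()` orders equal counts that way, which keeps the whole thing deterministic.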








  • Someone learning Spanish as a second language will have to remember that it’s máquina and not máquino when speaking or writing it, though (and will then probably be quite confused if they ever meet some guy nicknamed El Máquina, which would somehow be a perfectly cromulent nickname in Spanish).

    Confusing genders when speaking or writing is one of the most common mistakes amongst people new to the language, because while everything else has some form of rule, this doesn’t (sure, when reading or listening you can rely on the word ending most of the time, and you’ll probably have an article, too, but when you are the one speaking or writing you have no option but to just know a word’s gender, or how it ends, which is the same thing).



    I mean, you do memorise them; you just don’t realise you’re doing it because you’re a baby or toddler, and babies and toddlers are language sponges and not very aware of how their own minds work.

    When learning a gendered language as an adult you definitely have no option but to memorise which gender each word takes, since there’s generally no specific rule, just however the language happened to evolve. (And this can be particularly hard if your native language is gendered but genders words differently from the one you’re trying to learn, for instance when learning German coming from a Romance language, or vice versa.)



  • I don’t feel it’s particularly broken honestly.

    There are five (5) ways of pronouncing oo, if you people haven’t added a sixth one since the last time I looked.

    Radii, fiancé, and façade are apparently perfectly cromulent English words that native English speakers who’ve never seen an ii, an é, or a ç are supposed to be able to pronounce correctly…

    Your words for food animals come from completely different and unrelated languages depending on whether the animal is alive or dead (since the people who tended to the farms and the people who actually ate their meat spoke different languages)…

    There are probably more irregular verbs than regular ones… (again, probably because of English really being three different languages in a trenchcoat)…

    At some point in the sixteenth century you apparently just up and decided to randomly switch the pronunciation of all your long vowels… without changing how you wrote them…

    While most languages have developed some form of standard and regulatory body, English seems like it’d rather leave the whole grammar, orthography, pronunciation, and whatnot situation as an exercise for the ~~victim~~ speaker, writer, or reader…

    Yeah, no, not particularly broken at all… 😒


  • Seriously, other languages at least adapt loanwords to their own grammar, orthography, and whatnot… English just grabs them as they are and runs away without looking back.

    That’s why you end up with the plural of radius being radii, or stuff like fiancé or façade (seriously, how are people who only speak English and have never seen a ç before in their lives supposed to know how to pronounce that‽)…

    Of course it all comes from English really being three or four languages, namely (Anglo-)Saxon, Norman (or Old French), and Norse, badly put together, so sprinkling bits of other languages on top didn’t make much of a difference when there were already about five different ways to pronounce, for instance, oo, and the whole vowel shift debacle didn’t exactly help with this mess… but while other languages that may have had similar (if maybe less spectacular) growing pains eventually developed normative bodies, mostly from the eighteenth century onwards, that define and maintain a standard form of the language, English seems to have ignored all that and left grammar and orthography as a stylistic choice on the writers’ part, and pronunciation as an exercise for the readers…