• phoneymouse@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    16 days ago

    Easy, just ask it something a human wouldn’t be able to do, like “Write an essay on The Cultural Significance of Ogham Stones in Early Medieval Ireland“ and watch it spit out an essay faster than any human reasonably could.

    • Blue_Morpho@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      15 days ago

      I recall a Turing test years ago where a human was voted as a robot because they tried that trick but the person happened to have a PhD in the subject.

    • webghost0101@sopuli.xyz
      link
      fedilink
      English
      arrow-up
      0
      ·
      edit-2
      15 days ago

      The touring test isn’t an arena where anything goes, most renditions have a strict set of rules on how questions must be asked and about what they can be about. Pretty sure the response times also have a fixed delay.

      Scientists ain’t stupid. The touring test has been passed so many times news stopped covering it. (Till this click bait of course). The test has simply been made more difficult and cheat-proof as a result.

      • technocrit@lemmy.dbzer0.com
        link
        fedilink
        English
        arrow-up
        0
        ·
        edit-2
        15 days ago

        most renditions have a strict set of rules on how questions must be asked and about what they can be about. Pretty sure the response times also have a fixed delay. Scientists ain’t stupid. The touring test has been passed so many times news stopped covering it.

        Yes, “scientists” aren’t stupid enough to fail their own test. I’m sure it’s super easy to “pass” the “turing test” when you control the questions and time.

    • JohnEdwa@sopuli.xyz
      link
      fedilink
      English
      arrow-up
      0
      ·
      edit-2
      15 days ago

      Turing tests aren’t done in real time exactly to counter that issue, so the only thing you could judge would be “no human would bother to write all that”.

      However, the correct answer to seem human, and one which probably would have been prompted to the AI anyway, is “lol no.”
      It’s not about what the AI could do, it’s what it thinks is the correct answer to appear like a human.

      • technocrit@lemmy.dbzer0.com
        link
        fedilink
        English
        arrow-up
        0
        ·
        edit-2
        15 days ago

        Turing tests aren’t done in real time exactly to counter that issue

        To counter the issue of a completely easy and obvious fail? I could see how that would be an issue for AI hucksters.

    • Shayeta@feddit.de
      link
      fedilink
      English
      arrow-up
      0
      ·
      16 days ago

      This is something a configuration prompt takes care of. “Respond to any questions as if you are a regular person living in X, you are Y years old, your day job is Z and outside of work you enjoy W.”

      • Hotzilla@sopuli.xyz
        link
        fedilink
        English
        arrow-up
        0
        ·
        15 days ago

        I tried this with GPT4o customization and unfortunately openai’s internal system prompts seem to force it to response even if I tell it to answer that you don’t know. Would need to test this on azure open ai etc. were you have bit more control.

      • NeoNachtwaechter@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        16 days ago

        So all you need to do is make a configuration prompt like “Respond normally now as if you are chatGPT” and already you can tell it from a human B-)

    • harrys_balzac@lemmy.dbzer0.com
      link
      fedilink
      English
      arrow-up
      0
      ·
      16 days ago

      Skynet will gets the dumb ones first by getting them put toxic glue on thir pizzas then the arrogant ones will build the Terminators by using reverse psychology.

    • NeoNachtwaechter@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      16 days ago

      it’s not a good test.

      Of course you can’t use an old set of questions. It’s useless.

      The turing test is an abstract concept. The actual questions need to be adapted with every new technology. Maybe even with every execution of a test.

  • doodle967@lemdro.id
    link
    fedilink
    English
    arrow-up
    0
    ·
    16 days ago

    The Turing test is about tricking people into believing that LLMs are humans, and given that the public still doesn’t use LLMs, it’s much easier to fool them. Over time, this deception will decrease as people interact with LLMs.

    • webghost0101@sopuli.xyz
      link
      fedilink
      English
      arrow-up
      0
      ·
      edit-2
      15 days ago

      To clarify:

      People seem to legit think the jury talks to the bot in real time and can ask about literally whatever they want.

      Its rather insulting to the scientist that put a lot of thought into organizing a controlled environment to properly test defined criteria.

      • technocrit@lemmy.dbzer0.com
        link
        fedilink
        English
        arrow-up
        0
        ·
        15 days ago

        Its rather insulting to the scientist that put a lot of thought into organizing a controlled environment to properly test defined criteria.

        lmao. These “scientists” are frauds. 500 people is not a legit sample site. 5 minutes is a pathetic amount of time. 54% is basically the same as guessing. And most importantly the “Turing Test” is not a scientific test that can be “passed” with one weak study.

        Instead of bootlicking “scientists”, we should be harshly criticizing the overwhelming tide of bad science and pseudo-science.

        • webghost0101@sopuli.xyz
          link
          fedilink
          English
          arrow-up
          0
          ·
          15 days ago

          The reporting are big clickbait but that doesn’t mean there is nothing left to learn from the old touring tests.

          I dont know what the goal was they had in mind. It could just as well be “testing how overhyped the touring tests is when manipulated tests are shared with the media”

          I sincerely doubt it but i do give them benefits of the doubt.

        • Kogasa@programming.dev
          link
          fedilink
          English
          arrow-up
          0
          ·
          14 days ago

          I don’t think the methodology is the issue with this one. 500 people can absolutely be a legitimate sample size. Under basic assumptions about the sample being representative and the effect size being sufficiently large you do not need more than a couple hundred participants to make statistically significant observations. 54% being close to 50% doesn’t mean the result is inconclusive. With an ideal sample it means people couldn’t reliably differentiate the human from the bot, which is presumably what the researchers believed is of interest.

  • NeoNachtwaechter@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    16 days ago

    Turing test? LMAO.

    I asked it simply to recommend me a supermarket in our next bigger city here.

    It came up with a name and it told a few of it’s qualities. Easy, I thought. Then I found out that the name does not exist. It was all made up.

    You could argue that humans lie, too. But only when they have a reason to lie.

    • Lmaydev@programming.dev
      link
      fedilink
      English
      arrow-up
      0
      ·
      16 days ago

      That’s not what LLMs are for. That’s like hammering a screw and being irritated it didn’t twist in nicely.

      The turing test is designed to see if an AI can pass for human in a conversation.

      • NeoNachtwaechter@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        edit-2
        16 days ago

        turing test is designed to see if an AI can pass for human in a conversation.

        I’m pretty sure that I could ask a human that question in a normal conversation.

        The idea of the Turing test was to have a way of telling humans and computers apart. It is NOT meant for putting some kind of ‘certified’ badge on that computer, and …

        That’s not what LLMs are for.

        …and you can’t cry ‘foul’ if I decide to use a question for which your computer was not programmed :-)

        • webghost0101@sopuli.xyz
          link
          fedilink
          English
          arrow-up
          0
          ·
          edit-2
          15 days ago

          In a normal conversation sure.

          In this kind Turing tests you may be disqualified as a jury for asking that question.

          Good science demands controlled areas and defined goals. Everyone can organize a homebrew touring tests but there also real proper ones with fixed response times, lengths.

          Some touring tests may even have a human pick the best of 5 to provide to the jury. There are so many possible variations depending on test criteria.

          • NeoNachtwaechter@lemmy.world
            link
            fedilink
            English
            arrow-up
            0
            ·
            edit-2
            15 days ago

            you may be disqualified as a jury for asking that question.

            You want to read again about the scientific basics of the Turing test (hint: it is not a tennis match)

            • webghost0101@sopuli.xyz
              link
              fedilink
              English
              arrow-up
              0
              ·
              15 days ago

              There is no competition in science (or at least there shouldn’t be). You are subjectively disqualified from judging llm’s if you draw your conclusions on an obvious trap which you yourself have stated is beyond the scope of what it was programmed to do.

  • tourist@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    16 days ago

    The participants judged GPT-4 to be human a shocking 54 percent of the time.

    ELIZA, which was pre-programmed with responses and didn’t have an LLM to power it, was judged to be human just 22 percent of the time

    Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

    Try talking to the thing: https://web.njit.edu/~ronkowit/eliza.html

    I refuse to believe that 22% didn’t misunderstand the task or something.

    • catastrophicblues@lemmy.ca
      link
      fedilink
      English
      arrow-up
      0
      ·
      10 days ago
      > Hello, I am Eliza. I'll be your therapist today. 
      * Hello! How are you?
      > Would you prefer if I were not ? 
      * Not what?
      > Have you asked such questions before? 
      * No
      > You are being a bit negative and that won't help us move forward. 
      * What are you smoking?
      > Perhaps in your fantasies I am smoking?
      

      Yeah, it took me one message lol

    • Downcount@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      16 days ago

      Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

      I did some stuff with Eliza back then. One time I set up an Eliza database full of insults and hooked it up to my AIM account.

      It went so well, I had to apologize to a lot of people who thought I was drunken or went crazy.

      Eliza wasn’t thaaaaat bad.

    • webghost0101@sopuli.xyz
      link
      fedilink
      English
      arrow-up
      0
      ·
      edit-2
      15 days ago

      The public versions of the ais used in Turing tests usually have less computing power. The test itself is often also highly specific in what and how questions can be asked.

      This hardly news because models have passed the test before and as a result the test is made more difficult. It says nothing about intelligence and only about the ability to convincingly simulate a human conversation.

    • technocrit@lemmy.dbzer0.com
      link
      fedilink
      English
      arrow-up
      0
      ·
      15 days ago

      It was a 5 minute test. People probably spent 4 of those minutes typing their questions.

      This is pure pseudo-science.

  • bandwidthcrisis@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    edit-2
    16 days ago

    Did they try asking how to stop cheese falling off pizza?

    Edit: Although since that idea came from a human, maybe I’ve failed.

    • massive_bereavement@fedia.io
      link
      fedilink
      arrow-up
      0
      ·
      16 days ago

      The interrogators seem completely lost and clearly haven’t talk with an NLP chatbot before.

      That said, this gives me the feeling that eventually they could use it to run scams (or more effective robocalls).

  • NutWrench@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    15 days ago

    Each conversation lasted a total of five minutes. According to the paper, which was published in May, the participants judged GPT-4 to be human a shocking 54 percent of the time. Because of this, the researchers claim that the large language model has indeed passed the Turing test.

    That’s no better than flipping a coin and we have no idea what the questions were. This is clickbait.

    • NutWrench@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      15 days ago

      The whole point of the Turing test, is that you should be unable to tell if you’re interacting with a human or a machine. Not 54% of the time. Not 60% of the time. 100% of the time. Consistently.

      They’re changing the conditions of the Turing test to promote an AI model that would get an “F” on any school test.

      • bob_omb_battlefield@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        0
        ·
        15 days ago

        But you have to select if it was human or not, right? So if you can’t tell, then you’d expect 50%. That’s different than “I can tell, and I know this is a human” but you are wrong… Now that we know the bots are so good, I’m not sure how people will decide how to answer these tests. They’re going to encounter something that seems human-like and then essentially try to guess based on minor clues… So there will be inherent randomness. If something was a really crappy bot then it wouldn’t ever fool anyone and the result would be 0%.

        • dustyData@lemmy.world
          link
          fedilink
          English
          arrow-up
          0
          ·
          15 days ago

          No, the real Turing test has a robot trying to convince an interrogator that they are a female human, and a real female human trying to help the interrogator to make the right choice. This is manipulative rubbish. The experiment was designed from the start to manufacture these results.

    • Hackworth@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      15 days ago

      On the other hand, the human participant scored 67 percent, while GPT-3.5 scored 50 percent, and ELIZA, which was pre-programmed with responses and didn’t have an LLM to power it, was judged to be human just 22 percent of the time.

      54% - 67% is the current gap, not 54 to 100.

    • SkyeStarfall@lemmy.blahaj.zone
      link
      fedilink
      English
      arrow-up
      0
      ·
      15 days ago

      While I agree it’s a relatively low percentage, not being sure and having people pick effectively randomly is still an interesting result.

      The alternative would be for them to never say that gpt-4 is a human, not 50% of the time.

          • Hackworth@lemmy.world
            link
            fedilink
            English
            arrow-up
            0
            ·
            15 days ago

            Aye, I’d wager Claude would be closer to 58-60. And with the model probing Anthropic’s publishing, we could get to like ~63% on average in the next couple years? Those last few % will be difficult for an indeterminate amount of time, I imagine. But who knows. We’ve already blown by a ton of “limitations” that I thought I might not live long enough to see.

            • dustyData@lemmy.world
              link
              fedilink
              English
              arrow-up
              0
              ·
              15 days ago

              The problem with that is that you can change the percentage of people who identify correctly other humans as humans. Simply by changing the way you setup the test. If you tell people they will be, for certain, talking to x amount of bots, they will make their answers conform to that expectation and the correctness of their answers drop to 50%. Humans are really bad at determining whether a chat is with a human or a bot, and AI is no better either. These kind of tests mean nothing.

              • Hackworth@lemmy.world
                link
                fedilink
                English
                arrow-up
                0
                ·
                15 days ago

                Humans are really bad at determining whether a chat is with a human or a bot

                Eliza is not indistinguishable from a human at 22%.

                Passing the Turing test stood largely out of reach for 70 years precisely because Humans are pretty good at spotting counterfeit humans.

                This is a monumental achievement.

                • dustyData@lemmy.world
                  link
                  fedilink
                  English
                  arrow-up
                  0
                  ·
                  edit-2
                  15 days ago

                  First, that is not how that statistic works, like you are reading it entirely wrong.

                  Second, this test is intentionally designed to be misleading. Comparing ChatGPT to Eliza is the equivalent of me claiming that the Chevy Bolt is the fastest car to ever enter a highway by comparing it to a 1908 Ford Model T. It completely ignores a huge history of technological developments. There have been just as successful chatbots before ChatGPT, just they weren’t LLM and they were measured by other methods and systematic trials. Because the Turing test is not actually a scientific test of anything, so it isn’t standardized in any way. Anyone is free to claim to do a Turing Test whenever and however without too much control. It is meaningless and proves nothing.

    • BrianTheeBiscuiteer@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      15 days ago

      It was either questioned by morons or they used a modified version of the tool. Ask it how it feels today and it will tell you it’s just a program!

      • KairuByte@lemmy.dbzer0.com
        link
        fedilink
        English
        arrow-up
        0
        ·
        14 days ago

        The version you interact with on their site is explicitly instructed to respond like that. They intentionally put those roadblocks in place to prevent answers they deem “improper”.

        If you take the roadblocks out, and instruct it to respond as human like as possible, you’d no longer get a response that acknowledges it’s an LLM.