Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.

This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.

This is fundamentally different from copying a book or song. It’s more like the long-standing artistic tradition of being influenced by others’ work. The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.

Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was ruled legal despite protests from authors and publishers. AI training is arguably even more transformative.

While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or unethical.

For those interested, this argument is nicely laid out by Damien Riehl in FLOSS Weekly episode 744. https://twit.tv/shows/floss-weekly/episodes/744

  • Capricorn_Geriatric@lemmy.world
    5 months ago

    Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves.

    Sure.

    When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.

    Not really. Sure, they take input and garble it up, and that is “transformative” - but so is a human watching a TV series on a pirate site, for example. Hell, even when it’s educational, that is treated as a copyright violation.

    This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages.

    Perhaps. (Not an AI expert). But, as the law currently stands, only living and breathing persons can be educated, so the “educational” fair use protection doesn’t stand.

    The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.

    It does and it doesn’t discard the original. It isn’t impossible to recreate the original (since all the data it gobbled up gets stored somewhere in some shape or form and can be faithfully recreated, at least judging by a few comments below and news reports). So AI can and does recreate (duplicate or distribute, perhaps) copyrighted works.

    Besides, for a copyright violation, “substantial similarity” is needed, not one-for-one reproduction.

    This is fundamentally different from copying a book or song.

    Again, not really.

    It’s more like the long-standing artistic tradition of being influenced by others’ work.

    Sure. Except when it isn’t and the AI pumps out the original or something close enough to it.

    The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.

    I’d be careful with the “always” part. There was a famous case involving Katy Perry where a single chord was sued over as copyright infringement. The case was thrown out on appeal, but I do not doubt that some pretty wild cases have been upheld as copyright violations (see “patent troll”).

    Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was ruled legal despite protests from authors and publishers. AI training is arguably even more transformative.

    The problem is that Google Books only lets you search for some phrase and have it pop up as being from source xy. It doesn’t have the capability of reproducing the book (other than maybe the page the phrase was on) - well, it does have the capability, since the text is in the index somewhere, but there are checks in place to make sure that doesn’t happen, checks which seem to be as yet unachieved in AI.

    While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate.

    Yes. Just as labeling piracy as theft is.

    We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or

    Yes, new legislation will be made to either let “Big AI” do as it pleases, or prevent it from doing so. Or, as usual, it’ll be somewhere in between and vary from jurisdiction to jurisdiction.

    However,

    that doesn’t make the current use of copyrighted works for AI training illegal or unethical.

    this doesn’t really stand. Sure, morals are debatable, and while I’d say it is more unethical than private piracy (so no distribution), since distribution and dissemination are involved, you do not seem to feel the same.

    However, the law is clear. Private piracy (as in recording a song off the radio, recording a TV broadcast, screen recording a Netflix movie, etc.) is legal. As is digitizing books and lending the digital copy (as long as you have a physical copy, representing the legal “original”, that isn’t lent out at the same time). I think breaking DRM also isn’t illegal (but someone please correct me if I’m wrong).

    The problem arises when the pirated content is copied and distributed in an uncontrolled manner, which AI seems to be capable of. That makes the AI owner liable for piracy if the AI reproduces not even the same, but “substantially similar” output, just as much as hosts of “classic” pirated content distributed on the Web.

    Obligatory IANAL, and as far as the law goes, I focused on US law since the default country on here is the US. Similar or different laws are on the books in other places, although most are in fact substantially similar. Also, what the legislators come up with will definitely vary from place to place, even more so than copyright law, since copyright law is partially harmonised (see the Berne Convention).

    • MagicShel@programming.dev
      5 months ago

      You made a lot of points here. Many I agree with, some I don’t, but I specifically want to address this because it seems to be such a common misconception.

      It does and it doesn’t discard the original. It isn’t impossible to recreate the original (since all the data it gobbled up gets stored somewhere in some shape or form and can be faithfully recreated, at least judging by a few comments below and news reports). So AI can and does recreate (duplicate or distribute, perhaps) copyrighted works.

      AI stores original works like a dictionary does. All the words are there, but the order and meaning are completely gone. An original work is possible to recreate by randomly selecting words from the dictionary, but it’s unlikely.

      The thing that makes AI useful is that it understands the patterns words are typically used in. It orders words in the right way far more often than random chance. It knows “It was the best of” has a lot of likely options for the next word, but if it selects “times” as the next word, it’s far more likely to continue with, “it was the worst of times.” Because that sequence of words is so ubiquitous due to references to the classic story. But over the course of following these word patterns, it will quickly glom onto a different pattern and create a wholly new work from the original “prompt.”

      There are only two cases in which an original work should be duplicated: either the training data is far too small and the model is overtrained on that particular work, or the work is the most derivative text imaginable lacking any flair or originality.

      Adding more training data makes it less likely to recreate any original works.
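
      As a rough illustration of the two points above, here is a toy word-level next-word counter. It is only a sketch: a real LLM is not built this way, and the function names and the tiny corpora are made up for the example.

```python
from collections import Counter, defaultdict


def train(texts, context=2):
    """Count which word follows each `context`-word window in the texts."""
    counts = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for i in range(len(words) - context):
            window = tuple(words[i:i + context])
            counts[window][words[i + context]] += 1
    return counts


def next_word_probs(counts, prompt, context=2):
    """Return the observed next-word distribution after the prompt's last window."""
    window = tuple(prompt.split()[-context:])
    followers = counts.get(window, Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


# Hypothetical toy corpora, invented for this sketch.
one_work = ["it was the best of times it was the worst of times"]
many_works = one_work + [
    "it was the best of days",
    "she was the best of friends",
    "this is the best of both worlds",
]

small_model = train(one_work)
large_model = train(many_works)

# Trained on a single work, "best of" has only ever been followed by one word,
# so the toy model can do nothing but continue the original.
small_probs = next_word_probs(small_model, "it was the best of")
print(small_probs)

# With more text, the same window has several competing continuations.
large_probs = next_word_probs(large_model, "it was the best of")
print(large_probs)
```

      In this toy setup the single-work model is the overtrained case: every window has exactly one continuation, so it can only replay its source. Adding unrelated text spreads the probability over other continuations.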

      I am aware of examples where it was claimed an LLM reproduced entire code functions including original comments. That is either a case of overtraining, or far too many people were already copying that code verbatim into their own, thus making that work very overrepresented in the training data (same thing, but it was infringing developers who poisoned the data, not researchers using bad training data).

      Bottom line: when created with enough data, no original works are stored in any way that allows faithful reproduction other than by chance so random that it’s similar to rolling dice over a dictionary.

      None of this means AI can do no wrong, I just don’t find the copyright claim compelling.

    • Michal@programming.dev
      5 months ago

      I’d be careful with the “always” part. There was a famous case involving Katy Perry where a single chord was sued over as copyright infringement. The case was thrown out on appeal, but I do not doubt that some pretty wild cases have been upheld as copyright violations (see “patent troll”).

      Are you really trying to argue against a point by providing evidence supporting it?

    • soul@lemmy.world
      5 months ago

      Half of your argument is just saying, “nu-uh” over and over again without any valid counterpoints.

    • FatCat@lemmy.worldOP
      5 months ago

      It’s funny you mention the Katy Perry chord case, because Damien Riehl, who made the argument I referenced in my original post, actually talked about this exact case in the podcast I mentioned. He noted that Katy Perry was initially sued and a jury awarded $2.8 million over a very simple melody that appeared over 8,000 times in Riehl’s dataset of generated melodies. However, after Riehl gave his TED talk about his “All the Music” project in early 2020, the judge reversed the jury verdict, saying the melody was unoriginal and therefore uncopyrightable.

      • Capricorn_Geriatric@lemmy.world
        5 months ago

        Agreed.

        I didn’t listen to the podcast so I wouldn’t know, but honestly, she was lucky. She’s popular and her publishers had an interest in the case (they’d lose out on profits if she lost). And she initially did lose. It was only because of the publicity of the case that it was overruled (although money did help as well).

        Unfortunately, this could’ve happened to any smaller artist, and it routinely happens with patent trolls I pointed to. Unfortunately, I don’t have a lawsuit I can point to, but given the volume, one surely exists.

        Also, it’s not as if I approve of the current state of copyright in the US (or EU for that matter).

        Originally copyright was meant to protect the rights of the author, but in time it was bastardised into the concept we have today, where artists sign off their rights to publishers.

        So my proposal is - if corporations like copyright, let them have it. I won’t watch Disney movies outside of Disney+ ors the system we’ve got and have to live with, why not let the corporations feel it as well?

        Why would Google, which makes loads of money from those demonetizations on one side of the law, now be allowed to use copyrighted works of others for profit, while Internet users in the US get a fine or their service cut for alleged copyright infringement while those in Germany get a stern letter with a big fake fine?

        Big Tech shouldn’t get to profit both from the false copyright infringement claims as well as getting to use the actual copyrighted content to generate a profit.

        This whole AI copyright situation is just a symptom of an ailing global copyright policy that needs to be fixed, and slapping an AI-free-for-all band-aid on top isn’t a fix.

        My train of thought is this: If we don’t let a simple AI exception into the books, either training AI on copyrighted content stays illegal, or the entire system gets a reimagining.

        If it stays the same, this will not mean much. Piracy sites and torrenting exist despite the current state of copyright law. I don’t see why AI couldn’t exist in this way. This has the huge plus of keeping AI outside the hands of Big Tech. Hopefully this also means it’s harder for harmful uses of AI to be legal.

        Alternatively, we get a better copyright system for everyone, assuming it isn’t made to only benefit the corporations.

  • Otkaz@lemmy.world
    5 months ago

    Maybe if OpenAI hadn’t suddenly decided not to be open when they got in bed with Micro$oft, they could just make it a community effort. I own a copyrighted work that the AI hasn’t been fed yet, so I loan it as training data and you do the same. They could have made it an open initiative. Missed opportunity from a greedy company. Push the boundaries of technology, and we can all reap the rewards.

    • shani66@ani.social
      5 months ago

      But they don’t want us all to reap the rewards, they just want the rewards for themselves. The cruelty is the point.

  • Shanedino@lemmy.world
    5 months ago

    Maybe if you would pay for training data they would let you use copyrighted data or something?

    • T156@lemmy.world
      5 months ago

      Had the company paid for the training data and/or left it as voluntary, there would be less of a problem with it to begin with.

      Part of the problem is that they didn’t, but are still using it for commercial purposes.

    • andrew_bidlaw@sh.itjust.works
      5 months ago

      Their business strategy is built on the assumption that they won’t. They don’t want this door opened at all. It was a great deal for Google to buy Reddit’s data for some $mil., because it is a huge collection behind one entity. Now imagine communicating with each individual site owner whose resources they scraped.

      If that had been how it started, the development of these AI tools could have been much slower because of (1) data being added to the bunch only after an agreement, (2) more expenses meaning less money for hardware expansion and (3) investors and companies being less hyped up about that thing because it doesn’t grow like a mushroom cloud while following legal procedures. Also, (4) the ability to investigate and collect a public list of the sites they have agreements with is pretty damning, making its own news stories and conflicts.

  • macrocephalic@lemmy.world
    5 months ago

    It’s an interesting area. Are they suggesting that a human reading copyrighted material and learning from it is a breach?

  • TriflingToad@lemmy.world
    5 months ago

    I don’t think LLMs should be taken down; it would be impossible for that to happen. I do, however, think they should be forced into open source.

  • LarmyOfLone@lemm.ee
    5 months ago

    The joke is of course that “paying for copyright” is impossible in this case. ONLY the large social media companies that own all the comments and content that has been accumulated by the community have enough data to train AI models. Or sites like stock photo libraries or DeviantArt, which own the distribution rights for the content. That means all copyright arguments practically argue that AI should be owned by big corporations and should be inaccessible to normal people.

    Basically the “means of generation” will be owned by the capitalists, since they are the only ones with the economic power to license these things.

    That is basically the worst case scenario. Not only will the value of work diminish greatly, the advances in productivity will also be only accessible to big capitalists.

    Of course, that is basically inevitable anyway. Why wouldn’t they want this? It’s just sad seeing the stupid morons arguing for this as if they had anything to gain.

    • sunzu2@thebrainbin.org
      5 months ago

      It’s just sad seeing the stupid morons arguing for this as if they had anything to gain.

      The real money shot here… How did we get to a point where people will argue against common working slave good?

      There is a pattern too… Iraq, Afghanistan, israeli genocide, bailouts. Anytime there is money to be made for the regime, we got solid 30% of population working as hard for zealots.

      Then 2 decades later when the two wars failed, we can’t find a single guy who supported either war around 🤡

      The same is somehow now shilling we “shouldn’t invade Ukraine but Israel needs tools to defend themselves”

    • mm_maybe@sh.itjust.works
      5 months ago

      I’m getting really tired of saying this over and over on the Internet and getting either ignored or pounced on by pompous AI bros and boomers, but this “there isn’t enough free data” claim has never been tested. The experiments that have come close (look up the early Phi and Starcoder papers, or the CommonCanvas text-to-image model) suggested that the claim is false, by showing that a) models trained on small, well-curated datasets can match and outperform models trained on lazily curated large web scrapes, and b) models trained solely on permissively licensed data can perform on par with at least the earlier versions of models trained more lazily (e.g. StarCoder 1.5 performing on par with Code-Davinci). But yes, a social network or other organization that has access to a bunch of data that they own, or have licensed, could almost certainly fine-tune a base LLM trained solely on permissively licensed data to get a tremendously useful tool that would probably be safer and more helpful than ChatGPT for that organization’s specific business, at vastly lower risk of copyright claims or toxic generated content, for that matter.

      • LarmyOfLone@lemm.ee
        5 months ago

        Thanks for the info. But let’s say you want to train a (future) AI to spot and tag disinformation and misinformation. You’d need to use and curate actual data from social media sites and articles.

        If copyright is extended to learning from and analyzing publicly available data, such an AI will only be possible by licensing that data. Which will be monetized to maximize profit, first some lump sum, then later “per GB” and then later “per use”.

        I’m sure open source AI will make do, and for many applications there is enough free data, but I can imagine a lot of cases where there won’t be. Anything that requires “commercially successful” media, articles, newspapers, screenplays, movies, books, social media posts and comments, images, photos, video clips…

        We’re basically setting up a world where the intellectual wealth of our civilization is being transformed into a commodity and then will be transferred into the hands of a few rich capitalists.

        And even if there is an acceptable amount of free data, if the principle is that data needs to be specifically licensed to learn and train and derive AI works from it - that makes free data use expensive too. It needs to be specifically vetted, and its users are still vulnerable to being sued for mistakes or over outrageous claims of copyright. Similar to patents, the uncertainty requires higher capitalization for any startup to defend against lawsuits.

        • mm_maybe@sh.itjust.works
          5 months ago

          Yeah, I’ve struggled with that myself, since my first AI detection model was technically trained on potentially non-free data scraped from Reddit image links. The more recent fine-tune of that used only Wikimedia and SDXL outputs, but because it was seeded with the earlier base model, I ultimately decided to apply a non-commercial CC license to the checkpoint. But here’s an important distinction: that model, like many of the use cases you mention, is non-generative; you can’t coerce it into reproducing any of the original training material–it’s just a classification tool. I personally rate those models as much fairer uses of copyrighted material, though perhaps no better in terms of harm from a data dignity or bias propagation standpoint.

          • LarmyOfLone@lemm.ee
            5 months ago

            I just want a holodeck future without having to pay by the hour to DisneComBroSonyFlixMount.

  • TunaCowboy@lemmy.world
    5 months ago

    I wouldn’t say I’m on OAI’s side here, but I’m down to eliminate copyright. New economic models will emerge, especially if more creatives unionize.

  • gap_betweenus@lemmy.world
    5 months ago

    Copyright law protects the ability of the copyright holder to make money. The laws were created before AI and now obviously have to be adapted to new technology (just as you didn’t really need copyright before the invention of printing). How exactly AI will be regulated is in the end up to society to decide, which most likely will come down to who has the better lobby.

  • derf82@lemmy.world
    5 months ago

    This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”.

    Citation needed. I’m pretty sure LLMs have exactly reproduced copyrighted passages. And considering they can create detailed summaries of copyrighted texts, they obviously have to save more than “abstract representations.”

    • Xatolos@reddthat.com
      5 months ago

      I don’t feel it is. They aren’t saying that their physical requirements should be free (computers, engineers, programmers, electricity, etc…) which is what is being used for the analogy (cheese, ingredients, etc…).

      It would be better to claim “I run a sandwich shop and couldn’t afford to run it if I had to pay for every recipe, idea, and technique I use in the business.”

      Now, it’s not as simple as this, and I’m not claiming it is. But this example isn’t anywhere near correct. It’s like the old claim that pirating something is the same as stealing it. The usage of one thing doesn’t equal the loss of something physical.

      It’s one of those reasons why laws about this are difficult. Too strict and no one would be able to do “fan”-anything and many other issues (“if it uses AI” takes out many digital tools, etc…), too loose and you don’t really have laws at all.

  • EldritchFeminity@lemmy.blahaj.zone
    5 months ago

    The argument that these models learn in a way that’s similar to how humans do is absolutely false, and the idea that they discard their training data and produce new content is demonstrably incorrect. These models can and do regurgitate their training data, including copyrighted characters.

    And these things don’t learn styles, techniques, or concepts. They effectively learn statistical averages and patterns and collage them together. I’ve gotten to the point where I can guess what model of image generator was used based on the same repeated mistakes that they make every time. Take a look at any generated image, and you won’t be able to identify where a light source is because the shadows come from all different directions. These things don’t understand the concept of a shadow or lighting, they just know that statistically lighter pixels are followed by darker pixels of the same hue and that some places have collections of lighter pixels. I recently heard about an AI that scientists had trained to identify pictures of wolves that was working with incredible accuracy. When they went in to figure out how it was telling wolves from dogs like huskies so well, they found that it wasn’t even looking at the wolves at all. 100% of the images of wolves in its training data had snowy backgrounds, so it was simply searching for concentrations of white pixels (and therefore snow) in the image to determine whether or not a picture was of wolves.
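
    A toy sketch of the kind of shortcut described in that wolf anecdote. Everything here is invented for illustration (the data, the feature names, and the single-feature rule standing in for a neural network); it only shows how a learner can latch onto the background when the background happens to match the labels perfectly.

```python
def fit_stump(examples, labels):
    """Pick the single binary feature that best matches the labels on the training set."""
    features = examples[0].keys()

    def accuracy(feature):
        hits = sum(ex[feature] == label for ex, label in zip(examples, labels))
        return hits / len(examples)

    return max(features, key=accuracy)


# Hypothetical training set: label 1 = wolf, 0 = dog.
# Every wolf photo happens to have snow; the animal feature is only a noisy cue.
train_x = [
    {"large_build": 1, "snow_background": 1},  # wolf
    {"large_build": 1, "snow_background": 1},  # wolf
    {"large_build": 0, "snow_background": 1},  # wolf (young, small)
    {"large_build": 0, "snow_background": 0},  # dog
    {"large_build": 1, "snow_background": 0},  # dog (big husky on grass)
    {"large_build": 0, "snow_background": 0},  # dog
]
train_y = [1, 1, 1, 0, 0, 0]

# The learner keeps whichever feature scores best on the training data.
chosen = fit_stump(train_x, train_y)
print(chosen)

# A husky photographed in snow: the rule predicts "wolf" whenever the chosen feature is 1.
husky_in_snow = {"large_build": 1, "snow_background": 1}
prediction = husky_in_snow[chosen]
print("wolf" if prediction else "dog")
```

    On this made-up data the background feature fits the training labels perfectly, so the rule never needs to look at the animal and mislabels the husky.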

    • Eatspancakes84@lemmy.world
      5 months ago

      I am also not really getting the argument. If I as a human want to learn a subject from a book, I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.

      The issue is of course that it’s not at all similar to how humans learn. It needs VASTLY more data to produce something even remotely sensible. Develop AI that’s truly transformative, by making it as efficient as humans are in learning, and the cost of paying for copyright will be negligible.

      • stephen01king@lemmy.zip
        5 months ago

        If I as a human want to learn a subject from a book, I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.

        You’re on Lemmy, where people casually say “piracy is morally the right thing to do”, so I’m not sure this argument works on this platform.

        • Eatspancakes84@lemmy.world
          5 months ago

          I know my way around the Jolly Roger myself. At the same time using copyrighted materials in a commercial setting (as OpenAI does) shouldn’t be free.

          • stephen01king@lemmy.zip
            5 months ago

            Only if they are selling the output. I see it as more they are selling access to the service on a server farm, since running ChatGPT is not cheap.

            • Hamartia@lemmy.world
              5 months ago

              The usual cycle of tech-bro capitalism would put them currently on the early acquire market saturation stage. So it’s unlikely that they are currently charging what they will when they are established and have displaced lots of necessary occupations.

    • ricecake@sh.itjust.works
      5 months ago

      Basing your argument around how the model or training system works doesn’t seem like the best way to frame your point to me. It invites a lot of mucking about in the details of how the systems do or don’t work, how humans learn, and what “learning” and “knowledge” actually are.

      I’m a human as far as I know, and it’s trivial for me to regurgitate my training data. I regularly say things that are either directly references to things I’ve heard, or accidentally copy them, sometimes with errors.
      Would you argue that I’m just a statistical collage of the things I’ve experienced, seen or read? My brain has as many copies of my training data in it as the AI model, namely zero, but “Captain Picard of the USS Enterprise sat down for a rousing game of chess with his friend Sherlock Holmes, and then Shakespeare came in dressed like Mickey Mouse and said ‘to be or not to be, that is the question, for tis nobler in the heart’ or something”. Direct copies of someone else’s work, as well as multiple copyright infringements.
      I’m also shit at drawing with perspective. It comes across like a drunk toddler trying their hand at cubism.

      Arguing about how the model works or the deficiencies of it to justify treating it differently just invites fixing those issues and repeating the same conversation later. What if we make one that does work how humans do in your opinion? Or it properly actually extracts the information in a way that isn’t just statistically inferred patterns, whatever the distinction there is? Does that suddenly make it different?

      You don’t need to get bogged down in the muck of the technical to say that even if you concede every technical point, we can still say that a non-sentient machine learning system can be held to different standards with regards to copyright law than a sentient person. A person gets to buy a book, read it, and then carry around that information in their head and use it however they want. Not-A-Person does not get to read a book and hold that information without consent of the author.
      Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.

      Computers think the same way boats swim. Arguing about the difference between hands and propellers misses the point that you don’t want a shrimp boat in your swimming pool. I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy, I care that it ruins the whole thing for the people it exists for in the first place.

      I think all the AI stuff is cool, fun and interesting. I also think that letting it train on everything regardless of the creators’ wishes has too much opportunity to make everything garbage. Same for letting it produce content that isn’t labeled or cited.
      If they can find a way to do and use the cool stuff without making things worse, they should focus on that.

      • keegomatic@lemmy.world
        5 months ago

        I’m not the above poster, but I really appreciate your argument. I think many people overcorrect in their minds about whether or not these models learn the way we do, and they miss the fact that they do behave very similarly to parts of our own systems. I’ve generally found that that overcorrection leads to bad arguments about copyright violation and ethical concerns.

        However, your point is very interesting (and it is thankfully independent of that overcorrection). We’ve never had to worry about nonhuman personhood in any amount of seriousness in the past, so it’s strangely not obvious despite how obvious it should be: it’s okay to treat real people as special, even in the face of the arguable personhood of a sufficiently advanced machine. One good reason the machine can be treated differently is because we made it for us, like everything else we make.

        I think there still is one related but dangling ethical question. What about machines that are made for us but we decide for whatever reason that they are equivalent in sentience and consciousness to humans?

        A human has rights and can take what they’ve learned and make works inspired by it for money, or for someone else to make money through them. They are well within their rights to do so. A machine that we’ve decided is equivalent in sentience to a human, though… can that nonhuman person go take what it’s learned and make works inspired by it so that another person can make money through them?

        If they SHOULDN’T be allowed to do that, then it’s notable that this scenario is only separated from what we have now by a gap in technology.

        If they SHOULD be allowed to do that (which we could make a good argument for, since we’ve agreed that it is a sentient being) then the technology gap is again notable.

        I don’t think the size of the technology gap actually matters here, logically; I think you can hand-wave it away pretty easily and apply it to our current situation rather than a future one. My guess, though, is that the size of the gap is of intuitive importance to anyone thinking about it (I’m no different) and most people would answer one way or the other depending on how big they perceive the technology gap to be.

      • Eatspancakes84@lemmy.world
        5 months ago

        Another good question is why AIs do not mindlessly regurgitate source material. The reason is that they have access to so much copyrighted material. If they were trained on only one book, they would constantly regurgitate material from that one book. Because it’s trained on many (millions) books, it’s able to get creative. So the argument of OpenAI really boils down to: “we are not breaking copyright law, because we have used sufficient copyrighted material to avoid directly infringing on copyright”.

      • petrol_sniff_king@lemmy.blahaj.zone
        5 months ago

        Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.

        I agree, but the fact that shills for this technology are also wrong about it is at least interesting.

        Rhetorically speaking, I don’t know if that’s useless.

        I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy,

        I do like this point a lot.

        If they can find a way to do and use the cool stuff without making things worse, they should focus on that.

        I do miss when the likes of Cleverbot were just a fun novelty on the Internet.

    • interdimensionalmeme@lemmy.ml
      5 months ago

      The solution is that any AI must always be released under a strong copyleft license, and possibly that copyright be abolished outright, as it has only served the powerful by allowing them to enclose humanity’s common intellectual heritage (see Disney’s looting and enclosing of ancestral children’s stories). If you choose to strengthen the current regime, don’t expect things to improve for you as an irrelevant atomised individual.

    • Riccosuave@lemmy.world
      5 months ago

      Even if they learned exactly like humans do, like so fucking what, right!? Humans have to pay EXORBITANT fees for higher education in this country. Arguing that your bot gets socialized education before the people do is fucking absurd.

      • v_krishna@lemmy.ml
        5 months ago

        That seems more like an argument for free higher education rather than restricting what corpuses a deep learning model can train on

    • Dran@lemmy.world
      5 months ago

      Devil’s Advocate:

      How do we know that our brains don’t work the same way?

      Why would it matter that we learn differently than a program learns?

      Suppose someone has a photographic memory, should it be illegal for them to consume copyrighted works?

      • EldritchFeminity@lemmy.blahaj.zone
        5 months ago

        Because we’re talking pattern recognition levels of learning. At best, they’re the equivalent of parrots mimicking human speech. They take inputs and output data based on the statistical averages from their training sets - collaging pieces of their training into what they think is the right answer. And I use the word think here loosely, as this is the exact same process that the Gaussian blur tool in Photoshop uses.

        This matters in the context of the fact that these companies are trying to profit off of the output of these programs. If somebody with an eidetic memory is trying to sell pieces of works that they’ve consumed as their own - or even somebody copy-pasting bits from CliffsNotes - then they should get in trouble; the same as these companies.

        Given A and B, we can understand C. But an LLM will only be able to give you AB, A(b), and B(a). And they’ve even been just spitting out A and B wholesale, proving that they retain their training data and will regurgitate the entirety of copyrighted material.

  • General_Effort@lemmy.world
    5 months ago

    Let’s engage in a little fantasy. Someone invents a magic machine that is able to duplicate apartments, condos, houses, … You want to live in New York? You can copy yourself a penthouse overlooking Central Park for just a few cents. It’s magic. You don’t need space. It’s all in a pocket dimension like the Tardis or whatever. Awesome, right? Of course, not everyone would like that. The owner of that penthouse, for one. Their multi-million dollar investment is suddenly almost worthless. They would certainly demand that you must not copy their property without consent. And so would a lot of people. And what about the poor construction workers, ask the owners of construction companies? And who will pay to have any new house built?

    So in this fantasy story, the government goes and bans the magic copy machine. Taxes are raised to create a big new police bureau to monitor the country and to make sure that no one uses such a machine without a license.

    That’s turned from magical wish fulfillment into a dystopian story. A society that rejects living in a rent-free wonderland but instead chooses to make itself poor. People work to ensure poverty, not to create wealth.

    You get that I’m talking about data, information, knowledge. The first magic machine was the printing press. Now we have computers and the Internet.

    I’m not talking about a utopian vision here. Facts, scientific theories, mathematical theorems, … All of these are free for all. Inventors can get patents, but only for 20 years and only if they publish them. They can keep their invention secret and take their chances. But if they want a government enforced monopoly, they must publish their inventions so that others may learn from them.

    In the US, that’s how the Constitution demands it. The copyright clause: [The United States Congress shall have power] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

    Cutting down on Fair Use makes everyone poorer and only a very few, very rich people richer. Have you ever thought about where the money goes if AI training requires a license?

    For example, to Reddit, because Reddit has rights to all those posts. So do Facebook and Xitter. Of course, there’s also old money, like the NYT or Getty. The NYT has the rights to all their old issues going back about a century. If AI training requires a license, they can sell all their old newspapers again. That’s pure profit. Do you think they will give their employees raises out of the pure goodness of their hearts if they win their lawsuits? They have no legal or economic reason to do so. The belief that this would happen is trickle-down economics.

  • spacesatan@lazysoci.al
    5 months ago

    Am I the only person that remembers that it was “you wouldn’t steal a car”, or has everyone just decided to pretend it was “you wouldn’t download a car” because that’s easier to dunk on?

  • auzy@lemmy.world
    5 months ago

    As others have said, it isn’t always inspired; sometimes it literally just copies stuff.

    This feels like it was written by someone who invested their money in AI companies because they’re worried about their stocks

  • kibiz0r@midwest.social
    5 months ago

    Not even stealing cheese to run a sandwich shop.

    Stealing cheese to melt it all together and run a cheese shop that undercuts the original cheese shops they stole from.

    • TheKMAP@lemmynsfw.com
      5 months ago

      Whatever happened to copying isn’t stealing?

      I think the crux of the conversation is whether or not the world is better with ChatGPT. I say yes. We can tackle the disinformation in another effort.

      • calcopiritus@lemmy.world
        5 months ago

        When you copy for your own consumption, it’s way different from when you copy to sell the copy for a lower price.

        • TheKMAP@lemmynsfw.com
          5 months ago

          They’re not selling the copy, bruh. They’re selling a technology that very few understand. Smart people pretend they get it, but they don’t. That’s how rare the math is.

          • calcopiritus@lemmy.world
            5 months ago

            So because you don’t understand it, everything it does should be legal?

            It’s not rare maths. There are tens of thousands of AI experts. And most CS graduates (millions) have a good understanding of how they work, just not the specifics of the maths.

            Yeah, they’re not selling a copy, they are just selling a subscription to a copying machine loaded with the information needed to make a copy. Totally different.

            I should start a business of printers and attach a USB with the PNG of a dollar bill. And of course my printers won’t have any government mandated firmware that disables printing fake money.

            I’m not printing fake money! It’s my clients! Totally legal.