Artificial Intelligence

Out of context: Reply #987

  • 1,323 Responses
  • yuekit1

    https://archive.is/vAw3g/d08b8c1…

    Did tech companies violate copyright by training their models on other people's work?

    This summer, I acquired a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. I wrote in The Atlantic about how the data set, known as “Books3,” was based on a collection of pirated ebooks, most of them published in the past 20 years. Since then, I’ve done a deep analysis of what’s actually in the data set, which is now at the center of several lawsuits brought against Meta by writers such as Sarah Silverman, Michael Chabon, and Paul Tremblay, who claim that its use in training generative AI amounts to copyright infringement.

    https://archive.is/vAw3g

    • Do we need permission to read a book and let it influence our thoughts or creativity?
      microkorg
    • Do we need to pay and ask permission of every photographer whose pictures were used to train face detection?
      palimpsest
    • Also, the data "not used" is as useful as the data that "is used" in training.
      palimpsest
    • If you don't want what you have learned or created turned into influence, then keep it to yourself. Don't publish it anywhere. ;)
      microkorg
    • humans are not a.i. ... that's why they violated it...
      neverscared
    • u'd have to give the human a tech device to engage with media... like filming movies in the 2000s with a camera and then being influenced... clearly a violation.
      neverscared
    • like the programmers who tune the a.i. ... literally putting their hands on it, like u pressed rec. on a camera...
      neverscared
    • let the machines read, so i can talk to them!
      imbecile
    • this is dumb, who doesn't build an art direction deck with other people's photos? do we ask for permission? everything is fucking derivative
      doesnotexist
    • this is not a pitch deck. these companies are charging money for a commercial product... based on data they didn't license
      jonny_quest_lives
    • if your pitch deck was leaked online and u used scrap from a photographer with means, your agency would be sued
      jonny_quest_lives
    • let me tell you about the time an agency i worked for went live with a website using a House Industries font they didn't license...
      jonny_quest_lives
    • House Industries came after them so fast and for so much that the Creative Director called an emergency meeting to ban the use of House Industries fonts outright
      jonny_quest_lives
    • it's all fun and games until Neutraface is implemented on a semi-public dev site and House Industries legal contacts you before the client sends feedback
      jonny_quest_lives
    • Just think about all the data that goes into AI training; you can't license it all. It's like paying IKEA for using a bookshelf in a single movie shot.
      palimpsest
    • In facial recognition, the image of a fire hydrant (non-target data) is as useful as the image of a face. Imagine licensing all the images used there.
      palimpsest
    • The companies acquired pirated material; that's illegal and wrong.
      But whether training models on other people's work violates copyright is another question.
      palimpsest
    • Also, we are talking about training AI, not the output. I don't think it's feasible to license every single piece of data that goes into training.
      palimpsest
    • fire hydrant isn't a getty image of brad pitt
      jonny_quest_lives
    • just as Getty would hold the copyright on the image of brad pitt... Pitt would have likeness rights limiting Getty to using the image only for editorial
      jonny_quest_lives
    • or non-commercial work... Midjourney is trying to thread the needle of passing their original infringement onto their users. it just won't fly.
      jonny_quest_lives
    • the tech is neat, almost magical, but it's only magical with high-quality data...
      jonny_quest_lives
    • You are not allowed to take someone's music and use it in your songs without permission, but you are allowed to listen to music, learn from it and make your o
      _niko
    • ...own. Who cares where that music came from, really? Whether you paid for a CD or went to see a band or your mama sang it to you, you learn by hearing.
      _niko
    • These authors are saying, "you heard my songs somewhere, maybe your girlfriend's mixtape, or maybe you downloaded it for free, or maybe it was on the radio...
      _niko
    • the point is that you learned from our music and so you owe us moneys."
      _niko
    • If I show someone a pic of Brad Pitt and tell them to find me an actor who does or doesn't look like him, should I license the pic?
      palimpsest
    • Repeating myself: in training, which is the issue here, the images that are not of Brad Pitt are as necessary as those of Brad Pitt.
      palimpsest
    • _niko has made a good point from the human learning (training) experience.

      We're talking about practice here. What are we talking about? Practice!
      palimpsest
    • also repeating myself, but you can't train your ai on images of brad pitt without securing authorization from brad pitt if you are selling a commercial service
      jonny_quest_lives
    • that can replicate 1:1 likenesses and derivatives of brad pitt at commercial scale... take that functionality out of the SOFTWARE
      jonny_quest_lives
    • it's SOFTWARE... loaded up and trained on data... DATA they needed to license for commercial use. the software guesses probabilities of pixels
      jonny_quest_lives
    • based on the pixels it was trained on: they added noise, had it denoise, and did that thousands of times until the machine was tuned
      jonny_quest_lives
    • you may find value in what the software can do now, but that doesn't mean these companies aren't liable... and the people who had their shit taken can seek
      jonny_quest_lives
    • compensation or ask the companies to kill the trained models and do it all over again with licensed commercial data
      jonny_quest_lives
    • either way, the companies will settle or be penalized. the genie is out of the bottle... it's a different world...
      jonny_quest_lives
    • but it's just software... it doesn't learn, it's not sentient; data and functionality can be removed/added at any time
      jonny_quest_lives
    • it's weird... what if Adobe had done the same as these AI startups and just gaffled normies' data and scraped ArtStation and DeviantArt and thousands of
      jonny_quest_lives
    • portfolio sites and trained a generative image model, then released it and said "too bad, it's just like a human, let it learn"
      jonny_quest_lives
    • I am not against licensing for training if the only job of the software is to recognize pics of Brad Pitt.
      palimpsest
    • I just don't see how feasible it would be to license everything that something like ChatGPT is trained with.
      palimpsest
    • That's the AI startups' headache... not the copyright holders'... sort out your training data...
      jonny_quest_lives
    • LOL, OK.
      palimpsest
    • see what happens in the courts... the midjourney case will set the precedent... either they settle with getty or go to trial
      jonny_quest_lives
    • and either pay damages... some peeps say they may have to retrain their models, who knows... lawyers will always tell you the goal is not to be sued
      jonny_quest_lives
    • https://cdn.vox-cdn.…
      jonny_quest_lives
    • "This case arises from Stability AI’s brazen infringement of Getty Images’
      intellectual property on a staggering scale. Upon information and belief,
      jonny_quest_lives
    • Upon information and belief, Stability AI has copied more than 12 million photographs from Getty Images’ collection,
      jonny_quest_lives
    • along with the associated
      captions and metadata, without permission from or compensation to Getty Images
      jonny_quest_lives
    • as part of its
      efforts to build a competing business"
      jonny_quest_lives
    • Our agency's Getty rep once met with our agency's IT department because art directors were hoarding stock they had downloaded
      jonny_quest_lives
    • Apparently that's a violation of the TOS: stock is to be downloaded per project, as the images on Getty's current site are the only ones they can license to you
      jonny_quest_lives
    • Hoarding stock for other projects puts you and your client at risk if that asset drops off their site, i.e. someone purchases exclusive rights
      jonny_quest_lives
    • they discovered us because we went to license stock that they hadn't hosted online for 4 years... we couldn't provide a receipt or date of download
      jonny_quest_lives
    • courtesy visit from the getty rep... IT department meeting... purge of the art directors' shared folders and raised rates for a dedicated download portal
      jonny_quest_lives
    • i think proving midjourney bulk scraped their website for preview thumbnails will be a layup for Getty
      jonny_quest_lives
    • https://www.gettyima…
      jonny_quest_lives
    • the midjourney getty case seems the most clear-cut... unless discovery in these other cases shows these startups were violating IP
      jonny_quest_lives
    • left and right because they assumed some "research" rule set covered them... newsflash: they aren't universities, and if u haven't published academic papers
      jonny_quest_lives
    • it's hard to establish a research defense... charging users for outputs definitely negates the research clause
      jonny_quest_lives
    • but what do i know? Adobe won't let me use "Handicap" or "Pimp" in the Generative Fill prompt dialog, so i figure TOS and ban hammers are coming
      jonny_quest_lives
    • keep an eye on these young cats, they are up-and-comers: https://disneyanimat…
      jonny_quest_lives
    • relatively young startup, get the word out: https://la.disneyres…
      jonny_quest_lives
