OpenAI gives an impressive non-answer when asked if Sora was trained on YouTube videos

Ben Schoon | May 10 2024 - 10:15 am PT

OpenAI’s impressive video AI, Sora, took the world by storm earlier this year, but the company still won’t say whether videos from YouTube were used to train the AI model.

In an interview at the Bloomberg Technology Summit, OpenAI’s COO Brad Lightcap spoke about potential business applications of AI tech. Unsurprisingly, Sora is one of those potential use cases. However, when asked whether YouTube videos were used to train Sora, Lightcap declined to give an actual answer.

When directly asked to “clear up once and for all” if YouTube was used to train Sora, Lightcap said:

Yeah, I mean look, the conversation around data is really important.

We obviously need to know where that data comes from. We just put out a post this week actually about this exact topic which is basically that there needs to be like a content ID system for AI that lets creators understand as they create stuff where it’s going, who’s training on it, being able to opt in and out of training, being able to opt in and out of use. Also like, on the other side of that, being able to actively allow your content to be put into a model or to be accessed by a model because there may be this other economic opportunity on the other end of this. And that’s something we’re exploring too, how do you actually go create an entirely different social contract with the web, with creators, with publishers, where as these models go off in the world and do things that are useful, create value, to the extent they’re able to like, reference and incorporated content from the web, there should be some sort of way that people can kind of get a benefit from that.

So, yeah, we’re looking at this problem, it’s really hard. We don’t have all the answers yet.

It’s an impressive non-answer, as YouTube wasn’t addressed a single time in the whole reply.


OpenAI did indeed publish a post earlier this week on “understanding the source of what we see and hear online.” Yet again, the post doesn’t address YouTube. Instead, it discusses how OpenAI is working to help build a standard for content authenticity, as well as how it’s building new ways to identify content that was produced by OpenAI’s tools.

It was reported earlier this year that OpenAI used “over a million” hours of YouTube content, against the platform’s rules, to train GPT-4. According to the same report, Google did the same for Gemini.

OpenAI says that Sora will be available later this year.

You can see the full interview below. We’ve timestamped the embed to the question about YouTube, which is followed by the interviewer’s cheeky response of “so no answer on YouTube, for now.”