
OpenAI gives an impressive non-answer when asked if Sora was trained on YouTube videos

OpenAI’s impressive video AI, Sora, took the world by storm earlier this year, but the company still won’t say whether videos from YouTube were used to train the AI model.

In an interview at the Bloomberg Technology Summit, OpenAI’s COO Brad Lightcap spoke on potential business applications of AI tech. Unsurprisingly, Sora is one of those potential use cases. However, when asked about whether or not YouTube videos were used to train OpenAI’s Sora, Lightcap refused to give an actual answer.

When directly asked to “clear up once and for all” if YouTube was used to train Sora, Lightcap said:

Yeah, I mean look, the conversation around data is really important.

We obviously need to know where that data comes from. We just put out a post this week actually about this exact topic which is basically that there needs to be like a content ID system for AI that lets creators understand as they create stuff where it’s going, who’s training on it, being able to opt in and out of training, being able to opt in and out of use. Also like, on the other side of that, being able to actively allow your content to be put into a model or to be accessed by a model because there may be this other economic opportunity on the other end of this. And that’s something we’re exploring too, how do you actually go create an entirely different social contract with the web, with creators, with publishers, where as these models go off in the world and do things that are useful, create value, to the extent they’re able to like, reference and incorporate content from the web, there should be some sort of way that people can kind of get a benefit from that.

So, yeah, we’re looking at this problem, it’s really hard. We don’t have all the answers yet.

It’s an impressive non-answer, as YouTube wasn’t addressed a single time in the whole reply.

OpenAI did indeed publish a post earlier this week on “understanding the source of what we see and hear online.” Once again, though, the post doesn’t mention YouTube. Instead, it discusses how OpenAI is working to help build a standard for content authenticity, as well as how it’s building new ways to identify content that was produced by OpenAI’s tools.

It was reported earlier this year that OpenAI used “over a million” hours of YouTube content, against the platform’s rules, to train GPT-4, though Google did the same for Gemini according to the report.

OpenAI says that Sora will be available later this year.

You can see the full interview below. We’ve timestamped the embed to the question about YouTube, which is followed by the interviewer’s cheeky response of “so no answer on YouTube, for now.”





Author

Ben Schoon

Ben is a Senior Editor for 9to5Google.

Find him on Twitter @NexusBen. Send tips to schoon@9to5g.com or encrypted to benschoon@protonmail.com.

