Google is giving web publishers a new way to control whether their content is used as AI training data or, in Google's words, “whether their sites help improve Bard and Vertex AI generative APIs.”
Large language models (LLMs) are trained on massive amounts of data, including web content. In July, Google called for the creation of a modern robots.txt for AI. In the absence of an industry-wide standard, Google is updating its own platform:
By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time.
Google-Extended is a user-agent token that site owners reference in their robots.txt file. It specifically applies to training Bard and Vertex AI (the latter is available to third parties as a Google Cloud offering), as well as “future generations of models that power those products.” More information for publishers is available in Google's crawler documentation.
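As a rough illustration, a site that wants to opt out of Bard and Vertex AI training while staying in Google Search could add a `Google-Extended` group to its robots.txt. The sketch below checks such a file with Python's standard `urllib.robotparser` module. The URL `https://example.com/some-article` is a hypothetical placeholder. This generic parser only shows how the rules resolve for each token; it is not Google's own robots.txt implementation. Google-Extended is also a control token rather than a separate crawler: pages are still fetched by Google's existing user agents.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: opt out of Google-Extended (AI training use)
# while leaving every other crawler, including Googlebot, unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/some-article"  # hypothetical page on the site

for token in ("Google-Extended", "Googlebot"):
    # can_fetch() reports whether the rules allow this token to use the URL
    print(token, parser.can_fetch(token, url))
```

Because `Googlebot` falls through to the catch-all `*` group, normal Search crawling is unaffected. Narrowing `Disallow: /` to a path such as `/premium/` (also hypothetical) would opt out only that section of the site.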
Google says it has heard how web publishers “want greater choice and control over how their content is used for emerging generative AI use cases.” The company calls this an “important step in providing transparency and control that we believe all providers of AI models should make available.”
…we’re committed to engaging with the web and AI communities to explore additional machine-readable approaches to choice and control for web publishers. We look forward to sharing more soon.