Date: 2023-07-06

Large language models are trained on massive amounts of data, including the web. Google is now calling for “machine-readable means for web publisher choice and control for emerging AI and research use cases,” or a modern robots.txt.

Google says giving web publishers “choice and control” over their content is an important part of maintaining a vibrant ecosystem. It points to how robots.txt files let sites specify whether search engines can crawl and index their content.
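For readers unfamiliar with the mechanism, here is a minimal sketch of how a well-behaved crawler might check robots.txt before fetching a page, using Python's standard-library `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the `example.com` URLs are hypothetical. This only illustrates today's robots.txt behavior, not the AI-specific controls Google is proposing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block "ExampleBot" from the whole site,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler asks before each fetch whether its user agent may access the URL.
print(parser.can_fetch("ExampleBot", "https://example.com/article"))  # → False
print(parser.can_fetch("OtherBot", "https://example.com/article"))    # → True
print(parser.can_fetch("OtherBot", "https://example.com/private/x"))  # → False
```

Compliance is voluntary: robots.txt works only because crawlers choose to check it before fetching.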

In Google's words: “However, we recognize that existing web publisher controls were developed before new AI and research use cases.”

As such, Google wants to bring together “web publishers, civil society, academia and more fields from around the world” to discuss a modern equivalent of robots.txt for AI training. It notes that robots.txt, a community-developed web standard that is almost 30 years old, has been “simple and transparent.”

Google currently offers the Search Generative Experience and Bard, and is actively training Gemini, its next-generation foundation model.

Google wants a public discussion and today opened a sign-up form that lets groups express interest before it kicks off: “The Mailing List is intended for members of the web and AI communities who wish to receive future messages regarding the process to develop new machine-readable means to provide web publisher choice and control.”

It will be “convening those interested in participating over the coming months.”
