Robots.txt just isn’t up to the task of keeping public data out of AI training sets. The only thing between a website and an internet scraper’s database is robots.txt, a small configuration file that tells web crawlers for services ranging from Google to OpenAI which parts of a site they can …

Continue reading “Nicholas Vincent explains why robots.txt is no longer enough to protect against web scraping.”
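To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `example.com` URLs, and the `is_allowed` helper are illustrative assumptions, not taken from any real site; `GPTBot` and `Googlebot` stand in for the OpenAI and Google crawlers mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example:
# block one AI crawler entirely, and keep everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/articles/1"),
        ("Googlebot", "https://example.com/articles/1"),
        ("Googlebot", "https://example.com/private/notes"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Note that the check runs inside the crawler itself. Nothing on the server enforces the answer, so a scraper that simply skips this step can fetch the page anyway, which is the weakness the article points to.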
