How to Stop Your Data From Being Used to Train AI

Key Points

Mireshghallah explains that companies can make it complicated to opt out of having data used for AI training, and even where opting out is possible, many people don't have a clear idea of the permissions they've agreed to or how their data is being used.

Grammarly does not currently offer an opt-out process for personal accounts, but self-serve business accounts can choose to opt out of having their data used to train Grammarly's machine-learning model.

OpenAI also says that if you have a high volume of images hosted online that you want removed from training data, it may be more efficient to add GPTBot to the robots.txt file of the website where the images are hosted.
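As a rough illustration of the robots.txt approach, the sketch below uses Python's standard urllib.robotparser module to check that a rule set disallowing GPTBot behaves as intended before you publish it. The rule text, the helper name is_allowed, and the example.com URL are assumptions made for illustration; they are not OpenAI's published wording.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block GPTBot site-wide, allow other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL standing in for an image hosted on your own site.
    url = "https://example.com/images/photo.jpg"
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

Running the same check against your live file's contents is a quick way to catch a misplaced rule.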

"This doesn't include user emails or private content," an Automattic spokesperson says. Tumblr has a "prevent third-party sharing" option that stops what you publish from being used for AI training or shared with other third parties, such as researchers.

"We are also trying to work with crawlers (like https://commoncrawl.org/) to prevent content from being scraped and sold without giving our users choice or control over how their content is used," an Automattic spokesperson says. If you host your own website, you can update your robots.txt file to tell AI bots not to scrape its pages.
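For a self-hosted site, the robots.txt update might look something like the output of the short sketch below, which builds one "disallow everything" block per crawler. The crawler list is an assumption for illustration: GPTBot is the OpenAI agent mentioned above, and CCBot is the user agent used by Common Crawl. The function name build_robots_txt is hypothetical, and you would need to add any other bots you want to exclude.

```python
def build_robots_txt(user_agents: list[str]) -> str:
    """Build robots.txt text asking each listed crawler to stay off the whole site."""
    blocks = [f"User-agent: {agent}\nDisallow: /\n" for agent in user_agents]
    # A blank line separates each crawler's block.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Assumed list of AI-related crawlers; extend it to suit your site.
    ai_crawlers = ["GPTBot", "CCBot"]
    print(build_robots_txt(ai_crawlers))
```

Save the printed text as robots.txt at the root of your site. Keep in mind that robots.txt is a request that crawlers follow voluntarily, not an enforcement mechanism.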