Key Points
Two announcements Wednesday offer evidence that large language models can in fact be trained without the permissionless use of copyrighted materials.
The nonprofit Fairly Trained announced that it has awarded its first certification for a large language model built without copyright infringement, showing that technology like that behind ChatGPT can be built in a different way to the AI industry's contentious norm.
On Wednesday, researchers released what they claim is the largest available AI dataset for language models composed purely of public domain content.
"As far as I am aware, this is currently the largest public domain dataset to date for training LLMs," says Stella Biderman, the executive director of EleutherAI, an open source collective project that releases AI models.
Although it doesn't have additional LLMs on its docket, Fairly Trained recently certified its first company to offer AI voice models, the Spanish voice-changing startup VoiceMod, as well as its first AI band, a heavy-metal project called Frostbite Orckings.