How are Indian languages faring in the age of AI and language models?

Beyond Sanskrit, how are other Indian languages faring in artificial intelligence (AI), at a time when language-based AI applications have taken the world by storm?

"If you take [an English-based] model and fine-tune it using a Hindi corpus, the model might be able to reuse some of its understanding of English grammar, but it will still need to learn representations of individual Hindi words and the relationships between them," said Dr. Kulkarni, chief data scientist at Pune-based DeepTek AI. Sentences in different languages are tokenised in different ways, even when they have the same meaning.
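
To make the tokenisation point concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not how any particular model tokenises: the example sentences and the helper names `byte_tokens` and `word_tokens` are hypothetical. It counts the raw UTF-8 bytes that a byte-level tokenizer (such as the byte-level BPE used by GPT-2) starts from before it learns any merges. Devanagari characters take three bytes each in UTF-8, while ASCII letters take one. As a result, the Hindi sentence begins as twice as many byte tokens as its English equivalent, even though both have four words.

```python
# Toy comparison only: counts the UTF-8 bytes a byte-level tokenizer starts
# from, before any merges are learned. Not any real model's tokenizer.

def byte_tokens(text: str) -> list[int]:
    """Hypothetical helper: split text into UTF-8 byte values, one 'token' per byte."""
    return list(text.encode("utf-8"))

def word_tokens(text: str) -> list[str]:
    """Hypothetical helper: split text on whitespace, a crude word-level tokenisation."""
    return text.split()

# Two sentences with the same meaning ("India is a country").
english = "India is a country"
hindi = "भारत एक देश है"

for label, sentence in [("English", english), ("Hindi", hindi)]:
    print(f"{label}: {len(sentence)} code points, "
          f"{len(word_tokens(sentence))} words, "
          f"{len(byte_tokens(sentence))} bytes")
```

Running it should report 18 code points, 4 words and 18 bytes for English, and 14 code points, 4 words and 36 bytes for Hindi. A trained subword tokenizer merges frequent byte sequences, so real token counts will differ from these raw byte counts. How far the gap narrows depends on how much Hindi text the tokenizer saw during training.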

"The availability of text in each language is going to be a long tail: a few languages with lots of text, many languages with few examples, and this is going to affect models dealing with the latter," said Makarand Tapaswi, a senior machine learning scientist at Wadhwani AI, a non-profit, and assistant professor in the computer vision group at IIIT Hyderabad.

To train language models, he said, AI4Bharat has a corpus called IndicCorp covering 22 Indian languages, and the CommonCrawl web crawl can support 10-15 Indian languages.

There has been a spurt of activity around modest-sized open-source models in English, which suggests that promising models could also be built for Indian languages and serve as a springboard for further innovation.
