Pragna-1B Data Training
Training Pragna-1B required a special focus because large datasets for Indian languages are rare. Here’s what they used:
- Bhasha: Soket AI Labs built its own dataset, Bhasha, by translating millions of English Wikipedia articles into Hindi and other Indian languages.
- Bhasha-wiki-indic: This is a filtered version of Bhasha focusing on content specific to India, helping the model understand Indian culture and context.
- Bhasha-SFT: This supervised fine-tuning dataset teaches the model tasks such as question answering and conversation, making it more versatile.
- External Datasets: They also included existing datasets such as SlimPajama (mostly English) and Sangraha-Verified (verified data in multiple Indian languages) to further enrich training.
Soket AI Partners Google Cloud To Launch Multilingual AI Model
Indian AI is taking a big step forward with the introduction of Pragna-1B, a collaboration between Soket AI Labs, a leading Indian AI research firm, and Google Cloud. Pragna-1B is designed specifically to bridge the language gap in India. As India’s first open-source multilingual AI model, it provides developers with cutting-edge machine learning (ML) and natural language processing (NLP) capabilities.
Read In Short:
- Soket AI Labs partners with Google Cloud to unveil Pragna-1B, India’s first open-source multilingual AI model.
- Pragna-1B provides developers with advanced multilingual NLP capabilities, catering to Hindi, English, Bengali, and Gujarati.
- The open-source nature of Pragna-1B fosters collaboration and accelerates the development of vernacular-language AI solutions in India.