Pragna-1B Data Training
Training Pragna-1B required a special focus because large datasets for Indian languages are rare. Here’s what they used:
- Bhasha: Soket AI Labs built its own dataset, Bhasha, by translating millions of English Wikipedia articles into Hindi and other Indian languages.
- Bhasha-wiki-indic: This is a filtered version of Bhasha focusing on content specific to India, helping the model understand Indian culture and context.
- Bhasha-SFT: This supervised fine-tuning dataset teaches the model tasks such as question answering and conversation, making it more versatile.
- External Datasets: They also included existing datasets such as SlimPajama (mostly English) and Sangraha-Verified (verified data in multiple Indian languages) to further enrich training.
Soket AI Partners Google Cloud To Launch Multilingual AI Model
Indian AI is taking a big step forward with the introduction of Pragna-1B, a collaboration between Soket AI Labs, a leading Indian AI research firm, and Google Cloud. Pragna-1B is designed specifically to bridge the language gap in India. As India’s first open-source multilingual AI model, it provides developers with cutting-edge machine learning (ML) and natural language processing (NLP) capabilities.
Read In Short:
- Soket AI Labs partners with Google Cloud to unveil Pragna-1B, India’s first open-source multilingual AI model.
- Pragna-1B provides developers with advanced multilingual NLP capabilities, catering to Hindi, English, Bengali, and Gujarati.
- The open-source nature of Pragna-1B fosters collaboration and accelerates the development of vernacular-language AI solutions in India.