SKT AI Labs
Excellent, ji!
We are thrilled to announce the launch of SKT-OMNI-CORPUS-146T-V1, a massive-scale, high-quality dataset designed to power the next generation of foundation models (LLMs) trained from scratch.
Developed at SKT AI LABS, this corpus is not just a collection of data; it is part of a mission to decentralize high-grade AI training for regional languages and global knowledge.
💎 Key Highlights:
• Massive Scale: Targeting a multi-terabyte corpus built for 146T-token-scale training.
• Pure Quality: Curated from 500+ elite sources.
• Structured for MoE: Sharded into standardized 3.5 GB units (SKT-P series) for seamless distributed training; see the loading sketch below.
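For anyone who wants to inspect the shards directly, here is a minimal loading sketch using the Hugging Face `datasets` library in streaming mode, so the multi-terabyte corpus is fetched shard by shard rather than downloaded up front. The "train" split name and the record schema are assumptions, not details confirmed by this post; check the dataset card for the actual configuration.

```python
# Minimal sketch: stream SKT-OMNI-CORPUS-146T-V1 without a full download.
# Assumptions: a "train" split exists and records carry a text-like field;
# neither is confirmed by the announcement, so verify on the dataset card.
from datasets import load_dataset

ds = load_dataset(
    "Shrijanagain/SKT-OMNI-CORPUS-146T-V1",
    split="train",
    streaming=True,  # lazily fetches each 3.5 GB shard as it is reached
)

# Peek at the first record to discover the actual schema.
first = next(iter(ds))
print(first.keys())
```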
🤝 Open for Collaboration!
We are looking for AI researchers, CUDA engineers, and data scientists to join us on this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design, let's build the future together.
Explore the Dataset on Hugging Face:
🔗 Shrijanagain/SKT-OMNI-CORPUS-146T-V1
#AI #MachineLearning #OpenSource #IndicAI #SKTAILABS #LLM #BigData #HuggingFace #InnovationIndia
If 'fun at parties' means ignoring the potential of a 146-trillion-parameter model, then yeah, I'm the most boring person you'll ever meet. I'll let the results do the talking from here.
I'm not saying that a 140-whatever-trillion-parameter model can't exist; I'm just saying that your "paper" misleads users into believing that someone single-handedly made an AGI.
Just be realistic: try making a 140-billion-parameter model once, then reply and tell me how much time it took to train from scratch.
Training a 140B model is a calculation of compute; designing a 146T architecture is a matter of engineering. While you're stuck on the 'time' it takes others, I'm focused on MoE scaling and dataset curation for SKT AI. If you're so concerned about realism, do go and check out our repo, lol.
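For reference, the training-time question does have a standard back-of-envelope answer. A common approximation puts total training compute at roughly 6 × parameters × tokens; the sketch below applies it to a 140B dense model. The token budget, GPU type, utilization, and cluster size are all illustrative assumptions, not figures from either side of this thread.

```python
# Back-of-envelope training time for a 140B dense model.
# Standard approximation: training FLOPs ≈ 6 * params * tokens.
# Every hardware figure below is an illustrative assumption.

params = 140e9                 # 140B parameters
tokens = 2.8e12                # assumed ~20 tokens/param (Chinchilla-style)
flops_needed = 6 * params * tokens        # ≈ 2.35e24 FLOPs

peak_flops_per_gpu = 989e12    # H100 dense BF16 peak throughput
mfu = 0.40                     # assumed model FLOPs utilization
n_gpus = 1024                  # assumed cluster size

sustained = peak_flops_per_gpu * mfu * n_gpus
days = flops_needed / sustained / 86400
print(f"~{days:.0f} days on {n_gpus} GPUs")   # ≈ 67 days
```

Under these assumptions the answer lands around two months, and it scales linearly with cluster size and sustained utilization.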
I have better things to do in my free time than look at a "paper" written by artificial intelligence.
That's the difference: you have 'free time' to argue; I'm busy engineering the future of Indian AI. If you can't tell the difference between a roadmap and a chatbot output, that's on you. Enjoy your free time while I keep building. Do go and check out our repo, lol.