Great. Would it run on 24 GB VRAM?
PaddlePaddle/PaddleOCR-VL
✨ Ultra-efficient NaViT + ERNIE-4.5 architecture
✨ Supports 109 languages 🤯
✨ Accurately recognizes text, tables, formulas & charts
✨ Fast inference and lightweight for deployment
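Want to poke at it? Here's a minimal sketch, assuming the model is exposed through transformers' image-text-to-text pipeline; the model card has the authoritative usage.

```python
# Minimal sketch: assumes PaddleOCR-VL works with transformers'
# image-text-to-text pipeline (check the model card for the exact API).
from transformers import pipeline

ocr = pipeline(
    "image-text-to-text",
    model="PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,  # the repo may ship custom modeling code
)
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/page.png"},  # hypothetical image
        {"type": "text", "text": "Extract all text, tables, and formulas from this page."},
    ],
}]
print(ocr(text=messages, max_new_tokens=512))
```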
The xLLMs project is a growing suite of multilingual and multimodal dialogue datasets designed to train and evaluate advanced conversational LLMs. Each dataset focuses on a specific capability, from long-context reasoning and factual grounding to STEM explanations, math Q&A, and polite multilingual interaction.
👉 Explore the full collection on Hugging Face:
👉 lamhieu/xllms-66cdfe34307bb2edc8c6df7d
💬 Highlight: xLLMs – Dialogue Pubs
A large-scale multilingual dataset built from document-guided synthetic dialogues (Wikipedia, WikiHow, and technical sources). It's ideal for training models on long-context reasoning, multi-turn coherence, and tool-augmented dialogue across 9 languages.
👉 lamhieu/xllms_dialogue_pubs
🧠 Designed for:
- Long-context and reasoning models
- Multilingual assistants
- Tool-calling and structured response learning
All datasets are open for research and development use โ free, transparent, and carefully curated to improve dialogue model quality.
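Loading any of them takes one line with the datasets library; the split name below is an assumption, so check each dataset card.

```python
# Minimal sketch: stream the Dialogue Pubs dataset with Hugging Face Datasets.
# The split name is an assumption; check the dataset card.
from datasets import load_dataset

ds = load_dataset("lamhieu/xllms_dialogue_pubs", split="train", streaming=True)
for example in ds.take(3):  # peek at a few dialogues
    print(example)
```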
Learn how to search a video dataset and generate answers using Tevatron/OmniEmbed-v0.1-multivent, an all-modality retriever, and Qwen/Qwen2.5-Omni-7B, an any-to-any model, in this notebook 🤗 merve/smol-vision
So… who are they, and why does it matter?
Had a lot of fun co-writing this blog post with @xianbao, with key insights translated from Chinese, to unpack how this startup built a model that outperforms GPT-4.1, Claude Opus, and DeepSeek V3 on several major benchmarks.
🧵 A few standout facts:
1. From zero to $3.3B in 18 months:
Founded in March 2023, Moonshot is now backed by Alibaba, Tencent, Meituan, and HongShan.
2. A CEO who thinks from the end:
Yang Zhilin (31) previously worked at Meta AI, Google Brain, and Carnegie Mellon. His vision? Nothing less than AGI – still a rare ambition among Chinese AI labs.
3. A trillion-parameter model that's surprisingly efficient:
Kimi K2 uses a mixture-of-experts architecture (32B active params per inference) and dominates on coding/math benchmarks.
4. The secret weapon: the Muon optimizer:
A new training method that doubles efficiency, cuts memory in half, and ran through 15.5T tokens with zero failures (rough sketch below). Big implications.
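For the curious, the core trick in Muon is to keep a momentum buffer and approximately orthogonalize each weight matrix's update with a Newton-Schulz iteration before applying it. Here's a rough sketch based on the public reference implementations, not Moonshot's production code:

```python
# Rough sketch of a Muon-style update (simplified from public reference code).
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2D update matrix."""
    a, b, c = 3.4445, -4.7750, 2.0315      # coefficients from the reference code
    X = G / (G.norm() + 1e-7)              # normalize so the iteration converges
    if G.size(0) > G.size(1):
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if G.size(0) > G.size(1) else X

def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    momentum_buf.mul_(beta).add_(grad)                  # classic momentum
    update = newton_schulz_orthogonalize(momentum_buf)  # orthogonalized direction
    param.data.add_(update, alpha=-lr)
```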
Most importantly, their move from closed to open source signals a broader shift in China's AI scene, following Baidu's pivot. But as Yang puts it: "Users are the only real leaderboard."
👉 Check out the full post to explore what Kimi K2 can do, how to try it, and why it matters for the future of open-source LLMs:
https://huggingface.co/blog/fdaudens/moonshot-ai-kimi-k2-explained
No, the Pangu Model License Agreement Version 1.0 is not a free software license. It imposes significant restrictions, such as prohibiting use within the European Union (Section 3) and requiring attribution (Section 4.2), which conflict with the principles of free software licenses like the GNU GPL or Open Source Definition. The non-transferable clause (Section 2) and indemnity requirement (Section 7) further deviate from standard free software terms.
๐ฅ "Open Model"? More Like "Openly Restrictive"! ๐ฅ
Huawei calls Pangu Pro MoE an "open model"? Thatโs like calling a locked door an "open invitation." Letโs break down the brilliant "openness" here:
- "No EU Allowed!" (Section 3) โ Because nothing says "open" like banning entire continents. GDPR too scary for you, Huawei?
- "Powered by Pangu" or GTFO (Section 4.2) โ Mandatory branding? Real open-source models donโt force you to be a walking billboard.
- Non-transferable license (Section 2) โ Canโt pass it on? So much for community sharing.
- Indemnify Huawei for your use (Section 7) โ If anything goes wrong, you pay, not them. How generous!
This isnโt an "open model"โitโs a marketing stunt wrapped in proprietary chains. True open-source (Apache, MIT, GPL) doesnโt come with geographic bans, forced attribution, and legal traps.
Huawei, either commit to real openness or stop insulting the FOSS community with this pretend-free nonsense. ๐ฎ
"not commercial" license isn't "Open Source", so please be accurate to users.
Reference:
The Open Source Definition โ Open Source Initiative:
https://opensource.org/osd
Gemma License (danger) is not Free Software and is not Open Source:
https://gnu.support/gnu-emacs/emacs-lisp/Gemma-License-danger-is-not-Free-Software-and-is-not-Open-Source.html
So Google's goal is just monopoly and user dependence. I suggest using fully free, free-as-in-freedom LLMs.
Model:
THU-KEG/LongWriter-Zero-32B
Dataset:
THU-KEG/LongWriter-Zero-RLData
Paper:
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning (2506.18841)
✨ 32B parameters
✨ Multi-reward GRPO: length, fluency, structure, non-redundancy
✨ Enforces the <think><answer> format via a Format RM
✨ Built on Qwen2.5-32B-base
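To give a flavor of the format check, here's a toy reward in the spirit of the Format RM (my sketch, not the paper's code):

```python
import re

# Toy format reward: 1.0 only when the response follows
# <think>...</think><answer>...</answer>, else 0.0.
FORMAT_RE = re.compile(r"^\s*<think>.+?</think>\s*<answer>.+?</answer>\s*$", re.DOTALL)

def format_reward(response: str) -> float:
    return 1.0 if FORMAT_RE.match(response) else 0.0

assert format_reward("<think>plan</think><answer>essay...</answer>") == 1.0
assert format_reward("just an answer") == 0.0
```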
Okay, please keep researching so that you can get more tools for Uganda, Kenya and Tanzania.
Every language carries its own cultural values and worldviews. So, when we build AI systems, we're not just deciding how they speak but also whose perspectives they represent.
Even choosing which dialect to train on in Norway becomes a question of inclusion and power. In Kenya, will AI speak Swahili from Nairobi or coastal regions? What about indigenous languages with rich oral traditions but limited written text, like Quechua in Peru or Cherokee in North America?
The path forward? Building WITH communities, not just FOR them. Working with local partners (libraries, universities, civil society), testing for cultural alignment, and asking hard questions about representation.
Just published some thoughts on this after my keynote in Norway a few weeks ago: https://huggingface.co/blog/giadap/when-ai-speaks
Thank you, that is interesting, but where is the link? Is it going to work on 24 GB VRAM?
Ever felt your AI agent is "shooting from the hip"? It latches onto a single line of thought and fails to produce a robust, well-rounded plan. This is a common struggle I've called the "AI Reasoning Paradox."
To tackle this, I developed Trinity-Synthesis, a multi-agent architecture designed to force reflection and synthesis before delivering a final answer. The philosophy is simple: constructive conflict between different perspectives leads to better solutions.
Hereโs the core idea:
Instead of one agent, it uses four agents running on the same base model but with different "personalities" defined by their system prompts and temperature settings:
🧠 The Visionary: Thinks outside the box (high temp: 1.0).
📊 The Analyst: Focuses on logic, data, and structure (low temp: 0.3).
🛠️ The Pragmatist: Evaluates feasibility, costs, and risks (mid temp: 0.5).
These three "thinkers" work in parallel on the same problem. Then, a final Synthesizer agent critically analyzes their outputs, rejects flawed arguments, and integrates the best points into a single, coherent, and often superior strategy.
The result is a more robust reasoning process that balances creativity with analytical rigor, making it ideal for solving complex, strategic problems where answer quality is critical.
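To make the pattern concrete, here's a condensed sketch; the full implementation is in the article, and the model name and prompts below are placeholders.

```python
# Condensed sketch of the Trinity-Synthesis pattern (placeholders, not the
# production code): three "thinkers" at different temperatures, one synthesizer.
from openai import OpenAI

client = OpenAI()       # any OpenAI-compatible endpoint works
MODEL = "gpt-4o-mini"   # hypothetical placeholder model

THINKERS = [
    ("You are the Visionary. Propose bold, unconventional ideas.", 1.0),
    ("You are the Analyst. Argue from logic, data, and structure.", 0.3),
    ("You are the Pragmatist. Weigh feasibility, costs, and risks.", 0.5),
]

def ask(system: str, user: str, temperature: float) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=temperature,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def trinity(problem: str) -> str:
    drafts = [ask(system, problem, temp) for system, temp in THINKERS]
    return ask(
        "You are the Synthesizer. Critically compare the drafts, reject weak "
        "arguments, and merge the strongest points into one coherent plan.",
        "\n\n---\n\n".join(drafts),
        0.2,
    )
```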
I've written a deep dive on how it works, including a detailed case study ("The Helios Initiative") and the Python source code for you to experiment with.
Read the full article on Medium:
https://medium.com/@brainhome9/trinity-synthesis-how-i-built-an-ai-agent-that-thinks-before-it-speaks-d45d45c2827c
I'd love to hear your feedback and see what you build with it!
#AI #AIAgents #LLM #Reasoning #MultiAgent
OpenEvolve is an evolutionary coding agent that uses LLMs to discover and optimize algorithms. I successfully replicated DeepMind's results on circle packing (99.97% match!) and evolved a random search into a simulated annealing algorithm.
✨ Key features:
- Evolves entire codebases (not just single functions)
- Works with any OpenAI-compatible API
- LLM ensemble approach for better results
- Multi-objective optimization
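Under the hood it's the classic evolve-evaluate loop. Conceptually it looks something like this (an illustrative sketch, not OpenEvolve's actual API):

```python
# Illustrative evolve-evaluate loop (not OpenEvolve's real API).
import random

def evolve(seed_program, mutate, evaluate, generations=50, pop_size=8):
    population = [(seed_program, evaluate(seed_program))]
    for _ in range(generations):
        # tournament selection: pick the best of a small random sample
        parent, _ = max(random.sample(population, min(3, len(population))),
                        key=lambda p: p[1])
        child = mutate(parent)                        # LLM rewrites the program
        population.append((child, evaluate(child)))   # score it on the task
        population = sorted(population, key=lambda p: p[1])[-pop_size:]
    return max(population, key=lambda p: p[1])
```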
👉 Check it out:
GitHub: https://github.com/codelion/openevolve
Blog post: https://huggingface.co/blog/codelion/openevolve
Would love to hear your thoughts or answer any questions about it!
Gemini's proprietary license is a deal-breaker. It's not just about performanceโit's about freedom. Google's terms actively restrict libre use, while models like QwQ 32B and DeepSeek v3 (when properly licensed) respect user rights. Never conflate ethically-licensed AI with corporate traps that forbid modification, redistribution, or independent use.
That's why today I'm excited to introduce 𝐫𝐞𝐚𝐝𝐞𝐫𝐬, the new feature of PdfItDown v1.4.0! 🎉
With 𝘳𝘦𝘢𝘥𝘦𝘳𝘴, you can choose among three (for now 😉) flavors of text extraction and conversion to PDF:
- 𝗗𝗼𝗰𝗹𝗶𝗻𝗴, which does a fantastic job with presentations, spreadsheets, and Word documents 📦
- 𝗟𝗹𝗮𝗺𝗮𝗣𝗮𝗿𝘀𝗲 by LlamaIndex, suitable for more complex, multi-part documents with a mix of text, images, and tables 📦
- 𝗠𝗮𝗿𝗸𝗜𝘁𝗗𝗼𝘄𝗻 by Microsoft, not the best at handling highly structured documents, but extremely flexible in terms of input file format (it can even convert XML, JSON, and ZIP files!) ⚙️
You can use this new feature in your Python scripts (check the attached code snippet!) and from the command line as well! 🚀
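For reference, usage looks roughly like this; the class and argument names here are my guesses, so treat the attached snippet and the README as authoritative.

```python
# Hypothetical sketch: the import path and argument names are guesses,
# double-check them against the PdfItDown README.
from pdfitdown.pdfconversion import Converter

converter = Converter(reader="docling")  # or "llamaparse", "markitdown"
converter.convert(file_path="report.xlsx", output_path="report.pdf")
```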
Have fun and don't forget to star the repo on GitHub ➡️ https://github.com/AstraBert/PdfItDown
Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn't sped up.
3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)
NVIDIA also released a demo where you can click any transcribed segment to play it instantly.
The improvement is significant: #1 on the Open ASR Leaderboard, a ~6% word error rate (best in class), and complete commercial freedom (CC-BY-4.0 license).
Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!
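Trying it yourself takes a few lines with NeMo; this sketch follows the model card's quickstart (verify the details there):

```python
# Sketch based on the model card's quickstart; verify against the card.
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
output = model.transcribe(["speech.wav"], timestamps=True)

print(output[0].text)                       # full transcript
for seg in output[0].timestamp["segment"]:  # per-segment timestamps
    print(f'{seg["start"]}s - {seg["end"]}s: {seg["segment"]}')
```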
Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard