DevQuasar/NovaSky-AI.Sky-T1-32B-Flash-GGUF Text Generation • 33B • Updated Feb 21, 2025 • 8 • 1
Post: New Research Alert: Making Language Models Smaller & Smarter! Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance. The secret? Grouped pointwise convolutions. Yes, we brought a method from computer vision to the transformer arena.
Key Findings:
• 77% parameter reduction
• Maintained model capabilities
• Improved generalization
Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
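For readers unfamiliar with the idea, below is a minimal PyTorch sketch of how a grouped pointwise (kernel-size-1) convolution can stand in for a dense linear projection and shrink the weight count. It assumes the general technique described in the post, not the exact architecture of the linked paper or repository; the class name `GroupedPointwiseLinear` and the group count are illustrative.

```python
import torch
import torch.nn as nn

class GroupedPointwiseLinear(nn.Module):
    """Illustrative stand-in for a dense linear layer using a grouped
    pointwise (kernel size 1) convolution. With `groups=g`, each slice of
    in_features/g channels maps only to out_features/g channels, so the
    weight matrix shrinks by roughly a factor of g."""

    def __init__(self, in_features: int, out_features: int, groups: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(in_features, out_features, kernel_size=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_features) -> treat features as conv channels
        y = self.conv(x.transpose(1, 2))   # (batch, out_features, seq_len)
        return y.transpose(1, 2)           # (batch, seq_len, out_features)

dense = nn.Linear(1024, 1024)
grouped = GroupedPointwiseLinear(1024, 1024, groups=4)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(grouped))  # ~1.05M vs ~0.26M parameters
```

With `groups=4` the weight tensor drops from 1024×1024 to 1024×256, about a 75% reduction for that layer; the actual savings and any channel-mixing tricks needed to keep accuracy are detailed in the linked report.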
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer Paper • 2501.15570 • Published Jan 26, 2025 • 25
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024 • 18
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 151