Upcycling Experiments
Models I pre-trained by initialising sparse mixture-of-experts (SMoE) models from dense model weights, using the upcycling process from Qwen1.5-MoE-A2.7B (or something similar).
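A minimal sketch of generic "sparse upcycling", assuming a two-matrix ReLU feed-forward block stored as plain Python lists. Each expert starts as a copy of the dense FFN, and only the router is newly initialised. The names `upcycle_ffn`, `ffn_forward` and `permute_intermediate`, the `shuffle` option and the router's initial scale are hypothetical illustrations. This is not the exact Qwen1.5-MoE recipe, which differs in its details.

```python
import copy
import random
from typing import Dict, List

Matrix = List[List[float]]
FFN = Dict[str, Matrix]  # {"w_up": intermediate x hidden, "w_down": hidden x intermediate}


def ffn_forward(ffn: FFN, x: List[float]) -> List[float]:
    """Dense ReLU FFN: y = W_down . relu(W_up . x)."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in ffn["w_up"]]
    return [sum(w * hi for w, hi in zip(row, h)) for row in ffn["w_down"]]


def permute_intermediate(ffn: FFN, perm: List[int]) -> FFN:
    """Reorder intermediate neurons. Rows of w_up and columns of w_down move
    together, so the result computes the same function as the input FFN."""
    return {
        "w_up": [list(ffn["w_up"][i]) for i in perm],
        "w_down": [[row[i] for i in perm] for row in ffn["w_down"]],
    }


def upcycle_ffn(dense_ffn: FFN, num_experts: int, seed: int = 0,
                shuffle: bool = False) -> Dict[str, object]:
    """Turn one dense FFN into `num_experts` experts plus a fresh router."""
    rng = random.Random(seed)
    intermediate = len(dense_ffn["w_up"])
    hidden = len(dense_ffn["w_up"][0])
    experts: List[FFN] = []
    for _ in range(num_experts):
        if shuffle:
            # Hypothetical option: a different neuron ordering per expert.
            # This only reorders neurons; each expert still equals the dense FFN.
            perm = list(range(intermediate))
            rng.shuffle(perm)
            experts.append(permute_intermediate(dense_ffn, perm))
        else:
            # Plain copy; deepcopy so experts can diverge during training.
            experts.append(copy.deepcopy(dense_ffn))
    # The router has no dense counterpart, so it starts from small random weights
    # (num_experts x hidden; the 0.02 std is an assumption).
    router = [[rng.gauss(0.0, 0.02) for _ in range(hidden)]
              for _ in range(num_experts)]
    return {"experts": experts, "router": router}
```

Because every expert starts out computing the dense FFN, an MoE layer whose top-k gate weights are renormalised to sum to one reproduces the dense layer's output at initialisation. Only the router is learned from scratch.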