Submitted by Jingfeng Yao: Towards Scalable Pre-training of Visual Tokenizers for Generation (MiniMax)