GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
Submitted by Wenqi Zhang. Organization: ZJU-OmniAI.