| Configuration | GSM8K | SVAMP | GSM8K-hard | MultiArith |
|---|---|---|---|---|
| zero_shot | 74.15% | 81.00% | 35.86% | 99.44% |
| eval_full_s0.0015_a0.0002 | 73.84% | 81.33% | 36.62% | 99.44% |
| eval_full_s0.0015_a0.00025 | 73.16% | 83.00% | 37.45% | 97.22% |
| avg_full | 73.50% | 82.17% | 37.04% | 98.33% |
| eval_r4_s0.02_a0.005 | 75.97% | 85.33% | 39.35% | 100.00% |
| eval_r4_s0.02_a0.0075 | 76.65% | 83.33% | 36.39% | 96.67% |
| eval_r4_s0.025_a0.0025 | 75.36% | 85.33% | 38.06% | 97.22% |
| avg_r4 | 75.99% | 84.66% | 37.93% | 97.96% |
| eval_r8_s0.0225_a0.0035 | 75.66% | 83.00% | 37.98% | 98.33% |
| eval_r8_s0.0225_a0.0025 | 76.95% | 84.33% | 38.67% | 99.44% |
| eval_r8_s0.0225_a0.0025 | 76.42% | 82.33% | 37.98% | 97.78% |
| avg_r8 | 76.34% | 83.22% | 38.21% | 98.52% |
| eval_r16_s0.0225_a0.005 | 75.82% | 85.67% | 39.12% | 98.89% |
| eval_r16_s0.0225_a0.0025 | 74.83% | 85.00% | 37.83% | 96.67% |
| eval_r16_s0.0225_a0.0035 | 75.66% | 83.67% | 38.89% | 98.33% |
| avg_r16 | 75.44% | 84.78% | 38.61% | 97.96% |
| eval_r64_s0.01_a0.0025 | 77.03% | 83.33% | 36.62% | 96.11% |
| eval_r64_s0.0085_a0.0025 | 75.59% | 84.33% | 37.68% | 96.11% |
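
The `avg_*` rows appear to be plain per-benchmark arithmetic means over the runs in each group. That is an assumption, not something the card states, but it agrees with the listed `avg_r4` values. A minimal Python sketch that recomputes the `avg_r4` row from the three `eval_r4_*` rows (the names `r4_runs` and `column_means` are illustrative only):

```python
from statistics import mean

# Per-run accuracies (%) copied from the eval_r4_* rows of the table.
# Column order: GSM8K, SVAMP, GSM8K-hard, MultiArith.
r4_runs = {
    "eval_r4_s0.02_a0.005": (75.97, 85.33, 39.35, 100.00),
    "eval_r4_s0.02_a0.0075": (76.65, 83.33, 36.39, 96.67),
    "eval_r4_s0.025_a0.0025": (75.36, 85.33, 38.06, 97.22),
}


def column_means(runs):
    """Return the per-benchmark mean over all runs, rounded to 2 decimals."""
    # zip(*...) transposes the per-run tuples into per-benchmark columns.
    return tuple(round(mean(column), 2) for column in zip(*runs.values()))


avg_r4 = column_means(r4_runs)
print(avg_r4)  # (75.99, 84.66, 37.93, 97.96)
```

The same helper can be applied to the other groups by swapping in their rows, which gives a quick consistency check on the reported averages.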