MiniMax-M2.7 is highly verbose and slow
#18
by weisunding - opened
I upgraded from M2.5 to M2.7. I am using 4 x A100 (80 GB). By comparison, M2.5 responded almost instantly, but M2.7 is really verbose and slow. I am running it with vLLM, and it gets worse after a certain number of requests.
Try using other frameworks. Do you use L2 and L3 cache? What is your token generation rate (tokens per second)? How many concurrent sequences are you running?
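To answer the tokens-per-second question, log a few timings per request and compare M2.5 and M2.7. This also separates "verbose" (more output tokens per answer) from "slow" (fewer tokens generated per second). The sketch below assumes you already record, for each request, the number of generated tokens, the time to first token, and the total wall-clock time, for example from a streaming client. `RequestTiming`, `decode_tokens_per_s` and `summarize` are hypothetical helpers for illustration, not part of vLLM.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTiming:
    """Timing for one completed request (hypothetical record type)."""
    output_tokens: int  # number of generated tokens
    ttft_s: float       # time to first token, in seconds
    total_s: float      # wall-clock time from sending the request to the last token, in seconds


def decode_tokens_per_s(r: RequestTiming) -> float:
    """Per-request decode rate, excluding the time to first token."""
    decode_time = r.total_s - r.ttft_s
    if r.output_tokens <= 1 or decode_time <= 0:
        return 0.0
    # The first token arrives at ttft; the remaining tokens are produced during decode.
    return (r.output_tokens - 1) / decode_time


def summarize(timings: list[RequestTiming], window_s: float) -> dict[str, float]:
    """Aggregate stats for requests that completed within a window of window_s seconds."""
    return {
        # A higher value here means more verbose answers.
        "mean_output_tokens": mean(t.output_tokens for t in timings),
        # A lower value here means slower generation per request.
        "mean_decode_tok_per_s": mean(decode_tokens_per_s(t) for t in timings),
        # Total server throughput across all requests in the window.
        "aggregate_tok_per_s": sum(t.output_tokens for t in timings) / window_s,
    }


if __name__ == "__main__":
    runs = [RequestTiming(101, 0.5, 2.5), RequestTiming(201, 1.0, 5.0)]
    print(summarize(runs, window_s=10.0))
```

Suppose `mean_output_tokens` goes up from M2.5 to M2.7 while `mean_decode_tok_per_s` stays about the same. Then the extra latency mostly comes from longer answers. If instead `mean_decode_tok_per_s` falls as more requests are served, that points to a server-side problem, such as too many concurrent sequences or cache pressure.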