add cache_position to mask_kwargs in modeling_step3p7.py

#13

In transformers version 5.0.0, the create_causal_mask function requires the cache_position argument, so this PR adds cache_position to the mask_kwargs passed to it in modeling_step3p7.py.
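
For illustration only, here is a minimal sketch of the idea, not the actual diff. The helper name `with_cache_position` and the dictionary keys other than `cache_position` are hypothetical, and a plain Python list stands in for the tensor a real model would pass (typically built with something like `torch.arange`).

```python
def with_cache_position(mask_kwargs, past_seen_tokens, seq_len):
    """Return a copy of mask_kwargs that always carries a cache_position entry.

    Hypothetical helper: the real modeling code may build the dict differently.
    """
    updated = dict(mask_kwargs)
    if updated.get("cache_position") is None:
        # Positions of the new tokens, offset by the tokens already in the cache.
        # A real model would use a tensor on the input device instead of a list.
        updated["cache_position"] = list(
            range(past_seen_tokens, past_seen_tokens + seq_len)
        )
    return updated


# Assumed shape of the kwargs before the fix: no cache_position key.
mask_kwargs = {"attention_mask": None, "past_key_values": None}
mask_kwargs = with_cache_position(mask_kwargs, past_seen_tokens=4, seq_len=3)
print(mask_kwargs["cache_position"])  # [4, 5, 6]
```

The resulting dictionary would then be unpacked into the mask-building call, so the required argument is always present.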

WinstonDeng changed pull request status to merged
