| # EditCoder |
|
|
| The EditCoder models are the fine-tuned models described in the following paper: |
|
|
| ``` |
| @inproceedings{cassano2023edit, |
| title={{Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions}}, |
| author={Federico Cassano and Luisa Li and Akul Sethi and Noah Shinn and Abby Brennan-Jones and Anton Lozhkov and Carolyn Jane Anderson and Arjun Guha}, |
| booktitle={The First International Workshop on Large Language Model for Code}, |
| year={2024}, |
| url={https://arxiv.org/abs/2312.12450} |
| } |
| ``` |
|
|
This repository contains several models. The root of the repository holds the fine-tune of DeepSeek Coder 33B on the EditPackFT dataset; the other models live in subdirectories and can be loaded with the `subfolder` argument:
|
|
```py
from transformers import AutoModelForCausalLM

# DIR_NAME is the name of the subdirectory holding the model you want to load
model = AutoModelForCausalLM.from_pretrained("nuprl/EditCoder", subfolder=DIR_NAME)
```
|
|
| ## Prompt |
| The model has been trained on the following prompt format: |
| ``` |
| ## Code Before: |
| {before} |
| ## Instruction: |
| {instruction} |
| ## Code After: |
| {after} |
| ``` |
|
|
Here is a Python function that formats the prompt correctly:
```py
def edit_prompt(old: str, instr: str) -> str:
    before = f"## Code Before:\n{old}\n"
    instruction = f"## Instruction:\n{instr}\n"
    after = "## Code After:\n"
    return before + instruction + after
```
|
|
| ## Training Code |
|
|
We provide the full pipeline that was used to train our EditCoder models.
| The pipeline and instructions can be found on our [GitHub repository](https://github.com/nuprl/CanItEdit/tree/main/editcoder). |