---
base_model:
- JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI
library_name: peft
---
# Model Card for JeloH/LLM4CodeRE-S2S-V1
LLM4CodeRE-S2S-V1 is a PEFT fine-tuned causal language model for reverse-engineering-oriented code translation tasks. It supports sequence-to-sequence style prompting for mapping between source code, assembly, and binary-related representations.
## Model Details
### Model Description
LLM4CodeRE-S2S-V1 is a multi-task reverse engineering model built on top of `JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI`. It was fine-tuned using LoRA on instruction-style code translation tasks, including assembly-to-source and source-to-assembly conversion, along with binary-related code transformation tasks.
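To illustrate what LoRA fine-tuning changes, the sketch below computes the effective weight `W + (alpha / r) * B @ A` for a tiny matrix in plain Python. The shapes, rank, `alpha`, and all numeric values are hypothetical and chosen for readability; the actual LoRA configuration used for this model is not stated in this card.

```python
# Toy LoRA update: W_eff = W + (alpha / r) * (B @ A).
# All sizes and values are hypothetical; they are not this model's settings.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows in A."""
    r = len(a)            # A has shape (r, in_features)
    scale = alpha / r
    delta = matmul(b, a)  # (out_features, r) @ (r, in_features)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen base weight, shape (out_features=2, in_features=3).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Trainable low-rank factors with rank r = 1.
A = [[1.0, 2.0, 3.0]]        # shape (1, 3)
B_zero = [[0.0], [0.0]]      # standard LoRA starts with B = 0, so training begins at W
B_trained = [[0.5], [-0.5]]  # hypothetical values after some training

W_start = lora_effective_weight(W, A, B_zero, alpha=2.0)
W_tuned = lora_effective_weight(W, A, B_trained, alpha=2.0)

print(W_start)
print(W_tuned)
```

Only `A` and `B` are trained while `W` stays frozen, which is why the published artifact can be a small adapter applied on top of the base model.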
The model uses a **causal language modeling objective** with **sequence-to-sequence style prompting**. Here, “S2S” refers to prompt-based input-output translation within a single causal sequence, rather than a traditional encoder-decoder architecture.
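A minimal sketch of what "translation within a single causal sequence" can look like for one training example: the prompt and the target are concatenated, and the prompt positions are masked out of the labels. The whitespace tokenizer is a stand-in, and prompt masking is an assumption here; this card does not say whether the loss was computed on prompt tokens.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def toy_tokenize(text):
    """Stand-in tokenizer that splits on whitespace; a real setup uses the model tokenizer."""
    return text.split()


def build_example(prompt, target):
    """Concatenate prompt and target into one sequence and mask the prompt in the labels."""
    prompt_tokens = toy_tokenize(prompt)
    target_tokens = toy_tokenize(target)
    tokens = prompt_tokens + target_tokens
    # In practice labels are integer token ids; strings are kept here for readability.
    labels = [IGNORE_INDEX] * len(prompt_tokens) + target_tokens
    return tokens, labels


prompt = "Task: Asm2Src. Convert assembly to C/C++:\n\nmov eax, 0\nret\n\nSource code:"
target = "int main() { return 0; }"

tokens, labels = build_example(prompt, target)
for token, label in zip(tokens, labels):
    print(f"{token!r:12} -> {label!r}")
```

At inference time only the prompt part is supplied, and the model continues the sequence with the translated output.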
## Uses
![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/V16QXd0EKLps7k0PdRDva.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/NWnY_5iuqEVE4RZ5pru2B.png)
### Direct Use
This model is intended for research and experimental reverse-engineering tasks, including:
- Assembly to source code (`Asm2Src`)
- Source code to assembly (`Src2Asm`)
- Binary to source code (`Binary2Src`)
- Source code to binary (`Src2Binary`)
- Binary to assembly (`Binary2Asm`)
### Downstream Use
Potential downstream uses include:
- Reverse engineering research
- Code translation experiments
- Educational use in code understanding
- Program analysis and representation learning pipelines
### Results
![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/1vLQ7-EzAZqvUluZF8xq9.png)
### Citation
Jelodar, H., Bai, S., Nwankwo, T. E., Hamedi, P., Meymani, M., Razavi-Far, R., & Ghorbani, A. A. (2026).
LLM4CodeRE: Generative AI for code decompilation analysis and reverse engineering. arXiv. https://doi.org/10.48550/arXiv.2604.06095
## How to Get Started with the Model
Use the code below to get started with the model.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "JeloH/LLM4CodeRE-S2S-V1"
# Half precision is intended for GPU; fall back to float32 on CPU.
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_name)
# If the repository hosts only LoRA adapter weights, `peft` must be installed
# so that transformers can attach the adapter to the base model.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=dtype,
    device_map="auto",
)
model.eval()


def generate_output(task, input_text):
    """Build the task prompt, decode greedily, and return only the completion."""
    if task == "Asm2Src":
        prompt = f"Task: Asm2Src. Convert assembly to C/C++:\n\n{input_text}\n\nSource code:"
    elif task == "Src2Asm":
        prompt = f"Task: Src2Asm. Convert C/C++ to assembly:\n\n{input_text}\n\nAssembly:"
    else:
        raise ValueError("Only Asm2Src and Src2Asm are supported in this example")

    # device_map="auto" chooses the placement, so follow the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,
        )

    # Decode only the newly generated tokens; slicing the decoded string by
    # len(prompt) can misalign when decoding does not reproduce the prompt exactly.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


src_code_example = """// Main function
int main() {
    return 0;
}
"""

asm_code_example = """push ebp
mov ebp, esp
mov eax, 0
pop ebp
ret
"""

for task, text in [("Src2Asm", src_code_example), ("Asm2Src", asm_code_example)]:
    print(f"\n===== {task} =====\n")
    print("INPUT:\n", text)
    print("\nOUTPUT:\n", generate_output(task, text))
```