metadata
base_model:
  - JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI
library_name: peft

Model Card for JeloH/LLM4CodeRE-S2S-V1

LLM4CodeRE-S2S-V1 is a causal language model fine-tuned with PEFT for reverse-engineering-oriented code translation. It uses sequence-to-sequence-style prompting to map between source code, assembly, and binary representations.

Model Details

Model Description

LLM4CodeRE-S2S-V1 is a multi-task reverse engineering model built on top of JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI. It was fine-tuned using LoRA on instruction-style code translation tasks, including assembly-to-source and source-to-assembly conversion, along with binary-related code transformation tasks.

The model uses a causal language modeling objective with sequence-to-sequence style prompting. Here, “S2S” refers to prompt-based input-output translation within a single causal sequence, rather than a traditional encoder-decoder architecture.
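
To make the single-sequence idea concrete, the sketch below shows one common way to turn a (prompt, target) pair into a causal training example: the two are concatenated, and the prompt positions are masked out of the loss. This is an illustration only. The card does not say whether prompt tokens were masked during fine-tuning, and `toy_tokenize` and `build_causal_example` are hypothetical helpers, not part of the model's training code.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def toy_tokenize(text):
    """Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word."""
    return text.split()


def build_causal_example(prompt, target):
    """Concatenate prompt and target into one sequence and build matching labels.

    Assumption (not stated in the card): the loss is computed on target tokens
    only, so every prompt position is labeled with IGNORE_INDEX.
    """
    prompt_tokens = toy_tokenize(prompt)
    target_tokens = toy_tokenize(target)
    tokens = prompt_tokens + target_tokens
    labels = [IGNORE_INDEX] * len(prompt_tokens) + target_tokens
    return tokens, labels


prompt = "Task: Asm2Src. Convert assembly to C/C++:\n\nmov eax, 0\nret\n\nSource code:"
target = "int main() { return 0; }"

tokens, labels = build_causal_example(prompt, target)
masked = labels.count(IGNORE_INDEX)
print(f"{len(tokens)} tokens in one sequence, {masked} masked prompt positions")
```

At inference time only the prompt half is supplied, and the model continues the sequence with the translated code.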

Uses

Direct Use

This model is intended for research and experimental reverse-engineering tasks, including:

  • Assembly to source code (Asm2Src)
  • Source code to assembly (Src2Asm)
  • Binary to source code (Binary2Src)
  • Source code to binary (Src2Binary)
  • Binary to assembly (Binary2Asm)
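
The task names above follow a `Source2Target` pattern, which a small registry can exploit. The sketch below is a hypothetical helper (the names `TASKS` and `parse_task` are assumptions, not part of the model's code) that validates a task name against the list above and checks which directions between the three representations are covered.

```python
from itertools import permutations

# Task names exactly as listed in this card.
TASKS = ["Asm2Src", "Src2Asm", "Binary2Src", "Src2Binary", "Binary2Asm"]


def parse_task(task):
    """Split a listed task name into (source, target) representations."""
    if task not in TASKS:
        raise ValueError(f"Unknown task {task!r}; expected one of {TASKS}")
    source, target = task.split("2")
    return source, target


# Directions covered by the listed tasks.
covered = {parse_task(task) for task in TASKS}

# All ordered pairs among the three representations, minus the covered ones.
missing = [pair for pair in permutations(["Src", "Asm", "Binary"], 2) if pair not in covered]

print("covered:", sorted(covered))
print("not listed:", missing)
```

Of the six possible directions, the card lists five; assembly to binary is the one not listed.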

Downstream Use

Potential downstream uses include:

  • reverse engineering research
  • code translation experiments
  • educational use in code understanding
  • program analysis and representation learning pipelines

Citation

Jelodar, H., Bai, S., Nwankwo, T. E., Hamedi, P., Meymani, M., Razavi-Far, R., & Ghorbani, A. A. (2026). LLM4CodeRE: Generative AI for code decompilation analysis and reverse engineering. arXiv. https://doi.org/10.48550/arXiv.2604.06095

How to Get Started with the Model

Use the code below to get started with the model. The example covers the Asm2Src and Src2Asm tasks.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "JeloH/LLM4CodeRE-S2S-V1"
# Half precision on GPU; fall back to float32 on CPU, where float16 can be slow or unsupported.
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=dtype,
    device_map="auto",  # requires the `accelerate` package
)
model.eval()

def generate_output(task, input_text):
    """Build the task prompt, decode greedily, and return only the generated text."""
    if task == "Asm2Src":
        prompt = f"Task: Asm2Src. Convert assembly to C/C++:\n\n{input_text}\n\nSource code:"
    elif task == "Src2Asm":
        prompt = f"Task: Src2Asm. Convert C/C++ to assembly:\n\n{input_text}\n\nAssembly:"
    else:
        raise ValueError("Only Asm2Src and Src2Asm are supported in this example")

    # With device_map="auto", send the inputs to the device the model was placed on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False
        )

    # Decode only the newly generated tokens. Slicing the decoded string by
    # len(prompt) can misalign if decoding does not reproduce the prompt exactly.
    prompt_len = inputs["input_ids"].shape[1]
    result = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
    return result.strip()

src_code_example = """// Main function
int main() {
    return 0;
}
"""

asm_code_example = """push ebp
mov ebp, esp
mov eax, 0
pop ebp
ret
"""

for task, text in [("Src2Asm", src_code_example), ("Asm2Src", asm_code_example)]:
    print(f"\n===== {task} =====\n")
    print("INPUT:\n", text)
    print("\nOUTPUT:\n", generate_output(task, text))
```
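
Because the card metadata lists `library_name: peft`, it may also be useful to load the base model and the adapter explicitly. The following is a minimal sketch, assuming the repository stores LoRA adapter weights (rather than merged weights) and that `peft`, `transformers`, and `accelerate` are installed. The function name `load_with_peft` and its `merge` flag are hypothetical; the tokenizer is loaded from the adapter repository, as in the example above.

```python
BASE_ID = "JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI"  # base_model from the card metadata
ADAPTER_ID = "JeloH/LLM4CodeRE-S2S-V1"


def load_with_peft(base_id=BASE_ID, adapter_id=ADAPTER_ID, merge=False):
    """Load the base model, then attach the adapter on top of it.

    Assumption: adapter_id points at PEFT (LoRA) adapter weights.
    """
    # Local imports so this file can be read or imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=dtype,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    if merge:
        # Fold the LoRA weights into the base weights and drop the adapter wrappers.
        model = model.merge_and_unload()
    model.eval()

    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    return tokenizer, model


# Usage (downloads the 7B base model, so it is left commented out here):
# tokenizer, model = load_with_peft()
```

Loading the two pieces separately makes the dependency on the base checkpoint explicit and allows the adapter to be merged for inference when desired.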