	---
	base_model:
	- JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI
	library_name: peft
	---

	# Model Card for JeloH/LLM4CodeRE-S2S-V1

	LLM4CodeRE-S2S-V1 is a causal language model fine-tuned with PEFT for reverse-engineering code translation. It is prompted in a sequence-to-sequence style to translate between source code, assembly, and binary-related representations.

	## Model Details

	### Model Description

	LLM4CodeRE-S2S-V1 is a multi-task reverse engineering model built on top of `JeloH/xGenq-qwen2.5-coder-7b-instruct-OKI`. It was fine-tuned using LoRA on instruction-style code translation tasks, including assembly-to-source and source-to-assembly conversion, along with binary-related code transformation tasks.
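As background on the LoRA step mentioned above: LoRA keeps a base weight matrix `W` frozen and learns a low-rank update `B @ A`, scaled by `alpha / r`. The pure-Python sketch below illustrates that forward pass. The matrix sizes, the rank, and the `alpha` value are toy numbers chosen for illustration, not this model's actual hyperparameters, and `matvec` and `lora_forward` are hypothetical helper names.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the LoRA rank.

    W: frozen base weights, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = len(A)  # the rank is the number of rows of A
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


# Toy example (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # rank r = 1
B_zero = [[0.0], [0.0]]   # LoRA initializes B to zero, so training starts from the base model
B_trained = [[1.0], [2.0]]
x = [1.0, 2.0]

print(lora_forward(W, A, B_zero, x, alpha=2.0))
print(lora_forward(W, A, B_trained, x, alpha=2.0))
```

With `B` set to zero the adapted layer reproduces the base layer exactly, which is why a freshly initialized adapter leaves the base model's behavior unchanged.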

	The model uses a causal language modeling objective with sequence-to-sequence style prompting. Here, “S2S” refers to prompt-based input-output translation within a single causal sequence, rather than a traditional encoder-decoder architecture.
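To make the "single causal sequence" idea concrete, the sketch below shows one common way to lay out a prompt and its target for causal fine-tuning: the two are concatenated, and the prompt positions are masked out of the loss. This is an assumption for illustration only. The card does not state whether prompt tokens were masked during this model's training, and `build_training_example` and the token IDs are hypothetical.

```python
# -100 is the default ignore_index of PyTorch's cross-entropy loss,
# so positions labeled -100 do not contribute to the training loss.
IGNORE_INDEX = -100


def build_training_example(prompt_ids, target_ids, eos_id):
    """Concatenate prompt and target token IDs into one causal sequence.

    The model sees prompt + target + EOS as a single sequence, but the
    loss is computed only on the target and EOS tokens (assumed setup).
    """
    input_ids = prompt_ids + target_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids + [eos_id]
    return {"input_ids": input_ids, "labels": labels}


# Toy token IDs (illustrative only, not real tokenizer output).
example = build_training_example(prompt_ids=[11, 12, 13], target_ids=[21, 22], eos_id=0)
print(example["input_ids"])
print(example["labels"])
```

At inference time only the prompt part is supplied, and the model continues the sequence with the translation.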


	## Uses




	![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/V16QXd0EKLps7k0PdRDva.png)


	![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/NWnY_5iuqEVE4RZ5pru2B.png)


	### Direct Use

	This model is intended for research and experimental reverse-engineering tasks, including:

	- Assembly to source code (`Asm2Src`)
	- Source code to assembly (`Src2Asm`)
	- Binary to source code (`Binary2Src`)
	- Source code to binary (`Src2Binary`)
	- Binary to assembly (`Binary2Asm`)
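The task names above can be mapped to prompts with a small lookup table, sketched below. The `Asm2Src` and `Src2Asm` wording follows the getting-started example in this card. The wording for the three binary tasks is a guess that mirrors the same pattern, and this card does not specify how binary input should be encoded, so treat those entries as hypothetical. `PROMPT_TEMPLATES` and `build_prompt` are illustrative names.

```python
# (instruction, answer cue) per task.
PROMPT_TEMPLATES = {
    # Wording taken from this card's getting-started example.
    "Asm2Src": ("Convert assembly to C/C++", "Source code:"),
    "Src2Asm": ("Convert C/C++ to assembly", "Assembly:"),
    # Hypothetical wording: not confirmed by the model card.
    "Binary2Src": ("Convert binary to C/C++", "Source code:"),
    "Src2Binary": ("Convert C/C++ to binary", "Binary:"),
    "Binary2Asm": ("Convert binary to assembly", "Assembly:"),
}


def build_prompt(task, input_text):
    """Return the prompt string for a task, or raise ValueError if the task is unknown."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"Unknown task: {task!r}")
    instruction, answer_cue = PROMPT_TEMPLATES[task]
    return f"Task: {task}. {instruction}:\n\n{input_text}\n\n{answer_cue}"


print(build_prompt("Asm2Src", "mov eax, 0\nret"))
```

Keeping the templates in one table makes it easy to correct the hypothetical entries once the exact training prompts are known.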

	### Downstream Use

	Potential downstream uses include:

	- reverse engineering research
	- code translation experiments
	- educational use in code understanding
	- program analysis and representation learning pipelines


	## Results


	![image](https://cdn-uploads.huggingface.co/production/uploads/65f88233cc82fb70825a3480/1vLQ7-EzAZqvUluZF8xq9.png)

	## Citation

	Jelodar, H., Bai, S., Nwankwo, T. E., Hamedi, P., Meymani, M., Razavi-Far, R., & Ghorbani, A. A. (2026).
	LLM4CodeRE: Generative AI for code decompilation analysis and reverse engineering. arXiv. https://doi.org/10.48550/arXiv.2604.06095


	## How to Get Started with the Model

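	The example below loads the model and runs greedy decoding for the `Asm2Src` and `Src2Asm` tasks. Passing `device_map="auto"` requires the `accelerate` package, and because this repository is tagged `library_name: peft`, the `peft` package presumably needs to be installed as well.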

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "JeloH/LLM4CodeRE-S2S-V1"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # float16 is intended for GPU inference; fall back to float32 on CPU.
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map="auto",
)
model.eval()


def generate_output(task, input_text):
    """Build the task prompt, run greedy decoding, and return only the generated text."""
    if task == "Asm2Src":
        prompt = f"Task: Asm2Src. Convert assembly to C/C++:\n\n{input_text}\n\nSource code:"
    elif task == "Src2Asm":
        prompt = f"Task: Src2Asm. Convert C/C++ to assembly:\n\n{input_text}\n\nAssembly:"
    else:
        raise ValueError("Only Asm2Src and Src2Asm are supported in this example")

    # Move the inputs to the device chosen for the model by device_map="auto".
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            do_sample=False,  # greedy decoding
        )

    # Decode only the newly generated tokens. Slicing the decoded string by
    # len(prompt) can misalign if decoding does not reproduce the prompt exactly.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


src_code_example = """// Main function
int main() {
    return 0;
}
"""

asm_code_example = """push ebp
mov ebp, esp
mov eax, 0
pop ebp
ret
"""

for task, text in [("Src2Asm", src_code_example), ("Asm2Src", asm_code_example)]:
    print(f"\n===== {task} =====\n")
    print("INPUT:\n", text)
    print("\nOUTPUT:\n", generate_output(task, text))
```