---
base_model: google/gemma-3-1b-it
library_name: peft
tags:
  - gemma
  - peft
  - lora
  - classification
  - korean
  - academic-conference
  - lightweight-model
license: apache-2.0
---

# Paper Title → Academic Conference Classifier LLM (Lightweight AI Based on IITP Practical Work)

์ด ๋ชจ๋ธ์€ ๋…ผ๋ฌธ ์ œ๋ชฉ์„ ์ž…๋ ฅํ•˜๋ฉด ํ•ด๋‹น ๋…ผ๋ฌธ์ด ๋ฐœํ‘œ๋  ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ ํ•™์ˆ ๋Œ€ํšŒ๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ํ•œ๊ตญ์–ด ๊ฒฝ๋Ÿ‰ LLM์ž…๋‹ˆ๋‹ค.  
Agent AI ํ™œ์šฉ ํ™•์‚ฐ๊ณผ ๋งž๋ฌผ๋ ค, ์—ฐ๊ตฌํ˜„์žฅ์—์„œ ์ž์—ฐ์–ด ๊ธฐ๋ฐ˜์˜ ๋ถ„๋ฅ˜ ์—…๋ฌด๋ฅผ ์ž๋™ํ™”ํ•  ์ˆ˜ ์žˆ๋„๋ก ์‹ค๋ฌด ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ตฌ์ถ•ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

๋ณธ ํ”„๋กœ์ ํŠธ๋Š” ์ •๋ณดํ†ต์‹ ๊ธฐํšํ‰๊ฐ€์›(IITP)์˜ ์ •์ฑ… ์ˆ˜ํ˜œ์ž๋กœ์„œ, ์‹ค์ œ ๊ธฐ๊ด€์—์„œ ์ง๋ฉดํ•œ '๋…ผ๋ฌธ-ํ•™์ˆ ๋Œ€ํšŒ ๋ถ„๋ฅ˜' ์—…๋ฌด๋ฅผ ํšจ์œจํ™”ํ•˜๋Š” ๋ฐ ๊ธฐ์—ฌํ•˜๊ณ ์ž ๊ธฐํš๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

---

## ๐Ÿง  Model Details

- **Base Model**: `google/gemma-3-1b-it`
- **Fine-tuning method**: LoRA (PEFT)
- **Language**: Korean
- **Task**: Classification (paper title → academic conference)
- **Developed by**: 변정흠
- **Affiliation**: Test model built to support work at the Institute of Information & Communications Technology Planning & Evaluation (IITP)
- **Fine-tuned on**: National Research Foundation of Korea (NRF) conference paper review data (public CSV)
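The card does not publish the LoRA hyperparameters used. A minimal PEFT configuration sketch under assumed values (`r`, `lora_alpha`, and the Gemma attention projection module names are all assumptions, not the card's actual settings):

```python
from peft import LoraConfig

# Hypothetical hyperparameters -- the card does not state the actual values used.
lora_config = LoraConfig(
    r=8,                                  # low-rank update dimension (assumed)
    lora_alpha=16,                        # LoRA scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # Gemma attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

A config like this would be applied to `google/gemma-3-1b-it` with `peft.get_peft_model` before training.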

---

## ๐Ÿงพ Dataset

- **Source**: `한국연구재단_학술대회논문심사_20241231.csv` (NRF conference paper review data)
- **Structure**: converted to JSONL in the form `{"text": <paper title>, "label": <conference name>}`
- **Samples**: approx. 9,000 records
- **Preprocessing**: prompts generated in the format `[INST] 논문 제목: {제목} 어떤 학술대회명인가요? [/INST] {학술대회명}` (i.e. `[INST] Paper title: {title} Which conference is it? [/INST] {conference name}`)
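The CSV-to-JSONL conversion described above can be sketched in plain Python. This is a minimal sketch: the column names `논문제목` (paper title) and `학술대회명` (conference name) are assumptions, since the actual CSV headers are not documented in the card:

```python
import csv
import io
import json

# Training-prompt template from the preprocessing step described above.
TEMPLATE = "[INST] 논문 제목: {title} 어떤 학술대회명인가요? [/INST] {label}"

def rows_to_jsonl(csv_text, title_col="논문제목", label_col="학술대회명"):
    """Convert review-data CSV text into JSONL lines of {"text", "label"} records."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({"text": row[title_col].strip(), "label": row[label_col].strip()})
    return [json.dumps(r, ensure_ascii=False) for r in records]

def build_prompt(record):
    """Render one JSONL record into the instruction-tuning prompt."""
    return TEMPLATE.format(title=record["text"], label=record["label"])

sample_csv = "논문제목,학술대회명\n딥러닝 기반 한국어 음성 인식 시스템,한국음성처리학회\n"
jsonl_lines = rows_to_jsonl(sample_csv)
print(build_prompt(json.loads(jsonl_lines[0])))
```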

---

## ๐Ÿš€ Model Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Loading this adapter repository with AutoModelForCausalLM requires the `peft`
# package; transformers then fetches the base model and applies the LoRA weights.
model = AutoModelForCausalLM.from_pretrained("JeongHeum/gemma3-korean-academic-classifier")
tokenizer = AutoTokenizer.from_pretrained("JeongHeum/gemma3-korean-academic-classifier")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# The prompt follows the training template described in the Dataset section.
prompt = "[INST] 논문 제목: 딥러닝 기반 한국어 음성 인식 시스템 어떤 학술대회명인가요? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20)

# The decoded output echoes the prompt; keep only the text after [/INST].
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded.split("[/INST]")[-1].strip())
# Example output: 한국음성처리학회
```