Freakz3z and nielsr (HF Staff) committed
Commit 4d810c3 · verified · 1 parent: 10f1653

Add pipeline tag and link to paper (#1)


- Add pipeline tag and link to paper (148b75054cbcbe200c880cbd4433da7ad104d3ed)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -1,7 +1,11 @@
 ---
 base_model: Qwen/Qwen3-4B-Instruct-2507
+language:
+- en
 library_name: transformers
+license: apache-2.0
 model_name: qwen-json
+pipeline_tag: text-generation
 tags:
 - unsloth
 - trl
@@ -9,14 +13,11 @@ tags:
 - reinforcement-learning
 - json
 - recipe
-license: apache-2.0
-language:
-- en
 ---
 
 # RL-Struct: Bridging the Structure Gap
 
-[中文版本](./README_CN.md)
+[中文版本](./README_CN.md) | [📚 Paper](https://huggingface.co/papers/2512.00319)
 
 We introduce **RL-Struct**, a lightweight Reinforcement Learning framework designed to solve the "Structure Gap": the tension between probabilistic token generation and deterministic structured formats (e.g., JSON). By leveraging **GRPO (Group Relative Policy Optimization)** and a **Multi-dimensional Reward Function**, our model achieves superior structural reliability without the high inference latency of constrained decoding.
 
@@ -53,4 +54,4 @@ Do not include any other text, explanations, or markdown. Only output valid JSON
 | :--- | :---: | :---: | :---: |
 | GPT-3.5 (Zero-shot) | 45.5% | 82.1% | 88.0% |
 | LLaMA-3-8B (SFT) | 78.2% | 85.4% | 86.0% |
-| **RL-Struct (Ours)** | **89.7%** | **92.1%** | **84.5%** |
+| **RL-Struct (Ours)** | **89.7%** | **92.1%** | **84.5%** |
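The README's introduction credits RL-Struct to GRPO (Group Relative Policy Optimization) combined with a multi-dimensional reward. The sketch below shows how those two pieces typically fit together. It is not the authors' implementation: the three reward dimensions (`valid_json`, `required_keys`, `no_extra_text`), their weights, and the helper names `extract_json`, `json_reward`, and `group_relative_advantages` are hypothetical and chosen only for illustration. The advantage step follows the standard GRPO recipe: score a group of completions sampled for the same prompt, then normalize each reward by the group's mean and standard deviation, so no learned value model is needed.

```python
import json
import statistics

# HYPOTHETICAL reward dimensions and weights: the README names a
# "Multi-dimensional Reward Function", but these components are illustrative.
WEIGHTS = {"valid_json": 0.5, "required_keys": 0.3, "no_extra_text": 0.2}


def extract_json(text: str):
    """Hypothetical helper: parse the outermost {...} span, or return None."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


def json_reward(completion: str, required_keys: list[str]) -> float:
    """Score one completion as a weighted sum of structural checks."""
    text = completion.strip()
    try:
        obj = json.loads(text)  # succeeds only if the whole output is JSON
        clean = True
    except json.JSONDecodeError:
        obj = extract_json(text)  # JSON buried inside surrounding prose
        clean = False
    if not isinstance(obj, dict):
        return 0.0  # no parseable JSON object at all
    coverage = (
        sum(key in obj for key in required_keys) / len(required_keys)
        if required_keys
        else 1.0
    )
    # Reaching this point means a JSON object was parsed, so valid_json is earned.
    return (
        WEIGHTS["valid_json"]
        + WEIGHTS["required_keys"] * coverage
        + WEIGHTS["no_extra_text"] * (1.0 if clean else 0.0)
    )


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    keys = ["title", "ingredients"]
    group = [  # four completions sampled for the same (hypothetical) prompt
        '{"title": "Pancakes", "ingredients": ["flour", "milk"]}',
        'Sure! {"title": "Pancakes", "ingredients": ["flour"]}',
        '{"title": "Pancakes"}',
        "Here is your recipe: pancakes.",
    ]
    rewards = [json_reward(c, keys) for c in group]
    for reward, adv in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.2f}  advantage={adv:+.2f}")
```

Because advantages are computed relative to the group, the prose-wrapped but parseable completion still outscores the unparseable one while falling below both bare-JSON outputs. That ordering is the kind of signal that can steer a policy toward emitting only valid JSON, without paying for constrained decoding at inference time.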