Add pipeline tag and link to paper (#1)

- Add pipeline tag and link to paper (148b75054cbcbe200c880cbd4433da7ad104d3ed)

Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>
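This commit adds `pipeline_tag: text-generation` to the model card's YAML front matter and moves `license` and `language` into alphabetical position. One quick way to check the resulting metadata is the stdlib-only Python sketch below. `parse_front_matter` is a hypothetical helper that handles only the flat `key: value`, `key:` and `- item` lines this card uses. It is not a general YAML parser, and the Hub itself reads the block as YAML.

```python
# Minimal sketch: read the flat YAML front matter of a model card README.
# Assumption: only "key: value", "key:" and "- item" lines appear between the
# "---" delimiters; anything more complex needs a real YAML library.

def parse_front_matter(readme: str) -> dict:
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the front matter
            break
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # list item belonging to the most recent bare "key:" line
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # a bare "key:" opens a list; otherwise store the scalar string
            meta[current_key] = value if value else []
    return meta


# Front matter as shown on the diff's new side; a tag hidden between the
# diff hunks is not reproduced here.
UPDATED_CARD = """---
base_model: Qwen/Qwen3-4B-Instruct-2507
language:
- en
library_name: transformers
license: apache-2.0
model_name: qwen-json
pipeline_tag: text-generation
tags:
- unsloth
- trl
- reinforcement-learning
- json
- recipe
---
# RL-Struct: Bridging the Structure Gap
"""

if __name__ == "__main__":
    meta = parse_front_matter(UPDATED_CARD)
    print(meta["pipeline_tag"])  # text-generation
    print(meta["language"])      # ['en']
```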

Files changed (1)

README.md CHANGED

@@ -1,7 +1,11 @@
 ---
 base_model: Qwen/Qwen3-4B-Instruct-2507
+language:
+- en
 library_name: transformers
+license: apache-2.0
 model_name: qwen-json
+pipeline_tag: text-generation
 tags:
 - unsloth
 - trl
@@ -9,14 +13,11 @@ tags:
 - reinforcement-learning
 - json
 - recipe
-license: apache-2.0
-language:
-- en
 ---
 # RL-Struct: Bridging the Structure Gap
-[中文版本](./README_CN.md)
+[中文版本](./README_CN.md) | [📚 Paper](https://huggingface.co/papers/2512.00319)
 We introduce **RL-Struct**, a lightweight Reinforcement Learning framework designed to solve the "Structure Gap"—the tension between probabilistic token generation and deterministic structured formats (e.g., JSON). By leveraging **GRPO (Group Relative Policy Optimization)** and a **Multi-dimensional Reward Function**, our model achieves superior structural reliability without the high inference latency of constrained decoding.
@@ -53,4 +54,4 @@ Do not include any other text, explanations, or markdown. Only output valid JSON
 | :--- | :---: | :---: | :---: |
 | GPT-3.5 (Zero-shot) | 45.5% | 82.1% | 88.0% |
 | LLaMA-3-8B (SFT) | 78.2% | 85.4% | 86.0% |
-| **RL-Struct (Ours)** | **89.7%** | **92.1%** | **84.5%** |
+| **RL-Struct (Ours)** | **89.7%** | **92.1%** | **84.5%** |
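The README credits RL-Struct's structural reliability to GRPO training with a multi-dimensional reward function, but it does not list the dimensions here. The Python below is a hedged sketch of what such a reward could look like, not the authors' function. It combines two assumed dimensions, JSON syntax validity and coverage of a hypothetical `REQUIRED_KEYS` schema, using assumed weights `w_syntax` and `w_schema`. The batch wrapper `json_rewards` follows the shape TRL's `GRPOTrainer` accepts for custom rewards: it takes `completions` plus keyword arguments and returns one float per completion. It also assumes the completions are plain strings rather than chat messages.

```python
import json
from typing import Iterable, List

# Hypothetical schema for illustration only; the model card does not state
# which keys the real reward checks.
REQUIRED_KEYS = ("name", "ingredients", "steps")


def structure_reward(text: str,
                     required_keys: Iterable[str] = REQUIRED_KEYS,
                     w_syntax: float = 0.5,
                     w_schema: float = 0.5) -> float:
    """Score one completion on syntax validity plus required-key coverage."""
    try:
        parsed = json.loads(text.strip())
    except json.JSONDecodeError:
        return 0.0  # unparseable output (e.g. prose around the JSON) earns nothing
    if not isinstance(parsed, dict):
        return w_syntax  # valid JSON but not an object: no schema credit
    keys = tuple(required_keys)
    # fraction of required top-level keys that are present
    coverage = sum(k in parsed for k in keys) / len(keys) if keys else 1.0
    return w_syntax + w_schema * coverage


def json_rewards(completions: List[str], **kwargs) -> List[float]:
    # Extra keyword arguments (prompts, etc.) are accepted and ignored.
    return [structure_reward(c) for c in completions]


if __name__ == "__main__":
    print(json_rewards([
        '{"name": "pancakes", "ingredients": ["flour"], "steps": ["mix"]}',
        '{"name": "toast"}',
        'Sure! Here is the JSON: {}',
    ]))
```

Partial credit for valid JSON that misses keys keeps the reward graded rather than all-or-nothing, so completions sampled for the same prompt are more likely to receive different scores for GRPO's group-relative comparison. The card does not say whether RL-Struct grades its rewards this way.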