Improve model card: Update license and add sample usage
Browse filesThis PR improves the model card for the `noystl/mistral-e2e` model by:
- Updating the `license` metadata to `apache-2.0` for greater specificity and alignment with common open-source practices.
- Correcting the paper title in the introductory sentence to match the official title: "CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation".
- Adding a practical "Sample Usage" section, demonstrating how to use the model with the `transformers` library to perform recombination extraction, which is the model's primary intended use. This makes it easier for users to get started directly from the Hugging Face Hub.
These changes enhance the accuracy and utility of the model card.
README.md
CHANGED
|
@@ -6,12 +6,59 @@ datasets:
|
|
| 6 |
language:
|
| 7 |
- en
|
| 8 |
library_name: transformers
|
| 9 |
-
license:
|
| 10 |
pipeline_tag: text-generation
|
| 11 |
---
|
| 12 |
|
| 13 |
-
This Hugging Face repository contains a fine-tuned Mistral model trained for the task of extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Idea
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
|
| 16 |
**Bibtex**
|
| 17 |
```bibtex
|
|
@@ -24,9 +71,4 @@ The model can be used for the information extraction task of identifying recombi
|
|
| 24 |
primaryClass={cs.CL},
|
| 25 |
url={https://arxiv.org/abs/2505.20779},
|
| 26 |
}
|
| 27 |
-
```
|
| 28 |
-
|
| 29 |
-
**Quick Links**
|
| 30 |
-
- π [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
|
| 31 |
-
- π [Paper](https://arxiv.org/abs/2505.20779)
|
| 32 |
-
- π οΈ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
|
|
|
|
| 6 |
language:
|
| 7 |
- en
|
| 8 |
library_name: transformers
|
| 9 |
+
license: apache-2.0
|
| 10 |
pipeline_tag: text-generation
|
| 11 |
---
|
| 12 |
|
| 13 |
+
This Hugging Face repository contains a fine-tuned Mistral model trained for the task of extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation](https://huggingface.co/papers/2505.20779). The model utilizes a LoRA adapter on top of a Mistral base model.
|
| 14 |
+
|
| 15 |
+
The model can be used for the information extraction task of identifying recombination examples within scientific text.
|
| 16 |
+
|
| 17 |
+
**Quick Links**
|
| 18 |
+
- π [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
|
| 19 |
+
- π [Paper](https://arxiv.org/abs/2505.20779)
|
| 20 |
+
- π οΈ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
|
| 21 |
+
|
| 22 |
+
## Sample Usage
|
| 23 |
+
|
| 24 |
+
You can use this model with the Hugging Face `transformers` library to extract recombination instances from text. The model expects a specific prompt format for this task.
|
| 25 |
+
|
| 26 |
+
```python
|
| 27 |
+
from transformers import pipeline, AutoTokenizer
|
| 28 |
+
import torch
|
| 29 |
+
|
| 30 |
+
model_id = "noystl/mistral-e2e"
|
| 31 |
+
|
| 32 |
+
tokenizer = AutoTokenizer.from_pretrained(model_id)
|
| 33 |
+
|
| 34 |
+
# Initialize the text generation pipeline
|
| 35 |
+
generator = pipeline(
|
| 36 |
+
"text-generation",
|
| 37 |
+
model=model_id,
|
| 38 |
+
tokenizer=tokenizer,
|
| 39 |
+
torch_dtype=torch.bfloat16, # Use bfloat16 for better performance on compatible GPUs
|
| 40 |
+
device_map="auto", # Automatically select best device (GPU or CPU)
|
| 41 |
+
trust_remote_code=True # Required for custom model components
|
| 42 |
+
)
|
| 43 |
+
|
| 44 |
+
# Example abstract for recombination extraction
|
| 45 |
+
abstract = """The multi-granular diagnostic approach of pathologists can inspire Histopathological image classification.
|
| 46 |
+
This suggests a novel way to improve accuracy in image classification tasks."""
|
| 47 |
+
|
| 48 |
+
# Format the input prompt as expected by the model
|
| 49 |
+
prompt = f"Extract any recombination instances (inspiration/combination) from the following abstract:\
|
| 50 |
+
Abstract: {abstract}\
|
| 51 |
+
Recombination:"
|
| 52 |
+
|
| 53 |
+
# Generate the output. Use do_sample=False for deterministic extraction.
|
| 54 |
+
# max_new_tokens should be set appropriately for the expected JSON output.
|
| 55 |
+
outputs = generator(prompt, max_new_tokens=200, do_sample=False)
|
| 56 |
+
|
| 57 |
+
# Print the generated text, which should contain the extracted recombination in JSON format
|
| 58 |
+
print(outputs[0]["generated_text"])
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
For more advanced usage, including training and evaluation, please refer to the [GitHub repository](https://github.com/noy-sternlicht/CHIMERA-KB).
|
| 62 |
|
| 63 |
**Bibtex**
|
| 64 |
```bibtex
|
|
|
|
| 71 |
primaryClass={cs.CL},
|
| 72 |
url={https://arxiv.org/abs/2505.20779},
|
| 73 |
}
|
| 74 |
+
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|