Text Generation
Transformers
English
nielsr (HF Staff) committed
Commit 9fdcb57 · verified · 1 parent: 7e3eb77

Improve model card: Update license and add sample usage

This PR improves the model card for the `noystl/mistral-e2e` model by:

- Updating the `license` metadata to `apache-2.0` for greater specificity and alignment with common open-source practices.
- Correcting the paper title in the introductory sentence to match the official title: "CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation".
- Adding a practical "Sample Usage" section, demonstrating how to use the model with the `transformers` library to perform recombination extraction, which is the model's primary intended use. This makes it easier for users to get started directly from the Hugging Face Hub.

These changes enhance the accuracy and utility of the model card.

Files changed (1): README.md (+51 −9)
README.md CHANGED

@@ -6,12 +6,59 @@ datasets:
  language:
  - en
  library_name: transformers
- license: cc
+ license: apache-2.0
  pipeline_tag: text-generation
  ---

- This Hugging Face repository contains a fine-tuned Mistral model trained for the task of extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Idea Recombination in Scientific Literature](https://huggingface.co/papers/2505.20779). The model utilizes a LoRA adapter on top of a Mistral base model.
- The model can be used for the information extraction task of identifying recombination examples within scientific text. For detailed usage instructions and reproduction of results, please refer to the Github repository linked above.
+ This Hugging Face repository contains a fine-tuned Mistral model trained for the task of extracting recombination examples from scientific abstracts, as described in the paper [CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation](https://huggingface.co/papers/2505.20779). The model utilizes a LoRA adapter on top of a Mistral base model.
+
+ The model can be used for the information extraction task of identifying recombination examples within scientific text.
+
+ **Quick Links**
+ - 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
+ - 📃 [Paper](https://arxiv.org/abs/2505.20779)
+ - 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
+
+ ## Sample Usage
+
+ You can use this model with the Hugging Face `transformers` library to extract recombination instances from text. The model expects a specific prompt format for this task.
+
+ ```python
+ from transformers import pipeline, AutoTokenizer
+ import torch
+
+ model_id = "noystl/mistral-e2e"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Initialize the text generation pipeline
+ generator = pipeline(
+     "text-generation",
+     model=model_id,
+     tokenizer=tokenizer,
+     torch_dtype=torch.bfloat16,  # Use bfloat16 for better performance on compatible GPUs
+     device_map="auto",           # Automatically select the best device (GPU or CPU)
+     trust_remote_code=True       # Required for custom model components
+ )
+
+ # Example abstract for recombination extraction
+ abstract = """The multi-granular diagnostic approach of pathologists can inspire Histopathological image classification.
+ This suggests a novel way to improve accuracy in image classification tasks."""
+
+ # Format the input prompt as expected by the model
+ prompt = f"Extract any recombination instances (inspiration/combination) from the following abstract:\nAbstract: {abstract}\nRecombination:"
+
+ # Generate the output. Use do_sample=False for deterministic extraction.
+ # max_new_tokens should be set appropriately for the expected JSON output.
+ outputs = generator(prompt, max_new_tokens=200, do_sample=False)
+
+ # Print the generated text, which should contain the extracted recombination in JSON format
+ print(outputs[0]["generated_text"])
+ ```
+
+ For more advanced usage, including training and evaluation, please refer to the [GitHub repository](https://github.com/noy-sternlicht/CHIMERA-KB).

  **Bibtex**
  ```bibtex
@@ -24,9 +71,4 @@ The model can be used for the information extraction task of identifying recombi
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.20779},
  }
  ```
-
- **Quick Links**
- - 🌐 [Project](https://noy-sternlicht.github.io/CHIMERA-Web)
- - 📃 [Paper](https://arxiv.org/abs/2505.20779)
- - 🛠️ [Code](https://github.com/noy-sternlicht/CHIMERA-KB)
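
As a small, testable convenience, the inline prompt string from the new Sample Usage section can be factored into a helper. This is a sketch based only on the prompt format shown in the diff; `build_recombination_prompt` is a hypothetical name, not part of the repository or the `transformers` API:

```python
def build_recombination_prompt(abstract: str) -> str:
    """Build the extraction prompt in the format used by the model card's sample usage.

    Note: this helper is illustrative; the format is taken from the prompt
    string in the Sample Usage section above.
    """
    return (
        "Extract any recombination instances (inspiration/combination) "
        "from the following abstract:\n"
        f"Abstract: {abstract}\n"
        "Recombination:"
    )

# The prompt ends with "Recombination:" so the model continues with the extraction
print(build_recombination_prompt("An example abstract."))
```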