findthehead committed on
Commit 38fe695 · 1 Parent(s): e8ea541

initial commit

Files changed (1)
  1. README.md +27 -125
README.md CHANGED
@@ -17,13 +17,8 @@ CVEParrot is a Google T5 model fine-tuned on CVE (Common Vulnerabilities and Exp
 
 ## Model Description
 
- - **Developed by:** findthehead
- - **Base Model:** Google T5
- - **Training Data:** CVE Database
- - **Language:** English
- - **License:** Apache 2.0
-
- This model has been specifically trained to understand and generate content related to cybersecurity vulnerabilities, CVE descriptions, and security intelligence.
 
 ## Use Cases
 
@@ -33,86 +28,39 @@ This model has been specifically trained to understand and generate content rela
 - Automated vulnerability documentation
 - CVE information extraction and summarization
 
- ## How to Use
-
- ### Option 1: Using Hugging Face Transformers (Safetensors)
-
- Install the required dependencies:
-
- ```bash
- pip install transformers torch
- ```
-
- **Inference Code:**
 
 ```python
- from transformers import T5Tokenizer, T5ForConditionalGeneration
-
- # Load model and tokenizer
 model_name = "Prachir-AI/cveparrot"
- tokenizer = T5Tokenizer.from_pretrained(model_name)
- model = T5ForConditionalGeneration.from_pretrained(model_name)
-
- # Prepare input
- input_text = "Describe CVE-2024-1234"
- input_ids = tokenizer(input_text, return_tensors="pt").input_ids
-
- # Generate output
- outputs = model.generate(
-     input_ids,
-     max_length=512,
-     num_beams=4,
-     early_stopping=True,
-     temperature=0.7,
-     do_sample=True
- )
-
- # Decode and print result
- generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(generated_text)
- ```
-
- **Advanced Usage with Custom Parameters:**
-
- ```python
- from transformers import T5Tokenizer, T5ForConditionalGeneration
-
- # Load model and tokenizer
- model_name = "Prachir-AI/cveparrot"
- tokenizer = T5Tokenizer.from_pretrained(model_name)
- model = T5ForConditionalGeneration.from_pretrained(model_name)
-
- # Move to GPU if available
- import torch
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = model.to(device)
-
- # Example prompts
- prompts = [
-     "Explain the security vulnerability:",
-     "Describe the CVE:",
-     "What is the impact of:",
- ]
-
- input_text = prompts[0] + " CVE-2024-1234"
- input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
-
- # Generate with custom parameters
- outputs = model.generate(
-     input_ids,
-     max_length=256,
-     min_length=50,
-     num_beams=5,
-     no_repeat_ngram_size=2,
-     early_stopping=True,
-     temperature=0.8,
-     top_k=50,
-     top_p=0.95,
-     do_sample=True
- )
-
- generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(generated_text)
 ```
 
  ### Option 2: Using GGUF Model with Ollama (Local Inference)
@@ -208,50 +156,4 @@ This model was fine-tuned on CVE database entries to understand and generate sec
 - CVE descriptions and technical details
 - Vulnerability severity and impact analysis
 - Security patches and mitigation strategies
- - Affected software and version information
-
- ## Limitations
-
- - The model is trained on historical CVE data and may not have information about very recent vulnerabilities
- - Generated content should be verified against official CVE databases
- - The model may occasionally generate plausible but incorrect security information
- - Not a replacement for professional security analysis
-
- ## Ethical Considerations
-
- This model is designed for:
- - ✅ Security research and education
- - ✅ Vulnerability analysis and documentation
- - ✅ Automated security intelligence gathering
- - ✅ Assisting security professionals
-
- This model should NOT be used for:
- - ❌ Creating or exploiting vulnerabilities
- - ❌ Malicious hacking activities
- - ❌ Unauthorized security testing
-
- ## Citation
-
- If you use this model in your research or applications, please cite:
-
- ```bibtex
- @model{cveparrot2024,
-   author = {findthehead},
-   title = {CVEParrot: A T5 Model for CVE Analysis},
-   year = {2024},
-   publisher = {HuggingFace},
-   url = {https://huggingface.co/Prachir-AI/cveparrot}
- }
- ```
-
- ## Developer
-
- - **HuggingFace:** [findthehead](https://huggingface.co/findthehead)
-
- ## Feedback and Contributions
-
- For issues, questions, or contributions, please visit the model repository on HuggingFace.
-
- ## License
-
- This model is released under the Apache 2.0 License. See LICENSE file for details.
 
 
 ## Model Description
 
+ - **Developed by:** [Subhay Roy Chowdhury (findthehead)](https://huggingface.co/findthehead)
+ - **Base Model:** Google T5 Small
 
 ## Use Cases
 
 - Automated vulnerability documentation
 - CVE information extraction and summarization
 
+ ## Inference Code
 
 ```python
+ import warnings
+ import os
+ os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"
+ os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
+ os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
+ warnings.filterwarnings("ignore")
+
+ import torch
+ from transformers import AutoTokenizer, T5ForConditionalGeneration
+
 model_name = "Prachir-AI/cveparrot"
+ tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
+ model = T5ForConditionalGeneration.from_pretrained(model_name)
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model.to(device)
+
+ prompt = "Provide detailed information about CVE-2021-3184."
+ inputs = tokenizer(prompt, return_tensors="pt").to(device)
+
+ with torch.no_grad():
+     output_ids = model.generate(
+         **inputs,
+         max_new_tokens=128,
+         temperature=1.0,
+         do_sample=True,
+     )
+
+ response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+ print(response)
 ```
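The model may generate plausible but incorrect details, so its output is best cross-checked against the official NVD record. A minimal, stdlib-only sketch of such a check — the helper names here are ours, not part of this repository; the endpoint and `cveId` parameter come from the public NVD CVE API 2.0:

```python
import json
from urllib.request import urlopen

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def nvd_query_url(cve_id: str) -> str:
    # Build the NVD CVE API 2.0 query URL for a single CVE ID.
    return f"{NVD_API}?cveId={cve_id}"

def fetch_official_description(cve_id: str) -> str:
    # Fetch the official NVD record and return its English description, if any.
    with urlopen(nvd_query_url(cve_id), timeout=30) as resp:
        data = json.load(resp)
    for vuln in data.get("vulnerabilities", []):
        for desc in vuln["cve"]["descriptions"]:
            if desc["lang"] == "en":
                return desc["value"]
    return ""

# Example (requires network access):
# print(fetch_official_description("CVE-2021-3184"))
```

Comparing the model's `response` against this description is a quick sanity check before relying on the generated text.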
 
  ### Option 2: Using GGUF Model with Ollama (Local Inference)
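
The body of this section is truncated in this diff view. As a hedged sketch only — the GGUF filename `cveparrot.gguf` and the model tag `cveparrot` are assumptions, not taken from the repository — an Ollama Modelfile could look like:

```
# Modelfile (hypothetical): point FROM at the repository's actual GGUF artifact
FROM ./cveparrot.gguf
PARAMETER temperature 0.7
```

It would then be registered and run with `ollama create cveparrot -f Modelfile` followed by `ollama run cveparrot "Provide detailed information about CVE-2021-3184."`.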
 
 - CVE descriptions and technical details
 - Vulnerability severity and impact analysis
 - Security patches and mitigation strategies
+ - Affected software and version information