Commit fb5d4e1 (verified) by YongganFu
1 Parent(s): c50ca7b

Update README.md

Files changed (1)
  1. README.md +35 -0
README.md CHANGED
@@ -97,6 +97,41 @@ setattr(config, "attention_implementation_new", "flash_attention_2")
  model = AutoModelForCausalLM.from_pretrained(repo_name, config=config, torch_dtype=torch.bfloat16, trust_remote_code=True)
  ```
 
+ ## Running Nemotron-Flash with TensorRT-LLM
+
+ ### Setup
+ Install TensorRT-LLM by following the <a href="https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html">quick start guide</a>, which covers installation and first steps.
+
+ ### Quick example
+
+ The following example script runs the full generation workflow:
+ ```
+ cd examples/auto_deploy
+ python build_and_run_ad.py --model nvidia/Nemotron-Flash-3B-Instruct --args.yaml-extra nemotron_flash.yaml
+ ```
+
+ ### Serving with trtllm-serve
+
+ - Start a `trtllm-serve` server (see the <a href="https://nvidia.github.io/TensorRT-LLM/commands/trtllm-serve/trtllm-serve.html#starting-a-server">trtllm-serve documentation</a> for details):
+ ```
+ trtllm-serve serve nvidia/Nemotron-Flash-3B-Instruct \
+ --backend _autodeploy \
+ --trust_remote_code \
+ --extra_llm_api_options examples/auto_deploy/nemotron_flash.yaml
+ ```
+
+ - Send a request (see the <a href="https://nvidia.github.io/TensorRT-LLM/examples/curl_chat_client.html">curl chat client example</a> for details):
+ ```
+ curl http://localhost:8000/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "nvidia/Nemotron-Flash-3B-Instruct",
+ "messages":[{"role": "user", "content": "Where is New York?"}],
+ "max_tokens": 16,
+ "temperature": 0
+ }'
+ ```
+
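The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library, not part of the committed README. It assumes the server started with `trtllm-serve` is reachable at `http://localhost:8000` and that the reply follows the OpenAI chat-completions schema; the helper names `build_chat_request` and `send_chat_request` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

# Assumption: the trtllm-serve server listens on localhost:8000,
# matching the URL used in the curl example.
BASE_URL = "http://localhost:8000"


def build_chat_request(prompt, model="nvidia/Nemotron-Flash-3B-Instruct",
                       max_tokens=16, temperature=0):
    """Build the same JSON body that the curl example sends (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def send_chat_request(payload, base_url=BASE_URL, timeout=60):
    """POST the payload to /v1/chat/completions and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
# reply = send_chat_request(build_chat_request("Where is New York?"))
# # Assumption: the reply uses the OpenAI chat-completions layout.
# print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request body without a running server.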
  ## Citation
  ```
  @misc{fu2025nemotronflash,