Instructions to use tiiuae/falcon-40b-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use tiiuae/falcon-40b-instruct with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tiiuae/falcon-40b-instruct", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b-instruct", trust_remote_code=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use tiiuae/falcon-40b-instruct with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "tiiuae/falcon-40b-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-40b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/tiiuae/falcon-40b-instruct
- SGLang
How to use tiiuae/falcon-40b-instruct with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "tiiuae/falcon-40b-instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-40b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "tiiuae/falcon-40b-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-40b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use tiiuae/falcon-40b-instruct with Docker Model Runner:
docker model run hf.co/tiiuae/falcon-40b-instruct
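The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library, assuming a vLLM or SGLang server is already running locally as shown above. The helper names `build_completion_request` and `complete` are hypothetical, and the `choices[0]["text"]` lookup assumes the server returns the usual OpenAI-style completions response.

```python
import json
import urllib.request

MODEL_ID = "tiiuae/falcon-40b-instruct"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (without sending) a POST request for the /v1/completions endpoint.

    The payload mirrors the JSON body of the curl examples above.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Port 8000 matches the vLLM example; pass
    # base_url="http://localhost:30000" for the SGLang examples.
    print(complete("Once upon a time,"))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.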
Custom 4-bit Finetuning 5-7 times faster inference than QLora
pinned
1
#9 opened almost 3 years ago
by
rmihaylov
Install & run tiiuae/falcon-40b-instruct easily using llmpm
#95 opened 3 months ago
by
sarthak-saxena
Update README.md
#94 opened 4 months ago
by
cherry0328
I'm not fine with quality of this model compared to the size in billions of parameters.
#93 opened about 1 year ago
by
JLouisBiz
Update tokenizer_config.json
#92 opened over 1 year ago
by
Snanni
Update tokenizer_config.json
1
#91 opened over 1 year ago
by
Maryammmmmm
falcon-40b-instruct error on Inference endpoint while deploying
#90 opened over 1 year ago
by
digitalsanjeev
AI World
#89 opened about 2 years ago
by
MohammadMuzamil
Adding `safetensors` variant of this model
#88 opened over 2 years ago
by
Dennison33
combining falcon 40b instruct with langchain
#87 opened over 2 years ago
by
rra21
Update generation_config.json
1
#85 opened over 2 years ago
by
nkasmanoff
Update generation_config.json
1
#84 opened over 2 years ago
by
nkasmanoff
Getting gibberish output with Falcon-40b instruct
2
#83 opened over 2 years ago
by
harsh244
Falcon 40B Inference on GKE Autopilot A100 40GB
3
#82 opened over 2 years ago
by
bshongwe
Adding `safetensors` variant of this model
#81 opened over 2 years ago
by
Flolight
Adding `safetensors` variant of this model
2
#80 opened over 2 years ago
by
Flolight
CPU or GPU
1
#76 opened almost 3 years ago
by
lalit34
Optimizing Inference Time for Chat Conversations on Falcon
#73 opened almost 3 years ago
by
humza-sami
Use input attention mask instead of causal mask in attention
#72 opened almost 3 years ago
by
CyberZHG
is there a way to not use trust_remote = True
#71 opened almost 3 years ago
by
momentumhd
Unable to load and run finetuned falcon model
#70 opened almost 3 years ago
by
DioulaD
Parameters contains nan numbers when loading model locally
#69 opened almost 3 years ago
by
yunsxie
ValueError: sharded is not supported for AutoModel ERROR
8
#68 opened almost 3 years ago
by
peyers
ValueError in KoboldAI when loading the model
1
#66 opened almost 3 years ago
by
JermemyHaschal
Cannot set "instructions" when invoking inference endpoint
1
#65 opened almost 3 years ago
by
aruana
Changes in modelling_RW.py to be able to handle past_key_values for faster model generations
#64 opened almost 3 years ago
by
puru22
Model sometimes generates '</s>'
1
#63 opened almost 3 years ago
by
jlzhou
Correct blogpost link
#62 opened almost 3 years ago
by
isydmr
Error: ShardCannotStart
#61 opened almost 3 years ago
by
Bhupesh2003
Finetuning Falcon-40B-Instruct For ChatBot Use Case
1
#59 opened almost 3 years ago
by
sdkramer10
Adding `safetensors` variant of this model
2
#58 opened almost 3 years ago
by
nth-attempt
Add `tokenizer_class` to get `pipeline` to load tokenizer
#57 opened almost 3 years ago
by
chiragjn
Adding `safetensors` variant of this model
#56 opened almost 3 years ago
by
shayan
ValueError: Error raised by inference API: Model tiiuae/falcon-40b-instruct time out using HuggingFaceHub
1
#55 opened almost 3 years ago
by
nicoleds
Question about Apache 2.0 license
2
#54 opened almost 3 years ago
by
psinger
Running the Falcon-40B-Instruct model on Azure Kubernetes Service
#53 opened almost 3 years ago
by
zioproto
Experimental ggml demos
2
#52 opened almost 3 years ago
by
matthoffner
Truncated output from API call through langchain
4
#51 opened almost 3 years ago
by deleted
Experiences with complex instructions
1
#50 opened almost 3 years ago
by
Tuana
Update README.md
#49 opened almost 3 years ago
by
saattrupdan
Why Rotary Positional Embeddings Over Alibi?
#48 opened almost 3 years ago
by
mallorbc
About Input validation error: `inputs` tokens + `max_new_tokens` must be <= 1512.
3
#47 opened almost 3 years ago
by
Holynull
is Alibi version available for fine tuning to a large context window?
3
#46 opened almost 3 years ago
by
run
Finetune Falcon-4b with large token size.
2
#44 opened almost 3 years ago
by
amnasher
Model returns entire input prompt together with output
11
#43 opened almost 3 years ago
by
andee96
Instruction prompt
3
#42 opened almost 3 years ago
by
mazzaqq
Update README.md
#41 opened almost 3 years ago
by
zagg8705
Arabic Language support
2
#40 opened almost 3 years ago
by
Hgdawy
Request: DOI
#39 opened almost 3 years ago
by
ongkn
what is the input token length of Falcon-40B and -7B models?
3
#38 opened almost 3 years ago
by
sermolin