Instructions to use projecte-aina/aguila-7b with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use projecte-aina/aguila-7b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="projecte-aina/aguila-7b", trust_remote_code=True)
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("projecte-aina/aguila-7b", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use projecte-aina/aguila-7b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "projecte-aina/aguila-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "projecte-aina/aguila-7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/projecte-aina/aguila-7b
- SGLang
How to use projecte-aina/aguila-7b with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "projecte-aina/aguila-7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "projecte-aina/aguila-7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "projecte-aina/aguila-7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "projecte-aina/aguila-7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Docker Model Runner
How to use projecte-aina/aguila-7b with Docker Model Runner:
docker model run hf.co/projecte-aina/aguila-7b
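The vLLM and SGLang servers started above both expose an OpenAI-compatible `/v1/completions` route, so the curl calls can also be made from Python. The following is a minimal standard-library sketch, assuming a server is already running on the port shown above (8000 for vLLM, 30000 for SGLang); the helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "projecte-aina/aguila-7b"


def build_completion_request(base_url: str, prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build the same POST request as the curl examples (OpenAI-compatible API)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, prompt: str, **kwargs) -> str:
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

For example, `complete("http://localhost:8000", "Once upon a time,")` targets the vLLM server, and `complete("http://localhost:30000", "Once upon a time,")` targets SGLang.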
Error deploying Aguila on AWS SageMaker
Hello!
I'm trying to deploy this model on AWS SageMaker by following the steps provided in the documentation. However, I'm encountering some errors during the endpoint creation process. I've double-checked my configurations, but the issues persist.
If anyone has experience deploying this model on AWS SageMaker or any insights into resolving similar errors, I'd greatly appreciate your help. Thanks in advance for any assistance you can offer!
Errors:
Error: DownloadError
    utils.convert_files(local_pt_files, local_st_files)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
    convert_file(pt_file, sf_file)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 62, in convert_file
    save_file(pt_state, str(sf_file), metadata={
        "format": "pt"
    })
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 394, in _flatten
    raise RuntimeError(
And the following error:
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.h.6.mlp.dense_4h_to_h.weight', 'transformer.h.26.mlp.dense_4h_to_h.weight', 'transformer.h.4.self_attention.query_key_value.weight', 'transformer.h.22.mlp.dense_h_to_4h.weight', 'transformer.h.23.mlp.dense_h_to_4h.weight', 'transformer.h.5.mlp.dense_h_to_4h.weight', 'transformer.h.25.mlp.dense_h_to_4h.weight', 'transformer.h.25.mlp.dense_4h_to_h.weight', 'transformer.h.0.mlp.dense_4h_to_h.weight', 'transformer.h.11.mlp.dense_h_to_4h.weight', 'transformer.h.29.self_attention.dense.weight', 'transformer.h.24.self_attention.query_key_value.weight', 'transformer.h.24.mlp.dense_h_to_4h.weight', 'transformer.h.14.mlp.dense_4h_to_h.weight', 'transformer.h.1.self_attention.dense.weight', 'transformer.h.13.mlp.dense_4h_to_h.weight', 'transformer.h.8.self_attention.query_key_value.weight', 'transformer.h.20.self_attention.query_key_value.weight', 'transformer.h.27.mlp.dense_h_to_4h.weight', 'transformer.h.22.self_attention.query_key_value.weight', 'transformer.h.11.self_attention.query_key_value.weight', 'transformer.h.23.self_attention.query_key_value.weight', 'transformer.h.13.self_attention.dense.weight', 'transformer.h.15.mlp.dense_h_to_4h.weight', 'transformer.h.9.mlp.dense_h_to_4h.weight', 'transformer.h.15.self_attention.query_key_value.weight', 'transformer.h.24.mlp.dense_4h_to_h.weight', 'transformer.h.31.self_attention.query_key_value.weight', 'transformer.h.7.self_attention.dense.weight', 'transformer.h.27.self_attention.query_key_value.weight', 'transformer.h.1.mlp.dense_h_to_4h.weight', 'transformer.h.21.mlp.dense_4h_to_h.weight', 'transformer.h.24.self_attention.dense.weight', 'transformer.h.16.mlp.dense_4h_to_h.weight', 'transformer.h.20.mlp.dense_4h_to_h.weight', 'transformer.h.27.self_attention.dense.weight', 'transformer.h.4.mlp.dense_4h_to_h.weight', 'transformer.h.3.mlp.dense_h_to_4h.weight', 
'transformer.h.25.self_attention.dense.weight', 'transformer.h.7.mlp.dense_4h_to_h.weight', 'transformer.h.17.self_attention.query_key_value.weight', 'transformer.h.19.self_attention.dense.weight', 'transformer.h.12.self_attention.query_key_value.weight', 'transformer.h.3.self_attention.dense.weight', 'transformer.h.28.mlp.dense_h_to_4h.weight', 'transformer.h.19.mlp.dense_h_to_4h.weight', 'transformer.h.20.self_attention.dense.weight', 'transformer.h.14.self_attention.query_key_value.weight', 'transformer.h.21.mlp.dense_h_to_4h.weight', 'transformer.h.12.mlp.dense_h_to_4h.weight', 'transformer.h.29.mlp.dense_h_to_4h.weight', 'transformer.h.6.mlp.dense_h_to_4h.weight', 'transformer.h.14.mlp.dense_h_to_4h.weight', 'transformer.h.30.self_attention.dense.weight', 'transformer.h.10.self_attention.query_key_value.weight', 'transformer.h.6.self_attention.query_key_value.weight', 'transformer.h.10.mlp.dense_4h_to_h.weight', 'transformer.h.23.mlp.dense_4h_to_h.weight', 'transformer.h.21.self_attention.query_key_value.weight', 'transformer.h.30.self_attention.query_key_value.weight', 'transformer.h.8.mlp.dense_h_to_4h.weight', 'transformer.h.30.mlp.dense_h_to_4h.weight', 'transformer.h.18.self_attention.query_key_value.weight', 'transformer.h.5.mlp.dense_4h_to_h.weight', 'transformer.h.15.mlp.dense_4h_to_h.weight', 'transformer.h.26.self_attention.dense.weight', 'transformer.h.9.self_attention.query_key_value.weight', 'transformer.h.17.mlp.dense_4h_to_h.weight', 'transformer.h.10.mlp.dense_h_to_4h.weight', 'transformer.h.6.self_attention.dense.weight', 'transformer.h.2.mlp.dense_4h_to_h.weight', 'transformer.h.5.self_attention.dense.weight', 'transformer.h.9.mlp.dense_4h_to_h.weight', 'transformer.h.3.mlp.dense_4h_to_h.weight', 'transformer.h.17.mlp.dense_h_to_4h.weight', 'transformer.h.27.mlp.dense_4h_to_h.weight', 'transformer.h.29.self_attention.query_key_value.weight', 'transformer.h.5.self_attention.query_key_value.weight', 
'transformer.h.11.self_attention.dense.weight', 'transformer.h.19.mlp.dense_4h_to_h.weight', 'transformer.h.16.mlp.dense_h_to_4h.weight', 'transformer.h.8.mlp.dense_4h_to_h.weight', 'transformer.h.30.mlp.dense_4h_to_h.weight', 'transformer.h.31.mlp.dense_h_to_4h.weight', 'transformer.h.1.mlp.dense_4h_to_h.weight', 'transformer.h.28.self_attention.dense.weight', 'transformer.h.22.mlp.dense_4h_to_h.weight', 'transformer.h.31.self_attention.dense.weight', 'transformer.h.4.mlp.dense_h_to_4h.weight', 'transformer.h.19.self_attention.query_key_value.weight', 'transformer.h.0.self_attention.dense.weight', 'transformer.h.1.self_attention.query_key_value.weight', 'transformer.h.17.self_attention.dense.weight', 'transformer.h.18.self_attention.dense.weight', 'transformer.h.23.self_attention.dense.weight', 'transformer.h.28.self_attention.query_key_value.weight', 'transformer.h.12.mlp.dense_4h_to_h.weight', 'transformer.h.16.self_attention.query_key_value.weight', 'transformer.h.22.self_attention.dense.weight', 'transformer.h.18.mlp.dense_4h_to_h.weight', 'transformer.h.2.self_attention.query_key_value.weight', 'transformer.h.18.mlp.dense_h_to_4h.weight', 'transformer.h.8.self_attention.dense.weight', 'transformer.h.12.self_attention.dense.weight', 'transformer.h.29.mlp.dense_4h_to_h.weight', 'transformer.h.10.self_attention.dense.weight', 'transformer.h.26.mlp.dense_h_to_4h.weight', 'transformer.h.31.mlp.dense_4h_to_h.weight', 'transformer.h.3.self_attention.query_key_value.weight', 'transformer.h.16.self_attention.dense.weight', 'transformer.h.9.self_attention.dense.weight', 'transformer.h.21.self_attention.dense.weight', 'transformer.h.0.self_attention.query_key_value.weight', 'transformer.h.28.mlp.dense_4h_to_h.weight', 'transformer.word_embeddings.weight', 'transformer.h.0.mlp.dense_h_to_4h.weight', 'transformer.h.4.self_attention.dense.weight', 'transformer.h.13.self_attention.query_key_value.weight', 'transformer.h.7.mlp.dense_h_to_4h.weight', 
'transformer.h.2.mlp.dense_h_to_4h.weight', 'transformer.h.7.self_attention.query_key_value.weight', 'transformer.h.11.mlp.dense_4h_to_h.weight', 'transformer.h.2.self_attention.dense.weight', 'transformer.h.13.mlp.dense_h_to_4h.weight', 'transformer.h.14.self_attention.dense.weight', 'transformer.h.15.self_attention.dense.weight', 'transformer.h.25.self_attention.query_key_value.weight', 'transformer.h.26.self_attention.query_key_value.weight', 'transformer.h.20.mlp.dense_h_to_4h.weight'}, {'transformer.h.22.input_layernorm.bias', 'transformer.h.17.input_layernorm.bias', 'transformer.h.20.input_layernorm.weight', 'transformer.h.20.input_layernorm.bias', 'transformer.h.1.input_layernorm.bias', 'transformer.h.18.input_layernorm.bias', 'transformer.h.28.input_layernorm.bias', 'transformer.h.7.input_layernorm.bias', 'transformer.h.5.input_layernorm.weight', 'transformer.h.8.input_layernorm.weight', 'transformer.h.0.input_layernorm.weight', 'transformer.h.9.input_layernorm.bias', 'transformer.h.12.input_layernorm.weight', 'transformer.h.19.input_layernorm.weight', 'transformer.h.30.input_layernorm.bias', 'transformer.h.31.input_layernorm.weight', 'transformer.h.6.input_layernorm.bias', 'transformer.h.7.input_layernorm.weight', 'transformer.h.6.input_layernorm.weight', 'transformer.ln_f.weight', 'transformer.h.5.input_layernorm.bias', 'transformer.h.13.input_layernorm.weight', 'transformer.h.13.input_layernorm.bias', 'transformer.h.30.input_layernorm.weight', 'transformer.h.19.input_layernorm.bias', 'transformer.h.18.input_layernorm.weight', 'transformer.h.16.input_layernorm.bias', 'transformer.h.27.input_layernorm.bias', 'transformer.h.21.input_layernorm.weight', 'transformer.h.14.input_layernorm.weight', 'transformer.h.16.input_layernorm.weight', 'transformer.h.10.input_layernorm.bias', 'transformer.h.25.input_layernorm.bias', 'transformer.h.23.input_layernorm.bias', 'transformer.h.29.input_layernorm.weight', 'transformer.h.11.input_layernorm.weight', 
'transformer.h.3.input_layernorm.bias', 'transformer.ln_f.bias', 'transformer.h.22.input_layernorm.weight', 'transformer.h.28.input_layernorm.weight', 'transformer.h.0.input_layernorm.bias', 'transformer.h.1.input_layernorm.weight', 'transformer.h.14.input_layernorm.bias', 'transformer.h.24.input_layernorm.bias', 'transformer.h.8.input_layernorm.bias', 'transformer.h.21.input_layernorm.bias', 'transformer.h.10.input_layernorm.weight', 'transformer.h.12.input_layernorm.bias', 'transformer.h.27.input_layernorm.weight', 'transformer.h.31.input_layernorm.bias', 'transformer.h.11.input_layernorm.bias', 'transformer.h.23.input_layernorm.weight', 'transformer.h.26.input_layernorm.bias', 'transformer.h.29.input_layernorm.bias', 'transformer.h.15.input_layernorm.bias', 'transformer.h.2.input_layernorm.weight', 'transformer.h.2.input_layernorm.bias', 'transformer.h.24.input_layernorm.weight', 'transformer.h.4.input_layernorm.weight', 'transformer.h.26.input_layernorm.weight', 'transformer.h.15.input_layernorm.weight', 'transformer.h.4.input_layernorm.bias', 'transformer.h.9.input_layernorm.weight', 'transformer.h.25.input_layernorm.weight', 'transformer.h.3.input_layernorm.weight', 'transformer.h.17.input_layernorm.weight'}].
A potential way to correctly save your model is to use `save_model`.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
Hi!
The error is related to safetensors. We have uploaded safetensors weights, but we are still having some trouble with the text-generation-inference container, so it might still fail.
Sorry for the inconvenience; we hope to fix it soon.
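With `.safetensors` files in the repository, the container should no longer need to run the `convert_files` step that fails in the traceback. The sketch below checks a repository for the situation that triggers that conversion. It assumes a simplified rule: conversion happens only when `.bin` weights exist without any `.safetensors` weights. `needs_conversion` and `repo_needs_conversion` are hypothetical helpers, while `huggingface_hub.list_repo_files` is a real function.

```python
from typing import Iterable


def needs_conversion(filenames: Iterable[str]) -> bool:
    """Assumed rule: PyTorch .bin weights are present but no .safetensors weights."""
    names = list(filenames)
    has_safetensors = any(name.endswith(".safetensors") for name in names)
    has_pytorch_bin = any(name.endswith(".bin") for name in names)
    return has_pytorch_bin and not has_safetensors


def repo_needs_conversion(repo_id: str = "projecte-aina/aguila-7b") -> bool:
    """List the files of a Hub repository and apply the rule above (needs network access)."""
    from huggingface_hub import list_repo_files  # imported lazily; requires huggingface_hub

    return needs_conversion(list_repo_files(repo_id))
```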
Hi!
Everything should work now :-)
We have tested it with v0.9.3 of text-generation-inference.
Sorry for the delay,
Best regards,
Joan
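The maintainers report that the model works with v0.9.3 of text-generation-inference. The following sketch, which does not come from the thread, pins that container version with the SageMaker Python SDK, provided your SDK release still lists it. The instance type, the token limits, the `HF_MODEL_TRUST_REMOTE_CODE` setting, and the helper names `build_tgi_env` and `deploy` are all assumptions. You must supply the IAM role ARN yourself.

```python
from typing import Dict


def build_tgi_env(model_id: str = "projecte-aina/aguila-7b", num_gpus: int = 1) -> Dict[str, str]:
    """Container environment for the Hugging Face LLM (TGI) image on SageMaker.

    All values are illustrative assumptions; SageMaker expects them as strings.
    """
    return {
        "HF_MODEL_ID": model_id,          # Hub repository to download
        "SM_NUM_GPUS": str(num_gpus),     # number of GPUs to shard across
        # Assumption: mirrors trust_remote_code=True from the Transformers example.
        "HF_MODEL_TRUST_REMOTE_CODE": "true",
        "MAX_INPUT_LENGTH": "1024",       # assumed limits, tune for your workload
        "MAX_TOTAL_TOKENS": "2048",
    }


def deploy(role_arn: str, instance_type: str = "ml.g5.2xlarge"):
    """Create a real-time endpoint running TGI v0.9.3 (incurs AWS charges)."""
    # Imported lazily so build_tgi_env can be inspected without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Pin the container version the maintainers report having tested.
    image_uri = get_huggingface_llm_image_uri("huggingface", version="0.9.3")
    model = HuggingFaceModel(image_uri=image_uri, env=build_tgi_env(), role=role_arn)
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        # Downloading and loading a 7B model can exceed the default health-check window.
        container_startup_health_check_timeout=600,
    )
```

For example, `predictor = deploy("arn:aws:iam::<account-id>:role/<sagemaker-role>")` followed by `predictor.predict({"inputs": "Once upon a time,", "parameters": {"max_new_tokens": 64}})`. Call `predictor.delete_endpoint()` when finished, since the endpoint is billed while it runs.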