Instructions to use webbigdata/ALMA-7B-Ja-GPTQ-Ja-En with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use webbigdata/ALMA-7B-Ja-GPTQ-Ja-En with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="webbigdata/ALMA-7B-Ja-GPTQ-Ja-En")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("webbigdata/ALMA-7B-Ja-GPTQ-Ja-En")
model = AutoModelForCausalLM.from_pretrained("webbigdata/ALMA-7B-Ja-GPTQ-Ja-En")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use webbigdata/ALMA-7B-Ja-GPTQ-Ja-En with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
A Python client sketch for this OpenAI-compatible endpoint is shown after the Local Apps list below.
- SGLang
How to use webbigdata/ALMA-7B-Ja-GPTQ-Ja-En with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use webbigdata/ALMA-7B-Ja-GPTQ-Ja-En with Docker Model Runner:
```bash
docker model run hf.co/webbigdata/ALMA-7B-Ja-GPTQ-Ja-En
```
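Both the vLLM and SGLang servers above expose an OpenAI-compatible completions API, so the curl calls can also be made from Python. Below is a minimal sketch using the `openai` client package (an assumption, not part of the original instructions); it targets the vLLM server on port 8000, and switching the port to 30000 points it at the SGLang server instead.

```python
# Minimal sketch: call the OpenAI-compatible /v1/completions endpoint
# served by vLLM (port 8000) or SGLang (port 30000).
# Assumes: pip install openai, and that one of the servers above is running.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # use port 30000 for the SGLang server
    api_key="EMPTY",                      # local servers typically accept any key
)

completion = client.completions.create(
    model="webbigdata/ALMA-7B-Ja-GPTQ-Ja-En",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```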
A new version has been released.
2024/03/04
webbigdata/C3TR-Adapter
The GPU memory requirement has increased to 8.1 GB. However, it can still be run on the free tier of Colab, and performance is much improved!
2023/10/21
ALMA-7B-Ja-V2
Overall performance has been improved.
Below is a description of the old version. We urge you to try the newer version above.
webbigdata/ALMA-7B-Ja-GPTQ-Ja-En
The original ALMA model, ALMA-7B (26.95 GB), is a translation model based on a new paradigm.
ALMA-7B-Ja (13.3 GB) is a machine translation model that uses ALMA's training method to translate Japanese to English.
This model, ALMA-7B-Ja-GPTQ-Ja-En, is a GPTQ-quantized version that reduces model size (3.9 GB) and memory usage, although performance is likely somewhat lower.
Translation quality for languages other than Japanese and English has also deteriorated significantly.
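Because the checkpoint is GPTQ-quantized, loading it with Transformers needs the GPTQ runtime packages available. The following is a minimal sketch, assuming recent `transformers` together with `optimum`, `auto-gptq`, and `accelerate` are installed (the exact package set and the fp16 dtype are assumptions, not statements from the original card):

```python
# Sketch: load the GPTQ-quantized checkpoint with Transformers.
# Assumption: pip install transformers optimum auto-gptq accelerate
# (the packages required can vary by library version).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "webbigdata/ALMA-7B-Ja-GPTQ-Ja-En"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place the ~3.9 GB quantized weights on the GPU
    torch_dtype=torch.float16,  # assumption: fp16 activations
)
```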
If you want to translate an entire file at once, try the Colab notebook below.
ALMA_7B_Ja_GPTQ_Ja_En_batch_translation_sample
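For reference, a batch-translation loop might look like the sketch below. The file names, the prompt template, and the generation settings are illustrative assumptions; the linked Colab notebook is the supported way to do this.

```python
# Sketch: translate a text file line by line (illustrative only).
# Assumes `model` and `tokenizer` are loaded as in the sketch above,
# and that input.txt / output.txt are hypothetical file names.
with open("input.txt", encoding="utf-8") as f:
    lines = [line.strip() for line in f if line.strip()]

results = []
for ja_text in lines:
    # ALMA-style translation prompt (Japanese -> English)
    prompt = f"Translate this from Japanese to English:\nJapanese: {ja_text}\nEnglish:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Keep only the newly generated tokens, not the prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    results.append(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())

with open("output.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(results))
```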
If you encounter the error below:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
it means you are running out of memory; reduce num_beams or the token size, as in the short sketch that follows.
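A minimal sketch of what "reduce num_beams or the token size" means in practice; the values are only illustrative, and `model`/`inputs` are assumed to be set up as in the sketches above.

```python
# Sketch: lower-memory generation settings (illustrative values).
outputs = model.generate(
    **inputs,
    num_beams=1,         # fewer beams use less memory than e.g. num_beams=5
    max_new_tokens=128,  # a smaller token budget also reduces memory use
)
```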
ALMA (Advanced Language Model-based trAnslator) is an LLM-based translation model, which adopts a new translation model paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance. Please find more details in their paper.
```bibtex
@misc{xu2023paradigm,
      title={A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models},
      author={Haoran Xu and Young Jin Kim and Amr Sharaf and Hany Hassan Awadalla},
      year={2023},
      eprint={2309.11674},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
About this work
- This work was done by: webbigdata.