How do I load this model in Colab with a chat UI?

#3
by Xhub1880 - opened

Give me the code

I think you will have to run it with Hugging Face Transformers for a while. This model's architecture is not standard multi-head attention, nor a well-adopted alternative such as Mamba or other quasi-linear-scaling models. And because there seems to be a unique quirk with its flash attention dependency, I doubt any OSS LLM serving project like vLLM will support this novel model any time soon.
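
A minimal sketch of loading it with Transformers in Colab, under some assumptions: `MODEL_ID` is a placeholder you must replace with this repository's id, `trust_remote_code=True` is my guess at what a custom architecture needs, and `load_model`, `complete`, and `strip_prompt` are helper names I made up for illustration. If loading fails with a flash attention error, that is probably the dependency mentioned above and you may need to install it first.

```python
MODEL_ID = "<org>/<model-name>"  # placeholder: replace with this repository's id


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model (assumes `pip install transformers torch`)."""
    # Imported lazily so the helpers below work without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the custom architecture ships its own modeling code.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", trust_remote_code=True
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return tokenizer, model.to(device)


def strip_prompt(full_text, prompt):
    """Drop the echoed prompt, since generate() returns prompt + continuation."""
    return full_text[len(prompt):] if full_text.startswith(prompt) else full_text


def complete(tokenizer, model, prompt, max_new_tokens=128):
    """Return only the model's continuation of `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return strip_prompt(text, prompt)


# Usage in a Colab cell (downloads the weights):
# tokenizer, model = load_model()
# print(complete(tokenizer, model, "The capital of France is"))
```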

If you need a chat UI, just use Gradio. I am pretty sure any commercial AI chatbot could write one for you in no time.
Also note that this is not an instruct (chatbot) weight.
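
Here is a rough sketch of such a Gradio chat UI. Since this is a base model rather than an instruct one, the sketch flattens the conversation into a plain "User:/Assistant:" transcript and lets the model continue it. That transcript format is my own assumption, not something the model was trained on, and `generate` is a hypothetical callable you supply that takes a prompt string and returns the model's continuation (for example, a wrapper around Transformers' `model.generate`).

```python
def build_prompt(history, message):
    """Flatten chat history into a plain-text transcript for a base model."""
    lines = []
    for turn in history:
        if isinstance(turn, dict):  # {"role": ..., "content": ...} style history
            speaker = "User" if turn.get("role") == "user" else "Assistant"
            lines.append(f"{speaker}: {turn.get('content', '')}")
        else:  # (user_text, assistant_text) pair style history
            user_text, assistant_text = turn
            lines.append(f"User: {user_text}")
            if assistant_text:
                lines.append(f"Assistant: {assistant_text}")
    lines.append(f"User: {message}")
    lines.append("Assistant:")  # the model continues from here
    return "\n".join(lines)


def make_chat_fn(generate):
    """Wrap a `generate(prompt) -> str` callable as a Gradio chat function."""
    def chat(message, history):
        reply = generate(build_prompt(history, message))
        # A base model tends to keep writing the transcript, so cut the
        # reply at the first user turn it invents.
        return reply.split("\nUser:")[0].strip()
    return chat


def launch_chat_ui(generate):
    """Start the chat UI (assumes `pip install gradio`)."""
    import gradio as gr  # imported lazily so the helpers above work without it
    gr.ChatInterface(make_chat_fn(generate)).launch()


# Usage in a Colab cell, with your own generate function:
# launch_chat_ui(my_generate)
```

Expect the replies to be rougher than those of a chat-tuned model, because the transcript is only a prompting trick.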
