---
license: mit
language:
- en
library_name: onnxruntime
tags:
- music-generation
- symbolic-music
- lilylet
- notagen
- onnx
- int8
pipeline_tag: text-generation
---

# LilyNota

A symbolic-music generation model that writes scores in **[Lilylet](https://github.com/k-l-lambda/lilylet)**, a compact, LilyPond-flavored text notation. Given a short metadata prompt (composer, genre, instrument, key, time signature, …), the model autoregressively composes a multi-measure, multi-staff piece.

It powers the [**LilyScript**](https://huggingface.co/spaces/k-l-lambda/LilyScript) Space, where generated Lilylet is rendered as a sheet-music score (Verovio) and played back as MIDI.

> Note: the weights here are an early snapshot and will be refreshed as training continues.

## Model

`LilyNota` is a hierarchical, two-level NotaGen-style decoder with a Llama backbone:

- **Patch-level decoder**: 10 layers, hidden size 1024, 16 heads. Consumes the score as a stream of fixed-size *patches* (16 tokens each) and produces one hidden state per patch.
- **Token-level decoder**: 4 layers. Expands each patch state into its concrete Lilylet tokens.

| property | value |
|---|---|
| params | ~196 M |
| base type | llama (bf16 trained) |
| patch size | 16 tokens |
| vocab | 256 |
| context | 1024 patches |

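
To make the patch geometry concrete, here is a minimal sketch of how a flat token stream maps onto fixed-size patches and what the context length means in tokens. The helper `to_patches` and the padding id `0` are illustrative assumptions, not part of the released code; the real special ids are stored in `onnx/geometry.json`, and the reference runtime may decide patch boundaries differently.

```python
PATCH_SIZE = 16          # tokens per patch (from the table above)
CONTEXT_PATCHES = 1024   # patch-level context length (from the table above)


def to_patches(token_ids, patch_size=PATCH_SIZE, pad_id=0):
    """Split a flat token-id sequence into fixed-size patches.

    pad_id=0 is a placeholder assumption; the real special ids
    are listed in onnx/geometry.json.
    """
    patches = []
    for start in range(0, len(token_ids), patch_size):
        patch = list(token_ids[start:start + patch_size])
        patch += [pad_id] * (patch_size - len(patch))  # right-pad the last patch
        patches.append(patch)
    return patches


# Token budget of a full context: 16 tokens/patch * 1024 patches.
max_tokens = PATCH_SIZE * CONTEXT_PATCHES

# 40 tokens need three patches; the last one carries 8 padding slots.
patches = to_patches(list(range(1, 41)))
```

Under these assumptions a full context covers 16 × 1024 = 16384 tokens, and the patch-level decoder sees one position per patch rather than one per token.
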
## Files

```
model_*.chkpt           torch training checkpoint (full precision)
.state.yaml             training / architecture config
tokenizer.json          tokenizer
onnx/
  patch_kv_int8.onnx    patch decoder, int8, with KV-cache (incremental)
  token_kv_int8.onnx    token decoder, int8, with KV-cache (incremental)
  wte.npy               token-embedding table [vocab, hidden]
  geometry.json         patch size, special ids, per-level KV geometry
```

The `onnx/` bundle is **torch-free**: a generator needs only `onnxruntime` and `numpy` to run it, because the embedding lookup and sampling live outside the graph. Dynamic int8 quantization plus a two-level KV cache make it a fast CPU inference path.

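
Because the embedding lookup and sampling happen outside the ONNX graphs, both reduce to a few lines of `numpy`. The sketch below is illustrative only: it uses a random stand-in for `wte.npy` with the vocabulary and hidden sizes from the Model table (the real table's shape may differ), and `embed` and `sample_token` are hypothetical helper names, not functions from the reference runtime.

```python
import numpy as np

VOCAB, HIDDEN = 256, 1024  # sizes from the Model table; assumed shape of wte.npy

rng = np.random.default_rng(0)
# Stand-in for np.load("onnx/wte.npy"); random values, for illustration only.
wte = rng.standard_normal((VOCAB, HIDDEN)).astype(np.float32)


def embed(token_ids, table):
    """Embedding lookup is plain row indexing into the table."""
    return table[np.asarray(token_ids, dtype=np.int64)]


def sample_token(logits, temperature=1.0, rng=rng):
    """Draw one token id from a row of vocabulary logits."""
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)
    scaled -= scaled.max()          # shift for a numerically stable softmax
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


vectors = embed([10, 20, 30], wte)          # shape (3, HIDDEN)
fake_logits = rng.standard_normal(VOCAB)    # stand-in for a token-decoder output
next_id = sample_token(fake_logits, temperature=1.0)
```

In a real generator the logits would come from an `onnxruntime` session over `token_kv_int8.onnx`; that call is omitted here because its input and output names are not documented in this card.
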
## Usage

The reference runtime is `StreamingLilyletGenerator` in the [LilyScript Space](https://huggingface.co/spaces/k-l-lambda/LilyScript) (`lilyscript/generator.py`). A sketch:

```python
from lilyscript.generator import StreamingLilyletGenerator

gen = StreamingLilyletGenerator(model_dir='onnx', asset_dir='onnx')
prompt = '[composer "Beethoven, Ludwig van"]\n[genre "Classical"]\n[instrument "Keyboard"]'

# Drain the stream; after the loop, `pretty` holds the last yielded text.
for raw, pretty, done in gen.generate_stream(prompt_text=prompt, measures=8, temperature=1.0, seed=42):
    pass  # `pretty` is measure-segmented Lilylet; streams one patch at a time
print(pretty)
```

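
The prompt in the sketch above is a sequence of `[field "value"]` lines joined by newlines. A small helper can assemble it from a dictionary; `build_prompt` is a hypothetical name introduced here for illustration, and it performs no escaping of quotes inside values because the card does not say how Lilylet handles them.

```python
def build_prompt(metadata):
    """Render {field: value} pairs as one `[field "value"]` line each."""
    return "\n".join(f'[{field} "{value}"]' for field, value in metadata.items())


prompt = build_prompt({
    "composer": "Beethoven, Ludwig van",
    "genre": "Classical",
    "instrument": "Keyboard",
})
```

Only the three fields shown in the usage sketch are used here; the exact field names for key or time signature are not given in this card, so they are left out.
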
Output is Lilylet text, for example:

```
[composer "Beethoven, Ludwig van"]
[genre "Classical"]
\key g \major \time 3/4 \clef "treble" \tempo 4=54 ^\markup "Andante con moto" r2. \\
...
```

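
If downstream code needs the metadata separately from the music, the leading `[field "value"]` lines can be peeled off with a regular expression. This is a rough sketch based only on the example output above; `split_header` is a hypothetical helper, and real Lilylet output may contain header forms that this pattern does not cover.

```python
import re

# Matches lines such as: [genre "Classical"]
HEADER = re.compile(r'^\[(\w+) "(.*)"\]$')


def split_header(lilylet_text):
    """Separate leading [field "value"] lines from the music body."""
    metadata, body = {}, []
    lines = lilylet_text.splitlines()
    for i, line in enumerate(lines):
        match = HEADER.match(line.strip())
        if match is None:
            body = lines[i:]   # first non-header line starts the music
            break
        metadata[match.group(1)] = match.group(2)
    return metadata, "\n".join(body)


example = (
    '[composer "Beethoven, Ludwig van"]\n'
    '[genre "Classical"]\n'
    '\\key g \\major \\time 3/4 r2. \\\\'
)
metadata, body = split_header(example)
```
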
## License

MIT.