---
license: mit
language:
  - en
library_name: onnxruntime
tags:
  - music-generation
  - symbolic-music
  - lilylet
  - notagen
  - onnx
  - int8
pipeline_tag: text-generation
---

# LilyNota

A symbolic-music generation model that writes scores in **[Lilylet](https://github.com/k-l-lambda/lilylet)** — a compact, LilyPond-flavored text notation. Given a short metadata prompt (composer / genre / instrument / key / time signature …), the model autoregressively composes a multi-measure, multi-staff piece.

It powers the [**LilyScript**](https://huggingface.co/spaces/k-l-lambda/LilyScript) Space, where generated Lilylet is rendered to a sheet-music score (Verovio) and played back as MIDI.

> Note: the weights here are an early snapshot and will be refreshed as training continues.

## Model

`LilyNota` — a hierarchical, two-level NotaGen-style decoder (Llama backbone):

- **Patch-level decoder** — 10 layers, hidden 1024, 16 heads. Consumes the score as a stream of fixed-size *patches* (16 tokens each) and produces a hidden state per patch.
- **Token-level decoder** — 4 layers. Expands each patch state into its concrete Lilylet tokens.
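
To make the patch idea concrete, here is a minimal sketch of cutting a flat token stream into 16-token patches. It is an illustration only: `to_patches` and `PAD_ID` are hypothetical names, the pad id is a placeholder (the real special ids are stored in `onnx/geometry.json`), and the reference runtime may pad or align patch boundaries differently.

```python
from typing import List

PATCH_SIZE = 16   # tokens per patch, as stated above
PAD_ID = 0        # ASSUMPTION: placeholder pad id, not the model's real special id


def to_patches(token_ids: List[int], patch_size: int = PATCH_SIZE,
               pad_id: int = PAD_ID) -> List[List[int]]:
    """Split a flat token stream into fixed-size patches, right-padding the last one."""
    patches = []
    for start in range(0, len(token_ids), patch_size):
        patch = token_ids[start:start + patch_size]
        patch = patch + [pad_id] * (patch_size - len(patch))  # fill a short final patch
        patches.append(patch)
    return patches


tokens = list(range(1, 41))                      # 40 dummy token ids
patches = to_patches(tokens)
print(len(patches), [len(p) for p in patches])   # 3 [16, 16, 16]
```

The patch-level decoder would then see one position per patch, while the token-level decoder works on the 16 tokens inside a single patch.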

| property | value |
|---|---|
| params | ~196 M |
| backbone | Llama (trained in bf16) |
| patch size | 16 tokens |
| vocab | 256 |
| context | 1024 patches |

## Files

```
model_*.chkpt                 torch training checkpoint (full precision)
.state.yaml                   training / architecture config
tokenizer.json                tokenizer
onnx/
  patch_kv_int8.onnx          patch decoder, int8, with KV-cache (incremental)
  token_kv_int8.onnx          token decoder, int8, with KV-cache (incremental)
  wte.npy                     token-embedding table [vocab, hidden]
  geometry.json               patch size, special ids, per-level KV geometry
```

The `onnx/` bundle is **torch-free**: a generator needs only `onnxruntime` and `numpy`
to run it, because the embedding lookup and sampling happen outside the graphs. Dynamic
int8 quantization plus a KV cache at both decoder levels keep CPU inference fast.
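
The two steps that live outside the graphs are small enough to sketch in `numpy`. This is an illustration, not the reference implementation: a random array stands in for `np.load('onnx/wte.npy')`, the hidden width of 1024 is an assumption borrowed from the patch decoder, and `embed` / `sample` are hypothetical helper names. How the embeddings are fed to the ONNX graphs is not shown, since the graph input names are not documented here.

```python
import numpy as np

VOCAB = 256     # vocab size from the model table
HIDDEN = 1024   # ASSUMPTION: embedding width equals the patch decoder's hidden size

# Stand-in for np.load('onnx/wte.npy'), documented as shape [vocab, hidden].
rng = np.random.default_rng(0)
wte = rng.standard_normal((VOCAB, HIDDEN)).astype(np.float32)


def embed(token_ids: np.ndarray) -> np.ndarray:
    """Embedding lookup is plain row indexing into the table."""
    return wte[token_ids]


def sample(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Temperature sampling over one logits vector (temperature must be > 0)."""
    scaled = logits.astype(np.float64) / temperature
    scaled -= scaled.max()            # subtract the max so exp() cannot overflow
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


patch = np.arange(16)                        # 16 dummy token ids
print(embed(patch).shape)                    # (16, 1024)
fake_logits = rng.standard_normal(VOCAB)     # in practice: output of the token decoder
print(0 <= sample(fake_logits, 1.0, rng) < VOCAB)   # True
```

Keeping these steps in `numpy` is what lets the exported graphs stay free of sampling logic, so temperature and seeding can change without re-exporting the model.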

## Usage

The reference runtime is `StreamingLilyletGenerator` in the
[LilyScript Space](https://huggingface.co/spaces/k-l-lambda/LilyScript)
(`lilyscript/generator.py`). Sketch:

```python
from lilyscript.generator import StreamingLilyletGenerator

gen = StreamingLilyletGenerator(model_dir='onnx', asset_dir='onnx')
prompt = '[composer "Beethoven, Ludwig van"]\n[genre "Classical"]\n[instrument "Keyboard"]'
pretty = ''     # stays defined even if the stream yields nothing
for raw, pretty, done in gen.generate_stream(prompt_text=prompt, measures=8, temperature=1.0, seed=42):
    pass        # `pretty` is measure-segmented Lilylet; streams one patch at a time
print(pretty)
```

Output is Lilylet text, e.g.:

```
[composer "Beethoven, Ludwig van"]
[genre "Classical"]
\key g \major \time 3/4 \clef "treble" \tempo 4=54 ^\markup "Andante con moto" r2. \\
...
```
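
Both the prompt and the head of the output are sequences of `[key "value"]` lines, so they can be built and read back with a few lines of Python. The helpers below are hypothetical (they are not part of `lilyscript`), and they assume values contain no double quotes, since Lilylet's escaping rules are not covered here.

```python
import re
from typing import Dict

# One metadata line: [key "value"]
HEADER_RE = re.compile(r'^\[(\w+) "(.*)"\]$')


def build_prompt(meta: Dict[str, str]) -> str:
    """Join metadata into one `[key "value"]` line per field, in dict order."""
    return '\n'.join(f'[{key} "{value}"]' for key, value in meta.items())


def read_headers(lilylet: str) -> Dict[str, str]:
    """Collect the leading `[key "value"]` lines; stop at the first other line."""
    headers = {}
    for line in lilylet.splitlines():
        match = HEADER_RE.match(line.strip())
        if not match:
            break
        headers[match.group(1)] = match.group(2)
    return headers


meta = {'composer': 'Beethoven, Ludwig van', 'genre': 'Classical', 'instrument': 'Keyboard'}
prompt = build_prompt(meta)
print(prompt)
print(read_headers(prompt + '\n\\key g \\major \\time 3/4') == meta)   # True
```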

## License

MIT.