	---
	license: mit
	language:
	- en
	library_name: onnxruntime
	tags:
	- music-generation
	- symbolic-music
	- lilylet
	- notagen
	- onnx
	- int8
	pipeline_tag: text-generation
	---

	# LilyNota

	A symbolic-music generation model that writes scores in [Lilylet](https://github.com/k-l-lambda/lilylet) — a compact, LilyPond-flavored text notation. Given a short metadata prompt (composer / genre / instrument / key / time signature …), the model autoregressively composes a multi-measure, multi-staff piece.

	It powers the [LilyScript](https://huggingface.co/spaces/k-l-lambda/LilyScript) Space, where generated Lilylet is rendered to a sheet-music score (Verovio) and played back as MIDI.

	> Note: the weights here are an early snapshot and will be refreshed as training continues.

	## Model

	`LilyNota` is a hierarchical, two-level NotaGen-style decoder on a Llama backbone:

	- Patch-level decoder — 10 layers, hidden 1024, 16 heads. Consumes the score as a stream of fixed-size patches (16 tokens each) and produces a hidden state per patch.
	- Token-level decoder — 4 layers. Expands each patch state into its concrete Lilylet tokens.

	| property | value |
	|---|---|
	| params | ~196 M |
	| base type | llama (bf16 trained) |
	| patch size | 16 tokens |
	| vocab | 256 |
	| context | 1024 patches |
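The patch arithmetic in the table can be illustrated with a short sketch. It splits a flat token-id sequence into fixed-size patches of 16 tokens and pads the last one. The function name `to_patches` and the padding id `0` are assumptions made for illustration (the real special ids are listed in `geometry.json`); this is not the model's actual preprocessing code.

```python
import numpy as np

PATCH_SIZE = 16         # tokens per patch (from the table above)
CONTEXT_PATCHES = 1024  # maximum number of patches (from the table above)
PAD_ID = 0              # assumption: the real padding id lives in geometry.json


def to_patches(token_ids, patch_size=PATCH_SIZE, pad_id=PAD_ID):
    """Split a flat token-id sequence into rows of `patch_size` tokens."""
    ids = np.asarray(token_ids, dtype=np.int64)
    n_patches = max(1, -(-len(ids) // patch_size))  # ceiling division
    padded = np.full(n_patches * patch_size, pad_id, dtype=np.int64)
    padded[: len(ids)] = ids
    return padded.reshape(n_patches, patch_size)


patches = to_patches(range(1, 41))   # 40 tokens
print(patches.shape)                 # (3, 16): two full patches plus one padded patch
print(PATCH_SIZE * CONTEXT_PATCHES)  # 16384 tokens fit in a full context
```

With these numbers, a full context of 1024 patches corresponds to 16384 tokens.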

	## Files

	```
	model_*.chkpt           torch training checkpoint (full precision)
	.state.yaml             training / architecture config
	tokenizer.json          tokenizer
	onnx/
	  patch_kv_int8.onnx    patch decoder, int8, with KV-cache (incremental)
	  token_kv_int8.onnx    token decoder, int8, with KV-cache (incremental)
	  wte.npy               token-embedding table [vocab, hidden]
	  geometry.json         patch size, special ids, per-level KV geometry
	```

	The `onnx/` bundle is torch-free: a generator needs only `onnxruntime` + `numpy`
	to run it (the embedding lookup and sampling live outside the graph). int8 dynamic
	quantization plus a two-level KV cache make it a fast CPU inference path.
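Because the embedding lookup and sampling live outside the graph, both can be written in plain NumPy. The following is a minimal sketch of those two steps only. The random table stands in for `wte.npy`, the hidden size of 8 is a toy value, and the function names `embed` and `sample_token` are hypothetical rather than taken from the reference runtime.

```python
import numpy as np

VOCAB = 256  # vocabulary size from the model table


def embed(wte, token_ids):
    """Embedding lookup: pick one row of the [vocab, hidden] table per token id."""
    return wte[np.asarray(token_ids, dtype=np.int64)]


def sample_token(logits, temperature=1.0, rng=None):
    """Temperature sampling over one vector of logits (shape [vocab])."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)
    scaled -= scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(42)
# Stand-in for np.load("onnx/wte.npy"); the toy hidden size is an assumption.
wte = rng.standard_normal((VOCAB, 8)).astype(np.float32)
vectors = embed(wte, [1, 2, 3])  # shape (3, 8)
next_id = sample_token(rng.standard_normal(VOCAB), temperature=1.0, rng=rng)
```

Lower temperatures concentrate probability on the highest logit, while `temperature=1.0` samples from the unscaled distribution.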

	## Usage

	The reference runtime is `StreamingLilyletGenerator` in the
	[LilyScript Space](https://huggingface.co/spaces/k-l-lambda/LilyScript)
	(`lilyscript/generator.py`). Sketch:

	```python
	from lilyscript.generator import StreamingLilyletGenerator

	gen = StreamingLilyletGenerator(model_dir='onnx', asset_dir='onnx')
	prompt = '[composer "Beethoven, Ludwig van"]\n[genre "Classical"]\n[instrument "Keyboard"]'
	for raw, pretty, done in gen.generate_stream(prompt_text=prompt, measures=8, temperature=1.0, seed=42):
	    pass  # `pretty` is measure-segmented Lilylet; streams one patch at a time
	print(pretty)
	```
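The metadata prompt is a sequence of `[key "value"]` lines. A small helper can assemble it from a dictionary. Note that `build_prompt` is a hypothetical convenience function, not part of `lilyscript`, and it assumes the values contain no double quotes.

```python
def build_prompt(metadata):
    """Join metadata fields into one `[key "value"]` line per field."""
    return "\n".join(f'[{key} "{value}"]' for key, value in metadata.items())


prompt = build_prompt({
    "composer": "Beethoven, Ludwig van",
    "genre": "Classical",
    "instrument": "Keyboard",
})
print(prompt)
```

The resulting string matches the prompt used in the sketch above and could be passed as `prompt_text`.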

	Output is Lilylet text, e.g.:

	```
	[composer "Beethoven, Ludwig van"]
	[genre "Classical"]
	\key g \major \time 3/4 \clef "treble" \tempo 4=54 ^\markup "Andante con moto" r2. \\
	...
	```
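To separate the metadata header from the music in generated text, the bracketed lines can be matched with a regular expression. This sketch assumes headers are whole lines of the form `[key "value"]` at the top of the text, as in the sample above; `split_header` is a hypothetical helper, not part of the reference runtime.

```python
import re

HEADER_RE = re.compile(r'^\[(\w+) "(.*)"\]$')


def split_header(lilylet_text):
    """Return (metadata dict, remaining lines) for Lilylet text with leading headers."""
    metadata, body = {}, []
    for line in lilylet_text.splitlines():
        match = HEADER_RE.match(line.strip())
        if match and not body:  # headers only count before the music starts
            metadata[match.group(1)] = match.group(2)
        else:
            body.append(line)
    return metadata, body


sample = '[composer "Beethoven, Ludwig van"]\n[genre "Classical"]\n\\key g \\major \\time 3/4 r2. \\\\'
metadata, body = split_header(sample)
print(metadata)
```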

	## License

	MIT.