TheArtist Music Transformer — LoRA Adapter (Classical)

LoRA adapter that conditions the F1 base (PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80) toward classical chord progressions.

This model was presented in the paper "How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity? Capabilities and Boundaries in Multi-Genre Chord-Symbol Modeling".

GitHub: PearlLeeStudio/TheArtist
Demo: Watch TheArtist in action on YouTube — interactive staff editor, MIDI input, AI generation with live progress, and per-genre LoRA playback across the 13-genre vocabulary.

This release is the best-rank snapshot from a 5-point rank sweep (r ∈ {4, 8, 16, 32, 64}); see §Rank sweep below for the full table and selection criterion.

Base checkpoint note (2026-06-11): This adapter was trained on the released F1 base checkpoint. Full SHA-256 weight-hash verification (2026-06-11) shows that this checkpoint coincides with the pop-only Phase-0 baseline: best-checkpoint selection in the F1 run silently retained the pre-fine-tuning weights (see the Erratum on the base card). The adapter and all evaluation numbers on this card are self-consistent with that released base, because every "F1 base" column was measured against the exact weights this adapter was trained on and is served with. The adaptation gains shown here are therefore gains over a pure-pop harmonic prior.
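
The weight-hash check mentioned in the note can be approximated with the standard library. This is a minimal sketch, not the project's verification script: the card does not describe the exact procedure, and the helper names `sha256_of_file` and `same_weights` are hypothetical. It hashes whole files, which is an assumption; two checkpoints with identical tensors can still differ at the file level if they store different metadata (for example optimizer state or epoch counters).

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_weights(path_a, path_b):
    """True when both files are byte-identical according to SHA-256."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

A call such as `same_weights("f1/best.pt", "phase0/best.pt")` (both paths are placeholders) would return `True` only for byte-identical files.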

Adapter summary

Field	Value
Base model	`PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80` (F1, 25.6M params)
Adapter type	LoRA (Q/K/V projections)
LoRA rank	64
LoRA alpha	128
LoRA dropout	0.05
Target modules	`w_q`, `w_k`, `w_v`
Trainable parameters	~1,581,056 (~6.08% of base)
Adapter file size	~6.0 MB
Base vocabulary	351 tokens (jazz/pop)
Vocabulary extension	+8 genre tokens (`embedding_extension.pt`)
Training epochs	8
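
The trainable-parameter figure can be reproduced with simple arithmetic. The sketch below uses the shapes from the Usage section (`d_model=512`, `n_layers=8`) and assumes that each of `w_q`, `w_k`, `w_v` is a square `d_model × d_model` projection and that the 8 new rows of `token_emb` and `out_proj` are trained alongside the LoRA matrices. Both assumptions are inferred from the numbers on this card, not confirmed by the training code.

```python
# Shapes taken from the Usage section of this card.
D_MODEL = 512
N_LAYERS = 8
RANK = 64
ALPHA = 128
TARGETS_PER_LAYER = 3  # w_q, w_k, w_v
NEW_TOKENS = 8         # genre tokens stored in embedding_extension.pt

# Each adapted projection gets A (rank x d_in) and B (d_out x rank).
lora_per_module = RANK * (D_MODEL + D_MODEL)
lora_total = lora_per_module * TARGETS_PER_LAYER * N_LAYERS

# Assumption: the new rows of token_emb and out_proj are trainable too.
extension_total = NEW_TOKENS * D_MODEL * 2

trainable = lora_total + extension_total
scaling = ALPHA / RANK  # standard LoRA scaling factor alpha / r

print(lora_total, extension_total, trainable)  # 1572864 8192 1581056
print(scaling)                                 # 2.0
```

Under these assumptions the total matches the 1,581,056 reported in the table.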

Training data

Source

371 chord-progression sequences in the classical (Bach chorale) subset (296 train / 37 val / 38 test), drawn from the public-domain music21 corpus rather than from Chordonomicon. These Bach chorales are license-clean for commercial use; the CC BY-NC restriction below applies only because the LoRA is loaded at runtime on top of an F1 base that was itself trained on Chordonomicon.

Filter rule

Bach chorales from music21.corpus. The Chordonomicon filter does NOT apply, because Chordonomicon has no classical genre.

Splits (song-level, seed=42, 80/10/10)

Partition	Songs	Used for
train	296	this LoRA's training (12-key augmented → 3,552 sequences)
val	37	rank-sweep eval + best-epoch selection during training
test	38	held aside for future paired analysis
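
The partition sizes follow from an 80/10/10 split of 371 songs if the train and val counts are floored and the remainder goes to test. The sketch below illustrates that arithmetic with a seeded shuffle; `split_songs` is a hypothetical helper, and the actual shuffling and rounding used by `extract_genre_subsets.py` are assumptions, so the resulting song assignment will not necessarily match the released split.

```python
import random


def split_songs(song_ids, seed=42, train_frac=0.8, val_frac=0.1):
    """Shuffle song ids with a fixed seed and cut them into three partitions."""
    ids = list(song_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)  # floor
    n_val = int(len(ids) * val_frac)      # floor
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]          # remainder
    return train, val, test


train, val, test = split_songs(range(371))
print(len(train), len(val), len(test))  # 296 37 38

# 12-key augmentation multiplies only the training partition.
print(len(train) * 12)  # 3552
```

Splitting at the song level before augmentation keeps transposed copies of one chorale from appearing in both train and val.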

Vocabulary

Base: 351 tokens (jazz/pop chord vocab from the F1 base model)
Extension: +8 [GENRE:X] tokens covering 8 new genres (this LoRA adds the [GENRE:classical] token)
Final vocab: 359 tokens (stored alongside the adapter in embedding_extension.pt)

Reproducibility

# 1. classical = Bach chorales from the public-domain music21 corpus (NOT Chordonomicon)
# 2. Extract this genre subset (pulls the Bach chorales from music21.corpus)
uv run python ai/training/extract_genre_subsets.py --genres classical --merge

# 3. Train the LoRA at the released rank
uv run python ai/training/lora_train.py --config ai/training/configs/lora/classical_r64.yaml

Hyperparameters: 8 epochs · batch 32 × accum 2 · lr 3e-4 · 1-epoch warmup · AMP fp16 · best.pt selected by min val_loss.
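
The hyperparameter line implies an effective batch of 64 sequences and a fairly short run. The following sketch works out the step counts and a warmup schedule. It assumes the last partial accumulation window still triggers an optimizer step, that warmup is linear, and that the learning rate is held constant afterwards; none of these details are stated on this card, and `lr_at` is a hypothetical helper.

```python
import math

TRAIN_SEQUENCES = 3552  # 296 songs x 12 keys
BATCH_SIZE = 32
ACCUM_STEPS = 2
EPOCHS = 8
PEAK_LR = 3e-4

effective_batch = BATCH_SIZE * ACCUM_STEPS                          # 64
micro_batches_per_epoch = math.ceil(TRAIN_SEQUENCES / BATCH_SIZE)   # 111
steps_per_epoch = math.ceil(micro_batches_per_epoch / ACCUM_STEPS)  # 56
total_steps = steps_per_epoch * EPOCHS                              # 448
warmup_steps = steps_per_epoch                                      # 1 epoch


def lr_at(step):
    """Learning rate at a 0-based optimizer step.

    Linear warmup over the first epoch, then constant (assumed shape).
    """
    if step < warmup_steps:
        return PEAK_LR * (step + 1) / warmup_steps
    return PEAK_LR


print(effective_batch, steps_per_epoch, total_steps)
```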

Rank sweep

The released adapter is the best-rank snapshot from training the same LoRA recipe at five different ranks. Numbers are validation-set token-level metrics (no key augmentation).

Rank	val_loss	val_top1 (%)	val_top5 (%)	Δtop1 vs F1
r=4	1.3663	58.15	85.21	+14.61
r=8	1.3486	58.74	85.62	+15.20
r=16	1.3333	59.61	85.51	+16.07
r=32	1.3174	60.08	86.09	+16.54
r=64	1.3071	60.55	86.09	+17.01 ← selected

Selection criterion: minimum validation cross-entropy loss; val_top1 as tiebreaker.
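
The selection rule can be written as a single `min` over the sweep table. This is a sketch that re-applies the stated criterion to the numbers above, not the project's selection script; it also recomputes the Δtop1 column against the F1 base top-1 of 43.54% from the Evaluation section.

```python
F1_BASE_TOP1 = 43.54  # from the Evaluation table on this card

sweep = [
    {"rank": 4,  "val_loss": 1.3663, "val_top1": 58.15},
    {"rank": 8,  "val_loss": 1.3486, "val_top1": 58.74},
    {"rank": 16, "val_loss": 1.3333, "val_top1": 59.61},
    {"rank": 32, "val_loss": 1.3174, "val_top1": 60.08},
    {"rank": 64, "val_loss": 1.3071, "val_top1": 60.55},
]

# Lowest val_loss wins; a higher val_top1 breaks ties (hence the negation).
best = min(sweep, key=lambda row: (row["val_loss"], -row["val_top1"]))
print(best["rank"])  # 64

# Recompute the delta column as a consistency check.
deltas = [round(row["val_top1"] - F1_BASE_TOP1, 2) for row in sweep]
print(deltas)
```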

Evaluation

Validation token-level metrics on the genre-specific val split (37 sequences, no key augmentation).

Metric	F1 base alone	F1 + this LoRA	Δ
Top-1 accuracy (%)	43.54	60.55	+17.01
Top-5 accuracy (%)	72.82	86.09	+13.27
Cross-entropy loss	2.8653	1.3071	-1.5582
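
For readers unfamiliar with token-level metrics, the sketch below shows one common way to compute top-1 accuracy, top-k accuracy, and mean cross-entropy from per-position logits. It is a plain-Python illustration with a hypothetical `token_metrics` helper; the assumptions that padding positions are skipped and that averaging is over tokens rather than over sequences are not confirmed by this card.

```python
import math


def token_metrics(logits, targets, pad_id=0, k=5):
    """Top-1 / top-k accuracy (%) and mean cross-entropy over non-pad tokens.

    logits: list of per-position score lists (one score per vocab entry).
    targets: list of gold next-token ids, same length as logits.
    """
    counted = top1_hits = topk_hits = 0
    loss_sum = 0.0
    for scores, target in zip(logits, targets):
        if target == pad_id:
            continue  # padding does not contribute to any metric
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        top1_hits += ranked[0] == target
        topk_hits += target in ranked[:k]
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        loss_sum += log_z - scores[target]
        counted += 1
    if counted == 0:
        raise ValueError("no non-pad targets to evaluate")
    return {
        "top1": 100.0 * top1_hits / counted,
        "topk": 100.0 * topk_hits / counted,
        "loss": loss_sum / counted,
    }


# Toy example: 4-token vocabulary, id 0 is padding.
toy_logits = [[0.0, 3.0, 1.0, 0.5], [0.0, 1.0, 2.0, 0.5], [0.0, 0.0, 0.0, 0.0]]
toy_targets = [1, 1, 0]
print(token_metrics(toy_logits, toy_targets, pad_id=0, k=2))
```

In the toy example the first position is a top-1 hit, the second is only a top-2 hit, and the third is padding and is ignored.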

Real-song eval

Mean validation top-1/top-5/cross-entropy on 10 held-out real classical songs.

Model	Top-1 (%)	Top-5 (%)	val_loss
F1 base alone	49.55	81.17	2.2389
F1 + this LoRA	61.88	88.87	1.2489
Δ	+12.33	+7.70	-0.9900

Usage

Both the base repo and this LoRA repo ship the project's model.py and tokenizer.py at the repo root, so external users can load this adapter end-to-end without cloning anything from GitHub.

Required dependencies: torch, huggingface_hub, peft, safetensors.

import sys
import torch
import torch.nn as nn
from huggingface_hub import snapshot_download
from peft import PeftModel

# 1. Download the base + LoRA repos. Both bundle model.py and tokenizer.py.
base_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-ft-pop80")
lora_dir = snapshot_download(repo_id="PearlLeeStudio/TheArtist-MusicTransformer-lora-classical")
sys.path.insert(0, base_dir)  # so the next two imports resolve

from model import MusicTransformer
from tokenizer import ChordTokenizer

# 2. Extended tokenizer (351 base + 8 new genre tokens = 359).
tokenizer = ChordTokenizer(include_extra_genres=True)

# 3. Build the model at the BASE vocab size (351)
BASE_VOCAB = 351
model = MusicTransformer(
    vocab_size=BASE_VOCAB,
    d_model=512, n_heads=8, d_ff=2048, n_layers=8,
    max_seq_len=256, dropout=0.0, pad_id=tokenizer.pad_id,
)
ckpt = torch.load(f"{base_dir}/best.pt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model_state_dict"])

# 4. Grow token_emb + out_proj from 351 -> 359
def _grow_to_extended_vocab(m, new_vocab, none_id):
    d = m.token_emb.embedding_dim
    new_emb = nn.Embedding(new_vocab, d, padding_idx=m.token_emb.padding_idx)
    with torch.no_grad():
        new_emb.weight[:m.token_emb.num_embeddings] = m.token_emb.weight
        for i in range(m.token_emb.num_embeddings, new_vocab):
            new_emb.weight[i] = m.token_emb.weight[none_id]
    m.token_emb = new_emb
    new_out = nn.Linear(d, new_vocab, bias=False)
    with torch.no_grad():
        new_out.weight[:m.out_proj.out_features] = m.out_proj.weight
        for i in range(m.out_proj.out_features, new_vocab):
            new_out.weight[i] = m.out_proj.weight[none_id]
    m.out_proj = new_out

_grow_to_extended_vocab(model, tokenizer.vocab_size, tokenizer.encode_genre("none"))

ext = torch.load(f"{lora_dir}/embedding_extension.pt",
                 map_location="cpu", weights_only=False)
model.token_emb.load_state_dict(ext["token_emb_state"])
model.out_proj.load_state_dict(ext["out_proj_state"])

# 5. Apply the LoRA adapter
model = PeftModel.from_pretrained(model, f"{lora_dir}/adapter")
model.eval()

# 6. Generate a classical continuation.
song = {
    "key": "Cmaj", "time_signature": "4/4", "genre": "classical",
    "bars": [["Cmaj7"], ["Fmaj7"]],
}
prompt_ids = tokenizer.encode_sequence(song)[:-1]
ids = torch.tensor([prompt_ids])
with torch.no_grad():
    for _ in range(32):
        logits = model(ids)
        next_id = torch.multinomial(
            torch.softmax(logits[:, -1, :] / 0.8, dim=-1), 1,
        )
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_id:
            break
print(tokenizer.decode(ids[0].tolist()))
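
The generation loop above divides the last-position logits by 0.8 before the softmax. The standalone sketch below shows what that temperature does to a distribution, using toy logits rather than model output; `softmax_with_temperature` is a hypothetical helper and not part of the shipped `model.py`.

```python
import math


def softmax_with_temperature(logits, temperature=0.8):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]
for temperature in (1.0, 0.8, 0.5):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Temperatures below 1.0 concentrate probability on the highest-scoring chord token, so 0.8 trades a little diversity for more conservative continuations.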

License and use

The adapter weights are released under CC BY-NC 4.0. Permitted: research, paper replication, portfolio, demo. Not permitted: commercial deployment without separate licensing of upstream data. The classical training data (Bach chorales from the public-domain music21 corpus) is itself license-clean, but the CC BY-NC restriction applies because this LoRA is loaded at runtime on top of an F1 base that was trained on Chordonomicon.

Citation

@misc{lee2026chordmix,
  title         = {Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2605.04998},
  archivePrefix = {arXiv}
}

@misc{lee2026chordtimeseries,
  title         = {How Far Can Chord-Symbol Time-Series Adaptation Carry Genre Identity?},
  author        = {Lee, Jinju},
  year          = {2026},
  eprint        = {2606.07334},
  archivePrefix = {arXiv}
}
