# Common Lisp Macro Transformations
A fine-tuning dataset for training models to generate Common Lisp macros. Each example is a (before-code) → (macro-definition) → (after-expansion) triple.
## Idea
Instead of fine-tuning a model to "write code", fine-tune it to generate CL macros: code that writes code. The model learns to recognize AST patterns and generate transformations, not final output.
## Sources
- *Let Over Lambda*: Doug Hoyte's production macro collection (thephoeron/let-over-lambda)
- *On Lisp*: Paul Graham's classic Common Lisp macro utilities
## Dataset Structure
Each record contains:
- `instruction`: Task description with the code pattern to address
- `input`: The "before" code showing the pattern that needs a macro
- `output`: The `defmacro` form that solves it
- `category`: Macro category (capture-management, anaphoric, dispatch, control-flow, DSL, compiler-macro, efficiency, scope)
- `technique`: Comma-separated techniques used (gensym, nested-backquote, dlambda, anaphor, code-walking, symbol-macrolet, defsetf, tagbody-go, once-only, macrolet, compiler-macro, recursive-expansion)
- `complexity`: basic, intermediate, or advanced
- `quality_score`: Classifier score from 0.0 to 1.0
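A record in this schema might look like the following sketch. The field names come from the list above; the concrete values (built around the classic `aif` anaphoric macro from *On Lisp*) are illustrative, not an actual row from the dataset:

```python
import json

# Hypothetical record matching the documented schema; values are
# illustrative only (aif is a well-known anaphoric macro).
record = {
    "instruction": "Write an anaphoric macro that binds the test result to IT.",
    "input": "(let ((val (lookup key))) (if val (use val) (fallback)))",
    "output": "(defmacro aif (test then &optional else)\n"
              "  `(let ((it ,test))\n"
              "     (if it ,then ,else)))",
    "category": "anaphoric",
    "technique": "anaphor",
    "complexity": "basic",
    "quality_score": 0.9,
}

# Each record is serialized as one line of the JSONL file.
line = json.dumps(record)
print(json.loads(line)["category"])  # → anaphoric
```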
## Categories
| Category | Description | Examples |
|---|---|---|
| capture-management | Hygienic macro writing utilities | defmacro/g!, defmacro!, with-gensyms |
| anaphoric | Deliberate variable capture for conciseness | aif, alambda, alet, aand |
| dispatch | Keyword-based dispatch and inter-closure protocols | dlambda, pandoriclet, with-pandoric |
| control-flow | New evaluation semantics via macros | nlet-tail, condlet, if-match, choose |
| DSL | Domain-specific embedded languages | defunits, _f (generalized setf), dbind |
| compiler-macro | Compile-time optimization of function calls | fformat compiler macro |
| efficiency | Performance-oriented macro techniques | sortf (sorting networks) |
| scope | Lexical scope manipulation | pandoric-eval |
## Use for Fine-tuning
The data is in instruction-input-output JSONL format, ready for fine-tuning:

```python
from datasets import load_dataset

ds = load_dataset("j14i/cl-macros", split="train")
```
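Since `quality_score` is a continuous classifier score, a common preprocessing step is to drop low-scoring records before training. A minimal stdlib sketch over raw JSONL lines; the 0.8 threshold and the `filter_records` helper are assumptions for illustration, not part of the dataset:

```python
import json

def filter_records(lines, min_score=0.8, complexity=None):
    """Keep JSONL records at or above a quality threshold and,
    optionally, of a single complexity level."""
    kept = []
    for line in lines:
        rec = json.loads(line)
        if rec["quality_score"] < min_score:
            continue
        if complexity is not None and rec["complexity"] != complexity:
            continue
        kept.append(rec)
    return kept

# Two toy records: only the first clears the default threshold.
sample = [
    json.dumps({"quality_score": 0.95, "complexity": "advanced"}),
    json.dumps({"quality_score": 0.40, "complexity": "basic"}),
]
print(len(filter_records(sample)))  # → 1
```

With the `datasets` library loaded as above, the equivalent filter is `ds.filter(lambda r: r["quality_score"] >= 0.8)`.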
Target model size: ≤ 30B parameters; the domain is narrow (pattern matching on ASTs and their transformations), so a smaller model suffices.