Common Lisp Macro Transformations

A fine-tuning dataset for training models to generate Common Lisp macros. Each example is a (before-code) → (macro-definition) → (after-expansion) triple.

Idea

Instead of fine-tuning a model to "write code", fine-tune it to generate CL macros — code that writes code. The model learns to recognize AST patterns and generate transformations, not final output.

Sources

  • Let Over Lambda — Doug Hoyte's production macro collection (thephoeron/let-over-lambda)
  • On Lisp — Paul Graham's classic Common Lisp macro utilities

Dataset Structure

Each record contains:

  • instruction — Task description with the code pattern to address
  • input — The "before" code showing the pattern that needs a macro
  • output — The defmacro form that solves it
  • category — Macro category (capture-management, anaphoric, dispatch, control-flow, DSL, compiler-macro, efficiency, scope)
  • technique — Comma-separated techniques used (gensym, nested-backquote, dlambda, anaphor, code-walking, symbol-macrolet, defsetf, tagbody-go, once-only, macrolet, compiler-macro, recursive-expansion)
  • complexity — basic, intermediate, or advanced
  • quality_score — Classifier score from 0.0 to 1.0
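
For reference, a single record might look like the following. The field values are illustrative, not copied from the actual dataset; the CL code shown is the classic anaphoric-if pattern:

```python
import json

# Hypothetical record following the schema above (illustrative values only).
record = {
    "instruction": "Write a macro that binds the test result to `it` "
                   "so the then-branch can reuse it without rebinding.",
    "input": "(let ((result (lookup key table)))\n"
             "  (if result (process result) (handle-miss)))",
    "output": "(defmacro aif (test then &optional else)\n"
              "  `(let ((it ,test))\n"
              "     (if it ,then ,else)))",
    "category": "anaphoric",
    "technique": "anaphor",
    "complexity": "basic",
    "quality_score": 0.92,
}

# Each dataset row is one such object serialized as a JSONL line.
print(json.dumps(record))
```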

Categories

  • capture-management — Hygienic macro-writing utilities (defmacro/g!, defmacro!, with-gensyms)
  • anaphoric — Deliberate variable capture for conciseness (aif, alambda, alet, aand)
  • dispatch — Keyword-based dispatch and inter-closure protocols (dlambda, pandoriclet, with-pandoric)
  • control-flow — New evaluation semantics via macros (nlet-tail, condlet, if-match, choose)
  • DSL — Domain-specific embedded languages (defunits, _f (generalized setf), dbind)
  • compiler-macro — Compile-time optimization of function calls (fformat compiler macro)
  • efficiency — Performance-oriented macro techniques (sortf (sorting networks))
  • scope — Lexical scope manipulation (pandoric-eval)
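
The category and quality_score fields make it easy to slice the data, e.g. to train on one category at a time. A minimal sketch, using plain dicts as stand-ins for loaded rows (the real split supports the same predicate via `ds.filter(...)`; the 0.8 threshold is an arbitrary choice for illustration):

```python
# Stand-in rows; a loaded `datasets` split yields dicts with the same keys.
records = [
    {"category": "anaphoric", "complexity": "basic", "quality_score": 0.90},
    {"category": "dispatch", "complexity": "advanced", "quality_score": 0.60},
    {"category": "anaphoric", "complexity": "advanced", "quality_score": 0.85},
]

def keep(r, category="anaphoric", min_score=0.8):
    """Select high-quality examples from a single macro category."""
    return r["category"] == category and r["quality_score"] >= min_score

subset = [r for r in records if keep(r)]
print(len(subset))  # 2 of the 3 stand-in rows pass the filter
```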

Use for Fine-tuning

The data is in instruction-input-output JSONL format, ready for fine-tuning:

from datasets import load_dataset  # pip install datasets
ds = load_dataset("j14i/cl-macros", split="train")

Target model size: ≤ 30B parameters (the domain is narrow — pattern matching on ASTs and transformations — so a smaller model suffices).
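
Most SFT pipelines expect each example collapsed into a single text field. A sketch using an Alpaca-style template — a common convention for instruction-input-output data, not something this dataset mandates:

```python
# Assumed prompt layout; adjust to whatever template your trainer expects.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def to_text(record):
    """Render one instruction-input-output record as a single training string."""
    return PROMPT_TEMPLATE.format(**record)

# Illustrative record (output truncated for brevity).
example = {
    "instruction": "Wrap this pattern in a macro.",
    "input": "(if x (f x) nil)",
    "output": "(defmacro aif ...)",
}
print(to_text(example).splitlines()[0])  # -> ### Instruction:
```

With `datasets`, the same transform can be applied to the whole split via `ds.map(lambda r: {"text": to_text(r)})`.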
