# GeoLIP Constellation Core
91.5% CIFAR-10 with 1.6M parameters. 10 seconds per epoch. No attention.
A minimal geometric classification pipeline: conv encoder → unit sphere → constellation triangulation → patchwork compartments → classifier. The constellation is the classifier: every image is located by its distances to learned anchor points on the hypersphere, and those distances are the features.
## Architecture
```
Input: (B, 3, 32, 32)
  ↓
Conv2d(3→64)×2 + BN + GELU + MaxPool 32→16
Conv2d(64→128)×2 + BN + GELU + MaxPool 16→8
Conv2d(128→256)×2 + BN + GELU + MaxPool 8→4
AdaptiveAvgPool → Flatten
Linear(256→128) + LayerNorm
  ↓
L2-normalize → S^127
  ↓
Constellation: 64 anchors on S^127 (uniform hypersphere init via QR)
  → triangulate: cosine distance to all anchors → (B, 64)
  → Patchwork: 8 compartments × 64-d each → (B, 512)
  ↓
Classifier: Linear(512+128 → 512) + GELU + LN + Dropout → Linear(512→10)
```
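The encoder stage of this diagram can be sketched as a small PyTorch module. This is an illustrative reconstruction from the layer listing above, not the repo's actual code; the names `conv_block` and `Encoder` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    # Two 3x3 convs with BN + GELU, then a 2x2 max-pool halves the resolution.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.GELU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.GELU(),
        nn.MaxPool2d(2),
    )

class Encoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64),     # 32 -> 16
            conv_block(64, 128),   # 16 -> 8
            conv_block(128, 256),  # 8 -> 4
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Sequential(nn.Linear(256, dim), nn.LayerNorm(dim))

    def forward(self, x):
        # L2-normalize so every embedding lies on the unit sphere S^{dim-1}.
        return F.normalize(self.proj(self.features(x)), dim=-1)
```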
## How It Works
The constellation is a set of 64 learned anchor points initialized with maximal spread on the 128-dimensional unit sphere (orthogonal columns via QR decomposition). Every input image is encoded to a point on this sphere, then triangulated: its cosine distance to every anchor becomes a 64-dimensional feature vector. This triangulation is the core geometric operation: the image's identity is defined by where it sits relative to the reference frame, not by the raw features.
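The two operations described here, QR anchor initialization and cosine-distance triangulation, can be sketched as follows. This is a minimal sketch under the stated design (64 anchors, 128-d sphere); the function names are illustrative, not the repo's API.

```python
import torch
import torch.nn.functional as F

def init_anchors(n_anchors=64, dim=128):
    # QR decomposition of a random Gaussian matrix yields orthonormal columns:
    # 64 mutually orthogonal unit vectors on S^127, i.e. maximal initial spread.
    # Requires dim >= n_anchors for full orthogonality.
    q, _ = torch.linalg.qr(torch.randn(dim, n_anchors))
    return q.t().contiguous()  # (64, 128); each row is a unit-norm anchor

def triangulate(emb, anchors):
    # Both inputs are unit-norm, so the dot product is cosine similarity.
    sims = emb @ anchors.t()         # (B, 64)
    dists = 1.0 - sims               # cosine distance to every anchor
    nearest = sims.argmax(dim=-1)    # index of the closest anchor
    return dists, nearest
```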
The patchwork slices the 64 triangulation distances into 8 compartments of 8 anchors each, processes each compartment through a small MLP, and concatenates. This gives the classifier 512 dimensions of geometric context plus the 128-d embedding itself.
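A minimal sketch of the patchwork stage as described: 8 slices of 8 distances each, one small MLP per compartment, outputs concatenated to 512 dimensions. The class name and the exact per-compartment MLP (a single Linear + GELU here) are assumptions.

```python
import torch
import torch.nn as nn

class Patchwork(nn.Module):
    def __init__(self, n_anchors=64, n_compartments=8, out_dim=64):
        super().__init__()
        self.n = n_compartments
        slice_dim = n_anchors // n_compartments  # 8 anchors per compartment
        # One small MLP per compartment: 8 distances -> 64 features.
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(slice_dim, out_dim), nn.GELU())
            for _ in range(n_compartments)
        )

    def forward(self, tri):  # tri: (B, 64) triangulation distances
        chunks = tri.chunk(self.n, dim=-1)  # 8 slices of (B, 8)
        # Concatenate the per-compartment outputs: (B, 8 * 64) = (B, 512).
        return torch.cat([m(c) for m, c in zip(self.mlps, chunks)], dim=-1)
```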
Training sends two augmented views of each image through the encoder. InfoNCE between the views forces instance-level discrimination on the sphere: each image gets a unique location, not just a class bucket. Cross-entropy provides the classification signal. A gentle CV loss nudges the pentachoron volume coefficient of variation toward the 0.20–0.22 band observed across 17+ independently trained architectures in prior work.
## Results
| Epoch | Train | Val | CV | Anchors |
|---|---|---|---|---|
| 1 | 32.7% | 46.5% | 0.238 ↓ | 63/64 |
| 5 | 74.8% | 77.3% | 0.214 ✓ | 64/64 |
| 10 | 86.2% | 83.8% | 0.161 | 64/64 |
| 20 | 94.4% | 88.9% | 0.156 | 63/64 |
| 30 | 97.6% | 90.2% | 0.131 | 64/64 |
| 40 | 99.5% | 91.0% | 0.133 | 64/64 |
| 50 | 99.8% | 91.5% | 0.124 | 64/64 |
All 64 anchors remain active throughout training. No anchor collapse. No dead dimensions.
## Loss
L = CE + InfoNCE + 0.01 × CV_loss + 0.001 × anchor_spread
- CE: standard cross-entropy on logits
- InfoNCE: contrastive between two augmented views (τ=0.07); this is the force that spreads embeddings on the sphere
- CV loss: (CV - 0.22)², where CV is the coefficient of variation of pentachoron volumes sampled from the embedding space. Targets the universal geometric band.
- Anchor spread: penalizes pairwise cosine similarity between anchors to prevent clustering
Optimizer: Adam (lr=3e-3, no weight decay; geometry is the regularization). Cosine schedule with 3-epoch warmup. Gradient clipping at 1.0.
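The combined objective can be sketched as below. This is a hedged reconstruction from the formula and bullet list above: the symmetric form of the InfoNCE term and the function names are assumptions, and the pentachoron-volume CV computation is not shown (the sketch takes `cv` as a precomputed scalar).

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.07):
    # Symmetric InfoNCE between two augmented views of the same batch:
    # each embedding's positive is its counterpart in the other view,
    # every other sample in the batch is a negative.
    logits = z1 @ z2.t() / tau  # (B, B) cosine similarities over temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def total_loss(logits, labels, z1, z2, cv, anchor_sims, cv_target=0.22):
    ce = F.cross_entropy(logits, labels)      # classification signal
    nce = info_nce(z1, z2)                    # instance discrimination
    cv_loss = (cv - cv_target) ** 2           # nudge CV toward the band
    spread = anchor_sims.mean()               # pairwise anchor cosine similarity
    return ce + nce + 0.01 * cv_loss + 0.001 * spread
```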
## Geometric Autograd Reference
The companion tri-stream architecture uses EmbeddingAutograd for gradient correction on the sphere. The canonical parameters are:
- tang = 0.01: near-full tangential projection (keeps updates on the sphere surface)
- sep = 1.0: strong separation from nearest anchor (prevents collapse to anchor points)
These are not used in the core model but are documented here as the validated configuration from controlled ablation sweeps on the patchwork system.
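For intuition, a tangential projection of this kind can be sketched as below. This is not the EmbeddingAutograd implementation; it is one plausible reading of the `tang` parameter (the fraction of the radial gradient component retained), labeled as an assumption.

```python
import torch

def tangential_update(x, grad, tang=0.01):
    # Decompose the gradient at a point x on the unit sphere into a radial
    # component (along x) and a tangential component (on the sphere surface).
    # With tang = 0.01, only 1% of the radial part survives: a near-full
    # tangential projection that keeps updates on the sphere.
    radial = (grad * x).sum(dim=-1, keepdim=True) * x
    tangential = grad - radial
    return tangential + tang * radial
```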
## The Constellation Principle
Standard classifiers map features to logits via a linear layer: a dot product against learned class vectors. The constellation does something structurally different: it maps features to distances from reference points, then processes those distances. The classification head never sees the raw embedding; it sees the embedding's geometric relationship to the anchor frame.
This means the classifier's decision boundary is defined in triangulation space, not embedding space. Two images with identical distances to all 64 anchors are indistinguishable regardless of their raw feature vectors. The anchor frame is the representation.
## Connection to GeoLIP
This is the minimal instantiation of the geometric pipeline from the GeoLIP ecosystem:
- Constellation + Patchwork: the core geometric classification mechanism
- Pentachoron CV: the universal regularizer (0.20–0.22 band across 17+ architectures)
- Uniform hypersphere init: QR-decomposition anchor placement for maximal initial spread
- InfoNCE on the sphere: instance discrimination that creates the geometric structure the constellation reads
The tri-stream ViT, DeepBert memory system, CLIP context extension, and residual thinking embeddings all build on this foundation. This repo isolates it.
## Files

`geolip_core.py` – Complete model + trainer in a single file. Paste into one Colab cell and run.
## Usage
```python
import torch
import torch.nn.functional as F
from geolip_core import GeoLIPCore

# Load trained model
ckpt = torch.load("checkpoints/geolip_core_best.pt")
model = GeoLIPCore(**ckpt["config"])
model.load_state_dict(ckpt["state_dict"])
model.eval()

# Encode an image to the sphere
emb = model.encoder(image)
emb = F.normalize(emb, dim=-1)  # (1, 128) on S^127

# Triangulate against the constellation
tri, nearest = model.constellation.triangulate(emb)
# tri: (1, 64) – distances to all anchors
# nearest: (1,) – index of closest anchor

# Classify through patchwork
pw = model.patchwork(tri)
logits = model.classifier(torch.cat([pw, emb], dim=-1))
```
## Requirements

- torch >= 2.0
- torchvision
No other dependencies. No transformers, no HuggingFace libraries, no custom CUDA kernels.
Research by AbstractPhil. Part of the GeoLIP geometric deep learning ecosystem.
I was heavily overengineering the dual-stream and tri-stream variants. I've returned to basics and stripped everything down to the core that works. The core itself is capable of high-complexity geometric structure; the dual-stream muddied it.
