
Pixel-1: From-Scratch Text-to-Image Generator 🎨

Pixel-1 is a lightweight, experimental text-to-image model built and trained entirely from scratch. Unlike many modern generators that rely on massive pre-trained diffusion backbones, Pixel-1 explores the potential of a compact architecture to understand and render complex semantic prompts.

🚀 The Achievement

Pixel-1 was designed to prove that even a small model can achieve high logical alignment with user prompts. It successfully renders complex concepts such as window bars, fence shadows, and specific color contrasts: features usually reserved for much larger models.

Key Features:

  • Built from Scratch: The Generator architecture (Upsampling, Residual Blocks, and Projections) was designed and trained without pre-trained image weights.
  • High Prompt Adherence: Exceptional ability to "listen" to complex instructions (e.g., "Window with metal bars and fence shadow").
  • Efficient Architecture: Optimized for fast inference and training on consumer-grade GPUs (like Kaggle's T4).
  • Latent Understanding: Uses a CLIP-based text encoder to bridge the gap between human language and pixel space.
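The "bridge" between language and pixel space can be sketched as a projection from CLIP's pooled text embedding (768-d for clip-vit-large-patch14) into a small spatial latent that the Generator upsamples. The class name, channel count, and spatial size below are illustrative assumptions, not the actual Pixel-1 layers:

```python
import torch
import torch.nn as nn

class TextToLatent(nn.Module):
    """Hypothetical text-to-latent bridge: a linear projection reshaped
    into a (channels, spatial, spatial) feature map. Sizes are assumed."""
    def __init__(self, embed_dim=768, channels=256, spatial=8):
        super().__init__()
        self.channels = channels
        self.spatial = spatial
        self.proj = nn.Linear(embed_dim, channels * spatial * spatial)

    def forward(self, emb):                      # emb: (B, embed_dim)
        x = self.proj(emb)                       # (B, C * H * W)
        return x.view(-1, self.channels, self.spatial, self.spatial)

emb = torch.randn(1, 768)                        # stands in for CLIP's pooler_output
latent = TextToLatent()(emb)
print(latent.shape)                              # torch.Size([1, 256, 8, 8])
```

From here, the upsampling stack described in the Architecture section turns this latent into the final image.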

๐Ÿ—๏ธ Architecture

The model uses a series of Transposed Convolutional layers combined with Residual Blocks to upsample a latent text vector into a 128x128 image.

  • Encoder: CLIP (OpenAI/clip-vit-large-patch14)
  • Decoder: Custom CNN-based Generator with Skip Connections
  • Loss Function: L1/MSE transition
  • Resolution: 128x128 (v1)
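The upsampling path above can be sketched as a stack where each ConvTranspose2d doubles the spatial size, so an 8x8 latent reaches 128x128 in four stages, with a residual block after each stage. Channel counts and stage structure here are assumptions for illustration, not the exact Pixel-1 Generator:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """A plain residual block: two 3x3 convs plus a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

def make_generator():
    # 8x8 -> 16 -> 32 -> 64 -> 128: each ConvTranspose2d (k=4, s=2, p=1)
    # doubles the resolution and halves the channel count (256 -> 16).
    layers, ch = [], 256
    for _ in range(4):
        layers += [nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),
                   nn.ReLU(), ResBlock(ch // 2)]
        ch //= 2
    layers += [nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh()]  # RGB in [-1, 1]
    return nn.Sequential(*layers)

img = make_generator()(torch.randn(1, 256, 8, 8))
print(img.shape)                                 # torch.Size([1, 3, 128, 128])
```

The Tanh output in [-1, 1] matches the `(out + 1) / 2` rescaling used in the inference snippet below.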

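One way to read the "L1/MSE transition" is a blended objective that starts with L1 (robust, tends to keep edges sharp) and shifts toward MSE as training progresses. The linear schedule below is an assumption; the card does not specify the exact rule:

```python
import torch
import torch.nn.functional as F

def blended_loss(pred, target, step, total_steps):
    """Assumed linear blend: alpha = 0 gives pure L1, alpha = 1 pure MSE."""
    alpha = min(step / total_steps, 1.0)
    return (1 - alpha) * F.l1_loss(pred, target) + alpha * F.mse_loss(pred, target)

pred = torch.zeros(2, 3, 8, 8)
target = 2 * torch.ones(2, 3, 8, 8)
print(blended_loss(pred, target, step=0, total_steps=100).item())    # 2.0 (pure L1)
print(blended_loss(pred, target, step=100, total_steps=100).item())  # 4.0 (pure MSE)
```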
🖼️ Samples & Prompting

Pixel-1 shines when given high-contrast, descriptive prompts.

Recommended Prompting Style:

"Window with metal bars and fence shadow, high contrast, vivid colors, detailed structure"

Observations: While the current version (v1) produces stylistic, slightly "painterly" or "pixelated" results, its spatial reasoning is remarkably accurate, correctly placing shadows and structural elements according to the text.


🛠️ How to use

import torch
import matplotlib.pyplot as plt
import numpy as np
import os
import shutil
from transformers import AutoTokenizer, CLIPTextModel, AutoModel

def generate_fixed_from_hub(prompt, model_id="TopAI-1/Pixel-1"):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"🚀 Working on {device}...")

    # 1. Clear the cache to make sure the latest fixes are pulled from the Hub
    cache_path = os.path.expanduser(f"~/.cache/huggingface/hub/models--{model_id.replace('/', '--')}")
    if os.path.exists(cache_path):
        print("🧹 Clearing old cache to fetch your latest fixes...")
        shutil.rmtree(cache_path)

    # 2. Load the CLIP tokenizer and text encoder
    tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)

    # 3. Load the model and config automatically from the Hub;
    # thanks to auto_map in config.json, transformers resolves the custom classes itself
    print("📥 Downloading architecture and weights directly from Hub...")
    model = AutoModel.from_pretrained(
        model_id, 
        trust_remote_code=True, 
        force_download=True
    ).to(device)
    
    model.eval()
    print("✅ Model loaded successfully!")

    # 4. Generate: encode the prompt, then decode the pooled embedding to an image
    print(f"🎨 Generating: {prompt}")
    inputs = tokenizer(prompt, padding="max_length", max_length=77, truncation=True, return_tensors="pt").to(device)
    
    with torch.no_grad():
        emb = text_encoder(inputs.input_ids).pooler_output
        out = model(emb)

    # 5. Display: rescale from [-1, 1] to [0, 1] and show the image
    img = (out.squeeze(0).cpu().permute(1, 2, 0).numpy() + 1.0) / 2.0
    plt.figure(figsize=(8, 8))
    plt.imshow(np.clip(img, 0, 1))
    plt.axis('off')
    plt.show()

# Run
generate_fixed_from_hub("Window with metal bars and fence shadow")
Model size: 19.2M parameters (F32, Safetensors)