# AI in My Daily Work — Episode 6: Reverse-Engineering Design Systems with 4 AI Agents, a Rule-Based Color Classifier & a Free Rule Engine

## A Semi-Automated Workflow: From Website URL to Figma-Ready Design System (v3.2)

*How I built a system that extracts any website's design tokens, classifies colors deterministically, audits them like a senior design team, and generates a visual spec in Figma — for ~$0.003 per run.*

[IMAGE: Hero - Complete workflow showing Website -> AI Agents -> Figma Visual Spec]

---

## The Problem Every Designer Knows

I've been managing design systems for consumer-facing apps for over 10 years. And there's one task that never gets easier: **auditing existing websites to extract their design tokens.**

Whether it's analyzing a competitor, inheriting a legacy project, or bringing consistency to a sprawling multi-brand portfolio, the process is always the same:

1. Open DevTools
2. Inspect elements one by one
3. Copy hex codes to a spreadsheet
4. Manually check contrast ratios
5. Try to identify the type scale (is it 1.2? 1.25? Random?)
6. Repeat for spacing, shadows, border radius...
7. Spend days organizing into a coherent system
8. Manually recreate in Figma as variables
9. Manually build a visual spec page

I've done this dozens of times. It takes **3-5 days** for a single website. And by the time you're done, something has already changed.

I wanted a system that could think like a design team:

- a **data engineer** extracting and normalizing every token
- a **color scientist** classifying colors by actual CSS usage (not guessing)
- an **analyst** identifying brand colors and patterns
- a **senior reviewer** benchmarking against industry standards
- and a **chief architect** synthesizing everything into action

So I built one. Three versions later, here's what works.
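Step 4 alone — checking contrast ratios by hand — is exactly the kind of mechanical work that begs for automation. As a taste of what the rule engine later does for free, here's a minimal sketch of the WCAG 2.x contrast math; the helper names are mine, not the project's:

```python
# WCAG 2.x relative luminance + contrast ratio -- the math behind step 4.
# Illustrative sketch; function names are not from the project's codebase.

def relative_luminance(hex_color: str) -> float:
    """Linearize each sRGB channel, then weight per WCAG 2.x."""
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4))

    def linearize(c: float) -> float:
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1:1 to 21:1."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio("#000000", "#ffffff"), 1))  # 21.0
print(contrast_ratio("#767676", "#ffffff") >= 4.5)     # True: passes AA body text
```

Deterministic math like this is precisely what you never want an LLM to do — which is the whole thesis of the architecture below.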
---

## The Solution (In One Sentence)

I built a 3-layer system — deterministic extraction + rule-based color classification + 4 AI agents — that acts like an entire design audit team. It outputs W3C DTCG-compliant JSON that feeds directly into Figma via a custom plugin that auto-generates a visual spec page. Cost: ~$0.003 per analysis.

---

## The Complete Workflow

[IMAGE: Full workflow diagram showing all 8 steps]

Here's the end-to-end process I now use:

```
+--------------------------------------------------------------+
|                  MY DESIGN SYSTEM WORKFLOW                   |
+--------------------------------------------------------------+

STEP 1: Extract AS-IS (AI Agent App)
----------------------------------------
* Enter website URL
* AI auto-discovers pages
* Extracts colors, typography, spacing, shadows, radius
* Normalizes: dedup, sort, name (radius, shadows, colors)
* Color Classifier: deterministic role assignment
* Rule Engine: WCAG + type scale + spacing grid
* Download AS-IS JSON (W3C DTCG v1 format)
        |
        v
STEP 2: Import to Figma (My Plugin)
----------------------------------------
* Open Figma
* Upload AS-IS JSON via custom plugin
* Plugin auto-detects DTCG format
* Creates Variables + Paint/Text/Effect Styles
* Auto-generates Visual Spec Page
        |
        v
STEP 3: View AS-IS Visual Spec (Figma)
----------------------------------------
* Typography (Desktop + Mobile) with AA badges
* Colors organized by semantic role
* Spacing scale, Radius display, Shadow elevation
* Review what exists before modernizing
        |
        v
STEP 4: AI Analysis (AI Agent App - Stage 2)
----------------------------------------
* Free Rule Engine: WCAG, type scale, spacing grid
* AURORA: Brand color identification (advisory)
* ATLAS: Industry benchmark comparison (8 systems)
* SENTINEL: Best practices audit with priorities
* NEXUS: Final synthesis resolving all contradictions
        |
        v
STEP 5: Accept/Reject Suggestions (AI Agent App)
----------------------------------------
* Review each recommendation
* Accept or Reject individually
* I stay in control of what changes
        |
        v
STEP 6: Export TO-BE (AI Agent App - Stage 3)
----------------------------------------
* Generate modernized TO-BE JSON (DTCG compliant)
* Contains accepted improvements
* Download new JSON file
        |
        v
STEP 7: Import TO-BE to Figma (My Plugin)
----------------------------------------
* Upload TO-BE JSON via same plugin
* Figma Variables update with new values
* New Visual Spec generated for comparison
        |
        v
STEP 8: Compare AS-IS vs TO-BE (Figma)
----------------------------------------
* Side-by-side visual spec pages
* See exactly what changed and why
* Ready to use in production
```

**Total time:** ~15 minutes (vs 3-5 days manual)

---

## Architecture Overview: Three Layers, One Clear Authority Chain

My first attempt (V1) made a classic mistake: **I used a large language model for everything.** V1 cost $0.50-1.00 per run, took 15+ seconds for basic math, and LLMs hallucinated contrast ratios.

V2 split the work into rules vs AI. Better, but a new problem emerged: **three competing naming systems** for colors. The normalizer used word-based shades ("blue.light"), the export layer used numeric shades ("blue.500"), and the LLM agent used whatever it felt like ("brand.primary"). The output in Figma was chaos.

V3 fixed this with a clear authority chain and a dedicated color classifier:

> **Rule-based code handles certainty. LLMs handle ambiguity.
> And there's ONE naming authority.**

[IMAGE: Architecture diagram - Layer 1 (Extraction) -> Layer 2 (Classification + Analysis) -> Layer 3 (4 Named Agents)]

```
+--------------------------------------------------+
| LAYER 1: EXTRACTION + NORMALIZATION (Free)       |
| +- Crawler + 7-Source Extractor (Playwright)    |
| +- Normalizer: colors, radius, shadows, typo    |
| |  +- Radius: parse, deduplicate, sort, name    |
| |  +- Shadows: parse, sort by blur, name        |
| |  +- Colors: hue + numeric shade (50-900)      |
| +- Firecrawl: deep CSS parsing (bypass CORS)    |
+--------------------------------------------------+
| LAYER 2: CLASSIFICATION + RULE ENGINE (Free)    |
| +- Color Classifier (815 lines, deterministic)  |
| |  +- CSS evidence -> category -> token name    |
| |  +- Capped: brand(3), text(3), bg(3), etc.    |
| |  +- Every decision logged with evidence       |
| +- WCAG Contrast Checker (actual FG/BG pairs)   |
| +- Type Scale Detection (ratio math)            |
| +- Spacing Grid Analysis (GCD math)             |
| +- Color Statistics (deduplication)             |
+--------------------------------------------------+
| LAYER 3: 4 AI AGENTS (~$0.003)                  |
| +- AURORA - Brand Advisor (Qwen 72B)            |
| +- ATLAS - Benchmark Advisor (Llama 70B)        |
| +- SENTINEL - Best Practices Audit (Qwen 72B)   |
| +- NEXUS - Head Synthesizer (Llama 70B)         |
+--------------------------------------------------+
```

### The Naming Authority Chain (V3's Key Innovation)

This was the single hardest problem to solve. In V2, three systems produced color names:

| System | Convention | Example | Problem |
|--------|-----------|---------|---------|
| Normalizer | Word shades | `color.blue.light` | Inconsistent |
| Export function | Numeric shades | `color.blue.500` | Conflicts |
| AURORA LLM | Whatever it wants | `brand.primary` | Unpredictable |

**Result in Figma: `blue.300`, `blue.dark`, `blue.light`, `blue.base` in the same export. Unusable.**

V3 established a clear chain:

```
1. Color Classifier (PRIMARY) - deterministic, covers ALL colors
   +- Rule-based: CSS evidence -> category -> token name
   +- 100% reproducible, logged with evidence

2. AURORA LLM (SECONDARY) - semantic role enhancer ONLY
   +- Can promote "color.blue.500" -> "color.brand.primary"
   +- CANNOT rename palette colors
   +- Only brand/text/bg/border/feedback roles accepted

3. Normalizer (FALLBACK) - preliminary hue+shade names
   +- Only used if classifier hasn't run yet
```

One naming authority. No conflicts. Clean Figma output every time.

---

## Layer 1: Extraction + Normalization (No LLM)

### Extraction: 8 Sources

A Playwright-powered browser visits each page at **two viewports** (1440px desktop + 375px mobile), and design tokens are extracted from **8 sources** (7 inside Playwright, plus Firecrawl for deep CSS):

[IMAGE: 8 Extraction Sources diagram]

```
--- Playwright (7 internal sources) ---
Source 1: Computed Styles -> What the browser actually renders
Source 2: CSS Variables   -> --primary-color, --spacing-md
Source 3: Inline Styles   -> style="color: #06b2c4"
Source 4: SVG Attributes  -> fill, stroke colors
Source 5: Stylesheets     -> CSS rules, hover states, pseudo-elements
Source 6: External CSS    -> Fetched & parsed CSS files
Source 7: Page Scan       -> Brute-force regex on style blocks

--- Separate deep extraction ---
Source 8: Firecrawl       -> Deep CSS parsing (bypasses CORS)
```

### Normalization: Not Just Dedup

The normalizer in V2 was a major pain point. Colors got named, but radius and shadows were passed through raw. Multi-value CSS like `"0px 0px 16px 16px"` became garbage tokens. Percentage values like `"50%"` couldn't be used in Figma.

V3's normalizer actually processes everything:

**Colors:** Deduplicate by exact hex + RGB distance < 30. Assign a hue family + numeric shade (50-900). Never use words like "light" or "dark" for shades. Add role hints from CSS context for the classifier.
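The color dedup rule above (exact hex plus RGB distance < 30) is simple enough to sketch. This is my own minimal illustration of the idea, not the project's code; the helper names and the greedy keep-first strategy are assumptions:

```python
import math

def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    h = hex_color.lstrip("#").lower()
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def dedupe_colors(hex_colors: list[str], threshold: float = 30.0) -> list[str]:
    """Keep a color only if its Euclidean RGB distance to every color
    already kept is >= threshold. Exact-hex duplicates fall out
    automatically, since their distance is 0."""
    kept: list[str] = []
    for color in hex_colors:
        rgb = hex_to_rgb(color)
        if all(math.dist(rgb, hex_to_rgb(k)) >= threshold for k in kept):
            kept.append(color)
    return kept

print(dedupe_colors(["#06b2c4", "#07b3c5", "#06b2c4", "#ffffff"]))
# ['#06b2c4', '#ffffff'] -- the near-duplicate teal and the exact repeat collapse
```

Euclidean RGB distance is crude compared to perceptual metrics like CIEDE2000, but it is fast, dependency-free, and deterministic — which matters more here than colorimetric precision.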
**Radius:** Parse multi-value shorthand (take the max), convert rem/em/% to px, deduplicate by resolved value, sort by size, and name semantically (none/sm/md/lg/xl/2xl/full). A raw extraction of `["8px", "0px 0px 16px 16px", "50%", "1rem"]` becomes:

```
radius.sm   = 4px     (from 0.25rem context)
radius.md   = 8px
radius.xl   = 16px    (max of 0 0 16 16)
radius.full = 9999px  (from 50%)
```

**Shadows:** Parse CSS shadow strings into components (offset, blur, spread, color). Filter out spread-only (border simulation) and inset shadows. Sort by blur radius. Deduplicate by blur bucket. Name by elevation (xs/sm/md/lg/xl). If fewer than 5 shadows are extracted, interpolate to always produce 5 elevation levels.

**Cost: $0.00 | Runtime: ~90 seconds**

---

## Layer 2: Color Classification + Rule Engine (No LLM)

### The Color Classifier (V3's Biggest Addition)

This is 815 lines of deterministic code that replaced what AURORA used to do badly.

**The problem it solves:** Given 30+ extracted colors, which is the brand primary? Which are text colors? Which are backgrounds? An LLM can reason about this, but inconsistently: the same color might be called "brand.primary" in one run and "accent.main" in the next. And it only named 10 colors, leaving the rest in chaos.

The classifier uses CSS evidence:

```
CSS Evidence -> Category: background-color on