Fluent Flex

Token System · Microsoft Fluent Design System · M365

The System Under the System

Every downstream artifact — component spec, theme file, AI-generated UI — is an output of the same token architecture. Design intent lives in the token name, so the system scales to new platforms, new brands, and new AI consumers without touching component logic.

  • 460 → 130: tokens after architectural refactor
  • 3 tiers: primitives → generics → group tokens
  • 7 families: color · type · spacing · radius · stroke · elevation · motion
  • 3 platforms: iOS · Android · Web
  • 15 creative individuals who shaped the system
  • B2B → B2C: built for M365 enterprise Copilot, scaled to consumer

Fluent Flex is the token foundation for M365 — the layer between raw design values and the components, themes, and AI systems that consume them. It's not a component library. It's the architecture that makes every other artifact in the system maintainable, themeable, and extendable without rewriting what came before.

The central insight: different types of design values should be named in the style that matches how that value is reasoned about. Spacing uses algorithmic multipliers (-base-200) because the math relationship matters. Color uses role-based names (foreground-neutral-primary) because the intent matters more than the hex. Elevation uses ordinal words (lowest → highest) because position on a scale matters, not the math between steps. Each naming convention is a different answer to the same question: what does a reader — human or model — need to infer the right thing without a lookup table?

That legibility is what allowed a Microsoft AI team to adopt Flex over Tailwind. It's what allows themes to swap at runtime without touching components. And it's what makes the system a foundation for AI-generated UI, not just a design tool.

At the foundational level, we focused on core principles and consistent, predictable rulesets. Across every token category, we moved from selecting values to defining relationships, establishing logical and mathematical foundations and guardrails that keep designer and AI-driven outputs consistent and brand-aligned.

Three tiers. One direction of flow.

Primitives prmt-*

Raw values. What something is, not how it's used. Color stops on a ramp, spacing steps, font sizes. Stable across themes.

prmt-color-fog-26 · prmt-font-size-16 · prmt-spacing-4

Generics --gnrc-*

Semantic intent. What something does in the experience. These swap values per theme — light, dark, brand — while components stay unchanged.

--gnrc-color-foreground-neutral-primary · --gnrc-spacing-component-base-200 · --gnrc-border-radius-base-300

Group tokens --grp-*

Product override layer. Teams deviate from the system for specific component groups without breaking theme interoperability.

--grp-button-color-background-brand-soft-hover

Themes swap generic token values. Primitives stay constant. Components reference generics. A theme switch updates the entire UI — no component-level changes.
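
The one-directional flow above can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not the Flex implementation: the hex values, the theme mappings, the second primitive name (prmt-color-fog-98), and the resolve and render helpers are all hypothetical. Only the tier prefixes and the token names quoted in this section come from the system itself. Group tokens are left out to keep the sketch short.

```python
# Primitives: raw values, identical in every theme.
# Both hex values and the name "prmt-color-fog-98" are assumptions.
PRIMITIVES = {
    "prmt-color-fog-26": "#3d3d3d",
    "prmt-color-fog-98": "#fafafa",
}

# Themes: the only place where a generic token changes what it points to.
THEMES = {
    "light": {"--gnrc-color-foreground-neutral-primary": "prmt-color-fog-26"},
    "dark": {"--gnrc-color-foreground-neutral-primary": "prmt-color-fog-98"},
}

# Components reference generics only, never primitives or raw values.
COMPONENT_STYLES = {
    "label": {"color": "--gnrc-color-foreground-neutral-primary"},
}


def resolve(generic_token: str, theme: str) -> str:
    """Follow a generic token to its primitive, then to the raw value."""
    primitive = THEMES[theme][generic_token]
    return PRIMITIVES[primitive]


def render(component: str, theme: str) -> dict[str, str]:
    """Resolve every style property of a component for one theme."""
    styles = COMPONENT_STYLES[component]
    return {prop: resolve(token, theme) for prop, token in styles.items()}


if __name__ == "__main__":
    # The component definition is untouched; only the theme argument changes.
    print(render("label", "light"))
    print(render("label", "dark"))
```

Switching themes changes the lookup table, not the component definition, which is the property the three tiers are built to protect.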

Before

Others tried

Semantic tokens had been attempted — at Microsoft and across the industry. Teams built naming conventions, established tiers, wrote documentation. Some progress happened. None of it fully held at scale. Themes broke. Naming became opaque. Maintenance compounded.

The work

We watched. We learned.

Rather than starting from scratch, we studied what came before — what worked, what didn't, and where things broke under real pressure. The failure modes weren't random. They were architectural. Every prior attempt ran into the same wall: the system worked until it needed to scale. Once real teams with real diverging needs came in, the structure couldn't flex to accommodate them.

The result

Simpler. More meaningful.

Fewer tokens — not because the system does less, but because they're smarter and named better. Critically, the naming was designed to be extensible: teams can add values as they scale without breaking the existing architecture or diverging into their own dialect.

A few modernizations

  • Color: Ramps built in OKLCH, where contrast holds across every hue by math, not by eye. Hover and pressed states are calculated from the base: change one value, and the interactions follow.
  • Typography: Three ramps cover every need (functional UI, content, and editorial). Line heights are percentage modifiers that scale proportionally with font size, not hardcoded values along for the ride.
  • Spacing & radius: Built to flex without fracturing. Scale steps and radius tiers are designed to stretch across very different team needs.

The system is more intuitive to read, easier to adopt, and meaningful enough that AI models can reason over token names without a lookup table.

Teams message from Kyle Anderson: "loving the flex stuff"

Each family solves a different design problem. The decisions inside each one are where it gets interesting.

Chapter 01

Color

A color system built on a model, not a mood board. OKLCH ramps where contrast holds across every hue by math, not manual correction. Hover and pressed states aren't tokenized — they're calculated from the base, so interaction states never need to be maintained separately. Coverage that finally closes the gaps Fluent 2 left open: materiality, editorial expression, data viz, and accent needs. And a clear point of view: monochromatic brand theming as the default design language, not a special case.

algorithmic interactions · monochromatic by default · materiality · extensible theming
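
To make "calculated from the base" concrete, here is a minimal sketch of deriving hover and pressed states by shifting OKLCH lightness. The shift amounts, the sample brand color, and the helper names are assumptions for illustration. The text above does not state the formula Flex actually uses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Oklch:
    l: float  # perceptual lightness, 0..1
    c: float  # chroma
    h: float  # hue angle in degrees

    def css(self) -> str:
        """Format as a CSS oklch() color value."""
        return f"oklch({self.l:.2f} {self.c:.2f} {self.h:.0f})"


# Assumed offsets: darken slightly on hover, a little more on press.
HOVER_SHIFT = -0.05
PRESSED_SHIFT = -0.10


def shift_lightness(color: Oklch, delta: float) -> Oklch:
    """Return a copy with lightness moved by delta, clamped to 0..1."""
    return replace(color, l=min(1.0, max(0.0, color.l + delta)))


def interaction_states(base: Oklch) -> dict[str, Oklch]:
    """Derive every interaction state from the single base color."""
    return {
        "rest": base,
        "hover": shift_lightness(base, HOVER_SHIFT),
        "pressed": shift_lightness(base, PRESSED_SHIFT),
    }


if __name__ == "__main__":
    brand = Oklch(l=0.55, c=0.15, h=250)  # hypothetical brand color
    for state, color in interaction_states(brand).items():
        print(state, color.css())
```

Because only lightness moves, chroma and hue stay fixed across states, and changing the base value is the only edit needed to update all three.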

Chapter 02

Typography

Each text style packages five properties together (family, weight, size, line height, letter spacing) so typographic decisions stay holistic and consistent. Three ramps cover every need: functional UI, content for reading, and editorial for the output's expressive moments. Line heights respond proportionally to font size, type scales to its container rather than the viewport, and the system covers iOS and Android.

functional · content · editorial · fluid · iOS & Android
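
A text style that bundles the five properties, with line height stored as a percentage of font size, might look like the sketch below. The field names, the 140% modifier, and the sample values are assumptions, not Flex's actual ramp.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextStyle:
    family: str
    weight: int
    size_px: float
    line_height_pct: int      # percentage of font size, not a fixed pixel value
    letter_spacing_em: float

    @property
    def line_height_px(self) -> float:
        """Line height is derived from size, so it scales with it."""
        return self.size_px * self.line_height_pct / 100

    def scaled(self, factor: float) -> "TextStyle":
        """Return the same style at a different size; the percentage is kept."""
        return replace(self, size_px=self.size_px * factor)


if __name__ == "__main__":
    # Hypothetical body style.
    body = TextStyle("Segoe UI", 400, 16, 140, 0.0)
    print(body.line_height_px)              # 16 px at 140%
    print(body.scaled(1.5).line_height_px)  # 24 px at 140%
```

Storing the modifier rather than the computed pixel value means a size change never leaves a stale line height behind.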

Chapter 03

Spacing

Two density tiers — component for the space inside a control, layout for the space between sections — because spacing means something different depending on where it lives. Token names encode the math: the pixel value is derivable from the name alone, no lookup needed. Designed to stretch across very different team needs without fracturing into separate dialects.

component + layout tiers · self-documenting names · density-ready
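
As a sketch of what "derivable from the name alone" could mean, the function below reads a spacing token and computes a pixel value. It rests on two loud assumptions that the text does not confirm: a 4 px base unit, and a numeric suffix that is a percentage multiplier of that base (so base-200 means 2 × base). The layout-tier name pattern is also assumed by analogy with the component example.

```python
import re

BASE_UNIT_PX = 4  # assumed base unit; not stated in the source

# Matches names shaped like --gnrc-spacing-component-base-200.
_SPACING_NAME = re.compile(r"^--gnrc-spacing-(component|layout)-base-(\d+)$")


def spacing_px(token: str) -> float:
    """Derive a pixel value from a spacing token name, under the assumptions above."""
    match = _SPACING_NAME.match(token)
    if match is None:
        raise ValueError(f"not a recognized spacing token: {token}")
    multiplier_pct = int(match.group(2))
    return BASE_UNIT_PX * multiplier_pct / 100


if __name__ == "__main__":
    print(spacing_px("--gnrc-spacing-component-base-200"))
```

Whatever the real base and multiplier convention are, the point holds: the name carries enough structure that a reader or a model can compute the value instead of looking it up.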

Chapter 04

Styling

Three families, three different models. Radius is algorithmic — tied to component role and nesting depth, so inner elements automatically carry tighter corners than their containers. Elevation is ordinal: six levels where position on the scale is what matters, not the math between steps. Stroke has three semantic levels tied to use case, not preference.

algorithmic radius · ordinal elevation · semantic stroke
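
The contrast between an algorithmic family and an ordinal one can be sketched as follows. The radius rule here (inner radius = outer radius minus the inset, floored at zero) is a common concentric-corner heuristic swapped in for illustration; it is not confirmed as the Flex formula. Of the six elevation names, only lowest and highest appear in the text; the four in between are hypothetical.

```python
from enum import IntEnum


class Elevation(IntEnum):
    """Ordinal scale: only the order matters, not the distance between steps."""
    LOWEST = 0
    LOWER = 1    # hypothetical name
    LOW = 2      # hypothetical name
    HIGH = 3     # hypothetical name
    HIGHER = 4   # hypothetical name
    HIGHEST = 5


def nested_radius(outer_px: float, inset_px: float) -> float:
    """Radius of an element inset inside a rounded container (assumed rule)."""
    return max(0.0, outer_px - inset_px)


def radius_at_depth(root_px: float, inset_px: float, depth: int) -> float:
    """Apply the nesting rule once per level of depth."""
    radius = root_px
    for _ in range(depth):
        radius = nested_radius(radius, inset_px)
    return radius


if __name__ == "__main__":
    # A 12 px container with 4 px insets: each level gets tighter corners.
    print([radius_at_depth(12, 4, d) for d in range(4)])
    # Elevation supports comparison, which is all an ordinal scale needs.
    print(Elevation.LOWEST < Elevation.HIGHEST)
```

Radius answers "how much" with arithmetic, while elevation answers "which is above which" with order, which is why the two families are named differently.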

Chapter 05

Motion

Timing, easing curves, and reduced-motion behavior — consistent across components so transitions draw from the same vocabulary rather than ad-hoc per-component decisions.

In development

Coming soon

Why this matters now

The system is legible to models, not just designers.

Flex is the token foundation for the entire M365 design organization — ~500 designers across 24+ brands, all building on the same architecture. The real test was whether it could convince teams outside M365 to choose it over established alternatives.

The Microsoft AI (MAI) consumer design org did exactly that, switching from a planned Tailwind adoption to Flex after a five-presentation roadshow. When a token name encodes its intent (--gnrc-color-foreground-neutral-primary rather than #242424), an AI model can reason about the right usage without a lookup table. That legibility was the argument. It won.
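
To show how much a consumer can read off a name with no lookup table, here is a minimal parser. The segment order it assumes (tier, then an optional component group for group tokens, then family, then descriptors) is inferred from the example names on this page and is not a documented grammar. The TokenName type and parse_token helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Tier prefixes as they appear in the token names on this page.
TIERS = {"prmt": "primitive", "gnrc": "generic", "grp": "group"}


@dataclass(frozen=True)
class TokenName:
    tier: str
    family: str
    component_group: Optional[str]
    descriptors: tuple[str, ...]


def parse_token(name: str) -> TokenName:
    """Split a token name into the parts a reader or a model would infer."""
    parts = name.lstrip("-").split("-")
    tier = TIERS[parts[0]]
    rest = parts[1:]
    component_group = None
    if tier == "group":
        # Assumed: group tokens name their component group first.
        component_group, rest = rest[0], rest[1:]
    # Limitation: families containing a hyphen (for example border-radius)
    # would need a known-family list; this sketch takes one segment only.
    return TokenName(tier, rest[0], component_group, tuple(rest[1:]))


if __name__ == "__main__":
    print(parse_token("--gnrc-color-foreground-neutral-primary"))
    print(parse_token("--grp-button-color-background-brand-soft-hover"))
```

The hex value #242424 yields none of this: no tier, no family, no role. The name alone says it is a themeable, primary, neutral foreground color.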

"Tokens have made the work of outputting with AI much more accurate."

Peer review · Microsoft, 2026

"Foundational for all of our AI efforts moving forward."