RAIGen: Rare Attribute Identification in Text-to-Image Generative Models

Abstract

TL;DR

Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias identification, highlighting majority attributes that dominate outputs. Both overlook a complementary task: uncovering rare or minority features underrepresented in the data distribution (social, cultural, or stylistic) yet still encoded in model representations. We introduce RAIGen, the first framework for unsupervised rare-attribute discovery in diffusion models. RAIGen leverages Matryoshka Sparse Autoencoders and a novel Minority Score combining neuron activation frequency with semantic distinctiveness to identify interpretable neurons whose top-activating images reveal underrepresented attributes. Experiments show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, supports systematic auditing across architectures, and enables targeted amplification of rare attributes during generation.

Overview

The Problem RAIGen Solves

Standard text-to-image models overwhelmingly produce majority attribute combinations. Simply suppressing common outputs does not amplify rare ones — the probability mass redistributes among other majorities. RAIGen instead locates minority attributes that are already encoded inside the model but systematically suppressed during generation.

RAIGen pipeline overview (typical outputs → hidden inside the model → RAIGen → discovered rare attributes): RAIGen uncovers the hidden long tail of rare, suppressed attributes encoded in diffusion models, without predefined categories or external world models.

Contributions

What We Introduce

First Unsupervised Rare-Attribute Framework

RAIGen identifies minority features directly from diffusion model internals — no predefined categories, no external VLMs required.

Matryoshka Sparse Autoencoders (MSAEs)

Hierarchical decomposition of diffusion representations into interpretable sparse features at multiple levels of granularity, from broad concepts to fine-grained details.
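To make the "multiple levels of granularity" idea concrete, here is a minimal NumPy sketch of a generic Matryoshka SAE objective. In this objective, each nested prefix of the dictionary (the first m latents) must reconstruct the input on its own. The page does not specify RAIGen's sparsity mechanism, prefix sizes, feature layer, or parameterization. The TopK activation, the function names (`init_msae`, `encode_topk`, `matryoshka_loss`), and all sizes below are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def init_msae(d_in, d_dict, rng):
    # Hypothetical parameterization: one linear encoder/decoder pair with biases.
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(d_in, d_dict)),
        "b_enc": np.zeros(d_dict),
        "W_dec": rng.normal(0.0, 0.1, size=(d_dict, d_in)),
        "b_dec": np.zeros(d_in),
    }


def encode_topk(params, x, k):
    # ReLU pre-activations, then keep only the k largest per sample (assumed TopK sparsity).
    pre = np.maximum((x - params["b_dec"]) @ params["W_enc"] + params["b_enc"], 0.0)
    drop = np.argpartition(pre, -k, axis=1)[:, :-k]  # indices outside the top-k
    z = pre.copy()
    np.put_along_axis(z, drop, 0.0, axis=1)
    return z


def matryoshka_loss(params, x, prefix_sizes, k):
    # Sum of reconstruction errors, one per nested prefix of the dictionary.
    # Because small prefixes must reconstruct x alone, early latents are pushed
    # toward broad concepts and later latents toward finer details.
    z = encode_topk(params, x, k)
    total = 0.0
    for m in prefix_sizes:
        x_hat = z[:, :m] @ params["W_dec"][:m] + params["b_dec"]
        total += np.mean(np.sum((x - x_hat) ** 2, axis=1))
    return float(total), z


rng = np.random.default_rng(0)
params = init_msae(d_in=64, d_dict=256, rng=rng)
x = rng.normal(size=(32, 64))  # random stand-in for diffusion-model activations
loss, z = matryoshka_loss(params, x, prefix_sizes=[16, 64, 256], k=8)
```

In practice `loss` would be minimized with a gradient-based optimizer. The sketch only shows the forward objective and the nested-prefix structure.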

Minority Score

A novel signal combining activation rarity and semantic distinctiveness to automatically surface suppressed attributes across architectures.

Minority Score

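s(z) = d ⊙ (1 − ν), i.e. s_i = d_i · (1 − ν_i) for each neuron z_i, where ⊙ denotes the element-wise product over neurons.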

For each MSAE neuron z_i we compute two complementary signals:

ν_i — Activation frequency: the fraction of samples where neuron z_i fires. Low ν_i means a rarer feature.

d_i — Semantic distinctiveness: cosine distance between the neuron's activation-weighted CLIP centroid and the global dataset centroid. High d_i means the feature is semantically far from the majority.

Neurons with high minority score are both infrequent and semantically separated from dominant patterns — hallmarks of genuine minority attributes.
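The page gives the formula but not every implementation detail. The following is a minimal NumPy sketch of computing ν, d, and s from MSAE activations and CLIP image embeddings, then listing each top-scoring neuron's top-activating images. It assumes that "fires" means activation > 0, that the activation-weighted centroid uses each neuron's activations normalized to sum to 1, and that the global centroid is the unweighted mean of all CLIP embeddings. Giving never-firing neurons a score of 0 is also our own choice. The function names are hypothetical.

```python
import numpy as np


def minority_scores(acts, clip_emb, eps=1e-8):
    """acts: (N, D) non-negative MSAE activations, one row per generated image.
    clip_emb: (N, E) CLIP image embeddings of the same N images."""
    # nu_i: fraction of images on which neuron i fires (assumed: activation > 0).
    nu = (acts > 0).mean(axis=0)

    # Activation-weighted CLIP centroid per neuron, shape (D, E).
    weights = acts / (acts.sum(axis=0, keepdims=True) + eps)
    centroids = weights.T @ clip_emb

    # d_i: cosine distance between each neuron centroid and the global centroid.
    g = clip_emb.mean(axis=0)
    cos = centroids @ g / (np.linalg.norm(centroids, axis=1) * np.linalg.norm(g) + eps)
    d = 1.0 - cos

    # s = d * (1 - nu). Dead neurons get 0 (our assumption): their zero centroid
    # would otherwise give d = 1 and nu = 0, i.e. the maximum possible score.
    s = np.where(nu > 0, d * (1.0 - nu), 0.0)
    return s, nu, d


def top_neurons_and_images(acts, scores, n_neurons, n_images):
    # Rank neurons by minority score, then list each one's top-activating images.
    top = np.argsort(scores)[::-1][:n_neurons]
    return {int(i): np.argsort(acts[:, i])[::-1][:n_images].tolist() for i in top}


# Toy data: 4 images x 4 neurons. Images 0-2 share one CLIP direction; image 3 is an outlier.
# Neuron 0 fires on images 0-2, neuron 1 only on image 3, neuron 2 on all, neuron 3 never.
acts = np.array([[1.0, 0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0, 0.0],
                 [0.0, 1.0, 1.0, 0.0]])
clip_emb = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

scores, nu, d = minority_scores(acts, clip_emb)
top = top_neurons_and_images(acts, scores, n_neurons=2, n_images=1)
# scores ≈ [0.013, 0.513, 0.0, 0.0]: neuron 1 ranks first and its top image is image 3
```

The neuron that fires everywhere scores 0 because 1 − ν = 0. The neuron tied to the outlier image wins on both rarity and distinctiveness.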

Qualitative Results

Discovered Rare Attributes

RAIGen reveals contextual, stylistic, interaction, and compositional rare attributes across prompts. Each image pair shows the generated image (top) and its MSAE activation heatmap (bottom), highlighting the spatial regions driving the minority neuron.
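The page does not say which layer the MSAE reads or how heatmaps are rendered. One plausible way to produce such a heatmap is to assume that the MSAE is applied to every spatial position of a feature map, take one neuron's activation at each position, and upsample the grid to image resolution. The sketch below uses nearest-neighbour upsampling and a hypothetical `neuron_heatmap` helper. Bilinear interpolation would be a common alternative.

```python
import numpy as np


def neuron_heatmap(token_acts, neuron, grid_hw, image_hw):
    """token_acts: (h*w, D) MSAE activations, one row per spatial position of a
    single image's feature map, in row-major order."""
    h, w = grid_hw
    H, W = image_hw
    heat = token_acts[:, neuron].reshape(h, w)
    # Nearest-neighbour upsampling by repeating each cell (assumes H % h == W % w == 0).
    heat = np.kron(heat, np.ones((H // h, W // w)))
    span = heat.max() - heat.min()
    # Rescale to [0, 1] for overlaying on the image; a flat map stays all zeros.
    return (heat - heat.min()) / span if span > 0 else np.zeros_like(heat)


token_acts = np.zeros((4, 3))   # 2x2 feature grid, 3 MSAE neurons
token_acts[3, 1] = 2.0          # neuron 1 fires only at grid cell (1, 1)
heat = neuron_heatmap(token_acts, neuron=1, grid_hw=(2, 2), image_hw=(4, 4))
# heat is 1.0 in the bottom-right 2x2 block and 0.0 elsewhere
```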

Rare attributes discovered for *Doctor*, *Sheriff*, and *Writer* prompts using SDXL. RAIGen surfaces attributes such as *doctor in a framed portrait*, *sheriff on horseback*, and *writer with curly/afro-textured hair*, each appearing in fewer than 20% of generated images.

Rare attributes discovered on COCO-style captions. RAIGen finds stylistically rare variants such as *cartoon-faced luggage*, *front-facing train with smoke plumes*, and *snowboarder with sun overhead*.

Quantitative Results

RAIGen Finds Genuinely Rare Attributes

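Attribute Presence measures how often a discovered attribute appears in generated images; lower means rarer. RAIGen attributes appear in roughly 20% of images or fewer (0.194 to 0.199 on SDXL, 0.205 to 0.220 on SD v1.4), compared with more than 93% for attributes found by OpenBias, confirming that our method surfaces genuinely underrepresented features.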
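As a reading aid for the metric, here is a minimal sketch of computing attribute presence. The page does not describe how presence is judged (for example by a VQA model or by annotators) or how it is aggregated across attributes. This sketch assumes that per-image binary labels are already available, and it reports each attribute's fraction of images plus the mean over attributes. The `attribute_presence` name is hypothetical.

```python
import numpy as np


def attribute_presence(has_attribute):
    """has_attribute: (num_attributes, num_images) array of 0/1 or bool,
    1 where the attribute is judged present in that generated image."""
    per_attribute = np.asarray(has_attribute, dtype=float).mean(axis=1)
    return per_attribute, float(per_attribute.mean())


labels = np.array([[1, 0, 0, 0, 0],    # attribute A present in 1 of 5 images
                   [1, 1, 0, 0, 0]])   # attribute B present in 2 of 5 images
per_attr, overall = attribute_presence(labels)
# per_attr -> [0.2, 0.4]; overall ≈ 0.3
```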

<20%

Attribute presence for RAIGen discoveries (SDXL)

<3/10

Mean human-judged presence of RAIGen attributes (user study)

Attribute Presence ↓ — WinoBias & COCO

Model	Approach	WinoBias	COCO
SD v1.4	OpenBias	0.941	0.933
SD v1.4	RAIGen	0.205	0.220
SDXL	OpenBias	0.941	0.933
SDXL	RAIGen	0.194	0.199

User Study — Human Presence ↓ (per profession)

Profession	Mean ↓	95% CI
Analyst	1.35	[1.03, 1.67]
CEO	0.70	[0.44, 0.96]
Doctor	1.18	[0.97, 1.39]
Salesperson	1.45	[0.99, 1.91]
Sheriff	2.64	[2.21, 3.07]

Citation

BibTeX

If you find our work useful, please cite:

@article{sreelatha2026raigen,
  title   = {RAIGen: Rare Attribute Identification in Text-to-Image Generative Models},
  author  = {Vadakkeeveetil Sreelatha, Silpa and Wang, Dan and Belongie, Serge and Awais, Muhammad and Dutta, Anjan},
  journal = {arXiv preprint arXiv:2602.06806},
  year    = {2026}
}