AI Upskilling for UX Researchers: The SAE × E-P-I-A-S Framework

Adapted from the Product Designer AI Upskilling Framework for UX Research. This framework recognizes that UXR has distinct accountability structures around evidence integrity, methodological rigor, and participant protection. Built on UXR expertise and insights from Jake Rhodes and Monty Hammontree.

Note: This is not a test, and it’s not a judgment. Most teams are early in automation. That’s normal. The point is to give you a shared language for “what’s happening” and “what’s next.”

How to Use This

Two axes:

1) E-P-I-A-S = how deeply you’ve internalized a skill

| ❶ E: Explorer | ❷ P: Practitioner | ❸ I: Integrator | ❹ A: Architect | ❺ S: Steward |
|---|---|---|---|---|
| Trying things; learning basics | Building consistent habits | Making it part of workflow | Building systems others use | Setting standards; teaching others |

2) SAE Level = how much responsibility AI holds in your workflow

If the SAE jargon feels forced, think of SAE as your autonomy envelope: how far the work can go without you touching the wheel.
(We’ll still use SAE L0–L5 so it stays compatible with the original framework.)

For UXR, the “driving stack” maps to:

| Driving Domain | UXR Equivalent |
|---|---|
| Perception | Evidence intake (data collection, what’s observed) |
| Decision-making | Inference + judgment (what it means, what to do) |
| Control | Execution (study ops, analysis, delivery) |

How to navigate

  1. Find your E-P-I-A-S stage. Are you experimenting, running consistent workflows, or building systems others can operate?
  2. Find your SAE Level. L0 (manual) → L4 (mostly automated). Most UXRs in early 2026 are L1–L2.
  3. Plan growth. Go deeper (E→S within your level) or wider (move up a level). In UXR, depth usually wins.

Key difference from product design: UXR carries heightened accountability for evidence integrity and participant protection. An S-Steward at L1 with solid verification habits beats an E-Explorer at L4 running pipelines they can’t validate.


Step 1: E-P-I-A-S Stages

Non-AI UX Researcher Progression

| ❶ Explorer | ❷ Practitioner | ❸ Integrator | ❹ Architect | ❺ Steward |
|---|---|---|---|---|
| Learning methods; variable quality; needs guidance | Consistent process; repeatable methods; reliable evidence | Research embedded in product decisions; clear evidence chains | Building research systems others adopt | Setting org standards; mentoring; maintaining infrastructure |

Career Ladder Parallel (Approximate)

| ❶ Explorer | ❷ Practitioner | ❸ Integrator | ❹ Architect | ❺ Steward |
|---|---|---|---|---|
| Associate Researcher | UX Researcher | Senior Researcher | Staff/Principal | Director/Lead |

Step 2: Find Your SAE Level

SAE Levels for UXR

“Make it safe” framing: If you read these and think “wow, we are nowhere near this” — good. You’re paying attention. Most teams are early. The goal is clarity, not guilt.

| Level | Name | Who’s Responsible | Plain English | Examples |
|---|---|---|---|---|
| L0 🔬 | No Automation | Human does everything | AI may organize, but doesn’t generate research outputs | Manual coding, spreadsheets, slides |
| L1 🔬➕ | Research Assistance | Human drives; AI assists one function | AI drafts/summarizes; you verify everything | Summarize an interview; draft a screener |
| L2 🔬🧠 | Partial Automation | Human supervises; AI does multi-step work | AI synthesizes across sources; you validate in-loop | Feed 10 transcripts → themes + quotes + draft readout |
| L3 🔬😴 | Conditional Automation | AI drives within bounds; human is fallback | AI runs end-to-end inside rules; escalates when uncertain | Bounded synthesis with codebook; escalates on thin evidence |
| L4 🔬🤖 | High Automation | AI drives within bounds; no human standby | AI executes unattended; humans audit periodically | Weekly insight digests; auto-tagging; exception alerts |
| L5 🔬✨ | Full Automation | AI drives everywhere | AI handles novel studies, methods, ethics, org politics | Theoretical only |

Key Clarifications

  • L2 ≠ autonomous research. You still supervise, validate, and own the interpretation.
  • The big shift is L2 → L3: you go from “watching every run” to “trusting runs unless escalated.”
  • L4 only works in constrained domains with stable methods and measurable gates (e.g., tagging support tickets, aggregating survey verbatims).
  • L5 is theoretical. We’re nowhere close.

What do we mean by “production” in UXR?

This is a common confusion, so we’ll be explicit:

  • Not “production”: building a Qualtrics-class product, a platform, or a public-facing tool.
  • “Production” here means a research workflow that is team-reliable:
      • someone else can run it,
      • it has documentation,
      • it has monitoring or checks,
      • it produces outputs that are traceable back to evidence.

Example: a shared prompt pack + verification checklist can count as “production-quality,” even if it’s just a doc.

What’s a “Research ODD”?

ODD = Operational Design Domain. The safe operating zone for AI work.

If “ODD” feels like forced syntax, treat it as your research autonomy envelope: the boundaries that keep automation safe.

| Component | Example |
|---|---|
| Study type constraints | “Only synthesize existing interviews; no novel method selection” |
| Inputs allowed | Approved transcripts, tagged repository, prior codebook |
| Decision boundaries | Roadmap prioritization → escalate; sensitive segments → escalate |
| Quality gates | 25% transcript spot-checks; contradiction checks; hallucination checks |
| Traceability | Every claim links to quotes; confidence labels required |
| Risk boundaries | Privacy, medical, financial, minors, legal → always escalate |
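
None of this has to live only in prose. Here is a minimal sketch of an ODD encoded as data a pipeline could check before each run; `ResearchODD`, its fields, and the topic tags are hypothetical illustrations, not any tool’s real API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResearchODD:
    """Hypothetical encoding of a Research ODD as checkable data."""
    allowed_inputs: frozenset          # e.g., approved transcripts, prior codebook
    escalation_topics: frozenset       # risk boundaries that always go to a human
    spot_check_rate: float = 0.25      # fraction of transcripts a human re-reads
    require_source_links: bool = True  # every claim must link to a quote

    def must_escalate(self, run_tags: set) -> bool:
        # Escalate whenever a run touches any always-escalate topic.
        return bool(run_tags & self.escalation_topics)

odd = ResearchODD(
    allowed_inputs=frozenset({"approved_transcripts", "prior_codebook"}),
    escalation_topics=frozenset({"privacy", "medical", "financial", "minors", "legal"}),
)
print(odd.must_escalate({"pricing", "medical"}))  # True → route to human review
```

Writing the envelope down as data is what makes L3 auditable: the boundaries stop being tribal knowledge and become something a run either satisfies or escalates against.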

Simple Self-Assessment

| Level | What your work looks like | Where it happens |
|---|---|---|
| L0 Manual | “I do research manually. Tools store data but don’t generate interpretations.” | Docs, spreadsheets, Miro, slides |
| L1 AI-Assisted | “I use AI for one task at a time (summaries, screeners). I verify against source.” | ChatGPT, Claude, Granola, transcription tools |
| L2 Partially Automated | “AI does multi-step synthesis. I supervise in-loop and correct before it ships.” | Marvin, Looppanel, Dovetail AI |
| L3 Guided Automation | “I define a Research ODD (autonomy envelope). AI runs until ambiguity, then escalates.” | Repository workflows with verification gates |
| L4 Mostly Automated | “In bounded areas, AI produces outputs without me present. I audit periodically.” | Automated pipelines, eval suites |

Step 3: E-P-I-A-S at Each SAE Level

L0: Manual 🔬

Human-only execution. Tools don’t interpret or generate.

| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Learning methods; variable quality | Consistent practice; repeatable techniques | Workflow integrated with product dev; clear evidence chains | Built reusable systems others adopt | Set org standards; mentor others |

L1: AI-Assisted 🔬➕

AI suggests; human decides. AI reduces cognitive load, not responsibility.

Tools: ChatGPT, Claude, Granola, Otter, Looppanel, Dovetail, Great Question

AI helps with: Summarizing interviews, drafting screeners/guides, rewriting for audiences, transcription

You still: Verify every synthesis against source

| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Trying AI for summaries; heavy rewriting; ad hoc verification | Daily AI use; saved prompts; always verify against transcripts | AI embedded across full task with sources noted and claims linked | Shared prompt libraries and verification templates others trust | Team standards for AI-assisted work; mentors on verification |

Critical habits:

  • Always verify AI summaries against original transcripts
  • Never trust AI quotes without checking the source (a minimal check is sketched after this list)
  • Document when AI assisted and what you verified
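
The quote check above is cheap to automate as a first pass. A minimal sketch, assuming transcripts are available as plain text; the function name and its whitespace-normalization rule are illustrative:

```python
import re

def _norm(s: str) -> str:
    """Collapse whitespace and lowercase so line breaks don't cause false negatives."""
    return re.sub(r"\s+", " ", s).strip().lower()

def quote_in_transcript(quote: str, transcript: str) -> bool:
    """Return True only if the AI-attributed quote appears verbatim in the source."""
    return _norm(quote) in _norm(transcript)

transcript = "…so I just gave up\nand emailed support instead."
print(quote_in_transcript("gave up and emailed support", transcript))  # True
print(quote_in_transcript("gave up on the product", transcript))       # False
```

A failure sends the quote back for manual verification; a pass validates only the wording, never the interpretation.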

L2: Partially Automated 🔬🧠

AI builds bounded synthesis; human validates.

Tools: Marvin, Looppanel, Dovetail AI, Condens, Great Question, Maze AI

AI helps with: Multi-interview synthesis, theme proposals with quotes, draft readouts, tagging

You still: Validate every theme against source evidence

| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Trying synthesis tools; lots of manual verification | Repeatable themes from clear inputs; “definition of done” checklist | AI synthesis fits known patterns; full traceability documented | Reusable templates + verification protocols | Team norms for safe vs. risky automation |

Critical habits:

  • Sample-verify 20–30% of AI themes against transcripts (sampling sketch after this list)
  • Look for contradicting evidence
  • Check that quoted text actually exists
  • Mark insight confidence levels
  • Maintain claim→source links
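
For the sampling habit, a fixed random seed keeps the audit reproducible from week to week. A minimal sketch; the 25% rate and the theme IDs are illustrative:

```python
import random

def sample_for_spot_check(theme_ids, rate=0.25, seed=0):
    """Pick a reproducible random subset of AI themes for manual verification."""
    rng = random.Random(seed)                  # fixed seed → repeatable audit
    k = max(1, round(len(theme_ids) * rate))   # always check at least one theme
    return sorted(rng.sample(theme_ids, k))

themes = [f"T{i:02d}" for i in range(1, 13)]   # 12 AI-proposed themes
print(sample_for_spot_check(themes))           # 3 themes to verify by hand
```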

Reality check: many synthesis tools in 2026 are primarily text-based. They don’t “see” usability behavior (hesitation, misclicks, confusion) unless you explicitly capture it in notes.


L3: Guided Automation 🔬😴

Repository-centric, human-at-checkpoints execution.

Environment: Persistent research infrastructure with verification gates

You own: The checkpoints. Work spans studies, not just sessions.

| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Learning to define Research ODDs; inconsistent results | Reliable multi-study workflows with explicit checkpoints | Clear decision framing: what AI does, what humans validate | Shared repository workflows others can run | Org standards for repository-based AI research |

Verification systems:

  • Inter-rater reliability (AI vs. human coding; a Cohen’s kappa sketch follows this list)
  • Contradiction detection across studies
  • Confidence scoring (automated + human)
  • Audit trails for every synthesis decision
  • Clear escalation triggers
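
Inter-rater reliability is commonly measured with Cohen’s kappa, which discounts the agreement two coders would reach by chance. A self-contained sketch; the codes and the ~0.7 rule of thumb are illustrative:

```python
def cohens_kappa(a, b):
    """Agreement between two coders (e.g., AI vs. human) beyond chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

ai    = ["nav", "nav", "trust", "speed", "trust", "nav"]
human = ["nav", "trust", "trust", "speed", "trust", "nav"]
print(round(cohens_kappa(ai, human), 2))  # 0.74; below ~0.7, tighten the codebook
```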

L4: Mostly Automated 🔬🤖

Pipeline-centric, system-run execution. You audit outcomes, not steps.

| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Experimenting with pipelines; heavy validation needed | Operating pipelines with repeatable patterns | End-to-end workflows for bounded study types | Built production-quality infrastructure others operate | Governance for autonomous research at scale |

What L4 looks like (concrete examples):

  • Weekly automated insight digests for instrumented product areas (support, feedback, surveys)
  • Always-on tagging across support tickets and feedback
  • Automated quality gates before human review
  • Exception-based workflow: humans intervene only when rules trigger (see the gate sketch below)
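
In practice, the exception-based workflow reduces to a gate function that runs before anything ships unreviewed. A hypothetical sketch; the field names and thresholds are illustrative, and real gates would be tuned per domain:

```python
def passes_quality_gates(digest):
    """Return True if an automated digest may publish without human review."""
    return (
        digest["source_coverage"] >= 0.90      # share of claims traced to sources
        and digest["unverified_quotes"] == 0   # every quote matched a transcript
        and not digest["touches_risk_topic"]   # privacy/medical/etc. always escalate
    )

weekly = {"source_coverage": 0.95, "unverified_quotes": 0, "touches_risk_topic": False}
print("publish" if passes_quality_gates(weekly) else "escalate to a human")
```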

L5: Full Automation 🔬✨

Theoretical. AI handles novel studies, methods, ethics, org politics end-to-end.

Not a credible operating mode in 2026.


Evidence Integrity Hierarchy

| Layer | What it is | AI role |
|---|---|---|
| Source material | Original transcripts, recordings | Never modified by AI |
| Extracted data | Quotes, timestamps, coded segments | AI-assisted, human-verified |
| Synthesized themes | Patterns across sources | AI-assisted, human-validated |
| Interpreted insights | Meaning and implications | Human-led, AI-assisted |
| Recommendations | Actions to take | Human-owned |
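
The hierarchy is easiest to enforce when every synthesized claim carries its own evidence trail. A minimal sketch of such a record; all field names are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    """One synthesized claim plus the trail back to unmodified source material."""
    text: str          # synthesized theme/insight (human-validated)
    quote: str         # extracted data: verbatim, human-verified
    source_id: str     # source material it came from (never modified by AI)
    confidence: str    # researcher-assigned label, e.g., "high" / "thin"

c = Claim(
    text="Users abandon checkout when shipping costs appear late.",
    quote="I got to the last step and the shipping fee doubled my total.",
    source_id="p04_transcript",
    confidence="high",
)
print(c.source_id)  # every claim stays linked to its source
```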

AI-Moderated Interviews (2026 Reality Check)

| Strengths | Weaknesses |
|---|---|
| Scale (hundreds of sessions) | Misses non-verbal cues |
| 24/7 across time zones | Clunky turn-taking |
| Language accessibility | High fraud rates |
| Consistent delivery | Can’t adapt to unexpected revelations |
| Lower cost | Loses depth on sensitive topics |

Best practice: Think of AI moderation as “smarter surveys with follow-ups,” not a replacement for human moderation.


How to Know AI Is Helping

AI is helping if it:

  • Reduces time-to-insight without sacrificing validity
  • Increases evidence coverage
  • Improves claim→source traceability
  • Surfaces patterns you might have missed
  • Frees time for interpretation

If none of those improve, you’re exploring — which is fine. Don’t confuse tool novelty with research improvement.


Progress Markers

| Transition | Threshold |
|---|---|
| L0 → L1 | You safely delegate one bounded function with basic verification habits |
| L1 → L2 | You can chain tasks with consistent quality using repeatable prompts + spot-checking |
| L2 → L3 | You define a Research ODD with explicit constraints and escalation triggers |
| L3 → L4 | You harden the safety case: traceability, validation gates, predictable escalation |

The Big Takeaway

SAE maturity is about who holds responsibility for evidence validity and judgment — not which tools you use.

L0: research fundamentals (still matter)

L1: better thinking (first drafts in minutes)

L2: broader synthesis (verified themes in hours)

L3: persistent verification (scaled insights across studies)

L4: autonomous monitoring (pipelines run while you sleep)

L5: goal-setting only (not there yet)

Go deep before you go wide.

An S-Steward at L1 with solid verification habits beats an E-Explorer at L4 who can’t validate their pipeline.


Part of the Design in Tech Report 2026. Feedback welcome.