AI Upskilling for UX Researchers: The SAE × E-P-I-A-S Framework
Adapted from the Product Designer AI Upskilling Framework for UX Research. This framework recognizes that UXR has distinct accountability structures around evidence integrity, methodological rigor, and participant protection. Built on UXR expertise and insights from Jake Rhodes and Monty Hammontree.
Note: This is not a test, and it’s not a judgment. Most teams are early in automation. That’s normal. The point is to give you a shared language for “what’s happening” and “what’s next.”
How to Use This
Two axes:
1) E-P-I-A-S = how deeply you’ve internalized a skill
| ❶ E: Explorer | ❷ P: Practitioner | ❸ I: Integrator | ❹ A: Architect | ❺ S: Steward |
|---|---|---|---|---|
| Trying things; learning basics | Building consistent habits | Making it part of workflow | Building systems others use | Setting standards; teaching others |
2) SAE Level = how much responsibility AI holds in your workflow
If the automotive jargon feels forced, think of SAE as your autonomy envelope: how far the work can go without you touching the wheel.
(We’ll still use SAE L0–L5 so it stays compatible with the original framework.)
For UXR, the “driving stack” maps to:
| Driving Domain | UXR Equivalent |
|---|---|
| Perception | Evidence intake (data collection, what’s observed) |
| Decision-making | Inference + judgment (what it means, what to do) |
| Control | Execution (study ops, analysis, delivery) |
How to navigate
- Find your E-P-I-A-S stage. Are you experimenting, running consistent workflows, or building systems others can operate?
- Find your SAE Level. L0 (manual) → L4 (mostly automated). Most UXRs in early 2026 are L1–L2.
- Plan growth. Go deeper (E→S within your level) or wider (move up a level). In UXR, depth usually wins.
Key difference from product design: UXR carries heightened accountability for evidence integrity and participant protection. An S-Steward at L1 with solid verification habits beats an E-Explorer at L4 running pipelines they can’t validate.
Step 1: E-P-I-A-S Stages
Non-AI UX Researcher Progression
| ❶ Explorer | ❷ Practitioner | ❸ Integrator | ❹ Architect | ❺ Steward |
|---|---|---|---|---|
| Learning methods; variable quality; needs guidance | Consistent process; repeatable methods; reliable evidence | Research embedded in product decisions; clear evidence chains | Building research systems others adopt | Setting org standards; mentoring; maintaining infrastructure |
Career Ladder Parallel (Approximate)
| ❶ Explorer | ❷ Practitioner | ❸ Integrator | ❹ Architect | ❺ Steward |
|---|---|---|---|---|
| Associate Researcher | UX Researcher | Senior Researcher | Staff/Principal | Director/Lead |
Step 2: Find Your SAE Level
SAE Levels for UXR
“Make it safe” framing: If you read these and think “wow, we are nowhere near this” — good. You’re paying attention. Most teams are early. The goal is clarity, not guilt.
| Level | Name | Who’s Responsible | Plain English | Examples |
|---|---|---|---|---|
| L0 🔬 | No Automation | Human does everything | AI may organize, but doesn’t generate research outputs | Manual coding, spreadsheets, slides |
| L1 🔬➕ | Research Assistance | Human drives; AI assists one function | AI drafts/summarizes; you verify everything | Summarize an interview; draft a screener |
| L2 🔬🧠 | Partial Automation | Human supervises; AI does multi-step work | AI synthesizes across sources; you validate in-loop | Feed 10 transcripts → themes + quotes + draft readout |
| L3 🔬😴 | Conditional Automation | AI drives within bounds; human is fallback | AI runs end-to-end inside rules; escalates when uncertain | Bounded synthesis with codebook; escalates on thin evidence |
| L4 🔬🤖 | High Automation | AI drives within bounds; no human standby | AI executes unattended; humans audit periodically | Weekly insight digests; auto-tagging; exception alerts |
| L5 🔬✨ | Full Automation | AI drives everywhere | AI handles novel studies, methods, ethics, org politics | Theoretical only |
Key Clarifications
- L2 ≠ autonomous research. You still supervise, validate, and own the interpretation.
- The big shift is L2 → L3: you go from “watching every run” to “trusting runs unless escalated.”
- L4 only works in constrained domains with stable methods and measurable gates (e.g., tagging support tickets, aggregating survey verbatims).
- L5 is theoretical. We’re nowhere close.
What do we mean by “production” in UXR?
This is a common confusion, so we’ll be explicit:
- Not “production”: building a Qualtrics-class product, a platform, or a public-facing tool.
- “Production” here means a research workflow that is team-reliable:
  - someone else can run it,
  - it has documentation,
  - it has monitoring or checks,
  - it produces outputs that are traceable back to evidence.
Example: a shared prompt pack + verification checklist can count as “production-quality,” even if it’s just a doc.
What’s a “Research ODD”?
ODD = Operational Design Domain. The safe operating zone for AI work.
If “ODD” feels like jargon, treat it as your research autonomy envelope: the boundaries that keep automation safe.
| Component | Example |
|---|---|
| Study type constraints | “Only synthesize existing interviews; no novel method selection” |
| Inputs allowed | Approved transcripts, tagged repository, prior codebook |
| Decision boundaries | Roadmap prioritization → escalate; sensitive segments → escalate |
| Quality gates | 25% transcript spot-checks; contradiction checks; hallucination checks |
| Traceability | Every claim links to quotes; confidence labels required |
| Risk boundaries | Privacy, medical, financial, minors, legal → always escalate |
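To make the ODD concrete, here is a minimal sketch of those components encoded as configuration. Everything in it (the `ResearchODD` name, fields, and thresholds) is a hypothetical illustration, not any tool’s real API:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchODD:
    """Hypothetical encoding of a Research ODD (autonomy envelope)."""
    study_types: list[str]             # e.g. "synthesis of existing interviews only"
    allowed_inputs: list[str]          # approved transcripts, tagged repo, prior codebook
    escalation_topics: list[str]       # decisions AI must hand back to a human
    spot_check_rate: float = 0.25      # fraction of transcripts humans re-verify
    require_source_links: bool = True  # every claim must trace back to a quote
    hard_stops: list[str] = field(default_factory=lambda: [
        "privacy", "medical", "financial", "minors", "legal",
    ])

    def must_escalate(self, topic: str) -> bool:
        """True if a topic falls outside the autonomy envelope."""
        return topic in self.hard_stops or topic in self.escalation_topics

odd = ResearchODD(
    study_types=["synthesis of existing interviews"],
    allowed_inputs=["approved transcripts", "prior codebook"],
    escalation_topics=["roadmap prioritization", "sensitive segments"],
)
odd.must_escalate("privacy")  # True: always a hard stop
```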
Simple Self-Assessment
| Level | What your work looks like | Where it happens |
|---|---|---|
| L0 Manual | “I do research manually. Tools store data but don’t generate interpretations.” | Docs, spreadsheets, Miro, slides |
| L1 AI-Assisted | “I use AI for one task at a time (summaries, screeners). I verify against source.” | ChatGPT, Claude, Granola, transcription tools |
| L2 Partially Automated | “AI does multi-step synthesis. I supervise in-loop and correct before it ships.” | Marvin, Looppanel, Dovetail AI |
| L3 Guided Automation | “I define a Research ODD (autonomy envelope). AI runs until ambiguity, then escalates.” | Repository workflows with verification gates |
| L4 Mostly Automated | “In bounded areas, AI produces outputs without me present. I audit periodically.” | Automated pipelines, eval suites |
Step 3: E-P-I-A-S at Each SAE Level
L0: Manual 🔬
Human-only execution. Tools don’t interpret or generate.
| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Learning methods; variable quality | Consistent practice; repeatable techniques | Workflow integrated with product dev; clear evidence chains | Built reusable systems others adopt | Set org standards; mentor others |
L1: AI-Assisted 🔬➕
AI suggests; human decides. AI reduces cognitive load, not responsibility.
Tools: ChatGPT, Claude, Granola, Otter, Looppanel, Dovetail, Great Question
AI helps with: Summarizing interviews, drafting screeners/guides, rewriting for audiences, transcription
You still: Verify every synthesis against source
| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Trying AI for summaries; heavy rewriting; ad hoc verification | Daily AI use; saved prompts; always verify against transcripts | AI embedded across full task with sources noted and claims linked | Shared prompt libraries and verification templates others trust | Team standards for AI-assisted work; mentors on verification |
Critical habits:
- Always verify AI summaries against original transcripts
- Never trust AI quotes without checking source
- Document when AI assisted and what you verified
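To make the second habit concrete, here is a minimal sketch of a quote check, assuming plain-text transcripts. The helper names are illustrative:

```python
import re

def _norm(s: str) -> str:
    """Collapse whitespace and lowercase for a forgiving comparison."""
    return re.sub(r"\s+", " ", s).strip().lower()

def quote_exists(quote: str, transcript: str) -> bool:
    """True if the quote appears verbatim (modulo whitespace/case) in the source."""
    return _norm(quote) in _norm(transcript)

def flag_unverified(quotes: list[str], transcript: str) -> list[str]:
    """Return AI-attributed quotes that could not be found in the transcript."""
    return [q for q in quotes if not quote_exists(q, transcript)]
```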
L2: Partially Automated 🔬🧠
AI builds bounded synthesis; human validates.
Tools: Marvin, Looppanel, Dovetail AI, Condens, Great Question, Maze AI
AI helps with: Multi-interview synthesis, theme proposals with quotes, draft readouts, tagging
You still: Validate every theme against source evidence
| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Trying synthesis tools; lots of manual verification | Repeatable themes from clear inputs; “definition of done” checklist | AI synthesis fits known patterns; full traceability documented | Reusable templates + verification protocols | Team norms for safe vs. risky automation |
Critical habits:
- Sample-verify 20–30% of AI themes against transcripts
- Look for contradicting evidence
- Check that quoted text actually exists
- Mark insight confidence levels
- Maintain claim→source links
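A minimal sketch of the sampling habit above, assuming themes exported as simple dicts (the structure is illustrative, not any synthesis tool’s format):

```python
import random

def sample_for_review(themes: list[dict], rate: float = 0.25,
                      seed: int | None = None) -> list[dict]:
    """Select roughly `rate` of AI-proposed themes for manual claim->source checking."""
    rng = random.Random(seed)
    k = max(1, round(len(themes) * rate))
    return rng.sample(themes, k)

themes = [
    {"claim": "Users stall at the SSO step", "quotes": ["q-12", "q-31"]},
    {"claim": "Pricing copy reads as a commitment", "quotes": ["q-07"]},
]
to_review = sample_for_review(themes, rate=0.25, seed=7)  # 1 of 2 themes
```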
Reality check: many synthesis tools in 2026 are primarily text-based. They don’t “see” usability behavior (hesitation, misclicks, confusion) unless you explicitly capture it in your notes.
L3: Guided Automation 🔬😴
Repository-centric, human-at-checkpoints execution.
Environment: Persistent research infrastructure with verification gates
You own: The checkpoints. Work spans studies, not just sessions.
| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Learning to define Research ODDs; inconsistent results | Reliable multi-study workflows with explicit checkpoints | Clear decision framing: what AI does, what humans validate | Shared repository workflows others can run | Org standards for repository-based AI research |
Verification systems:
- Inter-rater reliability (AI vs. human coding)
- Contradiction detection across studies
- Confidence scoring (automated + human)
- Audit trails for every synthesis decision
- Clear escalation triggers
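For the first verification system, inter-rater reliability can be quantified with Cohen’s kappa on a jointly coded sample. A minimal sketch, with illustrative labels and an assumed 0.7 escalation bar:

```python
from collections import Counter

def cohens_kappa(ai_codes: list[str], human_codes: list[str]) -> float:
    """Chance-corrected agreement between AI and human coding passes."""
    n = len(ai_codes)
    assert n == len(human_codes) and n > 0
    observed = sum(a == h for a, h in zip(ai_codes, human_codes)) / n
    ai_freq, human_freq = Counter(ai_codes), Counter(human_codes)
    expected = sum(ai_freq[c] * human_freq[c] for c in ai_freq) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Escalate per the ODD when agreement drops below a chosen bar (0.7 here).
kappa = cohens_kappa(["pain", "praise", "pain"], ["pain", "pain", "pain"])
needs_review = kappa < 0.7
```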
L4: Mostly Automated 🔬🤖
Pipeline-centric, system-run execution. You audit outcomes, not steps.
| Explorer | Practitioner | Integrator | Architect | Steward |
|---|---|---|---|---|
| Experimenting with pipelines; heavy validation needed | Operating pipelines with repeatable patterns | End-to-end workflows for bounded study types | Built production-quality infrastructure others operate | Governance for autonomous research at scale |
What L4 looks like (concrete examples):
- Weekly automated insight digests for instrumented product areas (support, feedback, surveys)
- Always-on tagging across support tickets and feedback
- Automated quality gates before human review
- Exception-based workflow: humans intervene only when rules trigger
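A minimal sketch of that exception-based workflow, with illustrative rules and thresholds (a real pipeline would pull these from the Research ODD):

```python
def needs_human(insight: dict) -> bool:
    """Trip an exception on thin evidence, low confidence, or sensitive tags."""
    SENSITIVE = {"privacy", "medical", "financial", "minors", "legal"}
    return (
        len(insight.get("sources", [])) < 3         # thin evidence
        or insight.get("confidence", 0.0) < 0.7     # low confidence
        or bool(SENSITIVE & set(insight.get("tags", [])))
    )

weekly_insights = [
    {"sources": ["t1", "t2", "t3", "t4"], "confidence": 0.9, "tags": []},
    {"sources": ["t5"], "confidence": 0.8, "tags": ["privacy"]},
]
digest = [i for i in weekly_insights if not needs_human(i)]     # ships as-is
escalated = [i for i in weekly_insights if needs_human(i)]      # human review
```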
L5: Full Automation 🔬✨
Theoretical. AI handles novel studies, methods, ethics, org politics end-to-end.
Not a credible operating mode in 2026.
Evidence Integrity Hierarchy
| Layer | What it is | AI role |
|---|---|---|
| Source material | Original transcripts, recordings | Never modified by AI |
| Extracted data | Quotes, timestamps, coded segments | AI-assisted, human-verified |
| Synthesized themes | Patterns across sources | AI-assisted, human-validated |
| Interpreted insights | Meaning and implications | Human-led, AI-assisted |
| Recommendations | Actions to take | Human-owned |
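One way to read the hierarchy is as a traceability chain: each layer holds references to the layer below it, so any recommendation can be walked back to source material. A minimal sketch with illustrative types:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quote:                # extracted data: AI-assisted, human-verified
    transcript_id: str      # points at source material, never modified by AI
    timestamp: str
    text: str

@dataclass
class Theme:                # synthesized pattern: AI-assisted, human-validated
    statement: str
    evidence: list[Quote]

@dataclass
class Insight:              # interpreted meaning: human-led, AI-assisted
    meaning: str
    themes: list[Theme]

def trace_to_sources(insight: Insight) -> list[str]:
    """Walk any insight back down the hierarchy to its source transcripts."""
    return sorted({q.transcript_id for t in insight.themes for q in t.evidence})
```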
AI-Moderated Interviews (2026 Reality Check)
| Strengths | Weaknesses |
|---|---|
| Scale (hundreds of sessions) | Misses non-verbal cues |
| 24/7 across time zones | Clunky turn-taking |
| Language accessibility | High fraud rates |
| Consistent delivery | Can’t adapt to unexpected revelations |
| Lower cost | Loses depth on sensitive topics |
Best practice: Think of AI moderation as “smarter surveys with follow-ups,” not a replacement for human moderation.
How to Know AI Is Helping
AI is helping if it:
- Reduces time-to-insight without sacrificing validity
- Increases evidence coverage
- Improves claim→source traceability
- Surfaces patterns you might have missed
- Frees time for interpretation
If none of those improve, you’re exploring — which is fine. Don’t confuse tool novelty with research improvement.
Progress Markers
| Transition | Threshold |
|---|---|
| L0 → L1 | You safely delegate one bounded function with basic verification habits |
| L1 → L2 | You can chain tasks with consistent quality using repeatable prompts + spot-checking |
| L2 → L3 | You define a Research ODD with explicit constraints and escalation triggers |
| L3 → L4 | You harden the safety case: traceability, validation gates, predictable escalation |
The Big Takeaway
SAE maturity is about who holds responsibility for evidence validity and judgment — not which tools you use.
- L0: research fundamentals (still matter)
- L1: better thinking (first drafts in minutes)
- L2: broader synthesis (verified themes in hours)
- L3: persistent verification (scaled insights across studies)
- L4: autonomous monitoring (pipelines run while you sleep)
- L5: goal-setting only (not there yet)
Go deep before you go wide.
An S-Steward at L1 with solid verification habits beats an E-Explorer at L4 who can’t validate their pipeline.
Part of the Design in Tech Report 2026. Feedback welcome.