Architecture Decision Records: Mom Test Customer Validation
ADR-001: Adopt Behavior-First Questioning as Default Interview Protocol
Status
Accepted
Context
Product teams need a reliable method to distinguish genuine customer demand from polite agreement. Traditional interview approaches ("What do you think of my idea?", "Would you use this?") produce misleading signal because humans are socially conditioned to be supportive. This false positive data is more dangerous than no data, as it creates confidence in the wrong direction.
Decision
Adopt the Mom Test's three rules as the mandatory protocol for all customer discovery conversations:
- Discuss the customer's life and existing behavior, not our idea
- Ask about specific past events, not hypothetical futures
- Maintain a listen-to-talk ratio of at least 70/30
All team members conducting customer conversations must demonstrate proficiency in these rules before conducting interviews independently.
Consequences
- Positive: Dramatically improved signal quality; reduced risk of building features nobody wants; team decisions become evidence-based.
- Negative: Initial learning curve; founders must suppress the instinct to pitch; some team members may resist the discipline.
- Neutral: Does not replace quantitative validation — must be paired with surveys, analytics, and market sizing.
Alternatives Considered
- Survey-first approach: Rejected because surveys at early stage lack the depth to uncover real behavior patterns and are easily gamed by social desirability bias.
- Pitch-and-observe approach: Rejected because pitching during discovery contaminates responses with social pressure.
- Jobs-to-Be-Done interviews only: Considered but determined to be complementary rather than a replacement. JTBD provides the strategic framework; Mom Test provides the tactical conversation discipline.
ADR-002: Classify All Interview Data into Three Reliability Tiers
Status
Accepted
Context
Customer conversations produce a mix of reliable and unreliable data. Teams frequently treat all data equally — giving the same weight to an enthusiastic "I love it!" (unreliable) as to "I spent $500 last month trying to solve this" (highly reliable). Without a classification system, dangerous noise gets treated as signal.
Decision
Implement a three-tier data classification system applied to all interview notes:
- Tier 1 — Behavioral Facts: Past actions, workflows, money spent, tools tried, time invested. High reliability. Base decisions on this.
- Tier 2 — Contextual Signals: Emotional reactions, frustration indicators, engagement level. Medium reliability. Use as supporting evidence.
- Tier 3 — Noise: Compliments, hypothetical statements ("I would..."), feature wishlists, generic claims ("I always..."). Low reliability. Acknowledge and discard from decision-making.
Notetakers must tag all captured data with its tier during or immediately after conversations.
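The tiers and the tagging rule can be sketched as a minimal data structure. This is an illustrative sketch, not a prescribed implementation: the type names, the manual tagging, and the filter function are all assumptions about how a team might encode the policy.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    """Reliability tiers from ADR-002."""
    BEHAVIORAL_FACT = 1    # past actions, money spent, tools tried
    CONTEXTUAL_SIGNAL = 2  # emotional reactions, frustration, engagement
    NOISE = 3              # compliments, hypotheticals, wishlists

@dataclass
class Note:
    """One captured data point, tagged by the notetaker."""
    quote: str
    tier: Tier

def decision_inputs(notes: list[Note]) -> list[Note]:
    """Only Tier 1 behavioral facts feed product decisions;
    Tier 2 is kept as supporting evidence, Tier 3 is discarded."""
    return [n for n in notes if n.tier is Tier.BEHAVIORAL_FACT]

notes = [
    Note("I spent $500 last month trying to solve this", Tier.BEHAVIORAL_FACT),
    Note("I love it!", Tier.NOISE),
    Note("Visibly frustrated describing the export step", Tier.CONTEXTUAL_SIGNAL),
]
facts = decision_inputs(notes)
```

Tagging happens at capture time (per the decision above); the filter is applied when the team synthesizes findings.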
Consequences
- Positive: Prevents dangerous noise from influencing product decisions; creates a shared vocabulary for data quality across the team.
- Negative: Requires discipline to discard flattering feedback; may feel counterintuitive to dismiss positive signals.
- Neutral: Tier 2 data remains subjective and requires judgment in application.
Alternatives Considered
- No classification (treat all data equally): Rejected because this is the primary failure mode the framework is designed to prevent.
- Binary classification (signal/noise): Rejected because it loses the nuance of Tier 2 contextual signals that have legitimate supporting value.
ADR-003: Require Commitment Escalation in Every Conversation
Status
Accepted
Context
A common failure pattern in customer discovery is accumulating "zombie leads" — prospects who express enthusiasm in conversations but never take any concrete action. Without a mechanism to test commitment, teams mistake politeness for demand. The Mom Test emphasizes that "actions speak louder than words," but teams need a structured way to measure this.
Decision
Every customer conversation must end with a commitment ask — a request for a concrete next step that costs the participant something (time, reputation, or money). The commitment ladder is:
- Time: Follow-up meeting scheduled
- Reputation: Introduction to colleague or decision-maker
- Effort: Agreement to test a prototype or provide detailed feedback
- Money: Pre-order, deposit, or paid pilot
- Contract: Letter of intent or formal agreement
If a prospect fails to advance past the first rung (Time) after 2–3 touchpoints, classify them as a zombie lead and deprioritize.
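The ladder and the zombie-lead rule can be encoded as an ordered enum. A minimal sketch, assuming a three-touchpoint cutoff (the decision allows 2–3; the exact threshold and all names here are illustrative):

```python
from enum import IntEnum

class Commitment(IntEnum):
    """ADR-003 commitment ladder, ordered by cost to the prospect."""
    NONE = 0
    TIME = 1        # follow-up meeting scheduled
    REPUTATION = 2  # introduction to colleague or decision-maker
    EFFORT = 3      # prototype test or detailed feedback
    MONEY = 4       # pre-order, deposit, or paid pilot
    CONTRACT = 5    # letter of intent or formal agreement

def is_zombie_lead(max_commitment: Commitment, touchpoints: int) -> bool:
    """A prospect stuck at or below the Time rung after several
    touchpoints is a zombie lead and should be deprioritized.
    The threshold of 3 is an assumption within the 2-3 range."""
    return touchpoints >= 3 and max_commitment <= Commitment.TIME
```

Using `IntEnum` makes the ordering explicit, so "advancing the ladder" is a simple comparison.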
Consequences
- Positive: Provides an objective measure of genuine interest; prevents accumulation of false pipeline; accelerates learning about real vs. perceived demand.
- Negative: Some team members may feel uncomfortable "asking for things" during research conversations; requires coaching on how to ask naturally.
- Neutral: Some valuable conversations may not lend themselves to commitment asks (e.g., purely exploratory industry research). Use judgment.
Alternatives Considered
- NPS-style scoring of conversations: Rejected because NPS measures satisfaction, not commitment. A satisfied non-buyer is still a non-buyer.
- No structured commitment tracking: Rejected because this is the pattern that produces zombie leads.
ADR-004: Mandate Full Team Participation in Customer Learning
Status
Accepted
Context
A known anti-pattern in customer discovery is the "learning bottleneck" — one person (typically the CEO, product manager, or "business person") conducts all customer conversations and then tells the rest of the team what to build. This creates an information asymmetry that can be wielded as political leverage ("the customer told me...") rather than shared understanding. It also means customer insights are filtered through one person's interpretation and biases.
Decision
All core team members (founders, product, engineering leads, design leads) must:
- Participate in at least 2 customer conversations per month (as interviewer or notetaker)
- Attend 100% of post-batch team review sessions
- Have access to the full conversation notes repository
No single person may be the exclusive interpreter of customer data for the team.
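The participation floor above can be expressed as a simple monthly compliance check. A sketch under stated assumptions: the function name, inputs, and the idea of checking per calendar month are all hypothetical, not part of the decision.

```python
def meets_participation_policy(conversations_this_month: int,
                               reviews_attended: int,
                               reviews_held: int) -> bool:
    """ADR-004 floor for core team members: at least 2 customer
    conversations per month (as interviewer or notetaker) and
    attendance at every post-batch review session held."""
    return (conversations_this_month >= 2
            and reviews_attended == reviews_held)
```

Repository access is a standing grant rather than a monthly metric, so it is not modeled here.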
Consequences
- Positive: Eliminates the learning bottleneck; creates empathy across the entire team; engineers who hear customer pain directly build with more context; reduces "telephone game" distortion of customer insights.
- Negative: Time investment for non-customer-facing team members; scheduling complexity.
- Neutral: Different team members will interpret the same conversation differently — this is a feature, not a bug, as it surfaces blind spots.
Alternatives Considered
- Dedicated research team only: Rejected for early-stage teams where shared context is critical. Acceptable for enterprise organizations if paired with mandatory cross-team insight sessions.
- Written summaries instead of participation: Rejected because summaries lose nuance, emotional signals, and the educational effect of hearing customers directly.
ADR-005: Phase AI Interview Tools After Manual Foundation
Status
Accepted
Context
AI-powered interview platforms (Prelaunch, Marvin, Cusmos) emerged in 2024–2025, offering the ability to conduct Mom Test-aligned conversations at scale. The temptation for time-pressured teams is to skip manual conversations entirely and rely on AI from day one. However, practitioner consensus and platform guidance both indicate that AI scales execution but doesn't teach the mindset. Teams that skip manual conversations tend to design poor research goals, ask shallow questions, and misinterpret AI-generated insights.
Decision
Adopt a three-phase approach:
- Phase 1 — Manual (Conversations 1–15): All conversations conducted by humans, in person or via video. Team learns the methodology, develops intuition, and internalizes the three rules.
- Phase 2 — AI-Assisted (Conversations 16–50): Use AI tools for transcription, note analysis, and question refinement. Continue conducting conversations manually.
- Phase 3 — AI-Scaled (Conversations 50+): Deploy AI interview platforms for parallel segment validation. Human team focuses on high-stakes conversations, commitment escalation, and relationship management.
No team may deploy AI interview platforms without completing Phase 1.
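The phase gating reduces to a lookup on cumulative conversation count. A minimal sketch (the function name and string labels are illustrative; per the consequences below, the boundaries are guidelines, not rigid gates):

```python
def research_phase(conversations_completed: int) -> str:
    """Map cumulative conversation count to the ADR-005 phase.
    Boundaries at 15 and 50 follow the decision text; treat them
    as guidelines rather than hard gates."""
    if conversations_completed < 15:
        return "Phase 1 - Manual"
    if conversations_completed < 50:
        return "Phase 2 - AI-Assisted"
    return "Phase 3 - AI-Scaled"
```

A team dashboard could surface this alongside the conversation log to make the "no AI platforms before Phase 1 is complete" rule visible.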
Consequences
- Positive: Ensures foundational methodology is internalized before scaling; prevents a garbage-in/garbage-out problem with AI tools.
- Negative: Slower initial throughput compared to immediate AI deployment; may frustrate teams under time pressure.
- Neutral: Phase boundaries are guidelines, not rigid gates; experienced teams with prior Mom Test practice may compress Phase 1.
Alternatives Considered
- AI-first from day one: Rejected because AI tools require well-formed research goals that teams can only develop through manual practice.
- Manual-only (no AI ever): Rejected because AI tools offer genuine scaling value once foundations are solid, and the 2025–2026 tooling has reached sufficient maturity.
- Parallel manual + AI from the start: Rejected because managing two conversation streams simultaneously while learning the methodology creates cognitive overload.
These ADRs document the foundational decisions for adopting and operationalizing the Mom Test methodology within a product organization. Each decision is reversible — revisit as the team's maturity and tooling landscape evolve.