
Defect Cause Catalog

A catalog of where defects are usually created in the value stream, methods for automated detection, where adding AI to the flow adds value, and systemic remediation strategies.

| Category | Defect Cause | Earliest Detection | Auto Detection | Detection Method | AI Improvement | Systemic Correction |
| --- | --- | --- | --- | --- | --- | --- |
| Product & Discovery | Building the wrong thing | Discovery | Production | Adoption dashboards (Amplitude, Mixpanel); funnel drop-off alerts; usage thresholds (alert on <X% adoption after N days) | Synthesize user feedback, support tickets, and usage data to surface misalignment signals earlier than production metrics | Validated user research before backlog entry; dual-track agile; kill features that miss adoption thresholds |
| Product & Discovery | Solving a problem nobody has | Discovery | Production | Ticket topic clustering vs. feature investment; survey correlation with releases; feature request tracking | Semantic analysis of interview transcripts, forums, and tickets to identify actual vs. assumed pain | Problem validation as a stage gate; publish problem brief before solution brief; quantify user pain |
| Product & Discovery | Correct problem, wrong solution | Discovery | Production | A/B testing frameworks; feature flag cohort comparison; statistical significance calculators; task completion tracking | Evaluate prototypes against problem definitions; generate alternative solution approaches | Prototype multiple approaches; measurable success criteria before building; experiment with flags first |
| Product & Discovery | Meets spec but misses user intent | Requirements | Production | UX analytics (FullStory, Hotjar); rage-click and error-loop detection; task completion rate tracking | Review acceptance criteria for missing user outcomes; analyze session replays at scale for frustration patterns | Acceptance criteria as user outcomes, not functional checklists; regular usability testing cadence |
| Product & Discovery | Over-engineering beyond need | Design | CI | Static analysis for dead code and unused abstractions (SonarQube, ESLint); complexity scoring; LOC per feature | Flag unnecessary abstraction layers, premature optimization, and complexity vs. actual requirements in code review | YAGNI as team norm; time-box spikes; justify every abstraction layer; review architecture vs. actual scale quarterly |
| Product & Discovery | Prioritizing wrong work | Discovery | Production | DORA metrics vs. business outcomes; WSJF scoring; cost of delay dashboards; capacity vs. outcome tracking | Synthesize roadmap, customer data, and market signals to surface opportunity cost | WSJF for prioritization; regular portfolio reviews with outcome data; publish what you chose NOT to do |
| Integration & Boundaries | Interface mismatches | CI | CI | Consumer-driven contract tests (Pact); schema validation (OpenAPI, protobuf/buf); API compat checks per PR | Predict which consumers break from API changes based on usage patterns when formal contracts don't exist | Contract tests mandatory per boundary; API-first with generated clients; version + migration plan before merge |
| Integration & Boundaries | Wrong assumptions about upstream/downstream | Design | Staging | Chaos engineering (Gremlin, Litmus); synthetic transactions; fault injection; circuit breaker monitoring | Review code/docs to identify undocumented behavioral assumptions (timeouts, retries, error semantics) | Document behavioral contracts, not just schemas; defensive coding at boundaries; circuit breakers as default |
| Integration & Boundaries | Race conditions | Pre-Commit | Pre-Commit | Thread sanitizers (TSan); race detectors; TLA+; fuzz testing; load testing with concurrency | Limited: Can flag concurrency anti-patterns in review but cannot replace formal detection tools | Design for idempotency; queues over shared mutable state; lock ordering conventions; concurrency review checklist |
| Integration & Boundaries | Inconsistent distributed state | Design | Staging | Distributed tracing (Jaeger/Zipkin); reconciliation jobs; saga completion monitoring; anomaly detection | Review designs for missing compensation logic or consistency model mismatches | Choose consistency model deliberately per use case; saga with compensating transactions; event sourcing |
| Knowledge & Communication | Implicit domain knowledge not in code | Coding | CI | Magic number detection; git knowledge-concentration metrics (CodeScene, git-fame); onboarding time tracking | High Value: Identify undocumented business rules, missing 'why' in code, and knowledge gaps a new dev would hit | DDD with ubiquitous language; embed rules in code, not wikis; pair across experience levels; rotate ownership |
| Knowledge & Communication | Ambiguous requirements | Requirements | CI | Flag stories without acceptance criteria; BDD spec coverage tracking; defect classification for 'requirements gap' | High Value: Review requirements for ambiguity, missing edge cases, and contradictions; generate test scenarios from specs | Three Amigos before work starts; example mapping; executable specs as source of truth; given/when/then required |
| Knowledge & Communication | Tribal knowledge loss | Coding | CI | Bus factor from git history (CodeScene, git-fame); single-author concentration alerts; doc freshness checks | Generate documentation from code/tests; flag where docs have drifted from implementation | Pair/mob programming as default; rotate on-call; automate tribal knowledge; living docs from code |
| Knowledge & Communication | Divergent mental models across teams | Design | CI | Divergent naming detection across codebases; contract test failures; integration defect tracking | High Value: Compare terminology and domain models across codebases to detect semantic mismatches | Shared domain model; explicit bounded contexts; regular cross-team syncs; shared glossary enforced via linting |
| Change & Complexity | Unintended side effects | CI | CI | Automated test suites; mutation testing (Stryker, PIT); change impact analysis flagging downstream consumers | Reason about semantic change impact beyond syntactic dependencies | Small focused commits; trunk-based development; feature flags to decouple deploy from release |
| Change & Complexity | Accumulated technical debt | CI | CI | Complexity trends; duplication scoring; dependency cycles; SonarQube quality gates; TODO/HACK counts | Identify architectural drift, abstraction decay, and calcified workarounds that static analysis misses | Refactoring as part of every story; dedicated debt budget; boy scout rule; treat rising complexity as leading indicator |
| Change & Complexity | Unanticipated feature interactions | Staging | Staging | Combinatorial/pairwise testing; feature flag interaction matrix (LaunchDarkly); canary with auto-rollback; regression suites | Reason about feature interactions semantically; flag conflicts that testing matrices miss | Feature flags with controlled rollout; modular design; canary deployments with auto-rollback on anomaly |
| Change & Complexity | Configuration drift | CI | CI | IaC drift detection (Terraform plan, Pulumi preview, AWS Config); environment comparison; smoke tests per environment | Not needed | All infrastructure as code; immutable infrastructure; GitOps; identical provisioning from same source |
| Testing & Observability Gaps | Untested edge cases and error paths | CI | CI | Mutation testing (Stryker, PIT); branch coverage thresholds; property-based testing (Hypothesis, fast-check) | Analyze code paths; generate tests for untested boundaries and error paths humans overlook | Tests required for every bug fix; property-based testing as standard; boundary value analysis; mutation scores as gate |
| Testing & Observability Gaps | Missing contract tests at boundaries | CI | CI | Boundary inventory vs. contract test inventory; CI fails if new endpoint lacks tests; Pact broker | Identify boundaries lacking tests by understanding semantic service relationships | Contract tests mandatory per new boundary; type varies by control level (see Contract Testing Strategies tab) |
| Testing & Observability Gaps | Insufficient monitoring | Design | CI | Observability coverage scoring; health endpoint checks; structured logging verification; SLO burn rate alerting | Review architectures; flag observability gaps against production readiness checklists | Observability as NFR on every service; production readiness checklist enforced; SLOs for every user-facing path |
| Testing & Observability Gaps | Test environments don't reflect production | CI | CI | Automated environment parity checks; synthetic transaction comparison; IaC diff tools; config comparison | Not needed | Same provisioning for all environments; production-like data in staging; containerization; test in prod with flags |
| Process & Deployment | Long-lived branches | Pre-Commit | Pre-Commit | Branch age alerts; merge conflict frequency; CI dashboard for branch count and divergence | Not needed | Trunk-based development; merge at least daily; CI rejects stale branches; feature flags eliminate branch need |
| Process & Deployment | Manual pipeline steps | CI | CI | Pipeline audit for manual gates; deployment lead time analysis; pipeline topology analysis | Not needed | Automate every step commit-to-production; manual approvals only for regulatory requirements; treat pipeline as first-class product |
| Process & Deployment | Batching too many changes per release | CI | CI | Changes-per-deploy metrics; deployment frequency (DORA); batch size threshold alerts | Not needed | Continuous delivery (every commit is a candidate); single-piece flow; decouple deploy from release with flags |
| Process & Deployment | Inadequate rollback capability | CI | CI | Automated rollback testing in CI; mean time to rollback; migration reversibility checks | Not needed | Blue/green or canary as default; backward-compatible migrations only; auto-rollback on health failure; practice regularly |
| Process & Deployment | Reliance on human review to catch preventable defects | Coding | CI | Linters, SAST, type systems, and complexity scoring catch syntax and known patterns but miss semantic issues | High Value: Semantic code review for logic errors, implicit assumptions, missing edge cases, and design violations; shifts human review from gate to mentoring | Automated quality gates in CI; AI review for correctness; reserve human review for knowledge transfer and design decisions; pair programming over async gatekeeping |
| Process & Deployment | Manual review of risks and compliance (CAB) | Design | Manual | Manual: risk checklists; change documentation review; impact assessment meetings; approval workflows | High Value: Automated risk scoring from change diff and deployment history; blast radius analysis; auto-approve low-risk changes; flag high-risk changes with evidence; eliminates delay without reducing safety | Replace CAB with automated pipelines; progressive delivery (canary/blue-green); auto-rollback on anomaly; every commit deployable; CAB adds delay and false confidence without improving safety |
| Data & State | Schema migration / backward compat failures | CI | CI | Schema compatibility checks (Avro, protobuf/buf); migration dry-runs against production-like data | Predict downstream impact by understanding how consumers actually use data beyond formal compatibility | Expand-then-contract always; never deploy breaking schema changes; schema registry with compat enforcement |
| Data & State | Null/missing data assumptions | Pre-Commit | Pre-Commit | Null safety static analysis (NullAway, TypeScript strict); null-input test generation; NPE monitoring | Flag code where optional fields are used without null checks; suggest fixes even in non-strict languages | Enforce null-safe type systems; Option/Maybe as default; validate at boundaries; assert and handle explicitly |
| Data & State | Concurrency and ordering issues | CI | CI | Thread sanitizers (TSan); load tests with randomized timing; idempotency verification; model checkers (TLA+) | Not needed | Design for out-of-order delivery; idempotent consumers; version vectors on events; prefer immutable data |
| Data & State | Cache invalidation errors | Staging | Staging | Cache consistency monitoring; TTL verification; stale data detection; hit rate anomaly alerts | Review cache invalidation logic for incomplete paths or TTL/change-frequency mismatches | Short TTLs over complex invalidation; event-driven invalidation; cache-aside with explicit invalidation; alert on staleness |
| Dependency & Infrastructure | Third-party library breaking changes | CI | CI | Automated upgrade PRs (Dependabot, Renovate); SCA for breaking versions; lock file drift detection | Review changelogs to assess breaking change risk; predict compatibility issues from actual usage | Pin dependencies; automated upgrade PRs with test gates; abstract over volatile dependencies; evaluate stability before adoption |
| Dependency & Infrastructure | Infrastructure differences across environments | CI | CI | IaC drift detection; config comparison across environments; environment parity scoring; GitOps reconciliation | Not needed | Single source of truth for all environments; immutable infrastructure; containerization; GitOps with promotion |
| Dependency & Infrastructure | Network partitions / partial failures handled wrong | Staging | Staging | Chaos engineering (Gremlin, Litmus); synthetic transaction monitoring; circuit breaker state monitoring | Review architectures for missing failure patterns (no circuit breaker, no retry/backoff, no bulkhead) | Design for failure as default; circuit breakers, retries, and bulkheads; test failure modes explicitly; game days |
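
Several rows above name circuit breakers as a default correction for wrong upstream/downstream assumptions and for mishandled partial failures. The catalog does not prescribe an implementation or a language, so the following is only a minimal Python sketch of the pattern. The class name `CircuitBreaker`, the `failure_threshold` and `reset_timeout` parameters, and the injected `clock` are illustrative assumptions, not any library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: closed -> open -> half-open -> closed."""

    def __init__(
        self,
        failure_threshold: int = 3,      # assumed default, tune per dependency
        reset_timeout: float = 30.0,     # seconds to wait before a trial call
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            # Fail fast instead of waiting on a dependency known to be failing.
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call in half-open, or too many failures while
            # closed, opens (or re-opens) the breaker.
            if state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(lambda: fetch_profile(user_id))`, where `fetch_profile` stands in for whatever client function the service already has. Exposing `state` is what makes the "circuit breaker state monitoring" detection method possible. Injecting the clock keeps the open-to-half-open transition testable without sleeping.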

Last update: 2026-02-12