Defect Cause Catalog

A catalog of where defects typically originate in the value stream, how each can be detected automatically, where AI adds value to that detection, and how to correct the cause systemically.

Category	Defect Cause	Earliest Detection Stage	Automated Detection Stage	Detection Method	AI Improvement	Systemic Correction
Product & Discovery	Building the wrong thing	Discovery	Production	Adoption dashboards (Amplitude; Mixpanel); funnel drop-off alerts; alerts when adoption stays below X% after N days	Synthesize user feedback; support tickets; and usage data to surface misalignment signals earlier than production metrics	Validated user research before backlog entry; dual-track agile; kill features that miss adoption thresholds
Product & Discovery	Solving a problem nobody has	Discovery	Production	Ticket topic clustering vs. feature investment; survey correlation with releases; feature request tracking	Semantic analysis of interview transcripts; forums; and tickets to identify actual vs. assumed pain	Problem validation as a stage gate; publish problem brief before solution brief; quantify user pain
Product & Discovery	Correct problem; wrong solution	Discovery	Production	A/B testing frameworks; feature flag cohort comparison; statistical significance calculators; task completion tracking	Evaluate prototypes against problem definitions; generate alternative solution approaches	Prototype multiple approaches; measurable success criteria before building; experiment with flags first
Product & Discovery	Meets spec but misses user intent	Requirements	Production	UX analytics (FullStory; Hotjar); rage-click and error-loop detection; task completion rate tracking	Review acceptance criteria for missing user outcomes; analyze session replays at scale for frustration patterns	Acceptance criteria written as user outcomes rather than functional checklists; regular usability testing cadence
Product & Discovery	Over-engineering beyond need	Design	CI	Static analysis for dead code and unused abstractions (SonarQube; ESLint); complexity scoring; LOC per feature	Flag unnecessary abstraction layers; premature optimization; and complexity vs. actual requirements in code review	YAGNI as team norm; time-box spikes; justify every abstraction layer; review architecture vs. actual scale quarterly
Product & Discovery	Prioritizing wrong work	Discovery	Production	DORA metrics vs. business outcomes; WSJF scoring; cost of delay dashboards; capacity vs. outcome tracking	Synthesize roadmap; customer data; and market signals to surface opportunity cost	WSJF for prioritization; regular portfolio reviews with outcome data; publish what you chose NOT to do
Integration & Boundaries	Interface mismatches	CI	CI	Consumer-driven contract tests (Pact); schema validation (OpenAPI; protobuf/buf); API compat checks per PR	Predict which consumers break from API changes based on usage patterns when formal contracts don't exist	Contract tests mandatory per boundary; API-first with generated clients; version + migration plan before merge
Integration & Boundaries	Wrong assumptions about upstream/downstream	Design	Staging	Chaos engineering (Gremlin; Litmus); synthetic transactions; fault injection; circuit breaker monitoring	Review code/docs to identify undocumented behavioral assumptions (timeouts; retries; error semantics)	Document behavioral contracts as well as schemas; defensive coding at boundaries; circuit breakers as default
Integration & Boundaries	Race conditions	Pre-Commit	Pre-Commit	Thread sanitizers (TSan); race detectors; TLA+; fuzz testing; load testing with concurrency	Limited: Can flag concurrency anti-patterns in review but cannot replace formal detection tools	Design for idempotency; queues over shared mutable state; lock ordering conventions; concurrency review checklist
Integration & Boundaries	Inconsistent distributed state	Design	Staging	Distributed tracing (Jaeger/Zipkin); reconciliation jobs; saga completion monitoring; anomaly detection	Review designs for missing compensation logic or consistency model mismatches	Choose consistency model deliberately per use case; saga with compensating transactions; event sourcing
Knowledge & Communication	Implicit domain knowledge not in code	Coding	CI	Magic number detection; git knowledge-concentration metrics (CodeScene; git-fame); onboarding time tracking	High Value: Identify undocumented business rules; missing 'why' in code; and knowledge gaps a new dev would hit	DDD with ubiquitous language; embed rules in code not wikis; pair across experience levels; rotate ownership
Knowledge & Communication	Ambiguous requirements	Requirements	CI	Flag stories without acceptance criteria; BDD spec coverage tracking; defect classification for 'requirements gap'	High Value: Review requirements for ambiguity; missing edge cases; contradictions; generate test scenarios from specs	Three Amigos before work starts; example mapping; executable specs as source of truth; given/when/then required
Knowledge & Communication	Tribal knowledge loss	Coding	CI	Bus factor from git history (CodeScene; git-fame); single-author concentration alerts; doc freshness checks	Generate documentation from code/tests; flag where docs have drifted from implementation	Pair/mob programming as default; rotate on-call; automate tasks that rely on tribal knowledge; living docs from code
Knowledge & Communication	Divergent mental models across teams	Design	CI	Divergent naming detection across codebases; contract test failures; integration defect tracking	High Value: Compare terminology and domain models across codebases to detect semantic mismatches	Shared domain model; explicit bounded contexts; regular cross-team syncs; shared glossary enforced via linting
Change & Complexity	Unintended side effects	CI	CI	Automated test suites; mutation testing (Stryker; PIT); change impact analysis flagging downstream consumers	Reason about semantic change impact beyond syntactic dependencies	Small focused commits; trunk-based development; feature flags to decouple deploy from release
Change & Complexity	Accumulated technical debt	CI	CI	Complexity trends; duplication scoring; dependency cycles; SonarQube quality gates; TODO/HACK counts	Identify architectural drift; abstraction decay; and calcified workarounds that static analysis misses	Refactoring as part of every story; dedicated debt budget; boy scout rule; treat rising complexity as leading indicator
Change & Complexity	Unanticipated feature interactions	Staging	Staging	Combinatorial/pairwise testing; feature flag interaction matrix (LaunchDarkly); canary with auto-rollback; regression suites	Reason about feature interactions semantically; flag conflicts that testing matrices miss	Feature flags with controlled rollout; modular design; canary deployments with auto-rollback on anomaly
Change & Complexity	Configuration drift	CI	CI	IaC drift detection (Terraform plan; Pulumi preview; AWS Config); environment comparison; smoke tests per environment	Not needed	All infrastructure as code; immutable infrastructure; GitOps; identical provisioning from same source
Testing & Observability Gaps	Untested edge cases and error paths	CI	CI	Mutation testing (Stryker; PIT); branch coverage thresholds; property-based testing (Hypothesis; fast-check)	Analyze code paths; generate tests for untested boundaries and error paths humans overlook	Tests required for every bug fix; property-based testing as standard; boundary value analysis; mutation scores as gate
Testing & Observability Gaps	Missing contract tests at boundaries	CI	CI	Boundary inventory vs. contract test inventory; CI fails if new endpoint lacks tests; Pact broker	Identify boundaries lacking tests by understanding semantic service relationships	Contract tests mandatory per new boundary; type varies by control level (see Contract Testing Strategies tab)
Testing & Observability Gaps	Insufficient monitoring	Design	CI	Observability coverage scoring; health endpoint checks; structured logging verification; SLO burn rate alerting	Review architectures; flag observability gaps against production readiness checklists	Observability as NFR on every service; production readiness checklist enforced; SLOs for every user-facing path
Testing & Observability Gaps	Test environments don't reflect production	CI	CI	Automated environment parity checks; synthetic transaction comparison; IaC diff tools; config comparison	Not needed	Same provisioning for all environments; production-like data in staging; containerization; test in prod with flags
Process & Deployment	Long-lived branches	Pre-Commit	Pre-Commit	Branch age alerts; merge conflict frequency; CI dashboard for branch count and divergence	Not needed	Trunk-based development; merge at least daily; CI rejects stale branches; feature flags remove the need for long-lived branches
Process & Deployment	Manual pipeline steps	CI	CI	Pipeline audit for manual gates; deployment lead time analysis; pipeline topology analysis	Not needed	Automate every step from commit to production; manual approvals only where regulation requires them; treat pipeline as first-class product
Process & Deployment	Batching too many changes per release	CI	CI	Changes-per-deploy metrics; deployment frequency (DORA); batch size threshold alerts	Not needed	Continuous delivery — every commit is a candidate; single-piece flow; decouple deploy from release with flags
Process & Deployment	Inadequate rollback capability	CI	CI	Automated rollback testing in CI; mean time to rollback; migration reversibility checks	Not needed	Blue/green or canary as default; backward-compatible migrations only; auto-rollback on health failure; practice rollbacks regularly
Process & Deployment	Reliance on human review to catch preventable defects	Coding	CI	Linters; SAST; type systems; and complexity scoring catch syntax-level problems and known patterns but miss semantic issues	High Value: Semantic code review for logic errors; implicit assumptions; missing edge cases; and design violations; this shifts human review from gate to mentoring	Automated quality gates in CI; AI review for correctness; reserve human review for knowledge transfer and design decisions; pair programming over async gatekeeping
Process & Deployment	Manual review of risks and compliance (CAB)	Design	Manual	Manual — risk checklists; change documentation review; impact assessment meetings; approval workflows	High Value: Automated risk scoring from change diff and deployment history; blast radius analysis; auto-approve low-risk; flag high-risk with evidence — eliminates delay without reducing safety	Replace CAB with automated pipelines; progressive delivery (canary/blue-green); auto-rollback on anomaly; every commit deployable; CAB adds delay and false confidence without improving safety
Data & State	Schema migration / backward compat failures	CI	CI	Schema compatibility checks (Avro; protobuf/buf); migration dry-runs against production-like data	Predict downstream impact by understanding how consumers actually use data beyond formal compatibility	Expand-then-contract always; never deploy breaking schema changes; schema registry with compat enforcement
Data & State	Null/missing data assumptions	Pre-Commit	Pre-Commit	Null safety static analysis (NullAway; TypeScript strict); null-input test generation; NPE monitoring	Flag code where optional fields are used without null checks; suggest fixes even in non-strict languages	Enforce null-safe type systems; Option/Maybe as default; validate at boundaries; assert and handle explicitly
Data & State	Concurrency and ordering issues	CI	CI	Thread sanitizers (TSan); load tests with randomized timing; idempotency verification; model checkers (TLA+)	Not needed	Design for out-of-order delivery; idempotent consumers; version vectors on events; prefer immutable data
Data & State	Cache invalidation errors	Staging	Staging	Cache consistency monitoring; TTL verification; stale data detection; hit rate anomaly alerts	Review cache invalidation logic for incomplete paths or TTL/change-frequency mismatches	Short TTLs over complex invalidation; event-driven invalidation; cache-aside with explicit invalidation; alert on staleness
Dependency & Infrastructure	Third-party library breaking changes	CI	CI	Automated upgrade PRs (Dependabot; Renovate); SCA for breaking versions; lock file drift detection	Review changelogs to assess breaking change risk; predict compatibility issues from actual usage	Pin dependencies; automated upgrade PRs with test gates; abstract over volatile dependencies; evaluate stability before adoption
Dependency & Infrastructure	Infrastructure differences across environments	CI	CI	IaC drift detection; config comparison across environments; environment parity scoring; GitOps reconciliation	Not needed	Single source of truth for all environments; immutable infrastructure; containerization; GitOps with promotion
Dependency & Infrastructure	Network partitions / partial failures handled wrong	Staging	Staging	Chaos engineering (Gremlin; Litmus); synthetic transaction monitoring; circuit breaker state monitoring	Review architectures for missing failure patterns (no circuit breaker; no retry/backoff; no bulkhead)	Design for failure as default; circuit breakers; retries; bulkheads; test failure modes explicitly; game days
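
Example: single-author concentration alert (Tribal knowledge loss row)

The "single-author concentration alerts" detection method can be illustrated with a minimal Python sketch. It is not how CodeScene or git-fame compute their metrics. It assumes commit history has already been extracted into (path, author) pairs, one per commit touching the path. The function name single_author_files and the threshold and min_commits defaults are hypothetical and would need tuning per codebase.

```python
from collections import Counter, defaultdict


def single_author_files(commits, threshold=0.8, min_commits=3):
    """Return {path: (top_author, share)} for files dominated by one author.

    commits: iterable of (path, author) pairs, one per commit touching the path.
    threshold and min_commits are illustrative defaults (assumptions).
    """
    by_path = defaultdict(Counter)
    for path, author in commits:
        by_path[path][author] += 1

    flagged = {}
    for path, authors in by_path.items():
        total = sum(authors.values())
        if total < min_commits:
            continue  # too little history to judge concentration
        top_author, top_count = authors.most_common(1)[0]
        share = top_count / total
        if share >= threshold:
            flagged[path] = (top_author, share)
    return flagged


# Hypothetical history: one entry per commit that touched the file.
history = [
    ("billing/tax.py", "ana"), ("billing/tax.py", "ana"),
    ("billing/tax.py", "ana"), ("billing/tax.py", "ana"),
    ("billing/tax.py", "raj"),
    ("api/routes.py", "ana"), ("api/routes.py", "raj"),
    ("api/routes.py", "lee"),
]

for path, (author, share) in single_author_files(history).items():
    print(f"{path}: {author} wrote {share:.0%} of commits")
```

With this sample data only billing/tax.py is flagged, because one author made four of its five commits. A CI job could run such a check on a schedule and open a ticket to pair or rotate ownership on flagged files.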

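Example: boundary inventory vs. contract test inventory (Missing contract tests at boundaries row)

The "CI fails if new endpoint lacks tests" check can be sketched as a set difference between two inventories. This Python sketch assumes both inventories are already available as plain lists (for example, exported from an API spec and from the test suite). It does not query a Pact broker. The names uncovered_boundaries and ci_exit_code, and the "endpoint" and "consumer" keys, are hypothetical.

```python
def uncovered_boundaries(endpoints, contract_tests):
    """Return endpoints with no contract test, sorted for stable CI output."""
    covered = {test["endpoint"] for test in contract_tests}
    return sorted(set(endpoints) - covered)


def ci_exit_code(endpoints, contract_tests):
    """Print each uncovered endpoint and return a nonzero code if any exist."""
    missing = uncovered_boundaries(endpoints, contract_tests)
    for endpoint in missing:
        print(f"missing contract test: {endpoint}")
    return 1 if missing else 0


# Hypothetical inventories for illustration.
endpoints = ["GET /orders", "POST /orders", "GET /invoices"]
contract_tests = [
    {"endpoint": "GET /orders", "consumer": "web"},
    {"endpoint": "POST /orders", "consumer": "web"},
]

exit_code = ci_exit_code(endpoints, contract_tests)
```

In a pipeline step, the returned code would be passed to sys.exit so the build fails whenever a boundary has no contract test.
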
Last update: 2026-02-12