The HARDEN Method RACI Matrix: Who Does What, When

The HARDEN Method RACI Matrix in action: why the biggest automation failures aren’t technical, they’re organizational

The $50K Lesson Every Automation Team Learns the Hard Way

Picture this: Your AI automation is working perfectly in demos. 

The models are accurate, the integrations are solid, and everyone’s excited about the potential savings. 

Six months later, it’s collecting digital dust while your team has quietly returned to manual processes.

What went wrong?

Nine times out of ten, it wasn’t the technology. 

It was the people—specifically, unclear ownership of who decides what, when things go wrong, and who’s accountable for the outcome.

When AI automations move from demos to day-to-day operations, the single biggest failure mode isn’t “bad models”—it’s unclear ownership. 

Who decides the pilot scope? 

Who fixes flaky runs? 

Who approves a rollback?

The HARDEN Method RACI Matrix answers those questions before work begins, so your team ships faster and sleeps better.

What RACI Means (And Why It Saves Projects)

RACI is a simple way to assign roles for every decision:

R — Responsible: The hands on the keyboard. They execute the work. 

A — Accountable: The single throat to choke. Approves the work, owns the outcome. 

C — Consulted: Experts who give input before decisions or handoffs. 

I — Informed: People kept in the loop after decisions or changes.

In the HARDEN Method, every phase has a clear primary owner (Responsible) while the Project Manager remains Accountable end-to-end—for planning, timeline, risk management, and sign-offs. That keeps decision rights crisp without turning the PM into a bottleneck.

The Cast of Characters: Who Brings What to the HARDEN Method

Business Analyst (BA): Maps reality, collects baselines, writes the business case, curates examples and edge cases. Fluent in the process and its data—less concerned with tooling.

Product Owner (PO): Sets priorities, clarifies scope, and accepts or rejects outcomes. Voice of the business; keeps the backlog honest.

Solutions Architect (SA): Designs the target solution, data contracts, integrations, and human-in-the-loop points. Plans for the bad day (rollback, replay, failure isolation).

Automation Engineer (AE): Builds the flows (e.g., n8n, Make, Relevance, Voiceflow), implements guardrails, integrates services, and documents the runbook.

DevOps Engineer: Manages environments, secrets, CI/CD, and infrastructure. Helps with deploys, retries, rate limits.

Quality Assurance Engineer (QA): Tries to break things on purpose—negative tests, load tests, and user acceptance tests. Keeper of the regression suite.

Site Reliability Engineer (SRE): Hardens production—circuit breakers, dashboards/alerts, SLOs, error budgets, rollback/replay drills.

Operations Analyst (OA): Watches the health of the automation over time, tracks KPIs/costs, and feeds improvements back to the backlog.

Project Manager (PM): Orchestrates the entire journey—planning, risk management, communications, sign-offs, change control, and stakeholder alignment.


Guiding principle: One R per step to drive, one A across the whole method to align, many C to enrich decisions, and a small I list so people aren’t spammed.

Who Owns Each Step of the HARDEN Method

  • Discover — Business Analyst (BA)
  • Design — Solutions Architect (SA)
  • Build — Automation Engineer (AE)
  • Break — Quality Assurance Engineer (QA)
  • Harden — Site Reliability Engineer (SRE)
  • Launch — Project Manager (PM)
  • Monitor — Operations Analyst (OA)

The PM is Accountable across all steps (planning, timeline, risks, sign-offs).

Cross-Functional Collaboration by Step

  • Discover: BA • Product Owner (PO) • PM
  • Design: SA • PO • PM
  • Build: AE • DevOps • PM
  • Break (QA): QA • AE • PM
  • Harden: SRE • DevOps • PM
  • Launch: PM • DevOps
  • Monitor: OA • PM

The Complete RACI Matrix

Step      BA  PO  SA  AE  DevOps  QA  SRE  OA  PM
Discover  R   C   -   -   -       -   -    -   A
Design    -   C   R   -   -       -   -    -   A
Build     -   -   -   R   C       -   -    -   A
Break     -   -   -   C   -       R   -    -   A
Harden    -   -   -   -   C       -   R    -   A
Launch    -   -   -   -   C       -   -    -   R/A
Monitor   -   -   -   -   -       -   -    R   A

R = Responsible • A = Accountable • C = Consulted • - = no formal role for that step

Note: It’s normal to add “I” (Informed) for adjacent teams (e.g., Support leadership during Launch), but keep the I-list lean to avoid noise.

Step-by-Step: Who Does What, When

Step 1 — Discover (Primary: BA; Accountable: PM)

Objective: Determine what’s worth automating and how you’ll prove success.

BA (Responsible) does:

  • Run interviews and pull data to map the current workflow (people, systems, hand-offs, exceptions)
  • Establish baselines: volume per category, first response time, time to resolution, rework %, error sources, cost per item
  • Identify candidate slices that are repeatable and low-risk but high-volume
  • Draft a simple ROI model (hours saved, error reduction, faster cycle time, cost per ticket); see the sketch after this list
  • Curate a balanced dataset of real examples—including messy, ambiguous, and “urgent” subjects
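
To put numbers behind the ROI bullet above, here’s a minimal sketch of the kind of back-of-the-envelope model a BA might draft. Everything in it, from the monthly_roi function name to the example figures, is an illustrative assumption rather than data from a real project; swap in your own baselines from Discover.

```python
# Minimal ROI sketch for one candidate automation slice.
# All inputs are illustrative assumptions; replace them with your own baselines.

def monthly_roi(
    tickets_per_month: int,
    automatable_share: float,         # e.g. 0.52 means 52% of tickets are routine
    minutes_saved_per_ticket: float,  # manual handling time avoided per automated ticket
    loaded_hourly_rate: float,        # fully loaded cost of an agent hour
    cost_per_automated_ticket: float  # model + API + infra cost per automated ticket
) -> dict:
    automated = tickets_per_month * automatable_share
    hours_saved = automated * minutes_saved_per_ticket / 60
    gross_savings = hours_saved * loaded_hourly_rate
    run_cost = automated * cost_per_automated_ticket
    return {
        "automated_tickets": round(automated),
        "hours_saved": round(hours_saved, 1),
        "gross_savings": round(gross_savings, 2),
        "run_cost": round(run_cost, 2),
        "net_savings": round(gross_savings - run_cost, 2),
    }

if __name__ == "__main__":
    # Example only: 4,000 tickets/month, 52% routine, 6 minutes saved each,
    # $38/hour loaded rate, $0.04 per automated ticket.
    print(monthly_roi(4000, 0.52, 6, 38.0, 0.04))
```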

PO (Consulted) contributes: Clarifies scope, non-negotiables, and acceptance thresholds.

PM (Accountable) ensures: a discovery plan exists, stakeholders are bought in, risks are logged, and one pilot is chosen.

Exit gate: Discovery Brief approved—pilot scope + baselines + success criteria.

Anti-pattern to avoid: Choosing a glamorous use case over a measurable one.

Step 2 — Design (Primary: SA; Accountable: PM)

Objective: Design for the bad day as carefully as the good day.

SA (Responsible) does:

  • Define the data model (entities, unique IDs, required fields, schemas)
  • Draw the solution architecture (systems, queues, human-in-the-loop, retries)
  • Specify failure modes and compensating actions (timeouts, 429s/500s, partial writes, dedupe/idempotency)
  • Write the rollback and replay plan—how to undo and how to re-run safely
  • Define observability: structured logs (run_id, entity_id, step, status, latency, cost), metrics, and alert thresholds; a sample log line follows this list
  • Produce a test plan outline (negative tests, load/burst, user acceptance)
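
To anchor the observability bullet above, here’s a minimal sketch of what one structured log line per step could look like. The field names follow the bullet (run_id, entity_id, step, status, latency, cost); the emit_log helper and everything else around it are assumptions for illustration, not part of any specific tool.

```python
import json
import time
import uuid
from datetime import datetime, timezone

def emit_log(run_id: str, entity_id: str, step: str, status: str,
             latency_ms: float, cost_usd: float = 0.0, **extra) -> str:
    """Emit one structured, machine-parsable log line per step of a run."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "entity_id": entity_id,   # stable business ID (e.g. ticket ID), not a row number
        "step": step,
        "status": status,         # "ok" | "retried" | "failed" | "skipped"
        "latency_ms": round(latency_ms, 1),
        "cost_usd": round(cost_usd, 6),
        **extra,
    }
    line = json.dumps(record, separators=(",", ":"))
    print(line)                   # in production this goes to your log sink
    return line

if __name__ == "__main__":
    run_id = str(uuid.uuid4())
    start = time.perf_counter()
    # ... run the "classify" step here ...
    emit_log(run_id, "TICKET-1042", "classify", "ok",
             latency_ms=(time.perf_counter() - start) * 1000,
             cost_usd=0.0007, label="booking_acknowledgment")
```

One log line per step, keyed by run_id and entity_id, is what later makes replay, dead-letter review, and cost-per-item reporting cheap to build.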

PO (Consulted) contributes: Edge cases, policy constraints, and prioritization.

PM (Accountable) ensures: Design review is inclusive and fast; decisions are recorded; risks have owners.

Exit gate: Design Pack signed—architecture, schema, test plan, monitoring spec, rollback plan.

Anti-pattern to avoid: Letting the tool drive the design (n8n is the implementation, not the design).

Step 3 — Build (Primary: AE; Accountable: PM)

Objective: Implement fast without creating a fragile script.

AE (Responsible) does:

  • Build modular flows with clear names and comments
  • Add guardrails: idempotency keys, retries with backoff and jitter, timeouts, input validation, safe defaults (see the sketch after this list)
  • Manage secrets and environments (dev/stage/prod separation; least-privilege)
  • Implement LLM steps with deterministic outputs (strict JSON) and a normalizer that snaps labels to the allowed catalogue
  • Wire structured logs and basic dashboards in dev/stage
  • Write the runbook: how to operate, pause, retry, and release
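
Two of the guardrails above, sketched as plain Python for clarity (inside n8n or a similar tool the same logic sits in code or function nodes): a retry helper with exponential backoff and full jitter, and a normalizer that snaps free-text LLM labels onto the allowed catalogue. The function names, the catalogue entries, and the alias table are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(call: Callable[[], T], attempts: int = 4,
                 base_delay: float = 0.5, max_delay: float = 8.0) -> T:
    """Retry a flaky call with exponential backoff and full jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise                             # surface the error after the last attempt
            delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
            time.sleep(random.uniform(0, delay))  # jitter avoids synchronized retry storms
    raise RuntimeError("unreachable")

# Allowed label catalogue (illustrative); the LLM must land on exactly one of these.
CATALOGUE = {"booking_acknowledgment", "missing_docs", "rate_request", "other"}

ALIASES = {  # common free-text variants the model might return
    "booking ack": "booking_acknowledgment",
    "docs missing": "missing_docs",
    "missing documents": "missing_docs",
}

def normalize_label(raw: str) -> str:
    """Snap a raw LLM label onto the catalogue; fall back to 'other' instead of inventing tags."""
    key = raw.strip().lower().replace("-", " ")
    candidate = ALIASES.get(key, key.replace(" ", "_"))
    return candidate if candidate in CATALOGUE else "other"

if __name__ == "__main__":
    print(normalize_label("Booking Ack"))   # -> booking_acknowledgment
    print(normalize_label("spam??"))        # -> other
```

The idempotency keys from the same bullet typically ride along as a header or dedupe key on every write, so a retried call can’t create a duplicate record.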

DevOps (Consulted) contributes: Environment setup, CI/CD jobs (if used), rate-limit strategies, observability plumbing.

PM (Accountable) ensures: Scope remains tied to the pilot; build reviews happen; documentation is usable by operators.

Exit gate: Working pilot runs end-to-end on sample data in stage; logs complete; runbook drafted.

Anti-pattern to avoid: “It works on my machine” flows with no idempotency, no logs, and no plan for retries.

Step 4 — Break (Primary: QA; Accountable: PM)

Objective: Prove it fails safely and recovers predictably before users see it.

QA (Responsible) does:

  • Negative testing: Empty/garbled inputs, malformed payloads, missing fields, duplicates, unexpected languages/encodings
  • Resilience testing: Simulate slow APIs, 429s, 500s, network blips, partial writes; confirm retries/backoff behave
  • Load/burst testing: Spikes that mimic your busiest hour/day
  • Regression for LLM steps: A fixed test set of real tickets that must always map to the same labels; verify determinism (see the sketch after this list)
  • User acceptance testing (UAT): Real operators perform typical/tricky cases; validate clarity of tags, notes, assignments
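
A minimal sketch of the LLM regression idea from the list above, written as a pytest file. The classify function here is only a stand-in for the real pipeline (LLM call plus normalizer), and the subjects and expected labels are invented examples, not a real regression set.

```python
# test_classification_regression.py -- run with `pytest`
import pytest

# Fixed regression set: subjects that must always map to the same label,
# release after release. (Invented examples for illustration.)
REGRESSION_SET = [
    ("Booking confirmed for shipment ABC-123", "booking_acknowledgment"),
    ("Missing commercial invoice for container XYZ", "missing_docs"),
    ("URGENT!!! please advise rate to Rotterdam", "rate_request"),
    ("", "other"),   # empty subject must fail safe, not crash
]

def classify(subject: str) -> str:
    """Stand-in for the real pipeline (LLM call + normalizer). Replace with your own."""
    s = subject.lower()
    if "booking confirmed" in s:
        return "booking_acknowledgment"
    if "missing" in s and "invoice" in s:
        return "missing_docs"
    if "rate" in s:
        return "rate_request"
    return "other"

@pytest.mark.parametrize("subject,expected", REGRESSION_SET)
def test_labels_are_stable(subject, expected):
    assert classify(subject) == expected

def test_determinism():
    # The same input must always produce the same label (strict JSON, no sampling drift).
    subject = REGRESSION_SET[0][0]
    assert len({classify(subject) for _ in range(5)}) == 1
```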

AE (Consulted) contributes: Quick fixes, prompt/rule adjustments, logging improvements.

PM (Accountable) ensures: Severity is agreed (Sev-1/2/3), fixes are prioritized, and UAT sessions are structured.

Exit gate: No Sev-1/2 defects, regression set passes, UAT sign-off is green.

Anti-pattern to avoid: Treating QA as a checkbox after Build; in HARDEN™, Break is a full phase with veto power.

Step 5 — Harden (Primary: SRE; Accountable: PM)

Objective: Make production problems rare, small, and obvious.

SRE (Responsible) does:

  • Implement circuit breakers: when errors or latency exceed thresholds, flip to “tag + note only” (no risky writes); see the sketch after this list
  • Define SLOs (service level objectives) and error budgets—e.g., 99% of jobs < 60s; <1% failed writes/day
  • Build dashboards and alerts: throughput, success %, latency, error types, cost per item; every alert includes a “next action”
  • Validate replay/backfill paths and a dead-letter queue for manual review
  • Enforce versioning & change control for prompts, rules, and nodes; quick pre-flight checklist before any change
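
Here’s a minimal, tool-agnostic sketch of the circuit-breaker bullet above: track a rolling error rate and, once it crosses a threshold, degrade to a safe “tag + note only” mode instead of making risky writes. The window size, the 3% threshold (borrowed from the freight forwarder example later in this piece), and the mode names are illustrative assumptions.

```python
from collections import deque

class CircuitBreaker:
    """Flip to a degraded 'tag + note only' mode when the rolling error rate is too high."""

    def __init__(self, window: int = 200, error_threshold: float = 0.03):
        self.results = deque(maxlen=window)   # rolling window of recent run outcomes
        self.error_threshold = error_threshold

    def record(self, success: bool) -> None:
        self.results.append(success)

    @property
    def error_rate(self) -> float:
        if not self.results:
            return 0.0
        return 1 - (sum(self.results) / len(self.results))

    @property
    def mode(self) -> str:
        # Degraded mode: keep tagging and adding notes, but stop risky writes.
        return "tag_and_note_only" if self.error_rate > self.error_threshold else "full_automation"

if __name__ == "__main__":
    cb = CircuitBreaker(window=100, error_threshold=0.03)
    for ok in [True] * 95 + [False] * 5:          # 5% recent failures
        cb.record(ok)
    print(round(cb.error_rate, 3), cb.mode)       # -> 0.05 tag_and_note_only
```

Wiring the breaker to the same thresholds as your SLOs means one agreed number drives both the alerts and the automatic degrade.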

DevOps (Consulted) contributes: Deployment guardrails, on-call integration, and alert routing.

PM (Accountable) ensures: Rollback drill is rehearsed, monitoring is noise-controlled, and SLOs are agreed with stakeholders.

Exit gate: Go/No-Go meeting approves production readiness; kill switch tested, rollback rehearsed.

Anti-pattern to avoid: Launching with dashboards but no thresholds or actions—alerts without next steps are noise.

Step 6 — Launch (Primary: PM; Accountable: PM)

Objective: A calm, reversible launch that builds trust.

PM (Responsible/Accountable) does:

  • Phased rollout: 10% → 30% → 100% of eligible cases; begin with low-risk categories (see the sketch after this list)
  • Training: Short SOPs and a 2-minute walkthrough; show what tags/notes mean and how to override
  • Communications: Who is impacted, what changes for them, where to get help
  • Hypercare: Daily checks for 1–2 weeks; quick iteration on misroutes, unclear notes, or alert noise
  • Reversibility: Demonstrate you can toggle off, roll back, and continue manually without losing context
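
One simple way to implement the phased rollout above is deterministic percentage bucketing on a stable ID, so the same ticket always lands in the same cohort and the rollout can be dialed from 10% to 30% to 100% (or back to 0 as a kill switch) by changing a single number. This is a generic sketch, not a prescription for any particular feature-flag tool; the hashing scheme and names are assumptions.

```python
import hashlib

def in_rollout(entity_id: str, rollout_pct: int, salt: str = "harden-pilot") -> bool:
    """Deterministically bucket an entity into the rollout cohort (0-100%)."""
    digest = hashlib.sha256(f"{salt}:{entity_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100        # stable bucket in [0, 100)
    return bucket < rollout_pct

if __name__ == "__main__":
    # Dial this single number 10 -> 30 -> 100 during the phased rollout,
    # or set it to 0 as a kill switch.
    ROLLOUT_PCT = 10
    for ticket_id in ["TICKET-1001", "TICKET-1002", "TICKET-1003"]:
        route = "automation" if in_rollout(ticket_id, ROLLOUT_PCT) else "manual queue"
        print(ticket_id, "->", route)
```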

DevOps (Consulted) contributes: Safe deploys, flags, and rollbacks.

Exit gate: Adoption targets met; no critical incidents occurred during hypercare; feedback has been incorporated into the backlog.

Anti-pattern to avoid: Big-bang launch with no feature flag and no rollback drill.

Step 7 — Monitor (Primary: OA; Accountable: PM)

Objective: Keep proving value—and keep it healthy—as volumes and edge cases change.

OA (Responsible) does:

  • Track KPIs: auto-handled %, first response time, resolution time, rework/hand-offs, accuracy (from the regression set), incident count/MTTR, and cost per item (model + API); see the sketch after this list
  • Review alert hygiene weekly: tune thresholds, reduce noise, and add next actions where they’re missing
  • Run a monthly ops review: top failure reasons, cost hotspots, improvement backlog; propose the next 1–2 changes
  • Manage model/rules maintenance: version prompts and models; re-run regression tests before/after changes
  • Propose scale-up (new categories) once KPIs hold for 4–6 weeks
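
As a sketch of how the OA might compute a few of these KPIs from the structured logs defined back in Design: the record shape matches the earlier logging sketch, and the specific aggregations (auto-handled share, average latency, cost per item) are illustrative assumptions.

```python
from statistics import mean

def kpi_snapshot(records: list[dict]) -> dict:
    """Aggregate per-run log records into a simple KPI snapshot."""
    total = len(records)
    handled = [r for r in records if r["status"] == "ok"]
    return {
        "total_items": total,
        "auto_handled_pct": round(100 * len(handled) / total, 1) if total else 0.0,
        "avg_latency_ms": round(mean(r["latency_ms"] for r in handled), 1) if handled else 0.0,
        "cost_per_item_usd": round(sum(r["cost_usd"] for r in records) / total, 4) if total else 0.0,
    }

if __name__ == "__main__":
    sample = [
        {"status": "ok", "latency_ms": 850.0, "cost_usd": 0.0007},
        {"status": "ok", "latency_ms": 1200.0, "cost_usd": 0.0009},
        {"status": "failed", "latency_ms": 4000.0, "cost_usd": 0.0002},
    ]
    print(kpi_snapshot(sample))
```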

PM (Accountable) ensures: decisions become tickets, changes follow the lightweight change process, and stakeholders receive a monthly summary.

Exit gate: KPIs hold or improve month-over-month; changes ship with zero surprises; the backlog stays small and meaningful.

Anti-pattern to avoid: “Set and forget”—automations decay if no one owns their metrics.

How to Use This RACI in the Real World

Meeting Cadence and Handoffs

Weekly (30 min): Delivery Stand-Up

  • PM runs it. Who’s blocked? What’s next? Any scope change?
  • QA calls out flakiness in test runs; AE shares fixes; SRE flags rising errors or costs.

Fortnightly (45–60 min): Stage-Gate Review

  • Discover → Design: Approve pilot scope and baselines
  • Design → Build: Sign off on architecture, schema, test plan, rollback
  • Build → Break: Confirm pilot runs end-to-end on sample data
  • Break → Harden: All Sev-1/2 defects closed; UAT green
  • Harden → Launch: SLOs, dashboards, and rollback drills complete
  • Launch → Monitor: Hypercare exit, KPI targets defined

Monthly (45 min): Ops & ROI Review

  • OA presents metrics; PM summarizes changes/risks; PO decides the next categories or improvements.

Artifacts to Keep Current

  • Discovery Brief (scope, baselines, ROI model)
  • Design Pack (architecture, schema, test plan, monitoring spec, rollback)
  • Runbook (operate/pause/replay/rollback)
  • Regression Suite (LLM classification set with expected labels)
  • SLOs & Alert Playbook (thresholds + next actions)
  • KPI Dashboard (shared, simple, and real-time)

Common Anti-Patterns (And How RACI Prevents Them)

“Everyone owns it” → no one owns it 

Fix: One Responsible per step and one Accountable end-to-end (PM).

Build-first, design-later 

Fix: Gate Build on Design sign-off; PM enforces the stage-gate.

QA as a checkbox at the end 

Fix: Break is its own phase; QA is Responsible there with veto power.

Dashboards with no actions 

Fix: SRE owns alert playbooks—every alert includes a next step and an owner.

Ad-hoc changes in production 

Fix: PM requires versioning and a tiny change checklist; OA re-runs regression sets.

Operators don’t trust it 

Fix: PM leads Launch training; tags and internal notes explain every action; reversible rollout.

Small-Team and Solo-Founder Adaptations

You may not have nine people. You can still keep the spirit of RACI by doubling up roles:

  • BA + PO: Same person scopes and validates (label them R/C accordingly)
  • SA + AE: Designer builds (very common in startups)
  • QA + SRE: One person breaks it, then hardens it (timebox each hat)
  • OA + PM: One person reports KPIs and runs cadence

Rule of thumb: Keep one Responsible per step and one Accountable across all steps. If you’re wearing several hats, write it down so you know which hat you’re wearing in each meeting.

RACI in Action: Real Freight Forwarder Example

Here’s how the RACI Matrix played out in our Zendesk automation project:

Discover (BA R, PM A): We learn that 52% of tickets are routine notices (booking acknowledgments, missing docs). The baseline first response time is 2 hours and 40 minutes.

Design (SA R, PM A): Create a catalogue (type/subtype), define idempotency keys, rollbacks, and a test plan with 100 real subjects.

Build (AE R, DevOps C, PM A): n8n flow with text cleaning, strict JSON classification, a normalizer, and structured logs.

Break (QA R, AE C, PM A): Negative tests (odd encodings, empty subjects), burst tests, and a regression set. 0 Sev-1 issues remain.

Harden (SRE R, DevOps C, PM A): Circuit breaker flips to “tag + note only” above 3% error rate; SLOs set; rollback drill passes.

Launch (PM R/A, DevOps C): 30% phased rollout for two weeks with daily checks; agents trained on tags/notes/overrides.

Monitor (OA R, PM A): Auto-handled share hits 51% on the pilot set; first response drops to 1h 28m; OA proposes adding “documentation reminders” next.

RACI turns what could be a vague journey into a predictable set of handoffs with clear sign-offs. It’s not bureaucracy; it’s speed through clarity.

Frequently Asked Questions

Q: Why is the PM Accountable everywhere? 

Because someone must own the total outcome—timeline, risk, scope, sign-offs. The PM doesn’t decide the schema or write the code, but they own the delivery plan and ensure decisions are made at the right level, on time.

Q: Can a BA also be Responsible during Break? 

They can be Consulted (C) to validate that tests reflect reality, but QA must own the Responsible seat in Break to avoid conflict of interest.

Q: Who signs off on prompts and model versions? 

Treat prompts and models like code: SA (design) and AE (build) propose; QA proves they hold under the regression set; SRE adds guardrails; PM signs the release plan as Accountable.

Q: Where do legal and security fit? 

They’re usually Consulted in Design (data access, retention, PII) and Harden (SLOs, incident processes), and Informed at Launch.

Q: How is “Informed” used without spamming people? 

Default to I only for milestones (stage gates, go-live) and notable incidents; keep weekly noise within the core team.

Why This Matrix Works

It pre-decides debates (“Who owns testing?” “Who runs the rollback?”).

It reduces rework by clarifying who must be consulted before a choice is made.

It accelerates sign-offs because accountability is unambiguous.

It survives scale: Whether you’re a three-person squad or a cross-department initiative, you can compress or expand roles without losing clarity.

At its core, the HARDEN Method is about shipping automations that survive contact with reality. 

The RACI Matrix is the governance spine that keeps that promise. 

Assign one Responsible per step, keep the PM Accountable across the journey, invite the right Consulted voices at the right time, and keep the Informed list lean.

Do that, and you’ll ship faster, with fewer surprises—and you’ll have the paper trail to prove it.


Ready to implement the HARDEN Method with clear ownership?
