Feb 23 / Michaël Hompus

arc42 chapter 11: Risks and technical debt

arc42, Documentation, Risk Management, Risks, Technical Debt

Chapter 11 keeps uncomfortable truths visible. It records the risks and technical debt that can still bite you later, so they do not stay hidden in someone's head or scattered across chat logs. In this article I explain what belongs in chapter 11, what to keep out, a minimal structure you can copy, plus a small example from Pitstop.

This post is about chapter 11: Risks and technical debt, the first chapter in the “Reality and shared language” group.

The first ten chapters built up the architecture story: goals, constraints, structure, runtime, deployment, concepts, decisions, and quality scenarios.

Chapter 11 is where we stop pretending everything is solved. It is where I write down what can still go wrong, what we knowingly postponed, and what we have not figured out yet.

Note

In my experience, teams that write risks down early tend to be calmer in production.

What belongs in chapter 11 (and what does not)

Chapter 11 of an arc42 document answers:

What can still hurt us later, and what are we doing about it?

What belongs here:

- Architecturally relevant risks, including:
  - product and adoption risks (does the workflow fit reality?)
  - integration risks (vendor stability, contract changes, rate limits)
  - operational risks (backup, monitoring gaps, single points of failure)
  - security and compliance risks (data exposure, audit gaps, retention)
  - performance and scalability risks (hot paths, growth limits)
- Technical debt that affects maintainability, reliability, or delivery speed: the things you deliberately postponed that now need a visible trail.
- For each risk or debt item:
  - a clear statement
  - why it matters (impact)
  - how likely it is (roughly)
  - mitigation or next step
  - owner (a person, role, or team)
  - trigger or early warning (optional): how will you know this risk is starting to happen? This is different from mitigation: it is the signal that tells you to act. Think: a metric crossing a threshold, an error rate spike, a support ticket pattern.
- Cross-links to the rest of the document: constraints in chapter 2, strategy in chapter 4, runtime scenarios in chapter 6, deployment assumptions in chapter 7, reusable concepts in chapter 8, decisions in chapter 9, and quality scenarios in chapter 10.
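
The per-item fields listed above map naturally onto a small record type. The following is a minimal sketch in Python, assuming you want to keep the register as data and render it to a Markdown table. The `RiskItem` class, its field names, the `to_row` helper, and the sample values are illustrative assumptions, not something arc42 prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskItem:
    """One risk or debt entry; field names are illustrative, not arc42-defined."""
    statement: str
    impact: str                    # why it matters
    likelihood: str                # rough estimate, e.g. "low", "medium", "high"
    mitigation: str                # mitigation or next step
    owner: str                     # a person, role, or team
    trigger: Optional[str] = None  # optional early-warning signal

    def to_row(self) -> str:
        """Render the item as one Markdown table row ("-" when no trigger is set)."""
        cells = [self.statement, self.impact, self.likelihood,
                 self.mitigation, self.owner, self.trigger or "-"]
        return "| " + " | ".join(cells) + " |"


item = RiskItem(
    statement="Backlog growth in sync queue",
    impact="Delayed customer comms",
    likelihood="medium",           # hypothetical value for illustration
    mitigation="Circuit breaker; dead-letter queue",
    owner="Ops team",              # hypothetical owner
    trigger="sync_queue_depth above agreed threshold",
)
print(item.to_row())
```

Keeping the trigger optional mirrors the list above: every item needs a statement, impact, likelihood, next step, and owner, while the early-warning signal can be added later.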

What does not belong here:

A full project management backlog. Keep chapter 11 focused on items that can materially impact the architecture.
Sensitive vulnerability details in public documentation. It is fine to record security risks at a high level, but do not publish exploit steps, internal endpoints, or secret material. Link to a private ticket or security register if needed.
A duplicate of chapter 9. Decisions belong in chapter 9, risks and debt belong here. When a risk is addressed by a decision, link to the ADR.

Tip

If something feels uncomfortable to say out loud, it probably belongs in chapter 11.

Risks vs technical debt

A simple distinction that works well:

A risk is something that might happen. You manage it with mitigation, monitoring, and contingency plans.
Technical debt is something that already happened. You chose a shortcut or postponement, and it has an interest rate.

Both are normal. Hiding them is what hurts.

Open questions and postponed decisions

Postponing architectural choices is often a very good practice. You wait for more certainty, more feedback, and more clarity before committing.

But not making a decision is also a risk. At best, you forget it still needs to be made. At worst, someone assumes it is already decided and starts building based on that assumption.

I use chapter 11 to make those “not-yet-decided” topics visible. It is not only a risk list; it is also a lightweight backlog of decisions that still need daylight.

Tip

Keep open questions visually distinct from risks. Pick a simple marker: a dedicated status like open question, a symbol, or a separate sub-section. That lets you scan them at a glance and prevents them from drowning in the mitigation entries.

The minimum viable version

If you are short on time, aim for this:

3–5 risks that could realistically derail delivery, operations, or stakeholder trust
3–5 technical debt items that you know will slow you down later

That is already enough to prevent most surprise work. You can always add more later.

Copy/paste structure (Markdown skeleton)

Use this as a starting point.

## 11. Risks and technical debt

Risks are phrased as: _what could hurt us_ + _why it matters_ + _what we will do about it_.

| Risk / debt item | Why it matters | Mitigation / decision |
| :--------------- | :------------- | :-------------------- |
| ...              | ...            | ...                   |

<!-- add likelihood, owner, trigger columns when your process is ready -->

### Known technical debt (optional)

- <intentional shortcut> → <why acceptable now> ; <when to revisit>
- ...

Note

Tables are not mandatory. If you prefer a list format, keep it scan-friendly and keep the same fields per item.

If you want more rigor, add likelihood and impact columns, as in a risk matrix, plus an explicit owner column.

Tip

Where you put open questions depends on how you work. If your process is strategy-driven (pick direction first, then refine), keeping open questions in chapter 4 works well, and you can link to chapter 11 when they become concrete risks. If your process is more risk-driven (track uncertainties and mitigation first), keep them in chapter 11 and link back to chapter 4 when they influence strategy.

Example (Pitstop)

Pitstop is my small demo system for this series. It is intentionally simple, so the documentation stays shareable.

Below is a small example list. It is not meant to be complete; it is meant to show the style and the level of detail.

11. Risks and technical debt

Risks are phrased as “what could hurt us” + “why it matters” + “what we will do about it”.

| Risk / debt item | Why it matters | Mitigation / decision |
| :--------------- | :------------- | :-------------------- |
| Integration ambiguity per Planning Vendor | Each vendor has different semantics (cancellations, reschedules, no-shows), causing inconsistent work orders | Define a vendor mapping spec + contract tests; keep vendor-specific logic in adapters |
| Offline sync conflicts | Workshop can update while foreman/admin also edits → conflict resolution can become messy | Keep conflict rules simple (append notes; validate status transitions); provide “needs foreman review” path |
| Backlog growth in sync queue | Vendor outage or slow API can pile up updates, delaying customer comms | Monitor `sync_queue_depth`; circuit breaker; dead-letter queue + ops playbook |
| WebSocket instability in harsh networks | Real-time UX can degrade unpredictably in garages | Configurable fallback to polling (`Realtime:FallbackToPollingSeconds`); reconnect UX; track disconnect rates |
| Audit log volume / reporting load | Auditability creates data; dashboards can overload OLTP queries | Use read models; partition audit table; retention policies; optional replica for reporting |

Known technical debt (intentional for v1)

Single backend instance per garage (no HA) → acceptable for v1; revisit for chains.

Minimal conflict resolution UI → acceptable initially; prioritize based on observed conflicts.
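
The `sync_queue_depth` entry in the table is a good candidate for an early-warning trigger. As a rough sketch of what such a check could look like: the `check_queue_depth` function and both threshold values are hypothetical, and nothing here describes how Pitstop actually monitors its queue.

```python
def check_queue_depth(depth: int, warn_at: int = 100, critical_at: int = 500) -> str:
    """Classify a queue depth reading against hypothetical thresholds."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if depth >= critical_at:
        return "critical"  # e.g. open the circuit breaker, follow the ops playbook
    if depth >= warn_at:
        return "warning"   # the risk is starting to happen: time to act
    return "ok"


# Sample readings, made up for illustration
for reading in (12, 150, 900):
    print(reading, check_queue_depth(reading))
```

The point is the separation of concerns: the trigger tells you when to act, while the mitigation column says what to do once it fires.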

To browse the full Pitstop arc42 sample, see my GitHub Gist.

Common mistakes I see (and made myself)

Treating this as a shame list
Chapter 11 is not for blame. It is for visibility and prioritization.
No owners
A risk without an owner is a wish. Put a person, role, or team on it.
No next step
A risk without a mitigation is just anxiety in table form. Even “decide in ADR-007” is better than nothing.
Only technical risks
Adoption, workflow fit, vendor behavior, and operations are often where the real pain starts.
Deleting history
Close items explicitly. If something was a risk and it is no longer a risk, document why.
No links back to the architecture story
Risks should connect back to the drivers and trade-offs. Otherwise the list becomes isolated and nobody acts on it.

Done-when checklist

🔲 The chapter lists the risks that could realistically hurt delivery or operations.
🔲 Technical debt items are visible, not hidden in chat and backlog noise.
🔲 Each item has an owner and a next step.
🔲 Items link to the relevant chapters, concepts, decisions, or scenarios.
🔲 The list is reviewed on a cadence (even if it is just “every release”).
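
The "owner and next step" item in the checklist above is easy to verify mechanically. Below is a minimal sketch, assuming the register is a simple Markdown pipe table with an added `Owner` column and no `|` characters inside cells. The function names, the placeholder set, and the sample owners are assumptions made for illustration.

```python
def split_row(line: str) -> list[str]:
    """Split one pipe-table line into trimmed cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse a simple pipe table into row dicts keyed by header name."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []
    header = split_row(lines[0])
    # lines[1] is the alignment/separator row, so data starts at index 2
    return [dict(zip(header, split_row(ln))) for ln in lines[2:]]


def missing_fields(rows, required=("Owner", "Mitigation / decision")):
    """Return (item, field) pairs where a required cell is empty or a placeholder."""
    placeholders = {"", "-", "...", "TBD"}
    problems = []
    for row in rows:
        for field in required:
            if row.get(field, "").strip() in placeholders:
                problems.append((row.get("Risk / debt item", "?"), field))
    return problems


TABLE = """
| Risk / debt item | Mitigation / decision | Owner |
| :--- | :--- | :--- |
| Offline sync conflicts | Keep conflict rules simple | Backend team |
| Audit log volume | Use read models | |
| WebSocket instability | ... | Frontend team |
"""

for item, field in missing_fields(parse_markdown_table(TABLE)):
    print(f"{item}: missing {field}")
```

A check like this could run during a documentation build or a release review, which also covers the cadence item on the checklist.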

Next improvements backlog

Add a simple severity sorting (impact × likelihood) to focus discussions.
Add a trigger column to your table once the basic list is stable.
Link security risks to a private register when details should not be published.
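
The severity sorting from the first backlog item could be sketched as follows. The 1 to 5 scales, the `ScoredRisk` class, and the scores given to the Pitstop risks are all made up for illustration; pick whatever scale your team already uses.

```python
from dataclasses import dataclass


@dataclass
class ScoredRisk:
    name: str
    impact: int      # 1 (minor) to 5 (severe); the scale is an assumption
    likelihood: int  # 1 (rare) to 5 (almost certain); the scale is an assumption

    @property
    def severity(self) -> int:
        """Simple severity score: impact multiplied by likelihood."""
        return self.impact * self.likelihood


def by_severity(risks):
    """Return risks ordered from highest to lowest severity."""
    return sorted(risks, key=lambda r: r.severity, reverse=True)


# Hypothetical scores for three of the Pitstop risks
risks = [
    ScoredRisk("Audit log volume / reporting load", impact=2, likelihood=3),
    ScoredRisk("Offline sync conflicts", impact=3, likelihood=4),
    ScoredRisk("Backlog growth in sync queue", impact=4, likelihood=2),
]

for risk in by_severity(risks):
    print(f"{risk.severity:>2}  {risk.name}")
```

A plain product is crude, but it is usually enough to decide which items deserve discussion time first.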

Wrap-up

Chapter 11 is what keeps the architecture document honest. You make risks and debt visible early, then refine them as you learn.

Next up: arc42 chapter 12, “Glossary”, where we build shared language so readers do not have to guess what terms mean.