
Designing human-in-the-loop governance for AI agents

How to give AI agents real autonomy without giving up human judgment, accountability, or control.

Author — Shazia Kazi

Format — White Paper

Reading time — 8–10 min

Topics — AI Governance, Agentic AI, Product Design
[Figure: Isometric illustration of human oversight connecting people and AI agents]

Introduction

Agentic systems change the design problem. Once AI moves from drafting answers to initiating actions on behalf of people and organisations, the centre of gravity shifts from output quality to decision authority: what may happen automatically, what requires explicit consent, and how oversight stays meaningful rather than symbolic.

Established frameworks — including NIST’s AI Risk Management Framework and emerging guidance on agentic AI — converge on a lifecycle view of trustworthiness: risk tiering, constrained capability, human oversight where it counts, traceability, monitoring, and clear accountability across roles. The gap for many enterprises is not principles on paper; it is operational translation into products, services, and day-to-day workflows.

This paper proposes a design-led lens on human-in-the-loop governance: treating autonomy boundaries as decisions about interfaces, information architecture, escalation paths, and operating rituals — not only policy clauses or model checkpoints.

Human-in-the-loop governance is not a checkbox. It is an architectural decision about who holds judgment, when, and with what visibility into what the agent is doing.

Why this matters now

Agentic AI is different from basic chat-based assistants. Instead of just answering questions, these systems can pursue goals, invoke tools, and take sequences of actions with limited direct supervision. They book, update, escalate, reconcile, notify, and sometimes commit organisations to real-world decisions. Oversight failures here do not just produce a wrong paragraph; they can break a process, a contract, a relationship, or a regulatory obligation.

At the same time, governance frameworks are maturing. NIST’s AI Risk Management Framework describes governance as a set of ongoing functions — Govern, Map, Measure, Manage — flowing through the entire lifecycle of an AI system, not a single review gate. Work on governing agentic systems from model providers and practitioners emphasises calibrating oversight to risk, constraining action spaces, designing approval points, ensuring legibility and logging, and keeping systems interruptible.

The tension is obvious: require approval for every action and you lose the value of autonomy; remove humans entirely and you inherit unbounded risk. Human-in-the-loop governance is one way to resolve that tension — if it is treated as a designed system, not just a slogan.


Three modes of human oversight

A practical governance model recognises that not all agent actions are equal. The oversight pattern should match the consequence profile, not the technical novelty.

01

Human in the loop — for decisions that really matter

Use this mode when an agent is about to perform an action that is hard to reverse, materially affects a person or organisation, or has regulatory, legal, or ethical implications.

Examples:

  • Approving or denying a loan, claim, or application.
  • Moving significant funds or committing to a contract.
  • Changing access rights, roles, or permissions.
  • Making a health, safety, or high-impact employment decision.

In this mode, the agent can propose an action but cannot execute it without explicit human approval (a minimal sketch of this gate follows the list). The human sees:

  • What the agent wants to do.
  • Why it thinks that is appropriate.
  • The key data and policy checks that informed it.
  • The most important risks and alternatives.
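In code, the gate can be as simple as making execution unreachable without a recorded human decision. The `ProposedAction` shape and the `request_approval` callback below are hypothetical stand-ins for whatever review tooling and workflow system the organisation actually uses:

    from dataclasses import dataclass, field

    @dataclass
    class ProposedAction:
        """What the reviewer sees before anything is executed."""
        description: str                 # what the agent wants to do
        rationale: str                   # why it thinks this is appropriate
        policy_checks: list              # key data and policy checks that informed it
        risks: list                      # most important risks
        alternatives: list = field(default_factory=list)

    def gated_execute(action, request_approval, execute):
        """The agent may propose; only an explicit human approval triggers execution.

        `request_approval` blocks until a named reviewer decides (for example via a
        review UI or a ticket); `execute` performs the real side effect.
        """
        decision = request_approval(action)   # expected to expose .approved and .reviewer
        if not decision.approved:
            return {"status": "rejected", "reviewer": decision.reviewer}
        result = execute(action)              # the side effect happens only here
        return {"status": "executed", "reviewer": decision.reviewer, "result": result}

The important property is structural: the execution path cannot be reached without a recorded human decision, rather than relying on the agent to remember a policy.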
02

Human on the loop — supervised autonomy

Here the agent operates within a pre-agreed envelope while humans supervise behaviour over time. This is a good fit for moderate-risk, high-volume operations where constant approval would be too slow, but unbounded autonomy would be too risky.

Examples:

  • Routing tickets and tasks within defined boundaries.
  • Adjusting non-critical system settings within safe ranges.
  • Drafting responses or updates that humans can later refine.
  • Orchestrating workflows that remain reversible.

Governance behaves like a monitoring and escalation system (a brief sketch follows this list):

  • Thresholds and rules trigger alerts when something unusual happens.
  • Humans can drill into logs and reasoning to inspect what occurred.
  • Exceptions and anomalies go into a queue for review.
  • Performance and incident patterns are regularly analysed and fed back into design.
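The sketch below is purely illustrative; the threshold names and values are placeholders for whatever the governance policy actually defines, and a real deployment would back this with proper alerting and storage:

    from collections import deque

    # Illustrative thresholds; real values belong in governance policy, not code.
    THRESHOLDS = {
        "refund_amount": 500.0,      # escalate refunds above this value
        "actions_per_hour": 200,     # escalate unusual bursts of activity
    }

    exception_queue = deque()        # reviewed by humans, oldest first

    def supervise(event):
        """Let routine actions through; queue anything that crosses a threshold."""
        crossed = [
            name for name, limit in THRESHOLDS.items()
            if event.get(name, 0) > limit
        ]
        if crossed:
            exception_queue.append({"event": event, "triggered": crossed})
            return "escalated"       # alert fires; logs stay available for drill-down
        return "allowed"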
03

Human out of the loop — bounded autonomy for low-risk work

This mode is for low-risk, reversible, tightly bounded tasks where manual review would create friction with little governance benefit. Even here, it is not “no governance”; it is governance through design.

Examples:

  • Formatting content, summarising internal notes, tagging documents.
  • Running internal housekeeping workflows (syncing labels, archiving old items).
  • Suggesting next actions on non-sensitive internal tasks.

In this mode you still want:

  • Logging of actions and changes.
  • Periodic sampling or retrospective review.
  • Clear rules about what this agent is never allowed to touch.

Seven building blocks for human-in-the-loop governance

When it comes to making this real in product and service design, seven building blocks keep appearing.

Decision rights by risk tier

Not every decision deserves the same level of oversight. A risk tiering model groups agent actions by:

  • Impact if wrong.
  • Reversibility.
  • Volume and frequency.
  • Legal, regulatory, or reputational sensitivity.

Each tier gets an explicit governance pattern: mandatory approval, supervised autonomy, or bounded autonomy.
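One simple way to make that mapping explicit is a small lookup from action types to tiers and oversight modes; the action names and tier labels below are purely illustrative:

    from enum import Enum

    class Oversight(Enum):
        IN_THE_LOOP = "mandatory human approval"
        ON_THE_LOOP = "supervised autonomy"
        OUT_OF_THE_LOOP = "bounded autonomy"

    # Illustrative risk-tier matrix: action type -> (tier, oversight mode).
    RISK_TIERS = {
        "approve_loan":       ("tier_1", Oversight.IN_THE_LOOP),
        "change_permissions": ("tier_1", Oversight.IN_THE_LOOP),
        "route_ticket":       ("tier_2", Oversight.ON_THE_LOOP),
        "draft_reply":        ("tier_2", Oversight.ON_THE_LOOP),
        "tag_document":       ("tier_3", Oversight.OUT_OF_THE_LOOP),
    }

    def oversight_for(action_type):
        """Unmapped actions default to the strictest mode, never the loosest."""
        return RISK_TIERS.get(action_type, ("tier_1", Oversight.IN_THE_LOOP))[1]

Defaulting unmapped actions to the strictest tier keeps new capabilities from quietly inheriting autonomy they were never assessed for.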

Constrained action space

Agents should never have a blank cheque. Their action space is designed and constrained through:

  • Limited tool access and permissions.
  • Guardrail policies and hard boundaries.
  • Rate limits, budgets, and transaction caps.
  • Clear separation between “suggest” and “execute.”

The goal is to make it difficult for agents to do the wrong kind of work, not just to detect it after the fact.
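A sketch of how those constraints might look as explicit configuration rather than prompt text; the tool names, caps, and flags below are hypothetical:

    # Hypothetical per-agent policy; names and limits are illustrative only.
    AGENT_POLICY = {
        "allowed_tools": ["crm.read", "ticket.update", "email.draft"],
        "forbidden_tools": ["payments.transfer", "iam.grant_role"],  # hard boundaries
        "max_actions_per_hour": 100,
        "max_transaction_value": 0.0,   # this agent may never move money
        "can_execute": False,           # suggest-only: execution needs a separate grant
    }

    def is_permitted(tool, policy=AGENT_POLICY):
        """Deny by default: a tool must be explicitly allowed and not forbidden."""
        return tool in policy["allowed_tools"] and tool not in policy["forbidden_tools"]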

Meaningful review interfaces

Human approval is only as good as the interface that supports it. Review surfaces should make it easy to answer:

  • What exactly is being proposed?
  • Why now?
  • What are the main risks, and how big are they?
  • What are the alternative options?
  • What happens if we do nothing?

That is an information architecture and interaction design problem, not just a back-end flag.
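As an illustration, the five questions above can be treated as a contract that the review surface refuses to render without; the field names in this sketch are hypothetical:

    def build_review_card(proposal):
        """Assemble the reviewer-facing view; refuse to render an incomplete one."""
        card = {
            "what_is_proposed":      proposal.get("summary"),
            "why_now":               proposal.get("trigger"),
            "main_risks":            proposal.get("risks"),        # each with a severity estimate
            "alternatives":          proposal.get("alternatives"),
            "cost_of_doing_nothing": proposal.get("inaction_impact"),
        }
        missing = [question for question, answer in card.items() if not answer]
        if missing:
            raise ValueError(f"Proposal is not reviewable; missing: {missing}")
        return card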

Legibility and traceability

People need to see what the agent did, when, and why. Legibility and traceability show up as:

  • Action logs tied to specific agents and sessions.
  • High-level narratives of what happened (“story of the incident”).
  • Access to underlying signals where appropriate — without overwhelming users.
  • Links from outcomes back to the decisions and approvals that shaped them.

This is essential for debugging, incident response, audit, and trust.
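A minimal shape for one such log entry, with hypothetical identifiers; the property that matters is that every outcome links back to an agent, a session, and any approval that shaped it:

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from typing import Optional

    @dataclass
    class AgentActionLog:
        agent_id: str               # which agent acted
        session_id: str             # which run or conversation
        action: str                 # what it did
        rationale: str              # why, in plain language
        approval_id: Optional[str]  # link to the human decision, if one was required
        timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))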

Continuous monitoring

Agent behaviour drifts as data, environments, and integrations change. Governance needs monitoring that looks for:

  • Escalation patterns and exception hotspots.
  • Changes in intervention rates (too low or too high).
  • Repeated edge-case failures.
  • Signs that new tools or workflows are breaking original assumptions.

Monitoring is not just technical metrics; it includes qualitative feedback from operators and users.
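On the quantitative side, one simple illustrative signal is to track the human intervention rate and flag drift in either direction, since both rubber-stamping and constant overrides suggest the autonomy boundary is set wrong; the tolerance below is a placeholder:

    def intervention_rate(reviewed, overridden):
        """Share of agent proposals that humans changed or blocked."""
        return overridden / reviewed if reviewed else 0.0

    def drift_alert(current_rate, baseline_rate, tolerance=0.5):
        """Flag when the rate moves far from its agreed baseline in either direction.

        A collapsing rate can mean rubber-stamping; a climbing one can mean the
        agent's envelope no longer matches reality.
        """
        if baseline_rate == 0:
            return current_rate > 0
        return abs(current_rate - baseline_rate) / baseline_rate > tolerance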

Interruptibility and escalation

Agents must be stoppable. Interruptibility ensures that when something looks wrong, humans can:

  • Halt a specific action or workflow.
  • Temporarily suspend a capability or integration.
  • Shut down an entire agent or class of agents if needed.

Good design also defines who can do this, under which conditions, and what happens immediately after — for example, fallback behaviours and communication to affected users.
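A sketch of scoped interruption, assuming a hypothetical in-memory registry; the essential properties are that suspension can target a single action, a capability, or a whole agent, and that it records who pulled the switch and why:

    from datetime import datetime, timezone

    # Hypothetical registry; a real system would persist and replicate this state.
    SUSPENSIONS = []

    def suspend(scope, target, actor, reason):
        """Record a suspension at action, capability, or agent scope."""
        assert scope in {"action", "capability", "agent"}
        record = {
            "scope": scope, "target": target,
            "by": actor, "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        SUSPENSIONS.append(record)
        return record

    def is_suspended(agent_id, capability):
        """Agents check this before acting; fallback behaviour applies when True."""
        return any(s["target"] in {agent_id, capability} for s in SUSPENSIONS)

Pairing the check with a defined fallback (pause, hand off to a human queue, notify affected users) keeps an emergency stop from becoming a silent failure.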

Accountability architecture

Finally, governance breaks without role clarity. Effective systems define:

  • Who owns the agent and its scope.
  • Who is responsible for policy and risk decisions.
  • Who reviews incidents and takes corrective action.
  • How responsibilities are shared between model providers, system integrators, and the organisation deploying the agent.

This is as much organisational design as it is technical architecture.


Design lens

Governance failures often look like interface failures: the right control existed in theory but did not survive contact with deadlines, cognitive load, and organisational pressure. A serious design-led approach spans:

  • Interface design – clarity at decision points; progressive disclosure without hiding consequence.
  • Workflow design – where judgment lands in the journey; how exceptions propagate.
  • Information architecture – how logs and narratives assemble so reviewers see a coherent story.
  • Service design – cross-functional handoffs among product, legal, risk, operations, and data teams.
  • Operational visibility – executive and practitioner views that support intervention, not vanity dashboards.

This is the terrain where experienced product and design leaders translate abstract safeguards into systems people can run — especially in regulated or large-scale transformation contexts.


From framework to artifacts

To make this concrete, this framework can produce a small set of practical artifacts:

  • Risk-tier matrix – mapping common agent actions to risk tiers and oversight modes.
  • Approval pattern library – standard UI patterns for high-stakes human decisions.
  • Governance service blueprint – showing how product, operations, risk, and leadership interact when the agent runs into edge cases.
  • Oversight dashboard concept – visualising exceptions, intervention rates, incident clusters, and the “health” of autonomy over time.

These artifacts give leaders and teams a shared way to talk about, design, and evolve human-in-the-loop governance over time.


Closing thoughts

The question is no longer whether organisations will use AI agents. The question is how much autonomy those agents will get, and how clearly humans will remain in charge when things are complicated, ambiguous, or high stakes.

Human-in-the-loop governance is not about slowing everything down. Done well, it is how organisations move faster with AI — because they know where agents are free to act, where humans must decide, and how to keep that boundary honest as the system evolves.