AI + Design

AI features users actually trust

A fee earner dismissed an AI suggestion – not because it was wrong, but because nothing told her why to believe it. Five design decisions that emerged from building AI features for legal professionals, and what separates trust-building features from those users ignore.

Originally written May 2025

We were in a UserZoom session. A fee earner was looking at an AI suggestion on her screen – a recommended action based on the case summary we'd built. The suggestion was correct. She dismissed it anyway. I asked why.

"I don't know where it came from," she said. "How do I know it's right?"

That session changed how I thought about AI feature design. The problem wasn't the AI. The problem was that nothing in the interface gave her any reason to trust it. The suggestion appeared confident and unexplained, and she had no framework for evaluating it. Dismissing it was the only rational response.

What the research said

Among 110+ legal professionals in UserZoom studies, the same pattern appeared across every AI feature we tested: trust was not about accuracy alone. It was about stakes-calibrated transparency.

Users trusted AI for routine admin – suggesting a next action on a low-stakes task, flagging an email for follow-up, and surfacing a case deadline. They were significantly more sceptical when the stakes rose. The same suggestion that would be accepted for a diary reminder would be challenged for a billing decision.

The principle that came from this:

trust scales with stakes.

A feature that presents routine and high-stakes suggestions the same way will be trusted for one and doubted for the other.

Five things trust-building features share

Show the reasoning. An AI suggestion without a reason is a guess dressed as confidence. The sentiment analysis notification showed fee earners not just that an email needed attention but why – the tone analysis that flagged it. Users who could see the reasoning could evaluate it. Users who couldn't had no choice but to accept or dismiss it blindly.

Calibrate to stakes. Routine tasks can be suggested quickly. High-stakes actions need confirmation and visibility of impact before the user acts. The decision tree that turns an AI summary into a recommended next step works because it makes the stakes explicit – this is what happens when you take this action. Low friction for low stakes, deliberate friction for high.

Make the override obvious. Every AI feature we shipped followed one rule: one action to dismiss or change. Not buried in settings, not requiring an explanation. If the AI is wrong and the user can't easily correct it, they'll stop using the feature rather than fix every instance.

Earn trust, don't design for it. No interface choice makes a user trust an AI that's consistently inaccurate. The Niland Test – built specifically to evaluate prompt performance against real users – existed because accuracy is what converts early sceptics. Design creates the conditions for trust. Accuracy sustains it.

Recommend, don't decide. Every AI feature we shipped surfaces intelligence and shows its reasoning. None act autonomously. Users wanted AI to help them make better decisions, not to make decisions for them. The moment a feature stepped past recommender and into decision-maker, trust collapsed.
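As a rough illustration, four of these principles (reasoning, stakes, override, recommend-only) can be written down as a small data model. This is a minimal sketch, not the shipped implementation: every name in it (Suggestion, Stakes, present, resolve) is hypothetical, and the two-level stakes split is an assumption made for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Stakes(Enum):
    ROUTINE = "routine"  # e.g. a diary reminder
    HIGH = "high"        # e.g. a billing decision


class Decision(Enum):
    ACCEPT = "accept"
    DISMISS = "dismiss"


@dataclass(frozen=True)
class Suggestion:
    action: str
    reasoning: str                 # why the AI is suggesting this
    stakes: Stakes
    impact: str = ""               # what happens if the user accepts
    sources: tuple[str, ...] = ()  # documents the user can check


@dataclass(frozen=True)
class Presentation:
    show_reasoning: bool
    show_impact: bool
    needs_confirmation: bool
    dismiss_steps: int


def present(s: Suggestion) -> Presentation:
    """Decide how a suggestion is shown. Hypothetical rules, for illustration."""
    # Show the reasoning: an unexplained suggestion is never surfaced.
    if not s.reasoning.strip():
        raise ValueError("refusing to surface a suggestion with no reasoning")
    high = s.stakes is Stakes.HIGH
    # Calibrate to stakes: high-stakes suggestions must state their impact.
    if high and not s.impact.strip():
        raise ValueError("high-stakes suggestions must state their impact")
    return Presentation(
        show_reasoning=True,
        show_impact=high,
        needs_confirmation=high,
        dismiss_steps=1,  # make the override obvious: always one action
    )


def resolve(s: Suggestion, decision: Decision, confirmed: bool = False) -> str:
    """Recommend, don't decide: nothing is applied without a user decision."""
    if decision is Decision.DISMISS:
        return "dismissed"  # no explanation required from the user
    if present(s).needs_confirmation and not confirmed:
        return "awaiting confirmation"
    return f"applied: {s.action}"
```

In this sketch a routine suggestion is applied on a single accept, a high-stakes one waits for explicit confirmation, and either can be dismissed in one step. The fifth principle, earning trust through accuracy, is deliberately absent: no data model can supply it.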

What this looks like in practice

Across four shipped AI features – Copilot AI, the sentiment analysis notification, the decision tree, and the case summary – the same five principles appeared in different forms. Copilot AI earned trust by showing sources and allowing users to edit before accepting. The sentiment analysis earned trust by explaining why it flagged an email rather than just flagging it. The decision tree earned trust by making the recommended path explicit and the alternatives visible. The case summary earned trust by attributing information to specific case documents so users could verify.

In each case, the AI was accurate enough to be worth using. But accuracy alone wouldn't have been enough – the interface had to create enough transparency for users to evaluate what the AI was doing before they trusted it enough to act on it.

What it changed

The most useful shift was moving away from thinking about AI trust as a design problem and toward thinking about it as a research problem. The question stopped being "How do we design a feature users will trust?" and became "For this specific user, for this specific task, at this specific stake level – what does trust actually require?" The answers were different every time.

Legal professionals who'd never interacted with AI features needed to see reasoning they could evaluate. Legal professionals who'd used the features for months needed the reasoning to step back and let accuracy do the work. Trust is dynamic, not fixed – it builds or erodes with every interaction, and the interface needs to support both stages.

A note on trust

Trust in AI features isn't designed. It's earned through accuracy and enabled through transparency.

A feature can be designed to feel trustworthy without being trustworthy. The patterns – showing reasoning, calibrating to stakes, making override easy – aren't tricks. They're conditions that give accurate AI a chance to be recognised as accurate. If the accuracy isn't there, no design decision closes the gap.

The most reliable measure of whether a feature is building trust is whether users who dismissed it initially start accepting suggestions after a few weeks of use. That's the arc worth designing for – not a first impression of trustworthiness, but a gradual recognition that the feature knows what it's doing.
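One way to track that arc, sketched under assumptions: log each accept or dismiss per user, keep only the users whose first response was a dismissal, and compare their acceptance rate before and after a cut-off. The Event shape, the acceptance_arc name, and the 14-day split are all hypothetical choices for illustration, not how we measured it.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    user: str
    day: int        # days since this user first saw the feature
    accepted: bool  # True = accepted, False = dismissed


def acceptance_arc(
    events: Iterable[Event], split_day: int = 14
) -> Tuple[Optional[float], Optional[float]]:
    """Return (early_rate, later_rate) for users who dismissed first.

    A rate is None when there are no events in that window.
    """
    by_user = defaultdict(list)
    for e in events:
        by_user[e.user].append(e)

    early, later = [], []
    for history in by_user.values():
        history.sort(key=lambda e: e.day)
        if history[0].accepted:
            continue  # accepted on first contact, so not an initial sceptic
        for e in history:
            (early if e.day < split_day else later).append(e.accepted)

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else None

    return rate(early), rate(later)
```

A later rate that sits clearly above the early rate for this group is the signal described above. A flat or falling rate suggests the feature made a first impression it could not sustain.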
