The case for boring AI: features your users don't notice

Every AI roadmap conversation in 2026 starts the same way. The team has a copilot or chat surface in mind. They've seen a competitor ship one. The CEO read an essay. The pitch deck has a with-AI bullet that has to be backed by something visible. By the time we're talking, the question isn't whether to add chat — it's where to put it.

Almost every time, the right answer is somewhere else entirely. The AI features that actually improve product metrics are invisible. Users don't identify them as AI. They just notice the product works better than it used to.

What boring AI looks like

01 Search ranking. Most product search engines rank by exact-match keyword and recency. An embedding-based re-rank on top, even a cheap one, lifts relevance materially. Users notice 'search got better'. They don't notice why.
02 Auto-deduplication. CRMs, knowledge bases, catalogues (anything with user-submitted entries) drift toward duplicates. A nightly embedding similarity job merges or flags the obvious ones. The database stays cleaner; downstream queries return better results.
03 Smart defaults. Pre-filling fields based on what similar users picked. Suggesting tags from content. Recommending the next obvious action. The user keeps full control; the product stops asking the same questions twice.
04 Auto-tagging. Categorising user content into an existing taxonomy without asking the user. Email apps do this with priority. Photo apps with people. CRMs with intent. Every minute it saves is a minute the user doesn't know they didn't spend.
05 Anomaly detection. Quiet alerts for outliers in dashboards, logs, support tickets. It catches issues earlier and produces a 'how did you notice this?' moment that builds trust.
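
To make the first two items concrete, here is a minimal sketch of what "an embedding-based re-rank" and "a nightly similarity job" can boil down to. It assumes you already have an embedding vector for each item from whatever model you use; the function names, the tiny two-dimensional vectors, and the 0.95 threshold are all illustrative assumptions, not recommendations.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec, candidates, top_k=10):
    """Re-order keyword-search hits by similarity to the query.

    candidates: list of (doc_id, embedding) pairs returned by the
    existing search. Returns (doc_id, score) pairs, best first.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def flag_duplicates(entries, threshold=0.95):
    """Return pairs of entry ids whose embeddings are nearly identical.

    Brute-force O(n^2) comparison: fine for a sketch, too slow for a
    large table, where an approximate nearest-neighbour index would be
    the usual choice. The threshold is a hypothetical starting point.
    """
    pairs = []
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            id_a, vec_a = entries[i]
            id_b, vec_b = entries[j]
            if cosine(vec_a, vec_b) >= threshold:
                pairs.append((id_a, id_b))
    return pairs


# Toy vectors standing in for real embeddings.
hits = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
ranked_ids = [doc_id for doc_id, _ in rerank([1.0, 0.1], hits)]

records = [("x", [1.0, 0.0]), ("y", [0.99, 0.01]), ("z", [0.0, 1.0])]
dupes = flag_duplicates(records)
```

Flagging rather than merging is the safer default in a sketch like this: a human or a stricter rule can confirm the merge, and the user never sees the job run.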

Why visible AI loses more often than it wins

Chat surfaces have three structural disadvantages.

First, they require behaviour change. The user has to learn to ask the right question, in the right format, with the right context. Most users don't. The feature ships, sees a usage curve that peaks in week two and decays through month three, and then the team rebuilds it.

Second, they fail loudly. When invisible AI is wrong, the user doesn't know it was AI. The bad result looks like a normal product limitation — easy to forgive. When chat is wrong, it produced an authoritative-sounding answer that the user trusted and acted on. Trust loss compounds; recovery is expensive.

Third, they cost more per query. Chat surfaces typically use frontier models, often with long context windows. The unit economics rarely work outside enterprise tiers. Boring AI runs on smaller models (sometimes a fine-tuned classifier, sometimes a single embedding pass), often at around a tenth of the cost.

When chat is actually right

A small set of cases. Chat is the right surface when the underlying task is genuinely open-ended (a documentation Q&A bot), when the user pool is sophisticated (developer tools), or when the alternative is a structured form that's worse (data exploration tools where the queries can't be predicted).

Outside those, chat is usually a worse version of a feature that should have been a smart default or an automated background job. The bar to ship chat should be: we tried the invisible version first and it wasn't enough.

The harder design problem

Invisible AI is harder to ship than visible AI, paradoxically. Visible AI gets a UI surface, a launch announcement, a metric to point at. Invisible AI requires designing a measurement scheme for a thing the user isn't supposed to notice. Did search-with-ranking lift conversion? You need an A/B test. Did auto-tagging save user time? You need a stopwatch on the without-version.
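
As a rough illustration of the A/B test mentioned above, the sketch below compares conversion between a control group (old ranking) and a variant group (new ranking) using a standard two-proportion z-test. The function name and the counts in the example are made up for illustration; a real experiment would also need a pre-agreed sample size and stopping rule.

```python
import math
from statistics import NormalDist


def conversion_lift(control_conv, control_n, variant_conv, variant_n):
    """Two-proportion z-test on conversion counts.

    Returns the absolute lift (variant rate minus control rate),
    the z statistic, and a two-sided p-value.
    """
    rate_control = control_conv / control_n
    rate_variant = variant_conv / variant_n

    # Pooled rate under the null hypothesis of no difference.
    pooled = (control_conv + variant_conv) / (control_n + variant_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))

    z = (rate_variant - rate_control) / std_err if std_err > 0 else 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"lift": rate_variant - rate_control, "z": z, "p_value": p_value}


# Hypothetical counts: 1,000 sessions per arm.
result = conversion_lift(control_conv=100, control_n=1000,
                         variant_conv=150, variant_n=1000)
no_change = conversion_lift(100, 1000, 100, 1000)
```

The point of the sketch is the shape of the measurement, not the statistics: the user sees no new surface, so the only evidence the feature worked is the difference between the two arms.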

Most teams skip the measurement and ship the visible thing because it's easier to demo. The visible thing then gets killed in the next strategy review for not moving a number. The invisible thing would have moved a number, but never got built.

Practical defaults

If you're sketching an AI roadmap right now, this is the order of operations we'd suggest:

01 List the five places your product asks the user to make a choice. Pick the one with the most data behind it. Add a smart default. Measure the change in completion rate.
02 Audit your search analytics. Find the queries that return zero results or that users abandon. Build embedding-based re-ranking against your existing corpus. Measure search-driven conversion.
03 Find the manual categorisation step somewhere in your product. Replace it with auto-tagging. Measure time-to-action.
04 Only after those: consider whether a chat surface earns its place. It usually doesn't, and the budget you saved funds two more invisible features.
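
The first step in the list above can start much smaller than it sounds. A minimal sketch, assuming you can export past choices as (segment, value) pairs: suggest the most common value for users in the same segment, and suggest nothing when the data is thin or split. The function name, the segment labels, and both thresholds are hypothetical.

```python
from collections import Counter


def suggest_default(past_choices, segment, min_support=20, min_share=0.6):
    """Suggest a pre-filled value for one field, or None.

    past_choices: iterable of (segment, value) pairs from earlier users.
    A suggestion is made only when the segment has at least min_support
    observations and one value accounts for at least min_share of them.
    """
    counts = Counter(value for seg, value in past_choices if seg == segment)
    total = sum(counts.values())
    if total < min_support:
        return None  # not enough data: leave the field empty

    value, count = counts.most_common(1)[0]
    return value if count / total >= min_share else None


# Made-up history for illustration.
history = (
    [("agency", "monthly")] * 18
    + [("agency", "annual")] * 4
    + [("startup", "annual")] * 3
)

agency_default = suggest_default(history, "agency")
startup_default = suggest_default(history, "startup")
```

Returning None when confidence is low matters as much as the suggestion itself: a wrong pre-fill costs the user more than an empty field, and the field stays editable either way.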

If you're trying to figure out where AI fits in your product without shipping a copilot you'll regret, the discovery call is the place to map the invisible-AI surface against your roadmap.
