Getting AI to answer from your own data

Pointing an AI assistant at your company's documents takes an afternoon. Making it answer reliably — the kind of reliable you'd put in front of a customer — takes considerably longer, and the gap between the two is almost entirely architecture.

It's the most common AI request I hear: "We want it to answer questions about our own stuff — our docs, our policies, our data." The instinct is right. A general model knows a lot about the world and nothing about your business, and the fix is to feed it the relevant material at the moment you ask. The basic loop is genuinely simple: find the passages that relate to the question, hand them to the model, ask it to answer using them. That simplicity is exactly why so many of these projects look great in a demo and then quietly stall before launch.
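
The basic loop really is only a few lines. The sketch below is a minimal, illustrative version under stated assumptions: `DOCS` is a hypothetical two-document corpus, the retriever is naive word overlap (exactly the kind of lookup the rest of this piece argues you will outgrow), and `generate` stands in for whatever model client you actually call.

```python
import re
from typing import Callable

# Hypothetical corpus standing in for "our docs, our policies, our data".
DOCS = {
    "refunds.md": "Refunds are issued within 14 days of a returned item being received.",
    "shipping.md": "Standard shipping takes 3 to 5 business days within the EU.",
}


def tokenize(text: str) -> set[str]:
    """Lowercased word set; deliberately naive."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_passages(question: str, k: int = 1) -> list[tuple[str, str]]:
    """Step 1: find the passages that relate to the question (by word overlap)."""
    q = tokenize(question)
    ranked = sorted(DOCS.items(), key=lambda item: len(q & tokenize(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Step 2: hand the passages to the model, each tagged with its source."""
    context = "\n".join(f"[{source}] {text}" for source, text in passages)
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def answer(question: str, generate: Callable[[str], str]) -> str:
    """Step 3: ask the model to answer using them."""
    return generate(build_prompt(question, find_passages(question)))
```

Passing `generate=lambda prompt: prompt` simply echoes the assembled prompt, which is a handy way to inspect exactly what the model would see.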

The demo lies to you

In a demo you ask the questions you already know the answers to, over a tidy set of documents, and it looks like magic. In production, people ask questions your material doesn't cleanly cover, over messy real documents, and the model — being a model — confidently fills the gap with something invented. The failure usually isn't the model. It's that the lookup step handed it the wrong passages, or nothing useful, and nothing in the system noticed or cared.

Bad context in, confident nonsense out. The quality of these systems is decided mostly by what you retrieve, not by which model you pick.

Where the real decisions are

The parts that determine whether people trust the answers sit upstream of the model:

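How you split the documents. Chopping text into naive fixed-size blocks destroys meaning: it cuts sentences, tables, and sections in half and separates passages from the headings that explained them. Split on natural structure such as headings and paragraphs, keep enough surrounding context, and carry source information (which document, which section, what date) through every step.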
How you find the right passage. Matching on meaning alone (embedding-based vector search) can miss exact terms such as product names and codes; matching on keywords alone misses questions phrased differently from the source. Combining both, then re-ranking the candidates, is what moves you from "usually relevant" to "reliably relevant."
Showing its work. Make the assistant point to the exact passages it used and surface those to the reader. Citations aren't just polish — they're how you and your users catch a wrong answer before it does damage.
Knowing when to say "I don't know." A system that declines when it can't find good support is worth ten that always answer. That restraint is the single most under-built part of these projects.
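
Here is one way to split on natural structure rather than fixed-size blocks. It is a sketch that assumes Markdown-style sources where lines starting with `#` are headings; `Chunk`, `split_by_headings`, and the `max_chars` limit are illustrative names and values, not any particular library's API. Each chunk carries its document, heading path, and date, and its text is prefixed with that heading path so the passage still makes sense when retrieved on its own.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    doc_id: str   # which document
    section: str  # which section, as a heading path like "Returns > Refunds"
    date: str     # document date, carried through unchanged
    text: str     # heading path + body, so the chunk reads on its own


def split_by_headings(doc_id: str, date: str, markdown: str, max_chars: int = 1200) -> list[Chunk]:
    """Split at headings; pack whole paragraphs into chunks of roughly max_chars."""
    chunks: list[Chunk] = []
    path: list[str] = []    # current heading path
    buffer: list[str] = []  # body lines of the current section

    def flush() -> None:
        paragraphs = [p.strip() for p in "\n".join(buffer).split("\n\n") if p.strip()]
        buffer.clear()
        section = " > ".join(path) or "(untitled)"
        piece = ""
        for para in paragraphs:
            if piece and len(piece) + len(para) + 2 > max_chars:
                chunks.append(Chunk(doc_id, section, date, f"{section}\n{piece}"))
                piece = para
            else:
                piece = f"{piece}\n\n{para}" if piece else para
        if piece:
            chunks.append(Chunk(doc_id, section, date, f"{section}\n{piece}"))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            level = len(match.group(1))
            path[:] = path[: level - 1] + [match.group(2).strip()]
        else:
            buffer.append(line)
    flush()
    return chunks
```

A single paragraph longer than `max_chars` is kept whole rather than cut mid-sentence; whether that is acceptable depends on your embedding model's input limit.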
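
Citations and restraint can be enforced in code rather than left to the prompt alone. A minimal sketch, assuming your retriever or re-ranker returns a relevance `score` where higher is better: `Hit`, `respond`, the `MIN_SCORE` threshold of 0.5, and the `NO_ANSWER` sentinel are hypothetical, and the right threshold has to come from your own data. The function declines before calling the model when nothing clears the bar, numbers the passages it does send, and treats any reply without a valid citation as unsupported.

```python
import re
from dataclasses import dataclass
from typing import Callable

MIN_SCORE = 0.5  # hypothetical cut-off; tune it against your own test set
NO_ANSWER = "I don't know based on the available documents."


@dataclass
class Hit:
    source: str   # e.g. "policy.md > Refunds", carried through from indexing
    text: str
    score: float  # relevance from your retriever or re-ranker; higher is better


def respond(question: str, hits: list[Hit],
            generate: Callable[[str], str]) -> tuple[str, list[str]]:
    """Return (answer, cited sources), declining when support is weak."""
    supported = [h for h in hits if h.score >= MIN_SCORE]
    if not supported:
        return NO_ANSWER, []  # decline before the model gets a chance to improvise

    numbered = "\n".join(
        f"[{i}] ({h.source}) {h.text}" for i, h in enumerate(supported, start=1)
    )
    prompt = (
        "Answer only from the numbered passages and cite each claim like [1]. "
        "If they do not answer the question, reply exactly: NO_ANSWER\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )
    reply = generate(prompt)

    # Keep only citation markers that point at passages we actually supplied.
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", reply)})
    sources = [supported[n - 1].source for n in cited if 1 <= n <= len(supported)]
    if "NO_ANSWER" in reply or not sources:
        return NO_ANSWER, []  # an answer that cites nothing is treated as unsupported
    return reply, sources
```

Declining on an uncited reply is deliberately strict: it gives up some answers you might have accepted in exchange for fewer confident inventions, and the returned `sources` are what you surface to the reader next to the answer.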

A shape that holds up

The systems I've built that survived contact with real users share a common skeleton: a pipeline that prepares and indexes the source material with rich source metadata; a lookup layer that combines meaning-based and keyword search and then re-ranks; an answer step with strict instructions to stick to the supplied material and cite it; and — most importantly — a test set of real question-and-answer pairs that runs on every change, so you can actually tell whether a tweak helped or quietly made things worse.
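
The article doesn't prescribe how to combine the two searches; one common choice is reciprocal rank fusion (RRF), which merges ranked lists using only positions, so the keyword and vector scores never need to be comparable. The sketch below assumes you already have a keyword index (BM25, say) and a vector index behind the hypothetical callables `keyword_search` and `vector_search`, each returning chunk ids best-first, plus a `rerank` callable such as a cross-encoder. The constant `k = 60` is the value commonly used for RRF, not something tuned here.

```python
from collections import defaultdict
from typing import Callable

Search = Callable[[str, int], list[str]]        # (query, limit) -> chunk ids, best first
Rerank = Callable[[str, list[str]], list[str]]  # (query, ids) -> ids, best first


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each id as the sum of 1 / (k + rank) across lists; higher is better."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


def hybrid_search(query: str, keyword_search: Search, vector_search: Search,
                  rerank: Rerank, candidates: int = 50, top_k: int = 5) -> list[str]:
    """Keyword plus meaning-based retrieval, fused, then re-ranked."""
    fused = reciprocal_rank_fusion([
        keyword_search(query, candidates),  # catches exact terms and product names
        vector_search(query, candidates),   # catches paraphrased intent
    ])
    # The re-ranker is slower but more precise, so it only sees the fused shortlist.
    return rerank(query, fused[:candidates])[:top_k]
```

An id that ranks reasonably well in both lists tends to beat one that tops only a single list, which is exactly the "usually relevant" to "reliably relevant" shift the lookup layer is for.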

That last piece is the one teams skip and later regret. Without it, tuning is guesswork dressed up as progress. With it, you can move quickly and prove you're moving in the right direction.
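
A test set does not need a framework to start paying off. The sketch below assumes each case records a question, the facts a correct answer must state, and the source that should support it (or `None` when the right behavior is to decline), and that your system exposes a hypothetical `ask(question)` returning the answer text and its cited sources. `Case`, `run_suite`, and `passes_gate` are illustrative names; substring checks are crude, but they are cheap enough to run on every change.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Ask = Callable[[str], tuple[str, list[str]]]  # question -> (answer, cited sources)


@dataclass
class Case:
    question: str
    must_contain: list[str] = field(default_factory=list)  # facts the answer must state
    expected_source: Optional[str] = None                   # None means "should decline"


def run_suite(cases: list[Case], ask: Ask) -> float:
    """Return the fraction of cases the system gets right."""
    passed = 0
    for case in cases:
        answer, sources = ask(case.question)
        if case.expected_source is None:
            ok = not sources  # a correct decline cites nothing
        else:
            ok = case.expected_source in sources and all(
                fact.lower() in answer.lower() for fact in case.must_contain
            )
        passed += ok
    return passed / len(cases)


def passes_gate(score: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Reject a change that scores worse than the last accepted baseline."""
    return score >= baseline - tolerance
```

Run the suite in CI, store the last accepted score, and fail the build when `passes_gate` returns `False`; that is what turns a tweak from a hunch into a measured change.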

Don't forget the boring parts

Moderation of what goes in and what comes out, handling of personal and sensitive information, access controls so the assistant only surfaces what a given user is allowed to see, cost and speed budgets, and a feedback loop that captures the questions it answers badly. None of this is glamorous. All of it is the difference between a clever prototype and something a business can actually stand behind.
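
Of these, access control is the one most worth enforcing in code rather than in the prompt. A minimal sketch, assuming each chunk is stamped at index time with the groups allowed to read its source document (`IndexedChunk`, `allowed_groups`, and `permitted_context` are hypothetical names): filter before anything reaches the model, because a passage that lands in the prompt can leak into the answer even if its citation is hidden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset[str]  # copied from the source system's permissions


def permitted_context(candidates: list[IndexedChunk], user_groups: set[str],
                      top_k: int = 5) -> list[IndexedChunk]:
    """Keep only chunks the user could open in the source system, best first.

    Runs after retrieval and before prompt assembly, so nothing the user may
    not see is ever shown to the model on their behalf.
    """
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return visible[:top_k]
```

If your search index supports metadata filters, applying the same rule inside the query is usually better, since filtering afterwards can leave you with fewer than `top_k` usable passages.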

This work isn't hard because the concept is hard. It's hard because "good enough to demo" and "good enough to trust" look almost identical — right up until someone asks the question you never tested. Build for the second one from the start.
