Privacy

May 31, 2026 · 9 min read

Not all AI analysis treats your data the same way — and where your respondents’ identities end up matters more than what the model finds.

Confidential by Design: How Reactor’s AI Reaches Insight Without Ever Learning Who’s in Your Data

Every AI assistant has to be fed your data to analyse it. Ours is engineered so the assistant works on anonymous tokens — never the real names, accounts, or contact details — and the key that could turn those tokens back into people never leaves your environment.

The question to ask before you let AI near your research

AI-assisted analysis is genuinely transformative. Point an assistant at a study and it will summarise thousands of open-ends, surface themes, and answer in seconds the questions that used to take an analyst a week. The pull is real, and we feel it too. But there is a question every serious research team should ask before they let any assistant touch their data, and it is not “how clever is the model?” It is the quieter one: where does the data actually go?

Because to analyse a study, something has to feed the data to the model — the verbatims, the respondent records, the client’s account name, the project it belongs to. That material is some of the most sensitive a research business holds: personally identifiable information from respondents who were promised confidentiality, and commercial details belonging to clients who are paying for discretion. The moment it leaves your control, you are trusting someone else’s policy about how it is logged, retained, and reused.

That is the honest shape of the problem, and anyone who tells you their AI feature is private without explaining where your data goes is asking you to take it on faith. We would rather show you the work.

The common shortcut — raw data to a model you don’t control

It helps to know how most AI analysis is actually wired. The fast way to build it — and the way a great deal of the market has built it — is to take your data more or less as it sits and send it to a general-purpose model hosted by a third party. It is quick to ship and it works. It also means the names, emails, and verbatims travel to a system you do not run, where they may be retained or used to improve someone else’s product, governed by a policy you did not write and cannot see into.

This is not an accusation aimed at anyone in particular; it is simply the default shape of bolting a hosted model onto a research tool. But the default trades away the one thing a research platform cannot afford to give up: control over confidential data. Confidentiality that depends on a vendor’s promise not to misuse what you sent them is a weaker thing than confidentiality where the sensitive data was never sent at all.

Our stance: the assistant works on tokens, not identities

Reactor takes the opposite path. Before the assistant sees anything, every confidential value is replaced with a stable, meaningless token. “Acme Corporation” becomes an anonymous reference; a respondent’s name, a moderator’s email, the project and the client account each become a token that carries no identifying information of its own. The assistant then performs its full analysis — themes, summaries, comparisons, follow-up questions — on the tokenised data.
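Here is a minimal sketch of the stable-token idea, assuming a simple in-memory vault. The class name `TokenVault`, the `<KIND_hex>` token format and the sample record are hypothetical illustrations, not Reactor's implementation. Tokens are drawn at random rather than hashed from the value, so the token itself reveals nothing about the original. The same value always gets the same token because the vault remembers the pairing.

```python
import secrets


class TokenVault:
    """Hypothetical local vault mapping confidential values to stable, random tokens."""

    def __init__(self) -> None:
        self._forward: dict[tuple[str, str], str] = {}  # (kind, value) -> token
        self._reverse: dict[str, str] = {}  # token -> value; stays in your environment

    def tokenise(self, kind: str, value: str) -> str:
        """Return the token for a value, creating it on first sight and reusing it after."""
        key = (kind, value)
        if key not in self._forward:
            token = f"<{kind}_{secrets.token_hex(4)}>"
            while token in self._reverse:  # redraw on the rare random collision
                token = f"<{kind}_{secrets.token_hex(4)}>"
            self._forward[key] = token
            self._reverse[token] = value
        return self._forward[key]

    def resolve(self, token: str) -> str:
        """Translate a token back to its real value (local use only)."""
        return self._reverse[token]


vault = TokenVault()
record = {"client": "Acme Corporation", "respondent": "Jane Doe", "answer": "Too expensive"}
tokenised = {
    "client": vault.tokenise("CLIENT", record["client"]),
    "respondent": vault.tokenise("PERSON", record["respondent"]),
    "answer": record["answer"],  # contains no identifier, so it passes through unchanged
}
print(tokenised)
```

Labelling each token with its kind (CLIENT, PERSON) is an assumed design choice. It tells the assistant what sort of thing it is reasoning about without revealing which one.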

The insight does not suffer, because the assistant never needed the identities to do the reasoning. It does not need to know a respondent’s name to tell you what that respondent thought. It groups, counts, contrasts, and explains over tokens exactly as well as it would over names — and when it needs to act on its conclusions, Reactor translates the tokens back into real values inside your environment, so the work still lands on the right records.
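To show why the analysis does not suffer, here is a hypothetical sketch in which a stand-in `analyse` function counts over tokens only. Relabelling respondents consistently cannot change any count, so the result matches what the same analysis would produce over names. The local map, assumed here to be a plain dict, is consulted only at the end, when a result has to land on a real record.

```python
from collections import Counter

# Hypothetical respondent records, already tokenised: the analysis only ever sees tokens.
responses = [
    {"respondent": "<PERSON_a1>", "theme": "price"},
    {"respondent": "<PERSON_b2>", "theme": "price"},
    {"respondent": "<PERSON_a1>", "theme": "support"},
]

# The map lives only on the local side; it is NOT passed to the analysis step.
local_map = {"<PERSON_a1>": "Jane Doe", "<PERSON_b2>": "Sam Lee"}


def analyse(rows):
    """Stand-in for the assistant's reasoning: count mentions per respondent and per theme."""
    per_respondent = Counter(r["respondent"] for r in rows)
    per_theme = Counter(r["theme"] for r in rows)
    return per_respondent, per_theme


per_respondent, per_theme = analyse(responses)
most_vocal_token, count = per_respondent.most_common(1)[0]

# Only when the result has to act on a real record is the token translated back, locally.
print(local_map[most_vocal_token], count, per_theme["price"])  # Jane Doe 2 2
```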

The assistant doesn’t need to know who your respondents are to tell you what they think. So we built it so that it never does.

Where we draw the line — and why we say so

Start with the limit, because a privacy story that hides its limit is not a privacy story. Detecting confidential information inside free-flowing text is a hard problem, and no automated system catches every conceivable string with perfect recall. We do not pretend otherwise. What we do instead is stack the detection so misses are rare, and engineer the system so the consequence of a miss is bounded.

Here is the bound that matters: the key that links a token back to a real person never leaves your environment and is never sent to the model. So even if a stray fragment of text slipped through untokenised, what reached the assistant would be a fragment — not a dataset, not a contact list, and never the means to re-identify anyone at scale. The crown jewels — the map from tokens to identities — stay home, encrypted, under your control. That is the promise we actually make, and it is one we can keep.

The identity map never leaves your environment

The link between a token and the real value it stands for lives in a single, encrypted store inside your own environment. It is held under strong encryption at rest, it is readable only by the parts of the system that need it to perform the translation, and — this is the part that matters most — it is never transmitted to the AI provider. The model receives tokens and returns tokens; the re-identification happens locally, on your side of the line, only when it is needed to act on a real record.

This is the structural difference between confidentiality-by-architecture and confidentiality-by-promise. We are not asking a model provider to be careful with your respondents’ identities, because we never hand them over. If the analysis data were somehow exposed, it would be a collection of anonymous tokens — not a directory of the people behind them.

Sanitised before it’s stored, not just before it’s shown

There is a subtler trap that many privacy features fall into: they clean the data on the way to the screen but keep a raw copy sitting in a working store underneath. Reactor does the sanitisation earlier. The data the assistant analyses is tokenised before it is written to the local analysis store, so the protected form is the stored form — there is no shadow copy of the raw identities waiting in a cache to be queried.
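The sanitise-before-store discipline can be sketched with an in-memory SQLite database standing in for the local analysis store. The `sanitise` function is a deliberately naive placeholder that swaps a hypothetical list of known values for their tokens. What matters is the shape of the write path: the only way in goes through sanitisation, so a later query has no raw copy to find.

```python
import sqlite3

# Hypothetical, already-populated map of known confidential values to their tokens.
KNOWN = {"Jane Doe": "<PERSON_a1>", "Acme Corporation": "<CLIENT_c3>"}


def sanitise(text: str) -> str:
    """Naive stand-in for the real detection layers: swap each known value for its token."""
    for value, token in KNOWN.items():
        text = text.replace(value, token)
    return text


db = sqlite3.connect(":memory:")  # stands in for the local analysis store
db.execute("CREATE TABLE responses (id INTEGER PRIMARY KEY, verbatim TEXT)")


def store_response(raw: str) -> None:
    # The only write path sanitises first, so the raw text is never persisted.
    db.execute("INSERT INTO responses (verbatim) VALUES (?)", (sanitise(raw),))


store_response("Jane Doe said Acme Corporation is too slow to respond.")
rows = db.execute("SELECT verbatim FROM responses").fetchall()
print(rows)  # [('<PERSON_a1> said <CLIENT_c3> is too slow to respond.',)]
```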

In practice this means the assistant’s entire working surface — every response it can read, every record it can search — is already anonymous by the time it exists. Protection at rest, not merely protection in transit, is what makes the guarantee hold up under scrutiny rather than only in a demo.

How the detection actually works

Recognising what to tokenise is layered on purpose, because no single technique is enough on its own. Structured fields — the columns a record explicitly labels as a name, an email, an account, an address — are recognised for what they are and tokenised every time. Contact patterns that betray identity regardless of where they appear — email addresses, phone numbers, government identifiers — are detected by shape and caught even when they are buried in a sentence.
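Here is a sketch of the first two detection layers under stated assumptions. The set of confidential column names and both regular expressions are hypothetical and much simpler than production patterns would need to be: the phone pattern would also match some dates, for instance, and government identifiers are not covered. Labelled fields are flagged whole, while shape-based patterns are searched inside free text.

```python
import re

# Hypothetical set of column names that are confidential by definition.
CONFIDENTIAL_FIELDS = {"name", "email", "account", "address"}

# Illustrative shape-based patterns; real-world formats vary far more than this.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def detect(record: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (field, kind, value) for every confidential value found in a record."""
    hits = []
    for field, value in record.items():
        if field in CONFIDENTIAL_FIELDS:
            # Layer 1: a field labelled as identifying is flagged whole, every time.
            hits.append((field, field.upper(), value))
            continue
        # Layer 2: identifying shapes are caught even when buried in a sentence.
        for kind, pattern in PATTERNS.items():
            hits.extend((field, kind, m.group(0)) for m in pattern.finditer(value))
    return hits


record = {
    "name": "Jane Doe",
    "verbatim": "Call me on +44 20 7946 0958 or write to jane@acme.example.",
}
for hit in detect(record):
    print(hit)
```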

On top of that, Reactor builds and continuously grows a registry of your real entities — the customers, projects, people, and accounts that actually exist in your data — and sweeps that registry across everything, including the open-ended verbatims where sensitive details most often hide in plain prose. A proper noun the system encounters is added to that registry, so once a name is known anywhere it is tokenised everywhere it appears: consistently, across every response and every report.
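The registry sweep might look something like the following minimal sketch. `EntityRegistry`, its methods and the `<ENTITY_n>` format are hypothetical, and how entities are discovered in the first place is not shown. Matching is on whole words, longest name first, so a registered name gets the same token everywhere it occurs without clipping a longer name or touching a word that merely contains it.

```python
import re


class EntityRegistry:
    """Hypothetical registry of known real entities, swept across all text."""

    def __init__(self) -> None:
        self._tokens: dict[str, str] = {}  # real name -> stable token
        self._pattern = None  # compiled alternation of every known name

    def register(self, value: str) -> str:
        if value not in self._tokens:
            self._tokens[value] = f"<ENTITY_{len(self._tokens) + 1}>"
            # Longest names first, so "Acme Corporation" wins over a shorter "Acme".
            # Word boundaries assume names start and end with a letter or digit.
            names = sorted(self._tokens, key=len, reverse=True)
            self._pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
        return self._tokens[value]

    def sweep(self, text: str) -> str:
        """Replace every occurrence of every known name with its token."""
        if self._pattern is None:
            return text
        return self._pattern.sub(lambda m: self._tokens[m.group(0)], text)


registry = EntityRegistry()
registry.register("Acme Corporation")  # e.g. first seen in a structured field
registry.register("Priya Shah")
out = registry.sweep("Priya Shah said Acme Corporation support was slow; Priya Shah wants a call.")
print(out)  # <ENTITY_2> said <ENTITY_1> support was slow; <ENTITY_2> wants a call.
```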

And all of it funnels through a single, mandatory checkpoint. Every piece of data the assistant receives passes through one sanitisation layer with no path around it — no tool, no shortcut, no special case. Consistency matters as much as coverage here: because the tokens are stable, the same person is the same token in every place they appear, which is exactly what lets the assistant reason about them without ever knowing who they are.
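One way to make a checkpoint unavoidable is to route every tool through a single gateway that sanitises results on the way out. This is a hypothetical sketch: `ToolGateway`, the tool names and the tiny `sanitise` stand-in are illustrations only. Python attributes are private only by convention, so in a real system the guarantee would rest on the assistant being given no other handle on the data.

```python
from typing import Callable

# Hypothetical stand-in for the full sanitisation layer.
KNOWN = {"Jane Doe": "<PERSON_1>", "Acme Corporation": "<CLIENT_1>"}


def sanitise(text: str) -> str:
    for value, token in KNOWN.items():
        text = text.replace(value, token)
    return text


class ToolGateway:
    """The assistant's only route to data; it exposes no method returning raw results."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs) -> str:
        raw = self._tools[name](**kwargs)
        return sanitise(raw)  # the single checkpoint, applied to every tool alike


gateway = ToolGateway()
gateway.register("get_response", lambda rid: "Jane Doe: Acme Corporation was late again.")
gateway.register("get_project", lambda pid: "Project for Acme Corporation")
print(gateway.call("get_response", rid=7))  # <PERSON_1>: <CLIENT_1> was late again.
print(gateway.call("get_project", pid=1))  # Project for <CLIENT_1>
```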

The assistant still does great work — that’s the hard part

It would be easy to protect data by crippling the tool. The real engineering is keeping the assistant genuinely capable while it works blind. That is why Reactor translates in both directions: tokens going out to the model, real values coming back in — inside your environment — when the assistant needs to act on a conclusion, open the right project, or filter the right responses. The assistant reasons in token-space and operates in real-space, and the boundary between the two is the encrypted map that never travels.
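The split between reasoning in tokens and acting on real values can be sketched as follows. The assistant proposes a tool call whose arguments are tokens, and a local `resolve` step swaps them for real values before the tool runs against real records. `LOCAL_MAP`, `RECORDS`, `TOOLS`, `filter_responses` and the token names are all hypothetical.

```python
# Hypothetical local map; it stays on this side of the boundary.
LOCAL_MAP = {"<PROJECT_1>": "Q3 Brand Tracker", "<PERSON_2>": "Jane Doe"}

RECORDS = [
    {"project": "Q3 Brand Tracker", "respondent": "Jane Doe", "score": 3},
    {"project": "Q3 Brand Tracker", "respondent": "Sam Lee", "score": 9},
    {"project": "Pricing Study", "respondent": "Jane Doe", "score": 5},
]


def resolve(args: dict) -> dict:
    """Replace any token in the proposed arguments with its real value, locally."""
    return {k: LOCAL_MAP.get(v, v) if isinstance(v, str) else v for k, v in args.items()}


def filter_responses(project: str, respondent: str) -> list[dict]:
    return [r for r in RECORDS if r["project"] == project and r["respondent"] == respondent]


TOOLS = {"filter_responses": filter_responses}

# What the assistant proposes, expressed purely in tokens.
proposed = {"tool": "filter_responses", "args": {"project": "<PROJECT_1>", "respondent": "<PERSON_2>"}}

real_args = resolve(proposed["args"])  # token-space -> real-space, inside your environment
matches = TOOLS[proposed["tool"]](**real_args)
# In a full pipeline, anything returned to the model would pass through sanitisation again.
print(len(matches), matches[0]["score"])  # 1 3
```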

This two-way translation, the layered detection, the encrypted local registry, the sanitise-before-store discipline, the single unavoidable checkpoint — none of it is a setting you toggle on. It is deliberate architecture, built so that an assistant can do first-rate analytical work on data it is never allowed to see in the clear. That careful treatment is the price of letting AI loose on confidential research without giving the confidentiality away, and we paid it on purpose.

Why this is harder than it looks — and why we did it anyway

We will keep being precise about what this does and does not claim, because that is how a security team is supposed to evaluate a vendor — on the merits, not on the marketing. Tokenisation reduces identifiability; it does not make data magically non-personal, and under the GDPR and UK GDPR pseudonymised data is still personal data, handled accordingly.¹ Automated detection is layered and thorough, but detection is probabilistic, not infallible — which is precisely why we anchor the guarantee on the key that never leaves rather than on a claim that every string is always caught.²

The easy path was to send the raw data to a hosted model and write a reassuring sentence about it. We took the harder path because the easy one quietly spends your respondents’ confidentiality and your clients’ trust to save engineering effort. For a research platform, that is not a trade worth making.

The bottom line

You get the speed and the depth of AI-assisted analysis. Your respondents keep the anonymity they were promised. Your clients keep the discretion they are paying for. With Reactor those are not competing goals you balance against each other — the whole design exists so you do not have to choose.

That is the Science of Choice applied to the part of modern research you have to trust most: the moment an intelligent system reads your data. Confidentiality here is not a feature we switched on. It is the shape of how the thing was built.

What you’re protecting	Typical hosted-model AI analysis	Reactor
Identities seen by the AI provider	Sent up with the data	Tokenised before the model sees anything
The re-identification key	Travels with the data / held by the vendor	Encrypted, stays in your environment, never transmitted
Data at rest for analysis	Often a raw working copy	Sanitised before it is stored — no raw shadow copy
Free text & verbatims	Left to the model to handle	Multi-layer detection plus a growing registry swept across all text
Can the AI still do the work	Yes — on raw data	Yes — on tokens, with in-environment translation to act
How you verify it	A vendor policy taken on faith	A registry inside your environment, behind one enforced checkpoint

What buyers increasingly weigh when an AI feature touches confidential research data — and how the common “send it to a hosted model” approach compares with confidentiality-by-architecture. Comparisons describe general architectural characteristics, not any named product.³

¹Tokenisation/pseudonymisation reduces identifiability but does not render data anonymous; under the GDPR and UK GDPR, pseudonymised data remains personal data and is processed accordingly. Statements describe the system as designed and engineered to operate under normal conditions and are not warranties that the product is unbreachable, error-free, or that all confidential information is detected in every case. Any product warranties are set out solely in the applicable written agreement.

²Detection of confidential values combines structured-field rules, pattern matching, and a continuously updated registry of known entities, applied across structured data and free text. Detection is probabilistic and layered; no automated method identifies every possible identifier in free text with complete accuracy. The property that the token-to-identity mapping is not transmitted to third-party model providers is a characteristic of the system architecture and is independent of detection coverage.

³Comparative statements refer to general architectural characteristics of AI analysis tools that send data to third-party hosted models and do not assert that any specifically named product is insecure, mishandles personal data, or transmits data unlawfully. Each platform’s data-handling practices are governed by that platform’s own documentation and policies. CloudArmy and Reactor are marks of CloudArmy; any third-party names are used nominatively for identification only, and no affiliation, sponsorship, or endorsement is implied.