
blog · 2026-05-17

After the Cluely data exposure — what candidates should actually be using

Cluely's 2025 transcript-exposure incident underlined what every privacy-aware candidate already suspected — interview transcripts are sensitive, and most copilots treat them as marketing telemetry. Here's what we built differently, and what to look for in any tool you trust with a live interview.

What happened, briefly

In mid-2025, Cluely's transcript-storage backend was found to be returning user interview transcripts without sufficient authentication. The flaw was discovered and disclosed by an external researcher; Cluely patched it within ~24 hours and notified affected accounts. The incident has been covered in full by TechCrunch and others; this post is not about re-litigating the response.

It's about what the incident illuminated, which is broader than one company.

The category default is "log everything for ML training"

Every interview copilot has the same temptation: the live transcripts and the LLM responses are gold for training the next-generation model. So the default architecture, across the category, is:

  1. Audio captured on the candidate's device
  2. Streamed to the vendor's speech-to-text (STT) service
  3. Transcript + LLM response logged to the vendor's database
  4. Retained indefinitely, available for ML training (sometimes with an opt-out, often enabled by default)
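The steps above reduce to a very small amount of code. This is an illustrative sketch (no vendor's actual source) of the category-default backend: the relay doubles as a logger, so every question and answer lands in a vendor-side store with no retention window.

```typescript
// Illustrative sketch of the category-default "log everything" relay.
// Names are hypothetical; the LLM call is a synchronous stand-in.
type LoggedTurn = { sessionId: string; question: string; answer: string; at: number };

const vendorStore: LoggedTurn[] = []; // stand-in for a real database table

function handleTurn(
  sessionId: string,
  question: string,
  callLlm: (q: string) => string, // stand-in for the provider request
): string {
  const answer = callLlm(question);
  // Steps 3-4 of the list above: persist indefinitely, available for training.
  vendorStore.push({ sessionId, question, answer, at: Date.now() });
  return answer;
}
```

Note that nothing about relaying a question *requires* the `push` — it exists purely because the retained data is valuable to the vendor.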

This is the architecture Cluely had, and Cluely is not unusual. Final Round AI's TOS retains transcripts for 24 months. Sensei AI doesn't publish a retention period. LockedIn AI's privacy policy reserves the right to use transcripts for "service improvement." Parakeet AI's TOS allows storage of "session data" with no defined window.

The problem isn't that vendors retain — it's that a 60-minute interview transcript contains:

  • The candidate's full résumé context (often a verbal recap of confidential project details)
  • The interviewer's identity and company (often a named hiring manager + a specific role)
  • The candidate's worst answers (the ones they'd never share publicly)
  • Sometimes salary discussions, sometimes confidential project details from prior employers

A transcript breach is not "an email got leaked." It's "the worst moments of someone's interview, attached to their LinkedIn, indexed for future search."

What Mirly does differently

Three architectural choices, all verifiable:

1. Transcripts are never stored on our servers

The transcript exists in three places, all on the candidate's device:

  • The Deepgram WebSocket connection during the live session (ephemeral, closed when the session ends)
  • The renderer process's in-memory state during the session
  • An optional local SQLite file if the user enables "interview history" in settings
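Those three places can be sketched as a single piece of local state. This is a hypothetical class, not the real Mirly source; the real app persists history to SQLite, while this sketch writes a JSON file to stay dependency-free.

```typescript
import { writeFileSync } from "node:fs";

// In-memory transcript state for the live session (hypothetical names).
class SessionTranscript {
  private lines: string[] = [];

  append(line: string): void {
    this.lines.push(line); // renderer-process memory only
  }

  all(): string[] {
    return [...this.lines];
  }

  // Only touches disk if the user enabled "interview history" in settings.
  // The real app uses a local SQLite file; JSON keeps the sketch simple.
  persist(path: string, historyEnabled: boolean): void {
    if (!historyEnabled) return; // default path: nothing written anywhere
    writeFileSync(path, JSON.stringify(this.lines));
  }
}
```

The key property: there is no network call anywhere in this class. The only way a transcript leaves memory is the explicit, user-enabled write to the user's own disk.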

The Mirly backend (api.mirly.co.uk) sees the question text for ~200ms while relaying it to the LLM provider, and never persists it. The schema in our open-sourced webhook code doesn't even include a transcripts table.

This isn't a privacy promise. It's a schema-level guarantee — there is no place to put a transcript even if we wanted to, because the database tables don't exist.
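To make the schema-level point concrete, here is a hypothetical schema in the same spirit (table names are illustrative, not the actual open-sourced schema): billing and account tables exist, and no table has a column that could hold transcript text.

```typescript
// Hypothetical backend schema: account and billing metadata only.
const schema: Record<string, string[]> = {
  users: ["id", "email", "created_at"],
  subscriptions: ["id", "user_id", "plan", "status"],
  webhook_events: ["id", "event_type", "received_at"], // metadata only
  // note what's absent: no "transcripts" table at all
};

// The "schema-level guarantee" as a check: there is nowhere to INSERT.
function canStoreTranscripts(tables: Record<string, string[]>): boolean {
  return "transcripts" in tables;
}
```

A policy can be quietly amended; a missing table cannot quietly start accepting rows.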

2. The LLM provider is a configurable choice

You can use our Anthropic key, our Gemini key, or your own BYOK key. With BYOK, the question text doesn't pass through our backend at all — it goes directly from the desktop to your chosen LLM provider, authenticated with your key. We never see it.

For users who choose our hosted key, we use Anthropic's zero-data-retention option where available — Anthropic agrees not to retain requests beyond serving the response.
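The routing decision can be sketched in a few lines (endpoint paths here are illustrative, not the actual API surface): the hosted modes relay via the Mirly backend, while BYOK goes straight to whatever provider endpoint the user configured, with the user's own key.

```typescript
// Hypothetical provider-routing sketch.
type ProviderChoice =
  | { mode: "hosted-anthropic" }
  | { mode: "hosted-gemini" }
  | { mode: "byok"; endpoint: string; apiKey: string };

function requestTarget(choice: ProviderChoice): { url: string; viaBackend: boolean } {
  if (choice.mode === "byok") {
    // Desktop -> provider directly; the backend never sees the question.
    return { url: choice.endpoint, viaBackend: false };
  }
  // Hosted keys: the backend relays (using zero-retention options where
  // the provider offers them).
  return { url: "https://api.mirly.co.uk/relay", viaBackend: true };
}
```

With BYOK, the trust boundary shrinks to two parties: you and the LLM provider you already chose.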

3. The desktop app's settings have a "wipe everything" toggle

One click in Settings → Privacy → "Clear local interview history" deletes the SQLite file. We don't sync it anywhere. Once you quit, your machine has no record of the interview ever happening.
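Because the history is a single local file, the wipe is correspondingly small. A minimal sketch, with a hypothetical function name and path: deleting the local file is the entire operation, since there is no server-side copy to scrub afterwards.

```typescript
import { rmSync, existsSync } from "node:fs";

// "Clear local interview history" sketch (hypothetical name/path).
function clearLocalHistory(historyPath: string): boolean {
  if (!existsSync(historyPath)) return false; // nothing to delete
  rmSync(historyPath); // local file gone = history gone, everywhere
  return true;
}
```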

What to look for in any copilot

If you're evaluating any interview tool (not just Mirly), here are three questions worth asking:

  1. Where is the transcript stored? "Our servers" is a yellow flag; "your device" is a green flag. Ask which table it lives in.
  2. How long is it retained? Anything longer than "the active session" requires a justification you should be able to read.
  3. Is the LLM provider configurable? If you can't bring your own key, the vendor's relationship with the LLM company is part of your attack surface.

These are not exotic questions. They're the same questions enterprise IT teams ask before approving any SaaS tool. The fact that they aren't asked of interview copilots reflects how new the category is — and how much the marketing has outpaced the security discussion.

The honest trade-off

There's a real trade-off in this design. Vendors that store transcripts can:

  • Train better personalization models from real session data
  • Show you a transcript history across devices
  • Offer "session replay" features

We can't do the first. The second works if you stay on one device (Mirly is desktop-only by design — the local-only history is the constraint that makes the privacy story possible). The third we'd refuse on principle.

What you get in return: an audit answer that fits in one sentence. "Mirly does not store interview transcripts on its servers." No qualifier, no retention-period clause, no "except for fraud prevention" carve-out. The schema doesn't have the table.

Related reading