Transcribe Research Interviews in ChatGPT (Privacy-First, Free to Start)
For UX researchers, qualitative academics, and any team running participant interviews: this is the ChatGPT workflow that respects your data. OpenAI training is off on the Whipscribe GPT, files are processed by Whipscribe (not OpenAI), default retention is 7 days, and speaker labels plus timestamps come standard. One project, one Knowledge folder, real coding work in the chat.
What this workflow is for
Qualitative research workflows that involve recorded interviews — UX research, dissertation fieldwork, ethnography, employee research, customer-discovery cycles. The common shape: 30-60 minutes of audio per session, 5-30 sessions per study, transcripts that need speaker labels and timestamps so you can verify quotes against the audio while you code.
The specific job ChatGPT is good at, with a real diarized transcript in front of it: theme extraction. Reading 12 hours of transcripts to find the patterns is the part of qualitative research most people quietly hate. ChatGPT can do a competent first-pass coding sweep across a whole study folder in minutes, and you spend your time on the second-pass refinement instead of the first-pass slog.
Setup — 90 seconds, one time
The full setup walkthrough is in the Custom GPT vs MCP Connector guide. The short version for researchers:
- Open the Whipscribe Custom GPT in ChatGPT (works on the free plan).
- The first time you ask for a transcript, sign in to Whipscribe with the email you'll use long-term — ideally an institutional address if you have one.
- Drop the .m4a (or .wav, .mp3, .mp4) into the message box.
If you're on ChatGPT Plus or Pro and want Whipscribe available in every conversation, add it as an MCP Connector at https://whipscribe.com/mcp in Settings → Connectors.
Record once, transcribe right
The transcript quality is set by the recording quality more than by the model. A few field-tested practices that make a noticeable difference:
- Single mixed track. If your platform offers it (Zoom, Riverside, Cleanfeed), use the single-file export. Two-track recordings need a post-session merge, and most tools won't accept them natively (a minimal merge sketch follows this list).
- Mid-distance mic. A phone or omnidirectional recorder at arm's length between speakers beats a distant boundary mic. You're optimizing for transcript accuracy, not radio-quality audio.
- Quiet room. Background noise is the single biggest accuracy killer. A 30-second test recording before the session catches the AC unit you didn't notice.
- Verbal consent on the tape. Have the participant give consent on the recording itself ("yes, you can record and transcribe this conversation for the [study name] research"). The audio is the legal record.
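If your platform only gave you separate per-speaker tracks, the merge is a one-time script. Here is a minimal sketch using pydub, which shells out to ffmpeg, so ffmpeg must be installed; the file names are illustrative placeholders, not anything the workflow requires:

```python
# Merge a two-track interview recording into the single mixed file
# most transcription tools expect. pydub shells out to ffmpeg, so
# ffmpeg must be on your PATH. File names are illustrative placeholders.
from pydub import AudioSegment

researcher = AudioSegment.from_file("researcher_track.wav")
participant = AudioSegment.from_file("participant_track.wav")

# Pad the shorter track with silence so both cover the full session.
length = max(len(researcher), len(participant))  # pydub lengths are in ms
researcher += AudioSegment.silent(duration=length - len(researcher))
participant += AudioSegment.silent(duration=length - len(participant))

# Overlay the two tracks into one mixed track and export losslessly.
mixed = researcher.overlay(participant)
mixed.export("session_mixed.wav", format="wav")
```

Exporting as .wav keeps the merge lossless, and .wav is one of the formats the upload step accepts directly.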
The transcribe-and-code prompt
Once the file is uploaded, the prompt that produces a useful first-pass coding sweep:
Transcribe this research interview with speaker labels and word-level
timestamps. Speaker 1 is the researcher; Speaker 2 is the participant.
Then run a first-pass thematic coding sweep:
1. THEMES — identify 4–8 themes that emerge from the participant's
responses. For each theme:
- 1-sentence description
- 2 representative quotes from the participant, each with timestamp
- Frequency count: how many distinct turns mention the theme
2. SURPRISES — moments where the participant said something I'd want
to follow up on in a next interview. Quote with timestamp.
3. CONTRADICTIONS — turns where the participant contradicts themselves
or hedges in a noticeable way. Quote with timestamp.
4. RESEARCHER QUESTIONS — list every question I asked, in order.
Useful for protocol audit.
Save the transcript to my Knowledge folder named "[study name]" so I
can run cross-interview queries later. If the folder doesn't exist,
create it.
Tag the transcript with the participant ID I'll provide in my next
message.
The pattern researchers most often miss when first using ChatGPT for this: asking for the protocol audit (item 4). After the first three interviews, scanning your own asked-questions list across sessions surfaces leading questions or skipped topics in a way that reading transcripts doesn't.
First 30 minutes of audio per day are free. No card required.
What the themes output looks like
Sample output on a 45-minute participant interview about onboarding to a SaaS product. Names and quotes are illustrative:
Theme · Setup friction in the first session
Participants describe the initial setup as the moment they considered abandoning the product. Account creation and first integration are the named pinch points.
"I almost gave up trying to connect the Slack integration. Three different screens, no clear path, and I wasn't sure if it had even worked."
Participant · 00:14:22 · 4 turns mention this theme

Theme · Trust through demos vs trust through quickstarts
Two participants explicitly preferred a guided demo with a sales rep over a self-serve quickstart, despite being technical buyers. Trust-building is named.
"I just don't believe a quickstart that says 'in 5 minutes.' I'd rather watch a real person walk through it once."
Participant · 00:28:50 · 3 turns mention this theme

Theme · Pricing-page anchoring
The pricing page is the second-highest-mentioned surface after the dashboard. Anchoring against a competitor's pricing came up explicitly.
"I had your tab open next to the [competitor] tab. The fact that you don't show a per-seat number was actually a turn-off."
Participant · 00:36:41 · 5 turns mention this theme

Theme · Mobile expectations for B2B
Mobile parity expectations are higher than the team's stack assumes. Two participants wanted to triage notifications on phone.
"I don't need the full app on mobile, but I should be able to clear my inbox while walking to the next meeting."
Participant · 00:41:18 · 3 turns mention this theme

The format ChatGPT returns in the chat is essentially a coding worksheet — themes you can carry into your second-pass deep coding, with quotes pre-attributed to timestamps that link back to the audio in your Whipscribe library.
Cross-interview queries — the payoff of the Knowledge folder
The compounding value of saving each transcript to the same Knowledge folder shows up around interview 5. Once enough sessions are in the folder, you can ask cross-interview questions inside the chat:
Cross-interview query
From all transcripts in my "Onboarding study" folder, list every
distinct mention of the Slack integration. For each mention,
include the participant ID, the verbatim turn, and the timestamp.
Group by sentiment: positive, negative, neutral.
The response cites the participant turns directly with timestamps so you can verify each one against the original audio. This is the work that takes a researcher 3-4 hours of re-listening across a 12-interview study and that the GPT can produce in under a minute. You still verify, you still write the analysis — the brute-force searching is the part that disappears.
The privacy boundary, in plainer words
For research with human participants, "where does the audio live?" is a real question. Here's the boundary, end-to-end:
- Recording stays on your device until you upload it.
- Upload happens through ChatGPT, which hands the file to Whipscribe over your authorized OAuth connection.
- Transcription runs on Whipscribe infrastructure — Whisper-family models on GPUs Whipscribe operates. The audio doesn't pass through OpenAI for transcription.
- The text comes back to ChatGPT for the reasoning steps (theme coding, summarization, etc.). At this point ChatGPT's standard data handling applies — the Whipscribe Custom GPT is configured to disable training on user-uploaded content.
- The transcript file is saved in your Whipscribe library (visible at whipscribe.com/home) under your account only. Sharing requires explicit folder-share.
- Default retention on raw audio is 7 days. The text transcript and any Knowledge-folder items persist until you delete them.
For interviews that need a stricter privacy boundary
Some research has a privacy bar above any hosted tool: medical interviews under HIPAA, legal depositions, anything where the recording itself can never leave your machine. For those:
- Run an offline Whisper pipeline locally — faster-whisper plus pyannote for diarization. The audio never leaves your laptop. The tradeoff is setup time (30-90 minutes the first time) and slower transcription (3-5 min per hour of audio on a typical laptop). A minimal sketch follows this list.
- Then use ChatGPT only for the analysis pass on the resulting text transcript — text-only is a much smaller privacy surface than audio.
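For the curious, here is roughly what that local pipeline looks like. This is a sketch under stated assumptions — faster-whisper and pyannote.audio installed via pip, a Hugging Face token with access to the pyannote/speaker-diarization-3.1 model, and illustrative file and model names — not the Whipscribe pipeline itself, just the same family of components run on your own machine:

```python
# Offline transcription + diarization sketch. Assumes:
#   pip install faster-whisper pyannote.audio
#   a Hugging Face token with access to pyannote/speaker-diarization-3.1
# File name, model size, and the token placeholder are illustrative.
from faster_whisper import WhisperModel
from pyannote.audio import Pipeline

AUDIO = "interview.wav"

# 1. Transcribe on-device with word-level timestamps.
model = WhisperModel("medium", device="cpu", compute_type="int8")
segments, _info = model.transcribe(AUDIO, word_timestamps=True)
segments = list(segments)  # the transcriber returns a lazy generator

# 2. Diarize: who spoke when.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
)
turns = [
    (turn.start, turn.end, speaker)
    for turn, _, speaker in pipeline(AUDIO).itertracks(yield_label=True)
]

def speaker_at(t: float) -> str:
    """Return the diarization label for the turn containing time t."""
    for start, end, speaker in turns:
        if start <= t <= end:
            return speaker
    return "UNKNOWN"

# 3. Merge: tag each transcript segment with the speaker at its midpoint.
for seg in segments:
    mid = (seg.start + seg.end) / 2
    stamp = f"{int(seg.start // 60):02d}:{seg.start % 60:05.2f}"
    print(f"[{stamp}] {speaker_at(mid)}: {seg.text.strip()}")
```

The midpoint heuristic in step 3 is the simplest way to join the two outputs; it occasionally mislabels a segment that straddles a speaker change, which is one reason the local route costs more tuning time than the hosted one.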
Our journalist interview workflow post covers the offline-Whisper option in more depth; the same setup applies to research with stricter privacy requirements.
What ChatGPT alone can't do (and where it fits)
Two things ChatGPT plus Whipscribe is good at: theme extraction at speed and cross-interview pattern queries against a Knowledge folder. Two things it isn't a substitute for:
- Methodology choice and rigor. Whether grounded theory, IPA, framework analysis, or thematic analysis is the right approach — and whether your codebook is defensible — is your judgment. The tool helps execute faster; it doesn't pick the method.
- Reflexivity and bias awareness. An LLM coding pass is faster than yours but less self-aware. Always do a second-pass review where you challenge the codes the LLM produced — particularly in studies where participant identity matters to interpretation.
What this saves you, in honest hours
For a typical 12-interview UX or qualitative study with 45-minute sessions:
- Transcription: machine instead of manual or paid human service — saves the 8-12 hours you'd spend re-listening or the $300-500 you'd pay a human service.
- First-pass coding: a sweep across all 12 interviews via the cross-folder query — saves 4-6 hours of brute-force highlighting and tagging. You still do the second-pass refinement, which is the part where the actual research insight happens.
- Cross-interview triangulation: the "where was X mentioned across the study?" lookup — saves 2-4 hours per question across the analysis phase.
Honest take: this doesn't make qualitative research a one-day job. It removes the slog from the parts that are slog, and gives you back time for the parts that are actually thinking.
Frequently asked
- Can I use this for IRB-approved research? The privacy posture above is what you'd present to an IRB. For studies with stricter requirements (HIPAA, certain EU GDPR Article 9 categories), use the offline-Whisper option for the transcription itself and use ChatGPT only on the resulting text.
- What languages does diarization work for? Whisper-family models support ~99 languages with auto-detection. Diarization is language-agnostic — it operates on voice characteristics, not language.
- Can I export the coded themes for NVivo or Atlas.ti? Ask the GPT to format the themes as a CSV with columns for theme name, participant ID, quote, and timestamp. Drop the CSV into your QDA tool of choice for further analysis (a quick validation sketch follows this list).
- What if a participant withdraws consent after the interview? Delete both the audio and the transcript from your Whipscribe library immediately, and remove any Knowledge folder reference. Confirm the deletion in the library trash.
- Is there a paid plan I need? Free tier covers 30 minutes a day, which is enough to transcribe one interview and run the coding pass. Pay-as-you-go is $1 per hour of audio at whipscribe.com/credits; credits never expire.
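If you take the CSV-export route from the FAQ above, a quick validation pass before QDA import catches rows the GPT left incomplete. A minimal sketch; the column names match the ones suggested in the FAQ item and the file name is a placeholder, so adjust both to whatever the GPT actually produced:

```python
# Quick sanity check on the GPT-exported themes CSV before QDA import.
# Column names follow the FAQ suggestion above; adjust if yours differ.
import csv

EXPECTED = {"theme", "participant_id", "quote", "timestamp"}

with open("themes.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

assert rows and EXPECTED <= set(rows[0]), "unexpected or missing columns"

# Flag incomplete rows so you can re-ask the GPT for those quotes.
for line_no, row in enumerate(rows, start=2):  # header is line 1
    if not row["quote"].strip() or not row["timestamp"].strip():
        print(f"line {line_no}: incomplete row for theme '{row['theme']}'")

print(f"{len(rows)} coded quotes across {len({r['theme'] for r in rows})} themes")
```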
Run the workflow on your next interview
Open the Whipscribe Custom GPT, drop the .m4a, paste the transcribe-and-code prompt above. Save the transcript to a per-study Knowledge folder; from interview 2 onward you can run cross-interview queries against the whole study.