Aiko vs Whipscribe in 2026 — free local Whisper on Apple devices vs hosted full pipeline
Aiko is the free Mac and iOS Whisper app from Sindre Sorhus. Drag a file in, get a transcript out — no cloud, no account, no telemetry, no paid tier. Whipscribe is a hosted pipeline that runs the same Whisper model family on server GPUs and adds speaker diarization, URL ingestion, and exports. The decision between them is privacy-versus-batch-throughput. Below: when each one wins, the honest cost math, the worked example for 30 hours of audio per month, and the cases where the answer is genuinely Aiko, not us.
The decision in one paragraph
If your audio cannot leave your device (legal, medical, internal HR, anything under a no-cloud policy), Aiko is the right tool. It is free, runs Whisper entirely on the Mac or iPhone, and sends nothing to a server; the developer collects no telemetry. If your audio can leave your device and you have more than an hour or two of it per week, the math turns against running Whisper on your own laptop. Whipscribe runs the same model family on a server GPU, returns the transcript in minutes, adds speaker labels by default, accepts URLs, and starts at $0 (30 minutes a day free) before any plan kicks in. Pricing checked May 2026.
What Aiko actually is
Aiko is a single-purpose app from Sindre Sorhus, the Mac developer behind a long catalog of polished free utilities. It ships on the Mac App Store and the iOS App Store, free, with no in-app purchases. The full feature surface is short:
- Drop an audio or video file in. Aiko transcribes it locally using a bundled Whisper model.
- Output is a flat text transcript with timestamps. Export as plain text or SRT.
- Recording works on iPhone and iPad — capture in the app, transcribe on the device, no upload.
- 99 languages, because that is what Whisper supports natively.
- No account. No cloud. No telemetry. No subscription.
What Aiko deliberately does not do is also short: no speaker diarization, no URL ingestion, no model picker (the bundled Whisper model is fixed by the app version), no batch queue, no DOCX or JSON export, no API, no MCP, no team workspace. The product does one thing and does it cleanly. That restraint is the appeal.
What Whipscribe is
Whipscribe runs the same Whisper Large family that Aiko bundles, but on dedicated server GPUs, fronted by a web app, an API, and an MCP server. The trade is the inverse of Aiko's: the audio has to reach a server, and in exchange the transcript comes back in minutes rather than in roughly real time, carries speaker labels by default (via WhisperX), and can start from a URL or a file. There is a daily 30-minute free allowance with no sign-up, then $2 per hour pay-as-you-go, $12 per month for 100 hours (Pro), or $29 per month for 500 hours (Team). Pricing checked May 2026.
The local-Whisper math (scoped to Aiko's constraint)
Aiko bundles a single Whisper model — the user does not pick a tier. Historically that has been Whisper Large with the bundled checkpoint updated by the developer over time. The wait that comes with running Large locally is the same wait every Mac front-end faces, because the bottleneck is the chip and the encoder, not the app wrapper. Numbers below are clean-audio averages on a 1-hour file, drawn from the Whisper paper plus Apple-Silicon community benchmarks (checked May 2026).
| Hardware | Tier | Wait · 1 hr audio | Aiko's reality | Verdict |
|---|---|---|---|---|
| M4 / M3 Mac (16+ GB RAM) | Best local case | ~40–55 min | Smooth. Fan stays quiet on shorter files. Battery hit is real but tolerable. | Right hardware for Aiko. Use it for low-volume sensitive audio. |
| M2 / M1 Mac (16 GB RAM advised) | Good local case | ~55–75 min | Roughly real-time. Mac is heat-bound during the run; close other heavy apps. | Workable. The wait fits inside a meeting, not inside coffee. |
| M1/M2 with 8 GB (unified memory tight) | Marginal | ~70–110 min | System swaps under memory pressure. Mac becomes hard to use during the run. | Tolerable for a single short file. Not for a backlog. |
| Intel Mac (no Neural Engine) | Painful | ~4–6 hrs | Aiko still installs, but Whisper Large on Intel is half a workday per file. | Wrong tool for the hardware. Use a hosted service instead. |
| iPhone 14 Pro or later (A16 / A17 / A18) | iOS niche | ~2–3× the audio length | Works for short voice memos. Thermal throttling kicks in past ~10 minutes of audio. | Genuine win for < 10 min memos. Not for hour-long meetings. |
| Older iPhone / iPad (< 6 GB RAM) | RAM-bound | variable / fails | The bundled Whisper Large model strains older devices; long files crash or stall. | Skip. Transcribe on Mac or use a hosted service. |
Wait times are Apple-Silicon community medians for Whisper Large; iPhone numbers vary heavily with thermal state and concurrent apps. RAM advice assumes you also want the rest of the OS responsive while transcribing.
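As a rough illustration of how the per-hour waits in the table scale to a backlog, the sketch below multiplies them by the hours of audio. The dictionary keys and the `estimate_wait_hours` name are made up for this example; the minute ranges are copied from the Mac rows above, and the linear scaling is an assumption that ignores thermal throttling and memory pressure on long runs.

```python
# Minutes of local compute per 1 hour of audio, copied from the Mac rows of the table.
# The keys are hypothetical labels for this sketch, not anything Aiko exposes.
WAIT_MINUTES_PER_AUDIO_HOUR = {
    "m4_m3_16gb": (40, 55),
    "m2_m1_16gb": (55, 75),
    "m1_m2_8gb": (70, 110),
    "intel_mac": (240, 360),
}


def estimate_wait_hours(hardware: str, audio_hours: float) -> tuple[float, float]:
    """Return (low, high) hours of local compute, assuming the wait scales linearly."""
    low, high = WAIT_MINUTES_PER_AUDIO_HOUR[hardware]
    return (low * audio_hours / 60, high * audio_hours / 60)


if __name__ == "__main__":
    # Print the estimated range for a 5-hour monthly backlog on each Mac tier.
    for hardware in WAIT_MINUTES_PER_AUDIO_HOUR:
        low, high = estimate_wait_hours(hardware, 5)
        print(f"{hardware}: {low:.1f} to {high:.1f} hours for 5 hours of audio")
```

The iPhone and iPad rows are left out because their waits depend too heavily on thermal state to scale in a straight line.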
Side-by-side feature comparison
The two tools share a model family. Almost everything else differs.
| Capability | Aiko | Whipscribe |
|---|---|---|
| Where transcription runs | On your Mac or iPhone | On server GPUs (US data centers) |
| Audio leaves the device | Never | Yes, with retention policy |
| Price | Free, no in-app purchases | Free 30 min/day · $2/hr PAYG · $12/mo Pro (100h) · $29/mo Team (500h) |
| Whisper model | Bundled Large (no picker) | Large-v3 + WhisperX diarization |
| Speaker labels | No | Yes, by default |
| URL ingestion (YouTube, podcasts, etc.) | No — file-only | Paste any media URL |
| Batch / queue | One file at a time | Multi-job parallel |
| Exports | TXT, SRT | TXT, SRT, VTT, DOCX, JSON |
| API / MCP for Claude / Cursor | No | Yes — REST API + MCP server |
| Languages | 99 | 99 |
| Platforms | macOS, iOS, iPadOS | Web, API, Chrome extension, MCP |
| Telemetry / analytics | None | Yes, standard product analytics |
| Wait for 1 hour of audio (typical) | ~60 min on Apple Silicon · ~4–6 hr on Intel | ~2–5 min |
The honest cost picture
The line on a receipt says Aiko is $0 and Whipscribe is up to $29 a month. That is the easy comparison. The harder comparison is what your laptop is doing while Aiko runs.
| Volume per month | Aiko (Apple Silicon, Whisper Large) | Whipscribe |
|---|---|---|
| Up to 30 min/day | ~15 hr of laptop compute spread across the month (30 min × 30 days at roughly real-time) | $0 (daily free allowance) |
| 5 hours of audio | ~5 hr of GPU-pinned Mac time | $10 PAYG · or free across daily allowance |
| 30 hours of audio | ~30 hr of GPU-pinned Mac time | $12/mo Pro (covers it 3× over) |
| 100 hours of audio | ~100 hr of GPU-pinned Mac time | $12/mo Pro (exact fit) |
| 500 hours of audio | impractical on a personal Mac | $29/mo Team |
"GPU-pinned Mac time" is the load-bearing phrase. While Aiko is running, your Mac's GPU is fully occupied with Whisper. The fans spin, battery drains, other heavy work — video calls, Xcode builds, a browser with 40 tabs — competes for the same memory bandwidth. The real cost of "free" isn't the dollar figure. It's the hours your laptop spends being a transcription rig instead of a laptop.
The crossover is around 5 hours of audio per month. Below that, on a recent Apple Silicon Mac, Aiko's compute cost is invisible — the wait fits inside coffee breaks and the device is otherwise idle. Above 5 hours, Mac time starts to dominate any rational accounting, and a $12-per-month plan that frees the laptop becomes the better trade.
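The dollar side of that accounting can be sketched from the prices quoted in this article. The function below is an illustration, not Whipscribe's billing logic: the `cheapest_option` name is hypothetical, it ignores the free daily allowance, and it assumes a plan is only usable when the month's audio fits under its hour cap, since overage handling is not described here.

```python
# Prices as quoted in this article (checked May 2026).
# Each entry is (name, price in dollars, monthly hour cap); a cap of None means
# the price is per hour of audio rather than per month.
OPTIONS = [
    ("PAYG", 2.0, None),
    ("Pro", 12.0, 100),
    ("Team", 29.0, 500),
]


def cheapest_option(audio_hours: float) -> tuple[str, float]:
    """Return (option name, monthly cost in dollars) for a month of audio."""
    candidates = []
    for name, price, cap in OPTIONS:
        if cap is None:
            # Pay-as-you-go: cost grows with every hour of audio.
            candidates.append((name, price * audio_hours))
        elif audio_hours <= cap:
            # Flat plan: only considered when the audio fits under the cap (assumption).
            candidates.append((name, price))
    # On a tie, min() keeps the first candidate, which is pay-as-you-go.
    return min(candidates, key=lambda candidate: candidate[1])


if __name__ == "__main__":
    for hours in (5, 30, 100, 500):
        name, cost = cheapest_option(hours)
        print(f"{hours} hr/month: {name} at ${cost:.2f}")
```

Under these assumptions, pay-as-you-go and Pro cost the same at 6 hours a month, and Pro is the cheaper option above that. This covers only the receipt; the Mac-time cost discussed above is not modeled.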
Worked example — 30 hours of meeting recordings on an M2 MacBook Air
Take a concrete scenario. You're a founder on an M2 MacBook Air with 16 GB of RAM. You record every customer call and every internal sync, ending up with about 30 hours of meeting audio per month. You want speaker labels (so you can search for who said what) and you want exports (so the transcripts go into Notion, Linear, or a CRM).
The Aiko path. Drop each meeting file in, wait roughly real-time for Whisper Large to finish — call it 30 hours of compute spread over the month, mostly during evenings or while the Mac is plugged in. The transcripts come out as flat text, no speaker labels. To split "you said" from "they said," you label by hand or copy the text into another tool. You do not get DOCX or JSON, so any downstream automation has to parse plain text. Total dollars: $0. Total Mac hours surrendered to transcription: ~30 per month. Total post-processing time to add speaker labels manually across 30 hours of meetings: probably 4–8 more hours.
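To give a sense of what that downstream parsing involves, here is a minimal sketch that turns an SRT export into JSON-ready records. It assumes the standard SubRip layout (an index line, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then the text); the `srt_to_cues` name and the output field names are invented for illustration, and the sample text is made up rather than taken from Aiko's output.

```python
import json
import re

# Matches a SubRip timing line such as "00:00:01,500 --> 00:00:04,000".
TIME_LINE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    """Convert the four captured timestamp fields into seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def srt_to_cues(srt_text: str) -> list[dict]:
    """Parse SRT text into a list of {"start", "end", "text"} records."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = TIME_LINE.search(lines[1])
        if not match:
            continue  # Skip blocks without a recognizable timing line.
        fields = match.groups()
        cues.append({
            "start": _seconds(*fields[:4]),
            "end": _seconds(*fields[4:]),
            # Join multi-line cue text into a single line.
            "text": " ".join(line.strip() for line in lines[2:]),
        })
    return cues


if __name__ == "__main__":
    sample = "1\n00:00:01,500 --> 00:00:04,000\nThanks for joining the call.\n"
    print(json.dumps(srt_to_cues(sample), indent=2))
```

This only recovers timing and text. It does not add speaker labels, which still have to be assigned by hand.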
The Whipscribe path. Drop the same file in (or paste a URL if it lives in Zoom or Otter cloud). The transcript comes back in 2–5 minutes per hour of audio, with speaker labels already separated. Export to DOCX for Notion, JSON for the CRM, SRT if you ever clip the recording for social. Cost: $12 per month for the Pro plan, which covers 100 hours and leaves headroom. Mac time surrendered: zero — the GPU work happens on a server while you keep working.
Diarization, URL ingestion, SRT/DOCX/JSON exports, MCP from Claude and Cursor — all included on every paid tier.
See pricing →
When Aiko is genuinely the right call
To be fair to a well-built free app, here are the four situations where Aiko is the right answer and Whipscribe is not.
- The audio cannot leave the device. Lawyer-client recordings, therapist session notes, internal HR conversations, source interviews under a confidentiality agreement that names the cloud as off-limits, anything covered by a no-third-party-processor policy. Aiko's no-cloud architecture is the feature, and Whipscribe is the wrong tool here.
- You're on iPhone or iPad and you want a transcription option. Aiko on iOS is one of the few real ways to run Whisper on a phone or tablet. For voice memos, short field interviews, or capture-and-transcribe on the move, the iOS app is genuinely useful. Whipscribe is a website (and an MCP server), so phone-native transcription is not where it competes.
- You're fully offline. Field journalist in low-connectivity regions, researcher on a long flight, anyone whose primary failure mode is "no internet right now." Aiko works at 30,000 feet. A hosted service does not.
- Volume is small and you're on Apple Silicon. A handful of voice memos a week on an M2 with 16 GB of RAM. The wait fits inside the time you'd spend on coffee anyway, and you don't need diarization or batch processing. Aiko is fine here.
When Whipscribe is the right call
Outside the four cases above, the math tips toward a hosted pipeline.
- You have hours of audio per week. Podcasts, journalist interviews, sales calls, founder backlogs, customer-research sessions. The wait stops being free.
- You need speaker labels. Aiko produces a flat transcript. Anything where "who said what" matters — interviews, meetings, debate-style podcasts — needs diarization. WhisperX speaker labels are on by default in Whipscribe.
- The audio lives on the web. YouTube videos, podcast feeds, Zoom or Otter cloud recordings, anything you want to transcribe by URL rather than first-download-then-drag. Whipscribe accepts URLs directly. Aiko does not.
- You want exports beyond TXT and SRT. Notion likes DOCX. Pipelines like JSON. Subtitles like VTT. Whipscribe ships all of them.
- You call transcription from Claude, Cursor, or your own code. Whipscribe has an MCP server (used inside Claude Desktop and Cursor) and a REST API. Aiko has neither — it's a Mac/iOS app, intentionally.
- You're on Intel hardware or a low-RAM device. The local-Whisper wait stops being tolerable. A hosted GPU is genuinely cheaper than your time.
Honest tradeoffs
Two things to be straight about, because the comparison is not all in our favor.
Aiko's privacy story is real, and ours is different. Aiko sends nothing anywhere. Whipscribe sends your audio to a server, processes it, stores the transcript while you have an account, and applies a retention policy. That's a fundamentally different posture. If your threat model says "no audio leaves my device," Whipscribe is the wrong choice and we will tell you so. The reverse is not symmetric — if your threat model says "no audio sits on my unencrypted laptop," neither tool is sufficient and you need disk encryption first.
Aiko is less a competitor than a tool for a different niche. The people for whom Aiko is the right tool are not, in most cases, the people for whom Whipscribe is the right tool. Aiko serves the privacy-required, low-volume, file-only segment cleanly. Whipscribe serves the high-volume, URL-friendly, integration-needing segment. The right answer to "Aiko or Whipscribe?" depends entirely on which segment you're in.
The verdict
Aiko is one of the cleanest minimalist apps shipping on the Mac App Store. Free, no-cloud, no nonsense. For the narrow audience it serves — privacy-sensitive, low-volume, willing to wait roughly real-time for Whisper to grind through a file on Apple Silicon — it is the right answer. The dollar cost is zero and the developer collects no data on you.
For everyone else — the operator clearing 30 hours of meetings a month, the journalist with a backlog of interviews, the podcaster with weekly episodes, the team that wants speaker labels and a JSON export — your laptop's time is more expensive than $12 a month. Buy the hours, finish the backlog, and your Mac goes back to being a Mac.
Frequently asked
Is Aiko really free?
Yes. Free on the Mac App Store and the iOS App Store, no in-app purchases, no subscription, no premium tier. Sindre Sorhus ships it as a public-good download. The trade-off is the cost of your own device's compute time.
Does Aiko upload my audio anywhere?
No. Aiko runs Whisper entirely on the device. Audio never leaves your Mac or iPhone. That is the core feature and the reason Aiko is the right tool for sensitive recordings — legal, medical, internal HR — that genuinely cannot be sent to a cloud service.
Which Whisper model does Aiko run?
Aiko ships with a single Whisper model bundled into the app — historically Large, with the bundled checkpoint updated by the developer over time. There is no model picker. That is part of the design: you don't have to choose between Tiny, Base, Small, Medium, and Large. The trade-off is no Turbo option and no way to swap to a smaller model when speed matters more than accuracy.
Does Aiko do speaker labels?
No. Aiko produces a flat transcript with timestamps but no speaker diarization. For interviews, podcasts, and meeting recordings where "who said what" matters, you'd label speakers manually after export. Whipscribe runs WhisperX diarization by default on every paid tier and on the free 30-minute daily allowance.
Can Aiko transcribe a YouTube URL?
No. Aiko is file-only — drag in an audio or video file from disk and it transcribes locally. There is no URL ingestion, no paste-a-link, no podcast-feed import. To transcribe content that lives on the web, you'd download the audio first with another tool, then drop it into Aiko. Whipscribe accepts a YouTube, podcast, or any media URL directly.
How long does Aiko take for an hour of audio?
On Apple Silicon (M1, M2, M3, M4) running Whisper Large, expect roughly real-time — about an hour of compute for an hour of audio. On older Intel Macs without the Apple Neural Engine, the same hour can take 4 to 6 hours. On iPhone or iPad the wait stretches further because of thermal throttling and RAM caps. Whipscribe runs the same model family on a server GPU and typically returns a transcript in 2 to 5 minutes for an hour of audio.
Can I run Whisper offline somewhere other than Aiko?
Yes — there are several Mac front-ends for local Whisper, MacWhisper being the most polished and SuperWhisper being the most "always on" for system-wide dictation. Each has a different trade-off across model picker, hotkeys, transcription versus dictation, and price. Aiko's distinguishing trait is being completely free on the App Store with zero configuration.
What's Whipscribe's free tier exactly?
30 minutes of transcription per day, every day, with no sign-up and no credit card. Diarization, URL ingestion, and exports are all included on the free allowance. Past 30 minutes a day, paid plans start at $2 per hour pay-as-you-go, $12 per month for 100 hours (Pro), or $29 per month for 500 hours (Team). Pricing checked May 2026.
Aiko stays on your Mac. Whipscribe transcribes 100 hours a month on server GPUs while your laptop stays free. Same Whisper model family, opposite trade-off — pick the one that fits your audio.
See Whipscribe pricing →