Among the new features on board the Pixel 4 and Pixel 4 XL phones is a Recorder app that can transcribe spoken audio in real time – a tool that could prove hugely useful in lectures, interviews, and more besides. It works offline too, but it’s not your only option for converting speech into text.
If you’ve bought a Pixel 4 or Pixel 4 XL, you’ll find the Recorder app preinstalled (or you can download it from the Play Store). The app can be sideloaded via an APK on other Android phones, but the real-time voice transcription won’t work – this is one of those features (like Motion Sense) that Google hopes will persuade you to buy one of its new flagship phones.
Once launched, the app has a simple but tasteful design: You hit the big red record button and recording starts, as you would expect. Recording opens the Audio tab, which shows a sound wave representation of what’s being heard, and you can switch back and forth between that and the Transcript tab, where speech is transcribed in real time.
Recorder recognises the difference between speech and music, and shows this on screen as part of the Audio tab. Transcription only works with speech, however, or at least that was the case with the songs we tried (if you want to decipher a particularly hard-to-understand set of lyrics, you’ll need to look elsewhere).
Tap the pause button at the bottom and you can give your recording a title and a location if you want to (if the Recorder app has noticed certain words being repeated, it’ll suggest these as keywords for your title). You then have the option to Resume the recording or to Save it to your phone.
Does it work? In our experience it works impressively well, but it isn’t perfect: the app doesn’t catch every word yet, though with clear speech and little background noise we’d put its accuracy in the high 90s, percentage-wise. We did notice occasional gaps in the transcription, almost as if the Pixel’s AI processing algorithms were being overwhelmed and had to take a breather.
When there’s more going on in the background (recording from talk radio, say), accuracy starts to drop, though to be fair to the Recorder app, we were testing it on UK voices and accents. The app only officially supports US English for now, with more languages expected further down the line.
Unfortunately, there’s no option yet to edit the transcription, though you can search through the text of your recordings, so it’s easy to find mentions of particular words. Even better, you can search for specific sounds, like whistling, applause, or music, and Recorder pulls up a list of matches for you.
As with just about everything Google does, machine learning is key to how this all works: Google has managed to shrink its language processing model enough for it to fit on the Pixel 4 and Pixel 4 XL, and the Pixel 4 phones use similar techniques to power Google Lens and Now Playing song recognition on the device itself.
The transcription alternatives
Pick up a Pixel 4, and the Recorder app comes free with it. As for the competition, the closest alternative to what Recorder does is Otter – developed by ex-Googlers, as it happens – which likewise uses artificial intelligence to identify spoken words, either live as they’re said or from an existing recording.
That ability to process existing recordings sets Otter apart from Google’s Recorder app, and Otter can also identify different speakers in a conversation, something Recorder hasn’t stretched to yet. Transcription search is included too. For live transcription, you need the Android or iOS app, and you get a generous 600 minutes of free transcription time per month.
Beyond that, you’ll need to pay $10 (£7.77) a month or $100 (£77.60) a year, which gets you extra features, including custom vocabulary support, integration with Dropbox, the ability to skip silences, and more. If you don’t own a Pixel 4 (or maybe even if you do), it’s well worth a look, and in our tests it was roughly on a par with the Recorder app.
You’ll find a number of other Otter-like services out there that use AI to do the transcription work, though none of the others mentioned here offers real-time transcription to regular users yet. Temi promises 5-minute turnarounds for audio uploaded to the site (or recorded via the Android or iOS apps), and you can edit the transcripts online if needed.
In our limited testing, Temi lived up to its quick-turnaround claim and impressed with its accuracy. You get one trial transcript (up to 45 minutes) for free, after which processing costs $0.10 (8p) per minute (you might prefer that pay-as-you-go flexibility to Otter’s flat rate, depending on how much transcribing you need).
Trint is another option for those looking for AI-powered speech transcription, though it only has an iOS app (nothing for Android yet), and the real-time processing component is only available to enterprise users.
It’s on the expensive side compared with Otter and Temi, though: after your 7-day free trial, you need to stump up £50 a month for up to 7 files of any length, or £60 a month for unlimited transcription. It does come stacked with features, including a comprehensive online editor.
Of course, you can still get your transcribing done by an old-fashioned human, if you want to – it’ll take longer to get back to you, but the accuracy should be better than anything powered by artificial intelligence... at least for the time being.