This startup wants to clone your voice
Eleven Labs can get you to say anything, in any language
In today’s edition we’ve got another chunky startup profile for you, and if you’re a paying PreSeed Now member, a bonus startup just for you at the end of the newsletter.
Synthetic voice tech is an interesting field for startups right now. You might have seen the startup that controversially makes call centre staff sound like white Americans. And more positively, there’s Respeecher, which emulated James Earl Jones’ iconic Darth Vader voice for a new Star Wars project.
There’s a lot of potential in voice cloning technology, and London-based Eleven Labs is one startup in the space well worth looking at. Scroll down to read all about them.
I won’t even mention the scary word ‘deepfake’ once (except that time), because I respect you enough as a PreSeed Now reader to separate the technology’s wider potential from its potential misuses.
Our bonus member-only startup, meanwhile, is in the A.I.-enhanced healthtech space. If you’re a member, you’ll find them at the end of the email.
PreSeed Now members get the FULL version of the profile below, with more info about the startup’s investment, business model, and competition. Plus bonus startups (there’s one in today’s edition) and the chance to get featured in a ‘Member Spotlight’.
Dubbing video content into other languages is an important business. It opens up new markets for films, TV shows, and online videos.
But there’s a reason many real fans prefer to watch the original with subtitles over a dubbed version. Listen to a dubbed anime film next to the Japanese original and you’ll understand what I mean: a lot of the original’s flavour is lost once well-meaning local actors get involved.
So what if you could clone the voices of the original actors, and make those cloned voices fluent in any language? That’s the offering London-based Eleven Labs has cooked up.
“We want to make content available in any voice and in any language… preserving the same emotions, the same intonation, the same speaker voices… in Spanish or Polish, or any other language that you speak and understand,” explains Eleven Labs co-founder Mati Staniszewski.
Dub be good to me
Eleven Labs sets itself apart from others in the space because of its ability to generate cloned audio in multiple languages from a small sample (a few minutes or even a few seconds) of the original voice.
Staniszewski says the process is similar to traditional dubbing, in which a transcript is manually translated for a voice actor to perform. Eleven Labs performs automatic translation, which can be manually tweaked if any corrections are needed. It’s just that the original audio serves as the ‘voice actor’.
If the original speaker emphasises certain words, or conveys a particular emotion in their voice, this will be carried over to the translated version.
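For the technically curious, here is a rough sketch of the workflow as Staniszewski describes it: translate each line automatically, let a human override any translation that needs fixing, and keep the original delivery attached to the line. Everything here is an assumption for illustration, not Eleven Labs’ actual code: the `Line` type, the `dub_script` function, the `emotion` label, and the toy `translate` callable are all hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Line:
    """One line of dialogue (hypothetical structure for illustration)."""
    text: str      # what is said
    emotion: str   # stand-in label for the speaker's delivery
    language: str  # language code of `text`


def dub_script(
    lines: List[Line],
    translate: Callable[[str], str],
    target_language: str,
    corrections: Optional[Dict[int, str]] = None,
) -> List[Line]:
    """Translate each line, applying manual corrections where supplied.

    The emotion label is copied across unchanged, mirroring the idea that
    the original delivery carries over to the translated version.
    """
    corrections = corrections or {}
    dubbed = []
    for index, line in enumerate(lines):
        if index in corrections:
            text = corrections[index]      # human tweak wins
        else:
            text = translate(line.text)    # automatic translation
        dubbed.append(replace(line, text=text, language=target_language))
    return dubbed
```

In a real system the `translate` callable would be a machine translation model and the output would feed a speech synthesiser using the cloned voice; the sketch only shows how the text and the delivery travel together.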
The technology works with a few seconds of source audio, but Staniszewski says around five minutes of high-quality audio provides the best results. This recording is then interpolated with data from around 30,000 voices in the startup’s data set to create a version of your voice that can say anything.
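The startup hasn’t said how that interpolation works, so treat the following as a toy illustration only. It assumes each voice can be summarised as a list of numbers (an ‘embedding’), and that a short sample is blended with the average of the wider data set, leaning more on the sample as more audio is available. The function names, the linear blend, and the use of five minutes as the point of full trust in the sample are all assumptions made for this sketch.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a collection of equal-length vectors, element by element."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def interpolate_voice(
    sample: Sequence[float],
    population: Sequence[Sequence[float]],
    sample_seconds: float,
    full_trust_seconds: float = 300.0,  # assumption: five minutes of audio
) -> List[float]:
    """Blend a speaker's embedding with the population average.

    With little audio the result leans on the population; with
    `full_trust_seconds` or more it is the sample's embedding unchanged.
    """
    weight = min(sample_seconds / full_trust_seconds, 1.0)
    prior = mean_vector(population)
    return [weight * s + (1.0 - weight) * p for s, p in zip(sample, prior)]
```

The intuition is that a few seconds of audio can’t pin down a voice on their own, so the gaps are filled with what voices in general tend to sound like.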
Here’s an Eleven Labs demo, showing how emulations of celebrities’ voices can be dubbed onto an original. You can hear a few artifacts in the emulated audio, but it’s an impressive effect, and the startup continues to refine the tech.
Of course, A.I. models can’t necessarily get everything right. It’s easy to imagine a word pronounced a certain way in one language meaning something completely different in another when spoken in that way. Staniszewski agrees there are many nuances that the A.I. will need to learn over time, but they already train the technology on audio in multiple languages to pick up on issues like this.
The Eleven Labs story begins 12 years ago, when Staniszewski met his future co-founder Piotr Dabkowski in Poland. They both moved to the UK, with Dabkowski working as a software engineer at companies like Google, while Staniszewski worked on the business side of companies including Palantir.
Staniszewski says the pair often talked of starting a company together, and collaborated on many hackathon projects over the years. Eventually they found their startup opportunity with the sudden increase in video calls in 2020.
“We decided to create a sentiment analysis of how we speak, the pronunciation that we use, how to make our pronunciation better, how to detect the emotions we use… We deployed that over a weekend,” says Staniszewski.
They later realised how they could take this tech further when Dabkowski and his girlfriend, who doesn’t speak English, wanted to watch the Polish-dubbed version of a movie. But Polish versions of movies are often voiced by a single ‘narrator’ (a ‘lektor’) who explains what is happening, rather than a cast of Polish actors performing the script.
The approach saves money, and is surprisingly quite popular in Poland. But it highlights the budgetary pressures around localising the increasing amount of TV, movie, and online video content audiences around the world might want to explore.
Staniszewski and Dabkowski explored other opportunities to use this tech, such as in countries where the local language is mandated to be used in a percentage of media content, but there might not be big budgets available for high-quality voice actors. And thus Eleven Labs was formally established in May this year to explore the opportunities.
Turning it up to Eleven
Now a team of five, Eleven Labs launched a private beta that they tested with a group of YouTubers with large subscriber bases. They’ve taken insights from this test, and are refining the tech to make it more precise and scalable. They plan a November launch for an initial version of the voice cloning feature for the public to try.
The team has also developed additional features like the ability to switch in a completely different voice while maintaining the same intonation and emotions as the original voice (as seen in the video above). They plan to make this available early next year.
The startup’s product will initially be positioned to be accessible to a wide range of online creators. But potential specialisms may emerge over time.
For example, Staniszewski says movie studios could eventually become customers. Beyond using the tech in the final releases of movies, he says he’s discussed with studios use cases such as pre-production mockups of scenes. It’s easy to imagine games companies using the tech to localise new titles, and there are various enterprise use cases the startup is exploring.
With the potential use cases for the technology being so broad, Eleven Labs appears to be deliberately keeping its options open on which market sectors it might focus on in future. But it does have a business model lined up for the first commercial stage of the product.
Business model, investment, and competition
The startup will charge a flat fee of “a few tens of dollars” to clone a voice, with free and paid monthly subscriptions for making use of the clone. I could, for example,