Investing in Murf AI

Building an AI-enabled voice-over startup

28th July 2021

Reading Time:

5 mins

Voice is one of the most powerful ways to communicate. It lets us convey what words alone cannot. The same words can evoke different feelings (and responses) in listeners depending on how they are expressed (tone, pitch, volume, etc.), and this puts voice at centre stage in many forms of communication.

“Alexa! Please write this investment memo for me!”

Led by large companies such as Amazon and Spotify, audio-tech has seen more than $1 Bn in M&A activity in the last year alone. Recent advances in voice-based technologies are already reinventing sales and customer relationship management (Gong, Chorus, etc.), social media (Clubhouse, Spotify Greenroom, Twitter Spaces, etc.), digital assistants (Siri, Alexa/Echo, Google Now, Cortana, etc.) and several other areas. With the content economy exploding, it was only a matter of time before this growing focus on voice-tech disrupted content creation as well.

Disrupting and democratizing voice overs

Voice overs are critical to producing high-quality, impactful audio-visual content for e-learning, product explainers, media and publishing (news, audiobooks, etc.), entertainment, gaming, advertising and more.

Let’s look at Nike’s viral “You Can’t Stop Us” ad. While the incredible visuals definitely capture our attention, it is the voice over by US soccer star Megan Rapinoe that ultimately drives home the powerful message of inclusivity and perseverance. A great voice over is the secret ingredient of successful audio-visual content.

Traditionally, content production companies hire professional voice artists to record these voice overs in a sound studio. Once a voice over is finalized after the requisite iterations and edits, it is synced with the on-screen visuals. The entire production process is time-consuming, expensive and cumbersome. So, could there be a better way to do these voice overs?

In October 2020, when IIT-Kharagpur batchmates Ankur Edkie, Divyanshu Pandey and Sneha Roy faced challenges in creating high-quality voice overs for product demos, they realized there was a gap in the market. They started Murf AI with a vision to change the way voice overs are done and democratize them for a larger set of creators. Murf AI is an AI-enabled SaaS tool that allows creators to generate “human-like” voice overs for videos and slideshows without hiring an actual voice artist. Creators can either type in a script or upload a voice recording made at home, which is converted into a natural-sounding (AI) audio file using their own cloned voice or a suitable fit from an existing library of 100+ aesthetic voices. The platform also has an easy-to-use editor that allows users to sync the voice over with visuals, adjust speed, add pauses and more. Sounds incredible, doesn't it? We thought so too!

Podcast Audience Growth Rate (Source: Nielsen)

Embarking on our partnership

In our meeting with Ankur, Divyanshu and Sneha earlier this year, we were both intrigued and awed by what they had set out to build at Murf AI. Over the next few days, as we dived deeper, our conviction in the team and their product only grew stronger, and we partnered with them for four key reasons:

1. The audio market is large, with clear tailwinds that point to the time being ripe for disruption. The market can be looked at from two perspectives: by technology (horizontal) and by use-case/industry (vertical). The horizontal dimension includes the text-to-speech market at $2 Bn (expected to grow to $5 Bn in the next 4-5 years), while along the vertical dimension we have the audiobook market at $1.2 Bn and the podcast market at $9 Bn, both growing at 20%+ CAGR.

2. There is a growing latent need for generating high-quality voice overs at scale in a quick and cost-effective way. Even though text-to-speech technology has been around for decades, it never found large-scale commercial use because the audio output never sounded “natural”. With recent advances in AI, generating human-like voices synthetically is now a reality.

3. The team built an early yet stellar product. Despite being at a nascent stage, the team has been able to build an affordable, feature-rich and high-quality product, garnering positive feedback from early adopters. One of their users, the CEO of a Canadian e-learning company, even offered to partner with them and recommend the product to 6,000 of his own customers!

4. We saw a very strong, passionate and complementary team in Ankur, Divyanshu and Sneha. They bring a great mix of engineering, content, and sales and marketing capabilities, which is core to building such a business.

Since inception, the team has been scaling rapidly on the back of their relentless customer obsession and a strong bias for action. With 80% of business already coming from the US and UK, they have been able to grow the ARR ~12X in a short span of time. We are proud to partner with them from Day One and look forward to the exciting journey ahead! Team