
AI at the microphone: The voice of the future?
Digital audio content is booming – more and more people are listening to podcasts, audiobooks, and audio dramas. At the same time, artificial intelligence (AI) is revolutionising how this audio content is produced and distributed. AI-powered tools can synthesise voices, narrate texts automatically, and even generate entire podcast episodes on their own. But what does AI at the microphone mean for the industry? What new opportunities are emerging – and where do the challenges lie? This article explores the most exciting trends and takes a look at the present and future of AI in the audio world.
Listen up! Audio is booming worldwide
According to the Audible Hörkompass 2024, 46% of Germans aged 18–65 listen to audio content regularly. This is almost three times as many as in 2016. This figure includes both podcasts and audiobooks, highlighting the strong growth in audio consumption.
Other studies also show that audio formats are becoming increasingly popular: around a third of the German-speaking population aged 14 and over uses podcasts at least occasionally. Usage increased significantly between 2020 and 2022 and has since stabilised at a high level. A similar trend can be observed across Europe and the USA: in many European countries, roughly 20–30% of people listen to podcasts at least occasionally, depending on the country. In the USA, podcast usage has reached record levels: in 2024, 47% of the population aged 12 and over had listened to a podcast in the past month, and 34% listened weekly. Audiobooks are also becoming more popular – 38% of US adults listened to at least one audiobook in the past year (compared to around 35% in 2019). Overall, the audio sector is a growing market, with large portions of the population consuming content regularly.
AI speech tools revolutionising audiobook production
Against the backdrop of this demand, new tools are emerging that provide audio content quickly and cost-effectively. A prominent example is ElevenLabs, an AI speech synthesis company founded in 2022. In February 2025, ElevenLabs launched its own audiobook platform called ElevenReader Publishing. Through this platform, authors and publishers can create audiobook versions of their books for free. To do so, they upload an eBook (e.g. in ePub or PDF format), select a narration voice from a range of AI voices, and the AI generates the audiobook. The finished audiobooks can then be listened to for free via the ElevenReader app. During the beta phase, ElevenLabs even pays participating US authors $1.10 for each user who listens to their AI-generated audiobook for longer than 11 minutes. The company, which already generates revenue from various business clients, plans to eventually introduce a subscription model for listeners, as well as a marketplace where authors can sell their audiobooks. The platform's revenue cut is planned to be lower than that of established platforms such as Audible or Apple Books. Initially, ElevenLabs is targeting self-publishers and small publishers in order to provide audiobooks for books that would otherwise have no audio version. This example shows the potential of AI to democratise audiobook production: audiobooks can be produced faster and more cheaply, which particularly benefits niche authors.
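Under the hood, such platforms send the manuscript, chunk by chunk, to a text-to-speech model and stitch the resulting audio together. As a rough illustration only (not ElevenLabs' actual ElevenReader publishing pipeline), the following Python sketch sends a single chapter of plain text to ElevenLabs' public text-to-speech endpoint and saves the returned audio; the API key, voice ID, model name, and file names are placeholders you would replace with your own, and the exact request format should be checked against the current API documentation.

```python
import requests

API_KEY = "your-elevenlabs-api-key"   # placeholder
VOICE_ID = "your-chosen-voice-id"     # placeholder: ID of a narration voice

# Read one chapter of the manuscript as plain text.
with open("chapter_01.txt", encoding="utf-8") as f:
    chapter_text = f.read()

# Ask the hosted text-to-speech model to narrate the chapter.
response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": chapter_text,
        "model_id": "eleven_multilingual_v2",  # assumed model name; check the docs
    },
)
response.raise_for_status()

# The endpoint returns audio bytes (MP3 by default); write them to disk.
with open("chapter_01.mp3", "wb") as f:
    f.write(response.content)
```

A full audiobook would simply repeat this call per chapter and concatenate the files; the point is that the narration step itself has become a single API call.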
A relevant competitor in this field is Apple. The tech giant started offering selected audiobooks with AI voices in its Apple Books service in late 2022 and early 2023. Without much fanfare, a range of English-language titles was published with digital voices. Apple promotes these voices as "natural-sounding" and based on real human speech. However, critics have pointed out that the AI voices do not yet match the expressiveness of human narrators, and audiences have responded with some scepticism. Nevertheless, Apple's move marks an important trend: large platforms are integrating AI narration directly into their ecosystems. Google offers a similar service to publishers: since 2022, publishers in certain countries have been able to automatically convert English or Spanish eBooks into audiobooks via Google Play Books. A variety of voices (of different ages, genders, and accents) are available. The finished audiobooks can be sold through Google Play, with publishers receiving a large share of the proceeds. During the beta phase, the costs of automatic narration are low or waived entirely. Google argues that AI narration is necessary because many books would otherwise never be turned into audiobooks, and that it allows publishers to enter the audiobook market easily and affordably.
Meanwhile, distribution platforms are opening up to AI-generated content. In February 2025, Spotify announced that it would now accept AI-created audiobooks. Authors can publish audiobook files created with ElevenLabs through Spotify's Findaway Voices service. AI-narrated titles will be clearly labelled for listeners, with a note in the description saying "This audiobook is narrated by a digital voice." Spotify had previously added support for audiobooks auto-narrated via Google Play. This opening-up of large platforms signals that AI audiobooks have become commercially viable and accessible to a wide audience. Competitors such as Amazon's Audible, on the other hand, remain cautious: as of 2024, Audible allows AI-narrated audiobooks on its platform only to a limited extent, showing that the market is still finding its footing.
AI at the microphone
AI is making inroads not only in audiobooks but also in podcasting. One example is Google's experimental project NotebookLM, an intelligent note-taking and research tool that has featured a special audio function since 2024. NotebookLM can generate an audio summary from uploaded documents or notes at the push of a button – a sort of mini-podcast that presents the content in a conversational format. This "Audio Overviews" feature, introduced by Google in September 2024, has attracted much attention because the automatically generated voices sound very natural and resemble the tone and pace of real podcast hosts. Users have enthusiastically shared snippets of these AI-generated podcasts online, often created from their own materials. For instance, a dialogue between two AI hosts, in which they are shocked to realise they are not human, went viral on Reddit.
Google emphasises that the value of the tool lies in making content audible that would otherwise not be available as audio. For example, you can upload a lengthy slide deck or a scientific article and have the AI assistant create a multi-minute “podcast” format that you can listen to in the background. The use of NotebookLM is currently free and only requires a Google account. This example demonstrates how AI can transform individual content into personalised audio experiences, whether it’s an internal company report or university lecture notes.
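Conceptually, tools of this kind chain two steps: first condense the source text into a short script, then synthesise speech from that script. The minimal Python sketch below illustrates only this general pattern; it is not Google's NotebookLM implementation, summarise() is a hypothetical stand-in for a call to a language model, and the offline pyttsx3 library stands in for a production-grade voice.

```python
import pyttsx3  # offline text-to-speech library (pip install pyttsx3)

def summarise(document_text: str) -> str:
    """Hypothetical placeholder: a real pipeline would call a large language
    model here to turn the document into a short, conversational script."""
    sentences = document_text.replace("\n", " ").split(". ")
    return ". ".join(sentences[:5])  # crude stand-in: keep the first few sentences

def document_to_audio(input_path: str, output_path: str) -> None:
    # Step 1: read the source document and condense it into a script.
    with open(input_path, encoding="utf-8") as f:
        script = summarise(f.read())
    # Step 2: synthesise speech from the script and save it as an audio file.
    engine = pyttsx3.init()
    engine.save_to_file(script, output_path)
    engine.runAndWait()

document_to_audio("lecture_notes.txt", "lecture_notes_summary.wav")
```

The interesting engineering in real products lies in the first step, where the condensed material is turned into a natural two-host dialogue rather than a flat recitation.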
Listen to the AI-generated series now
The podcast series Talking About Platforms – Platform Classics labels itself as an “AI Generated Series.” In this series of the Talking About Platforms podcast, academic papers are summarised using AI tools such as NotebookLM and published as short audio episodes. The creators use AI for both content processing (for summaries and simplified explanations of research literature) and automatic narration. To ensure transparency, each episode includes a note that the podcast was generated using artificial intelligence and draws content from academic publications.
For each episode, the AI synthesises the content of a specific research article and presents it in a narrative speaking style to make it more accessible. The episodes, typically 10 to 15 minutes long, carry the title of the respective paper and are clearly marked as AI-generated – including a disclaimer about potential discrepancies and errors introduced by the AI.
This project demonstrates how AI is reshaping science communication: complex texts are automatically processed and made available as audio. This creates new opportunities for making academic knowledge accessible to a broader audience.
The examples mentioned – Google NotebookLM and Platform Classics – show how AI enables new formats in the audio space. Individual content, which was previously text-based, can now be automatically “brought to life” as audio. This opens up opportunities for education (summaries of learning materials), businesses (automated audio newsletters or reports), or creative industries (experimenting with new forms of storytelling).
Transparency remains key: listeners must be able to clearly recognise when content or voices are AI-generated – as Platform Classics consistently implements. Clear labelling allows for an informed assessment of the origin and quality of the narrated content, strengthening trust in the format.
Perfect imitation? Opportunities and risks of synthetic voices
Speech synthesis technology has made tremendous progress in recent years. Modern AI voices now sound so realistic that they are often indistinguishable from human speakers. Services like ElevenLabs allow users to clone voices or have them speak in different languages with minimal effort. ElevenLabs supports 29 languages in total and gives users the ability to precisely adjust pitch and emphasis. Blind tests have shown that listeners sometimes have genuine difficulty telling AI voices and real human ones apart. The AI can also modulate moods – from cheerful to neutral to sad – offering listeners a dynamic experience. These technological advances mean that much spoken content could be created automatically in the future: news, navigation prompts, customer service hotlines, and even dubbing for films or TV series. This presents both an opportunity and a challenge for the media industry and voice-based professions.
However, alongside the opportunities offered by AI voices, such as scalability, personalisation, and content accessibility, there are significant risks. Highly realistic synthetic voices can be misused to deceive people or distort content. There have already been reports of fraudsters using artificial intelligence to mimic the voices of family members to create fake emergencies over the phone and steal money. According to a survey, 25% of people worldwide have either personally experienced such a voice deepfake scam call or know someone who has been a victim. The scam works frighteningly well, as a significant portion of those called do not recognise the synthetic voice as such. In addition to crime, there are concerns about misinformation: manipulated audio quotes from politicians or public figures could promote the spread of fake news. At the same time, the creative industry is worried about copyright and jobs – for example, voice actors and audiobook narrators have protested against the unauthorised use of their voices by AI. In the 2023 negotiations of actors’ and voice actors’ unions (including in Hollywood), strong protection against unauthorised voice cloning was demanded.
Protection against voice theft: How AI-generated voices are regulated
Legislators and platforms are working to respond to new developments surrounding AI-generated voices. In the EU, the AI Act is set to introduce a legal framework that includes a requirement for the labelling of deepfakes. AI-generated or manipulated content—whether images, videos, or audio—must be clearly labelled as such to prevent consumer deception. Such transparency obligations are likely to apply to synthetic voices, especially when real people are being imitated. Major platforms like Spotify are already adopting transparency by marking all AI-narrated audiobooks with a notice for listeners.
In the US, specific state-level laws are also emerging. One example is the recently passed ELVIS Act in Tennessee, which makes it illegal to imitate someone's voice without permission. For the first time, an unauthorised synthetic (AI-generated) imitation of a voice is treated as legally equivalent to misusing the person's actual voice. Violators of this law could face civil lawsuits or even criminal consequences (e.g., fines or up to one year in prison). Other US states, as well as countries like China, have also introduced regulations prohibiting the misuse of deepfake voices, for instance in political campaigns or for defamation. Additionally, authorities like the US Federal Communications Commission (FCC) have clarified that AI-based voice calls made without consent are considered illegal robocalls and are banned.
From a societal perspective, the question arises of how we will handle these new possibilities. The media landscape and society are undergoing a fundamental change: on the one hand, AI voices can make media content more diverse and accessible—for example, news portals could automatically offer articles in audio form, or historical figures could speak in documentaries with their “own” (cloned) voice. People with visual impairments or reading difficulties could also benefit from increasingly realistic text-to-speech voices. On the other hand, we will need to learn to question the information we hear more critically. Did this person really say that, or is it a deepfake? Labels like acoustic watermarks or standard disclaimers (“This recording was created by AI”) may become established to build trust. New career opportunities are emerging for artists and voice actors, such as providing voices for AI systems. At the same time, technological advancements will increase competitive pressure. In the coming years, it will become clear how responsible management of synthetic voices will develop, so that we can harness the benefits of AI while not forgetting the associated risks.
Conclusion
AI-powered audio content is no longer a thing of the future, but part of our everyday lives. Automated voices read books to us, summarise documents into podcasts, and expand the audio landscape with new formats. This development brings significant advantages: content can be created more quickly and tailored to individuals, making it accessible to more people.
However, with these new possibilities come new challenges. It must be clear which content is created by humans and which by machines. Transparency is crucial to maintaining trust. Initial regulations – from platform guidelines to laws – are addressing this issue by tackling the risk of misuse, but we are still at the beginning.
For the media, creatives, and society, this means we are witnessing a balancing act between innovation and responsibility. If successful, AI voices could enrich the audio world without undermining credibility and creativity. In any case, it is worth actively following this discourse – as the voice of AI will increasingly be speaking in our ears.
References
ARD/ZDF Online Study. (2024). Podcast usage in Germany. ARD/ZDF Media Commission. Retrieved from https://www.ard-zdf-onlinestudie.de
Audible Hörkompass. (2024). Study on audiobook and podcast consumption in Germany. Audible & Kantar. Retrieved from https://www.audible.de
China Cyberspace Administration. (2024). AI-generated content regulations and deepfake restrictions in China. Retrieved from https://www.cac.gov.cn
Edison Research. (2024). The Infinite Dial 2024: Podcast & audiobook listening trends in the U.S. Retrieved from https://www.edisonresearch.com
European Commission. (2024). The AI Act: Regulatory framework for artificial intelligence in the EU. Retrieved from https://ec.europa.eu/digital-strategy
Federal Communications Commission (FCC). (2024). AI-generated robocalls classified as illegal under U.S. law. Retrieved from https://www.fcc.gov
Publishers Weekly. (2025, February 10). ElevenLabs launches AI-powered audiobook platform to rival Audible & Spotify. Retrieved from https://www.publishersweekly.com
Spotify Newsroom. (2025). AI-narrated audiobooks now available on Spotify. Retrieved from https://newsroom.spotify.com
Talking About Platforms Podcast. (2024). Platform Classics Series: AI-generated academic podcasting. Retrieved from https://www.talkingaboutplatforms.com
TechCrunch. (2023, December 7). Apple’s AI-narrated audiobooks: The next step in digital publishing. Retrieved from https://www.techcrunch.com
TechRepublic. (2024, October 1). AI-generated voices in media: The rise of digital narration. Retrieved from https://www.techrepublic.com
Tennessee State Government. (2024). The ELVIS Act: Protecting voice rights in the era of AI cloning. Retrieved from https://www.tn.gov
Wired. (2024, September 15). Google NotebookLM launches AI-generated podcast summaries. Retrieved from https://www.wired.com
