At a time when the world has become increasingly interconnected and AI is developing rapidly, it has become easier than ever to assume another person's identity. Using deepfake video or audio, one can pass oneself off as someone else with alarming ease.
Now, a study by University College London has found that humans are able to detect audio deepfakes only about 73% of the time. The study was conducted using 50 deepfake samples generated by a text-to-speech algorithm trained on English and Mandarin datasets, played to 529 respondents.
This number improved marginally, by about 3%, when respondents were told what to look out for to recognise deepfake speech. The study is the first of its kind; its primary aim was to assess people's ability to detect deepfake speech in a language other than English.
Audio deepfakes are artificial speech generated using artificial intelligence (AI), made to resemble a real person's voice.
Why are audio deepfakes difficult to detect?
As per the study, training people does not necessarily ensure that they can reliably recognise deepfakes. While they may manage when the audio features familiar voices or recording environments, they often fail when conditions change, such as a different recording environment or a different voice on the clip. Automated detectors cannot reliably and consistently detect deepfake audio either, the researchers add.
Deepfake speech detectors can be sensitive to even the most minute changes in audio. It is therefore important to evaluate them thoroughly during development, across a range of conditions: different voices, environments with varied ambient sounds, and varying pitches and accents. This, the researchers say, helps minimise both false positives and false negatives in the results these detectors generate.
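To make this concrete, here is a minimal, illustrative sketch of what such a per-condition evaluation could look like. The detector function, file names and conditions below are hypothetical placeholders, not from the study; the point is simply that false-positive and false-negative rates are tracked separately for each recording condition, so a weakness in one voice or environment is not averaged away.

```python
from collections import defaultdict

# Hypothetical stand-in for a real deepfake-speech detector:
# returns True if it judges the clip to be a deepfake.
def detector_predicts_fake(clip_path: str) -> bool:
    # A real detector would analyse the audio; this stub just
    # illustrates the interface being evaluated.
    return "fake" in clip_path

# Illustrative evaluation set: (clip, is_actually_fake, condition).
# Conditions group clips by voice, ambient noise, pitch, accent, etc.
samples = [
    ("studio/real_01.wav", False, "studio, voice A"),
    ("studio/fake_01.wav", True,  "studio, voice A"),
    ("street/real_02.wav", False, "street noise, voice B"),
    ("street/fake_02.wav", True,  "street noise, voice B"),
]

counts = defaultdict(lambda: {"fp": 0, "fn": 0, "real": 0, "fake": 0})

for clip, is_fake, condition in samples:
    flagged = detector_predicts_fake(clip)
    c = counts[condition]
    if is_fake:
        c["fake"] += 1
        if not flagged:   # a deepfake that slipped through
            c["fn"] += 1
    else:
        c["real"] += 1
        if flagged:       # a genuine clip wrongly flagged
            c["fp"] += 1

# Report error rates per condition rather than one overall number.
for condition, c in counts.items():
    fpr = c["fp"] / c["real"] if c["real"] else 0.0
    fnr = c["fn"] / c["fake"] if c["fake"] else 0.0
    print(f"{condition}: false positives {fpr:.0%}, false negatives {fnr:.0%}")
```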
Deepfakes have become increasingly realistic over the years, and include everything from artificially generated images to videos. However, deepfake images and videos are usually easier to identify than audio, because they offer more explicit visual cues.
Does artificially generated voice technology have any benefits?
Interestingly, Apple is expected to introduce a feature called "Personal Voice" with its iOS 17 update. It is designed for people at risk of losing their speech abilities, as well as those who already find it difficult to speak. It asks users to read 150 randomised phrases aloud, which takes about 15-20 minutes. The device then learns to recreate the user's voice, letting them have any typed text read aloud in their own voice without actually having to speak.
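For readers curious how this kind of voice cloning works in general, here is a minimal sketch using the open-source Coqui TTS library and its XTTS v2 model. This is not Apple's implementation, whose internals are unpublished; it illustrates the same broad technique of conditioning a speech model on a short reference recording. The reference clip name and text below are hypothetical.

```python
# Minimal voice-cloning sketch with Coqui TTS (pip install TTS).
# NOT Apple's Personal Voice; just the same general idea: a model
# conditioned on a short voice sample reads arbitrary text in that voice.
from TTS.api import TTS

# XTTS v2 is a multilingual model that supports cloning a voice
# from a short reference recording.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# "my_voice_sample.wav" is a hypothetical reference clip of the
# speaker; a few seconds of clean speech is typically enough.
tts.tts_to_file(
    text="This sentence is read aloud in a cloned voice.",
    speaker_wav="my_voice_sample.wav",
    language="en",
    file_path="cloned_output.wav",
)
```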
While artificially generated audio has several benefits, it can also be misused by con artists to dupe people. This is precisely what happened in 2019, when an executive at a British energy company was tricked into transferring thousands of pounds to a scammer who used deepfake audio of his superior's voice to instruct him to do so.
Atreya Raghavan