
If you've ever had the urge to converse with an AI version of yourself, now you can, sort of.
On Thursday, AI startup Hume announced the launch of a new "hyperrealistic voice cloning" feature for the latest iteration of its Empathic Voice Interface (EVI) model, EVI 3, which was unveiled last month. The idea is that by uploading a short audio recording of yourself speaking (ideally between 30 and 90 seconds), the model should be able to quickly churn out an AI-generated replica of your voice, which you can then interact with verbally, just as you would with another person standing in front of you.
Also: Text-to-speech with feeling – this new AI model does everything but shed a tear
I uploaded a recording of my voice to EVI 3 and spent some time idly chatting with the model's imitation of my voice. I was hoping (perhaps naively) for an uncanny valley experience, that exceedingly rare feeling of interacting with something that seems almost entirely real yet is just off-kilter enough to leave you slightly uneasy, and was disappointed when the EVI 3 me turned out to be more like an audio cartoon version of myself.
Let me unpack that a bit.
Using EVI 3's voice cloning feature
The imitation of my voice was, in some ways, undeniably realistic. It seemed to pause intermittently while speaking in roughly the same way that I tend to, with a touch of familiar vocal fry. But the mirroring stopped there.
Hume claims in its blog post that EVI 3's new voice cloning feature can capture "aspects of the speaker's personality." That's a vague promise (probably intentionally so), but in my own trials, the model seemed to fall short in this regard. Far from feeling like a convincing simulation of my own behavioral quirks and sense of humor, the model spoke with a chipper, eager-to-please tone that would have been well suited to a radio ad for antidepressants. I like to think of myself as friendly and generally upbeat, but the AI was clearly exaggerating those particular personality traits.
Also: Fighting AI with AI, finance firms prevented $5 million in fraud – but at what cost?
Despite its generally puppy-like demeanor, the model was surprisingly staunch in its refusal to try speaking in an accent, which seemed to me like exactly the kind of playful vocal exercise it would excel at. When I asked it to give an Australian accent a whirl, it said "g'day" and "mate" once or twice in my normal voice, then immediately shied away from anything more daring. And no matter what I prompted it to talk about, it tended to find some creative and roundabout way to circle back to the topic I was discussing when I recorded my voice sample for it to use, reminiscent of an experiment from Anthropic last year in which Claude was tweaked to become obsessed with the Golden Gate Bridge.
In my second trial, for example, I had recorded myself talking about Led Zeppelin, which I'd been listening to earlier that morning. When I then asked EVI 3's voice clone of myself to explain its thoughts on the nature of dark matter, it quickly found a way to bring its response back to the subject of music, comparing the mysteriously invisible force pervading the cosmos to the intangible melody that imbues a song with meaning and power.
You can try EVI 3's new voice cloning feature for yourself here.
According to Hume's website, user data produced from interactions with the EVI API is collected and anonymized by default in order to train the company's models. You can turn this off, however, via the "Zero data retention" feature in your profile. For non-API products, including the demo linked above, the company says it "may" collect and use data to improve its models, but again, you can toggle this off if you create a personal profile.
Whispering robots
AI voices have been around for quite some time, but they have historically been rather limited in their realism; it's very obvious you're talking to a robot when you get responses from classic Siri or Alexa, for example. In contrast, a new wave of AI voice models, EVI 3 among them, has been engineered not only to speak in natural language but also, and more importantly, to mimic the subtle inflections, intonations, idiosyncrasies, and cadences that inflect real, everyday human speech.
"A big part of human communication is emphasizing the right words, pausing at the right times, using the right tone of voice," Hume CEO and chief scientist Alan Cowen told me.
As Hume wrote in a blog post on Thursday, EVI 3 "knows what words to emphasize, what makes people laugh, and how accents and other voice characteristics interact with vocabulary." According to the company, this marks a significant technical leap forward from earlier speech-generating models, "which lack a meaningful understanding of language."
Many AI experts would take umbrage at the use of terms like "understanding" in this context, since models like EVI 3 are trained merely to detect and recreate patterns gleaned from their voluminous swathes of training data, a process that arguably leaves no room for what we would recognize as true semantic comprehension.
Also: ChatGPT isn't just for chatting anymore – now it will do your work for you
EVI 3 was trained "on trillions of tokens of text and then millions of hours of speech," according to Hume's blog post. According to Cowen, this approach alone has enabled the model to speak in voices that are far more realistic than one would intuitively expect. "With voice [models], what's been most surprising is how human [they] can be just by training on lots of data," he said.
But philosophical arguments aside, the new wave of AI voice models is uncontroversially impressive. When prompted, they can explore a much vaster range of vocal expression than their predecessors. Companies like Hume and ElevenLabs claim that these new models will have practical benefits for industries like entertainment and marketing, but some experts fear they will open new doors for deception, as was illustrated just last week when an unknown individual used AI to mimic the voice of US Secretary of State Marco Rubio and then deployed the voice clone in an attempt to dupe government officials.
"I don't see any reason we would need a robot whispering," Emily M. Bender, a linguist and coauthor of The AI Con, recently told me. "Like, what's that for? Except maybe to disguise the fact that what you're listening to is synthetic?"
Revolutionary becomes routine
Yes, EVI 3's voice cloning feature, like all AI tools, has its shortcomings. But those are somewhat overshadowed by its remarkable qualities.
For one thing, we should remember that the generative AI models hitting the market today are part of the technology's infancy, and they will only continue to improve. In less than three years, we've gone from the public launch of ChatGPT to AI models that can simulate real human voices more or less convincingly, and to tools like Google's Veo 3, which can produce realistic video and synchronized audio. The breathtaking pace of generative AI advances should give us pause, to say the least.
Also: AI agents will change work and society in internet-sized ways, says AWS VP
Today, EVI 3 can simulate a rough approximation of your voice. It isn't unreasonable to expect, however, that its successor, or perhaps its grand-successor, will be able to capture your voice in a way that feels truly convincing. In such a world, one can imagine EVI or a similar voice-generating model being paired with an AI agent to, say, join Zoom meetings on your behalf. It could also, less optimistically, be a scam artist's dream come true.
Perhaps the most striking fact about my experience with EVI 3's voice cloning feature, though, is how mundane this technology already feels.
As the pace of technological innovation accelerates, so too does our capacity to instantly normalize what would have shocked earlier generations of humans into awestruck silence. OpenAI's Sam Altman made this very point in a recent blog post: according to Altman, we're approaching the Singularity, yet for the most part, it feels like business as usual.
Want more stories about AI? Sign up for Innovation, our weekly newsletter.