Foreign Press Picks | Forget Speech Recognition: Voice Cloning Is What Gets Scarier the More You Think About It
Voice cloning
Cloning voices: It is now possible to imitate people's speech patterns easily and precisely. That could bring trouble. (Machines can now imitate human speech easily and accurately, and problems may well follow.)
UTTER 160 or so French or English phrases into a phone app developed by CandyVoice, a new Parisian company, and the app's software will reassemble tiny slices of those sounds to enunciate, in a plausible simulacrum of your own dulcet (sweet-sounding) tones, whatever typed words it is subsequently fed. In effect, the app has cloned your voice. The result still sounds a little synthetic but CandyVoice's boss, Jean-Luc Crébouw, reckons advances in the firm's algorithms will render it increasingly natural. Similar software for English and four widely spoken Indian languages, developed under the name of Festvox, by Carnegie Mellon University's Language Technologies Institute, is also available. And Baidu, a Chinese internet giant, says it has software that needs only 50 sentences to simulate a person's voice.
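CandyVoice, a new Paris company, has developed a phone app: speak [study note: a simple word put to flexible use] about 160 French or English phrases into it, and the program will recombine fragments of those sounds to read out whatever text is typed in afterwards, sounding a good deal like your own voice. In effect, the app has cloned your voice. The stitched-together speech still sounds somewhat synthetic, but CandyVoice's boss, Jean-Luc Crébouw, believes that improvements to the company's algorithms will make it sound more and more natural. There is also a similar piece of software, Festvox, developed by Carnegie Mellon University's Language Technologies Institute for English and four widely spoken Indian languages. And Baidu, the Chinese internet giant, says its software needs only 50 sentences to simulate a person's voice.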
Until recently, voice cloning, or voice banking as it was then known, was a bespoke [study note: don't always reach for "DIY"] industry which served those at risk of losing the power of speech to cancer or surgery. Creating a synthetic copy of a voice was a lengthy and pricey process. It meant recording many phrases, each spoken many times, with different emotional emphases and in different contexts (statement, question, command and so forth), in order to cover all possible pronunciations. Acapela Group, a Belgian voice-banking company, charges €3,000 ($3,200) for a process that requires eight hours of recording. Other firms charge more and require a speaker to spend days in a sound studio.
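Until not long ago, voice cloning, formerly called "voice banking" [study note: remember how to say "corpus" in English?], was only a custom service for people who risked losing the ability to speak because of cancer or surgery. Back then, imitating and synthesising a voice was slow and expensive. It involved recording many short phrases, each repeated many times with different emotional emphases and in different contexts (statement, question, command and so on), so as to cover every possible pronunciation. Acapela Group, a Belgian voice-banking company, charges €3,000 ($3,200) for a process that takes eight hours of recording. Other companies charge more and require the customer to spend several days in a recording studio.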
Not any more. Software exists that can store slivers of recorded speech a mere five milliseconds long, each annotated with a precise pitch. These can be shuffled together to make new words, and tweaked individually so that they fit harmoniously into their new sonic homes. This is much cheaper than conventional voice banking, and permits novel uses to be developed. With little effort, a wife can lend her voice to her blind husband's screen-reading software. A boss can give his to workplace robots. A Facebook user can listen to a post apparently read aloud by its author. Parents often away on business can personalise their children's wirelessly connected talking toys. And so on. At least, that is the vision of Gershon Silbert, boss of VivoText, a voice-cloning firm in Tel Aviv.
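Times have changed [study note: a rendering that depends on context; it is not a word-for-word equivalent of "Not any more"]. Existing software can store slices of recorded speech only five milliseconds long, each precisely labelled with its pitch. The slices can be reordered to form new words and adjusted individually so that the new words sound natural and pleasant. This is far cheaper than traditional voice banking and opens the way to new uses. Without much effort, a wife can put her own voice into her blind husband's screen-reading software. A boss can give his voice to workplace robots. A Facebook user can listen to a post seemingly read aloud by its author. Parents who often travel for work can personalise their children's wirelessly connected talking toys. And so on. At least, that is what Gershon Silbert, boss of VivoText, a voice-cloning company in Tel Aviv, hopes for.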
More troubling, any voice—including that of a stranger—can be cloned if decent recordings are available on YouTube or elsewhere. Researchers at the University of Alabama, Birmingham, led by Nitesh Saxena, were able to use Festvox to clone voices based on only five minutes of speech retrieved online. When tested against voice-biometrics software like that used by many banks to block unauthorised access to accounts, more than 80% of the fake voices tricked the computer. Alan Black, one of Festvox's developers, reckons systems that rely on voice-ID software are now “deeply, fundamentally insecure”.
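More worrying still, any voice, including a stranger's, can be cloned as long as decent-quality recordings can be found on YouTube or elsewhere. Led by Nitesh Saxena, researchers at the University of Alabama at Birmingham used Festvox to clone voices from just five minutes of speech found online. When tested against voice-biometrics software of the kind many banks use to block unauthorised access to accounts, more than 80% of the synthetic voices fooled the computer. Alan Black, one of Festvox's developers, believes that systems relying on voice-ID software are now "deeply, fundamentally insecure" [study note: a sentence pattern worth borrowing in your own writing].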
(Excerpted from an article in The Economist Global Business Review, June 2, 2017)
(Editor: xueqi)