Voice cloning: an interview with Paul Welham, CEO, CereProc

Interview conducted by April Cashin-Garbutt, MA (Cantab)

What conditions put individuals at risk of losing their voice?

CereVoice Me was essentially designed for those who have been diagnosed with MND/ALS and other illnesses that may cause an individual to lose the power of speech. CereProc identified a gap in the market and we pooled our resources to build CereVoice Me. The goal of the service is to build a TTS voice efficiently and at a cost-effective price.

Could you please give an overview of text to speech (TTS) voices and how they have traditionally been created?

CereProc offers two types of bespoke voice build: unit selection and HTS.

Unit selection produces high-quality, natural-sounding voices. It requires at least 4 hours of good-quality data, which takes at least 40 hours of recording. Most of the voices on CereProc's demo bar are unit selection builds. HTS, on the other hand, requires only 40 minutes of good-quality data, which takes up to 3 hours of recording. An HTS voice is not as natural sounding as a unit selection voice; however, you can still quite clearly identify who the individual is.

CereProc understood that those who have been diagnosed with MND, for example, have limited time, so we took that into account when developing CereVoice Me. The user only requires a quiet room and a special headset supplied by CereProc. The recording period should not take longer than 3 hours. The user can do the recordings in one go, but those with MND can take breaks between recordings.

What were the main drawbacks of traditional voice creation methods?

With CereProc's traditional voice builds, the recording period is long (typically 20-40 hours), which may not be feasible for those who have MND/ALS. CereVoice Me is a much faster and more cost-effective method in comparison to CereProc's traditional voice builds.

Traditional voice creation methods could cost around a quarter of a million US dollars and sometimes take around 18 months to create a voice.

How does CereProc's online voice cloning tool, CereVoice Me, work?

Once the user has purchased it from CereProc's webstore, they can download the software, and CereProc will send out a special headset with a microphone. The software presents sentences that cover the phonetics required to build a TTS voice, and the user just needs to record these sentences. This typically takes 1-3 hours and can be done in the comfort of one's home.

Does CereVoice Me require specialist equipment?

Just a quiet room and the headset sent out by CereProc. An engineer will listen to the recording first to check that it sounds OK.

How long does CereVoice Me take to create a TTS voice and how much does it cost?

It should take no longer than a week for CereProc to build a TTS voice, and CereVoice Me is priced at £499.99.

CereProc has also been working with MND Scotland; those who have been referred by the charity will receive a 10% discount. We are open to working with more charities and organisations moving forwards and urge them to get in touch.

What feedback have you had on the quality of the TTS voices that have been created using CereVoice Me and how does this differ from traditional methods?

We have had positive feedback from users. Many comment on how easy the process is, that the quality is better than expected, and that it has allowed them to retain a part of their identity.

CereVoice Me has definitely exceeded our expectations. We recently built a voice for a boy in Newcastle using an HTS build. He had his friends and family contribute to the recordings, and CereProc built him his own Geordie TTS voice, which he now uses all the time. He's much more confident. He attends the National Star College and is also taking part in a BBC Three documentary filming life around the college.

I would recommend that patients look into this service as soon as they are diagnosed, so that they can use it before their voice deteriorates. If their voice has already deteriorated, we can look at getting a near match from another individual.

In what ways do you plan to improve CereVoice Me moving forwards?

As a company we want to raise more awareness of CereProc and let more people know that they can still keep a part of themselves despite having a terrible illness. We are not looking to make money out of CereVoice Me; the team has built a system with the aim of helping and improving a person's life.

CereProc is currently taking part in a European project, and part of the research and development is looking at CereVoice Me and how to improve the parametrisation of the HTS build.

What do you think the future holds for voice cloning?

The process of voice cloning could be simplified even further. In the future, we might only require 10 minutes of speech to build a TTS voice.

Where can readers find more information?

CereProc has built a Geordie TTS voice for a National Star College student, Lewis, who will be featured in the BBC Three documentary The Unbreakables this Thursday at 9pm. More information can be found here: http://www.nationalstar.org/the-unbreakables/

About Paul Welham

Paul Welham has over 30 years' experience in the IT industry, with the last 15 years specializing in speech technology. Paul joined ICL in 1979 and steadily progressed within the organization through a number of key sales and marketing roles, in UK and international positions. Paul's last role at ICL was as the UK Director of Business Development, responsible for corporate resellers and system integrators such as EDS, Cap Gemini, Computacenter, and GE Capital.

Prior to founding CereProc in 2005, Paul was Sales and Marketing Director at Telephonetics. During his time there the company expanded from 5 to 35 staff, the majority of whom reported to him. Major successes at Telephonetics in speech-based systems sales included Odeon and CineWorld with Filmline, and various large NHS Trusts with ContactPortal. Since founding CereProc, Paul has seen the company grow from 2 to 10 staff and remain profitable throughout, while developing an exciting synthetic speech product range.