Why don’t we like our own recorded voice?
Have you ever heard your own recorded voice? Did you like it? If not, don’t worry; you’re not the only one. Most of us really don’t like our recorded voices. As evidence, type the Japanese for “own voice” into the Google search box, and the top co-occurring word is “ugly.” Me? Of course I hate my recorded voice, and I always wish I sounded like Don LaFontaine, the legendary movie-trailer voice actor. True, according to Mehrabian’s law, visual information is more influential than vocal information, the ratio said to be 55 to 38. But 38 percent is hardly negligible. Don’t you think voice is unexpectedly important?
This is why, when I heard the news last year that Microsoft had developed a speech synthesizer called “VALL-E,” I thought it must be a weapon that kills people with self-hatred. Surprisingly enough, the synthesizer can clone a voice from a sample of only three seconds, read out any text in that voice, and even put emotions such as anger and sadness into it. For fear of abuse, the code does not seem to have been released to the public as of yet, but if we are not to die of self-hatred, we had better improve our voices, or at least learn what a good voice is.
The conditions for, or definition of, a good voice
As some of you may know, neuroscience has already offered an answer to what a good voice is: there is a specific frequency that sounds pleasant to the human ear, around 3000 Hz. In general, a voice containing strong components around 3000 Hz sounds good. On this point, Japanese speakers are at a disadvantage: according to the French doctor Alfred Tomatis, the dominant frequencies of spoken Japanese lie below 1500 Hz, while those of English lie above 2000 Hz.
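If you are curious what this band actually sounds like, here is a minimal sketch in Python, using only the standard library, that generates a one-second sine tone at 3000 Hz and saves it as a WAV file. The file name, sample rate, and amplitude are arbitrary choices for illustration, not anything prescribed by the research mentioned above.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (CD quality; an arbitrary choice)
FREQUENCY = 3000     # Hz, the band said to sound pleasant to the ear
DURATION = 1.0       # seconds of audio to generate

# Build one second of a 3000 Hz sine wave as 16-bit signed samples.
frames = b"".join(
    struct.pack(
        "<h",
        int(32767 * 0.5 * math.sin(2 * math.pi * FREQUENCY * t / SAMPLE_RATE)),
    )
    for t in range(int(SAMPLE_RATE * DURATION))
)

# Write the samples to a mono WAV file you can play back.
with wave.open("tone_3000hz.wav", "wb") as wav:
    wav.setnchannels(1)            # mono
    wav.setsampwidth(2)            # 2 bytes = 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(frames)
```

A pure tone is of course not a voice; the point is only to let you hear where 3000 Hz sits. The claim above concerns voices whose overtones carry noticeable energy around that band.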
Anyway, as you can imagine, it is now very easy for AI synthesizers to create a good voice, that is, a voice containing a frequency of 3000 Hz; whether people actually feel it is good, however, is another issue. That is because of the complexity of human psychology, which enriches but also constantly troubles our lives. For example, imagine the Star Wars series with C-3PO and R2-D2 talking fluently in natural, even gorgeous, human voices created by AI synthesizers. Could you take it? I don’t think so. It may be just nostalgia, but I am sure there is demand, at least in Japan, for synthetic voices that sound obviously synthetic, because the Japanese music scene has a whole genre of them, called “Vocaloid.”
Vocaloid may take the place of human singers in the future
Vocaloid is a coined word combining “vocal” and “android.” Originally, Vocaloid was the name of a singing-voice synthesizer developed by Yamaha around 2000, but now it is better known as a music genre in which virtual singers, not actual human beings, perform. History changed in 2007 with the release of the Vocaloid “Hatsune Miku.” Since then, many people have composed songs, had the virtual singer Miku perform them, and shared the music videos on YouTube and elsewhere. Many Japanese people can recognize Miku’s synthetic voice instantly and would not be happy at all if it changed.
Miku is a virtual singer, but her presence is not limited to videos. Her concert tour, “Hatsune Miku Expo,” has taken place all over the world every year since 2014. Some years ago, I went to one of her concerts at the invitation of the company that develops Miku. On stage there was a hologram of Miku, with actual humans playing guitar, bass, and drums. It was fun! Miku has now become so popular that her visual character has spread everywhere, independently of her Vocaloid functions. By the way, do you know why I have been writing about Miku so enthusiastically here? Off the record, there is a plan to collaborate with Miku. A collaboration between wooden furniture and a virtual pop star sounds interesting, doesn’t it?
Shungo Ijima
He is travelling around the world. His passion is to explain Japan to the world from the unique viewpoint he has accumulated through his career: overseas postings, an MBA, and a stint as an official of the Ministry of Finance.