The universal horror: The sound of your own voice
Have you ever heard a recording of your own voice? If your immediate reaction was “That sounds awful,” don’t worry—you are in the vast majority. In fact, if you type “own voice” into a Japanese Google search, the top suggested co-occurring word is “ugly.” Me? I desperately wish my voice sounded like Don LaFontaine, the legendary movie trailer voice actor.
While Mehrabian’s law famously suggests that visual information (55%) is more influential than vocal tone (38%), the voice still holds significant, unexpected power. This is why I felt a chill last year when I read about Microsoft’s new AI voice synthesizer, VALL-E. It can simulate a person’s voice from just a three-second sample, read any text in that simulated voice, and even inject emotions like anger or sadness.
To avoid dying of self-hatred (or AI-induced paranoia), we must either improve our voices or, at least, understand the science of what constitutes a “good voice.”
The scientific disadvantage: Japan’s 1500 Hz problem
Neuroscience has already clarified one technical definition of a pleasing sound: the frequency range around 3000 Hz. A voice that contains energy in this range generally sounds good to the human ear.
Here, Japanese speakers face a slight disadvantage. According to the research of the French doctor Alfred Tomatis, spoken English typically averages above 2000 Hz, while spoken Japanese averages below 1500 Hz. Statistically speaking, we are built to sound less acoustically “good” than English speakers.
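If you are curious where your own voice sits on this scale, here is a minimal sketch (not from Tomatis’s research) that estimates the spectral centroid of a recording, i.e. roughly where its energy is centred on the frequency axis. It assumes a WAV recording of your voice exists; the filename `my_voice.wav` is made up, and NumPy and SciPy are required.

```python
# Illustrative sketch: estimate where a voice recording's energy sits on the
# frequency axis, via the spectral centroid. Assumes "my_voice.wav" exists.
import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read("my_voice.wav")    # sample rate in Hz, raw amplitudes
samples = samples.astype(np.float64)
if samples.ndim > 1:                            # mix stereo down to mono
    samples = samples.mean(axis=1)

spectrum = np.abs(np.fft.rfft(samples))                     # magnitude spectrum
freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)         # frequency of each bin

centroid = np.sum(freqs * spectrum) / np.sum(spectrum)      # energy-weighted mean frequency
print(f"Spectral centroid: {centroid:.0f} Hz")
```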
Given this clarity, creating a voice that hits that perfect 3000 Hz mark is trivially easy for AI synthesizers. But here’s the core issue: Does a scientifically “good” voice always feel emotionally “good” to humans?
Imagine C-3PO and R2-D2 suddenly speaking in gorgeously fluent, natural human voices created by VALL-E. Could you accept it? Probably not. Human psychology complicates everything, introducing elements like nostalgia and a profound, cultural appreciation for synthetic sound.
The synthetic heart of Japan: The vocaloid paradox
This is where Japan’s unique culture of Vocaloid (a portmanteau of “vocal” and “android”) comes into play. Vocaloid started as synthesizer software, but it evolved into a music genre starring virtual singers.
History changed forever in 2007 with the release of Hatsune Miku. Miku is not just a singer; she is practically an open-source movement. People compose songs for her, share music videos online, and her synthetic voice is now instantly recognizable. Most Japanese fans can easily distinguish Miku’s unique synthetic tone and would be deeply unhappy if she suddenly sounded “human.” We actively demand a voice that sounds obviously synthetic. This is the Vocaloid Paradox: we prefer artificiality when it serves as a unique aesthetic.
Miku’s influence transcends the screen. Her “Hatsune Miku Expo” concerts tour globally. I once attended one, where a hologram Miku performed alongside a live band of human musicians. It was undeniably fun. Her character has since spread everywhere, independently of the software itself.
Now, for the official secret: I am writing about Miku so enthusiastically because we have a plan to collaborate with her. Wooden furniture and a virtual pop star—it sounds ridiculous, doesn’t it? But perhaps this collaboration is the perfect fusion of Miku’s future-forward synthetic aesthetic and our timeless, naturally crafted comfort.


Shungo Ijima
He is travelling around the world. His passion is to explain Japan to the world, from the unique viewpoint he has accumulated through his career: an overseas posting, an MBA, and time as an official of the Ministry of Finance.

