It is now common for media such as movies, games, and music to use multi-channel audio formats. Recent years have also seen new services and products adopt object-based audio, which combines a sound source with its position information. Given this trend, demand is expected to grow for spatial sound technology that reproduces an acoustic field with high precision, because it not only offers listeners a more immersive experience but also improves the working environment for sound content creators.
360 VME is a sound reproduction technology that is already in practical use for movie production at Sony Pictures Entertainment (hereinafter SPE), a Sony Group subsidiary whose businesses include motion picture production. SPE had long suffered from a shortage of sound mixing studios. To overcome it, the company was seeking a way to listen to sound virtually, free of geographic constraints, with the same quality as in a sound mixing studio. That was when the spatial sound technology being developed at the R&D Center caught their attention. Since then, the R&D Center has been working with SPE to realize a spatial sound technology for professional use. Offering quality high enough to satisfy professional sound creators, 360 VME has the potential to transform a range of audio content creation workflows.
Binaural processing is used to reproduce a 3D sound field over headphones. The key to realizing 360 VME was technology that personalizes, for each listener, the Head-Related Transfer Function (HRTF) information applied to the sound source in the binaural processing. To personalize the HRTF as much as possible, our development effort covered the whole chain, from HRTF data measurement to signal processing and reproduction in headphones. This has made 360 VME capable of reproducing an acoustic field with precision high enough even for professional movie creators.
People locate a sound source based on differences in the arrival time and volume of the sound at the left and right ears. Even when different people listen to the same sound source, what each of them actually hears is greatly influenced by the shape of the ears and other factors. The HRTF expresses how a sound changes as it travels from the sound source to each ear. Using these characteristics allows the acoustic field of a space where multiple loudspeakers are installed, such as a sound mixing room, to be reproduced in a pair of headphones.
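The idea of folding a room of loudspeakers into two headphone channels can be sketched in a few lines. This is only an illustration, not the 360 VME signal chain: each loudspeaker feed is convolved with a left-ear and a right-ear impulse response (the time-domain form of the HRTF), and the results are summed per ear. The function names `render_virtual_speakers` and `toy_hrir` are made up for this sketch, and the impulse responses are crude placeholders (a single delayed, scaled impulse), whereas a real system would use responses measured for the individual listener.

```python
import numpy as np


def toy_hrir(delay_samples, gain, length=64):
    """Placeholder impulse response: one delayed, scaled impulse.

    A stand-in for a measured, personalized response (assumption).
    """
    h = np.zeros(length)
    h[delay_samples] = gain
    return h


def render_virtual_speakers(feeds, hrirs):
    """Mix loudspeaker feeds down to two headphone channels.

    feeds: list of 1-D arrays, all the same length (one per loudspeaker).
    hrirs: list of (left_ear, right_ear) impulse responses, all the same length.
    Returns an array of shape (2, n): row 0 is the left ear, row 1 the right.
    """
    n = len(feeds[0]) + len(hrirs[0][0]) - 1
    out = np.zeros((2, n))
    for feed, (h_left, h_right) in zip(feeds, hrirs):
        out[0] += np.convolve(feed, h_left)
        out[1] += np.convolve(feed, h_right)
    return out


# Two virtual loudspeakers; only the left one plays a click.
click = np.zeros(32)
click[0] = 1.0
silence = np.zeros(32)

hrirs = [
    (toy_hrir(2, 1.0), toy_hrir(30, 0.5)),   # left speaker: left ear early and loud
    (toy_hrir(30, 0.5), toy_hrir(2, 1.0)),   # right speaker: mirror image
]
out = render_virtual_speakers([click, silence], hrirs)

left_peak = int(np.argmax(np.abs(out[0])))
right_peak = int(np.argmax(np.abs(out[1])))
```

With these placeholder responses, the click reaches the left channel earlier and louder than the right one, which is the cue the listener uses to hear the source on the left.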
In developing spatial sound technology for immersive sound reproduction, it had long been a challenge to reproduce sound from a loudspeaker placed right in front of the listener. When the sound source is on the left or right side of the listener, there are large differences in the arrival time and volume of the sound reaching the two ears. When the sound source is right in front of the listener, however, there are almost no such differences, so the shape of the listener’s ears has a relatively large influence. To reproduce an acoustic field with higher precision, we focused on personalizing the HRTF information used for the binaural processing.
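A rough calculation shows why the frontal direction is hard. The sketch below uses the textbook spherical-head (Woodworth) approximation for the arrival-time difference between the ears, ITD = (a / c) * (theta + sin(theta)), where theta is the azimuth of the source. This is a generic model chosen for illustration, not the method used in 360 VME, and the head radius of 0.0875 m and speed of sound of 343 m/s are assumed typical values.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air


def interaural_time_difference(azimuth_deg):
    """Woodworth spherical-head estimate of the arrival-time difference.

    azimuth_deg: source direction, 0 = straight ahead, 90 = directly to one side.
    Valid for 0 to 90 degrees. Returns seconds.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


itd_front = interaural_time_difference(0.0)
itd_side = interaural_time_difference(90.0)
```

Under these assumptions the difference is zero for a source straight ahead and roughly 0.66 ms for a source directly to the side. With no time cue available in front, the listener has to rely on how their own ears color the sound, which is exactly what a personalized HRTF captures.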
To raise the personalization technology to a level fit for practical use, we developed, in close collaboration with SPE, all the technologies and devices that could affect the performance of 360 VME: microphones and measuring tools, the signal processing that converts measurement data into playback signals, and custom headphones for playback. Being able to create new kinds of UX by flexibly combining our diverse assets is one of Sony’s strengths.
Through collaboration with SPE, we got detailed feedback from sound creators and addressed their needs, which allowed us to achieve a level of technology suitable for use in the actual creation environment. Those needs include, for example, adjusting how the reverberation feels or changing the room size to suit the creation process, and making it easier to connect to a complex, large-scale sound creation project for a movie.
The custom headphones for playback need to support stable sound reproduction at both ears. This requires, among other things, a mechanical structure that fits the ear robustly and a driver installed at an appropriate position and angle to obtain stable characteristics. Working with Sony’s headphone design engineers was therefore indispensable. In the case of 360 VME, members of the development team took the initiative to contact the headphone design engineers and formed a framework for cooperation early on, before the project itself got the green light. Being able to collaborate smoothly with other departments like this is another thing that makes Sony strong and unique.
Before the COVID-19 outbreak, there had been arrangements to use 360 VME for the sound mixing of several SPE productions. But the pandemic forced the U.S. into lockdown. So we decided to use this technology, initially meant as a temporary remedy for the shortage of mixing rooms, for another purpose: as a sound mixing tool in a remote working environment. We are now promoting 360 VME for use in movie production. With the studios shut down due to the COVID-19 pandemic, Tom McCarthy (EVP, Post Production Facilities, Sony Pictures Entertainment) and SPE’s creators have praised 360 VME, calling it a “game changer” and a “life saver.” These remarks are the biggest compliment we could have hoped for and mean that 360 VME has earned recognition from professionals.
In developing 360 VME, we had frequent discussions with creators involved in movie sound creation and took advantage of the Sony Group’s know-how and technological assets. This has allowed us to refine the quality of this technology to a level of perfection that impresses even professional sound creators.
When we think about deploying this technology for various sound creation applications in the future, we see it as our strength that we have a content creation business and can communicate directly with creators. The movie, gaming, and music industries are alike in that they all involve sound creation. But each industry has different creation environments, workflows, and approaches. In the gaming and music industries, 360 VME may be used in different ways than in movies. We need to optimize the technology while communicating with creators. We also intend to increase the precision of the technology as appropriate for each type of content.
Of course, I am pleased that consumers love our technology. The most impressive thing, though, is to hear professional creators praise 360 VME for helping them out when they are really in trouble. I feel that, from now on, research and development personnel like me will also be required to have an artistic perspective to understand sound design and other aspects of content creation.
We are not just researching and developing cutting-edge technologies, but we are striving to address the needs of professionals at the same time. That’s no easy task. But the detailed feedback from creators is very stimulating. I find it motivating and fulfilling to overcome challenges and gain recognition from those professionals.
The feedback from content creators is not always given in a quantitative form that is easy for engineers to understand. Faced with this challenge, we make improvements by constantly communicating with the creators in many ways so that our development project reflects their ideas and creativity. To me, that process is interesting in its own right.
Sony has this culture of openness where even someone you don’t know helps you out in a flexible way if you ask for advice or assistance through an in-house communication tool. You will get a lot of chances to make your dreams and wishes come true. The rest depends on your passion. In terms of work, what I think is good about the Sony R&D Center is that people can pursue their areas of interest until they feel fully satisfied.