
Experience “The physical and virtual connection”! STEF 2022 report

Mar 15, 2023

Recently, the place where emotions are created has extended beyond the physical world and into the virtual.
With that in mind, we hosted STEF 2022 under the theme “The virtual and physical are seamlessly connected” and exhibited 17 different technologies and initiatives across three sections:

1. Capturing the physical world
2. Digital processing
3. Recreating the physical world

We would like to introduce you to some of our experiences at the technical exhibition.

1. Capturing the physical world

“ToF AR”: an app for VTubers

“ToF” (Time of Flight) is a technology that acquires depth information about an object by measuring the time it takes for light to be reflected off it and bounce back. This exhibit featured “ToF AR”, a software development kit that uses ToF technology. “ToF AR” has various special features, including the ability to perform smooth hand-tracking even when the target moves quickly, high depth-perception accuracy, and more.
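The round-trip timing principle behind ToF can be illustrated with a small calculation. This is only a sketch of the physics, not the ToF AR SDK's actual API:

```python
# Illustrative sketch of the ToF principle (not the ToF AR SDK):
# depth = (speed of light x round-trip time) / 2
C = 299_792_458.0  # speed of light in m/s

def tof_depth(round_trip_seconds: float) -> float:
    """Depth in meters, from the measured round-trip time of a light pulse."""
    return C * round_trip_seconds / 2.0

# A pulse returning after roughly 6.67 nanoseconds corresponds to about 1 m.
print(tof_depth(6.67e-9))
```

The division by two accounts for the light traveling to the object and back, which is why nanosecond-scale timing yields centimeter-scale depth resolution.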

“ToF AR” is targeted at application developers and aims to promote the development of applications that utilize depth information.

At the booth, participants could experience what it was like to have an avatar follow their every move using only their smartphone cameras and no markers or other equipment.

We received various comments from participants, like “It's amazing how it can accurately detect small movements, like snapping my fingers, just by using depth information collected by my smartphone's camera!”

Video shooting and streaming support

The next booth introduced a mock stage equipped with an auto shooting system and a mobile camera robot. With the increased demand for online entertainment due to the COVID-19 pandemic, we have made it possible to film concerts remotely, achieving high-quality, low-cost shooting of live performances.

At the live demonstration, the exhibitor played the part of a singer on stage, and visitors could watch up close as the auto shooting system filmed the mock concert and the mobile camera robot captured follow shots.

The touch-panel user interface (UI) that operates the auto shooting system is a unique UI refined with input from video creators, allowing them to control unmanned cameras and handle camerawork through software.

By following the singer's movements and filming as it moves, the mobile camera robot achieves a new kind of dynamic, powerful camerawork. A map of the robot's environment and a shooting scenario can easily be created using its operating tools. Looking forward, we plan to connect it to a switcher to provide better camera angles as we continue development and aim to further automate the system.

2. Digital processing

Surgical simulator

Using a surgical simulator that creates realistic experiences in a virtual world and runs real-time computations based on the laws of physics, participants could experience authentic physical interaction in a virtual space.

Thanks to the newly developed physics engine, sensations such as touching soft organs can be simulated with a realistic feel via a haptic device. Participants could feel the authentic textures of blocks with different tactile properties, manipulate them, and sense small changes in applied force.

Additionally, by using ray-tracing technology to reproduce life-like visuals and surgical techniques, participants could experience what it was like to actually perform coagulation, clipping, and dissection in surgery.

The surgical simulator is expected to offer a new, effective training method for surgical education, which traditionally takes many years. Speaking about the outlook, the development manager said, “As a result of training in a virtual space and improving real-life surgical technique, we will then go on to collect more data on real surgeries to improve the simulator's reproduction quality, thus bringing forth growth in AI and robotics.”

Restoring video and music with deep generative models

The next thing participants experienced was the deep generative model, one of the AI technologies being developed by Sony. Once trained, this technology can not only generate content but also restore it. It is expected to be especially useful in the creation of video and music.

There are various types of deep generative models, and one gaining attention recently is the diffusion model. Trained using equations rooted in physics, this model gradually transforms random noise, step by step, until it finally generates a clear new image.
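The core idea of iteratively refining noise into signal can be sketched in a few lines. This is a toy illustration of the concept only, with a hand-written pull toward a known target standing in for the learned denoising network:

```python
import numpy as np

# Toy sketch of the diffusion idea: start from pure random noise and repeatedly
# apply small denoising steps that pull the samples toward the data.
# The "learned denoiser" here is replaced by a hand-written pull toward a
# known target value, purely for illustration.
rng = np.random.default_rng(0)
target = 3.0                       # stand-in for the clean data distribution
x = rng.normal(size=1000)          # step 0: pure random noise

for step in range(50):
    # each step removes a little noise, with a small fresh perturbation
    x = x + 0.1 * (target - x) + 0.05 * rng.normal(size=1000)

print(float(x.mean()))             # the samples have converged near the target
```

A real diffusion model replaces the hand-written pull with a neural network trained to predict the noise at each step, which is what lets it generate complex images rather than a single value.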

Additionally, to efficiently train a generative model on large-scale content, compressing the data becomes very important. However, conventional learning schemes are known to be unstable, and the training must be attempted many times to obtain a good compressor. In response, to stabilize the compressor's training so that it finishes in one try, Sony developed a novel method called the Stochastically Quantized Variational Autoencoder (SQ-VAE). A demonstration was shown in which this model generated entirely new images from text prompts.
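The stochastic quantization idea at the heart of SQ-VAE can be sketched as mapping a continuous latent vector to a codebook entry sampled by distance, rather than always snapping to the nearest one. The following is a simplified illustration of that idea only, not the published SQ-VAE training procedure:

```python
import numpy as np

# Simplified illustration of stochastic quantization: instead of always
# assigning a latent vector to its nearest codebook entry (as in a standard
# vector-quantized autoencoder), sample an entry with probability that
# decreases with distance. This is a toy sketch, not the SQ-VAE algorithm.
rng = np.random.default_rng(0)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])

def stochastic_quantize(z, temperature=0.5):
    d2 = ((codebook - z) ** 2).sum(axis=1)   # squared distance to each entry
    logits = -d2 / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()                             # softmax over codebook entries
    idx = rng.choice(len(codebook), p=p)
    return codebook[idx]

z = np.array([0.9, 1.1])                     # a continuous latent vector
print(stochastic_quantize(z))                # usually the nearby entry
```

Sampling instead of hard assignment keeps every codebook entry reachable during training, which is one intuition for why a stochastic scheme can train more stably than a deterministic one.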

Conventionally, general machine learning required preparing a large amount of both corrupted and clean data. However, by using a deep generative model trained only on clean data, corrupted samples can be restored.

On top of that, deep generative models are useful for restoring not only images but also music. At the exhibition booth, participants listened through headphones to audio samples cleaned with this technology by removing vocal reverb and background noise from the recordings. The restored audio was comparable to the original clean signals.

AI technologies such as deep generative modeling are expected to become increasingly important in supporting creators in the game, music, and film industries, where a great deal of content is produced. We will continue to pursue better technology that can be used to create higher-quality content.

3. Recreating the physical world

A large and high-quality 3D display

Research on the 3D display has been ongoing since before the COVID-19 pandemic, and remote-operation and telecommunications technology that makes use of a large screen and high video quality is also in development. Online communication has become the norm, and the demonstrated 3D display showed the conversation partner in 3D. The partner could show brochures and do other things that made the experience feel like an authentic face-to-face conversation. We received various comments from participants, like “The 3D technology is amazing! I feel like I could actually shake hands with them! It feels no different than an in-person conversation, and I almost feel nervous.”

Using independently developed high-quality 3D video and low-latency transmission technology, this 3D display lets users experience real-time telepresence without wearing special glasses or using other special equipment. We see this technology being useful when an online meeting is the only option even though meeting face to face would be preferred, such as when consulting on a mortgage or receiving an explanation about, or confirming, the purchase of an expensive product. It can provide the experience of an in-person conversation even when actually meeting is not possible.

Our exhibition partner, Sony Bank, is, as of 2022, already using our large 2D-display telepresence system, “Mado” (meaning “window”), at some of its locations. They say they are also aiming to make use of the 3D display going forward.

Conferences with outside guests

In addition to the technology exhibition, we hosted two days of online conferences. We welcomed outside experts and discussed the future of technology.

The first day of the conference started with opening remarks: Sony Group's Senior Executive Vice President and CTO Hiroaki Kitano talked about his thoughts on and approach to research and development, as well as the direction he plans to take future R&D activities. Next, in “AI for social impact: Results from deployments for public health,” invited speaker Milind Tambe discussed how AI can help address society's important issues and how businesses should go about facing and resolving them. Additionally, Sony Group's top engineers gathered to exchange opinions on 3D-3R technology, one of the group's strengths.

On the second day, we held two special lectures. Pascale Fung (The Hong Kong University of Science and Technology) and Natasha Crampton (Microsoft) were invited to discuss AI ethics of the present and the future in the panel “AI Ethics: Latest challenges and future prospects.” Additionally, the latest applications of Sony Group's video and content creation technologies were introduced under the theme of “The Frontline of Content Creation by Cutting Edge Technologies.”

A place where diverse perspectives meet

The external exhibition area was invite-only, and media, investors and analysts, business partners, researchers, engineers, creators, university and high school students, and other external stakeholders came to participate. It was here that STEF's concept, “share diverse technology and exchange ideas,” came to life. The lively exchange of ideas on technology between participants and technical development staff also became a chance for Sony to receive feedback on our development environment and structure.

Additionally, after the creators and students saw Sony's technology in person and experienced a new side of Sony through conversation with engineers, they left us with comments, like “The technical introduction of new devices and software and the demonstrations, which showed specific examples of how to use them to make new content, were easy to understand” and “Being able to see, in person, the kind of development Sony is performing deepened my interest.”

During the event period, Mr. Justin Lin, film director known for the "Fast & Furious" movie series and the "S.W.A.T." TV drama series, and attached to the live-action adaptation of the popular manga "One Punch Man" currently in development at Sony Pictures Entertainment, visited the company. After touring the exhibitions in the area for external guests, he also had a discussion with Hiroaki Kitano, Senior Executive Vice President and CTO.

More details and information on the exhibited technologies, as well as five conference videos, are posted on the STEF 2022 special site. Please take a look.
