In recent years, the variety of video input formats has increased. Terrestrial digital broadcasts deliver HD signals while DVDs carry SD signals, and resolutions and frame rates also differ between sources. At the same time, the display capabilities of output devices have improved significantly. 4K television sets have seen wide adoption, and standardization of 8K and of high-brightness, wide-color-gamut formats is underway for content, transmission, and display. There is now demand for image quality enhancement technologies that bridge the gap between input and output and can handle these new formats.
We are also developing many kinds of display devices, including projection displays and head-mounted displays for AR/VR. Compared to flat panel displays, these devices face a range of challenges, including limited brightness and color gradation. New approaches to signal processing are therefore needed that take each device's characteristics into account and can reproduce images faithfully.
Sony has been a pioneer of image enhancement technology, with achievements including the introduction of the Trinitron brand of television sets and the Handycam series. In the 4K/8K era, we have relentlessly pursued the three pillars of high image quality: high resolution, high dynamic range (HDR), and wide color gamut (WCG), while aiming to provide the most impressive viewing experience in the world through total optimization of the video signal flow.
We are developing signal processing technology that restores the temporal resolution, gradation, contrast, and color lost to factors such as data compression in images of varying types and qualities, and converts them to 4K/8K quality. The core technologies are image analysis, noise reduction, and learning-based super-resolution.
When enhancing image quality, it is important to be able to recognize the subject. For example, to represent a cloud accurately, the image analysis must be able to detect clouds. We are working on segmentation technology that interprets an image on a per-object basis to improve this analysis.
Next is noise reduction, which removes the block artifacts and mosquito noise that arise when video data is compressed for transmission. Finally, learning-based super-resolution uses machine learning to supplement information lost to compression and other causes, restoring detail and making images sharper.
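To make the noise reduction step concrete, here is a minimal sketch of a classic deblocking pass, not Sony's actual implementation: it smooths across the 8x8 block boundaries typical of DCT-based codecs, but only where the step across the boundary is small enough to look like a coding artifact rather than a real edge. The block size and threshold are illustrative assumptions.

```python
import numpy as np

def deblock(frame: np.ndarray, block: int = 8, threshold: float = 12.0) -> np.ndarray:
    """Soften the block boundaries typical of DCT-based compression.

    A boundary is smoothed only when the step across it is small, which
    suggests a coding artifact rather than a real object edge. Block size
    and threshold are illustrative placeholders.
    """
    out = frame.astype(np.float32)
    h, w = out.shape[:2]
    for x in range(block, w, block):          # vertical block boundaries
        left, right = out[:, x - 1].copy(), out[:, x].copy()
        mask = np.abs(right - left) < threshold
        avg = 0.5 * (left + right)
        out[:, x - 1] = np.where(mask, 0.5 * (left + avg), left)
        out[:, x] = np.where(mask, 0.5 * (right + avg), right)
    for y in range(block, h, block):          # horizontal block boundaries
        top, bottom = out[y - 1].copy(), out[y].copy()
        mask = np.abs(bottom - top) < threshold
        avg = 0.5 * (top + bottom)
        out[y - 1] = np.where(mask, 0.5 * (top + avg), top)
        out[y] = np.where(mask, 0.5 * (bottom + avg), bottom)
    return np.clip(out, 0, 255).astype(frame.dtype)
```

Production deblocking filters are far more elaborate, but the principle of suppressing only artifact-sized steps at known block positions is the same.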
In this article, we will introduce learning-based super-resolution, real-texture recreation, and 3D rendering.
Sony has been developing video technology using machine learning since the 1990s. In the 2000s, we developed a proprietary learning-based super-resolution technology for 4K output in projectors and BRAVIA televisions, and we have continued to expand its range of applications. Sony remains among the world's leaders in image enhancement thanks to multiple patented core and related technologies and the continuous evolution of the underlying techniques, which include image analysis, the mechanisms used to set up and prepare training data, and image quality tuning.
Low-quality 2K images are converted into high-quality 4K images in real time using a conversion database built in advance through machine learning.
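As an illustration of how a pre-built conversion database can drive super-resolution, the sketch below follows one common example-based scheme, not necessarily the exact method used in our products: each low-resolution patch is classified by a simple binary pattern, and class-specific filter coefficients learned offline predict the corresponding high-resolution pixels. The database shape and file name are hypothetical.

```python
import numpy as np

def patch_class(patch: np.ndarray) -> int:
    """Binarize a 3x3 low-res patch against its mean to get a class index.

    Each class selects a set of pre-learned filter taps from the conversion
    database. (Illustrative; the real classifier is not described here.)
    """
    bits = (patch.ravel() >= patch.mean()).astype(np.int64)
    return int(bits.dot(1 << np.arange(bits.size)))

def upscale_2x(lowres: np.ndarray, database: np.ndarray) -> np.ndarray:
    """Example-based 2x upscaling with a learned patch-to-pixel mapping.

    `database` has shape (512, 4, 9): for each of the 512 patch classes,
    four sets of 3x3 filter taps, one per output pixel in the 2x2 block.
    """
    h, w = lowres.shape
    out = np.zeros((2 * h, 2 * w), dtype=np.float32)
    padded = np.pad(lowres.astype(np.float32), 1, mode="edge")
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + 3, x:x + 3]        # 3x3 neighbourhood
            taps = database[patch_class(patch)]     # (4, 9) learned coefficients
            preds = taps @ patch.ravel()            # 4 predicted high-res pixels
            out[2 * y:2 * y + 2, 2 * x:2 * x + 2] = preds.reshape(2, 2)
    return out

# Usage (hypothetical pre-trained database):
# db = np.load("conversion_db.npy")    # shape (512, 4, 9)
# hr_luma = upscale_2x(lr_luma, db)
```

The offline training step, omitted here, fits the per-class taps by regressing high-resolution pixels against low-resolution patches gathered from training footage.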
We are currently using deep learning, a cutting-edge machine learning technique, to improve restoration efficiency.
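The same low-to-high mapping can also be learned end to end with a small convolutional network. The sketch below follows the widely known SRCNN pattern of feature extraction, non-linear mapping, and reconstruction as a stand-in for whatever architecture is actually deployed; the layer sizes and residual connection are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TinySRNet(nn.Module):
    """A minimal SRCNN-style super-resolution network.

    Illustrative only; layer widths and the residual connection are
    assumptions, not the architecture used in BRAVIA or projectors.
    """

    def __init__(self) -> None:
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4),   # feature extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),  # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),   # luma reconstruction
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Input: a bicubically upscaled luma plane (N, 1, H, W). The network
        # learns only the residual detail lost to downscaling and compression.
        return x + self.body(x)

# Training pairs would be (degraded, pristine) crops; at inference:
# sr_luma = TinySRNet()(bicubic_upscaled_luma)
```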
The real-texture recreation technology was born from the idea of restoring aspects of an image other than resolution, such as contrast, brightness, and color, in the same way that learning-based super-resolution creates 4K images from 2K images by increasing resolution and pixel count.
In the real world, light from the sun is reflected by objects and enters our eyes. The range of brightness involved is so wide that even modern cameras have trouble capturing it. By analyzing physical characteristics in an image, such as the subject's shape, its reflectance properties, and the light source, we are recreating the real-world expression of texture.
HDR's advantage is its ability to adjust the dynamic range during both shooting and display to allow for optimal viewing. Thanks to the evolution of capture devices and progress in standardizing transmission methods, precise gradation from dark to bright areas can now be recorded. However, dynamic range varies on the display side, and there is no guarantee that the recorded gradation will be reproduced accurately. By analyzing the properties of the content and controlling the display so that its gradation capabilities are used optimally, it becomes possible, for example, to keep the cloud and sky detail in a sunset scene intact while also clearly displaying objects darkened by backlight.
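As a rough illustration of content-adaptive gradation control, the sketch below analyzes how much of a frame sits in deep shadow, lifts the shadows accordingly, and rolls off highlights above the display's peak with a soft knee instead of clipping them. The luminance levels, thresholds, and knee position are illustrative assumptions, not the algorithm used in our displays.

```python
import numpy as np

def adaptive_tone_curve(luma_nits: np.ndarray,
                        display_peak_nits: float = 600.0,
                        content_peak_nits: float = 1000.0) -> np.ndarray:
    """Map HDR luminance onto a display with a smaller dynamic range while
    keeping both highlight (cloud/sky) and shadow (backlit) gradation.
    Minimal sketch; all constants are placeholders.
    """
    x = luma_nits.astype(np.float32)

    # Content analysis: how much of the frame sits in deep shadow?
    shadow_ratio = float(np.mean(x < 0.05 * content_peak_nits))
    lift_gamma = 1.0 - 0.3 * shadow_ratio          # more shadow -> stronger lift

    # Shadow lift via a gentle gamma on normalized luminance.
    y = content_peak_nits * (x / content_peak_nits) ** lift_gamma

    # Highlight roll-off: soft knee starting at 80% of the display peak,
    # so bright detail is compressed rather than clipped.
    knee = 0.8 * display_peak_nits
    over = y > knee
    span = content_peak_nits - knee
    y[over] = knee + (display_peak_nits - knee) * np.tanh((y[over] - knee) / span)
    return y
```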
At Sony, we are developing gradation and color restoration technology capable of enhancing video signals to HDR quality.
These machine-learning image processing technologies are used in electronics such as television sets and cameras, but to expand their use cases even further, we are also applying them to the production of entertainment content. One of these use cases is 3D rendering, that is, movie production using 3DCG.
In the entertainment industry, there is a growing need for 3DCG content and for more efficient movie production. To provide an environment where creators can focus on creative work, we are applying our image processing technology to content creation support. After considering how we could contribute as an R&D division from a mid- to long-term perspective, we made a technical proposal to the group company Sony Pictures Entertainment (SPE) three years ago, marking the beginning of the path to practical use of the technology.
The vertical axis represents quality and the horizontal axis rendering time. In movie content production, shown in blue, the time needed to render a single frame can reach, and even exceed, 1,000 hours, a staggering amount of time, which is why reducing rendering time is a pressing issue. In the gaming industry, shown in yellow, rendering must happen in real time, so the challenge is to improve image quality without increasing rendering time. The value this technology provides is its ability to contribute to entertainment creation by bridging the gap between these two fields.
Why does rendering take so long? Because it simulates, inside a virtual environment, physical events that occur in the real world. Which points on an object do the rays from a light source hit? Through which points on the image plane do they pass after being reflected, before reaching the virtual camera? All of these questions must be answered by calculation during rendering.
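The skeleton below shows where that cost comes from: every pixel is sampled many times, and every sample may bounce many times, so the work grows multiplicatively. It is only a sketch; the `trace_ray` callback stands in for the actual light-transport simulation.

```python
import numpy as np

def render(width: int, height: int, samples_per_pixel: int, max_bounces: int,
           trace_ray) -> np.ndarray:
    """Skeleton of a Monte Carlo renderer, showing the cost structure.

    Total work is roughly width * height * samples_per_pixel * max_bounces
    ray/scene intersection tests, which is why a single film-quality frame
    can take hours. `trace_ray(x, y, max_bounces)` is a placeholder for the
    real light-transport simulation and returns an RGB radiance estimate.
    """
    image = np.zeros((height, width, 3), dtype=np.float32)
    for y in range(height):                        # every pixel...
        for x in range(width):
            for _ in range(samples_per_pixel):     # ...many stochastic samples...
                image[y, x] += trace_ray(x, y, max_bounces)  # ...each bouncing repeatedly
    return image / samples_per_pixel               # Monte Carlo average

# Cutting the outer factors (resolution) or the sample count directly reduces
# rendering time -- the two strategies discussed next.
```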
One way to reduce calculation time is to reduce the size of the rendered image, and another is to reduce the number of light rays being traced.
The left side of the chart shows an approach in which the number of pixels is reduced (to the 1K/2K level) and the images rendered at that low resolution are then enhanced with super-resolution technology. On the right side, the noise produced by reducing the number of rays is removed from the ray tracing result, with auxiliary rendering information used to guide the restoration, thereby improving image quality.
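The right-hand approach can be illustrated with a classical baseline: a cross-bilateral filter that averages the noisy, low-sample render while weighting neighbours by their similarity in noise-free auxiliary buffers (albedo and normals) that the renderer produces almost for free. This is a stand-in for the learned denoisers used in practice; buffer names and parameters are assumptions.

```python
import numpy as np

def gbuffer_guided_denoise(noisy: np.ndarray, albedo: np.ndarray, normal: np.ndarray,
                           radius: int = 3, sigma_albedo: float = 0.1,
                           sigma_normal: float = 0.2) -> np.ndarray:
    """Cross-bilateral denoising of a low-sample-count render, guided by the
    clean albedo and normal buffers output as rendering by-products.
    Classical baseline, not the production method; parameters are illustrative.
    """
    h, w, _ = noisy.shape
    pad = lambda a: np.pad(a.astype(np.float32),
                           ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    pn, pa, pm = pad(noisy), pad(albedo), pad(normal)
    out = np.zeros((h, w, 3), dtype=np.float32)
    norm = np.zeros((h, w, 1), dtype=np.float32)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sn = pn[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            sa = pa[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            sm = pm[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            # Weight each neighbour by similarity in the clean guide buffers, so
            # geometry and texture edges survive while ray-count noise averages out.
            w_a = np.exp(-np.sum((sa - albedo) ** 2, axis=-1) / (2 * sigma_albedo ** 2))
            w_n = np.exp(-np.sum((sm - normal) ** 2, axis=-1) / (2 * sigma_normal ** 2))
            weight = (w_a * w_n)[..., None]
            out += weight * sn
            norm += weight
    return out / np.maximum(norm, 1e-8)
```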
Unlike television broadcasts, movies are stored indefinitely, and it is not easy to satisfy movie creators' strict requirements and reproduce images faithfully. With the source data used in movie production provided to us as training material for machine learning, we ran repeated trials aimed at increasing accuracy. Drawing on the know-how gained from the learning-based super-resolution technology and other projects, our definition-enhancing super-resolution technology has already been technically evaluated by SPE and is expected to reduce production time. Preparations are also underway to move to the experimental stage of using ray interpolation for image quality improvement.
This technology is likely to be used in the gaming industry as well, where minimizing the number of calculations will be the main hurdle: as calculations increase, the frame rate drops and the user experience suffers. To resolve this issue in the future, it will be important to combine Sony's extensive processing expertise, built up over more than 25 years of machine learning R&D, with AI to create new technologies.
Applying super-resolution technology to 3D rendering will lead to ways to efficiently create movies in a virtual 3D environment. In this sense, we believe it can be applied to all movie-related services and videos. In addition to movie and videogame production, it may become possible to recreate a remote environment in a different place. This includes recreating remote spaces through telepresence (a technology that allows you to feel like you are together with someone else in person while being physically separated) and livestreamed videos that are processed in real time, making you feel like you are really there.
Thanks to the power of AI and other revolutionary technologies, things that were considered impossible are becoming a reality in the field of movie production. As engineers working at Sony Group, which has deep relationships with content creators, we would like to contribute to the creation of works that provide deeply moving experiences to as many people as possible.
Sony offers various products and services, all of which utilize image technologies. The fruits of our development are currently finding uses even in the entertainment industry. Creators use our technology to produce videos which provide memorable experiences to many people. I feel that is what makes my work worthwhile.
I actually majored in robotics, but it has many similarities with this field, since the underlying logic is universal. Sony, which has mastered images and sound, offers a fulfilling environment where you can collaborate with various experts and creators, and you can draw on our super-resolution assets built up over 25 years. Even if your own major is somewhat different, we are undergoing a major transformation, so there are many great opportunities for junior engineers like you to realize their full potential.
Sony Group boasts a variety of products used by many people around the world, from consumer products like television sets to movies produced by the entertainment industry. We are in the remarkable position of being able to directly contribute by increasing image quality. It is very challenging to implement this technology into actual products, but achieving our goals while collaborating with experts results in a great sense of accomplishment. I am looking forward to having ambitious and motivated colleagues join us.