Stylegan2 paper
Author | Xiao Wei

At the end of February, all kinds of "Hey Ant" videos flooded TikTok. Some netizens joked, "Opening TikTok feels like poking an ant nest."

With an app called Avatarify, users only need to upload a photo, and the person in the photo can be made to perform any desired expression. As of press time, there were more than 250,000 "Hey Ant" videos on TikTok, and videos under related topics had been viewed 3 billion times. Avatarify took first place on the free apps chart of the App Store in China on February 25th, and then ranked first on the overall chart for several days.

ZAO lasted three days after it went viral. Avatarify could not escape the fate of face-swapping software either: it was removed from the App Store in China only seven days later (it can still be used abroad at present).

Avatarify was developed by a Russian programmer and published on GitHub. It was originally meant to "relieve boredom" in video conferences on Zoom, Skype and similar tools. For example, you can swap your face for Musk's in a video conference and interact in real time. So far, this project has earned nearly 1.2 million stars on GitHub.

A few months later, Avatarify launched an app version (iOS only). In principle, Avatarify uses deepfake and related technologies to train an algorithm on the face image to be swapped in. Because the algorithm is trained on similar target images, the model can transform faces in real time.

Behind the frequent removal of face-swapping software are privacy and information security issues. Many people worry that their facial data will be leaked or abused, but that is no reason to condemn outright the AI technology behind these apps: deep synthesis. Deep synthesis has many more valuable applications across many industries.

Deep synthesis first came to public attention in November 2017. At that time, a user named "deepfakes" on the American social news site Reddit uploaded a synthetic pornographic video in which the face of an actor in a pornographic film was replaced with the face of a star. Since then, the media have used "deepfake" to describe this kind of AI-based synthetic video. But many people mistakenly think that deep synthesis simply means deepfakes and face-swapping, which is an unfortunate misconception.

First of all, deepfake is a subset of deep synthesis. Face-swapping was simply the first application to enter the public's field of vision, and it remains the best-known deep synthesis application.

Deep synthesis covers a very broad range: the synthesis and automatic generation of voice, image, audio, video and faces with the help of artificial intelligence algorithms. Its typical applications include face replacement (face-swapping), face reenactment (manipulating the facial expressions of a target, for example, making them say something they have never said), face synthesis (AI produces a realistic face image of a person who does not actually exist), speech synthesis, full-body synthesis and so on.

Secondly, the frequent privacy and security incidents and the abuse in pornographic scenarios caused by deepfakes have led people to form prejudices and misunderstandings about deep synthesis technology, and even to believe that AI-forged content will undermine social trust. However, as deep synthesis technology is applied in more fields, the public's understanding of it is maturing.

The AI technology behind deep synthesis mainly consists of two building blocks: autoencoders and GANs (generative adversarial networks). A GAN consists of two artificial neural networks, a generator and a discriminator. Over countless rounds of confrontation, the generator eventually leaves the discriminator unable to distinguish between real data and synthetic data, and thus generates highly realistic content.
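The contest between generator and discriminator can be made concrete with a small numerical sketch. The code below is an illustration only, not the training code of any product named in this article: it assumes the common binary cross-entropy formulation of the GAN objective, and the function names and sample scores are hypothetical. It shows that once the discriminator scores every sample at 0.5 (that is, it can no longer tell real from synthetic), its loss settles at 2 ln 2.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator scores (probability of "real") on real samples.
    d_fake: discriminator scores on synthetic samples.
    Real samples are labeled 1 and synthetic samples are labeled 0.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: lower when fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical scores early in training: the discriminator is confident.
early_d = discriminator_loss([0.90, 0.95], [0.10, 0.05])
early_g = generator_loss([0.10, 0.05])

# Hypothetical scores at equilibrium: every sample is scored 0.5.
balanced_d = discriminator_loss([0.5, 0.5], [0.5, 0.5])
balanced_g = generator_loss([0.5, 0.5])

print(f"early:    D loss {early_d:.3f}, G loss {early_g:.3f}")
print(f"balanced: D loss {balanced_d:.3f}, G loss {balanced_g:.3f}")
```

In a real system the scores come from a neural network and both losses are minimized in alternation by gradient descent; the sketch only evaluates the losses for fixed scores.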

The most advanced image generator in the industry is NVIDIA's StyleGAN, which was open-sourced on GitHub in February 2019.

The AI-Generated Content Development Report 2020: The First Year of Commercialization of Deep Synthesis (hereinafter referred to as the report), released by Tencent Research Institute and Tencent Youtu Lab, shows that the evolution of deep synthesis technology has accelerated in recent years and points to several major technical trends:

1. Beyond synthesizing audio or images in isolation, deep synthesis technology is moving toward integrated synthesis.

2. Following face synthesis, full-body synthesis will become a new hot spot.

3. Besides 2D synthesis, 3D synthesis technology (especially virtual digital humans) will be the focus of the next stage.

Moreover, as "deep synthesis" technology has matured, it has been applied in many fields such as film and television, entertainment, education, medical care, e-commerce, advertising and marketing.

In the media industry, AI anchors are becoming more and more popular. In 2018, Sogou and Xinhua News Agency launched the world's first AI synthetic anchor, and in 2020 the two launched the world's first 3D AI synthetic anchor. The 3D AI synthetic anchor is built on hyper-realistic 3D digital human modeling, multimodal recognition and generation, real-time facial motion generation and driving, transfer learning and many other frontier artificial intelligence technologies, so that the machine can generate high-fidelity 3D digital human video from input text and deliver a video broadcast just as a real person would.

In addition, Internet giants including Baidu, JD.com and NetEase have also launched virtual digital humans. The virtual digital human introduced by Baidu AI Cloud became the first "virtual employee" of a bank in China.

In the field of autonomous driving, deep synthesis is used to develop autonomous driving simulation systems (AADS), which create virtual road environments and provide training and testing for autonomous driving systems.

In the medical field, training AI systems on generated medical images that are indistinguishable from real ones can address both the shortage of medical data and the protection of patient privacy. A paper jointly published by NVIDIA and its partners demonstrates a method of synthesizing brain MRI images with tumors using a GAN. Only 10% real data is needed during training and generation, and the resulting AI diagnostic system can detect tumors in real images.

In advertising and marketing, AI-synthesized faces and virtual avatars can replace real models in marketing campaigns without raising portrait rights or copyright issues. For example, Generated Photos is a website that automatically generates faces with AI. Its database holds more than 100,000 AI-generated faces, which can be downloaded and used for free with no copyright concerns. These free face pictures can be used in many scenarios, such as advertising leaflets, websites, PPT presentations, questionnaires, user avatars and so on.

The abuse of deep synthesis is an important topic of artificial intelligence governance.

The porn industry is a pioneer in adopting and popularizing new technologies, and AI technology is no exception. At present, pornography is the area hit hardest by the abuse of deep synthesis technology. According to the report, as of December 2019 there were 14,678 deep synthesis videos across the Internet, of which 96% were pornographic, mainly concentrated on pornographic websites.

How can deep synthesis technology be kept from being used to do harm? Diversified governance is a widely accepted approach, combining legal measures, technical measures, industry self-discipline and public education.

In terms of law, some developed countries have introduced relevant bills. It is worth noting, however, that they do not impose a "one size fits all" ban on deep synthesis technology. Instead, they forbid its use for illegal activities such as pornographic video synthesis, false news and election interference. For example, the DEEPFAKES Accountability Act and other related bills in the United States Congress only prohibit deep synthesis for purposes such as political interference, revenge pornography and impersonation, and require producers to add watermarks and other markers to deep synthesis content.

Technically, detection and provenance tracing are the two mainstream approaches. In detection, however, there is currently no universal scheme for authenticating video, and a dedicated detection network needs to be trained for each emerging synthesis technique.

The threshold for deep synthesis has dropped sharply, and ordinary people can now create entertainment-oriented deep synthesis content on smart devices such as smartphones, but such content is often easy to identify. High-quality, highly realistic deep synthesis content still requires professional tools and skills. Therefore, we need to guard against the risks without panicking.

AI is like a very clever student of human beings: it simply learns what humans teach it, quickly and faithfully.

As the report said, "Deep synthesis is not about 'forgery' and 'deception', but a very creative and breakthrough technology. Although it, like other technologies, has spawned a series of problems that must be faced, it will not obliterate the progress that this technology has brought to society."