To better understand the state and prospects of NLP in China, CCF's efforts in NLP, and the development of the NLPCC conference, Leifeng.com interviewed Zhou Ming, director of the CCF Chinese Information Technology Committee and vice president of Microsoft Research Asia (also an executive director of the Chinese Information Processing Society of China (CIPS) and incoming president of the Association for Computational Linguistics (ACL)), and Professor Zhao Dongyan of Peking University, secretary general of the CCF Chinese Information Technology Committee (Leifeng.com will report on that interview later).
In this article, Dr. Zhou Ming gives an in-depth account of research progress in natural language processing and of its status and prospects in China, from the perspective of the CCF Chinese Information Technology Committee. His opening remarks are as follows:
At present, many governments (including those of the United States, Germany, Japan, and China) are drawing up plans for artificial intelligence, but China's plan is the clearest. Combining the State Council's artificial intelligence development plan (July 2017) with the General Secretary's report to the 19th National Congress (October 2017), we can see that China has planned two stages of AI development: the first stage is to reach the world's advanced level by 2020, and the second stage targets 2030.
Natural language processing in China is basically in step with the national AI plan. In other words, we aim to reach the world's advanced level in 2020 and expect to reach the world's top level in 2030.
What is the difference between advanced and top? The advanced level means that you follow the most developed countries in the world and have mastered all the key technologies, but you are not the originator of those technologies; that is, you are not a leader. At the top level, you are actually leading. You tell the world which direction to go. You propose a key theoretical model, and others follow you. That is the difference.
In the NLP field, China is now a very good follower. Whenever a new technology appears in the world (mainly in the United States), we immediately learn it, master it, and apply it quickly, no worse than the United States does. What is still missing is that we are not the first to propose these technologies and methods. Therefore, the CCF Chinese Information Technology Committee believes that we are now essentially at the world's advanced level and will fully reach it within three years, that is, by 2020. On this basis, we expect to reach the world's top level in 2030. This is our vision.
The following is Dr. Zhou Ming's in-depth explanation. Without changing the original meaning, Leifeng.com has condensed and edited the interview for readers.
First, natural language processing is the core of cognitive intelligence.
Leifeng.com: What is the position of NLP in the AI field as a whole?
Zhou Ming: In recent years, artificial intelligence has entered a period of rapid development thanks to four factors: large-scale computing, big data, algorithms and models (represented by deep learning), and real-world application scenarios. It is developing in two main directions: perceptual intelligence and cognitive intelligence.
Perceptual intelligence refers to perceptual abilities such as vision (images) and hearing (sound). As everyone knows, perceptual intelligence is advancing by leaps and bounds: on the ImageNet evaluation for image recognition and the Switchboard evaluation for speech recognition, systems have reached or even surpassed human-level performance on those test sets. Research progress in this area has also driven many applications, such as security, face recognition, object detection, and speech recognition in mobile phones, smart homes, and other devices.
Cognitive intelligence, generally speaking, means "being able to understand and think". It covers many things, the core of which includes language intelligence, knowledge graphs, user profiling, and so on. On this basis, it supports applications such as intelligent writing, chatting, poetry composition, text generation, and games. Some are doing well, such as the game-playing systems represented by AlphaGo, but others are not yet satisfactory. At present, cognitive intelligence lags behind perceptual intelligence in adopting deep learning, but it is catching up. For example, the quality of neural machine translation keeps improving, and chat systems and human-machine dialogue keep getting better.
Natural language understanding is the core of cognitive intelligence. Its progress will drive progress in knowledge graphs, a stronger ability to understand users, and further improvement in overall reasoning ability. On this basis, chatting, question answering, translation, and dialogue will also improve. Once cognitive intelligence advances, together with the progress of perceptual intelligence, artificial intelligence as a whole will develop further.
Bill Gates once said that "language understanding is the jewel in the crown of artificial intelligence", and Dr. Shen Xiangyang has said that "those who know the language have the world". Both statements emphasize the importance of NLP. Natural language processing technology will drive the overall progress of artificial intelligence, so that AI technology can be put into practical use.
Second, the development of NLP in the next five to ten years
Leifeng.com: How will NLP develop in the next five to ten years?
Zhou Ming: There are roughly several directions: 1) Progress in question answering and reading comprehension will make search engines more accurate; 2) Speech recognition and neural machine translation will make spoken-language machine translation fully practical; 3) More accurate and more timely user profiles will make information services and advertising more natural, friendly, and personalized; 4) Better chat, question answering, and dialogue skills will make natural language dialogue practical; 5) Progress in dialogue technology and knowledge graphs will let intelligent customer service combine more smoothly with human customer service, greatly improving customer service efficiency; 6) Progress in natural language generation will popularize automatic writing of poems, essays, news, and even novels; 7) Progress in human-machine dialogue will promote the spread of voice assistants, the Internet of Things, intelligent hardware, and smart homes; 8) Finally, "NLP+": NLP will be widely applied in vertical fields such as finance, law, education, and medical care.
Take the intelligence of search engines as an example. With earlier search engines, entering keywords returned a pile of results that you had to read through yourself. With improvements in automatic question answering, reading comprehension, and related abilities, today's search engines can accept a question phrased as a full sentence. They can analyze the question and find the answer in a vast collection of documents. Rather than just giving you a document link, they can give you the answer directly, and search results are becoming more and more accurate.
Leifeng.com: What directions should NLP research pay attention to in the future?
Zhou Ming: I personally care about the following points: 1) Personalized service through user profiling; 2) Insight into the mechanisms of artificial intelligence through interpretable learning; 3) Better learning efficiency through the combination of knowledge and deep learning; 4) Domain adaptation through transfer learning; 5) Continuous evolution through reinforcement learning; 6) Full use of unlabeled data through unsupervised learning; 7) Understanding, question answering, and conversion across media and modalities.
Third, China ranks second in the world in NLP research.
Leifeng.com: What is the current state of China's development in the NLP field?
Zhou Ming: The development of NLP in China has two aspects, one is the level of scientific research, and the other is industrialization. China has done a good job in NLP industrialization. For example, NLP occupies a core position in the technical system of search engines, e-commerce, news websites, machine translation and smart speakers. I will focus on the research level of NLP in China.
Take ACL as an example. ACL is the top academic conference in the field of natural language processing. About 20 years ago, China had no ACL papers. In 1998, the research group of Professor Huang Changning of Tsinghua University published the first one. At that time, China's research foundation in NLP was weak, and Japan, South Korea, and even Taiwan and Hong Kong published more ACL papers than mainland China.
Microsoft Research China (note: later renamed Microsoft Research Asia) was established in November 1998 and greatly promoted the development of NLP in China. Its successive directors have all called on everyone to go international and encouraged the institute's researchers to cooperate with universities and related societies, so that we could work together to raise the domestic research level. Microsoft Research Asia has helped China cultivate a large number of NLP talents through joint laboratories, summer schools, and internship programs.
At the same time, CIPS, CCF, and other societies organized various seminars and academic conferences, introduced advanced international theories and technologies, and greatly raised the local level of natural language processing. In terms of publications, NLP researchers in China have also worked constantly to increase their influence at ACL. The Chinese government has strengthened its investment and guidance in natural language processing through the Natural Science Foundation and the 863 and 973 programs. Through the efforts of all parties, after nearly 20 years of rapid development, China has become the number-two country at ACL.
In the past five years, the number of ACL papers from China (long and short papers combined) has ranked second, behind only the United States. In long papers, China trails the United States by roughly 20 to 30 papers; at the same time, it is far ahead of all other countries, including Japan, South Korea, Germany, and Britain. China used to be unable to keep up with these countries. If the number of ACL long papers from China keeps growing, it is possible to catch up with the United States within three years. Because NLP is developing very well in China, this is a reasonable goal.
Looking at papers by ethnic Chinese authors: in 2014, papers with a Chinese first author accounted for 36% of all ACL papers, and the share has risen year by year since. This year it is 40%, and many of these authors, apart from those in China, are students from China studying abroad.
Judging from the above figures, China's ACL articles have indeed leapt to the forefront of the world. This is a very surprising result. There was only one ACL article in China 20 years ago, and now it ranks second in the world.
In addition to quantity, the quality of ACL papers from China has also improved greatly. For example, among the 22 outstanding papers at ACL 2017, five came from China.
China is also increasingly active in international activities. For example, the ACL Executive Committee has 13 members, 3 of them from China, including myself and Zhao from Baidu. I am the incoming president of ACL (note: taking office in 2019), Zhao is the secretary general, and a colleague from Taiwan is the chief IT officer.
In addition, the total number of sponsors from China is close to that of the United States, and we also rank second in the number of participants.
Other important meetings in NLP field, such as COLING or EMNLP, are similar.
So China is the second strongest country in NLP.
CCF has made many contributions in this regard. The CCF Chinese Information Technology Committee has organized academic conferences such as NLPCC, the ADL lectures, and a number of university outreach activities. At the NLPCC conference, a student workshop is specially organized to teach students how to do research and write papers. CCF also works closely with CIPS, and the two take turns hosting the Language and Intelligence Summit. This summit has effectively promoted the development of the NLP field and enhanced its influence in society.
Of course, there are still problems we need to address: 1) Few international NLP conferences or activities are held in China; 2) Relatively few ACL members come from China; 3) At international NLP conferences, few invited talks, best papers, SIG chairs, workshop chairs, and tutorial speakers come from China; 4) Although China ranks second in number of papers, many of them more or less follow trends set by others. We hope that papers from China will set the trend in the future.
Fourth, the reasons for the rapid rise of NLP in China.
Leifeng.com: What caused the rapid progress of NLP in China?
Zhou Ming: First, the whole country is on the rise, whether in industry and agriculture, the national economy, or comprehensive national strength. Second, our integration with the international community keeps improving. For example, the working language of our NLPCC conference is English, and the general chair, program committee chair, and area chair positions are each held by two co-chairs, one from China and one from abroad. Third, domestic universities and companies have built up a large pool of excellent NLP talent through training and recruitment.
In particular, I would like to mention the contribution of foreign companies and domestic Internet companies to ACL. For example, Microsoft Research Asia has cooperated broadly with many universities in China and across Asia, including training doctoral students and interns through summer schools and joint laboratories, and has trained a large number of NLP talents. In the years up to 2008, Microsoft Research Asia trained as many as 450 interns in the NLP field. These people came from all over the country. After their internships at Microsoft, they returned to their universities and later joined companies or schools and became leading figures, driving the growth of the next wave of talent and continuously advancing the field.
It should be pointed out that large Internet companies such as Baidu, Alibaba, Tencent, JD.com, and Toutiao, as well as many emerging companies (such as Mobvoi, Guo Shuang, Singularity Wit, Mavericks Translation, Spirits, Xinhua Zhiyun, etc.), have also contributed greatly to the development of NLP in China in many ways. On behalf of CCF, I am very grateful to these domestic and foreign enterprises for their contributions to the development and progress of NLP.
Leifeng.com: Japan, South Korea, and other countries started NLP earlier than China. Why are they behind China now?
Zhou Ming: I think there are several factors. The first is that China seized the opportunity of its own Internet development in the Internet age, while many other countries are relatively behind on the Internet (especially in mobile Internet, e-commerce, search, etc.). For example, many countries do not have their own search engines, but China has many, such as Baidu, Sogou, and Microsoft's localized search engine Bing. Search engines play a great role in advancing natural language processing, because their demand for query understanding, document understanding, question answering, and translation drives the development of the related NLP technologies. At the same time, their great economic value has attracted many people to invest in research and industrialization in this field. A country without a search engine will naturally fall behind in NLP.
Another factor is data. China has the largest volume of data in the world, with more than 800 million mobile Internet users and a great deal of e-commerce data, which helps the development of research and technology.
The third is the role of the government. A country's position in the world economic chain shapes its position in the era of the Internet and mobile Internet, and especially in the current era of artificial intelligence. As the world's second-largest economy by GDP, China caught up with the trend in the Internet era, and in the mobile Internet era it has even led the trend. The Chinese government has formulated plans to support and guide the development of technology and industry. It is therefore predicted that in the era of artificial intelligence, China will surpass other countries and become the leading country in artificial intelligence. AI-related research, including NLP, will advance accordingly.
Leifeng.com: Apart from China and the United States, which countries are doing well in NLP?
Zhou Ming: Judging by ACL, the United States, China, Britain, Germany, Japan, South Korea, and Canada each have their own strengths. In the United Kingdom, the University of Edinburgh and the University of Oxford do distinctive work in natural language research.
NLP is also developing well in Canada. Although relatively few people work on natural language there, and Beijing alone has far more NLP practitioners than all of Canada, Canada has put forward many world-leading methods, such as neural machine translation and new methods for machine reading comprehension. China would do well to learn from its theoretical innovation.
Fifth, how to become a powerful NLP country
Leifeng.com: How should China improve its NLP research and applications next?
Zhou Ming: It depends on several aspects.
First, I think we should seize the opportunities of China's development. 1) Digital transformation. China now emphasizes digital transformation, and all enterprises and industries should go digital. Only with digitalization can there be artificial intelligence. However, many enterprises have not yet done digitalization well, so there are many opportunities here. 2) The AI boom. The AI boom has driven market and investment demand, and talent and data have developed further. This is a very good opportunity, and everyone working in NLP should ride the trend.
Second, we must do a good job of outreach. Although many colleges and universities in China work on NLP, many of them are still relatively behind and do not know enough about the latest technology. Many colleges and universities (especially in the west) have a weak foundation, so outreach is necessary. The CCF committee has a special working group called "Joining the College Group". In response to CCF's call, our natural language researchers have also gone into universities. We have visited many universities (such as Tibet University) to teach artificial intelligence, the development of natural language processing, and the latest technology, and have called on more students to study artificial intelligence and natural language.
Leifeng.com note: For representative AI articles on our WeChat official account (ID: AItechTalk), readership from the western region is consistently in the double digits (or even single digits). To some extent, this also reflects the distribution of AI practitioners in China.
Third, attract and cultivate top talent. To begin with, attract top international talent to China, let them learn about China's development through visits for conferences or collaboration, and strengthen exchanges with domestic universities and enterprises. Ultimately, I hope some of them will be drawn by domestic opportunities and stay. Beyond that, it is even more important to cultivate more outstanding talent with a solid theoretical foundation and rich practical experience, including high-level leading figures, through university degree programs and company internships.
Fourth, promote the internationalization of research in China. This includes NLPCC, which is run by CCF. In the past few years it has been held in China. In the future, we will also consider holding the conference in Singapore, Japan, South Korea, and even the United States, to bring our local research to the world and, in particular, to lead the international trend in the field of Chinese computing.
Fifth, strengthen innovation. This includes: 1) developing unsupervised machine learning algorithms, using context and user profiles to improve NLP task modeling, and integrating knowledge and data to improve the capability of NLP systems; 2) opening up new interdisciplinary fields, such as the intersection of NLP with images and video, along with in-depth study of the wide application of NLP in important vertical fields; 3) product innovation that combines software and hardware with specific scenarios to improve the user experience.
Sixth, we should attach importance to data, tools, and evaluation. CCF and our committee have set up a data working group to share data for use, training, and evaluation. For example, NLPCC 2017 attracted many schools and companies to take part in evaluation tasks on word semantic relation recognition, short text classification, single-document summarization, question answering, and user profiling.
Seventh, promote closer cooperation among industry, universities, and research institutes. Through CCF and other platforms, we will attract people from industry to join our research, and promote both the industrial development of companies and the academic development of universities through various forms of cooperation.
Finally, China should consider exerting greater influence in international conferences and organizations. This includes organizing and hosting world-class conferences, and striving for positions such as executive committee member, general chair, program committee chair, and area chair in world-class societies, so as to give full play to China's influence.
It should be pointed out that although natural language processing in China is developing well, we still face many difficulties. Overcoming them requires the continued efforts of the government, schools, research institutions, companies, relevant associations, and people from all walks of life. In particular, by strengthening theoretical innovation and exploring new opportunities in interdisciplinary and vertical fields, we can gradually move from follower to leader. I believe that if these measures are implemented well, China's NLP will advance steadily toward higher goals and eventually rank among the world's top.