Development of Chinese Input Method
Because there are tens of thousands of Chinese characters, a computer keyboard cannot provide a key for each one. Chinese characters therefore have to be encoded, so that each character is entered with a few keystrokes. The development of Chinese input methods has been a story of "ten thousand codes galloping": thousands of coding schemes have appeared over the past 30 years.

Generally speaking, the earliest Chinese character input methods appeared alongside the personal computer (PC) in the late 1970s or early 1980s. Telegraph codes existed earlier: each Chinese character was represented by four digits from 0 to 9, which was convenient for post offices sending telegrams. Even so, the starting point is usually taken to be 1981, when the National Bureau of Standards published the Basic Set of Chinese Character Coded Character Sets for Information Interchange. In Taiwan, the history of Chinese character input can be traced back to 1976, when Zhu Bangfu invented the Cangjie input method.

The development of Chinese character input methods has two sides: the improvement of input method software, and the emergence of new coding schemes. The former mainly concerns pinyin input, while the latter is where the "ten thousand codes galloping" took place. Most early input method software was paid software, and many enterprises and individuals made money by selling it. Today few input methods charge a fee, and most are free. Pinyin input methods include those of mainland China and those of Taiwan. Pinyin input has a natural advantage over other methods, because every educated Chinese person spends a lot of time learning Hanyu Pinyin or the phonetic symbols (Zhuyin) before learning characters. Pinyin was originally designed to mark the pronunciation of Chinese characters, and it can conveniently double as an input code. Another advantage is that pinyin is close to the spoken language, so users can adapt to it quickly.

However, pinyin input has a fatal weakness: as a code for Chinese characters it has an extremely high duplicate-code rate for single characters, because many characters share the same pinyin, and the rate is high even for phrases. Pinyin can only be made fast with the help of sophisticated input method software, with features such as intelligent candidate ordering, picking a character through a word that contains it, whole-sentence input and cloud input. The history of pinyin input is therefore largely the history of pinyin input method software.
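The duplicate-code problem and the idea of intelligent candidate ordering can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product works: the tiny lexicon, the usage counts, and the names `build_index`, `candidates` and `record_choice` are all assumptions made for the example.

```python
from collections import defaultdict

# Toy lexicon of (character, pinyin, usage count). The counts are invented
# for illustration and do not come from any real corpus.
LEXICON = [
    ("是", "shi", 900),
    ("时", "shi", 500),
    ("事", "shi", 450),
    ("十", "shi", 300),
    ("市", "shi", 200),
    ("马", "ma", 120),
    ("码", "ma", 80),
]


def build_index(lexicon):
    """Group characters by pinyin code, remembering each usage count."""
    index = defaultdict(dict)
    for char, code, count in lexicon:
        index[code][char] = count
    return index


def candidates(index, code):
    """Return the characters sharing a code, most frequently used first."""
    entries = index.get(code, {})
    return sorted(entries, key=lambda char: -entries[char])


def record_choice(index, code, char, weight=1):
    """Raise a character's count after the user picks it (a crude form of learning)."""
    index[code][char] = index[code].get(char, 0) + weight


index = build_index(LEXICON)
print(candidates(index, "shi"))  # five characters compete for a single code
```

With a fixed candidate order the user would page through the same list every time; ordering by usage and updating the counts after each choice is one simple way to put the likeliest character first.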

Pinyin input appeared at the very birth of Chinese input methods, but the software of that time was weak: the candidate order was fixed, phrase and whole-sentence input were not supported, and candidate characters could not even be displayed alongside the code being typed. Users often had to page through many screens of candidates to find the character they wanted, which was very inefficient. Although many people used nothing but pinyin input at that time, most were dissatisfied with its efficiency.

From the 1990s, pinyin input method software began to support phrase input and whole-sentence input. In 1993 the Chinese Star input method software appeared, which could display characters in real time, that is, show Chinese characters while the pinyin was still being typed. Chinese Star had some ingenious designs, such as confirming with the space bar, selecting with the comma and period keys, tolerance for fuzzy pronunciation, and user-defined strings. These features have since become standard in all pinyin input method software. At the beginning of 1993, Zhu Shoutao of Peking University invented the Intelligent ABC input method, which was acquired by Microsoft and built into Windows. Over the next few years, Intelligent ABC became the most popular input method software in mainland China.

The automatic input method software of 1994 and the Dark Horse pinyin input method software of 1996 implemented whole-sentence input (also called sentence input) of Chinese characters. Whole-sentence input can be traced back to the late 1980s, when Wang Xiaolong, a doctoral student at Harbin Institute of Technology, studied Chinese word segmentation, applied to the 863 Program, and wrote the paper "Minimum Word Segmentation Problem and Its Solution", laying a theoretical foundation for whole-sentence pinyin input. Starting with the Chinese version of Windows 95, Microsoft has shipped the built-in Microsoft Pinyin input method, which supports whole-sentence input.
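The idea of splitting a sentence into as few words as possible can be illustrated with a short dynamic-programming sketch in Python. It is not taken from the paper mentioned above and makes several assumptions: the function name `fewest_words`, the toy dictionary, and the rule that any single character may stand alone as a fallback are all invented for the example. A real whole-sentence engine would run a comparable search over the candidate words for a pinyin string rather than over finished text.

```python
def fewest_words(text, dictionary, max_len=4):
    """Split text into the smallest number of dictionary words.

    Any single character is accepted as a fallback word, so a split
    always exists. Returns the words in order.
    """
    n = len(text)
    # best[i] is the smallest number of words that covers text[:i].
    best = [0] + [n + 1] * n
    # back[i] is where the last word of that best split starts.
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            allowed = (i - j == 1) or (piece in dictionary)
            if allowed and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
                back[i] = j
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy dictionary, assumed for illustration only.
DICTIONARY = {"中文", "输入", "输入法"}
print(fewest_words("中文输入法", DICTIONARY))
```

Preferring the split with the fewest words favors long dictionary entries over strings of single characters, which is why the user does not have to tell the system where one word ends and the next begins.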

At that time, however, pinyin input was generally not very intelligent: whole-sentence input was immature and the error rate during input was high. Moreover, Chinese characters could not be displayed in step with the typed pinyin (Microsoft Pinyin lagged one word behind, Autopass lagged a few words behind, and Dark Horse Pinyin showed no characters until final confirmation). Correcting the pinyin and selecting characters during whole-sentence input was inconvenient, which greatly limited the use of whole-sentence pinyin input, so many users continued to use Intelligent ABC. It was not until 1998 that the Pinyin Star software, invented by Tan Yajun, fully supported a "real-time display" mode: however much pinyin is entered, Chinese characters are displayed as each letter is pressed. Users notice mistakes in the pinyin immediately, and because the software supports automatic word segmentation and whole-sentence input, they do not have to decide whether they are typing a word or a sentence; the system handles it. If a word is missing from the lexicon, the system can learn and save it automatically. In 1999 several other pinyin input methods appeared: Pinyin Jiajia, the Free Pinyin input method and the Koala input method. Pinyin Jiajia was the first to support entering English letters directly with the Enter key, without switching input methods.

In the 1990s, double pinyin (shuangpin) input and the corresponding software also developed rapidly. In double pinyin, every syllable is typed with two keys, one for the initial and one for the final. There are many double pinyin schemes. One is the natural code scheme provided by the Natural Code input software, which combines double pinyin with radicals or strokes into a phonetic-shape code; this gives a fast way to input Chinese characters but goes beyond the scope of pinyin input. Strictly speaking, natural code is therefore not pure pinyin but a phonetic-shape code. Microsoft, Pinyin Star, Pinyin Jiajia, Xiaohe Shuangpin and others each provide their own double pinyin scheme.
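The two-keys-per-syllable idea can be sketched as follows. The key table here is a small illustrative assumption and is not claimed to reproduce any of the schemes named above, each of which defines its own complete table; the names `split_syllable` and `encode` are likewise invented, and syllables without an initial (such as "an") are deliberately left out.

```python
# Illustrative key assignments only. Real double pinyin schemes each define
# their own complete tables, and those tables differ from scheme to scheme.
INITIAL_KEYS = {"zh": "v", "ch": "i", "sh": "u"}
FINAL_KEYS = {"ang": "h", "eng": "g", "uang": "l", "ong": "s", "ao": "c", "ian": "m"}

# Two-letter initials come first so that "zh" is not read as "z" plus "h...".
INITIALS = ["zh", "ch", "sh"] + list("bpmfdtnlgkhjqxrzcsyw")


def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    raise ValueError(f"zero-initial syllable not handled in this sketch: {syllable}")


def encode(syllable):
    """Return the two-key code for a syllable under the toy table."""
    initial, final = split_syllable(syllable)
    first = INITIAL_KEYS.get(initial, initial)
    second = FINAL_KEYS.get(final, final)
    if len(second) != 1:
        raise ValueError(f"final {final!r} is missing from the toy table")
    return first + second


for syllable in ["zhuang", "shang", "dong", "ma"]:
    print(syllable, "->", encode(syllable))
```

Under this toy table a six-letter syllable such as "zhuang" needs only two keystrokes, which is the whole appeal of double pinyin; the price is that the user has to memorize the key table.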

In the new century, pinyin input method software matured and entered the era of intelligent pinyin input. The software of this period mainly combined the strengths of its predecessors, offered larger lexicons, and became more intelligent and better at learning.

Intelligent Spelling, released in early 2000, also offered a more intelligent way to input. The Ziguang Pinyin input method was developed on the basis of the Koala input method; it provided a larger lexicon and added intelligent word formation. Users can type a continuous pinyin string of up to 9 characters and the system converts it into Chinese characters automatically. Whether or not the word is in the lexicon, the system proposes character combinations based on word frequency and high-frequency prediction. Ziguang Pinyin eventually became one of users' favorite input methods.

With the rapid development of the Internet, Sohu launched the Sogou Pinyin input method for Windows in June 2006. Sogou Pinyin is a new-generation pinyin input method built on search engine technology. Users can back up their personal lexicon and configuration over the Internet. Sogou Pinyin soon replaced Intelligent ABC as the mainstream Chinese character input method in China.

After Sogou Pinyin came out, Google, Tencent, Baidu and Microsoft launched intelligent pinyin input methods of the same type: the Google Pinyin input method, the QQ Pinyin input method, the Baidu input method and the Bing input method.

With the spread of smartphones and tablets, many IT companies developed pinyin input methods for Android, iPhone and iPad, such as the Baidu, QQ and Sogou mobile input methods. These carry over the features of their desktop counterparts, and the software offers more flexible ways of typing that suit the touch screen.

Phonetic input in Taiwan is mainly based on phonetic symbols (Zhuyin). As in mainland China, the software has kept improving and becoming more intelligent. Unlike mainland China, where pinyin input has always used the 26 English letter keys, Taiwan has no uniform standard for the key layout of phonetic input: layouts range from 40 keys to 30 keys and down to 26 keys. Because phonetic symbols do not correspond one-to-one with the English letters on the keyboard, number keys and symbol keys are often assigned to phonetic symbols as well.

In Hong Kong, the Cantonese Pinyin input method (also known as the Cantonese input method) is popular; it uses the Cantonese pronunciation of Chinese characters to enter them on the computer. Because there is no unified romanization standard for Cantonese, the spellings used by different software are inconsistent, which has hindered the further development and spread of Cantonese pinyin input.

Although pinyin input is simple and easy to learn, the many homophones among Chinese characters give it a high duplicate-code rate. Duplicates remain even when typing phrases, and even with cloud input the intended characters cannot always be entered accurately. For this reason a large number of coding schemes other than pinyin have appeared, mainly shape codes and phonetic-shape codes. These codes tend to have a lower duplicate-code rate than pinyin, and a proficient user can input Chinese characters quickly. The earliest popular shape-code input method in mainland China is the Wubi (five-stroke) input method invented by Wang Yongmin in 1983. The earliest shape-code input method in Taiwan is the Cangjie input method invented by Zhu Bangfu in 1976.

As computers spread in China, the first urgent problem was how to get Chinese characters into them. Pinyin could serve as a code for Chinese characters, but for a long time pinyin input was extremely inefficient. To input Chinese quickly, some people abandoned the English keyboard layout and designed dedicated Chinese keyboards. Some of these had dozens or even hundreds of keys, but none of these schemes achieved simple or fast Chinese input.

This changed in August 1983, when Wang Yongmin introduced the epoch-making Wubi input method. Wubi uses an ordinary computer keyboard and only 25 English letter keys for its codes. It not only made it possible to input Chinese characters, but also largely solved the long-standing problem of input speed. Wubi is a typical shape code: it encodes Chinese characters entirely by their strokes and glyph structure. Three coding schemes emerged during its development, namely the 86 edition, the 98 edition and the New Century edition. As the first popular shape-code input method in China, it was warmly welcomed by users as soon as it was launched. In the 1980s and 1990s, the first task for many people learning to use a computer was to learn Wubi, and Wubi training classes sprang up everywhere.

In the late 1980s another famous shape-code input method appeared: the Zheng code input method (Zhengma), invented by Zheng Yili and his daughter Zheng Long. Compared with Wubi, Zheng code is more standardized and more widely available, because Microsoft built it into Windows 95 as the system's default input method. The built-in Zheng code input method was not removed until Windows 8 in 2012. Shortly after its release, Zheng code was granted patents in China, the United States and Britain and passed national appraisal; it won the Beijing International Invention Gold Award and the Best Invention Award, as well as the 22nd Geneva Invention Gold Award. To handle both traditional and simplified Chinese characters, Zheng code adopts double-radical coding to reduce duplication among radicals. Because radicals and zone codes are looked up by their features, and most of the radicals are standard ones, it is easier to learn.

In the 1980s and 1990s the State Education Commission had not recommended any input method scheme, so primary and secondary schools taught quite a variety of Chinese character input methods, and different schools taught different ones: some taught Wubi, some taught natural code and some taught Xiao code. Wubi is fast and became very widespread in China, but it never became the input method recommended by the State Education Commission, because it is hard to learn and its coding has many unreasonable aspects; for example, its roots do not match the basic components of Chinese characters, and its decomposition violates stroke order.

In the 1990s, the State Education Commission approved a key research project on input methods under the Eighth Five-Year Plan. From August 1 to 3, 1992, the Department of Basic Education of the State Education Commission and its affiliated National Computer Education Research Center for Primary and Secondary Schools held a seminar in Beijing on "National Standards for Chinese Character Coding for Primary and Secondary Schools and Computer Chinese Character Input System". The delegates concluded that computer teaching in primary and secondary schools should mainly use the Hanyu Pinyin scheme as the Chinese character input method, that any shape code should be chosen with special care to avoid "polluting" the language and script, and that forcing non-standard input coding schemes on primary and secondary schools through commercial competition or administrative orders should be resolutely opposed. Two years later, the project research group introduced a shape-code input method named the "Cognitive Code Computer Chinese Character Input System". In 1995 the State Education Commission recommended the Cognitive Code for primary and secondary schools, and it was rolled out to schools across the country.

However, because of its many shortcomings, the Cognitive Code met with great controversy and resistance during its rollout, and many academic journals published articles about it. Its fatal defects included a high duplicate-code rate, complicated coding rules, difficulty of learning, incorrect and non-standard choice of radicals, and an unscientific use of abbreviated codes, so this officially developed code was taken apart by its critics. In the end, the nationwide promotion of the Cognitive Code came to nothing.

While the State Education Commission was striving to develop a standardized, fast and easy-to-learn input method, a better one, the Erbi (two-stroke) input method, had already been born among ordinary users. Erbi is a phonetic-shape code invented by Chen Jinsong in 1992; it forms codes by combining the pinyin initial with strokes (two strokes per key). It did not come to public attention until Guangdong Erbi Software Co., Ltd., established in June 2000, released Erbi input method software. Erbi is not only easy to learn but also as fast as Wubi. Because it is standardized, easy to learn and fast, it passed the evaluation of the Basic Education Curriculum Development Center of the Ministry of Education and was allowed into the basic textbooks of primary and secondary schools. As of the end of 2013, it was the only Chinese character input method approved for those textbooks.

Guangdong Erbi Software Co., Ltd. sold its Erbi software at a high price, while the Intelligent ABC and Wubi input methods widely used at that time (2000 to 2004) were free, so only a few people were willing to try Erbi. By 2004 the company, whose main business was Erbi software, was on the verge of bankruptcy. On the other hand, Erbi's merits attracted many enthusiasts. Some of them maintained Erbi software and at the same time further improved and optimized the coding. Most Erbi input method software can now be obtained and used online for free.

With the arrival of intelligent pinyin input, and especially after the birth of Sogou Pinyin in 2006, shape-code and phonetic-shape-code input methods have received less and less attention, and no state department is any longer involved in developing or promoting input methods. This has not dampened the enthusiasm of the many hobbyists who research input coding schemes. They weigh a scheme from many angles, such as duplicate-code rate, ease of learning, support for large character sets, and keystroke comfort.

Some enthusiasts still want an input method with an ultra-low duplicate-code rate, so they created the code reading input method: among the 6,763 Chinese characters in the GB2312-80 character set, only 14 have duplicate codes. Supporting a large character set requires not only the coding scheme itself but also input method software and a lexicon, which led to Haifeng Wubi, software that covers all of the more than 70,000 Unicode Chinese characters. In terms of ease of learning, however, no coding scheme has yet surpassed Erbi, which manages to be both efficient and easy to learn.
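A duplicate-code figure like the one quoted above can be computed from a code table in several ways. The sketch below uses one possible definition, counting every character whose code is shared with at least one other character; the source does not say which definition it uses. The function name `duplicate_stats` and both toy tables are assumptions, and the shape codes in particular are invented and belong to no real input method.

```python
from collections import Counter


def duplicate_stats(char_to_code):
    """Count characters whose code is shared with at least one other character.

    Returns (number of such characters, their share of the whole table).
    """
    code_counts = Counter(char_to_code.values())
    duplicated = [ch for ch, code in char_to_code.items() if code_counts[code] > 1]
    return len(duplicated), len(duplicated) / len(char_to_code)


# Toneless pinyin for four characters: three of them collide on "shi".
PINYIN_CODES = {"是": "shi", "时": "shi", "事": "shi", "马": "ma"}

# Hypothetical shape codes, invented so that every character is unique.
SHAPE_CODES = {"是": "aa", "时": "ab", "事": "ac", "马": "ad"}

print(duplicate_stats(PINYIN_CODES))
print(duplicate_stats(SHAPE_CODES))
```

Run over a full character set, the same count gives a single number by which coding schemes can be compared.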

In Taiwan, too, there are many coding schemes for Chinese input. In 1976 Zhu Bangfu invented the first shape-code input method, Cangjie. After inventing it, Zhu Bangfu made it freely available to the public, which greatly advanced the localization of computers. As a result, Cangjie is built into the Windows operating system in Taiwan and has become one of the most popular shape-code input methods there. After Cangjie, a number of other shape-code input methods were born; for example, Wang Zanjie invented the Dayi input method and Liao Mingde invented the Array input method. Like Cangjie, these input methods were opened up by their authors, so they too are built into Windows. The most widely used shape-code input method in Taiwan is Boshiamy, invented in the late 1980s by Liu Chongji, a native of Taiwan.

When we speak of input methods, we usually mean those used on a computer or mobile phone keyboard, including pinyin, shape codes and phonetic-shape codes. Besides these common methods there are also voice input, handwriting input and stenography. Their development is closely related to that of ordinary keyboard input methods, but they are independent of ordinary keyboard input technology.

Chinese voice input uses speech recognition technology to convert speech into characters. Typically a Markov model is used for statistical processing, and rule-based methods are used to resolve ambiguity. In the mid to late 1990s, IBM launched ViaVoice, a speaker-independent continuous speech recognition system that led the field at the time. Meanwhile, many researchers working on Chinese speech recognition in China used the knowledge and results gained at research institutes and universities to build large Chinese databases (also called corpora) and launched Putonghua voice input systems. iFlytek has become the largest intelligent speech technology provider in China. On personal computers, Chinese voice input often requires external devices. Today, with the spread of smartphones, many mobile input methods have voice input built in, such as the Baidu mobile input method and iFlytek voice input, so users can conveniently enter text by voice on their phones. Even so, voice input still cannot deliver highly accurate text.
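The role of a Markov model in choosing between homophones can be illustrated with a greatly simplified sketch. It assumes that the syllables have already been recognized correctly and uses a first-order (bigram) model to pick the most probable character sequence; a real recognizer also has to score the audio itself. The function name `viterbi`, the candidate lists and all probabilities are invented for the example.

```python
import math


def viterbi(observations, candidates, start_p, trans_p, floor=1e-6):
    """Return the most probable character sequence for a list of syllables.

    candidates maps each syllable to the characters it may stand for,
    start_p gives the probability of a character starting the sequence,
    trans_p gives the probability of one character following another.
    Unseen events receive a small floor probability.
    """
    # scores[char] = (log-probability of the best path ending in char, path)
    scores = {
        char: (math.log(start_p.get(char, floor)), [char])
        for char in candidates[observations[0]]
    }
    for syllable in observations[1:]:
        new_scores = {}
        for char in candidates[syllable]:
            new_scores[char] = max(
                (logp + math.log(trans_p.get((prev, char), floor)), path + [char])
                for prev, (logp, path) in scores.items()
            )
        scores = new_scores
    return max(scores.values())[1]


# Toy model: every number below is made up for illustration.
CANDIDATES = {"shi": ["是", "时", "事"], "jian": ["间", "见", "件"]}
START_P = {"是": 0.5, "时": 0.3, "事": 0.2}
TRANS_P = {("时", "间"): 0.6, ("事", "件"): 0.5, ("是", "见"): 0.01}

print("".join(viterbi(["shi", "jian"], CANDIDATES, START_P, TRANS_P)))
```

Taken alone, "shi" would be resolved to the most common character, but once the following syllable is considered the model prefers a pair that frequently occurs together. This is the kind of statistical disambiguation the paragraph above describes.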

Besides pinyin input, handwriting is also a common way to input Chinese characters. Handwriting input means writing directly on a handwriting tablet or on the screen of a touch-screen phone. Basically usable handwritten Chinese character input systems have existed since 1997, using pattern recognition based on semantics and syntax. Many handwriting products appeared in the 1990s, such as Hanwang 99 in China and Bi Hui from Motorola. In the following years, however, handwriting was not widely used. Only with the arrival of touch-screen phones, and especially the spread of smartphones and tablets, did handwriting input come into wide use.

Strictly speaking, stenography is not an input coding method; the codes used in stenography are in fact mainly pinyin, shape codes and phonetic-shape codes. Stenography generally serves specific fields, and stenographers are mostly employed by government agencies and the judicial system. These fields demand high writing speed, especially in meetings, where the stenographer types while listening so that the words appear almost as they are spoken. In addition, stenography usually relies on a dedicated keyboard rather than an ordinary one. Yawei stenography, for example, uses the international stenography keyboard.

Yawei stenography is the earliest Chinese stenography technology; it was invented by Tang Yawei in 1993 and is based on pinyin input. It is also the most widely used Chinese stenography technology. Many others have appeared since, such as National stenography, Supersonic stenography, Rapid stenography and Wubi double recording.

Although the coding scheme used in stenography is usually pinyin (a few systems use Wubi or Erbi), the codes are specially combined, and most systems use chording, in which several fingers press several different keys at once on each stroke. This effectively raises keystroke efficiency, breaking through the ordinary keyboard's limit of 200 to 300 characters per minute and reaching speeds of more than 600 characters per minute.