What is the difference between semantic analysis and text analysis? It's urgent!
1. Semantic analysis is a logical phase of compilation. Its task is to examine the context-dependent properties and types of a syntactically correct source program: it checks the program for semantic errors and collects type information for the code generation phase. One such task is type checking, which verifies that every operator has operands permitted by the language specification; when an operand violates the specification, the compiler should report an error. For example, some compilers report an error when a real number is used as an array subscript. Other language specifications allow operands to be coerced, so when a binary operation is applied to an integer and a real number, the compiler converts the integer to a real number rather than treating the expression as an error in the source program.
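
To make the type-checking step concrete, here is a minimal sketch in Python. The node classes (`Num`, `BinOp`, `Index`), the `check` function, and the rule that subscripts must be integers are all hypothetical, invented for illustration; they are not taken from any real compiler. It assumes a toy language with only two types, "int" and "real".

```python
from dataclasses import dataclass


# Hypothetical mini-AST, for illustration only.
@dataclass
class Num:
    value: object  # a Python int stands for "int", a Python float for "real"


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Index:
    array_name: str
    subscript: object


class SemanticError(Exception):
    """Raised when an expression violates the toy language's type rules."""


def check(node):
    """Return the type name ("int" or "real") of an expression."""
    if isinstance(node, Num):
        return "real" if isinstance(node.value, float) else "int"
    if isinstance(node, BinOp):
        left, right = check(node.left), check(node.right)
        # Implicit coercion: mixing int and real yields real, not an error.
        return "real" if "real" in (left, right) else "int"
    if isinstance(node, Index):
        # Assumed rule: array subscripts must be integers.
        if check(node.subscript) != "int":
            raise SemanticError(
                f"subscript of {node.array_name!r} must be an integer"
            )
        return "int"  # assumption: every array holds ints in this sketch
    raise SemanticError(f"unknown node: {node!r}")
```

With this sketch, `check(BinOp("+", Num(1), Num(2.5)))` returns "real" because the integer operand is coerced, while `check(Index("a", Num(1.5)))` raises `SemanticError`.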

2. Text analysis refers to the representation of text and the selection of its feature items; it is a basic problem in text mining and information retrieval. It quantifies the feature words extracted from a text in order to represent the text's information, transforming unstructured raw text into structured information that a computer can recognize and process. In other words, the text is abstracted into a mathematical model that describes and stands in for it, and the computer works with the text by computing on that model. Because text is unstructured data, mining useful information from a large collection of texts first requires converting the text into a structured form that can be processed. At present, the vector space model is commonly used to represent text as a vector. However, if the feature items produced by word segmentation and word frequency statistics are used directly as the dimensions of the text vector, the vector becomes very high-dimensional. Such an unprocessed vector imposes heavy computational overhead on later steps, making the whole pipeline inefficient, and it also harms the accuracy of classification and clustering algorithms, so the results are unsatisfactory. The text vector therefore needs to be refined further, keeping the most representative features while preserving the original meaning. An effective way to do this is to reduce the dimensionality through feature selection.
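
As a rough illustration of the vector space model and feature selection, the sketch below builds term-frequency vectors and then shrinks the vocabulary. The answer does not name a selection criterion, so document frequency (keep the terms that occur in the most documents) is used here purely as an assumed, simple stand-in. The function names, the whitespace tokenizer, and the sample documents are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def build_vectors(docs, vocabulary):
    """Represent each document as a term-frequency vector over `vocabulary`."""
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocabulary])
    return vectors


def select_features(docs, k):
    """Keep the k terms with the highest document frequency.

    Ties are broken alphabetically so the result is deterministic.
    """
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return ranked[:k]


# Made-up sample documents.
docs = [
    "the compiler checks types",
    "the compiler reports type errors",
    "text mining extracts features from text",
]

# Every distinct term becomes one dimension of the full vectors.
full_vocab = sorted({term for doc in docs for term in tokenize(doc)})
full_vectors = build_vectors(docs, full_vocab)

# After feature selection, only k dimensions remain.
reduced_vocab = select_features(docs, k=3)
reduced_vectors = build_vectors(docs, reduced_vocab)
```

Even on three short documents the full vocabulary has one dimension per distinct word, which is why real corpora produce very large vectors. Document frequency is a crude criterion: a document whose words are all rare ends up as an all-zero vector, so in practice the criterion and the value of `k` have to be chosen with the downstream task in mind.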