Information extraction supported by integrated system
(a) Remote sensing image processing supported by GIS

GIS is often combined with remote sensing image processing methods to enhance and extract remote sensing information more effectively. This is mainly manifested in two aspects.

On the one hand, GIS serves as an important aid to the visual interpretation of remote sensing images and improves interpretation accuracy. The specific method is to superimpose vector thematic layers that help interpretation (such as geological maps, topographic maps, land use maps, vegetation cover maps and drainage maps) on the image to be interpreted, provided that these vector layers have been registered with the image and share a unified coordinate system. This supports human-computer interactive interpretation, in which the interpretation results are drawn directly on the screen, and it helps in selecting correct training sample areas before supervised classification, which improves classification accuracy. In this application, attention should be paid to the time difference between the thematic layers and the image; vegetation cover, for example, is strongly correlated with the acquisition date. If the time difference is long, changes in conditions such as land use types, buildings and roads should also be fully considered.

On the other hand, with the support of GIS technology, geoscience and other knowledge can participate directly in remote sensing image processing. For example, in the classification of remote sensing images, DEM, NDVI and other data can be used directly as new bands of the image and classified together with the other bands, so that the distribution of this thematic information is reflected in the classification results. The application of expert systems is another result of combining GIS and remote sensing technology.
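The idea of treating DEM and NDVI as additional bands can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a NumPy array of shape (bands, rows, cols), a DEM already registered to the same grid, and hypothetical function names and band positions. NDVI is computed with its usual definition, (NIR - red) / (NIR + red).

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index, (NIR - red) / (NIR + red)."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    # Leave NDVI at 0 where both bands are 0 to avoid dividing by zero.
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)

def stack_with_ancillary(image, dem, red_index, nir_index):
    """Append NDVI and a co-registered DEM to the image as two extra bands.

    image: array of shape (bands, rows, cols); dem: array of shape (rows, cols).
    """
    if dem.shape != image.shape[1:]:
        raise ValueError("DEM must be registered to the image grid")
    extra = np.stack([ndvi(image[red_index], image[nir_index]),
                      dem.astype(np.float64)])
    return np.concatenate([image.astype(np.float64), extra], axis=0)

# Hypothetical 3-band, 2x2 image (band 1 = red, band 2 = near infrared).
image = np.array([[[10, 20], [30, 40]],
                  [[50, 50], [0, 20]],
                  [[150, 50], [0, 60]]])
dem = np.array([[100.0, 120.0], [90.0, 300.0]])
stacked = stack_with_ancillary(image, dem, red_index=1, nir_index=2)
```

The stacked array can then be passed to any per-pixel classifier. In practice the extra bands usually need rescaling so that elevation values do not dominate distance-based classifiers.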

(b) Thematic information extraction supported by GIS

1. Research progress of remote sensing thematic information extraction methods

Remote sensing thematic information extraction obtains information about a specific feature from remote sensing image data; its purpose is to distinguish the thematic targets contained in the image. Classification is also a way of extracting thematic information, but thematic information extraction differs from general image classification: extraction first sets a target and then deliberately carries out target-oriented recognition, whereas classification assigns every pixel in the image to a class. With the improvement of remote sensing technology and the deepening of remote sensing applications, extraction methods have improved steadily and have passed through several stages: visual interpretation, automatic classification, extraction based on spectral features, and extraction based on spectral and spatial features.

Visual interpretation is the original form of image recognition. Image recognition is now developing in two directions: one is the automation of information recognition made possible by computers, and the other is the pursuit of high recognition accuracy based on the nature of remote sensing information transmission. The two have no strict boundary and permeate each other as they develop. Existing automatic classification methods use only the image data; they do not incorporate other content such as geoscience knowledge, and they do not make full use of the knowledge the human brain applies when analyzing images, so they cannot achieve high accuracy. Classification based on knowledge and expert systems improves classification accuracy. Similarly, early thematic information extraction analyzed the spectral characteristics of specific targets, formulated rules and applied them to the image. The emergence of artificial intelligence makes knowledge-based thematic information extraction possible. Remote sensing imaging is a many-to-few mapping and is a deterministic process, whereas image interpretation is a few-to-many mapping and is an uncertain process. Remote sensing interpretation therefore involves substantial geoscience processing, in two respects: one is to supplement the information that remote sensing does not bring back, that is, geoscience-related auxiliary information; the other is to infer, through geoscience analysis of the image information, information that is not reflected in the image, which requires strong support from geoscience knowledge. How to express quantitatively the knowledge that geoscientists use in visual interpretation, bring it into computer processing, and thus achieve automatic extraction with knowledge participation is the focus of current research on automatic thematic information extraction.

Before automatic classification by computer, training on the training areas is in fact a statistical process, and it applies only to the image at hand. The statistical results are then used for regression to build a class decision model that is broadly suitable for that image. In thematic information extraction, there is usually a remote sensing information model first, which is then modified repeatedly according to the actual situation of a specific image; in essence, the model parameters are adjusted until the model fits the image. A remote sensing information model is an inversion model of ground objects derived from ground experiments, but the response of ground objects in satellite images does not correspond one-to-one with data measured on the ground. For many reasons the image data are highly variable, which raises the problem of spectral radiometric correction. It is therefore necessary to combine remote sensing information theory effectively with actual images when extracting thematic information.

2. Remote sensing geological thematic information extraction

Remote sensing satellites are now as numerous as stars in the sky, and the volume of remote sensing data is unprecedented. However, the utilization rate of remote sensing information is extremely low, because methods and models for extracting remote sensing thematic information are lacking. Compared with remote sensing information extraction of land use/land cover, remote sensing geological information extraction is more difficult. Generally speaking, there are three main ways to extract remote sensing information: visual interpretation, extraction based on classification, and extraction based on knowledge discovery. The same three ways apply to remote sensing geological thematic information.

(1) Visual interpretation and extraction

In the early days, information was extracted from remote sensing images mainly by visual interpretation. Visual interpretation can make comprehensive use of image features such as tone or color, shape, size, shadow, texture, pattern, location and layout of ground objects, together with expert knowledge of those objects, and can combine them with non-remote-sensing data for comprehensive analysis and logical reasoning. It can therefore achieve fairly accurate thematic information extraction, especially for ground objects with strong textural and structural characteristics. It is currently the technique used in operational production and has obvious advantages over traditional non-remote-sensing methods. Although it is labor-intensive and time-consuming, it will remain in use for a long time in remote sensing geological information extraction, because automatic extraction of such information by computer is difficult.

(2) Automatic extraction of remote sensing information based on classification method.

In the automatic extraction of remote sensing information, the classification method has the longest research history. Its core is the segmentation of remote sensing images, and its methods include supervised classification and unsupervised classification. Unsupervised classification includes the K-means method, the dynamic clustering method, the fuzzy clustering method and the artificial neural network method. Supervised classification includes the minimum distance method, the maximum likelihood method, the fuzzy classification method and the artificial neural network method.

The maximum likelihood method needs prior knowledge of all classes and their probabilities, and in particular it assumes that every class follows a normal distribution, so it is a parametric classifier. When the prior probabilities are known and the classes are normally distributed, the classification result is good and the classifier is fast.

Fuzzy classification is a classifier based on fuzzy mathematics. It assumes that a pixel is composed of several classes, each with a different degree of membership. When the classifier is trained, the membership degree of each class in the training pixels has to be determined. It needs no prior probabilities for the classes and does not require them to follow a normal distribution, so it is a nonparametric classifier. However, the membership degree of each class in the training pixels is difficult to determine. This method is suitable for extracting sub-pixel information.

The artificial neural network classifier is built with artificial neural network technology, a nonlinear science that has developed rapidly in recent years. It is an artificial intelligence technique that simulates biological neural networks and has been widely used in trend analysis, pattern recognition and the classification of remote sensing images. An artificial neural network is a nonparametric classifier: it needs no prior probabilities for the classes and does not require them to follow a normal distribution. Classification with a trained network is fast, but training the network takes a long time.
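The parametric nature of the maximum likelihood method can be illustrated with a short sketch. It is a minimal version under stated assumptions, not a production classifier: each class is modeled as a multivariate normal distribution whose mean, covariance and prior are estimated from training pixels, and the function names and sample values are hypothetical.

```python
import numpy as np

def train_gaussian_ml(samples, labels):
    """Estimate the mean, covariance and prior of each class from training pixels."""
    model = {}
    for cls in np.unique(labels):
        x = samples[labels == cls]
        model[int(cls)] = (x.mean(axis=0), np.cov(x, rowvar=False), len(x) / len(samples))
    return model

def classify_gaussian_ml(pixels, model):
    """Assign each pixel to the class with the highest Gaussian log-posterior."""
    classes = list(model)
    scores = []
    for cls in classes:
        mean, cov, prior = model[cls]
        diff = pixels - mean
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every pixel to the class mean.
        mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
        scores.append(np.log(prior) - 0.5 * (logdet + mahal))
    return np.array(classes)[np.argmax(scores, axis=0)]

# Hypothetical two-band training pixels for two classes.
samples = np.array([[10, 20], [12, 22], [11, 19], [9, 21],
                    [50, 60], [52, 63], [49, 61], [51, 58]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
model = train_gaussian_ml(samples, labels)
predicted = classify_gaussian_ml(np.array([[11.0, 21.0], [50.0, 60.0]]), model)
```

Training is a single pass of statistics and prediction is a closed-form score per class, which is why the method is fast once the normality assumption and the priors are accepted.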

In unsupervised classification, experts have to interpret and merge the resulting clusters and finally determine their types. In supervised classification, a large number of training sample areas must be selected, which is time-consuming and laborious and directly affects the classification result. Moreover, classification segments the whole image and aims at the highest overall accuracy, so it cannot guarantee the highest accuracy for the particular thematic information that is needed. Classification is based on mathematical statistics, not on analysis of the mechanism of remote sensing information or on knowledge mining, which makes full automation of thematic information extraction difficult. Classification based on spectral characteristics also has difficulty with different objects that share the same spectrum. The knowledge gained in classification is usually neither transferable nor easy to explain: the result is known, but the reason is not. The selection of training samples has to be repeated for every image, at every time and place. The automation of remote sensing information extraction is therefore greatly limited, and thematic information extraction based on knowledge discovery is the most promising alternative direction.

3. Remote sensing thematic information extraction based on knowledge discovery.

Remote sensing thematic information extraction based on knowledge discovery is the development trend of thematic information extraction. Its basic contents are discovering knowledge, applying the knowledge to build an extraction model, and extracting thematic information with remote sensing data and the model. Knowledge discovery from a single remote sensing image covers the spectral characteristics, spatial structure and shape of ground objects and the spatial relationships between objects. Knowledge of spatial structure and shape includes the spatial texture, shape and edge characteristics of ground objects. From multi-temporal remote sensing images, the same knowledge can be found, and knowledge of the dynamic change of ground objects can be discovered as well. Various kinds of related knowledge can also be found in the GIS database. In building the model, the corresponding thematic information extraction model is established from one item, several items or all of the knowledge discovered, as shown in Figure 3-8. When remote sensing data and models are used to extract thematic information, the work should proceed from simple to complex: from a single item of knowledge and a single model to the combined use of multiple items of knowledge and multiple models, and from a single data source to the combined use of multiple data sources.

4. Remote sensing thematic information extraction based on spectral knowledge.

Spectral knowledge of ground objects is the most important knowledge in remote sensing thematic information extraction. Research on the spectral characteristics of ground objects has long been valued in all countries. China has conducted in-depth research on the spectra of ground objects and published books such as Spectrum of Typical Ground Objects in China and Analysis of Their Characteristics and Test and Application of Remote Sensing Reflectance Spectrum. Zhou Chenghu and Du Yunyan established an effective NOAA AVHRR water extraction model on the basis of an analysis of the spectral characteristics of water. According to the spectral characteristics of rice and its background, the extraction models of rice planting area (TM4/TM1, TM4/TM3, TM4/TM2) were established in Jennifer. Helmut Mayer and Carsten Steger discussed the extraction of roads from remote sensing images by analyzing knowledge of the road spectrum. Jinfei Wang, Paul M. Treitz and Philip J. Howarth discussed the extraction of new roads from SPOT PAN images by gradient direction profile analysis and used it to update the road network in an urban GIS database. V. Lacroix and M. Acheroy used a constrained gradient method to extract the corners of houses.

R. Haralick, S. Wang, G. Shapiro and J. B. Campbell discussed the extraction of river networks and their flow directions with a consistent labeling technique. Moller-Jenson proposed a water extraction model using CH4 < 45 and CH5 < 35 of NOAA AVHRR. Jupp et al. proposed extracting water from the TM7 band by thresholding.
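A threshold model of this kind reduces to a per-pixel logical test. The sketch below applies the two conditions quoted above (CH4 < 45 and CH5 < 35) to hypothetical channel arrays; the text does not state the units or scaling of the thresholds, so the sample values are illustrative only and the function name is an assumption.

```python
import numpy as np

def water_mask(ch4, ch5, ch4_max=45, ch5_max=35):
    """Flag pixels whose channel 4 and channel 5 values are both below the thresholds."""
    return (np.asarray(ch4) < ch4_max) & (np.asarray(ch5) < ch5_max)

# Hypothetical 2x2 channel values in the same units as the thresholds.
ch4 = np.array([[40, 50], [44, 45]])
ch5 = np.array([[30, 20], [36, 10]])
mask = water_mask(ch4, ch5)
```

Only the upper-left pixel satisfies both conditions here; the strict inequalities exclude values equal to a threshold.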

Figure 3-8 Remote Sensing Thematic Information Extraction Model Based on Knowledge Discovery

Methods for discovering spectral knowledge include the typical sampling method, the spectral curve method and the spectral profile method. Take a US Landsat image of the Washixia area, Xinjiang, as an example. The image is 512×512 pixels, and the main typical features are exposed rock, snow and shadow. To make use of the spectral knowledge of these features, they are first sampled spectrally; the sampling results are shown in Figure 3-9. These ground objects show obvious spectral differences.

By comparison, it can be found that the spectral characteristics of rock exposed area, snow-covered area and mountain shadow are obviously different:

(1) On the whole, the exposed rock area has the highest reflectance and the shadow the lowest, with snow in between. The exposed rock area is slightly higher than or close to the snow area in the TM1, TM2, TM3 and TM4 bands, but much higher than the snow area in the TM5 and TM7 bands.

(2) The snow-covered area is higher than the shadow in the TM1, TM2, TM3 and TM4 bands, and close to or slightly higher than the shadow in the TM5 and TM7 bands.

Figure 3-9 Spectral Sampling Curve of Typical Ground Objects in Washixia Area

(3) In the TM1 to TM7 bands, the reflectance of the exposed rock area is much higher than that of the shadow area.

(4) In terms of spectral relationships, the reflectance of the shadow area decreases gradually from TM1 to TM7, that is, TM1 > TM2 > TM3 > TM4 > TM5 > TM6 > TM7. The exposed rock area reaches its peak at TM4, that is, TM3 < TM4 > TM5. The spectral relationship of the snow-covered area is also distinct: there is an obvious drop from TM4 to TM5.

Through the above spectral analysis, the extraction models based on spectral knowledge are established for bare rock, snow and shadow respectively:

Snow:

[formula not preserved in the source]

Shadow:

[formula not preserved in the source]

Bare rock:

[formula not preserved in the source]

Snow, shadows and bare rocks can be extracted according to the above model.

Extracting thematic information from spectral knowledge requires that the ground object and the background be spectrally separable, with few cases of different objects sharing the same spectrum, and that the spectrum within the ground object be consistent. When the spectrum within the object is inconsistent, the object can be extracted through the spectra of its characteristic components. When the spectra of those internal components are often confused with the background, the object has to be extracted with the help of other knowledge about it.

5. Extracting thematic information based on texture knowledge of ground objects.

When a ground object has a complex composition and is larger than the spatial resolution of the sensor, its structure and composition can be sensed remotely, and its image shows obvious texture features. When the object has texture features different from those of the background and is difficult to extract from spectral knowledge alone, thematic information has to be extracted with both the spectral knowledge and the texture knowledge of the object. Texture refers to the spatial variation of gray values; it is a pattern formed by texture primitives arranged in different spatial configurations. The spatial configuration of texture primitives can be random, deterministic, probabilistic or functional. Texture can be divided into structural texture and unstructured texture, the latter also known as random texture. In visual interpretation, texture is generally described by coarseness, smoothness, granularity, randomness, directionality, linearity, periodicity and repetitiveness. When texture is used to identify ground objects, the texture of the target has to be compared with that of the surrounding objects. ERDAS IMAGINE provides four main texture algorithms: mean Euclidean distance (first order), variance (second order), skewness (third order) and kurtosis (fourth order). They are calculated as follows:

(1) Mean Euclidean distance method (first order)

$$\text{Mean Euclidean distance} = \frac{\sum_{i,j}\left[\sum_{\lambda}\left(x_{c\lambda}-x_{ij\lambda}\right)^{2}\right]^{1/2}}{n-1}$$

Where $x_{ij\lambda}$ is the digital value of pixel $(i, j)$ in band $\lambda$ of the multi-band image;

$x_{c\lambda}$ is the digital value of the center pixel of the active window in band $\lambda$;

$n$ is the number of pixels in the window.

(2) Variance method (second order)

$$V = \frac{\sum_{i,j}\left(x_{ij}-M\right)^{2}}{n-1}$$

Where $x_{ij}$ is the digital value of pixel $(i, j)$;

$n$ is the number of pixels in the active window;

$M$ is the mean value of the active window.

(3) Skewness (third order)

$$\text{Skewness} = \frac{\left|\sum_{i,j}\left(x_{ij}-M\right)^{3}\right|}{(n-1)\,V^{3/2}}$$

Where $x_{ij}$ is the digital value of pixel $(i, j)$;

$n$ is the number of pixels in the active window;

$M$ is the mean value of the active window;

$V$ is the variance.

(4) Kurtosis (fourth order)

$$\text{Kurtosis} = \frac{\sum_{i,j}\left(x_{ij}-M\right)^{4}}{(n-1)\,V^{2}}$$

Where $x_{ij}$ is the digital value of pixel $(i, j)$;

$n$ is the number of pixels in the active window;

$M$ is the mean value of the active window;

$V$ is the variance.

Another commonly used texture detection method is the co-occurrence matrix method.
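A gray-level co-occurrence matrix counts how often a pair of gray levels occurs at a fixed pixel offset, and texture measures are then derived from it. The sketch below is a plain NumPy illustration with hypothetical function names; the choice of offset, the number of gray levels and the contrast measure are assumptions made for the example, not details given in the text.

```python
import numpy as np

def glcm(image, levels, dr=0, dc=1):
    """Count how often gray level i has gray level j at the offset (dr, dc)."""
    rows, cols = image.shape
    matrix = np.zeros((levels, levels), dtype=np.int64)
    # Visit only pixels whose neighbor at the offset lies inside the image.
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            matrix[image[r, c], image[r + dr, c + dc]] += 1
    return matrix

def contrast(matrix):
    """Contrast of the normalized matrix: sum over i, j of p(i, j) * (i - j)^2."""
    p = matrix / matrix.sum()
    i, j = np.indices(p.shape)
    return float((p * (i - j) ** 2).sum())

# Hypothetical 4x4 image quantized to four gray levels.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
m = glcm(image, levels=4)   # horizontal neighbor one pixel to the right
c = contrast(m)
```

A homogeneous region concentrates counts on the diagonal of the matrix and gives low contrast, while a rough region spreads counts away from the diagonal.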

This project mainly uses the second-order variance method of the ERDAS IMAGINE software to calculate the texture features of images, with a 5×5 moving window. The texture map of the TM image of the Washixia area, Xinjiang, shows that the texture index (second-order variance) of the exposed rock area is higher and the image appears brighter there, while the texture index of the areas without exposed rock is lower and the image is darker. Extracting the exposed rock area with an appropriate threshold gives results that basically agree with the actual situation.
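The moving-window variance and the subsequent thresholding can be sketched with NumPy. This is an approximation of the idea rather than the ERDAS IMAGINE implementation: it assumes the sample variance with divisor n - 1, reflects the image at its edges (the software's edge handling is not described in the text), and uses a hypothetical test image and threshold.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def variance_texture(band, size=5):
    """Sample variance (divisor n - 1) of each pixel's size x size neighborhood."""
    pad = size // 2
    padded = np.pad(band.astype(np.float64), pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))
    return windows.var(axis=(-2, -1), ddof=1)

# Hypothetical 8x8 band: smooth left half, checkerboard (rough) right half.
rr, cc = np.indices((8, 8))
band = np.full((8, 8), 10)
band[:, 4:] = (100 * ((rr + cc) % 2))[:, 4:]

texture = variance_texture(band, size=5)
rough_mask = texture > 1000   # hypothetical threshold chosen for this example
```

The smooth half yields a variance of zero away from the boundary, and the checkerboard half yields a high variance, which is the brightness contrast described for the texture map.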

6. Thematic information extraction based on the knowledge of the shape of ground objects.

Sometimes, objects and backgrounds not only have the same or similar spectral characteristics, but also have similar texture characteristics. In this case, it needs to be further extracted according to the shape knowledge of ground objects. For geological lithology, different lithology often has different spatial characteristics:

(1) intrusive rocks

Intrusive rocks generally have regular planimetric shapes, such as circular, elliptical, lenticular and vein-like forms, and most of them lack bedding features in the image. Large intrusive bodies often show annular and radial drainage patterns, joints or dike swarms in the image.

(2) Sedimentary rocks

Their planimetric shape is strip-like or banded, with obvious bedding features in the image. A group of sedimentary rocks distributed in an orderly way often forms layered image features of different colors.

(3) Metamorphic rocks

The image characteristics of metamorphic rocks are generally related to the composition of the original rock, the addition of new material and the structural changes during metamorphism. Orthometamorphic rocks have image characteristics similar to those of magmatic rocks; parametamorphic rocks have image characteristics similar to those of sedimentary rocks.

There are three kinds of methods for discovering knowledge of the shape of ground objects: methods based on perimeter and area, methods based on area, and methods based on area and major-axis length.

Method based on perimeter and area

Shape index:

$$K = \frac{\sqrt{A}}{P}$$

For a circle, K is greater than 0.25; for a square, K is equal to 0.25; and for a rectangle, K is less than 0.25. Linear objects, such as roads, airports and rivers, have smaller K values. For irregular objects, the more complex the shape, the smaller the K.
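The K values quoted above (exactly 0.25 for a square, larger for a circle, smaller for a rectangle) are consistent with the form K = sqrt(A) / P. That form is an assumption inferred from those values, and the following sketch simply checks it on three simple shapes.

```python
import math

def shape_index(area, perimeter):
    """Shape index K = sqrt(A) / P (assumed form, see the lead-in)."""
    return math.sqrt(area) / perimeter

circle = shape_index(math.pi * 1.0 ** 2, 2 * math.pi * 1.0)   # unit-radius circle
square = shape_index(1.0, 4.0)                                # unit square
rectangle = shape_index(4.0 * 1.0, 2 * (4.0 + 1.0))           # 4 x 1 rectangle
```

Under this form the circle gives about 0.282, the square 0.25 and the 4 × 1 rectangle 0.2, and K keeps falling as a shape becomes more elongated, which matches the statement about linear objects.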

Roundness:

[formula not preserved in the source]

Compactness:

[formula not preserved in the source]

Thinness ratio:

[formula not preserved in the source]

Method based on area

Compactness index:

[formula not preserved in the source]

Method based on area and major-axis length

Form ratio:

[formula not preserved in the source]

Elliptic index:

[formula not preserved in the source]

In all the above formulas:

A is the area of the feature;

AC is the area of the minimum circumscribed circle;

P is the perimeter of the feature;

L is the length of the major axis.

Shape knowledge can be used either to locate and identify features or only to confirm their identity. In the first case, the boundaries between the features to be extracted are enhanced first and the shape index is then determined, so that the features are both located and identified. In the second case, shape knowledge mainly serves to further confirm the attributes of thematic information that has already been extracted.

(c) Comprehensive analysis of information from multiple sources with the support of geographic information system.

In the extraction of remote sensing geological thematic information, a large amount of related data besides remote sensing data, such as geological maps and geophysical and geochemical data, is generally used as well. Using these data involves two steps: the first is to mine knowledge from them; the second is to use this knowledge to link the graphic data with the remote sensing images in support of thematic information extraction. This knowledge consists of positively correlated and negatively correlated knowledge, each of which can be further divided into deterministic knowledge and probabilistic knowledge.

In the 21st century, satellite remote sensing will provide multi-spectral, multi-temporal, multi-resolution and all-weather earth observation data for geoscience research and will promote the wider and deeper application of remote sensing. Over the past twenty to thirty years, however, although geoscience thinking has guided the direction of remote sensing technology, the level of remote sensing application has lagged behind the development of space remote sensing technology. Most notably, the remote sensing data sent back by satellites are not fully used, and because information extraction lags behind, the rich knowledge hidden in the data is far from being fully mined, which wastes remote sensing information resources on a large scale and lowers their application value. The capability and efficiency of information extraction will therefore be one of the outstanding problems facing remote sensing applications in the future.

The theories and techniques of data mining (DM) and knowledge discovery in databases (KDD) appeared in the late 1980s and have developed rapidly in recent years. They are the product of combining artificial intelligence, machine learning and database technology. They differ from simple retrieval and querying in a database management system: they emphasize the nontrivial process of "discovering hidden, previously unknown and potentially useful information from databases" and "identifying valid patterns in data that are novel, potentially useful and ultimately understandable", and their purpose is to transform a large amount of raw data into valuable knowledge. This is exactly the bottleneck of satellite remote sensing information processing now and in the future. Drawing on the theory and technology of data mining and knowledge discovery will help to resolve the contradiction between the rapid growth of remote sensing data and the difficulty people have in processing and understanding it.

1. Spatial data mining and knowledge discovery

KDD and data mining technology emerged and developed for two reasons: on the one hand, data and databases are expanding rapidly; on the other hand, the use of databases is still at the stage of query and retrieval, and the rich knowledge hidden in them is far from being fully mined. The massive growth of databases contrasts sharply with the difficulty people have in processing and understanding them. The term KDD first appeared at a workshop of the 11th International Joint Conference on Artificial Intelligence, held in Detroit, USA, in August 1989, and KDD workshops continued to be held afterwards. As the number of participants grew, an international KDD conference has been held every year since 1995. Besides theoretical research, a considerable number of KDD products and application systems have appeared and have achieved a degree of success in practice.

According to Fayyad's definition, KDD is "the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data". The general KDD process (Figure 3-10) includes data preparation, data mining, and interpretation and evaluation of the results.

Figure 3-10 KDD process diagram

Data preparation includes data selection, data preprocessing and data transformation. The purpose of data selection is to determine the object of the discovery task, that is, the target data, a set of data extracted from the original database according to the needs of the user. The purpose of data preprocessing is to remove noise and similar defects. When the object of data mining is a data warehouse, data selection and preprocessing have generally been completed when the warehouse was built. The main purpose of data transformation is to reduce the dimension of the data, that is, to find the really useful features among the initial ones, so as to reduce the number of features or variables that have to be considered in data mining.
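The dimension reduction step of data transformation can be illustrated with principal component analysis. PCA is used here only as one common example of such a transformation; the text does not name a specific technique, and the function name and sample values are hypothetical.

```python
import numpy as np

def reduce_dimension(data, n_components):
    """Project samples onto the leading principal components.

    data: array of shape (samples, features). Returns the projected data and
    the fraction of total variance explained by each retained component.
    """
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular ** 2 / np.sum(singular ** 2)
    return centered @ vt[:n_components].T, explained[:n_components]

# Hypothetical samples with three features; the third is twice the first and
# the second is constant, so one component carries all the variance.
data = np.array([[1.0, 0.0, 2.0],
                 [2.0, 0.0, 4.0],
                 [3.0, 0.0, 6.0],
                 [4.0, 0.0, 8.0]])
reduced, explained = reduce_dimension(data, n_components=1)
```

Three correlated input features collapse to a single variable without losing variance in this example, which is the reduction in the number of features that data transformation aims for.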

In the data mining stage, the task or purpose of the mining must be determined first, and then the mining algorithm is chosen. The same task can be carried out with different algorithms, and two factors have to be considered in the choice: first, different data have different characteristics and need suitable algorithms; second, the requirements of the user or of the operational system, such as the preference between accuracy and understandability.

2. The main types and methods of data mining and knowledge discovery.

The data mining of general statistical database is the earliest and most mature. Generally speaking, data mining and knowledge discovery can be divided into the following types (Fayyad, 1997):

(1) Classification: a learning function that maps data items to one or several defined classes.

(2) Regression: a learning function that maps data items to real-valued predictive variables.

(3) Clustering: a method to find limited categories to describe data sets.

(4) Summarization (or generalization): finding a compact description of the common properties of each data subset.

(5) Dependency modeling: finding patterns that describe the significant dependencies between variables.

(6) Detection of changes and deviations: discovering significant changes in the data compared with previous data.

At present, a large number of new methods and combinations of various methods have appeared in the research of data mining and knowledge discovery, among which the famous methods are:

(1) ID3 and C4.5 methods based on decision tree classification.

(2) AQ15 and CN2 generalization methods.

(3) The rough set method for handling imprecise and uncertain knowledge.

(4) A large number of artificial neural network methods, such as the classical back propagation (BP) algorithm, the self-organizing map (SOM) and adaptive resonance theory (ART).

(5) Bayesian probabilistic network learning methods.

(6) The Apriori method for generating association rules.

Data mining and knowledge discovery are a research focus abroad, both for artificial intelligence scholars and for database experts. Their work covers many fields, such as medicine, machine learning, artificial intelligence, mathematics and marketing, and has yielded a great deal of useful knowledge. So far, few institutions in China are engaged in this research, and applying KDD and data mining technology to satellite remote sensing information processing is still a brand-new subject.

3. Data mining and knowledge discovery in remote sensing images.

As a kind of database, a satellite remote sensing database can naturally draw on general data mining and KDD technology to process and identify the information stored in it. As a special kind of database, an image database, it differs in information content from general relational and transaction databases and contains rich temporal, spectral and spatial information. Data mining and knowledge discovery in this kind of database should therefore have their own processes and methods.

According to the technical flow chart of DM and KDD (Figure 3-11) and considering the particularity of satellite remote sensing data, the Chinese Academy of Sciences put forward a theoretical and technical framework of satellite remote sensing data mining and knowledge discovery for geological applications. In this framework, data mining plays an extremely important role. It includes phase selection, application-oriented preprocessing, feature analysis, information identification and knowledge interpretation of remote sensing data. In practice, many remote sensing users ignore the special function of this process and directly take the interpretation results of the original remote sensing images as the basis of application (although human knowledge is also added during interpretation), so the knowledge obtained is often superficial and inaccurate. Only when the spectral, spatial and temporal characteristics of the original data are fully considered in remote sensing data mining can valuable, accurate and high-level knowledge discovery for remote sensing applications be achieved.

Figure 3-11 Data mining and knowledge discovery for satellite remote sensing data