Brief Introduction of Mid-infrared Spectroscopy (MIR)
This article is adapted from the summary of my master's thesis.

When FT-MIR detects a specific substance, it produces characteristic wavelengths belonging to that substance, determined by its chemical bonds and functional groups. Research has shown that, when building a model that predicts a substance from many independent variables, choosing the substance's characteristic wavelengths as the independent variables not only improves the accuracy of the model's predictions but also enhances its stability (Leardi et al. 2002, Zou et al. 2010, Vohland et al. 2014). John et al. proposed that feature selection falls into two categories. The first is the filter method, a feature-selection algorithm independent of the predictive model, which measures the importance of each independent variable separately and discards features that contribute almost nothing to the analysis. The second is the wrapper method, in which independent variables are added or removed one by one, a given algorithm is applied to each subset, and the best combination of independent variables is chosen according to the model results (John et al. 1994). Each has advantages and disadvantages: the filter method is fast, but it cannot judge the independent variables by the modelling results; the wrapper method can select variables according to modelling accuracy, but its computational cost is larger than the filter method's and it carries a risk of over-fitting (Saeys et al. 2007). A third, embedded method later appeared, in which feature-wavelength selection is built into the construction of the algorithm itself. It is similar to the wrapper method but cannot be transferred to other algorithms (it works only within the algorithm that performs the selection); its advantage is a lower computational cost than the wrapper method (Saeys et al. 2007).
Therefore, a more reasonable strategy for screening spectral characteristic wavelengths is to narrow the range with a filter method first, and then select the final characteristic wavelengths with a wrapper or embedded method.
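The two-stage strategy above can be sketched in code. This is only an illustration (the thesis does not specify the algorithms): the filter step ranks wavelengths by their absolute Pearson correlation with the analyte, and the wrapper step is a greedy forward selection scored by an ordinary least-squares fit; the function names and parameters are my own.

```python
import numpy as np

def filter_step(X, y, keep=10):
    # Filter method: score each wavelength independently by its absolute
    # Pearson correlation with y, keeping only the top `keep` candidates.
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(-np.abs(r))[:keep]

def wrapper_step(X, y, candidates, max_vars=3):
    # Wrapper method: greedy forward selection; each candidate subset is
    # scored by the residual error of an ordinary least-squares fit.
    selected = []
    for _ in range(max_vars):
        best_j, best_sse = None, np.inf
        for j in candidates:
            if j in selected:
                continue
            cols = selected + [int(j)]
            A = np.column_stack([X[:, cols], np.ones(len(y))])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((A @ coef - y) ** 2))
            if sse < best_sse:
                best_j, best_sse = int(j), sse
        selected.append(best_j)
    return selected
```

In a typical run, the filter might keep the ten wavelengths most correlated with the measured concentration, after which the wrapper picks the best small combination among them, exactly the "narrow first, then refine" order recommended above.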

Qualitative discriminant analysis builds a discriminant model on a training set whose features and categories are both known, and then uses that model to classify and predict new data whose features are known but whose categories are not.

Qualitative discriminant analysis can be divided, according to the discriminant criterion, into Fisher discriminant, distance discriminant and Bayes discriminant analysis. Fisher discriminant analysis projects multidimensional data onto a lower dimension so that the classes are separated as far as possible, and then applies a suitable discriminant rule to classify new samples. Distance discriminant analysis computes the centre of gravity of each known class, then measures the distance from a sample of unknown class to each centre of gravity; the sample is assigned to the class of the nearest one. Bayes discriminant analysis computes posterior probabilities from prior probabilities, and then makes statistical inferences about new data according to the posterior distribution.
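The distance discriminant described above is the easiest of the three to sketch. Below is a minimal nearest-centroid version, assuming Euclidean distance (the text does not name a metric; Mahalanobis distance is a common refinement); the function names are my own.

```python
import numpy as np

def fit_centroids(X, y):
    # Distance discriminant: the "centre of gravity" of each known class
    # is simply the mean feature vector of the samples in that class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_by_distance(centroids, X_new):
    # An unknown sample is assigned to the class whose centroid is nearest.
    labels = list(centroids)
    out = []
    for x in np.atleast_2d(X_new):
        d = [np.linalg.norm(x - centroids[c]) for c in labels]
        out.append(labels[int(np.argmin(d))])
    return out
```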

Quantitative analysis is a regression approach in which some algorithm enables the independent variables to predict the dependent variable accurately. The dependent variable is generally continuous, and the methods are generally divided into linear, generalized linear and nonlinear. The main algorithms include partial least squares (PLS), principal component analysis-linear discriminant analysis (PCA-LDA), decision tree (DT), artificial neural network (ANN), support vector machine (SVM), K-nearest neighbours (KNN), logistic regression (LR) and random forest (RF). For the theory behind these eight algorithms, see machine learning parts 11 and 12.
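Of the eight algorithms listed, KNN is compact enough to sketch in a few lines. This is only an illustrative implementation of the general technique, not code from the thesis: each new sample takes the majority label among its k closest training samples.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_new, k=3):
    # K-nearest-neighbour classification: for each new sample, find the k
    # training samples at smallest Euclidean distance and take a majority
    # vote over their labels.
    preds = []
    for x in np.atleast_2d(X_new):
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = [y_train[i] for i in np.argsort(d)[:k]]
        preds.append(Counter(nearest).most_common(1)[0][0])
    return preds
```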

A confusion matrix can be used to evaluate a qualitative discriminant model; the confusion matrix and its derived parameters are among the simplest and most intuitive evaluation indices. Taking binary classification as an example, the confusion matrix is shown in Table 1-3. The evaluation indices of the discriminant model can be calculated from the confusion matrix.

The parameters derived from it are:
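The parameter list itself does not survive in this copy. As an illustration, the usual binary-classification parameters derived from the four cells of the confusion matrix can be computed as follows (a sketch; the thesis may define a different or larger set):

```python
def binary_confusion(y_true, y_pred, positive=1):
    # Count the four cells of a binary confusion matrix.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def derived_parameters(tp, fp, fn, tn):
    # Common evaluation indices derived from the confusion matrix.
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # recall / true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "precision": tp / (tp + fp),
    }
```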

The main evaluation indices of a quantitative analysis model are the coefficient of determination (R2) and the root mean square error (RMSE). The larger the R2 value (0 ≤ R2 ≤ 1), the better the model; the smaller the RMSE value (RMSE ≥ 0), the better the model.

The calculation formula is as follows:
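The formulas themselves do not survive in this copy. The standard chemometric definitions, reconstructed on the assumption that formula (1) is R2 and formula (2) is RMSE (consistent with how n is used in the surrounding text), are:

```
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \qquad (1)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n}} \qquad (2)
```

where y_i is the measured value, ŷ_i the predicted value, ȳ the mean of the measured values, and n the divisor defined per data set in the text.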

The corresponding R2 and RMSE parameters can be calculated for each data set. On the training set, with n = number of samples - number of principal components - 1 in formula (2), the parameters obtained by modelling on all the data are written R2C (calibration coefficient of determination) and RMSEC (root mean square error of calibration). With n = number of samples - number of samples held out during cross-validation in formula (2), the modelling parameters are written R2CV (cross-validation coefficient of determination) and RMSECV (root mean square error of cross-validation). On the test set, with n = number of samples in formula (2), the parameters obtained from the validation results are written R2V (validation coefficient of determination) and RMSEP (root mean square error of prediction).
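The distinction between RMSEC, RMSECV and RMSEP amounts to changing the divisor n in formula (2). A minimal sketch, with illustrative variable names of my own:

```python
import numpy as np

def rmse(y_true, y_pred, n):
    # Root mean square error with an explicit divisor n, since n differs by
    # data set: calibration uses n = samples - components - 1 (RMSEC),
    # cross-validation uses n = samples - samples held out (RMSECV), and
    # the test set uses n = samples (RMSEP).
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.sum(resid ** 2) / n))

# e.g. for a calibration set of m samples fitted with k principal components:
# rmsec = rmse(y_cal, y_cal_pred, n=m - k - 1)
# and for a test set:
# rmsep = rmse(y_test, y_test_pred, n=len(y_test))
```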