Criteria for Duplicate Checking
The criteria for duplicate checking are as follows:

Duplicate checking is a comparison operation between texts or articles, used to determine whether an article contains plagiarized material. Concretely, it operates on text fingerprints or hash values generated by a given algorithm.

By checking whether the similarity between the fingerprints or hash values of two texts reaches a certain threshold, the tool judges whether the articles are similar or one plagiarizes the other; this is a form of digital feature comparison.
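The fingerprint-and-threshold idea above can be sketched as follows. This is a minimal illustration, not the algorithm of any particular tool: the `fingerprints` and `jaccard` helper names, the 5-word shingle size, and the use of MD5 are all assumptions chosen for brevity.

```python
import hashlib

def fingerprints(text, n=5):
    # Hash each n-word "shingle" of the text into a set of fingerprints.
    # (n=5 is an illustrative choice, not a standard value.)
    words = text.lower().split()
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles}

def jaccard(a, b):
    # Fraction of fingerprints shared by the two documents.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
doc2 = "the quick brown fox jumps over the sleepy cat near the river bank"
sim = jaccard(fingerprints(doc1), fingerprints(doc2))
print(round(sim, 2))  # → 0.2
```

A real checker would then compare `sim` against a preset threshold to decide whether to flag the pair for review.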

In the modern networked environment, many universities and institutions use large-scale duplicate checking software, and these tools share similar principles and methods.

Usually, a duplicate checking tool converts the original content into a numeric feature vector, typically based on word frequency: the article is divided into paragraphs, the word frequencies of each paragraph are counted to produce a per-paragraph feature vector, and these are combined into a feature vector for the whole article.
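The steps above can be sketched in a few lines. This is a simplified model under stated assumptions: the function names, the use of raw term counts (rather than weighted frequencies), and cosine similarity as the comparison measure are illustrative choices, not a description of any specific product.

```python
from collections import Counter
import math

def word_freq_vector(paragraph):
    # Term-frequency feature vector for one paragraph.
    return Counter(paragraph.lower().split())

def article_vector(article):
    # Combine per-paragraph vectors into a whole-article vector.
    total = Counter()
    for paragraph in article.split("\n\n"):
        total += word_freq_vector(paragraph)
    return total

def cosine(u, v):
    # Cosine similarity between two sparse frequency vectors.
    dot = sum(u[w] * v[w] for w in u)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

a = article_vector("plagiarism detection compares texts\n\ntexts are split into paragraphs")
b = article_vector("plagiarism detection compares documents\n\ndocuments are split into sections")
print(round(cosine(a, b), 2))  # → 0.55
```

The same `cosine` function works at either granularity, which is why tools can report both per-paragraph repetition rates and an overall article similarity.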

In addition, some duplicate checking tools compute not only these numeric feature vectors but also features based on semantics, syntax, and formatting, which describe an article more comprehensively and yield more accurate results.

Generally speaking, the criteria used in duplicate checking fall into the following categories:

1. Repetition rate of local paragraphs and overall similarity of the whole article. The comparison tool typically checks in turn whether core paragraphs or keywords in the text are reused elsewhere.

If the repetition rate of these targets exceeds a preset threshold, they may be flagged as plagiarism. In addition, the repetition rate of the whole article can be measured by computing the similarity between the two articles.
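Criterion 1 above amounts to a per-paragraph threshold test, which can be sketched as follows. The `paragraph_overlap` name, the word-set overlap measure, and the 0.5 threshold are illustrative assumptions; real tools use more robust similarity measures and tuned thresholds.

```python
def paragraph_overlap(para, source_paras, threshold=0.5):
    # Flag a paragraph whose word overlap with any source paragraph
    # meets or exceeds the threshold (0.5 is an arbitrary example value).
    words = set(para.lower().split())
    for src in source_paras:
        src_words = set(src.lower().split())
        if words and len(words & src_words) / len(words) >= threshold:
            return True
    return False

submission = [
    "duplicate checking compares text fingerprints against a corpus",
    "this paragraph is entirely original and shares nothing",
]
corpus = ["duplicate checking compares text fingerprints against a large corpus"]
flags = [paragraph_overlap(p, corpus) for p in submission]
print(flags)  # → [True, False]
```

Only the first paragraph is flagged, illustrating how a checker can localize suspected plagiarism to specific passages rather than scoring the article as a whole.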

2. Comparison between old and new versions. When a paper is checked, it may also be compared against earlier articles by the same author or on the same topic. In this way, overlapping content between an author's old articles and a new paper is not mistaken for plagiarism.

3. Exclusion of references. When duplicate checking is carried out, the reference list and content properly quoted from other papers or works should be excluded, so that the result is more accurate and reliable.

4. Detection within a date range. An article compared against documents written much earlier or later may spuriously match past or future articles with similar content. Therefore, duplicate checking should use an appropriate cutoff date and compare only against documents within that range to obtain more accurate results.

Generally speaking, the standard of duplicate checking is a comparison based on feature vectors: after an article is digitized, plagiarism can be judged by comparing the similarity between different texts. In practice this process can produce errors, so mature and effective duplicate checking tools should be chosen to improve the accuracy of the judgment.