Passing a duplicate check is an almost unavoidable step in getting a college graduation thesis approved. There are currently many brands of duplicate checking systems on the market, and many students don't know which one to choose or how to make the most of a check.
First, we need to understand how duplicate checking software works.
Duplicate checking software measures the repetition rate of your paper using several algorithms, including a citation algorithm, fuzzy matching, and a context model. These algorithms are layered and combined to produce an accurate result. Let's look at how each of them works.
First, the citation algorithm.
During a check, quoted material from cited literature is also counted toward the repetition. Whether a passage is flagged depends on a threshold set by the checking system. For example, suppose the threshold is 5% and we check a 1,000-word section. If the system compares the text with its database and finds 50 or fewer suspected words, it will not flag them as duplication. If more than 50 words are suspected of plagiarism, that content is judged to be plagiarized: the system marks it in the paper and counts it as duplicated.
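To make the threshold rule concrete, here is a minimal Python sketch. The function name `is_flagged` and the integer-percentage parameter are assumptions for illustration; real systems do not publish their exact rules, and the 5% figure is simply the example above.

```python
def is_flagged(suspected_words: int, section_words: int, threshold_percent: int = 5) -> bool:
    """Return True if the suspected overlap exceeds the threshold share of the section.

    Hypothetical rule for illustration: with a 5% threshold on a 1,000-word
    section, the cutoff is 50 words, and only overlap above 50 words is flagged.
    """
    if section_words <= 0:
        raise ValueError("the checked section must contain at least one word")
    # Integer arithmetic avoids floating-point rounding right at the cutoff.
    return suspected_words * 100 > section_words * threshold_percent


print(is_flagged(40, 1000))   # False: under the 50-word cutoff
print(is_flagged(50, 1000))   # False: at the cutoff, not above it
print(is_flagged(120, 1000))  # True: above the cutoff
```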
The check also depends on how the reference list is formatted. Only when references are formatted correctly can the system recognize them and exclude them from detection; otherwise, they are treated as ordinary text and counted as duplication, which raises the repetition rate considerably.
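The idea that only well-formatted references are recognized can be sketched with a simple pattern match. The pattern below is a hypothetical, much-simplified stand-in loosely modeled on numbered entries with document-type marks such as [J] (journal article) or [M] (book), as used in the Chinese GB/T 7714 citation standard; the actual recognition rules of any checking system are not public.

```python
import re

# Hypothetical, simplified pattern: "[n] Authors. Title[X]. ..." where X is a
# single-letter document-type code such as J (journal) or M (book).
REFERENCE_PATTERN = re.compile(r"^\[\d+\]\s+[^.]+\.\s+.+\[[A-Z]\]")


def split_references(lines):
    """Separate recognized reference entries from text that will still be checked."""
    recognized, checked = [], []
    for line in lines:
        if REFERENCE_PATTERN.match(line.strip()):
            recognized.append(line)  # recognized as a reference: excluded from detection
        else:
            checked.append(line)     # not recognized: compared like ordinary body text
    return recognized, checked


refs = [
    "[1] Zhang San. A study of text similarity[J]. Journal of Examples, 2020, 12(3): 45-50.",
    "Zhang San, A study of text similarity, Journal of Examples 2020",  # malformed entry
]
recognized, checked = split_references(refs)
print(len(recognized), len(checked))  # 1 1
```

The malformed second entry falls through to `checked`, which mirrors the point above: a badly formatted reference is compared like body text and can add to the repetition rate.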
Second, segmented data comparison.
Every duplicate checking system maintains a large database of documents, and many also compare submissions against internet sources. When we upload a complete paper, the system automatically splits the content into parts and compares each part with its database.
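A rough way to picture segmented comparison: split the submission into parts and measure, for each part, how many of its overlapping three-word sequences (shingles) also appear in the database. The part size, the shingle length, and the tiny in-memory database are assumptions for this sketch; real systems use far larger indexes and their own segmentation rules.

```python
def shingles(words, n=3):
    """Return the set of overlapping n-word sequences in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def split_into_parts(text, part_size=50):
    """Split text into consecutive parts of at most part_size words."""
    words = text.split()
    return [words[i:i + part_size] for i in range(0, len(words), part_size)]


def compare_parts(text, database, part_size=50, n=3):
    """For each part, return the fraction of its shingles found in the database."""
    db_shingles = set()
    for doc in database:
        db_shingles |= shingles(doc.split(), n)
    scores = []
    for part in split_into_parts(text, part_size):
        part_shingles = shingles(part, n)
        if not part_shingles:
            scores.append(0.0)  # part too short to form a single shingle
            continue
        scores.append(len(part_shingles & db_shingles) / len(part_shingles))
    return scores


database = ["the quick brown fox jumps over the lazy dog"]
text = ("the quick brown fox jumps over the lazy dog "
        "and then an entirely new sentence follows here")
print(compare_parts(text, database, part_size=9))  # [1.0, 0.0]
```

Because each part is scored separately, adding or deleting text shifts which words land in which part, which is one reason two checks of a revised paper can mark different passages.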
Content the system identifies as plagiarized is marked in detail. Because detection works part by part, revising the paper between two checks changes what each part contains, so the flagged passages can differ from one check to the next. This is why students often find that a second check marks new passages in red that were not flagged the first time.
Third, the context model.
This model treats each part of an article as an independent chunk and analyzes it within its surrounding context, which allows it to detect citations, repetition, and incoherence in the paper effectively.
There are two main ways to implement the context model:
1. Based on Chinese word segmentation: the system measures how similar a sentence is across different contexts to judge whether it contains duplicate content.
2. Also based on word segmentation: the system uses the contextual information of the language to statistically assess the relationships and similarity between different chunks of a sentence.
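The toy sketch below illustrates judging similarity in context: it compares a sentence with a candidate source sentence both on its own and together with its neighboring sentences, then blends the two scores. Whitespace tokenization stands in for real Chinese word segmentation (a segmenter such as jieba would be needed for Chinese text), and the cosine measure, the one-sentence window, and the 0.5 weight are assumptions, not any vendor's method.

```python
import math
from collections import Counter


def segment(sentence):
    """Stand-in for word segmentation: lowercase whitespace tokenization."""
    return sentence.lower().split()


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two word lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    sq_a = sum(v * v for v in ca.values())
    sq_b = sum(v * v for v in cb.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0
    return dot / math.sqrt(sq_a * sq_b)


def context_window(sentences, i, radius=1):
    """Words of sentence i plus its neighbors within `radius` sentences."""
    start, end = max(0, i - radius), min(len(sentences), i + radius + 1)
    words = []
    for s in sentences[start:end]:
        words.extend(segment(s))
    return words


def contextual_similarity(paper, source, i, j, weight=0.5):
    """Blend sentence-level and context-level similarity (the weight is assumed)."""
    own = cosine(segment(paper[i]), segment(source[j]))
    ctx = cosine(context_window(paper, i), context_window(source, j))
    return weight * own + (1 - weight) * ctx


paper = ["data were collected", "the model was trained", "results were reported"]
unrelated = ["cats sleep often", "dogs bark loudly", "birds sing early"]
print(contextual_similarity(paper, paper, 1, 1))      # 1.0
print(contextual_similarity(paper, unrelated, 1, 1))  # 0.0
```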
Fourth, fuzzy matching.
With fuzzy matching, a word or phrase can be matched loosely across different contexts. Repetition is no longer limited to identical, continuous runs of words: content is judged as repeated as soon as its similarity to a source reaches a certain level. As a result, we may find that many articles with different wording and sources are still flagged as repetitive, which is normal.
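Fuzzy matching can be illustrated with Python's standard `difflib.SequenceMatcher`, which scores how similar two sequences are. Here the passages are compared word by word, and the 0.8 threshold is an assumed value chosen for illustration, not a figure from any checking system.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(a, b):
    """Similarity in [0, 1] between two passages, compared word by word."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def is_fuzzy_duplicate(a, b, threshold=0.8):
    """Treat passages as repeated once similarity reaches the assumed threshold."""
    return fuzzy_ratio(a, b) >= threshold


source = "the experiment shows that the new method clearly improves accuracy"
reworded = "the experiment shows that the proposed method clearly improves accuracy"
unrelated = "students should format their reference list carefully before submitting"

print(fuzzy_ratio(source, reworded))          # 0.9
print(is_fuzzy_duplicate(source, reworded))   # True
print(is_fuzzy_duplicate(source, unrelated))  # False
```

Replacing one word out of ten still yields a ratio of 0.9, so the reworded sentence is flagged even though it is not an exact copy.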
These algorithms may look complicated, but combined they can achieve high accuracy. A paragraph may occasionally be flagged even though its meaning differs considerably from the source document, but such misjudgments have become much less common as artificial intelligence techniques continue to improve.
Fifth, calculating the repetition rate.
After applying the algorithms above, the system counts the repeated words in each part and in the full text, and then computes: Repetition rate = (repeated words / total words) × 100%. This gives a repetition rate for each part as well as a total for the whole paper.
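The formula translates directly into code. This sketch computes the rate for each part and for the full text; the function names, chapter names, and word counts are hypothetical.

```python
def repetition_rate(repeated_words, total_words):
    """Repetition rate = repeated words / total words * 100%."""
    if total_words == 0:
        return 0.0
    # Multiply first so exact cases such as 120 / 2000 give exactly 6.0.
    return repeated_words * 100 / total_words


def rate_report(parts):
    """Per-part and full-text rates from {part name: (repeated words, total words)}."""
    rates = {name: repetition_rate(rep, total) for name, (rep, total) in parts.items()}
    all_repeated = sum(rep for rep, _ in parts.values())
    all_words = sum(total for _, total in parts.values())
    rates["Full text"] = repetition_rate(all_repeated, all_words)
    return rates


# Hypothetical word counts, for illustration only.
parts = {
    "Chapter 1": (120, 2000),
    "Chapter 2": (300, 3000),
    "Chapter 3": (80, 5000),
}
for name, rate in rate_report(parts).items():
    print(f"{name}: {rate:.1f}%")
# Chapter 1: 6.0%
# Chapter 2: 10.0%
# Chapter 3: 1.6%
# Full text: 5.0%
```

The full-text rate comes from the summed word counts (500 / 10,000 = 5.0%), not from averaging the per-part rates, which would give about 5.9%.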
This lets us see at a glance which content is repeated and how high the repetition rate is. If the rate is high, the paper needs substantial revision, after which it can be checked again. If the rate is low, you can hand the paper to your supervisor. Once everything is in order, you can submit it to the school's final-draft system for the official check.
Although duplicate checking systems all work on basically the same principles, they differ in their specific thresholds and algorithms. CNKI (China National Knowledge Infrastructure) is the most widely used and authoritative duplicate checking system in China. Once the paper is finalized, students should first check it with the CNKI system themselves to make sure there are no problems, and then submit it to the school's CNKI check so that it passes the school's review smoothly.