The HowNet (CNKI) duplicate checking system was developed by Tsinghua Tongfang and is now regarded as an authoritative detection system in China. It has the largest and most comprehensive comparison database, so its results tend to be relatively strict. It is also used by many universities and journal publishers.
Principle and rules of duplicate checking in HowNet: After a paper is uploaded, the system generates a table of contents from the uploaded text, automatically detects the paper's chapter information, and checks the paper chapter by chapter. The cover, abstract, research purpose and first chapter are checked separately, and each part receives its own repetition rate.
If 13 consecutive words are similar to an existing source, the passage is marked in red, which indicates serious repetition in that paragraph, and the system automatically calculates the repetition rate of that part. When checking is complete, the system lists indicators such as the repetition rate, citation rate and total word count in the duplicate checking report. This whole process is the principle behind paper duplicate checking.
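The 13-word rule can be illustrated with a small sketch. This is not HowNet's actual algorithm, which is not public. It assumes exact matching of token windows instead of the system's similarity matching, and the function names `find_repeated_spans` and `repetition_rate` are hypothetical. For Chinese text, the tokens would typically be characters.

```python
def find_repeated_spans(paper_tokens, source_tokens, window=13):
    """Flag every paper token that lies inside a run of `window`
    consecutive tokens that also appears in the source.

    Assumption: exact matching only; a real checker also catches
    near matches.
    """
    # Collect every run of `window` consecutive tokens in the source.
    source_windows = {
        tuple(source_tokens[i:i + window])
        for i in range(len(source_tokens) - window + 1)
    }
    flagged = [False] * len(paper_tokens)
    # Slide the same window over the paper and mark any run found in the source.
    for i in range(len(paper_tokens) - window + 1):
        if tuple(paper_tokens[i:i + window]) in source_windows:
            for j in range(i, i + window):
                flagged[j] = True
    return flagged


def repetition_rate(flagged):
    """Share of tokens that were flagged as repeated."""
    return sum(flagged) / len(flagged) if flagged else 0.0
```

With a 20-token paper that contains one 13-token run copied from a source, 13 tokens are flagged and the rate for that part is 13 / 20. A copied run of only 12 tokens is not flagged in this sketch.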
Characteristics of the duplicate checking system
1. Fuzzy detection: When the HowNet duplicate checking system finds that a sentence in your paper is suspected of plagiarism, it automatically applies fuzzy recognition to the text before and after that sentence. The algorithm is strict, so repeated content that has merely had a few adverbs added will still be detected. Only substantially rewriting the repeated parts will keep them from being detected.
2. Sensitive threshold: The HowNet system sets the sensitive threshold at 5% and calculates it per paragraph. For example, if a large section of 5000 words is checked and fewer than 250 of those words (5% of 5000) come from a single cited document, that document will not be judged as duplication. Therefore, when students later revise a paper to reduce its repetition rate, it is best not to quote one document repeatedly but to draw on several documents. If only a small amount is taken from each document, none of them is likely to reach the threshold.
3. Influence of format: The format of the paper can also affect the repetition rate. If the paper is uploaded as a PDF, the system must convert the PDF to Word before checking. This conversion step may scramble the formatting of the table of contents and the references, so the system may treat these two parts as body text and include them in the check, which raises the repetition rate.
4. Table of contents impact: After the paper is uploaded, the system automatically detects its chapter information from the generated table of contents and then divides the paper into the corresponding chapters for checking. The report can therefore show the repetition rate of each major chapter, and the table of contents itself is not included in the body text that is checked.
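The sensitive threshold described in item 2 comes down to simple arithmetic, sketched below. The function name `sources_over_threshold` and its inputs are hypothetical, and treating a source that sits exactly at the limit as flagged is an assumption. The text only says that fewer than 250 words out of 5000 is not judged as duplication.

```python
def sources_over_threshold(section_words, words_per_source, threshold_percent=5):
    """Return the sources whose matched word count reaches the threshold.

    section_words: total words in the section being checked.
    words_per_source: mapping of source name to words matched from it.
    Assumption: a count exactly at the limit is treated as flagged.
    """
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return {
        name: count
        for name, count in words_per_source.items()
        if count * 100 >= section_words * threshold_percent
    }
```

For a 5000-word section the limit is 250 words, so a source contributing 249 words passes while one contributing 250 is flagged in this sketch.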