What is the principle of duplicate checking in HowNet? Explain the principle in detail.
Duplicate checking in HowNet rests on two main parts: text comparison and chapter detection.

Text comparison: The HowNet duplicate checking system extracts the text of the paper and compares it against the literature in its database to measure similarity. If the similarity of a passage exceeds a certain threshold, that passage is judged to be duplicate content.
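
The threshold idea can be illustrated with a minimal sketch. The actual algorithm used by HowNet is not described here, so everything below is an assumption made for illustration: the character n-gram ("shingle") representation, the Jaccard similarity measure, the 0.5 threshold, and the function names shingles, similarity and flag_duplicates are all hypothetical.

```python
def shingles(text: str, n: int = 5) -> set[str]:
    """Return the set of overlapping character n-grams in the text.

    Whitespace is removed and case is ignored, so trivial reformatting
    does not change the result. (Assumed preprocessing, not HowNet's.)
    """
    normalized = "".join(text.lower().split())
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def similarity(a: str, b: str, n: int = 5) -> float:
    """Jaccard similarity of the two n-gram sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def flag_duplicates(passage: str, database: dict[str, str],
                    threshold: float = 0.5) -> list[str]:
    """Return the ids of database documents whose similarity to the
    passage reaches the (hypothetical) threshold."""
    return [doc_id for doc_id, text in database.items()
            if similarity(passage, text) >= threshold]


if __name__ == "__main__":
    database = {
        "doc1": "The quick brown fox jumps over the lazy dog.",
        "doc2": "Completely unrelated sentence about thesis formatting.",
    }
    passage = "The quick brown fox jumps over the lazy dog."
    print(flag_duplicates(passage, database))
```

In this sketch only doc1 is flagged, because its text matches the passage exactly while doc2 shares no five-character sequence with it. A real system would work on a far larger database and would likely use more robust matching.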

Chapter detection: The HowNet duplicate checking system also checks the paper chapter by chapter. The content of each chapter is compared separately and a repetition rate is calculated for each chapter. Finally, a weighted average of the chapter repetition rates gives the repetition rate of the whole paper.
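
The weighted average can be sketched as follows. The weights are not specified above, so this example assumes that each chapter is weighted by its length in characters; that choice, and the function name overall_repetition_rate, are illustrative assumptions only.

```python
def overall_repetition_rate(chapters: list[tuple[int, float]]) -> float:
    """Combine per-chapter repetition rates into one rate for the paper.

    Each item is (chapter_length_in_characters, chapter_repetition_rate).
    Weighting by chapter length is an assumption made for this sketch.
    """
    total_length = sum(length for length, _ in chapters)
    if total_length == 0:
        return 0.0
    weighted_sum = sum(length * rate for length, rate in chapters)
    return weighted_sum / total_length


if __name__ == "__main__":
    # Hypothetical paper: a 1000-character chapter with a 10% repetition
    # rate and a 3000-character chapter with a 30% repetition rate.
    chapters = [(1000, 0.10), (3000, 0.30)]
    print(overall_repetition_rate(chapters))
```

With these hypothetical figures the result is (1000 x 0.10 + 3000 x 0.30) / 4000 = 0.25, so the longer chapter pulls the overall rate toward its own 30%.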

In addition, the HowNet duplicate checking system considers the structure and semantics of sentences and paragraphs when detecting text similarity. If a passage quotes heavily from other documents, it will still be judged as repetition even if the quotations are properly marked.

It should be noted that the HowNet duplicate checking system is not completely accurate in detecting text similarity. For example, it cannot recognize non-text content such as pictures, tables and formulas, and it cannot recognize the content of the reference list. In addition, because documents differ in citation style and wording, misjudgments sometimes occur. Therefore, when using the HowNet duplicate checking system, authors need to review the results and handle them according to the content of their own paper and the actual situation.