The word count in HowNet's duplicate check is inconsistent with the word count of the paper.
Generally speaking, the word count reported by HowNet's duplicate checking system may differ from the word count of the paper for the following reasons:

1 Different counting methods: HowNet's duplicate checking system counts every character in the paper, including spaces and punctuation marks. The figure that authors read from the Word document usually covers only the plain text and may leave out spaces and punctuation marks; Word reports character counts both with and without spaces, so the result also depends on which figure is quoted. This alone can lead to a difference in the word count.

2 Differences in formatting and typesetting: When processing a document, HowNet's duplicate checking system strips formatting and layout information and keeps only the plain text for checking. The Word document of the thesis may contain various formatting and typesetting, such as fonts, font sizes, line spacing, and paragraph spacing. How this content is handled during extraction can affect the word count.

3 Handling of Chinese, English, and symbols: HowNet's duplicate checking system may treat mixed Chinese-English text and symbols differently from Word. For example, Word counts an English word as a single unit, while HowNet's duplicate checking system may split an English word into individual letters and count each one.

4 Citations and references: HowNet's duplicate checking system usually includes the references and footnotes cited in the paper. This content may not be included in the total word count shown for the paper's Word document.
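
The counting differences described in points 1 and 3 can be illustrated with a short Python sketch. It is not the algorithm used by HowNet or by Word; the function names, the sample sentence, and the choice of character ranges are assumptions made only to show how three plausible counting conventions give three different totals for the same text.

```python
import re
import unicodedata

# Assumption for this sketch: only the basic CJK Unified Ideographs block
# is treated as "Chinese characters".
_WORD_STYLE_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def count_all_characters(text: str) -> int:
    """Count every character, including spaces and punctuation."""
    return len(text)


def count_without_spaces_and_punctuation(text: str) -> int:
    """Count characters, skipping whitespace and Unicode punctuation."""
    return sum(
        1
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def count_word_style(text: str) -> int:
    """Count each Chinese character as 1 and each run of Latin
    letters or digits (for example an English word) as 1."""
    return len(_WORD_STYLE_TOKEN.findall(text))


sample = "深度学习 deep learning 模型, 2024。"

print(count_all_characters(sample))
print(count_without_spaces_and_punctuation(sample))
print(count_word_style(sample))
```

For this sample the three conventions give 28, 22, and 9, even though the text is identical. The gap between a per-character count and a count that treats each English word as one unit grows quickly in papers with many English terms.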

It is common for the word count reported by a duplicate check to differ from the actual word count. Legitimate duplicate checking systems such as HowNet, VIP, PaperFree, and PaperTime all count by characters.