Strictly speaking, it is the repetition rate of the article. Whenever wording in the paper duplicates material in the HowNet database, the system judges it to be similar and counts it toward the "Total Text Copy Rate", which is 25.6% in the figure below. This figure includes the similarity that comes from quoted content.
There is also a second indicator, "copy rate excluding cited documents: 16.9%", that is, the total text copy rate minus the cited portion, which gives the actual similarity rate once citations are removed. What, then, is the actual citation rate? It is 25.6% - 16.9% = 8.7%. In other words, the content that the author quoted in quotation marks accounts for a similarity rate of 8.7%.
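The subtraction above can be written as a small helper. This is only a sketch of the arithmetic described in the text, not part of the HowNet system; the function name `citation_rate` and its parameters are hypothetical, and `Decimal` is used so that the percentages subtract exactly.

```python
from decimal import Decimal


def citation_rate(total_copy_rate: str, rate_excluding_citations: str) -> Decimal:
    """Return the share of similarity that comes from quoted content.

    Both arguments are percentages given as strings, e.g. "25.6".
    The names are illustrative; they do not come from the report itself.
    """
    total = Decimal(total_copy_rate)
    excluding = Decimal(rate_excluding_citations)
    if excluding > total:
        # The rate with citations removed can never exceed the total rate.
        raise ValueError("rate excluding citations cannot exceed the total copy rate")
    return total - excluding


# Figures taken from the example in the text.
print(citation_rate("25.6", "16.9"))  # 8.7
```

With the figures from the report, the total of 25.6% splits into 16.9% of similarity that remains after citations are removed and 8.7% that is attributable to quoted content.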
HowNet's duplicate checking system adopts the latest semantic-level detection technology, so the concept of "how many consecutive characters must overlap to count as plagiarism" no longer applies. The claim that "13 consecutive overlapping characters will be marked in red" still circulates on the Internet, but it is out of date. When the system identifies repeated and quoted content, it judges passages that reach a certain semantic level in combination with their context, not just on the basis of a few characters, words or single sentences.
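For contrast, the outdated rule of thumb can be sketched as a plain check on the longest run of consecutive shared characters. This is a toy illustration of the popular saying only, not HowNet's algorithm; the function names and the choice to count characters with a default threshold of 13 are assumptions made for the example.

```python
def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest run of consecutive characters found in both strings."""
    best = 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous character of a
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended one character earlier in both strings.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def flagged_by_old_rule(a: str, b: str, threshold: int = 13) -> bool:
    """Hypothetical check: flag when the shared run reaches the threshold."""
    return longest_shared_run(a, b) >= threshold


print(longest_shared_run("abcdef", "xxcdeyy"))  # 3
```

A check of this kind looks only at literal character runs, so rewording or reordering a sentence can slip past it, which is the weakness that context-aware, semantic-level detection is meant to address.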
The overall degree of content overlap is calculated by the system according to its algorithm. The system can automatically detect and identify verbatim copying, rewriting, and sentence-order adjustment of document content, and it can quickly locate, dynamically mark and display the matching passages.