A paper duplicate checking system usually draws on two kinds of database: a self-built database and an external database. The self-built database is the collection of papers gathered by the institution that runs the system, including students' graduation theses and papers from scientific research projects. These papers serve as the reference corpus against which a submitted paper is compared for similar or plagiarized content. The external database consists of papers obtained from other institutions or from the Internet. Because it extends coverage beyond the institution's own collection, it further improves the accuracy and comprehensiveness of the duplicate checking system.
During duplicate checking, the paper to be detected is compared with the papers in the database. The system analyzes the text of these papers at the level of words, sentences, and paragraphs, and gives a corresponding similarity score. If the similarity between the paper to be detected and any paper in the database exceeds the set threshold, the system marks the paper as suspected plagiarism.
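The compare, score, and threshold steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular duplicate checking system works: it swaps in Jaccard similarity over three-word sequences (shingles) as the similarity measure, and the names `shingles`, `jaccard`, `check_paper`, and the default threshold of 0.3 are all hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) found in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        # Text shorter than n words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def check_paper(submission, database, threshold=0.3):
    """Compare a submission with every paper in the database.

    `database` maps a paper id to its text. The threshold value is an
    assumed example, not a recommended setting. Returns (paper_id, score)
    pairs whose score exceeds the threshold, highest score first.
    """
    submitted = shingles(submission)
    flagged = []
    for paper_id, text in database.items():
        score = jaccard(submitted, shingles(text))
        if score > threshold:
            flagged.append((paper_id, score))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

A non-empty result from `check_paper` corresponds to the paper being marked as suspected plagiarism. In this sketch the self-built and external databases would simply be merged into the one `database` mapping.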
To improve the accuracy and efficiency of paper duplicate checking systems, researchers have been continually improving the algorithms and models behind them. The improvements include text similarity algorithms, semantic analysis algorithms, and machine learning algorithms. These methods help the system judge the similarity between papers and identify plagiarism more accurately.
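As one example of what a text similarity algorithm can look like, the sketch below computes cosine similarity between word-count vectors. It is a common textbook measure chosen here for illustration, not a method the text attributes to any specific system, and the function names `term_counts` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count how often each lowercased word occurs in the text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text has no direction to compare.
    return dot / (norm_a * norm_b)
```

This measure is purely lexical: it ignores word order and treats synonyms as unrelated words, so a reworded passage scores low. That limitation is one reason semantic analysis and machine learning methods are studied alongside it.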
In short, the database is a core component of a paper duplicate checking system, which is an important tool for finding and preventing academic misconduct. By building their own databases and drawing on external ones, schools and institutions can monitor academic misconduct more effectively and strengthen academic standards and integrity. As the technology and the algorithms improve, paper duplicate checking systems are likely to be used more widely in the future.