Data collection, data processing and publishing. Data collection is the first step in making a digital collection. The data can come from many sources, including the Internet, books, newspapers, papers and archives. Collected data should be managed and organized in a unified way so that the subsequent processing is easier.
Data processing is the second step in making a digital collection. It consists mainly of two parts: data cleaning and data annotation. Data cleaning filters the collected data, removes duplicates and converts the data to a uniform format. Its purpose is to reduce redundant and erroneous information so that the data becomes more consistent and accurate.
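The three cleaning operations named above (filtering, removing duplicates, uniform formatting) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: records are taken to be plain text strings, and the function names `normalize` and `clean`, the `min_length` threshold and the case-insensitive duplicate check are hypothetical choices, not something the text prescribes.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Bring one record into a uniform format."""
    text = unicodedata.normalize("NFKC", text)  # unify Unicode variants
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    return text.strip()


def clean(records: list[str], min_length: int = 3) -> list[str]:
    """Filter, deduplicate and uniformly format raw records."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in records:
        text = normalize(raw)
        if len(text) < min_length:  # filter: drop empty or too-short records
            continue
        key = text.casefold()       # assumption: duplicates are compared case-insensitively
        if key in seen:             # remove duplicates, keeping the first occurrence
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


raw_records = ["  Old  Town Map ", "old town map", "", "Harbour photo, 1920"]
print(clean(raw_records))  # ['Old Town Map', 'Harbour photo, 1920']
```

Formatting is applied before the duplicate check on purpose: two records that differ only in spacing or letter case then count as the same record.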
Data annotation builds on data cleaning: the cleaned data is annotated manually or by machine learning, which makes it easier to understand and use.
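As a rough sketch of how manual and automatic annotation might sit side by side, the Python below stores each label together with who assigned it. Everything here is an assumption made for illustration: the `Record` class, the function names and the `KEYWORD_LABELS` table are hypothetical, and the simple keyword lookup only stands in for a trained machine learning model.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    text: str
    # Maps each label to who assigned it: "automatic" or "manual".
    labels: dict[str, str] = field(default_factory=dict)


# Hypothetical keyword rules standing in for a trained model.
KEYWORD_LABELS = {"map": "cartography", "photo": "photography"}


def annotate_automatically(record: Record) -> None:
    """Add a label for every keyword found in the record text."""
    lowered = record.text.casefold()
    for keyword, label in KEYWORD_LABELS.items():
        if keyword in lowered:
            # setdefault keeps an existing (for example manual) entry untouched
            record.labels.setdefault(label, "automatic")


def annotate_manually(record: Record, label: str) -> None:
    """Record a label chosen by a person; it overrides an automatic one."""
    record.labels[label] = "manual"


record = Record("Harbour photo, 1920")
annotate_automatically(record)
annotate_manually(record, "harbour")
print(record.labels)  # {'photography': 'automatic', 'harbour': 'manual'}
```

Keeping the origin of each label makes it possible to review machine-assigned labels later without touching those a person has already confirmed.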