1. Digitization
The process of converting analog signals into digital signals using computer technology.
2. Digitization of paper documents
The process of capturing paper documents with digital devices such as scanners or digital cameras and converting them into computer-readable digital images or digital text, stored on carriers such as magnetic tapes, magnetic disks and optical discs.
3. Digital image
An array of integers representing a physical image. It is a sampled and quantized function of two or more dimensions, derived from a continuous image of the same dimensionality: a continuous function is sampled on a rectangular (or other) grid, and the value at each sampling point is quantized.
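As an illustration of this definition, the following minimal Python sketch samples a continuous function on a rectangular grid and quantizes each sample to an integer. The function name `sample_and_quantize`, the brightness ramp and the choice of 256 levels are assumptions made for the example, not part of this specification.

```python
def sample_and_quantize(f, width, height, levels=256):
    """Sample f(x, y) on a width x height grid and quantize each sample.

    f is assumed to return a brightness in [0.0, 1.0] for x, y in [0.0, 1.0).
    The result is a list of rows of integers in [0, levels - 1].
    """
    image = []
    for row in range(height):
        y = (row + 0.5) / height              # sample at the pixel center
        line = []
        for col in range(width):
            x = (col + 0.5) / width
            value = min(max(f(x, y), 0.0), 1.0)                # clamp
            line.append(min(int(value * levels), levels - 1))  # quantize
        image.append(line)
    return image


def ramp(x, y):
    """Hypothetical continuous image: brightness rises from left to right."""
    return x


digital = sample_and_quantize(ramp, width=4, height=2, levels=256)
# digital == [[32, 96, 160, 224], [32, 96, 160, 224]]
```

Calling the same routine with `levels=2` yields an array with only two values, which is the black-and-white binary case.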
4. Black-and-white binary image
A digital image with only two gray levels, black and white. It corresponds to originals such as black-and-white text manuscripts and line drawings.
5. Continuous-tone still image
A static digital image with more than two gray levels, or with different shades in each of its color channels. In the digitization of paper documents, such images are usually produced in one of two modes: grayscale scanning or color scanning.
6. Resolution
The number of dots or pixels an image contains per unit length, usually expressed in dots per inch (dpi).
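Because resolution is a count per unit length, the pixel dimensions of a scan follow directly from the physical page size. The sketch below works this out for an A4 page (210 mm × 297 mm); the function name `pixel_dimensions` is an assumption made for the example.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for a page scanned at the given dpi."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


a4_at_200 = pixel_dimensions(210, 297, 200)
# a4_at_200 == (1654, 2339)
```

Doubling the dpi doubles each dimension and therefore quadruples the number of pixels, which is why resolution has a strong effect on storage requirements.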
7. Distortion
The degree to which the digital image deviates from the original document in color and geometry after digital conversion.
8. Intelligibility
The ability of digital images to provide information to people or machines.
9. Image compression
Any process that removes redundancy from an image or approximates the image, with the aim of representing it in a more compact form.
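To make "removing redundancy" concrete, the following sketch applies run-length encoding to one row of a binary image. Run-length encoding is chosen here only because it is easy to follow; it is an illustrative technique, not a format prescribed by this specification, and the function names are assumptions.

```python
def rle_encode(row):
    """Encode a row of pixel values as [value, run_length] pairs."""
    runs = []
    for pixel in row:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return runs


def rle_decode(runs):
    """Rebuild the original row from [value, run_length] pairs."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


row = [0] * 12 + [1] * 3 + [0] * 9    # 24 pixels, mostly background
encoded = rle_encode(row)
# encoded == [[0, 12], [1, 3], [0, 9]]
```

The 24 pixel values collapse into 3 runs, and decoding restores the row exactly, so this is a lossless case. A process that approximates the image instead would trade some fidelity for a smaller result.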
Second, basic requirements for digitizing paper archives
1. Basic principles
The basic principle of digitizing paper archives is to make archival information resources available accurately, conveniently and quickly, and to share those resources that may be made public, so as to meet society's needs for archive use.
2. Principles for selecting objects to digitize
The objects to be digitized should be identified according to defined principles and methods; only paper archives that meet the requirements below may be digitized.
1) Principle of compliance with national laws and regulations
The digitization of paper archives must comply with national regulations on the opening of archives and with other relevant regulations.
2) Value principle
Archives that fall within the scope of filing, are to be preserved permanently or long term, and have high social use value can be included in the scope of digitization.
3. Basic stages
The basic stages of digitizing paper archives mainly include: archive arrangement, catalog database creation, archive scanning, image processing, image storage, data quality inspection, data linking, data acceptance, data backup and management of results.
4. Process management
1) Security and confidentiality management must be strengthened at every stage of paper archive digitization to ensure the safety of both the original archives and the digitized archival information.
2) Every stage of paper archive digitization should be registered in detail, and the registration forms should be sorted, summarized and bound into volumes promptly, so that complete and standardized records are established as the digitization work is completed.
Third, archive arrangement
Before scanning, the archives should be arranged as archive management requires, following the steps below, and marked where necessary to ensure the quality of digitization.
1. Catalog data preparation
2. Removal of bindings
3. Separation of items to be scanned from items not to be scanned
4. Page repair
5. Arrangement registration
6. Binding
Fourth, archive scanning
1. Scanning method
1) Depending on the page size of the archives (A4, A3, A0, etc.), select a scanner of the corresponding specification or a specialized scanner (for example, engineering drawings can be scanned with an A0 drawing scanner). Large-format archives can be scanned on a large-format digitizing platform, or microfilmed and then scanned with film digitization equipment; alternatively, they can be scanned in smaller sections and the resulting images stitched together.
2) Archives whose paper is in poor condition, or that are too thin, too soft or too thick, should be scanned on a flatbed scanner. Archives whose paper is in good condition can be scanned on a high-speed scanner to improve efficiency.
2. Scanning color mode
1) Scanning color modes generally include black-and-white binary, grayscale and color. The black-and-white binary mode is usually used.
2) Archives with black-and-white pages, clear handwriting and no illustrations can be scanned in black-and-white binary mode.
3) Archives whose pages are black and white but have poor legibility or contain illustrations, and archives with multi-colored pages, can be scanned in grayscale mode.
4) Archives whose pages carry a red letterhead or seals, or contain black-and-white photographs, color photographs or color illustrations, can be scanned in color mode as required.
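The selection rules in items 2) to 4) can be sketched as a small decision function. Everything named here (`PageTraits`, its fields, `choose_color_mode`) is hypothetical, and the fallback to grayscale when color is possible but not required is an assumption of this sketch rather than a rule stated above.

```python
from dataclasses import dataclass


@dataclass
class PageTraits:
    # All field names are hypothetical, chosen for this sketch.
    multi_color: bool = False
    clear_text: bool = True
    has_illustrations: bool = False
    has_red_header_or_seal: bool = False
    has_photos: bool = False
    has_color_illustrations: bool = False
    color_required: bool = False   # the "as required" judgment in item 4)


def choose_color_mode(page):
    """Return 'binary', 'grayscale' or 'color' for one page."""
    color_candidate = (page.has_red_header_or_seal or page.has_photos
                       or page.has_color_illustrations)
    if color_candidate and page.color_required:
        return "color"                       # item 4)
    if (page.multi_color or not page.clear_text
            or page.has_illustrations or color_candidate):
        return "grayscale"                   # item 3), plus assumed fallback
    return "binary"                          # item 2), the usual case


plain_page = choose_color_mode(PageTraits())
faded_page = choose_color_mode(PageTraits(clear_text=False))
sealed_page = choose_color_mode(
    PageTraits(has_red_header_or_seal=True, color_required=True))
```

In practice the "as required" decision is made by the operator or the project rules; the sketch only records it as a flag.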
3. Scanning resolution
1) The scanning resolution should be chosen so that the scanned image is clear and complete and its usability is not impaired.
2) When archives are scanned in black-and-white binary, grayscale or color mode, a resolution of ≥ 100 dpi is generally recommended. In special cases, such as small text, dense text or poor legibility, the resolution can be increased appropriately.
3) For archives that require OCR recognition of Chinese characters, a scanning resolution of ≥ 200 dpi is recommended.
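A scanning job could check its settings against these recommended lower bounds before it starts. The sketch below encodes only the two thresholds given above; the function names are assumptions, and the "special cases" in item 2) are left to the operator because no figure is given for them.

```python
def recommended_minimum_dpi(needs_ocr):
    """Lower bound from items 2) and 3): 200 dpi for OCR, otherwise 100 dpi."""
    return 200 if needs_ocr else 100


def meets_recommendation(dpi, needs_ocr=False):
    """True if the chosen resolution is at or above the recommended minimum."""
    return dpi >= recommended_minimum_dpi(needs_ocr)


ok_for_viewing = meets_recommendation(150)                # True
ok_for_ocr = meets_recommendation(150, needs_ocr=True)    # False
```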
4. Scanning registration
Carefully fill in the handover registration form for the paper archive digitization process: record the number of pages scanned, and check whether the number of pages actually scanned for each document matches the number recorded during archive arrangement. Where they differ, the specific reason and the action taken shall be noted.
Fifth, image processing
1. Image data quality inspection
1) Check each image for skew, clarity and distortion. Images that do not meet the image quality requirements must be reprocessed.
2) If a scanned image is incomplete or cannot be clearly read because of improper operation, the page should be rescanned.
3) If any page was missed during scanning, scan it promptly and insert the image in the correct position.
4) If the order of the scanned images does not match the original archive, it should be corrected promptly.
5) Carefully fill in the relevant forms, recording the quality inspection results and the handling decisions.
2. Deskewing
Skewed images should be deskewed until no skew is visually perceptible. Images in the wrong orientation should be rotated back to the orientation that suits normal reading.
3. Despeckling
Impurities on the image page that affect image quality, such as black spots, black lines, black frames and black edges, should be removed. Processing should follow the principle of preserving the original appearance of the archive without impairing intelligibility.
4. Image stitching
The multiple images produced by scanning a large-format archive in sections should be stitched and merged into one complete image to ensure the integrity of the archive's digital image.
5. Cropping
Images scanned in color mode should be cropped to remove superfluous white margins, which effectively reduces image file size and saves storage space.
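A minimal sketch of margin removal follows, assuming the image is held as rows of (R, G, B) tuples and that a pixel counts as white when every channel is at least 250. Both the representation and the threshold are assumptions of this example; a production workflow would normally use an imaging library.

```python
def is_white(pixel, threshold=250):
    """Treat a pixel as white when all of its channels reach the threshold."""
    return all(channel >= threshold for channel in pixel)


def crop_white_margins(image, threshold=250):
    """Return the smallest sub-image that contains every non-white pixel."""
    rows = [r for r, line in enumerate(image)
            if any(not is_white(p, threshold) for p in line)]
    if not rows:
        return []                                   # blank page
    cols = [c for c in range(len(image[0]))
            if any(not is_white(line[c], threshold) for line in image)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [line[left:right + 1] for line in image[top:bottom + 1]]


W = (255, 255, 255)   # white margin
K = (200, 30, 30)     # content, e.g. part of a red seal
page = [
    [W, W, W, W],
    [W, K, K, W],
    [W, W, K, W],
    [W, W, W, W],
]
cropped = crop_white_margins(page)
# cropped == [[K, K], [W, K]]
```

White pixels inside the content area are kept; only the outer margin is removed, so the content itself is unchanged.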
Sixth, image storage
1. Storage format
1) Image files scanned in black-and-white binary mode are usually stored in TIFF (G4) format. Files scanned in grayscale or color mode are usually stored in JPEG format. The compression ratio should be chosen to make the stored file as small as possible while keeping the scanned image legible.
2) Scanned images provided for online retrieval can also be stored in CEB, PDF or other formats.
2. Naming of image files
1) Every document in the paper archive catalog database has a unique file number, and the scanned image file of that document is named after this file number.
2) For a multi-page document, a folder named after the file number can be created, and the image files inside it named in page-number order.
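The naming rule can be sketched as a function that derives image paths from a file number. The sample file numbers, the root folder `images`, the extensions and the four-digit zero-padded page names are all assumptions of this example; the rule above fixes only that names follow the file number and page order.

```python
from pathlib import PurePosixPath


def image_paths(root, file_number, page_count, extension="jpg"):
    """Return the image paths for one document, in page order."""
    if page_count == 1:
        # Single page: the image file itself carries the file number.
        return [PurePosixPath(root) / f"{file_number}.{extension}"]
    # Multi-page: a folder named after the file number, pages numbered inside.
    folder = PurePosixPath(root) / file_number
    return [folder / f"{page:04d}.{extension}"
            for page in range(1, page_count + 1)]


single = [str(p) for p in image_paths("images", "X043-0012", 1, "tif")]
multi = [str(p) for p in image_paths("images", "X043-0013", 2)]
# single == ["images/X043-0012.tif"]
# multi == ["images/X043-0013/0001.jpg", "images/X043-0013/0002.jpg"]
```

Zero-padding keeps the page order intact when file names are sorted as text.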
Seventh, catalog database
1. Data format selection
A general-purpose data format should be chosen for building the catalog database. The chosen format should support data exchange through XML documents, either directly or indirectly.
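As one possible form of exchange through XML, the sketch below serializes a catalog record with Python's standard `xml.etree.ElementTree` module. The element names (`record`, `file_number`, `title`, `pages`) and the sample values are hypothetical and are not taken from any archival description standard.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize one catalog record (a dict) as an XML string."""
    element = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(element, key).text = str(value)
    return ET.tostring(element, encoding="unicode")


def xml_to_record(xml_text):
    """Read the record back as a dict of strings."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


record = {"file_number": "X043-0012", "title": "Annual report", "pages": 3}
xml_text = record_to_xml(record)
restored = xml_to_record(xml_text)
```

Values come back as strings after the round trip, so a receiving system has to know which fields are numeric.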
2. Archival description
The archive catalog database is built in accordance with the Archives Description Rules (DA/T 18).
3. Catalog data quality inspection
Check the quality of the catalog database by manual proofreading or automatic software checks, verifying that the description items are complete and that their contents are standardized and accurate. Data found to be unqualified should be corrected or re-entered.
Eighth, data linking
1. Aggregation and linking
Once the catalog database and image database produced during digitization have been confirmed as "qualified" by quality inspection, they should be loaded promptly onto the data server over the network for aggregation. Custom programs or suitable software can then automatically locate the digital images that belong to each catalog record and add the corresponding electronic address to it, so that linking is done quickly and in batches.
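A batch linking program of the kind described could work roughly as follows. This is a sketch under stated assumptions: catalog records are dicts with a `file_number` key, image paths are plain strings, and an image belongs to a record when either its file name or its parent folder equals the file number. The field name `image_paths` and all sample values are hypothetical.

```python
from pathlib import PurePosixPath


def link_images(records, image_paths):
    """Attach image paths to catalog records; return (linked, unmatched)."""
    known = {r["file_number"] for r in records}
    found = {number: [] for number in known}
    unmatched = []
    for path in sorted(image_paths):
        p = PurePosixPath(path)
        if p.parent.name in known:        # page inside a per-document folder
            found[p.parent.name].append(path)
        elif p.stem in known:             # single image named after the number
            found[p.stem].append(path)
        else:
            unmatched.append(path)
    linked = [dict(r, image_paths=found[r["file_number"]]) for r in records]
    return linked, unmatched


records = [
    {"file_number": "X043-0012", "title": "Notice"},
    {"file_number": "X043-0013", "title": "Annual report"},
]
paths = [
    "images/X043-0013/0002.jpg",
    "images/X043-0013/0001.jpg",
    "images/X043-0012.tif",
    "images/stray.jpg",
]
linked, unmatched = link_images(records, paths)
```

Images that match no record are returned separately instead of being dropped, so they can be investigated before the link is accepted.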
2. Data association
Using the paper archive catalog database as the basis, the one or more images scanned from each paper document are stored as image files. When image files are placed in their folders, check carefully that the name of each image file matches the file number in the catalog database, that the number of image pages matches the page count recorded for the document in the catalog database, and that the total number of image files matches the total recorded in the catalog database. Because each image file name and the corresponding file number in the catalog database are consistent and unique, a one-to-one correspondence is established, which makes batch linking between the catalog database and the image files possible.
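The consistency checks described here can be automated along the following lines. The sketch assumes one image file per page and takes two mappings from file number to count; the function name, the message wording and the sample data are assumptions made for illustration.

```python
def association_problems(catalog_pages, image_counts):
    """Compare catalog page counts with image file counts per file number.

    catalog_pages: {file_number: pages recorded in the catalog database}
    image_counts:  {file_number: image files found for that number}
    """
    problems = []
    for number in sorted(set(image_counts) - set(catalog_pages)):
        problems.append(f"{number}: image files without a catalog record")
    for number in sorted(set(catalog_pages) - set(image_counts)):
        problems.append(f"{number}: catalog record without image files")
    for number in sorted(set(catalog_pages) & set(image_counts)):
        expected, found = catalog_pages[number], image_counts[number]
        if expected != found:
            problems.append(
                f"{number}: {expected} pages cataloged, {found} image files found")
    return problems


catalog_pages = {"X043-0012": 1, "X043-0013": 2, "X043-0014": 5}
image_counts = {"X043-0012": 1, "X043-0013": 3, "X043-0015": 4}
problems = association_problems(catalog_pages, image_counts)
```

In this sample both sides total 8 pages even though three documents are wrong, which shows why the per-document comparison is needed in addition to the totals. When the list of problems is empty, the totals necessarily agree.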
3. Handover registration
Carefully fill in the handover registration form for the paper archive digitization process: record the number of pages after data association, and check whether the number of pages associated with each document matches the numbers recorded during archive arrangement and scanning. Where they differ, the specific reason and the action taken shall be noted.
Ninth, data acceptance
1. Data sampling
1) Check, by sampling, the overall quality of all data that have completed digital conversion, including the catalog database, the image files and the data links.
2) For the archives of one fonds, the sampling rate at data acceptance shall be no less than 5%.
2. Acceptance criteria
1) A sampled document is marked "unqualified" when the link between its catalog record and its image file is wrong, or when either the catalog record or the image file has quality problems such as incompleteness, illegibility or errors.
2) The digitization of a fonds is deemed "passed" when the qualified rate of the sampling inspection reaches 95% or more.
Qualified rate = number of documents that passed the sampling inspection / total number of documents sampled × 100%.
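The sampling rate and the qualified-rate formula can be worked through as follows. The sketch uses integer arithmetic for the thresholds to avoid floating-point rounding at the 5% and 95% boundaries; the function names and the fonds size of 1234 documents are assumptions made for the example.

```python
def minimum_sample_size(total_documents, percent=5):
    """Smallest whole number of documents that is at least percent% of the total."""
    return (total_documents * percent + 99) // 100


def qualified_rate(passed, sampled):
    """Qualified rate in percent: passed / sampled x 100."""
    return passed / sampled * 100


def acceptance_passed(passed, sampled, threshold_percent=95):
    """True when the qualified rate reaches the threshold (95% inclusive)."""
    return passed * 100 >= threshold_percent * sampled


sample = minimum_sample_size(1234)          # 62 documents
rate = qualified_rate(59, sample)           # about 95.2 percent
result = acceptance_passed(59, sample)      # True
```

With 62 documents sampled, 59 passing documents are enough for acceptance, while 58 (about 93.5%) are not.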
3. Acceptance review
An acceptance conclusion of "passed" takes effect only after it has been reviewed and signed by the leader in charge.
4. Acceptance registration
Carefully fill in the acceptance registration form for paper archive digitization.
Tenth, data backup
1. Backup scope
Complete data that have passed acceptance should be backed up promptly.
2. Backup method
To ensure data security, several kinds of backup media should be used, multiple backup sets can be kept by combining online and offline backups, and attention should be paid to off-site storage.
3. Data check
Backup data should also be checked. The check mainly covers whether the backup data can be opened, whether the data are complete and whether the number of files is correct.
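One way to automate these three checks is to compare the backup against the source folder file by file. In the sketch below, reading each file confirms that it can be opened, a SHA-256 digest comparison stands in for the completeness check, and the lists of missing and extra files cover the file count. The use of SHA-256, the function names and the sample files are assumptions of this example.

```python
import hashlib
import tempfile
from pathlib import Path


def relative_files(root):
    """Map each file's path relative to root onto its full path."""
    root = Path(root)
    return {p.relative_to(root): p for p in root.rglob("*") if p.is_file()}


def file_digest(path):
    """SHA-256 digest of a file's contents (reading it also proves it opens)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_backup(source_dir, backup_dir):
    """Report files that are missing, extra or different in the backup."""
    source, backup = relative_files(source_dir), relative_files(backup_dir)
    common = source.keys() & backup.keys()
    return {
        "missing": sorted(str(r) for r in source.keys() - backup.keys()),
        "extra": sorted(str(r) for r in backup.keys() - source.keys()),
        "corrupt": sorted(str(r) for r in common
                          if file_digest(source[r]) != file_digest(backup[r])),
    }


# Demonstration with throwaway folders standing in for real storage.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "src", Path(tmp) / "dst"
    src.mkdir()
    dst.mkdir()
    (src / "a.tif").write_bytes(b"page one")
    (src / "b.jpg").write_bytes(b"page two")
    (dst / "a.tif").write_bytes(b"page one")
    (dst / "b.jpg").write_bytes(b"page tw0")   # simulated damage
    report = verify_backup(src, dst)
```

A backup is accepted only when all three lists in the report are empty; in the demonstration the damaged copy of `b.jpg` is reported as corrupt.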
4. Backup labeling
After backup, the backup media should be labeled to make them easy to find and manage.
5. Backup registration
Fill in the backup management registration form for paper archive digitization.
Eleventh, management of digitization results
1. The management of the digitization results of paper archives should be strengthened to ensure their safety, integrity and long-term availability.
2. When the digitization results of paper archives are offered for online retrieval and use, the electronic identifier of the producing organization shall be included, and downloadable or non-downloadable data formats shall be used as each case requires.