How to compress characters and numbers
Classification of multimedia data compression methods
Data compression is essentially an encoding process: the original data are encoded into a more compact form. Decompression is the reverse process, in which the compressed code is restored to the original data. For this reason, data compression methods are also called coding methods. Data compression technology continues to develop, and coding methods suited to various applications keep emerging. Because multimedia data contain different types of redundancy, different compression methods are used to remove them.
Classification by whether the method introduces distortion
This classification depends on whether the decoded data are exactly identical to the original data. On this basis, compression methods fall into two categories: lossy (distortion) coding and lossless (distortionless) coding. Lossy compression reduces the entropy, that is, the amount of information, in the data. The discarded information cannot be recovered, so lossy compression is irreversible. Lossless compression removes or reduces redundancy in the data, but the removed redundancy can be reinserted during decoding, so lossless compression is reversible and introduces no distortion. From an information-theoretic point of view, lossless coding generally means entropy coding: a technique based on average information content (entropy) that treats all data as a sequence of bits and does not tailor the compression to the type or nature of the information being compressed. In other words, entropy coding ignores the content of the information it compresses. In multimedia technology it is generally used to compress text and numeric data, because it guarantees 100% recovery of the original data. However, its compression ratio is relatively low.
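To make the reversibility of lossless coding concrete, the following is a minimal Python sketch of run-length encoding, a simple lossless method. The byte layout (a one-byte run count followed by the byte value, with runs capped at 255) is an assumption chosen for simplicity rather than a standard format, and the function names are hypothetical.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode bytes as (count, value) pairs; counts are capped at 255 so each fits in one byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        # Extend the run while the next byte repeats and the count still fits in a byte.
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes([run, value])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)


if __name__ == "__main__":
    original = b"A" * 8 + b"B" * 4 + b"C" * 10 + b"D"
    encoded = rle_encode(original)
    assert rle_decode(encoded) == original  # lossless: exact recovery
    print(f"{len(original)} bytes -> {len(encoded)} bytes")
```

On the sample input, 23 bytes become 8 bytes (about 2.9:1), and decoding restores every byte. On data without repeated bytes, such as b"ABCD", the output doubles in size, which is why run-length coding is only effective on data containing long runs.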
Lossless methods such as LZW coding, run-length coding and Huffman coding, for example, typically achieve compression ratios of only about 2:1 to 5:1.
Classification by coding principle
By the principle of coding, compression methods include predictive coding, transform coding, statistical coding, analysis-synthesis coding, hybrid coding and other methods. Of these, statistical coding is lossless, while the other methods are generally lossy.
Predictive coding is a compression method aimed at spatial redundancy. The basic idea is to predict the value of the current pixel from the values of neighboring pixels that have already been coded. The prediction is based on a model. If the model is chosen well, only the initial pixels, the model parameters and the small prediction errors need to be stored or transmitted to represent all the data. Depending on the model, predictive coding can be divided into linear prediction, intra-frame prediction and inter-frame prediction.
Transform coding is likewise a method for removing spatial and temporal redundancy. The basic idea is to transform the image intensity matrix (a time-domain signal) into a coefficient space (the frequency domain) and then encode and compress the coefficients. For a signal with strong spatial correlation, the energy in the frequency domain tends to be concentrated in a few specific regions, or the distribution of the coefficient matrix follows a regular pattern. These regularities can be used to allocate quantization bits across the frequency domain, which achieves compression. Because the mapping from the time domain to the frequency domain is always carried out by some transform, the method is called transform coding. Transform coding generally uses orthogonal transforms: the matrix of an orthogonal transform is invertible and its inverse equals its transpose, so decoding is simple and a unique solution is guaranteed.
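The predictive and transform ideas above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a production codec: the predictor is just the previous sample (a DPCM-style scheme, with the first sample predicted as 0), the transform is a 1-D orthonormal DCT-II standing in for the image transforms used in practice, and a single uniform quantization step replaces real bit allocation. All function names are hypothetical.

```python
import math


def dpcm_encode(samples: list[int]) -> list[int]:
    """Predictive coding: keep only the error of predicting each sample from the previous one."""
    residuals = []
    prediction = 0  # assumption: the first sample is predicted as 0, so residuals[0] is the initial value
    for s in samples:
        residuals.append(s - prediction)
        prediction = s
    return residuals


def dpcm_decode(residuals: list[int]) -> list[int]:
    """Invert dpcm_encode by adding each residual to the running prediction."""
    samples = []
    prediction = 0
    for r in residuals:
        prediction += r
        samples.append(prediction)
    return samples


def dct_matrix(n: int) -> list[list[float]]:
    """Build the n x n orthonormal DCT-II matrix; its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def transpose(m: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*m)]


def mat_vec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quantize(coeffs: list[float], step: float) -> list[float]:
    """Uniform quantization (the lossy step): small coefficients collapse to zero."""
    return [round(c / step) * step for c in coeffs]


if __name__ == "__main__":
    signal = [100, 102, 104, 103, 105, 107, 106, 108]
    residuals = dpcm_encode(signal)
    print("DPCM residuals:", residuals)  # small values after the first sample
    assert dpcm_decode(residuals) == signal

    c = dct_matrix(8)
    coeffs = mat_vec(c, signal)
    q = quantize(coeffs, step=4.0)
    restored = mat_vec(transpose(c), q)  # inverse transform is the transpose
    print("nonzero coefficients:", sum(1 for x in q if x != 0))
    print("max error:", max(abs(a - b) for a, b in zip(signal, restored)))
```

For a smooth ramp such as 100, 102, ..., 114, nearly all of the energy lands in the first two DCT coefficients, so after quantization only two coefficients remain nonzero while the reconstruction error stays within a few intensity levels. The DPCM residuals, by contrast, are exact and fully reversible; the loss in this sketch comes only from the quantizer.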
Statistical coding is lossless. It compresses data according to the probability distribution of the symbols that occur: a bit or byte pattern with a high probability of occurrence is represented by a shorter codeword, and one with a low probability by a longer codeword. This keeps the average code length as short as possible. The most commonly used statistical coding method is Huffman coding.
Analysis-synthesis coding decomposes the original data into a set of parameters better suited to representing it ("primitives"), or extracts more essential characteristic parameters from it, and then codes only these basic units or characteristic parameters. During decoding, the primitives or parameters are "synthesized" into an approximation of the original data, using certain rules or models and according to certain algorithms. This coding method can achieve very high compression ratios.
Hybrid coding combines two or more coding methods, each aimed at a different type of redundancy, to improve overall compression performance.
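The statistical coding principle described above, shorter codewords for more probable symbols, can be illustrated with a textbook Huffman construction. The sketch below works on characters of a string and stores codewords as strings of "0"/"1" for readability; a real encoder would pack bits and also transmit the code table. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(data: str) -> dict[str, str]:
    """Build a prefix-free code by repeatedly merging the two least frequent subtrees."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs a 1-bit codeword.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable without comparing the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Prefix "0" onto codes in one subtree and "1" onto codes in the other.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(data: str, code: dict[str, str]) -> str:
    return "".join(code[s] for s in data)


def huffman_decode(bits: str, code: dict[str, str]) -> str:
    """Read bits until they match a codeword; this works because the code is prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra"
    code = build_huffman_code(text)
    bits = huffman_encode(text, code)
    print(code)
    print(f"{len(bits)} bits vs {8 * len(text)} bits at 8 bits per character")
    assert huffman_decode(bits, code) == text
```

For "abracadabra", the most frequent symbol "a" (5 of 11 characters) receives a 1-bit codeword and the rarer symbols receive 3-bit codewords, so the message takes 23 bits instead of 88 bits at 8 bits per character, and decoding recovers it exactly.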