Digital images fall into three categories: bitmap, vector, and procedurally modeled. Bitmap images, also known as pixel or raster images, are created by specifying the color of each pixel individually. Vector images describe shapes and colors using object definitions and mathematical formulas. Procedurally modeled images, also called algorithmic art, determine the color of each pixel through a combination of mathematics, logic, control structures, and recursion.
Bitmaps can be created in drawing software such as Photoshop, by scanning analog images with a scanner, or by shooting with a digital camera. Here we focus mainly on bitmaps captured with digital cameras.
Creating a bitmap with a digital camera involves two stages: sampling and quantization. Sampling is the process of extracting discrete pixels from a continuous analog image. Some devices let users set the sampling rate when taking photos or recording video, that is, the number of pixels sampled in the horizontal and vertical dimensions. For example, on an iPhone running iOS 12, the video recording sampling rate can be set in the Settings app.
Quantization is the process of choosing a color model and corresponding bit depth, and representing each pixel's color with an actual numeric value. An image produced with appropriate sampling and quantization is shown below.
Both sampling and quantization introduce errors; in other words, the resulting image is not exactly the original scene we observed. If the sampling rate is too low, many details are lost, and when the image is displayed at the same size as the original we see mosaic-like blocks, as shown below.
Similarly, if the bit depth is set too low during quantization, the number of colors the resulting image can represent is very limited. Similar colors collapse into the same color, and the image loses a great deal of detail, as shown below.
When we describe the size of a photo, we usually give the number of pixels along its width and height, but for a computer screen the pixel count refers to the physical light-emitting dots on the screen. To distinguish these two contexts clearly, we use the term logical pixel when describing image size and physical pixel when describing the screen.
Phone advertisements often claim that a camera has "several million pixels"; this refers to the maximum pixel count the device supports. For example, if the largest image its hardware can produce is 2048 × 1536 logical pixels, the device's maximum image contains 3,145,728 pixels, roughly 3 megapixels. Note that some camera manufacturers use "digital zoom" to exaggerate camera performance: this software technique increases the pixel count of the picture but cannot actually improve its sharpness.
Resolution is defined as the number of pixels per inch of an image file along a given dimension, measured in ppi, for example 200 ppi. Printing resolution refers to the maximum number of dots a printer can print per inch, measured in dpi, for example 1440 dpi.
The size of an image is defined as the physical size of the image file when printed or displayed on a computer screen, measured in inches or centimeters.
Changing the pixel dimensions of an image is called resampling. Increasing the pixel dimensions is called upsampling, and reducing them is called downsampling. Upsampled pixels are only interpolated from the original pixels, and downsampled pixel values are averages of existing pixel values, so neither operation can improve the actual sharpness of the image.
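As an illustration, the following sketch uses the Pillow library (an assumption; the file name photo.jpg is a placeholder) to upsample and downsample an image with different resampling filters.

```python
from PIL import Image  # Pillow; the file name below is a placeholder

img = Image.open("photo.jpg")
w, h = img.size

# Upsampling: new pixels are interpolated from existing ones,
# so no real detail is gained.
up_nearest = img.resize((w * 2, h * 2), resample=Image.NEAREST)
up_bilinear = img.resize((w * 2, h * 2), resample=Image.BILINEAR)

# Downsampling: each new pixel is a (Lanczos-weighted) average
# of several original pixels.
down = img.resize((w // 2, h // 2), resample=Image.LANCZOS)

up_bilinear.save("upsampled.jpg")
down.save("downsampled.jpg")
```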
The earlier article on digital data in this series showed that data can be represented by functions and transformed from one domain to another without losing information. The same holds for digital images, and the Nyquist theorem still applies to them.
For the grayscale image on the left of the figure above, take a line parallel to the x axis, plot the gray value as y against the pixel position as x, and we obtain the waveform on the right. For a color image with RGB channels, the same treatment is simply applied to each of the three channels. In the right-hand figure we assume the grayscale image on the left repeats periodically, which yields a periodic waveform.
When processing digital images we work with discrete data, as in the left figure above. By treating the signal as periodic we again obtain a periodic waveform, as shown on the right.
In practice, a real digital photo can be split into its three channels, R, G, and B, and each channel processed separately. For simplicity we convert the photo to a grayscale image, shown on the left above. We select one row of pixels and draw its waveform as before, shown in the upper right. Thus a row of pixels in a digital picture can be described by a two-dimensional waveform.
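The following minimal sketch (assuming NumPy, Matplotlib, and Pillow are available; bird.jpg is a placeholder file name) extracts one row of a grayscale image and plots it as a waveform.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Convert the photo to grayscale and pick one row of pixels.
gray = np.asarray(Image.open("bird.jpg").convert("L"))
row = gray[gray.shape[0] // 2]          # middle row

plt.plot(np.arange(row.size), row)      # x: pixel position, y: gray value
plt.xlabel("pixel position (x)")
plt.ylabel("gray value")
plt.show()
```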
As mentioned, a row of pixels in a digital image can be described by a two-dimensional waveform, but that is not enough: pictures are two-dimensional, so we also need a way to describe how pixel values change along both the x and y axes. A spatial waveform achieves this well. For example, the picture at the upper left is a grayscale image; mapping its pixel positions to the x and y axes of a three-dimensional space and the gray values to the z axis gives the three-dimensional spatial waveform at the upper right.
For an actual digital image we again take the bird photo as an example and consider only its grayscale version; for a color image with RGB channels, the three channels are simply separated and processed in the same way. The image on the left above is the actual digital image, and its spatial waveform is shown on the right. Any digital image can therefore be described by a spatial waveform, which matters because only with such a mathematical description can we process images further. Describing the image as a spatial wave is what allows it to be converted into spatial frequencies, the mathematical basis of lossy image compression.
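A spatial waveform like the one described can be drawn with Matplotlib's 3D plotting; again bird.jpg is a placeholder file name.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

gray = np.asarray(Image.open("bird.jpg").convert("L"), dtype=float)
ys, xs = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")   # 3D axes from mpl_toolkits
ax.plot_surface(xs, ys, gray, cmap="gray")
ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("gray value")
plt.show()
```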
The first article in this series introduced and derived the Fourier series and the various Fourier transforms in detail. As mentioned there, any periodic function satisfying the Dirichlet conditions can be decomposed into infinitely many simple sinusoidal functions. Formulas often show cosine components because the sine function A sin(ωx + φ) can be expanded with trigonometric identities into cosine terms. The discrete cosine transform is what remains when the sine terms vanish under certain conditions; see the first article in this series, Representation and Processing of Digital Signals.
The figure above is a schematic of the discrete cosine transform; again, it is only a schematic. The rightmost waveform can be synthesized from the three basic waveforms on the left, and more complex waveforms can likewise be expressed as combinations of several simple ones. Applied to digital images, this means the waveform of a row of pixels can be decomposed into a combination of basic waveforms.
The figure above shows the formula of the inverse discrete cosine transform, where F(u) is the frequency-domain function of the original signal and M is the number of samples. This is the one-dimensional formula; digital image processing actually needs the two-dimensional version, which is introduced later in this article. For now we concentrate on the one-dimensional discrete cosine transform, which decomposes a row of pixels in an image into a combination of waveforms.
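The original figure is not reproduced here; one commonly used orthonormal form of the one-dimensional inverse DCT (the exact constants in the figure may differ) is:

$$
f(r) = \sqrt{\frac{2}{M}} \sum_{u=0}^{M-1} C(u)\,F(u)\cos\frac{(2r+1)u\pi}{2M},\qquad
C(u)=\begin{cases}\frac{1}{\sqrt{2}} & u=0\\[2pt] 1 & u>0\end{cases}
$$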
Suppose a digital image's grayscale version has a row of eight adjacent pixels with the values shown in the figure above.
We choose eight basic waveforms and compute eight pixel values from each of them. In the figure above, the leftmost column gives the trigonometric formula of each basic waveform, the middle column shows its waveform, and the pixel row on the right shows the first eight effective pixel values computed from that waveform.
As mentioned earlier, any row of pixels can be represented by a two-dimensional waveform. Such waveforms satisfy the Dirichlet conditions, so they can be decomposed into a combination of basic waveforms. What we need to do now is compute the coefficient of each basic waveform, in other words the value of the frequency-domain function. This is done with the discrete cosine transform; the one-dimensional formula is as follows.
In the formula above, f(r) is the spatial function; for a grayscale digital image it gives the gray value of each pixel in the row, r is the pixel coordinate, and M is the total number of pixels in the row. u can be understood as the frequency value of the frequency-domain function, and the bracketed expression after cos is the trigonometric function of the basic waveform. Note that the number of frequency values of the frequency-domain function equals the number of samples of the spatial function: however many pixels a row contains, that is how many frequency components we obtain.
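For reference, the corresponding orthonormal form of the forward one-dimensional DCT (again, the constants in the original figure may differ) is:

$$
F(u) = \sqrt{\frac{2}{M}}\,C(u)\sum_{r=0}^{M-1} f(r)\cos\frac{(2r+1)u\pi}{2M}
$$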
Using the formula above we can compute the frequency-domain values. Because this row has eight pixels, eight effective frequency components can be computed; multiplying F(u) for u in [0, 7] by the constant in the discrete cosine transform formula gives [W0 … W7] = [389.97, -280.13, …, -20.51, -19.80, -16.34]. These are the coefficients with which the basic waveforms are mixed to form the original waveform; in other words, summing the basis functions of all frequency components multiplied by their coefficients reproduces the original image.
The figure above illustrates synthesizing the original image from the images produced by the basic waveforms. Note the naming of the frequency components obtained by the DCT: F(0) is called the DC component, and F(1) through F(M-1) are called the AC components. The terms come from analog circuits, where DC refers to direct current and AC to alternating current.
Digital images are two-dimensional, so in practice we use the two-dimensional discrete cosine transform when processing them. The formula is as follows.
For a grayscale digital image, in the formula above the spatial two-dimensional function f(r, s) gives the gray value of the original image at pixel coordinates (r horizontal, s vertical), M and N are the numbers of pixels in the horizontal and vertical directions, (u, v) is the spatial frequency, the product of the two cosine factors is the basic two-dimensional spatial wave, and F(u, v) is the frequency-domain function describing the original spatial function f. The valid ranges of u and v are [0, M-1] and [0, N-1], and each value of F(u, v) is the coefficient of the corresponding two-dimensional spatial waveform.
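A commonly used orthonormal form of the two-dimensional DCT consistent with this description (the constants in the original figure may differ) is:

$$
F(u,v) = \frac{2}{\sqrt{MN}}\,C(u)C(v)\sum_{r=0}^{M-1}\sum_{s=0}^{N-1} f(r,s)\cos\frac{(2r+1)u\pi}{2M}\cos\frac{(2s+1)v\pi}{2N}
$$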
Note that in image processing, the discrete cosine transform is usually applied to 8×8 pixel sub-blocks, also known as macroblocks. This greatly reduces computational complexity and improves efficiency, and it is the key step in JPEG image compression and MPEG video compression. Different coding standards use different macroblock sizes, generally between 8×8 and 16×16; this will be discussed in detail in a later article.
The picture above is an 8×8 pixel macroblock. Reading off its gray values gives the following matrix.
Then we use the two-dimensional discrete cosine transform just introduced to compute the coefficient of each spatial frequency component, that is, the value of the function F(u, v), obtaining the frequency-component amplitude matrix below. As in the one-dimensional case, F(0, 0) is called the DC component and the remaining values are called the AC components.
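Such a matrix can be checked numerically with SciPy's DCT routines; the 8×8 block below is a placeholder, not the macroblock from the figure.

```python
import numpy as np
from scipy.fft import dctn  # multidimensional type-II DCT

# Placeholder 8x8 gray values; substitute the macroblock from the figure.
block = np.arange(64, dtype=float).reshape(8, 8)

# 2D DCT with orthonormal scaling: F[0, 0] is the DC component,
# the remaining 63 entries are the AC components.
F = dctn(block, norm="ortho")
print(np.round(F, 2))
```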
Just as a row of pixels can be decomposed by the one-dimensional discrete cosine transform into several basis functions and then reconstructed from them, a two-dimensional digital image can be decomposed by the two-dimensional discrete cosine transform into several basis functions and synthesized from them. The difference is that the basis functions of the one-dimensional transform can be drawn as two-dimensional waveforms, while those of the two-dimensional transform require three-dimensional waveforms. The inverse two-dimensional discrete cosine transform is as follows.
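A matching orthonormal form of the two-dimensional inverse DCT (the constants in the original figure may differ) is:

$$
f(r,s) = \frac{2}{\sqrt{MN}}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} C(u)C(v)\,F(u,v)\cos\frac{(2r+1)u\pi}{2M}\cos\frac{(2s+1)v\pi}{2N}
$$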
For the 8×8 pixel macroblock in the example above, the two-dimensional inverse discrete cosine transform uses 8×8 = 64 basis functions. Each basis function can be drawn as a waveform, and each can also be rendered as a simple two-dimensional image, as shown in the figure below.
Each image is the product of the two cosine factors of the two-dimensional inverse discrete cosine transform formula evaluated at the discrete points P(r, s), where r and s range over [0, 7].
Any 8×8 two-dimensional image can be synthesized from the basis functions above, and the coefficient (weight) of each basis function can be computed with the two-dimensional discrete cosine transform. The values of F(u, v), excluding the constant factor, are also called the frequency-component amplitude matrix. For color images using the RGB color model, the three channels are simply processed separately, i.e., three DCT passes.
For the 8×8 macroblock example above, multiplying each entry of the frequency-component amplitude matrix (together with the constant in the two-dimensional inverse discrete cosine transform formula) by its corresponding basis function and summing the results reconstructs the original two-dimensional image.
Let's look at a more practical example, again the bird photo from earlier. We choose an 8×8 pixel macroblock, as shown below.
Using the z axis for each pixel's color value and the x and y axes for its horizontal and vertical indices, we draw the following spatial pixel histogram.
We use the two-dimensional discrete cosine transform to compute the coefficient of each frequency component, that is, the value of the frequency-domain function F(u, v). With the z axis representing F(u, v) and the x and y axes the horizontal and vertical frequencies, we obtain the frequency-component amplitude histogram below.
In the amplitude histogram above, the DC component is the largest and there are several smaller AC components. Taking their products with the constant term of the two-dimensional discrete cosine transform formula as the coefficients of the basis functions, the original image can be restored.
Notice that in the amplitude histogram, apart from a few frequency components near the origin, the remaining AC components are almost zero. In other words, we can discard the frequency components toward the lower right corner and still restore a visually faithful image, because those spatial frequencies are extremely high and, perceptually, exceed the resolving power of the human eye. This is the theoretical basis of JPEG compression and of intra-frame compression in MPEG.
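The following sketch illustrates the idea numerically with SciPy (the smooth 8×8 gradient block is a placeholder, not JPEG's actual quantization step): it zeros the high-frequency DCT coefficients and reconstructs the block with the inverse transform.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Placeholder: a smooth 8x8 gradient block (values 0..112).
block = np.add.outer(np.arange(8), np.arange(8)) * 8.0

F = dctn(block, norm="ortho")                      # forward 2D DCT

# Keep only coefficients near the origin (low spatial frequencies);
# zero out the high-frequency lower-right region.
keep_low = np.add.outer(np.arange(8), np.arange(8)) < 4
F_truncated = np.where(keep_low, F, 0.0)

reconstructed = idctn(F_truncated, norm="ortho")   # inverse 2D DCT
print("max absolute error:", np.abs(block - reconstructed).max())
```

For a smooth block like this one, nearly all the energy lies in the low-frequency coefficients, so the reconstruction error is tiny; blocks with sharp detail lose more.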
Applied to digital images, the Nyquist frequency means that when the horizontal and vertical sampling rates fall below twice the highest spatial frequency present in the scene, the resulting image is distorted. When such a sample is enlarged back to the original size, obvious jagged edges and blockiness appear.
With an appropriate sampling rate, the photo looks like this.
With too low a sampling rate, we instead get the picture below. This phenomenon is called undersampling.
Moiré fringes are high-frequency interference patterns that appear when the sampling frequency is insufficient yet close to the frequency of fine detail in the original picture; they show up as irregular stripes and can introduce false color. Moiré also appears when the sampling grid is at an angle to the texture direction of the original image.
For example, in the figure above, the texture of the left image forms an angle with the sampling direction and the sampling frequency is close to that of the original pattern. Assume that a sample is rendered black when more than half of its area is covered by black, and white otherwise. The rightmost image is the sampled result; it is clearly distorted, and that distortion is the moiré pattern.
Knowing what moiré is, we can recall the shimmering effect of folded sheer curtains or the swirling patterns seen when viewing a woven chair through a window screen; these are everyday examples of moiré.
Similarly, consider real images from daily life, such as the backpack below with a fine, high-frequency weave.
For scenes like this, when the sampling rate does not match the spatial frequency of the original image, moiré fringes are likely to appear, as shown below.
The actual original picture is as follows.
When shooting with a digital camera, moiré can be mitigated by tilting the camera, changing the focal length, or changing the lens, all of which change the sampling direction and the effective spatial frequency of the original scene.
Traditional cameras form images on silver halide film. The film has three layers, sensitive to red, green, and blue light respectively.
Digital cameras use charge-coupled device (CCD) technology to sense light and color; complementary metal-oxide-semiconductor (CMOS) sensors are a newer alternative.
A CCD consists of a two-dimensional array of image points, each corresponding to one sample (a pixel of the digital image) and covered by red, green, and blue filters.
There are four ways to build a CCD. The first splits the incident light into three beams so that each photosite has three sensors, each sensing only red, green, or blue. The advantage is that every pixel directly obtains all three primary values; the disadvantages are high cost and a bulky camera.
The second rotates the sensor while shooting so that it senses red, green, and blue light in succession. Its drawback is that it cannot sense the three colors simultaneously and can therefore only capture static scenes.
The third, exemplified by the Foveon X3, uses a silicon sensor with vertically stacked layers: silicon at different depths absorbs light of different wavelengths, so all three colors can be sensed simultaneously at one image point.
The fourth and most common approach uses a Bayer color filter. Each pixel senses only one color, and the other two components are computed by interpolation. This method is cheap, but it sometimes causes color distortion.
The picture above is a schematic of a Bayer color filter. It has twice as many green-sensitive sites as red or blue ones, because the human eye is more sensitive to green light.
After the raw data are captured, the missing color components of each pixel must be computed by interpolation, the so-called demosaicing algorithm. One of the simplest methods is neighbour averaging: for an image point where only the G component was captured, the R and B values are taken as the average of the two adjacent points of that color; for an image point where R or B was captured, each missing component is the average of the four adjacent points of that color. The neighbouring points used in the calculation are shown in the figure below.
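A minimal sketch of this kind of neighbour averaging (assuming an RGGB Bayer layout and NumPy/SciPy available; real cameras use more sophisticated algorithms):

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw):
    """Neighbour-averaging (bilinear) demosaic for an RGGB Bayer mosaic.

    `raw` is a 2D array in which each position holds only the one color
    its filter let through. Illustrative sketch, not a production algorithm.
    """
    raw = np.asarray(raw, dtype=float)
    h, w = raw.shape
    r_mask = np.zeros((h, w), bool); r_mask[0::2, 0::2] = True
    b_mask = np.zeros((h, w), bool); b_mask[1::2, 1::2] = True
    g_mask = ~(r_mask | b_mask)

    # Green: average of the 4 direct neighbours at non-green sites.
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    # Red/blue: average of 2 or 4 neighbours depending on position.
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

    out = np.zeros((h, w, 3))
    out[..., 0] = convolve(raw * r_mask, k_rb, mode="mirror")
    out[..., 1] = convolve(raw * g_mask, k_g, mode="mirror")
    out[..., 2] = convolve(raw * b_mask, k_rb, mode="mirror")
    return out
```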
Interpolation cannot perfectly reconstruct the scene, so this process introduces artifacts such as moiré, blocking, and speckles.
As shown in the left figure above, suppose we photograph a white line on a black background that happens to fall exactly on one column of the CCD sensor; the light intensity at every other image point is then zero. As shown on the right, for each pixel the white line passes through, the values obtained from its neighbours are always 0, so the missing color components of those pixels cannot be recovered from their neighbours, and the result is a distorted image.
Some cameras place an anti-aliasing (optical low-pass) filter in front of the sensor, which slightly blurs the image and reduces color distortion. Some manufacturers make it an optional feature that can be switched on in the settings, while high-end cameras often omit it altogether.
When a diagonal line is drawn on a computer screen, jagged edges sometimes appear. This distortion is caused by the screen's limited resolution: in geometry a line consists of infinitely many points, but a line on a computer screen is made of discrete pixels.
In the figure above, the image on the left is the ideal line, two pixels wide. If we paint a sample black whenever more than half of its area is covered, we obtain the jagged line shown on the right.
Anti-aliasing is a technique for reducing the jaggedness of lines and edges. One feasible method is to shade the pixels along the edge of the line with intermediate colors, in proportion to how much of each pixel the line covers and blended with the color of the nearest point on the line.
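One way to approximate that coverage is supersampling: subdivide each pixel, test how many sub-samples the line covers, and use the resulting fraction as the gray level. A minimal sketch (the line is given in implicit form a·x + b·y + c = 0; all names are illustrative):

```python
import numpy as np

def aa_line_coverage(width, height, a, b, c, thickness=1.0, samples=4):
    """Anti-aliased rendering of the line a*x + b*y + c = 0 by supersampling.

    Each pixel's gray level is the fraction of its sub-samples lying within
    thickness/2 of the line, which approximates the covered area.
    """
    img = np.zeros((height, width))
    offs = (np.arange(samples) + 0.5) / samples      # sub-sample offsets in [0, 1)
    norm = np.hypot(a, b)
    for y in range(height):
        for x in range(width):
            sx = x + offs[:, None]                   # sub-sample x coordinates
            sy = y + offs[None, :]                   # sub-sample y coordinates
            dist = np.abs(a * sx + b * sy + c) / norm
            img[y, x] = np.mean(dist <= thickness / 2)
    return img

# Example: a 2-pixel-thick diagonal line on a 16x16 grid.
print(np.round(aa_line_coverage(16, 16, 1.0, -1.0, 0.0, thickness=2.0), 2))
```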
As shown above, when a bitmap is enlarged, the pixel count is increased by upsampling and the new pixel colors are obtained by interpolation, which makes the jagged edges more prominent.
As shown above, a vector image is rasterized on the fly, so its distortion when enlarged is far less severe than that of a bitmap.
Colorimetry is an independent and fascinating subject; only the most essential basics are introduced here. For more detail, please refer to the following two reference books.
Color is both a physical phenomenon (electromagnetic waves) and a psychological one: when these waves reach the color receptors of the human eye, the brain interprets the interaction in ways that are not fully understood, and that interpretation is color perception. The colors we perceive in nature are combinations of different wavelengths.
Modern anatomy shows that the human eye has three kinds of cone cells, the L, M, and S cones, which are sensitive to long-, medium-, and short-wavelength light respectively, as shown below.
Substances in nature reflect electromagnetic waves of different wavelengths to different degrees; spinach leaves, for example, mainly reflect wavelengths around 550 nm. The reflected waves are sensed by the cone cells, and the brain finally produces the perception of color.
Newton was the first to study color systematically. He found that colors can be produced by mixing other colors and proposed the concept of the color wheel, also known as Newton's color wheel, shown below.
The three attributes of color are hue, saturation (purity), and lightness. Hue can be understood as the dominant wavelength of a color; the rim of the wheel holds the unit-energy monochromatic colors obtained by decomposing white light, with hue starting at red and increasing toward yellow.
Saturation (also called purity) can be understood as how much white light is mixed into a pure color: the more white light is mixed in, the lower the saturation and the closer the color is to gray.
Lightness is a subjective concept related to the observer's perception; it represents the perceived intensity of the colored light reaching the eye. The lightness of a unit-energy monochromatic light is usually defined as 1, and lightness decreases as the light intensity decreases, reaching 0 when the intensity is 0.
Note that the terms luminance, brightness, and lightness are easily confused, yet they mean different things.
Luminance is the physically measured energy of light radiation as perceived by the eye, in nits (candelas per square meter). Usually we mean relative luminance, Y/Yr, where Yr is the luminance of a reference white.
Brightness is the amount of light our vision perceives; it is highly subjective and has no precise mathematical definition. Luminance, by contrast, has a clear mathematical definition involving the wavelength and energy of the light and the eye's sensitivity to it. Interestingly, among lights of equal power but different wavelengths, the human eye perceives light at around 550 nm as the brightest.
Lightness is the brightness of a color relative to white under the same illumination.
The Munsell color system, shown below, was an early method of describing color systematically. It describes a color along three dimensions: value (lightness), hue, and chroma (saturation).
The scientifically precise way to describe a color is its spectral density function, but this is rarely used in computer systems, because many different spectral density functions can produce the same color as perceived by the human eye.
The figure above shows a simplified spectral density function: the hue is determined by the dominant wavelength, the lightness by the area under the curve, and the saturation by the ratio of the peak's area to the total area. The corresponding formulas are as follows.
There are many common ways to represent color, known as color spaces. They fall mainly into five families of color models: RGB, CMYK, cylindrical-coordinate transformations (such as HSV and HLS), the CIE standard models, and luminance-chrominance models.
In the RGB model, R, G, and B are fixed primaries defined by wavelength, and the values r, g, and b are the coefficients of these components, also called color channels. Note that the RGB color model itself does not specify the wavelengths of the three primaries.
As shown above, every color in the RGB model is a mixture of the three primaries, and the complements of red, green, and blue are cyan, magenta, and yellow respectively. In the mathematical model the r, g, b values range over 0 to 1, while image processing programs usually use 0 to 255. Psychological research shows that the human eye is most sensitive to green; using the eye's sensitivities to the three colors, grayscale or luminance can be computed from RGB with the following formula.
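The original figure is not reproduced here; a widely used weighting (the ITU-R BT.601 luma coefficients, which may or may not match the figure) is:

$$
Y = 0.299\,R + 0.587\,G + 0.114\,B
$$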
CMY is a subtractive color model: its components represent the proportions of red, green, and blue removed from white light. The conversion between the CMY and RGB color models is as follows.
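With channel values normalized to the range [0, 1], the standard conversion is:

$$
C = 1 - R,\qquad M = 1 - G,\qquad Y = 1 - B
$$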
Because mixing C, M, and Y does not produce a pure black, the CMYK color model adds a K component to represent pure black. The conversion is as follows.
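One common simple form of the conversion (the original figure may use a variant that also rescales by 1 - K) is:

$$
K = \min(C, M, Y),\qquad C' = C - K,\quad M' = M - K,\quad Y' = Y - K
$$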
Cylindrical models describe a color by its hue (base color), saturation, and value or lightness. The Hue-Saturation-Value (HSV) model, also called Hue-Saturation-Brightness (HSB), is shown below.
The Hue-Lightness-Saturation (HLS) color model is shown below.
The CIE conducted a color-matching experiment using standard red, green, and blue lights of wavelengths 700 nm, 546.1 nm, and 435.8 nm as the three monochromatic primaries. Participants adjusted the amounts of these three lights until the mixture matched a unit-energy monochromatic light from the visible spectrum. From these results the color-matching functions below were plotted.
The experiments showed that when RGB primaries are used to match every pure single-wavelength light in nature, it is sometimes necessary to subtract some red from the green-blue mixture, that is, to add red light to the target pure light instead.
Moreover, no computer monitor can reproduce all visible colors with its own red, green, and blue lights. The range of colors a given display can show is called its color gamut. Displays of different models that use the same color model may have different gamuts, and displays that use different color models necessarily have different gamuts.
Based on these experimental results, the International Commission on Illumination (CIE) proposed the CIE XYZ color model in 1931, expressed by the formula in the figure above. It assumes three theoretical primaries chosen so that the matching functions of all three components are everywhere positive. Moreover, the coefficients were deliberately selected so that the matching function of the Y component has the same shape as the luminous efficiency function, which means Y can be interpreted as luminance.
An RGB color model and the CIE XYZ color model can be converted into each other with a formula of the following form.
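The exact matrix depends on the chosen primaries and white point. For example, for linear sRGB primaries with a D65 white point (an assumption; the matrix in the original figure may differ) the conversion is approximately:

$$
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \approx
\begin{bmatrix}
0.4124 & 0.3576 & 0.1805 \\
0.2126 & 0.7152 & 0.0722 \\
0.0193 & 0.1192 & 0.9505
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
$$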
To show where the gamuts of different color models overlap, it is convenient to draw a chromaticity diagram on a two-dimensional plane. First the three XYZ components are normalized, using the following formula.
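The standard normalization to chromaticity coordinates is:

$$
x = \frac{X}{X+Y+Z},\qquad y = \frac{Y}{X+Y+Z},\qquad z = \frac{Z}{X+Y+Z} = 1 - x - y
$$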
The chromaticity diagram of the CIE XYZ color model can then be drawn as follows.