The human visual system (HVS) is the ultimate receiver of every decoded video signal, so once we account for what viewers can actually perceive, more noise can be tolerated in the video without affecting its subjective quality. The traditional fidelity criteria MSE and PSNR cannot accurately reflect the subjective quality of content. Because of the HVS's sensitivity and masking characteristics across pixel space and time, people cannot perceive changes below the JND (just noticeable distortion) threshold, and undetectable changes obviously do not need to be encoded at all. A large body of literature proposes two types of JND models: image-domain JND and transform-domain JND. What follows introduces the image-domain JND model and its applications; it suits low bit rates and slow motion, such as video telephony with mostly shoulder-and-head movement.
Previous studies show that image-domain JND can be regarded as the compound effect of the subband (transform-domain) thresholds, and that luminance adaptation and texture masking should be the main factors a JND model considers. Earlier studies of image-domain JND considered only the Y (luminance) component.
This paper proposes a nonlinear additivity model for masking (NAMM) to better match the human visual system. The model adds luminance adaptation and texture masking while subtracting their overlapping effect, in line with the compound effects of different stimuli reported in recent research. In the proposed PVC (perceptual video coding) scheme, JND values computed with NAMM are used in motion estimation and determine which residual data enter the next coding step after motion compensation.
The basic idea is:
1. In motion estimation, a pixel whose difference is below the corresponding JND threshold is excluded from the SAD computation (such a change is imperceptible in any case), which improves subjective quality and reduces the cost of motion estimation.
2. Similarly, after motion compensation there is no need to transform residuals below the JND threshold, which frees bits for better DCT coding of the residuals above the JND threshold at a given bit rate. Since the DCT coefficients of large residuals play the more critical role in objective fidelity metrics, the PVC scheme can indirectly improve objective indices such as MSE and PSNR along with the subjective quality.
Let p(x, y, t) denote the intensity of the pixel at (x, y) of the frame-t image in a given color channel (color component). The goal of this section is to determine its corresponding JND threshold. The spatial part, the first factor related to visual information considered within a frame, is denoted the spatial JND, JND_S(x, y), hereafter.
There are two main factors affecting JND in the image domain: background luminance adaptation and texture masking.
1. Background luminance adaptation: the human visual system is more sensitive to luminance contrast than to absolute luminance. Figure 1 shows a rough, experimentally measured curve of the visibility threshold versus the background luminance of digital images:
2. Texture masking: the visibility of a change decreases as texture inhomogeneity in the neighboring area increases, so textured regions can hide more distortion than smooth regions.
Both kinds of masking exist in most images, so combining the two factors into an accurate JND profile is the key problem. Previous work has shortcomings: the mixed spatial masking effect was simplified to the maximum of background luminance adaptation and texture masking; the JND threshold considered only the luminance component of the image; and edge regions were computed no differently from non-edge regions.
We argue: (1) the combined effect of multiple masking factors should be obtained by adding the individual factors (though not linearly), because coexisting masking factors make a target (such as the coding loss in a decoded picture) harder to notice than any single factor does; (2) JND thresholds in the chroma channels can also improve compression performance; (3) distinguishing edges from smooth and textured regions helps avoid over-predicting the masking effect in edge regions.
Let T_l and T_t be the visual thresholds of the two masking factors, C^{l,t} the gain-reduction coefficient accounting for the overlap of the two factors, and min{T_l, T_t} the nonlinear term evaluating that overlap. The spatial JND can then be calculated with the nonlinear model (1):

JND_S(x, y) = T_l(x, y) + T_t(x, y) - C^{l,t} * min{ T_l(x, y), T_t(x, y) }    (1)

T_l and T_t are the two main masking factor values in the spatial domain: luminance adaptation and texture masking. C^{l,t} (0 <= C^{l,t} <= 1) measures the overlap of T (texture masking) and L (luminance adaptation): the larger it is, the more the two overlap. C^{l,t} = 1 means T_l and T_t overlap maximally; C^{l,t} = 0 means no overlap; in practice its value lies between 0 and 1. The coefficient is derived separately for the Y channel and for the chroma channels, and it also varies with viewing conditions such as lighting, display equipment, and viewing distance.
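A minimal numeric sketch of model (1), assuming the two per-pixel threshold maps T_l and T_t have already been computed; the default coefficient below is a placeholder, not the paper's calibrated value:

```python
import numpy as np

def spatial_jnd(t_l: np.ndarray, t_t: np.ndarray, c_lt: float = 0.3) -> np.ndarray:
    """NAMM: additive combination of the two masking thresholds minus their overlap.

    c_lt in [0, 1] is the gain-reduction (overlap) coefficient; 0.3 is a
    placeholder default, not the value calibrated in the paper.
    """
    return t_l + t_t - c_lt * np.minimum(t_l, t_t)

# With c_lt = 0 the two factors add linearly; with c_lt = 1 the smaller
# factor is absorbed entirely, i.e. the result equals max(T_l, T_t).
t_l = np.array([[4.0, 10.0]])
t_t = np.array([[6.0, 2.0]])
print(spatial_jnd(t_l, t_t, 1.0))  # equals np.maximum(t_l, t_t)
```

The min term is what keeps the model "nonlinearly additive": the overlap penalty can never exceed the weaker of the two masking effects.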
The experimental environment in the paper is a room illuminated by fluorescent lamps (a typical condition for viewing digital images), equipped with a 21-inch EIZO T965 professional color display at 1600x1200 resolution; the viewing distance is about six times the image height. The attenuation coefficients C^{l,t} calculated under these conditions are given in the paper for each color channel.
Formula (2), an optimization of models proposed in earlier papers, determines the JND value of the luminance-adaptation factor T_l according to the curve in Figure 1. Here bg(x, y) is the average background luminance at (x, y) (which confused me at first: it is not the value of the single pixel, but the average over a small local window centered on it). The calculation of the texture factor T_t is introduced next.
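The curve in Figure 1 can be sketched with the piecewise approximation widely used in pixel-domain JND work; the constants follow the commonly cited Chou-Li form, which is an assumption here, since the paper's Figure 1 may calibrate them differently:

```python
import numpy as np

def luminance_threshold(bg: np.ndarray) -> np.ndarray:
    """Visibility threshold T_l versus average background luminance bg (0..255).

    Constants are the commonly cited Chou-Li piecewise form (an assumption):
    the threshold is high in dark regions and rises slowly in bright ones.
    """
    bg = np.asarray(bg, dtype=np.float64)
    dark = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0   # branch for bg <= 127
    bright = 3.0 / 128.0 * (bg - 127.0) + 3.0         # branch for bg > 127
    return np.where(bg <= 127, dark, bright)

# Mid-grey background gives the minimum threshold of 3 grey levels.
print(luminance_threshold(127))  # -> 3.0
```

The shape matches the description above: the HVS tolerates the least error against mid-grey backgrounds and progressively more toward the dark and bright extremes.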
To obtain a more accurate JND estimate, the texture-masking effect must distinguish edge from non-edge regions, because the edge portion relates directly to visually important image content such as object boundaries, surface creases, and reflectance changes. Moreover, edges attract attention more readily: if distortion appears there, an observer notices it easily [9, 11]. A large body of literature has shown that edge perception is very important for primates. This paper therefore proposes incorporating edge information:
T_t(x, y) = eta * G(x, y) * W_e(x, y)    (6)

where G(x, y) is the maximal weighted average gradient around (x, y), eta is a control parameter for the color channel, and W_e(x, y) is the edge-related weight at (x, y) (W_e denoting its corresponding matrix).
Because the HVS is more sensitive to changes in the Y channel, eta takes different values per color channel in the environment mentioned above; the specific values are given in the paper.
G(x, y) = max_{k=1,...,4} | grad_k(x, y) |    (7)

where grad_k(x, y) is the weighted average gradient obtained by convolving the neighborhood of (x, y) with the k-th directional high-pass filter g_k    (8)

For the detailed theory of image gradients, see standard image-processing references. The g_k are high-pass filters in four different directions, shown in the figure below; in short, grad_k is the gradient in the k-th direction.
W_e is computed by edge detection followed by a low-pass filter:

W_e = E * h    (9)

where E is derived from the edge map obtained with a Canny detector (threshold set to 0.5): edge and non-edge pixels are assigned the values 0.1 and 1 respectively, and h is the low-pass filter.
h is a Gaussian low-pass filter with standard deviation sigma; it smooths W_e to prevent abrupt changes within a small neighborhood. sigma must be greater than 0.5 for the smoothing to take effect; following [9] it is set to 0.9, and the corresponding kernel size is 7x7.
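A sketch of step (9) plus the smoothing above, assuming the binary Canny edge map has already been computed; `scipy.ndimage.gaussian_filter` with `truncate=3.0` stands in for the 7x7 Gaussian kernel h:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def edge_weight_map(edge_map: np.ndarray, sigma: float = 0.9) -> np.ndarray:
    """W_e = h * E: down-weight masking at edges, then smooth.

    edge_map: boolean array from a Canny detector (True = edge pixel).
    Edge pixels get 0.1 and non-edge pixels get 1, then a Gaussian
    low-pass (sigma = 0.9; truncate=3.0 gives a 7-tap kernel) smooths
    the transition so W_e does not jump abruptly in a small area.
    """
    e = np.where(edge_map, 0.1, 1.0)
    return gaussian_filter(e, sigma=sigma, truncate=3.0)

# A single vertical edge: the weight dips near the edge column and
# stays close to 1 far away from it.
edges = np.zeros((9, 9), dtype=bool)
edges[:, 4] = True
w = edge_weight_map(edges)
```

The 0.1 weight at edges is what prevents over-predicting masking there: gradients are huge at object boundaries, but the eye is least forgiving exactly at those pixels.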
The JND model can be tested by injecting noise shaped by it and comparing the result:

p'(x, y) = p(x, y) + S(x, y) * JND_S(x, y)    (10)

where JND_S(x, y) is the spatial JND computed above and S(x, y) is randomly chosen from +1 and -1, so that no fixed artifact pattern appears (if the sign were all +1 or all -1, the structured pattern itself would degrade image quality).
If the spatial JND estimate closely tracks the true JND boundary of the HVS, the injected amplitude in (10) is as large as possible while the visual distortion it creates stays minimal. As an extreme-case control, replacing the JND shaping in (10) with random noise gives (11), where the noise amplitude is a random value in [0.0, 1.0] scaled by an amplitude-control factor.
For the 512x512 "lena" image, take the crop shown in Figure 3.a and inject the NAMM-shaped noise of formula (10) and the random noise of formula (11) respectively; the results are shown in Figure 3. Evidently the NAMM model lets an image tolerate more information redundancy at the same visual quality.
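The two injection schemes (10) and (11) can be sketched as follows, assuming a per-pixel `jnd` map is available; the amplitude factor in the control is an assumption matching the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_jnd_noise(img: np.ndarray, jnd: np.ndarray) -> np.ndarray:
    """Formula (10): add +/- JND at every pixel; random signs avoid a fixed pattern."""
    sign = rng.choice([-1.0, 1.0], size=img.shape)
    return np.clip(img + sign * jnd, 0, 255)

def inject_random_noise(img: np.ndarray, amplitude: float) -> np.ndarray:
    """Formula (11) control: a random amplitude in [0, A] instead of the JND shape."""
    sign = rng.choice([-1.0, 1.0], size=img.shape)
    return np.clip(img + sign * amplitude * rng.random(img.shape), 0, 255)

# Same average energy, different shaping: (10) hides the noise where
# masking is strong; (11) spreads it blindly, so it shows up in smooth regions.
img = np.full((8, 8), 128.0)
jnd = np.full((8, 8), 3.0)
noisy = inject_jnd_noise(img, jnd)
```

This is exactly the Figure 3 experiment in miniature: equal or greater injected energy, but the JND-shaped version should look unchanged to a viewer.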
The temporal effect can be merged into the spatial JND as a varying amplitude factor [7]. In general, large motion produces strong temporal masking, which roughly follows the curve below (Figure 6) [7]:
The total JND can then be expressed as:

JND(x, y, t) = f( ild(x, y, t) ) * JND_S(x, y)    (12)

where ild(x, y, t) represents the average inter-frame luminance difference between frame t and frame t-1, computed from the average pixel values of the two frames, and f(.) is the scaling function constructed from the model shown below (Figure 6).
The following figure shows the hybrid video coding scheme (image domain plus transform domain) using the JND model.
Peak signal-to-perceptual-noise ratio:

PSPNR = 10 * log10( 255^2 / ( (1/N) * sum_{x,y} ( |p(x, y, t) - p'(x, y, t)| - JND(x, y, t) )^2 * delta(x, y, t) ) )

where delta(x, y, t) = 1 when |p(x, y, t) - p'(x, y, t)| > JND(x, y, t) and 0 otherwise, and p'(x, y, t) represents the reconstructed pixel value at (x, y) in the color channel of frame t. When JND = 0 everywhere, the PSPNR computation reduces to the traditional PSNR.
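A sketch of the PSPNR computation as just described: distortion is counted only beyond the per-pixel JND, and a zero JND map degenerates to plain PSNR.

```python
import numpy as np

def pspnr(orig: np.ndarray, recon: np.ndarray, jnd: np.ndarray) -> float:
    """Peak signal-to-perceptual-noise ratio.

    Per-pixel error below the JND threshold is ignored (delta = 0);
    with jnd == 0 everywhere this is exactly the usual PSNR.
    """
    err = np.abs(orig.astype(np.float64) - recon.astype(np.float64))
    over = np.maximum(err - jnd, 0.0)  # (|p - p'| - JND) * delta in one step
    mse = np.mean(over ** 2)
    if mse == 0:
        return float("inf")  # no perceptible error at all
    return 10.0 * np.log10(255.0 ** 2 / mse)

orig = np.array([[100.0, 120.0], [130.0, 140.0]])
recon = orig + 2.0  # uniform error of 2 grey levels
```

With a JND of 3 everywhere, the 2-level error is invisible and PSPNR is infinite; with JND = 0 the same error yields the ordinary PSNR.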
Motion-vector prediction is the most computation-heavy part of hybrid video coding. Traditionally it finds, for a luminance block (macroblock), the matching block that minimizes the SAD (sum of absolute differences). From a perceptual point of view, however, visibility depends not only on the luminance amplitude but also on the local JND, so this paper proposes the SAPD (sum of absolute perceptual differences). The motion vector (p, q) of the (k, l)-th block (macroblock) of frame t can be defined as:
SAPD(p, q) = sum_{s=1}^{B} max( |c_s - r_s(p, q)| - JND_s, 0 ),   -R <= p, q <= R

where c_s and r_s(p, q) are the s-th luminance pixel values of the current block (macroblock) and of the candidate block (macroblock) in the previously reconstructed frame, with s = 1, 2, ..., B following the raster scanning order of a B-pixel block (the block size differs across coding methods), and R is the maximum possible displacement of the motion vector. JND_s is the luminance JND value of the s-th pixel of the current block (macroblock); when JND_s is 0 everywhere, SAPD is computed the same as SAD.
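A minimal sketch of the SAPD block-matching cost, written so that zero JND values reduce it exactly to SAD (the clamped-subtraction form follows the description above):

```python
import numpy as np

def sapd(cur: np.ndarray, ref: np.ndarray, jnd: np.ndarray) -> float:
    """Sum of absolute *perceptual* differences: each per-pixel error is
    counted only for the part that exceeds the local JND threshold."""
    diff = np.abs(cur.astype(np.float64) - ref.astype(np.float64))
    return float(np.sum(np.maximum(diff - jnd, 0.0)))

cur = np.array([[10.0, 12.0], [14.0, 16.0]])
ref = cur + np.array([[1.0, -2.0], [0.0, 3.0]])
jnd = np.full((2, 2), 2.0)

# |diff| = [[1, 2], [0, 3]]; only the 3 exceeds JND = 2, contributing 1.
cost = sapd(cur, ref, jnd)
```

Because sub-JND differences contribute nothing, many candidate blocks reach a cost of exactly 0, which is what enables the early-termination trick described below.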
According to SAPD criterion, the motion vector can be determined by matching the current block (macroblock) with the block (macroblock) of the previous reconstructed frame.
With the processing in (18), the algorithm achieves better PSPNR performance and faster motion estimation.
The complexity of motion prediction lies in:
1. Number of candidate motion vectors in the search window (i.e. search points)
2. The amount of matching computation for each candidate motion vector.
Therefore, the speed of motion prediction can be improved from two aspects: reducing the number of search points (fast search) or reducing the calculation cost of matching each candidate motion vector. The concept of SAPD can be applied to full search motion estimation and any fast search and fast matching algorithm.
For fast search algorithms, SAPD ignores objective distortion below the JND profile, which raises the probability of a zero SAPD and avoids worthless deep searches when a block (macroblock) changes below the detectable level. Once a SAPD of 0 (or a sufficiently small SAPD value) appears, motion estimation for the current block (macroblock) can end early.
JND-adaptive residual filter
After motion compensation, the residual image is

R(x, y, t) = p(x, y, t) - p~(x, y, t)

where p~(x, y, t) is the motion-compensated prediction of the intensity (or chroma) component of the pixel. When a residual is smaller than the JND profile it is imperceptible to the human eye; leaving it uncoded does not affect the viewing experience and improves coding efficiency.
Given the statistics of the previous inter-frame coding, the residuals that need the DCT are first passed through the JND-adaptive residual filter. A threshold ensures the filter is only enabled when the frame motion is not too small; empirically this value is often set to 10 (an average quantization step above this value means the pixels change substantially, i.e., the motion is fast). When the JND-adaptive residual filter is on, a block (macroblock) whose residuals all stay within the JND threshold can be treated as an all-zero block, simplifying compression. If only part of the residuals fall below the JND threshold, the variance of the DCT coefficients becomes smaller after filtering; from a rate-distortion point of view, a lower-variance signal yields a reconstructed signal with lower objective distortion at a given bit rate.
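The filter can be sketched as a per-pixel dead zone that zeroes only the residuals within the JND threshold, as described above (whether surviving residuals are additionally shrunk is not covered in these notes, so they pass through unchanged here):

```python
import numpy as np

def jnd_residual_filter(residual: np.ndarray, jnd: np.ndarray) -> np.ndarray:
    """Zero out residuals whose magnitude stays within the JND threshold;
    residuals above it pass through unchanged (a per-pixel dead zone)."""
    return np.where(np.abs(residual) <= jnd, 0.0, residual)

res = np.array([[1.0, -5.0], [2.5, -2.0]])
jnd = np.full((2, 2), 3.0)
filtered = jnd_residual_filter(res, jnd)
# Only the -5.0 survives; the block is nearly all-zero and cheaper to code,
# and the variance of the signal fed to the DCT has dropped.
```

If every residual in a block lands in the dead zone, the block becomes the all-zero block mentioned above and can be skipped entirely.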
For a given bit budget, if coding the residuals above the JND threshold can make up for the loss from dropping those below it, the JND-adaptive residual filter reduces not only the perceptual distortion but also the objective distortion. If it cannot, the encoder buffer runs under-full and bandwidth utilization drops; therefore, when the average motion of the previous inter-frame coding is relatively small, the residuals below the JND threshold are coded as well, to make full use of the bandwidth and reduce objective distortion.
Rate distortion analysis of JND adaptive residual filter
What follows is a series of rate-distortion analyses and experimental data (presumably demonstrating that when the average motion of the previous inter-frame coding is small, coding the residuals below the JND threshold makes full use of the bandwidth and reduces objective distortion).
That's as far as I've read. I haven't studied rate-distortion theory yet, so I'll take some time to learn it and then come back to the original paper. I'm going to use this as my presentation in English class, so I need to put the slides together.