First, some background. In image segmentation, the image is fed into a CNN (a typical network is FCN[3]). FCN first convolves the image and then pools it; pooling reduces the image size and enlarges the receptive field. However, because segmentation predictions are output pixel by pixel, the smaller pooled feature map has to be upsampled back to the original image size before prediction (upsampling generally uses deconvolution; for details, see the Zhihu answer "How to understand the deconvolution network in deep learning?"). Thanks to the earlier pooling, each pixel's prediction can draw on a larger receptive field. So FCN-style segmentation has two key steps: pooling, which shrinks the image and enlarges the receptive field, and upsampling, which restores the image size. Some information is inevitably lost when the size is first reduced and then increased. So can we design a new operation that sees more context without pooling? The answer is dilated convolution (dilated conv).
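To make the idea concrete, here is a minimal 1D sketch in plain Python. It assumes the usual definition of dilation (a gap of `dilation - 1` skipped samples between neighboring kernel taps), an odd kernel length, and zero padding so the output keeps the input length. The helper names `effective_kernel_size` and `dilated_conv1d` are made up for illustration and do not come from any library.

```python
def effective_kernel_size(kernel_size, dilation):
    # Span of input samples covered by one kernel placement:
    # k taps with (dilation - 1) skipped samples between neighbors.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    # Zero-padded ("same") 1D cross-correlation with dilated taps.
    # Assumes an odd kernel length so the padding is symmetric.
    span = effective_kernel_size(len(kernel), dilation)
    pad = (span - 1) // 2
    padded = [0.0] * pad + [float(x) for x in signal] + [0.0] * pad
    out = []
    for i in range(len(signal)):
        # Take every `dilation`-th sample inside the covered span.
        taps = padded[i : i + span : dilation]
        out.append(sum(t * w for t, w in zip(taps, kernel)))
    return out


# A single impulse shows which input positions each output can "see".
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
response = dilated_conv1d(impulse, [1, 1, 1], dilation=2)

print(len(response) == len(impulse))   # output length is unchanged, no pooling
print(effective_kernel_size(3, 1))     # ordinary 3-tap kernel covers 3 samples
print(effective_kernel_size(3, 2))     # same 3 weights, dilation 2, cover 5 samples
```

With the same three weights, the dilated kernel covers a wider span of the input while the output stays at full resolution, which is the property the pool-then-upsample pipeline cannot offer.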