Research on the Depth and Diversity of Routing Models
This paper is Google Brain's study of heterogeneous architectures and different depths in routing models, published at ICLR 2019. Some of its views on the choice of top-k had a certain influence on the later Switch Transformer work. It can be read as a piece of intermediate thinking from the Google Brain researchers during their exploration of routing models.

Paper title: "Diversity and Depth in Per-Example Routing Models"

Address: component simplifies modeling and improves model effect.

Additional ablation experiments were run on different datasets, and the results were mediocre. The effect varies from dataset to dataset and has little to do with dataset size. The authors conclude that adding different operations, and adding copies of the important ones, is a direct way to improve model quality. I do not fully follow this part myself. Using only 3×3 convolutions already works very well, and the heterogeneous expert layer performs about the same, so attributing the gain to diversity seems, to me personally, a bit far-fetched.
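To make the idea of a heterogeneous expert layer concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it uses 1-D signals instead of images, fixed random kernels instead of learned ones, and hand-written gate values instead of a trained router. All names (`conv1d_same`, `max_pool3_same`, `heterogeneous_layer`) are hypothetical.

```python
import numpy as np

def conv1d_same(x, kernel):
    # 1-D stand-in for a convolution expert; output has the same length as x.
    return np.convolve(x, kernel, mode="same")

def max_pool3_same(x):
    # Max pooling with window 3, stride 1, edge padding (same output length).
    padded = np.pad(x, 1, mode="edge")
    return np.maximum(np.maximum(padded[:-2], padded[1:-1]), padded[2:])

def heterogeneous_layer(batch, gates, experts):
    # batch: (n_examples, length); gates: (n_examples, n_experts).
    # Only experts with a nonzero gate are executed for a given example.
    out = np.zeros_like(batch)
    for n, (x, g) in enumerate(zip(batch, gates)):
        for e in np.flatnonzero(g):
            out[n] += g[e] * experts[e](x)
    return out

rng = np.random.default_rng(0)
k3 = rng.standard_normal(3)  # assumed kernel, size 3
k5 = rng.standard_normal(5)  # assumed kernel, size 5

# Three different operations in one layer: two convolutions and a pooling op.
experts = [
    lambda x: conv1d_same(x, k3),
    lambda x: conv1d_same(x, k5),
    max_pool3_same,
]

batch = rng.standard_normal((4, 16))
# Hand-written gates for illustration; the last example mixes two experts.
gates = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5],
])
out = heterogeneous_layer(batch, gates, experts)
```

A homogeneous layer would correspond to filling `experts` with copies of the same operation type, which is the baseline the 3×3-only comparison refers to.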

Routing depth:

The paper reports the effect of routing depth on CIFAR-10. Judging from the results, the routing model looks even worse here. For example, with the number of cells C=6 and the number of filters F=64, the routing model only matches the all-on model with C=6 and F=32, while its computation cost is two times, or even 3.5 times, that of the searched single model. When C is increased to 12, the results get even worse. The authors attribute the poor results to the difficulty of optimizing complex routing: a router such as noisy top-k gating is heuristic and cannot learn strong solutions.
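For readers unfamiliar with the router mentioned above, the following is a minimal NumPy sketch of noisy top-k gating in its commonly cited form: Gaussian noise scaled by a softplus term is added to the gate logits, all but the k largest logits are masked out, and a softmax is taken over what remains. The weight names `w_gate` and `w_noise`, the shapes, and the `train` flag are assumptions for illustration, and the weights here are random rather than learned.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def noisy_top_k_gating(x, w_gate, w_noise, k, rng, train=True):
    # x: (n_examples, d); w_gate, w_noise: (d, n_experts).
    logits = x @ w_gate
    if train:
        # Per-example, per-expert noise scale; noise is only added in training.
        noise_std = softplus(x @ w_noise)
        logits = logits + rng.standard_normal(logits.shape) * noise_std
    # Keep the k largest logits per example and set the rest to -inf.
    top_idx = np.argpartition(logits, -k, axis=-1)[:, -k:]
    masked = np.full_like(logits, -np.inf)
    top_vals = np.take_along_axis(logits, top_idx, axis=-1)
    np.put_along_axis(masked, top_idx, top_vals, axis=-1)
    # Softmax over the kept logits; masked experts get a gate of exactly 0.
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # 4 examples, 8 features (assumed sizes)
w_gate = rng.standard_normal((8, 6))   # 6 experts (assumed)
w_noise = rng.standard_normal((8, 6))
gates = noisy_top_k_gating(x, w_gate, w_noise, k=2, rng=rng)
```

The top-k selection itself is a hard, non-differentiable choice: gradients only reach the experts that were picked, which is one way to read the authors' remark that this kind of router is heuristic.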

Conclusion:

In terms of architectural heterogeneity, the experiments show a meaningful effect on final model quality. How to optimize routing depth, however, was still an open question at the time.

The authors argue that routing models are needed to address the pain points of static models. They predict that very-large-scale tasks are where routing models belong (the Switch Transformer lived up to this expectation two years later), and they are optimistic about progress on optimizing routing models.