How to write an inference engine paper
TensorFlow Lite (TFLite) now supports GPU inference using OpenCL on Android devices, which makes TFLite about 2x faster than with the existing OpenGL backend.

The TensorFlow Lite team described its progress on mobile GPU inference using OpenCL and announced the official launch of an OpenCL-based mobile GPU inference engine for Android. Compared with the existing OpenGL backend, the engine offers up to a 2x speedup on reasonably sized neural networks.

OpenGL ES 3.1 added compute shaders, but its backward-compatible API design decisions limit the full potential of the GPU. OpenCL, on the other hand, was designed from the beginning for computation on a variety of accelerators, so it is a better fit for mobile GPU inference.

The TFLite team therefore investigated an OpenCL-based inference engine, which brings several features that help optimize the mobile GPU inference engine.

Compared with the OpenGL backend, the new mobile GPU inference engine has the following advantages:

Performance profiling: Optimizing the OpenCL backend is much easier than optimizing the OpenGL backend, because OpenCL offers good profiling features and Qualcomm Adreno supports them well. With these profiling APIs, the performance of each kernel dispatch can be measured very precisely.
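As a rough illustration of what per-dispatch measurement gives you, the sketch below sums execution time per kernel from start and end timestamps. It assumes the timestamps were already read on the host with OpenCL's `clGetEventProfilingInfo` (`CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END`, both reported in nanoseconds); the function name, kernel names and sample values are hypothetical and not taken from TFLite.

```python
from collections import defaultdict


def kernel_times_ms(events):
    """Sum execution time per kernel from (name, start_ns, end_ns) tuples.

    The tuples stand in for device timestamps (nanoseconds) that an OpenCL
    host program would read from profiling events. No OpenCL call is made here.
    """
    totals = defaultdict(float)
    for name, start_ns, end_ns in events:
        if end_ns < start_ns:
            raise ValueError(f"end before start for kernel {name!r}")
        totals[name] += (end_ns - start_ns) / 1e6  # nanoseconds -> milliseconds
    return dict(totals)


# Made-up timestamps for three dispatches of two hypothetical kernels.
sample_events = [
    ("conv2d", 1_000_000, 3_500_000),
    ("relu", 3_600_000, 3_900_000),
    ("conv2d", 4_000_000, 6_000_000),
]

times = kernel_times_ms(sample_events)
slowest = max(times, key=times.get)  # kernel with the largest total time
print(times, slowest)
```

Ranking kernels by total time like this is one simple way to decide where tuning effort is likely to pay off first.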

Optimized workgroup sizes: The performance of the TFLite GPU backend on Qualcomm Adreno GPUs is very sensitive to workgroup size. Picking the right workgroup size improves performance, and picking a poor one degrades it. Using the OpenCL profiling features mentioned above, the team implemented an optimizer for workgroup sizes, which increased speed by up to 50% over the average.
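The idea of a profiling-driven workgroup optimizer can be sketched as an exhaustive search: time the kernel for each candidate size and keep the fastest. This is only a minimal sketch and not TFLite's actual optimizer; the candidate sizes, the 256-thread limit and the `toy_measure_ns` cost model are all assumptions made so that the example runs without a GPU.

```python
from itertools import product


def candidate_workgroups(max_threads=256, xy_sizes=(1, 2, 4, 8, 16, 32), z_sizes=(1, 2, 4)):
    """List (x, y, z) sizes whose thread count fits an assumed device limit."""
    return [
        (x, y, z)
        for x, y, z in product(xy_sizes, xy_sizes, z_sizes)
        if x * y * z <= max_threads
    ]


def tune_workgroup(measure_ns, candidates, repeats=3):
    """Return (size, median_ns) for the candidate with the lowest median time."""
    if not candidates:
        raise ValueError("no candidate workgroup sizes")
    best_size, best_ns = None, None
    for size in candidates:
        # Take the median of a few runs to damp measurement noise.
        samples = sorted(measure_ns(size) for _ in range(repeats))
        median_ns = samples[len(samples) // 2]
        if best_ns is None or median_ns < best_ns:
            best_size, best_ns = size, median_ns
    return best_size, best_ns


def toy_measure_ns(size):
    """Made-up cost model standing in for a profiled kernel run."""
    x, y, z = size
    return 1000 + 100 * abs(x - 8) + 100 * abs(y - 4) + 50 * (z - 1)


best_size, best_ns = tune_workgroup(toy_measure_ns, candidate_workgroups())
print(best_size, best_ns)
```

In a real setting `measure_ns` would dispatch the kernel with the given size and return the profiled duration, and the result would typically be cached per kernel and device so the search runs only once.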

Native 16-bit floating point (FP16): OpenCL natively supports FP16 and requires accelerators to report whether the data type is available. Because this is part of the official specification, even some older GPUs, such as the Adreno 305 released in 2012, can operate at their full capabilities.
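To get a feel for what FP16 trades away, the sketch below rounds ordinary Python floats to IEEE 754 half precision using the standard library's `struct` format code `e`. It runs on the CPU and says nothing about how a GPU or TFLite handles FP16 internally; the helper name `to_fp16` is made up for this illustration.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# FP16 needs half the storage of FP32, which also halves memory traffic.
fp16_bytes = struct.calcsize("<e")
fp32_bytes = struct.calcsize("<f")

# With an 11-bit significand, 0.1 is not exactly representable, and
# integers above 2048 are no longer all representable.
rounded_tenth = to_fp16(0.1)
rounded_2049 = to_fp16(2049.0)

print(fp16_bytes, fp32_bytes, rounded_tenth, rounded_2049)
```

The rounding error is small for typical activations and weights, which is why many inference workloads tolerate FP16 well, but it is worth checking accuracy for each model.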

Constant memory: OpenCL has the concept of constant memory. Qualcomm added a physical memory whose properties make it well suited for use with OpenCL's constant memory.

This has proved very effective in certain special cases, such as very thin layers at the beginning or end of a neural network. Through the synergy of this physical constant memory and the native FP16 support mentioned above, OpenCL on Adreno can greatly outperform OpenGL.

The TFLite team compared performance on the CPU (single-threaded on a big core), on the GPU with the existing OpenGL backend, and on the GPU with the new OpenCL backend.

The figure illustrates the performance of the inference engine using OpenCL on selected Android devices with two well-known neural networks, MNASNet 1.3 and SSD MobileNet v3 (large). The new OpenCL backend is about twice as fast as the OpenGL backend, and OpenCL does even better on larger networks.

Because OpenCL itself is not part of Android, it may not be available to some users. To simplify development, a few modifications were made to the TFLite GPU delegate. It first checks at runtime whether OpenCL is available; if so, it uses the new OpenCL backend, and otherwise it falls back to the existing OpenGL backend.
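The runtime check and fallback can be sketched as follows. This is not the delegate's actual code: it assumes OpenCL availability can be probed by trying to load the OpenCL shared library, the library names tried are guesses that vary by platform, and the final CPU fallback is an added assumption for the case where neither GPU backend is usable.

```python
import ctypes


def opencl_available(library_names=("libOpenCL.so", "libOpenCL.so.1", "OpenCL")):
    """Return True if any of the assumed OpenCL library names can be loaded."""
    for name in library_names:
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            continue  # not found under this name, try the next one
    return False


def choose_backend(has_opencl, has_opengl=True):
    """Prefer OpenCL, fall back to OpenGL, then (as an assumption) to CPU."""
    if has_opencl:
        return "opencl"
    if has_opengl:
        return "opengl"
    return "cpu"


backend = choose_backend(opencl_available())
print(backend)
```

Keeping the probe separate from the selection logic makes the fallback order easy to test without a device that actually ships OpenCL.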

In fact, the OpenCL backend has been in the TensorFlow repository since mid-2019 and is seamlessly integrated through the TFLite GPU delegate v2.

