Performance Optimization of the TensorFlow Framework on Mobile GPU Devices Using OpenCL

Published in NA, 2018

Abstract

The advancement of mobile computing technology and recent progress in AI have driven the rise of edge computing, in which computation that used to happen in the cloud is shifting to edge devices. Before the rise of smartphones, mobile devices merely served as a communication medium; today, however, they are powerful and energy-efficient enough to run intensive AI computation within a reasonable power budget. Yet not all open-source AI frameworks on the market support AI training on mobile devices. In this paper, the feasibility of training a small AI task, the MNIST handwritten-digit dataset, using the TensorFlow framework on a mobile CPU/GPU was demonstrated. To further optimize TensorFlow performance on mobile devices, benchmark programs were executed on the mobile GPU to better understand the underlying architecture. Based on the benchmark results collected, GPU optimization techniques were applied to overcome the system bottleneck. As a result, the matrix multiplication task was accelerated by 2.16x compared to the baseline performance.
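One standard GPU optimization for matrix multiplication is tiling: computing the result in small sub-blocks so each block's operands can be staged in OpenCL local memory, cutting redundant global-memory traffic. The CPU sketch below illustrates only the blocking idea in plain Python; the tile size, loop structure, and the assumption that tiling was the paper's specific technique are illustrative, not taken from the paper itself.

```python
# Illustrative sketch of tiled (blocked) matrix multiplication.
# On a mobile GPU, each (i0, j0) block would map to one OpenCL
# work-group, with the TILE x TILE operand blocks held in local
# memory. TILE = 2 is an arbitrary value chosen for the demo.
TILE = 2

def matmul_naive(a, b):
    """Reference triple-loop multiply of square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matmul_tiled(a, b, tile=TILE):
    """Same product, computed block by block."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # One (i0, j0, k0) iteration accumulates the
                # contribution of a pair of operand tiles.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_tiled(A, B) == matmul_naive(A, B)
```

The blocked version performs exactly the same arithmetic as the naive loop; the benefit on a GPU comes from data reuse within a tile, not from fewer operations.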

Download paper (11 pages) here

Download full report (85 pages) here


Despite its unpolished writing and rough experiments, this paper is my first attempt to analyze a challenging topic systematically, and it is meaningful to me personally. This work has not been formally published due to its sub-optimal quality, and it is listed here for your information only.