PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices: Experimental Results

2 Apr 2024

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Minghao Yan, University of Wisconsin-Madison;

(2) Hongyi Wang, Carnegie Mellon University;

(3) Shivaram Venkataraman, [email protected].


B EXPERIMENTAL RESULTS

In this section, we further demonstrate the tradeoff between memory frequency and maximum GPU frequency with an array of additional results. These results show that the energy consumption patterns of the same model differ across devices. Furthermore, even for a fixed model and device pairing, the optimization landscape can shift significantly with batch size. This underscores the complexity of energy optimization and the need for an adaptive framework that accounts for these factors. Figures 6 to 13 show the energy consumption patterns of EfficientNet and Bert on Jetson TX2 and Jetson Orin under various batch sizes. Table 7 shows the optimal CPU frequency and the corresponding reduction in energy consumption for image preprocessing.
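The per-query energy plotted in the figures below can be made concrete with a small search over profiled configurations. The sketch is an illustration under stated assumptions, not PolyThrottle's implementation: it assumes each profiling run records the average power and the latency of one batch at a single (GPU frequency, memory frequency) pair, computes per-query energy as power times latency divided by batch size, picks the lowest-energy pair, and reports the saving relative to the highest-frequency setting, which it assumes is the default. The `Measurement` class, the `best_setting` function, and all frequencies and readings are hypothetical placeholders, not data from these experiments.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Measurement:
    """One hypothetical profiling run at a fixed (GPU, memory) frequency pair."""
    gpu_mhz: int
    mem_mhz: int
    batch_size: int
    avg_power_w: float  # average board power while serving the batch, in watts
    latency_s: float    # wall-clock time to serve one batch, in seconds

    @property
    def energy_per_query_j(self) -> float:
        # Energy of the whole batch (W * s = J), amortized over its queries.
        return self.avg_power_w * self.latency_s / self.batch_size


def best_setting(runs: Iterable[Measurement]) -> Tuple[Measurement, float]:
    """Return the lowest per-query-energy run and its fractional saving.

    Assumption: the baseline is the run with the highest GPU frequency,
    ties broken by the highest memory frequency.
    """
    runs: List[Measurement] = list(runs)
    if not runs:
        raise ValueError("need at least one measurement")
    best = min(runs, key=lambda m: m.energy_per_query_j)
    baseline = max(runs, key=lambda m: (m.gpu_mhz, m.mem_mhz))
    saving = 1.0 - best.energy_per_query_j / baseline.energy_per_query_j
    return best, saving


# Placeholder readings for one model at batch size 8 (illustrative only).
runs = [
    Measurement(1300, 1866, 8, 10.0, 0.080),
    Measurement(1300, 1600, 8, 8.0, 0.090),
    Measurement(900, 1866, 8, 7.0, 0.100),
    Measurement(900, 1600, 8, 6.0, 0.130),
]

best, saving = best_setting(runs)
print(f"best: GPU {best.gpu_mhz} MHz, memory {best.mem_mhz} MHz, "
      f"{best.energy_per_query_j:.4f} J/query, saving {saving:.1%}")
```

With these placeholder numbers, the lowest-energy setting lowers the GPU clock but keeps the memory clock at its maximum, saving 12.5% per query over the baseline. Because the optimum can move with batch size, such a search would need to be repeated for each batch size rather than reused across them.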

Figure 6. This figure shows per-query energy cost for Bert at FP16 on Jetson TX2 as the GPU frequency and memory frequency vary, with batch size fixed at 1.

Figure 7. This figure shows per-query energy cost for Bert at FP32 on Jetson TX2 as the GPU frequency and memory frequency vary, with batch size fixed at 1.

Figure 8. This figure shows per-query energy cost for Bert at FP16 on Jetson TX2 as the GPU frequency and memory frequency vary, with batch size fixed at 8.

Figure 9. This figure shows per-query energy cost for EfficientNet B4 at FP16 on Jetson TX2 as the GPU frequency and memory frequency vary, with batch size fixed at 16.

Figure 10. This figure shows per-query energy cost for EfficientNet B7 at FP16 on Jetson TX2 as the GPU frequency and memory frequency vary, with batch size fixed at 16.

Figure 11. This figure shows per-query energy cost for EfficientNet B7 at FP16 on Jetson Orin as the GPU frequency and memory frequency vary, with batch size fixed at 8.

Figure 12. This figure shows per-query energy cost for EfficientNet B7 at FP16 on Jetson Orin as the GPU frequency and memory frequency vary, with batch size fixed at 1.

Figure 13. This figure shows per-query energy cost for EfficientNet B4 at FP16 on Jetson Orin as the GPU frequency and memory frequency vary, with batch size fixed at 8.
