Neural Network Model Compression Algorithms for Image Classification in Embedded Systems

Authors

    Heejung Shin, Hyondong Oh
    Department of Mechanical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea

Keywords:

Deep learning, Model compression, Pruning, Quantization, Knowledge distillation, Embedded system

Abstract

This paper introduces model compression algorithms that make a deep neural network smaller and faster for embedded systems. Model compression algorithms can be broadly categorized into pruning, quantization, and knowledge distillation. In this study, gradual pruning, quantization-aware training, and knowledge distillation that learns the activation boundaries formed by the hidden neurons of the teacher network are integrated. As a large deep neural network is compressed and accelerated by these algorithms, embedded computing boards can run it much faster and with less memory usage while preserving reasonable accuracy. To evaluate the performance of the compressed neural networks, we measure the size, latency, and accuracy of DenseNet201 on image classification with the CIFAR-10 dataset on the NVIDIA Jetson Xavier.
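
To make the pipeline concrete, the sketch below illustrates, under stated assumptions, how these three ingredients can be combined in a single training loop. It is written in PyTorch (the paper does not state its framework), uses tiny placeholder networks and random data instead of DenseNet201 and CIFAR-10, and substitutes the classic soft-label distillation loss of Hinton et al. for the activation-boundary loss used in the paper; it is an illustrative sketch under these assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption: PyTorch) of the three techniques combined in the paper:
# a gradual sparsity schedule with magnitude pruning, fake quantization as used in
# quantization-aware training (QAT), and a distillation loss. The tiny linear
# "networks", random data, and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


def gradual_sparsity(step, total_steps, s_init=0.0, s_final=0.9):
    """Polynomial sparsity schedule in the style of Zhu & Gupta's gradual pruning."""
    t = min(step / total_steps, 1.0)
    return s_final + (s_init - s_final) * (1.0 - t) ** 3


def apply_magnitude_pruning(model, sparsity):
    """Zero out the smallest-magnitude weights so that `sparsity` of them are removed."""
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            w = module.weight.data
            k = int(sparsity * w.numel())
            if k > 0:
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())


def fake_quantize(x, num_bits=8):
    """Quantize-dequantize with a straight-through gradient, mimicking QAT."""
    qmax = 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / qmax
    zero_point = (-x.min() / scale).round()
    q = ((x / scale + zero_point).round().clamp(0, qmax) - zero_point) * scale
    return x + (q - x).detach()  # forward uses q, backward passes gradients through


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Soft-label knowledge distillation blended with the ordinary cross-entropy loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Placeholder teacher/student; in the paper the compressed network is DenseNet201.
    teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
    student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

    total_steps = 100
    for step in range(total_steps):
        x = torch.randn(8, 3, 32, 32)                 # stand-in for a CIFAR-10 batch
        y = torch.randint(0, 10, (8,))
        with torch.no_grad():
            teacher_logits = teacher(x)
        student_logits = student(fake_quantize(x))    # simplified: quantize inputs only
        loss = distillation_loss(student_logits, teacher_logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Re-apply the (gradually increasing) sparsity after each parameter update.
        apply_magnitude_pruning(student, gradual_sparsity(step, total_steps))
```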


Published

2022-12-31