Neural Network Model Compression Algorithms for Image Classification in Embedded Systems
Keywords:
Deep learning, Model compression, Pruning, Quantization, Knowledge distillation, Embedded system

Abstract
This paper introduces model compression algorithms that make deep neural networks smaller and faster for embedded systems. Model compression algorithms fall broadly into three categories: pruning, quantization, and knowledge distillation. In this study, we integrate gradual pruning, quantization-aware training, and a knowledge distillation method that learns the activation boundaries formed by the hidden neurons of the teacher network. Once a large deep neural network is compressed and accelerated by these algorithms, embedded computing boards can run it much faster and with less memory while preserving reasonable accuracy. To assess the compressed networks, we measure the size, latency, and accuracy of DenseNet201 on the CIFAR-10 image classification dataset, running on an NVIDIA Jetson Xavier.
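To make two of the techniques named above concrete, the sketch below illustrates (a) the polynomial sparsity schedule used in gradual pruning as proposed by Zhu and Gupta, and (b) simple magnitude-based weight pruning and symmetric linear quantization. This is a minimal illustration in plain Python, not the paper's implementation; all function names and default parameters (e.g. a final sparsity of 0.9 over 100 steps) are assumptions chosen for the example.

```python
# Minimal sketches of compression techniques from the abstract.
# Function names and default parameters are illustrative, not from the paper.

def gradual_sparsity(step, s_init=0.0, s_final=0.9, t0=0, n=100, dt=1):
    """Polynomial sparsity schedule (Zhu & Gupta): ramp sparsity from
    s_init to s_final over n pruning steps of length dt, starting at t0."""
    if step < t0:
        return s_init
    t = min(step, t0 + n * dt)
    frac = 1.0 - (t - t0) / (n * dt)
    return s_final + (s_init - s_final) * frac ** 3

def prune_by_magnitude(weights, sparsity):
    """Zero out (at least) the smallest-magnitude fraction of the weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_symmetric(values, num_bits=8):
    """Symmetric linear quantization of a (not all-zero) float list
    to num_bits signed integers; returns the codes and the scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from integer codes."""
    return [qi * scale for qi in q]
```

In a full training loop, `gradual_sparsity` would be queried every few hundred steps and the resulting fraction of smallest-magnitude weights masked to zero, while quantization-aware training would simulate the quantize/dequantize round trip in the forward pass so the network learns to tolerate 8-bit precision.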