Abstract:
As an important research achievement, deep convolutional neural networks have been widely applied in fields such as computer vision, natural language processing, information retrieval, speech recognition, and semantic understanding. They have sparked a wave of neural network research in both academia and industry and have contributed to the development of artificial intelligence. Convolutional neural networks take the raw data directly as input and automatically learn feature representations from large amounts of training data. Their local connections, weight sharing, and pooling operations effectively decrease network complexity and reduce the number of trainable parameters, and they give the model a degree of invariance to translation, distortion, and scale. Many approaches to deep neural networks have been proposed, including increasing the size and complexity of the networks, using larger training sets, and improving network architectures and training methods. These approaches aim to simulate the complex hierarchical cognitive attributes of the human brain and to narrow the gap between machines and the human brain and visual system, so that machines can capture 'abstract concepts'. Deep convolutional neural networks have achieved great success in many computer vision tasks, such as image classification, object detection, face recognition, and person re-identification. In this paper, we first review the history of convolutional neural networks and briefly introduce the M-P neuron model, the Hubel-Wiesel model, the Neocognitron, LeNet for handwriting recognition, and the deep convolutional neural network for image classification in the ImageNet competition.
We then analyze in detail the fundamental principles of deep convolutional neural networks and introduce the mathematical representation and the respective functions of the convolution layer, the pooling layer, and the fully connected layer. This paper also surveys representative work on convolutional neural networks along the following three directions and uses examples to illustrate the technical methods that improve image classification accuracy. On increasing the number of network layers, we discuss and analyze the architectures of classical convolutional neural networks such as AlexNet, ZF-Net, VGG, GoogLeNet, and ResNet. On increasing the amount of data, we describe the difficulty of enlarging the set of annotated samples manually and the performance gains that data augmentation brings to convolutional neural networks. On improving training methods, we introduce general regularization techniques such as L2 regularization, Dropout, DropConnect, and Maxout; several frequently used neuron activation functions such as the sigmoid, tanh, ReLU, LReLU, and PReLU functions; several loss functions such as the softmax loss, the hinge loss, the contrastive loss, and the triplet loss; and the basic idea of batch normalization. Within computer vision, this paper focuses on recent research progress of convolutional neural networks in image classification, object detection, face recognition, pedestrian recognition, image semantic segmentation, image captioning, image super-resolution, human action recognition, and image retrieval.
From the perspective of the human visual cognitive mechanism, we analyze the relevant theoretical results on hierarchical processing in the visual system and on the 'global first' visual and cognitive process, and discuss their theoretical implications for current computational models. Finally, we summarize some remaining problems and challenges in brain-like intelligence research based on deep convolutional neural networks. © 2019, Science Press. All rights reserved.
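The abstract's claim that local connections and weight sharing reduce the number of trainable parameters can be illustrated with a small counting sketch. The layer sizes below (a 32x32 RGB input, 16 output feature maps, 5x5 kernels, 2x2 pooling) and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
def fully_connected_params(in_h, in_w, in_c, out_units):
    """Weights plus biases when every output unit sees every input value."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Weights plus biases for a convolution layer.

    Each output channel reuses one kernel at every spatial position
    (weight sharing), so the count does not depend on the input size.
    """
    return kernel_h * kernel_w * in_c * out_c + out_c


def pool_output_size(in_size, window, stride):
    """Spatial size after pooling without padding."""
    return (in_size - window) // stride + 1


# Hypothetical layer: 32x32x3 input mapped to 16 feature maps of size 32x32.
fc = fully_connected_params(32, 32, 3, 32 * 32 * 16)
conv = conv_params(5, 5, 3, 16)
pooled = pool_output_size(32, window=2, stride=2)

print(f"fully connected: {fc:,} parameters")
print(f"5x5 convolution: {conv:,} parameters")
print(f"side length after 2x2 pooling: {pooled}")
```

Under these assumed sizes, the convolution layer needs a few orders of magnitude fewer parameters than a dense layer producing an output of the same shape, and pooling halves each spatial dimension passed to later layers.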
Source:
Jisuanji Xuebao/Chinese Journal of Computers
ISSN: 0254-4164
Year: 2019
Issue: 3
Volume: 42
Page: 453-482
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 135
ESI Highly Cited Papers on the List: 0