Query: Scholar name: 孙宏滨 (Sun Hongbin)

Dynamic Dataflow Scheduling and Computation Mapping Techniques for Efficient Depthwise Separable Convolution Acceleration [EI, SCIE]
Journal article | 2021, 68(8), 3279-3292 | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS

Abstract:

Depthwise separable convolution (DSC) has become one of the essential structures of lightweight convolutional neural networks (CNNs). Nevertheless, its hardware architecture has received little attention. Several previous hardware designs incur either high off-chip memory traffic or large on-chip memory usage, and are therefore deficient in both hardware efficiency and performance. This paper proposes two efficient dynamic design techniques, i.e., adaptive row-based dataflow scheduling and adaptive computation mapping, to achieve a much better trade-off between hardware efficiency and performance for DSC-based lightweight CNN accelerators. The effectiveness and efficiency of the proposed techniques have been extensively evaluated on six DSC-based lightweight CNNs. Compared with the reference architectures, simulation results show that the proposed techniques reduce on-chip buffer size by at least 50.4% and improve convolution performance by at least 1.18× while maintaining the minimum off-chip memory traffic. MobileNetV2 is implemented on a Zynq UltraScale+ ZCU102 SoC FPGA, and the results show that the proposed accelerator achieves 381.7 frames per second (fps), 1.43× that of the reference design, and saves about 36.3% of on-chip buffer size compared with the reference design while maintaining the same off-chip memory traffic.
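A quick way to see why DSC is attractive for lightweight CNNs is to count multiply-accumulate (MAC) operations for one layer. The sketch below is a generic illustration, not code from the paper: it assumes stride 1 and "same" padding so every output is H×W, and the function names and layer shape are hypothetical.

```python
def standard_conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """MACs for a standard KxK convolution producing an h x w x c_out output."""
    return h * w * k * k * c_in * c_out


def dsc_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """MACs for a depthwise separable convolution:
    a KxK depthwise stage (one filter per input channel)
    followed by a 1x1 pointwise stage that mixes channels."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer shape, loosely MobileNet-like.
    h, w, k, c_in, c_out = 56, 56, 3, 128, 128
    std = standard_conv_macs(h, w, k, c_in, c_out)
    dsc = dsc_macs(h, w, k, c_in, c_out)
    # The ratio dsc / std equals 1 / c_out + 1 / k**2.
    print(f"standard: {std:,} MACs, DSC: {dsc:,} MACs, ratio: {dsc / std:.3f}")
```

The ratio simplifies to 1/C_out + 1/K², so for 3×3 kernels and wide layers the 1×1 pointwise stage accounts for most of the remaining work.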

Keyword:

Convolutional neural network; adaptive computation mapping; depthwise separable convolution; adaptive row-based dataflow scheduling

Cite:

GB/T 7714: Li, Baoting, Wang, Hang, Zhang, Xuchong et al. Dynamic Dataflow Scheduling and Computation Mapping Techniques for Efficient Depthwise Separable Convolution Acceleration [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2021, 68(8): 3279-3292.
MLA: Li, Baoting et al. "Dynamic Dataflow Scheduling and Computation Mapping Techniques for Efficient Depthwise Separable Convolution Acceleration." IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS 68.8 (2021): 3279-3292.
APA: Li, Baoting, Wang, Hang, Zhang, Xuchong, Ren, Jie, Liu, Longjun, Sun, Hongbin et al. Dynamic Dataflow Scheduling and Computation Mapping Techniques for Efficient Depthwise Separable Convolution Acceleration. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2021, 68(8), 3279-3292.
Efficient Repair Analysis Algorithm Exploration for Memory With Redundancy and In-Memory ECC [EI, SCIE]
Journal article | 2021, 70(5), 775-788 | IEEE TRANSACTIONS ON COMPUTERS
WoS CC Cited Count: 1

Abstract:

In-memory error correction code (ECC) is a promising technique to improve the yield and reliability of high-density memory designs. However, the use of in-memory ECC poses a new problem for memory repair analysis algorithms that has not been explored before. This article first makes a quantitative evaluation and demonstrates that straightforward algorithms for memory with redundancy and in-memory ECC are seriously deficient in either repair rate or repair analysis speed. Accordingly, an optimal repair analysis algorithm that leverages preprocessing/filter algorithms, a hybrid search tree, and a depth-first search strategy is proposed to achieve low computational complexity and an optimal repair rate simultaneously. In addition, a heuristic repair analysis algorithm that uses a greedy strategy is proposed to find repair solutions efficiently. Experimental results demonstrate that the proposed optimal repair analysis algorithm achieves the optimal repair rate and increases repair analysis speed by up to 10^5× compared with the straightforward exhaustive search algorithm. The proposed heuristic repair analysis algorithm is approximately 28 percent faster than the proposed optimal algorithm, at the expense of a 5.8 percent repair rate loss.
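The general greedy idea behind a heuristic repair analysis can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' algorithm: faults are (row, column) coordinates, each spare replaces a whole row or column, and the optional `ecc_filter` step hypothetically assumes that each row is divided into codewords of `word_bits` columns protected by a single-error-correcting code, so a fault alone in its codeword can be left to ECC. All function and parameter names are illustrative.

```python
from collections import Counter


def ecc_filter(faults, word_bits):
    """Hypothetical preprocessing: assume each row is split into codewords of
    `word_bits` columns and a single-error-correcting code fixes one bad bit
    per codeword. Faults alone in their codeword are left to ECC; only faults
    in codewords with 2+ faults are kept for spare allocation."""
    per_word = Counter((r, c // word_bits) for r, c in faults)
    return {(r, c) for r, c in faults if per_word[(r, c // word_bits)] >= 2}


def greedy_repair(faults, spare_rows, spare_cols):
    """Greedily replace the row or column covering the most remaining faults.
    Returns (rows, cols) chosen, or None if this greedy choice runs out of spares."""
    remaining = set(faults)
    rows, cols = [], []
    while remaining:
        row_counts = Counter(r for r, _ in remaining)
        col_counts = Counter(c for _, c in remaining)
        best_row, n_row = row_counts.most_common(1)[0]
        best_col, n_col = col_counts.most_common(1)[0]
        rows_left = spare_rows > len(rows)
        cols_left = spare_cols > len(cols)
        if rows_left and (n_row >= n_col or not cols_left):
            rows.append(best_row)  # spare row covers every fault in best_row
            remaining = {(r, c) for r, c in remaining if r != best_row}
        elif cols_left:
            cols.append(best_col)  # spare column covers every fault in best_col
            remaining = {(r, c) for r, c in remaining if c != best_col}
        else:
            return None  # faults remain but no spares are left
    return rows, cols
```

Covering every fault in a multi-fault codeword is sufficient but conservative, and greedy choices can fail on arrays that an exhaustive or tree-based search would still repair; that repair-rate gap is what the optimal algorithm in the abstract addresses.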

Keyword:

reliability; Memory repair; yield; repair analysis algorithm; in-memory ECC

Cite:

GB/T 7714: Lv, Minjie, Sun, Hongbin, Xin, Jingmin et al. Efficient Repair Analysis Algorithm Exploration for Memory With Redundancy and In-Memory ECC [J]. IEEE TRANSACTIONS ON COMPUTERS, 2021, 70(5): 775-788.
MLA: Lv, Minjie et al. "Efficient Repair Analysis Algorithm Exploration for Memory With Redundancy and In-Memory ECC." IEEE TRANSACTIONS ON COMPUTERS 70.5 (2021): 775-788.
APA: Lv, Minjie, Sun, Hongbin, Xin, Jingmin, Zheng, Nanning. Efficient Repair Analysis Algorithm Exploration for Memory With Redundancy and In-Memory ECC. IEEE TRANSACTIONS ON COMPUTERS, 2021, 70(5), 775-788.
Exploring Effective DNN Models for Forensic Age Estimation based on Panoramic Radiograph Images [EI]
Conference paper | 2021, 2021-July | 2021 International Joint Conference on Neural Networks, IJCNN 2021

Abstract:

Dental age estimation is widely used in forensic identification, but traditional methods are not accurate enough, especially for estimating the age of adults. We introduce a deep learning-based methodology to estimate age from collected X-ray images of the teeth. We present a new dental dataset containing labeled orthopantomograms (OPGs) of 27,957 people, including 16,383 OPGs of females and 11,574 OPGs of males. Ages range from 0 to 93 years, with a median of 27. The accuracy of the age labels is guaranteed by ID card information. Targeting the characteristics of the dental data, we explore various neural network elements that are effective for age estimation, including proper network depth, convolution kernel size, multi-branch structure, and the reuse of early-layer features. Based on this exploration, we further search for dental age estimation models using the popular Neural Architecture Search (NAS) method. Experimental results show that our model achieves a mean absolute error (MAE) of 1.64 years, surpassing all existing CNN models. Compared with Inception-v4, which has an MAE of 1.70 and 20.46B FLOPs (input size 384×384), our model reduces FLOPs by 2.7 times (7.49B FLOPs). To the best of our knowledge, this is the first study of age estimation that explores and searches DNN models. Our results surpass legal medical expert-level performance (an MAE of more than 2) for age estimation. The methodology and results in this paper are of significant value to forensic medicine for age estimation from panoramic radiograph images. © 2021 IEEE.

Keyword:

Radiography; Digital forensics; Multilayer neural networks; Deep learning; Network architecture

Cite:

GB/T 7714: Hou, Wenxuan, Liu, Longjun, Gao, Jinxia et al. Exploring Effective DNN Models for Forensic Age Estimation based on Panoramic Radiograph Images [C]. 2021.
MLA: Hou, Wenxuan et al. "Exploring Effective DNN Models for Forensic Age Estimation based on Panoramic Radiograph Images." (2021).
APA: Hou, Wenxuan, Liu, Longjun, Gao, Jinxia, Zhu, Anguo, Pan, Keyang, Sun, Hongbin et al. Exploring Effective DNN Models for Forensic Age Estimation based on Panoramic Radiograph Images. (2021).
Lane Shared Bit-Pragmatic Deep Neural Network Computing Architecture and Circuit [EI, SCIE]
Journal article | 2021, 68(1), 486-490 | IEEE Transactions on Circuits and Systems II: Express Briefs

Abstract:

It is critical to continuously improve the hardware efficiency of deep neural network accelerators for their application on resource-constrained platforms. This brief proposes a lane shared bit-pragmatic architecture to address the synchronization-induced performance bottleneck and hence further improve the performance and efficiency of the bit-serial computing architecture. The effectiveness and efficiency of the proposed architecture are demonstrated by extensive evaluation results. © 2004-2012 IEEE.
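Bit-pragmatic computing builds on the observation that a bit-serial multiplier only needs to process the nonzero ("essential") bits of an activation. The sketch below illustrates that general idea in software, not the lane-shared design of this brief: each set bit of a non-negative activation contributes the weight shifted by that bit's position, and a simple `cycles` count shows how lanes advancing in lockstep must wait for the lane with the most essential bits, which is one way to picture the synchronization bottleneck mentioned above. Function names are hypothetical.

```python
def essential_bit_offsets(activation: int) -> list[int]:
    """Return the positions of the set ("essential") bits of a non-negative activation, LSB = 0."""
    if activation < 0:
        raise ValueError("this sketch assumes non-negative activations (e.g. after ReLU)")
    offsets = []
    pos = 0
    while activation:
        if activation & 1:
            offsets.append(pos)
        activation >>= 1
        pos += 1
    return offsets


def pragmatic_dot(activations: list[int], weights: list[int]) -> tuple[int, int]:
    """Dot product computed term-serially, one shift-and-add per essential bit.

    Returns (result, cycles). `cycles` models lanes that advance in lockstep:
    every lane waits for the lane with the most essential bits.
    """
    result = 0
    terms_per_lane = []
    for a, w in zip(activations, weights):
        offsets = essential_bit_offsets(a)
        terms_per_lane.append(len(offsets))
        for off in offsets:
            result += w << off  # w * 2**off, also valid for negative weights
    cycles = max(terms_per_lane, default=0)
    return result, cycles
```

For example, `pragmatic_dot([5, 0, 8, 7], [3, -2, 1, 4])` returns the ordinary dot product 51 in 3 cycles: the lane holding 7 (three essential bits) sets the pace while the lane holding 0 sits idle.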

Keyword:

Computer architecture; Deep neural networks; Efficiency; Network architecture; Neural networks; Timing circuits

Cite:

GB/T 7714: Yang, Shaofei, Liu, Longjun, Li, Yingxiang et al. Lane Shared Bit-Pragmatic Deep Neural Network Computing Architecture and Circuit [J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2021, 68(1): 486-490.
MLA: Yang, Shaofei et al. "Lane Shared Bit-Pragmatic Deep Neural Network Computing Architecture and Circuit." IEEE Transactions on Circuits and Systems II: Express Briefs 68.1 (2021): 486-490.
APA: Yang, Shaofei, Liu, Longjun, Li, Yingxiang, Li, Xinxin, Sun, Hongbin, Zheng, Nanning. Lane Shared Bit-Pragmatic Deep Neural Network Computing Architecture and Circuit. IEEE Transactions on Circuits and Systems II: Express Briefs, 2021, 68(1), 486-490.
AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning [EI]
Conference paper | 2021, 648-657 | 29th ACM International Conference on Multimedia, MM 2021

Abstract:

Pruning can remove redundant parameters and structures of Deep Neural Networks (DNNs) to reduce inference time and memory overhead. As an important component of neural networks, the feature map (FM) has started to be adopted for network pruning. However, the majority of FM-based pruning methods do not fully investigate the effective knowledge in the FM for pruning. In addition, due to the variability of FMs, it is challenging to design a robust pruning criterion with a small number of images and to achieve parallel pruning. In this paper, we propose Adaptive Knowledge Extraction for Channel Pruning (AKECP), which can compress a network quickly and efficiently. In AKECP, we first investigate the characteristics of FMs and extract effective knowledge with an adaptive scheme. Second, we formulate the effective knowledge of FMs to measure the importance of the corresponding network channels. Third, thanks to effective knowledge extraction, AKECP can efficiently and simultaneously prune all layers with extremely few images, or even a single image. Experimental results show that our method can compress various networks on different datasets without introducing additional constraints, and that it advances the state of the art. Notably, for ResNet-110 on CIFAR-10, AKECP reduces parameters by 59.9% and FLOPs by 59.8% with negligible accuracy loss. For ResNet-50 on ImageNet, AKECP reduces the memory footprint by 40.5% and FLOPs by 44.1% with only a 0.32% drop in Top-1 accuracy. © 2021 ACM.

Keyword:

Extraction; Frequency modulation; Arts computing; Deep neural networks; Data mining

Cite:

GB/T 7714: Zhang, Haonan, Liu, Longjun, Zhou, Hengyi et al. AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning [C]. 2021: 648-657.
MLA: Zhang, Haonan et al. "AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning." (2021): 648-657.
APA: Zhang, Haonan, Liu, Longjun, Zhou, Hengyi, Hou, Wenxuan, Sun, Hongbin, Zheng, Nanning. AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning. (2021): 648-657.
HSC: Leveraging horizontal shortcut connections for improving accuracy and computational efficiency of lightweight CNN [EI, SCIE]
Journal article | 2021, 457, 141-154 | NEUROCOMPUTING

Abstract:

The past few years have witnessed a dramatic increase in the number of layers in convolutional neural networks (CNNs). Most studies have focused on the vertical structure design of CNNs (e.g., residual structures, or architectures that create short paths from early layers to later layers through vertical connections), while little attention has been paid to how features are generated and extracted within a single convolutional layer. In this paper, we identify a non-feature suppression phenomenon in the feature extraction process. On this basis, we propose an orthogonal approach named HSC (Horizontal Shortcut Connections) to improve feature representation fusion and computational efficiency for CNNs. In particular, HSC can effectively reduce the interference overhead of non-feature areas and enhance information fusion for depthwise convolution and group convolution, which are key blocks in lightweight neural networks. In an HSC layer, the feature maps of all preceding layers are connected in the horizontal direction according to our strategy to constitute features, producing a new representation that is passed as the input feature maps to subsequent layers. The HSC block can be plugged into convolutional neural networks that include group convolution or depthwise convolution, and can effectively improve the accuracy of convolutional networks with slight additional computational cost. We evaluate our design on popular lightweight neural networks and standard CNN structures. Compared with existing methods, adding an HSC block after depthwise convolution achieves a 1.63% accuracy improvement for MobileNet v2 on the CIFAR-10 dataset, up to a 3.70% accuracy improvement on the CIFAR-100 dataset, and a 2.80% accuracy improvement on the ImageNet dataset. For MobileNet v3-small, we achieve a 0.8% accuracy improvement on the ImageNet dataset.
To demonstrate the improvement for group convolution, we manually replace standard convolution with group convolution and then add the HSC block after the group convolution, achieving a 4× to 6× FLOPs improvement while maintaining the accuracy of the neural networks. Notably, on ILSVRC-2012, our method reduces FLOPs on ResNet-50 by more than 43% without accuracy decline, and by 60.1% with a 0.44% accuracy decline. We also present preliminary hardware experiment results for the HSC framework running on a special hardware platform. (c) 2021 Elsevier B.V. All rights reserved.

Keyword:

Computational efficiency; Mobile model; Lightweight; Neural network architecture; FLOPs decrease; Depthwise convolution; Group convolution

Cite:

GB/T 7714: Zhu, Anguo, Liu, Longjun, Hou, Wenxuan et al. HSC: Leveraging horizontal shortcut connections for improving accuracy and computational efficiency of lightweight CNN [J]. NEUROCOMPUTING, 2021, 457: 141-154.
MLA: Zhu, Anguo et al. "HSC: Leveraging horizontal shortcut connections for improving accuracy and computational efficiency of lightweight CNN." NEUROCOMPUTING 457 (2021): 141-154.
APA: Zhu, Anguo, Liu, Longjun, Hou, Wenxuan, Sun, Hongbin, Zheng, Nanning. HSC: Leveraging horizontal shortcut connections for improving accuracy and computational efficiency of lightweight CNN. NEUROCOMPUTING, 2021, 457, 141-154.
Fine-grained dynamic head for object detection [EI]
Conference paper | 2020, 2020-December | 34th Conference on Neural Information Processing Systems, NeurIPS 2020

Abstract:

The Feature Pyramid Network (FPN) presents a remarkable approach to alleviating scale variance in object representation by performing instance-level assignments. Nevertheless, this strategy ignores the distinct characteristics of different sub-regions within an instance. To this end, we propose a fine-grained dynamic head that conditionally selects a pixel-level combination of FPN features from different scales for each instance, which further releases the capacity of multi-scale feature representation. Moreover, we design a spatial gate with a new activation function that dramatically reduces computational complexity through spatially sparse convolutions. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks. Code is available at https://github.com/StevenGrove/DynamicHead. © 2020 Neural information processing systems foundation. All rights reserved.

Keyword:

Object detection

Cite:

GB/T 7714: Song, Lin, Li, Yanwei, Jiang, Zhengkai et al. Fine-grained dynamic head for object detection [C]. 2020.
MLA: Song, Lin et al. "Fine-grained dynamic head for object detection." (2020).
APA: Song, Lin, Li, Yanwei, Jiang, Zhengkai, Li, Zeming, Sun, Hongbin, Sun, Jian et al. Fine-grained dynamic head for object detection. (2020).
Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators [EI, Scopus]
Conference paper | 2020, 315-319 | 2020 IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2020

Abstract:

In this paper, we present a flexible Variable Precision Computation Array (VPCA) component for different accelerators, which leverages a sparsification scheme for activations and a low-bit serial-parallel combined computation unit to improve the efficiency and resiliency of accelerators. The VPCA can dynamically decompose the width of activations/weights (from 32 bits to 3 bits in different accelerators) into 2-bit serial computation units, and these 2-bit units can be combined for parallel computing to achieve high throughput. We propose an on-the-fly compressing and calculating strategy, SLE-CLC (single lane encoding, cross lane calculation), which can further improve the performance of 2-bit parallel computing. Experimental results on image classification datasets show that VPCA outperforms DaDianNao, Stripes, and Loom-2bit by 4.67×, 2.42×, and 1.52×, respectively, on convolution layers without other overhead. © 2020 IEEE.
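Decomposing a wide operand into 2-bit units can be illustrated with plain integer arithmetic. This is a generic shift-and-add sketch assuming an unsigned activation, not the VPCA circuit or the SLE-CLC encoding: an n-bit activation is split into 2-bit digits, each digit is multiplied by the weight, and the shifted partial products are summed. The loop mirrors feeding one digit per step; the function names are hypothetical.

```python
def split_2bit(value: int, width: int) -> list[int]:
    """Split an unsigned `width`-bit value into 2-bit digits, least significant first."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given unsigned width")
    num_digits = (width + 1) // 2  # odd widths (e.g. 3 bits) get a zero-padded top digit
    return [(value >> (2 * i)) & 0b11 for i in range(num_digits)]


def multiply_2bit_serial(activation: int, weight: int, width: int) -> int:
    """Multiply by accumulating one shifted 2-bit partial product per step."""
    acc = 0
    for i, digit in enumerate(split_2bit(activation, width)):
        acc += (digit * weight) << (2 * i)  # digit is in 0..3
    return acc
```

The loop runs ceil(width/2) times, so a 3-bit operand takes two steps and a 32-bit operand sixteen; the sketch shows only the arithmetic, not how VPCA schedules or combines units in hardware.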

Keyword:

Classification (of information); Chemical activation; Artificial intelligence

Cite:

GB/T 7714: Yang, Shaofei, Liu, Longjun, Li, Baoting et al. Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators [C]. 2020: 315-319.
MLA: Yang, Shaofei et al. "Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators." (2020): 315-319.
APA: Yang, Shaofei, Liu, Longjun, Li, Baoting, Sun, Hongbin, Zheng, Nanning. Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators. (2020): 315-319.
Rethinking learnable tree filter for generic feature transform [EI, Scopus]
Conference paper | 2020, 2020-December | 34th Conference on Neural Information Processing Systems, NeurIPS 2020

Abstract:

The Learnable Tree Filter presents a remarkable approach to modeling structure-preserving relations for semantic segmentation. Nevertheless, its intrinsic geometric constraint forces it to focus on regions with close spatial distance, hindering effective long-range interactions. To relax the geometric constraint, we analyze it by reformulating it as a Markov Random Field and introduce a learnable unary term. In addition, we propose a learnable spanning tree algorithm to replace the original non-differentiable one, which further improves flexibility and robustness. With these improvements, our method can better capture long-range dependencies and preserve structural details with linear complexity, and it is extended to several vision tasks for more generic feature transform. Extensive experiments on object detection/instance segmentation demonstrate consistent improvements over the original version. For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells and whistles. Code is available at https://github.com/StevenGrove/LearnableTreeFilterV2. © 2020 Neural information processing systems foundation. All rights reserved.

Keyword:

Trees (mathematics); Image segmentation; Semantics; Object detection; Benchmarking; Signaling; Markov processes; Mathematical transformations

Cite:

GB/T 7714: Song, Lin, Li, Yanwei, Jiang, Zhengkai et al. Rethinking learnable tree filter for generic feature transform [C]. 2020.
MLA: Song, Lin et al. "Rethinking learnable tree filter for generic feature transform." (2020).
APA: Song, Lin, Li, Yanwei, Jiang, Zhengkai, Li, Zeming, Zhang, Xiangyu, Sun, Hongbin et al. Rethinking learnable tree filter for generic feature transform. (2020).
Algorithm and VLSI Architecture Co-Design on Efficient Semi-Global Stereo Matching [EI, SCIE]
Journal article | 2020, 30(11), 4390-4403 | IEEE Transactions on Circuits and Systems for Video Technology | IF: 4.685
WoS CC Cited Count: 2

Abstract:

Semi-global matching (SGM) is favored for high-accuracy real-time stereo matching design because it achieves a good trade-off between disparity image quality and computational complexity. Nevertheless, most previous SGM designs are restricted to real-time processing of small image resolutions and disparity ranges, or achieve high throughput by simplifying the original algorithm at the cost of significant disparity image quality degradation. Our analysis shows that the major challenge to efficient SGM design is its memory architecture, including both on-chip memory cost and off-chip memory bandwidth. We address this challenge through algorithm and architecture co-design. Based on two observed features of the SGM algorithm, i.e., incompleteness and inaccuracy, this paper proposes several efficient techniques to reduce on-chip memory cost and compress off-chip memory bandwidth, respectively. Moreover, we design a high-throughput pipelined architecture to implement the proposed techniques. The disparity image quality and hardware efficiency of the proposed SGM design are evaluated on both the KITTI2015 and Middlebury V3 stereo datasets. Evaluation results demonstrate that the proposed circuit designs can easily achieve a throughput of 1080P@30fps at a disparity range of 128, and can reduce on-chip memory cost and off-chip memory bandwidth by up to 4× and 2×, respectively, while achieving better or the same disparity image quality compared with the best reference design techniques. © 1991-2012 IEEE.
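The cost aggregation step at the heart of SGM can be sketched for a single path direction. This follows the standard recurrence from Hirschmüller's original SGM formulation, not the optimized design of this paper: `cost[x][d]` is the matching cost C(p, d) along one image row, `p1` and `p2` are the small and large smoothness penalties, and the function names are hypothetical. Full SGM sums such path costs over several directions before picking the lowest-cost disparity at each pixel.

```python
def aggregate_left_to_right(cost, p1, p2):
    """Aggregate matching costs along one scanline for the direction r = (0, +1).

    cost[x][d] is C(p, d) for pixel x of one row and disparity d. Implements
        L_r(p, d) = C(p, d) + min(L_r(p-r, d),
                                   L_r(p-r, d-1) + P1, L_r(p-r, d+1) + P1,
                                   min_k L_r(p-r, k) + P2) - min_k L_r(p-r, k)
    """
    num_d = len(cost[0])
    agg = [[float(c) for c in cost[0]]]  # the first pixel on the path has no predecessor
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_d):
            candidates = [prev[d], prev_min + p2]  # same disparity, or any jump with penalty P2
            if d > 0:
                candidates.append(prev[d - 1] + p1)  # disparity change of 1 costs P1
            if d < num_d - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps values bounded without changing which d wins.
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg


def winner_takes_all(aggregated):
    """Pick the disparity with the lowest aggregated cost at each pixel."""
    return [min(range(len(costs)), key=costs.__getitem__) for costs in aggregated]
```

For path directions that run across rows, the previous row's aggregated costs (width × disparity-range values per direction) have to be buffered, which is one reason on-chip memory grows with image resolution and disparity range.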

Keyword:

Bandwidth; Cost reduction; Quality control; Memory architecture; VLSI circuits; Stereo image processing; Printed circuit design; Image quality; Integrated circuit design; Economic and social effects; Image resolution

Cite:

GB/T 7714: Zhang, Xuchong, Dai, He, Sun, Hongbin et al. Algorithm and VLSI Architecture Co-Design on Efficient Semi-Global Stereo Matching [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(11): 4390-4403.
MLA: Zhang, Xuchong et al. "Algorithm and VLSI Architecture Co-Design on Efficient Semi-Global Stereo Matching." IEEE Transactions on Circuits and Systems for Video Technology 30.11 (2020): 4390-4403.
APA: Zhang, Xuchong, Dai, He, Sun, Hongbin, Zheng, Nanning. Algorithm and VLSI Architecture Co-Design on Efficient Semi-Global Stereo Matching. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(11), 4390-4403.
Address:XI'AN JIAOTONG UNIVERSITY LIBRARY(No.28, Xianning West Road, Xi'an, Shaanxi Post Code:710049) Contact Us:029-82667865
Copyright:XI'AN JIAOTONG UNIVERSITY LIBRARY Technical Support:Beijing Aegean Software Co., Ltd.