
Author:

Zhao, Chen | Wang, Fei | Lin, Zhen | Zhou, Huiyang | Zheng, Nanning

Indexed by:

CPCI-S; Scopus; EI

Abstract:

GPUs are widely used to accelerate general-purpose applications and can hide memory latency through massive multithreading. However, multithreading can increase contention for the L1 data caches (L1D). This problem is exacerbated when an application contains irregular memory references, which lead to un-coalesced memory accesses. In this paper, we propose a simple yet effective GPU cache Bypassing scheme for Un-Coalesced Loads (BUCL). BUCL makes bypassing decisions at two granularities. At the instruction level, when the number of memory accesses generated by a non-coalesced load instruction exceeds a threshold, referred to as the threshold of un-coalescing degree (TUCD), all the accesses generated by this load bypass L1D. The reason is that cache data filled by un-coalesced loads typically has a low probability of being reused. At the level of each individual memory access, when the L1D is stalled, the accessed data is likely to have low locality, and the utilization of the target memory sub-partition is not high, the access may also bypass L1D. Our experiments show that BUCL achieves 36% and 5% performance improvement over the baseline GPU for memory un-coalesced and memory coherent benchmarks, respectively, and also significantly outperforms prior GPU cache bypassing and warp throttling schemes.
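
The two-level decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 128-byte line size, the TUCD value of 4, the 0.8 utilization cutoff, and all function names are hypothetical, and the "number of memory accesses" of a load is modeled here as the number of distinct cache lines touched by one warp.

```python
from typing import Iterable

LINE_BYTES = 128   # assumed L1D line size; not stated in the abstract
TUCD = 4           # hypothetical threshold of un-coalescing degree
UTIL_LIMIT = 0.8   # hypothetical cutoff for "utilization is not high"


def uncoalescing_degree(addresses: Iterable[int],
                        line_bytes: int = LINE_BYTES) -> int:
    """Count the distinct cache lines touched by one warp-level load."""
    return len({addr // line_bytes for addr in addresses})


def bypass_instruction(addresses: Iterable[int], tucd: int = TUCD) -> bool:
    """Instruction-level rule: all accesses of the load bypass L1D
    when the load generates more accesses than the threshold."""
    return uncoalescing_degree(addresses) > tucd


def bypass_access(l1d_stalled: bool, low_locality: bool,
                  partition_util: float,
                  util_limit: float = UTIL_LIMIT) -> bool:
    """Access-level rule: bypass only if L1D is stalled, the data is
    judged low-locality, and the target sub-partition is not busy."""
    return l1d_stalled and low_locality and partition_util < util_limit


# 32 threads reading consecutive 4-byte words fall into a single line.
coalesced = [4 * t for t in range(32)]
# 32 threads each striding by one full line touch 32 separate lines.
strided = [LINE_BYTES * t for t in range(32)]
```

Under these assumptions, `bypass_instruction(coalesced)` is false and `bypass_instruction(strided)` is true. How locality and sub-partition utilization are actually estimated in hardware is not described in the abstract, so they appear here only as inputs.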

Keyword:

Cache Bypassing; Data Cache; GPU; Load Instruction; Memory divergence; Un-Coalesced

Author Community:

  • [ 1 ] [Zhao, Chen; Wang, Fei; Zheng, Nanning] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China
  • [ 2 ] [Lin, Zhen; Zhou, Huiyang] North Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC USA

Reprint Author's Address:

  • Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China.


Source:

2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS)

ISSN: 1521-9097

Year: 2016

Page: 908-915

Language: English

Cited Count:

WoS CC Cited Count: 4

ESI Highly Cited Papers on the List: 0
