基于GPU加速的全源对最短路径并行算法

肖汉; 肖诗洋; 李焕勤; 周清雷

doi:10.7540/j.ynu.20220170

基于GPU加速的全源对最短路径并行算法

All-source pair shortest path parallel algorithm based on GPU acceleration

摘要

摘要: 针对最短路径算法处理大规模数据集低效的问题，提出了基于图形处理器（Graphics Processing Unit, GPU）加速的全源对最短路径并行算法. 首先通过优化矩阵乘法算法实现了在工作组内和组间进行并行运算数据，然后减少了非规则行造成的工作项分支，最后降低了工作项对邻接矩阵计算条带存储资源的访问延时. 实验结果表明，与基于AMD Ryzen5 1600X CPU的串行算法、基于开放多处理（Open Multi-Processing, OpenMP）并行算法和基于统一计算设备架构（Compute Unified Device Architecture, CUDA）并行算法相比，最短路径并行算法在开放式计算语言（Open Computing Language, OpenCL）架构下NVIDIA GeForce GTX 1070计算平台上分别获得了196.35、36.76和2.25倍的加速比，验证了提出的并行优化方法的有效性和性能可移植性.

Abstract: Aiming at the problem that the shortest path algorithm is inefficient in processing large-scale datasets, this paper proposes an all-pairs shortest paths parallel algorithm based on Graphics Processing Unit (GPU) acceleration. Firstly, the optimized matrix multiplication algorithm is used to realize the parallel computing data inter-workgroup and intra-workgroup. Then the branch of work-items caused by irregular rows is reduced. Finally, the access delay of work-items to the strip storage resources of adjacency matrix calculation is reduced. The experimental results show that compared with the performance of the serial algorithm based on AMD Ryzen5 1600X CPU, parallel algorithm based on Open Multi-Processing (OpenMP) and parallel algorithm based on Compute Unified Device Architecture (CUDA), the shortest path parallel algorithm obtains 196.35, 36.76 and 2.25 times speedup in the NVIDIA GeForce GTX 1070 computing platform under the Open Computing Language (OpenCL) architecture respectively. The validity and performance portability of the proposed parallel optimization method are verified.

HTML全文

参考文献(31)

施引文献

资源附件(0)