李孟娟, 贾连印, 陈文焰, 吕晓伟, 章露露. 基于OpenMP的并行集合包含查询算法[J]. 云南大学学报(自然科学版), 2016, 38(3): 376-382. doi: 10.7540/j.ynu.20150685
引用本文: 李孟娟, 贾连印, 陈文焰, 吕晓伟, 章露露. 基于OpenMP的并行集合包含查询算法[J]. 云南大学学报(自然科学版), 2016, 38(3): 376-382. doi: 10.7540/j.ynu.20150685
LI Meng-juan, JIA Lian-yin, CHEN Wen-yan, LYU Xiao-wei, ZHANG Lu-lu. OpenMP based parallel set containtment query algorithm[J]. Journal of Yunnan University: Natural Sciences Edition, 2016, 38(3): 376-382. DOI: 10.7540/j.ynu.20150685
Citation: LI Meng-juan, JIA Lian-yin, CHEN Wen-yan, LYU Xiao-wei, ZHANG Lu-lu. OpenMP based parallel set containtment query algorithm[J]. Journal of Yunnan University: Natural Sciences Edition, 2016, 38(3): 376-382. DOI: 10.7540/j.ynu.20150685

基于OpenMP的并行集合包含查询算法

OpenMP based parallel set containtment query algorithm

  • 摘要: 集合包含查询分为子集、等值和超集3种查询,在多个领域有重要的研究意义和应用价值.随着集合数据集规模的不断增大,迫切需要提高集合包含查询的效率.集合包含查询并行化是解决这一问题的一条途径,基于OpenMP提出并行子集、等值和超集查询算法,这些算法采用反向索引结构,通过for循环并行化实现查询间的并行执行.为提高算法效率,设计2个高效的并行共享数据结构:①PVEC结构:用于存储并行查询结果.②CountArr数组:针对超集查询,用于对反向列表中的元素计数,并行线程可异步地对这两个结构进行访问.在MSWEB和DBLP 2个数据集上进行扩展实验,结果表明,实现的3种并行集合包含查询具有较高的效率,对3种查询在MSWEB数据集上均可达到4X以上的加速比.

     

    Abstract: Set containment query is important in many fields and can be divided into three parts:subset query,equality query and superset query.With the increasing scale of set data,it is urgent to accelerate set containment query.Parallel set containment query is the way to solve this problem.In this paper three parallel set containment query algorithms are proposed based on OpenMP.These algorithms use inverted indexes as underlying structure and execute queries in parallel by paralleling for-cycles.To improve the efficiency of these algorithms,two efficient parallel data structures,PVEC and CountArr,are designed.PVEC is used to store the query results and CountArr is designed for set superset query to count the elements in related inverted lists.Each thread access disjoint part of these two structures and all threads run in parallel asynchronously.Experiments are carried out on two real datasets,MSWEB and DBLP,the results show that the three algorithms can achieve a speedup more than 4X over corresponding serial algorithms on MSWEB dataset.

     

/

返回文章
返回