Abstract:Hash tables are well known for their access efficiency and time complexity O(1). As an algorithm and data structure available for efficient access to large-scale data, hash tables have been widely adopted by big data applications. For example, they are applicable to various kinds of workloads and scenarios in the emerging high-performance computing (HPC) domain and the database domain. As the hardware performance of the high-performance coprocessor graphics processing unit (GPU) improves continuously, parallel hash table optimization for high-performance GPUs has attracted a lot of researchers. According to our previous research survey, most of the current methods and solutions for GPU-based hash table optimization focus on the large-scale thread scheduling and high memory bandwidth of GPUs to improve the high-concurrency processing of hash table transactions and fast key-value access to data. However, since existing research on GPU hash table structures generally ignores the effective management of GPU resources, no methods are available for making full use of GPU thread resources and memory resources. Moreover, due to the limitation of the GPU memory size, the space for storing data on hash table structures is limited and therefore unable to handle hash table structures of larger scales. Thus, technical challenges remain for the scalability and performance optimization of the hash table designs for GPUs. This study proposes a hash table technology that can process massive concurrent transactions for GPUs and names it Starfish. Starfish includes a novel “swap layer” technology based on asynchronous GPU streams to support the dynamic hash table outside the GPU memory and also guarantee the high indexing performance of GPU hash tables. To solve the massive hash transaction conflict overhead resulting from access of large-scale GPU threads, we design a class of compact data structures and a pageable memory distribution method, which not only provides the high performance of a static hash method for the GPU-based hash table technology but also supports the high scalability of dynamic hash. Our experimental performance evaluation shows that Starfish is significantly superior to other GPU-based hash table technologies including cudpp-Hash and SlabHash.