stroke posted on 2025-3-30 11:02:02

Accelerating FFT Using NEC SX-Aurora Vector Engine
... of maximizing the vector length usage of the algorithm and that adapting the algorithm to replace memory instructions with register shuffling operations can boost the performance of FFT-like computational kernels.

Urea508 posted on 2025-3-30 15:57:17

https://doi.org/10.1007/978-3-658-06850-9
... focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering stra...
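
To make the interplay between task order and eviction concrete, here is a minimal sketch in Python. It is not the paper's model or algorithm: the function names `next_use` and `count_loads`, the unit-size data items, and the choice of a furthest-next-use (Belady-style) eviction rule are all assumptions made for illustration, since the excerpt does not say which eviction strategy the authors prove optimal.

```python
from typing import Dict, Sequence, Set


def next_use(item: str, pos: int, order: Sequence[int],
             inputs: Dict[int, Set[str]]) -> int:
    """Index of the next task after position `pos` that reads `item`.

    Returns len(order) when the item is never read again.
    """
    for later in range(pos + 1, len(order)):
        if item in inputs[order[later]]:
            return later
    return len(order)


def count_loads(order: Sequence[int], inputs: Dict[int, Set[str]],
                capacity: int) -> int:
    """Count loads into a memory of `capacity` unit-size items (hypothetical model).

    Assumed eviction rule: evict the resident item whose next use is
    furthest in the future, never one the current task still needs.
    """
    memory: Set[str] = set()
    loads = 0
    for pos, task in enumerate(order):
        needed = inputs[task]
        if len(needed) > capacity:
            raise ValueError(f"task {task} does not fit in memory")
        for item in sorted(needed):
            if item in memory:
                continue  # already resident, no transfer needed
            if len(memory) >= capacity:
                candidates = memory - needed
                # Ties are broken by item name so the result is deterministic.
                victim = max(
                    candidates,
                    key=lambda d: (next_use(d, pos, order, inputs), d),
                )
                memory.remove(victim)
            memory.add(item)
            loads += 1
    return loads


# Four hypothetical tasks that pairwise share inputs, memory of two items.
example_inputs = {0: {"a", "b"}, 1: {"c", "d"}, 2: {"a", "c"}, 3: {"b", "d"}}
naive_loads = count_loads([0, 1, 2, 3], example_inputs, capacity=2)      # 7 loads
reordered_loads = count_loads([0, 2, 1, 3], example_inputs, capacity=2)  # 5 loads
```

With the same eviction rule, the second order lets each task reuse one item left behind by its predecessor, which is why ordering matters independently of eviction.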

Ligneous posted on 2025-3-30 18:01:53

Die Geschichte der Kinderheilkunde
... extracts the size and the frequency of data transfers in an application and visualizes them as a communication matrix. To demonstrate the tool in action, we present communication matrices and some statistics for two applications from the machine translation and image classification domains.
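
As a rough illustration of what a communication matrix holds, the sketch below aggregates a list of transfer records into two matrices: total bytes and number of transfers per (source, destination) pair. The record layout, the function name `communication_matrix`, and the sample log are hypothetical; the excerpt does not describe the tool's actual input format or API.

```python
from typing import Iterable, List, Tuple

# Assumed record layout: (source device, destination device, bytes moved).
Transfer = Tuple[int, int, int]


def communication_matrix(
    transfers: Iterable[Transfer], n_devices: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Aggregate transfers into per-pair byte totals and transfer counts."""
    sizes = [[0] * n_devices for _ in range(n_devices)]
    counts = [[0] * n_devices for _ in range(n_devices)]
    for src, dst, nbytes in transfers:
        sizes[src][dst] += nbytes   # total volume sent from src to dst
        counts[src][dst] += 1       # how often src sent to dst
    return sizes, counts


# Made-up log for three devices, used only to exercise the function.
sample_log = [(0, 1, 1024), (0, 1, 2048), (1, 0, 512), (2, 0, 4096)]
sizes, counts = communication_matrix(sample_log, n_devices=3)

for row in sizes:
    print(" ".join(f"{value:6d}" for value in row))
```

Row index is the sender and column index is the receiver, so a heavy row points to a device that pushes a lot of data and a heavy column to one that receives a lot.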

arthroscopy posted on 2025-3-31 12:13:13

Locality-Aware Scheduling of Independent Tasks for Runtime Systems
... accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient a...
View full version: Titlebook: Euro-Par 2021: Parallel Processing Workshops; Euro-Par 2021 Intern Ricardo Chaves, Dora B. Heras, Laura Ricci Conference proceedings 2022 Spr