Cpr951 posted on 2025-3-30 12:10:43
http://reply.papertrans.cn/24/2331/233090/233090_51.png
Decongestant posted on 2025-3-30 14:12:04
http://reply.papertrans.cn/24/2331/233090/233090_52.png
压倒性胜利 posted on 2025-3-30 19:22:59
Enabling Hardware Affinity in JVM-Based Applications: A Case Study for Big Data
JVM-based applications and gathering information about the underlying hardware topology. To demonstrate the functionality and benefits of our proposal, we have extended Flame-MR, our Java-based MapReduce framework, to provide support for setting CPU affinities through .. The experimental evaluation
新鲜 posted on 2025-3-30 20:43:01
An Optimizing Multi-platform Source-to-source Compiler Framework for the NEURON MODeling Language
When comparing NMODL-generated kernels with NEURON we observe a speedup of up to 20., resulting in overall speedups of two different production simulations by .. When compared to SIMD-optimized kernels that relied heavily on auto-vectorization by the compiler, a speedup of up to . is still observed
SPASM posted on 2025-3-31 04:15:19
A Massively Parallel Algorithm for the Three-Dimensional Navier-Stokes-Boussinesq Simulations of the
ty of the code shows that increasing the number of sub-domains and processors from 4 to 1024, where each processor handles a sub-domain of . internal points (. box), increases the total computational time for a single time step from 120 s to 178 s. Thus, we can perform a single t
WITH posted on 2025-3-31 06:15:27
Cache-Aware Matrix Polynomials
We evaluate our approach on three different hardware platforms and for a wide range of different matrices and demonstrate that our approach achieves time savings of up to 50% for a large number of matrices. This is especially the case on platforms with large caches, significantly increasing the perf
昏睡中 posted on 2025-3-31 09:28:10
http://reply.papertrans.cn/24/2331/233090/233090_57.png
先行 posted on 2025-3-31 14:26:31
Improving Performance of the Hypre Iterative Solver for Uintah Combustion Codes on Manycore Architect
MPI rank. This approach minimized OpenMP synchronization overhead, avoided slowdowns, performed as fast as, or up to 1.5x faster than, Hypre's MPI-only version, and allowed the rest of Uintah to be optimized using OpenMP. Profiling of the GPU version of Hypre showed the bottleneck to be the launch ov
没收 posted on 2025-3-31 17:59:40
http://reply.papertrans.cn/24/2331/233090/233090_59.png
nocturia posted on 2025-4-1 00:45:09
Enabling EASEY Deployment of Containerized Applications for Future HPC Systems
hereby enhancing specific characteristics of their codes. We introduce the framework with a Charliecloud-based solution, showcasing the LULESH benchmark on the upper layers of our framework. Our approach can automatically deploy optimized container computations with negligible overhead and at the sa