致词 posted on 2025-3-26 22:08:34
Environmental Microbial Evolution
… to a static scheduling algorithm for a streaming task graph application with parallelizable tasks and solve the resulting combined optimization problem by an integer linear program (ILP). We demonstrate the improvements by our strategy with ARM big and LITTLE soft cores and synthetic task graphs.

allergy posted on 2025-3-27 04:16:37
Combining Design Space Exploration with Task Scheduling of Moldable Streaming Tasks on Reconfigurabl…
… to a static scheduling algorithm for a streaming task graph application with parallelizable tasks and solve the resulting combined optimization problem by an integer linear program (ILP). We demonstrate the improvements by our strategy with ARM big and LITTLE soft cores and synthetic task graphs.

道学气 posted on 2025-3-27 06:43:50
Fast Approximation of the Top-k Items in Data Streams Using a Reconfigurable Accelerator
…t throughput gains compared to existing solutions. With achieved throughputs exceeding 300 million items/s, we report average speedups of 20x compared to typical software implementations, 1.5x compared to GPU-accelerated implementations, and 1.8x compared to the fastest FPGA implementation.

思考而得 posted on 2025-3-27 09:47:11
Exploiting 3D Memory for Accelerated In-Network Processing of Hash Joins in Distributed Databases
…e system. As the hash-join algorithm used for high performance needs to maintain a large state, it would overtax the capabilities of conventional software-programmable switches. The paper shows that across eight 10G Ethernet ports, the single HBM-FPGA in our prototype can not only keep up with the d…

cognizant posted on 2025-3-27 15:05:40
Timing Optimization for Virtual FPGA Configurations
… the operating frequency of a . or a . ZUMA architecture of up to . and . for individual benchmarks, and by . and . on average. Our results would also scale accordingly should future research uncover new potential to reduce the area cost further.

消瘦 posted on 2025-3-27 18:01:52
Hardware Based Loop Optimization for CGRA Architectures
…rom various application domains, the design could achieve a maximum of 1.9x and an average of 1.5x speed-up against the conventional approach. The total number of instructions executed is reduced to half for almost all the kernels, with an area and power consumption overhead of 2.6% and 0.8% respecti…

Abominate posted on 2025-3-28 01:36:51
Supporting On-Chip Dynamic Parallelism for Task-Based Hardware Accelerators
… it also encompasses the efficient on-chip exchange of parameter values and task results between parent and child accelerator tasks. Our solution is able to handle recursive task structures and is shown to have latency reductions of over 35x compared to the prior approaches.

胆大 posted on 2025-3-28 02:33:44
Multi-layered NoCs with Adaptive Routing for Mixed Criticality Systems
… on shorter or longer hop paths. An adaptive congestion avoidance feature is integrated. Without congestion awareness, the proposed algorithm, which utilizes multiple layers, has up to a 38% decrease in latency, and with congestion awareness it has up to a 56% decrease in latency compared to the popular XY routi…

小步舞 posted on 2025-3-28 06:24:09
http://reply.papertrans.cn/17/1601/160096/160096_39.png

enchant posted on 2025-3-28 11:36:55
StreamGrid - An AXI-Stream-Compliant Overlay Architecture
…ncy. The fastest configuration of the overlay architecture has a maximum clock frequency of 752 MHz on a Xilinx Alveo U280 FPGA card. Furthermore, a case study of a database query engine is evaluated and compared to a static design with the same functionality. The raw execution performance is compar…