Date of Award

Spring 1-1-2011

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Elizabeth R. Jessup

Second Advisor

Jeremy G. Siek

Third Advisor

Xiao-Chuan Cai

Abstract

On modern processors, data transfer exceeds floating-point operations as the predominant cost in many linear algebra computations. For these memory-bound calculations, reducing data movement is often the only way to significantly increase their speed. One tuning technique that focuses on reducing memory accesses is loop fusion. However, determining the optimal amount of loop fusion to apply to a routine is difficult, as fusion can both positively and negatively impact memory traffic. In this thesis, we perform an in-depth analysis of how loop fusion affects data movement throughout the memory hierarchy. The results of this analysis are used to create a memory model for fused linear algebra calculations. The model predicts data movement throughout the memory hierarchy, and its design incorporates runtime and accuracy tradeoffs based on our fusion research. The model's memory traffic predictions are converted to runtime estimates that can be used to compare loop fusion variants on serial and shared-memory parallel machines. We integrate our model into a compiler, where its predictions often reduce compile times by 99% or more. The kernels produced by the compiler with the model enabled are usually identical to the optimal kernels found for the target architecture by exhaustively testing all possible loop fusion combinations.
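
To make the technique concrete, the sketch below (not drawn from the thesis itself; the array names, sizes, and operations are illustrative) shows loop fusion applied to a pair of vector operations in C. The fused version produces and consumes each intermediate value while it is still in a register, avoiding an extra pass over memory; this reduction in memory traffic is what the thesis's model predicts and trades off.

    #include <stdio.h>

    #define N 1000000

    static double a_coef = 2.0;
    static double x[N], y[N], w[N], z[N];

    /* Unfused: two separate loops.  The intermediate vector y is written to
     * memory in the first loop and read back in the second, so it traverses
     * the memory hierarchy twice when N exceeds the cache size. */
    void unfused(void) {
        for (int i = 0; i < N; i++) y[i] = a_coef * x[i];
        for (int i = 0; i < N; i++) z[i] = y[i] + w[i];
    }

    /* Fused: one loop.  Each y[i] is produced and consumed while still in a
     * register, eliminating one full read of y (and the write as well if y
     * is not needed afterward). */
    void fused(void) {
        for (int i = 0; i < N; i++) {
            y[i] = a_coef * x[i];
            z[i] = y[i] + w[i];
        }
    }

    int main(void) {
        for (int i = 0; i < N; i++) { x[i] = (double)i; w[i] = 1.0; }
        fused();
        printf("z[10] = %f\n", z[10]);
        return 0;
    }

As the abstract notes, fusion is not always a win: fusing loops that touch many distinct arrays can increase cache and register pressure, which is why a model is needed to choose among fusion variants rather than fusing maximally.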
