Date of Award

Spring 1-1-2019

Document Type


Degree Name

Doctor of Philosophy (PhD)

First Advisor

Per-Gunnar Martinsson

Second Advisor

Stephen Becker

Third Advisor

Gregory Beylkin

Fourth Advisor

Gregorio Quintana-Ortí

Fifth Advisor

Christian Ketelsen


This thesis describes a set of randomized algorithms for computing rank revealing factorizations of matrices. These algorithms are designed specifically to minimize the amount of data movement required, which is essential to high practical performance on modern computing hardware. The work presented builds on existing randomized algorithms for computing low-rank approximations to matrices, but essentially ex- tends the range of applicability of these methods by allowing for the efficient decomposition of matrices of any numerical rank, including full rank matrices. In contrast, existing methods worked well only when the numerical rank was substantially smaller than the dimensions of the matrix.

The thesis describes algorithms for computing two of the most popular rank-revealing matrix decom- positions: the column pivoted QR (CPQR) decomposition, and the so called UTV decomposition that factors a given matrix A as A = UTV∗, where U and V have orthonormal columns and T is triangular. For each algorithm, the thesis presents algorithms that are tailored for different computing environments, including multicore shared memory processors, GPUs, distributed memory machines, and matrices that are stored on hard drives (“out of core”).

The first chapter of the thesis consists of an introduction that provides context, reviews previous work in the field, and summarizes the key contributions. Beside the introduction, the thesis contains six additional chapters:

Chapter 2 introduces a fully blocked algorithm HQRRP for computing a QR factorization with col- umn pivoting. The key to the full blocking of the algorithm lies in using randomized projections to create a low dimensional sketch of the data, where multiple good pivot columns may be cheaply computed. Nu- merical experiments show that HQRRP is several times faster than the classical algorithm for computing a column pivoted QR on a multicore machine, and the acceleration factor increases with the number of cores.

Chapter 3 introduces randUTV, a randomized algorithm for computing a rank-revealing factorization

of the form A = UTV∗, where U and V are orthogonal and T is upper triangular. RandUTV uses random- ized methods to efficiently build U and V as approximations of the column and row spaces of A. The result is an algorithm that reveals rank nearly as well as the SVD and costs at most as much as a column pivoted QR.

Chapter 4 provides optimized implementations for shared and distributed memory architectures. For shared memory, we show that formulating randUTV as an algorithm-by-blocks increases its efficiency in parallel. The fifth chapter implements randUTV on the GPU and augments the algorithm with an over- sampling technique to further increase the low rank approximation properties of the resulting factorization. Chapter 6 implements both randUTV and HQRRP for use with matrices stored out of core. It is shown that reorganizing HQRRP as a left-looking algorithm to reduce the number of writes to the drive is in the tested cases necessary for the scalability of the algorithm when using spinning disk storage. Finally, chapter 7 discusses an alternative use for randUTV as a nuclear norm estimator and measures the acceleration gained from trimming down the algorithm when only singular value estimates are required.