Date of Award

Spring 1-1-2012

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Aaron Clauset

Second Advisor

Michael Mozer

Third Advisor

Vanja Dukic

Abstract

Many man-made and natural phenomena, including the intensities of earthquakes, the populations of cities, and the sizes of wars, are believed to follow power-law distributions, and the detection of these patterns has significant consequences for our understanding of the underlying mechanisms. However, the large fluctuations in the tails of these distributions make it difficult to provide clear evidence for or against the power-law hypothesis, particularly when the empirical data have been binned. Clauset, Shalizi and Newman recently provided a statistically principled framework for identifying and testing power-law distributions in continuous or discrete-valued data, based on maximum-likelihood fitting, goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic, and likelihood ratios for model comparison. We adapt these techniques to the less common but important case of binned empirical data. We evaluate the effectiveness of our techniques on synthetic data with known structure and apply them to ten real-world data sets with heavy-tailed patterns.
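
To illustrate what maximum-likelihood fitting can look like for binned data, the following is a minimal sketch and not the thesis's own implementation. It assumes a continuous power law p(x) ∝ x^(-alpha) starting at the lowest bin edge, so that the probability of a bin [lo, hi) is (lo/b_min)^(1-alpha) - (hi/b_min)^(1-alpha). The function names, the search interval for alpha, and the use of a golden-section search are all choices made for this example.

```python
import math

def binned_log_likelihood(alpha, edges, counts):
    """Log-likelihood of binned counts under p(x) ~ x^(-alpha) for x >= edges[0].

    edges has len(counts) + 1 entries; the last edge may be math.inf.
    """
    s = 1.0 - alpha          # negative whenever alpha > 1
    b_min = edges[0]
    ll = 0.0
    for lo, hi, h in zip(edges[:-1], edges[1:], counts):
        upper = 0.0 if math.isinf(hi) else (hi / b_min) ** s
        p = (lo / b_min) ** s - upper   # probability mass of the bin [lo, hi)
        if h > 0:
            ll += h * math.log(p)
    return ll

def fit_alpha(edges, counts, lo=1.01, hi=6.0, tol=1e-8):
    """Maximize the binned log-likelihood over alpha by golden-section search.

    Assumes the likelihood has a single peak inside [lo, hi] (illustrative bounds).
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if binned_log_likelihood(c, edges, counts) > binned_log_likelihood(d, edges, counts):
            b = d            # the maximum lies in [a, d]
        else:
            a = c            # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

# Hypothetical logarithmic bins with an open-ended last bin, and made-up counts.
edges = [1.0, 2.0, 4.0, 8.0, 16.0, math.inf]
counts = [640, 230, 85, 30, 15]
alpha_hat = fit_alpha(edges, counts)
```

The counts above are invented purely to show the calling convention. Choosing the lower cutoff and judging goodness of fit, for example via the KS statistic, are separate steps that this sketch does not cover.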
