Date of Award

Spring 1-1-2017

Document Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Aaron Clauset

Second Advisor

Brian Keegan

Third Advisor

Hanna Wallach

Fourth Advisor

Michael Mozer

Fifth Advisor

Jordan Boyd-Graber


As social behavior moves increasingly online, the study of social behavior has followed. Online traces of social systems, whether of online behavior itself or of offline activity, have made possible empirical analyses of people, groups, and organizations that were previously infeasible. However, observing any social system in practice is nontrivial: even if we can directly instrument and measure the social constructs we wish to study, we still observe them through the lens of the system itself. We inherit effects due to the design and history of the platform, the ecology of other online systems, the measurement tool and pre-processing of our data, and the assumptions of our models. At the same time, organizations represent a fundamental unit of human social behavior. Thus, to understand social behavior, we must understand how the size, boundaries, and context of organizations affect the social relationships within them. I focus on this boundary between online systems and offline activity in organizations. We exploit heterogeneities across populations of social networks to explore the boundary between online systems, online social behavior, and offline activity across different organizations. I discuss empirical work exploring how offline behavior is reflected in online systems and, conversely, how an online system relates to offline outcomes. We then turn to the relationship between the measurement of networks from online data and past work on network structure and evolution.

In this dissertation, I develop a comparative structural perspective to tease apart the roles of these exogenous and endogenous processes in shaping network structure. Using populations of comparable networks, I explore the roles of individual social strategies, organizational environments, and network construction in network structure. First, I explore how the unique timing and setting of Facebook's initial expansion to universities afforded a natural experiment, revealing differences in social strategies and network growth, and I examine empirical network scaling in this population of networks. We find that the social strategies employed by students who interacted only online differed from those of students who had also interacted in the offline world. Second, I explore a vaunted tradition of organization theory (relating a firm's informal network structure to firm performance) using a novel email network data set covering a population of large firms. In this setting, I explore the previously untested heterogeneity of firms and the relationships between organization size, organization context, and social network structure. There, we find a surprising amount of heterogeneity across firm types and no relationship between network structure and firm performance. We also find novel scaling results, including no relationship between the size of a firm and an individual's number of contacts, and we find that the formal geographic structure of an organization increases communication bottlenecks across firms. Finally, reflecting on the challenges of working with social networks drawn from interaction data, I explore the connections between network construction and network evolution. To put these connections in perspective, I revisit the theory of weak ties, network stability, and network densification through this lens. We find evidence that confirms some hypotheses in this literature, rejects others, and suggests novel ones. We find, for example, that network densification can appear as an artifact of total activity within the observed system.
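To make the densification claim concrete, here is a minimal sketch, assuming the standard densification power law E(t) ∝ N(t)^α, where N(t) and E(t) are the node and edge counts of a network snapshot. It estimates α as the least-squares slope of log E on log N. It then applies that estimate to a hypothetical activity-driven toy model: a fixed population in which each interaction event joins a uniformly random pair, and the "observed" network at time t contains every node and edge seen in the first t events. The function names `densification_exponent` and `activity_snapshots`, the uniform-pair model, and all parameter values are illustrative assumptions, not the method or data used in the dissertation.

```python
import math
import random
from typing import List, Sequence, Tuple


def densification_exponent(nodes: Sequence[int], edges: Sequence[int]) -> float:
    """Least-squares slope of log E on log N across snapshots (assumes E ~ N**alpha)."""
    if len(nodes) != len(edges) or len(nodes) < 2:
        raise ValueError("need at least two (N, E) snapshots of equal length")
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("node counts must vary across snapshots")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def activity_snapshots(
    population: int, events: int, checkpoints: Sequence[int], seed: int = 0
) -> List[Tuple[int, int]]:
    """Hypothetical toy model: each event links a uniformly random pair drawn
    from a FIXED population; the observed network is everything seen so far.
    Returns (N, E) at each checkpoint that falls within 1..events."""
    rng = random.Random(seed)
    marks = set(checkpoints)
    seen_nodes: set = set()
    seen_edges: set = set()
    out: List[Tuple[int, int]] = []
    for t in range(1, events + 1):
        u, v = rng.sample(range(population), 2)  # two distinct people interact
        seen_nodes.update((u, v))
        seen_edges.add((min(u, v), max(u, v)))  # undirected edge, stored once
        if t in marks:
            out.append((len(seen_nodes), len(seen_edges)))
    return out


if __name__ == "__main__":
    snaps = activity_snapshots(200, 5000, [50, 100, 200, 500, 1000, 2000, 5000])
    alpha = densification_exponent([n for n, _ in snaps], [e for _, e in snaps])
    print(snaps, alpha)
```

In this toy setting the fitted exponent comes out well above 1 even though the underlying population never grows: observed nodes saturate at the population size while distinct edges keep accumulating with total activity. Because mean degree 2E/N scales as N^(α−1) under the same power law, the same fit also connects to the size-versus-contacts scaling questions raised above.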

The comparative approach is uncontroversial but novel in the empirical study of networks, organization theory, and computational social science. In this context, the comparative approach allows us to compare empirical scaling properties to results from random graph theory. Using networks bounded by organizations and platforms, we can leverage the boundaries of online systems to relate covariates at the platform-, organization-, or network-l