Date of Award
Doctor of Philosophy (PhD)
Michael C. Mozer
Content-style decompositions, or CSDs, decompose entities into content, defined by the entity's class, and style, defined as the remaining within-class variation. Content is typically defined in terms of some task. For example, in a face recognition task, identity is the content; in an emotion recognition task, expression is the content. CSDs have many applications: they can provide insight into domains where we have little prior knowledge of the sources of within- and between-class variation, and content-style recombinations are interesting as a creative exercise or for dataset augmentation. Our approach is to decompose CSD discovery into two sub-problems: (1) to find useful representations of content that capture the class structure of the domain, and (2) to use those content representations to discover CSDs. We make contributions to both sub-problems. First, we propose the F-statistic loss, a new method for discovering content representations that uses statistics of class separation on isolated embedding dimensions within a minibatch to determine when to terminate training. In addition to state-of-the-art performance on few-shot learning, we find that the method leads to factorial (also known as disentangled) representations of content when applied with a novel form of weak supervision. Previous work on disentangling is either unsupervised or uses a factor-aware oracle, which provides similar/dissimilar judgments with respect to a named attribute/factor. We explore an intermediate form of supervision, an unnamed-factor oracle, which provides similarity judgments with respect to a random unnamed factor. We demonstrate that the F-statistic loss leads to better disentangling when compared with other deep-embedding losses and β-VAE, a state-of-the-art unsupervised disentangling method. Second, we introduce a new loss for discovering CSDs that can arbitrarily recombine content and style, called leakage filtering.
In contrast to previous research, which attempts to separate content and style into two different representation vectors, leakage filtering allows for imperfectly disentangled representations but ensures that residual content information will not leak out of the style representation and vice versa. Leakage filtering is also distinguished by its ability to operate on novel content classes and by its lack of dependency on style labels for training. The recombined images produced are of high quality and can be used to augment datasets for few-shot learning tasks, yielding significant generalization improvements.
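The abstract describes statistics of class separation computed on isolated embedding dimensions within a minibatch. As a rough illustration only, the sketch below computes the textbook one-way ANOVA F-statistic separately for each embedding dimension. It is not the F-statistic loss from the dissertation. The function name `per_dimension_f_statistic` and the toy minibatch are hypothetical, and the code assumes at least two classes and nonzero within-class variance in every dimension.

```python
from collections import defaultdict


def per_dimension_f_statistic(embeddings, labels):
    """Return the one-way ANOVA F-statistic for each embedding dimension.

    Illustrative assumption: this is the textbook F-statistic, not the
    dissertation's loss. Requires >= 2 classes and nonzero within-class
    variance in every dimension (otherwise a division by zero occurs).
    """
    # Group the embedding vectors by class label.
    groups = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        groups[label].append(vector)

    n, k, dims = len(embeddings), len(groups), len(embeddings[0])
    f_stats = []
    for d in range(dims):
        grand_mean = sum(v[d] for v in embeddings) / n
        between = 0.0  # between-class sum of squares
        within = 0.0   # within-class sum of squares
        for members in groups.values():
            values = [v[d] for v in members]
            mean = sum(values) / len(values)
            between += len(values) * (mean - grand_mean) ** 2
            within += sum((x - mean) ** 2 for x in values)
        # Ratio of mean squares: (k - 1) and (n - k) degrees of freedom.
        f_stats.append((between / (k - 1)) / (within / (n - k)))
    return f_stats


# Hypothetical minibatch: dimension 0 separates the two classes,
# dimension 1 does not.
batch = [[0.0, 1.0], [1.0, 3.0], [10.0, 3.0], [11.0, 1.0]]
classes = ["a", "a", "b", "b"]
print(per_dimension_f_statistic(batch, classes))
```

In this toy minibatch the first dimension yields a large F-statistic and the second yields zero, which matches the intuition that a dimension carrying class (content) information shows high between-class variance relative to within-class variance.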
Ridgeway, Karl F., "Content-Style Decomposition: Representation Discovery and Applications" (2018). Computer Science Graduate Theses & Dissertations. 193.