Date of Award

Spring 12-8-2018

Document Type

Thesis

Degree Name

Master of Science (MS)

First Advisor

Michael J. Paul

Second Advisor

Chenhao Tan

Third Advisor

Qin Lv

Abstract

Bayesian inference methods for probabilistic topic models can quantify uncertainty in the parameters, a capability that has primarily been used to increase the robustness of parameter estimates. In this thesis, we explore other rich information that can be obtained by analyzing the posterior distributions in topic models. Experimenting with latent Dirichlet allocation (LDA) on several datasets, we propose ideas that incorporate information about the posterior distributions at the topic, word, and document levels. At the topic level, we propose a metric called topic stability that measures the variability of the topic parameters under the posterior. We show that this metric is correlated with human judgments of topic quality as well as with the consistency of topics appearing across multiple models. At the word level, we experiment with different methods for adjusting individual word probabilities within topics based on their uncertainty. Humans prefer words ranked by our adjusted estimates nearly twice as often as words ranked by the traditional approach. At the document level, we incorporate topics' variability over documents into LDA-based document representations and observe that this can significantly improve the performance of the representations on classification tasks.
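The sketch below illustrates the general idea behind a posterior-based topic stability score, assuming that posterior samples of the topic-word matrix are available (e.g., draws from multiple Gibbs sampling iterations or runs). The specific definition used here, the average cosine similarity of each sampled topic to its posterior mean, is an illustrative assumption for exposition and is not necessarily the exact formula used in the thesis.

```python
# Illustrative sketch (assumed definition, not the thesis's exact metric):
# score each topic by how little its word distribution varies across
# posterior samples; higher scores indicate more stable topics.
import numpy as np

def topic_stability(phi_samples):
    """phi_samples: array of shape (S, K, V), holding S posterior samples
    of the K x V topic-word distributions (each row sums to 1)."""
    S, K, V = phi_samples.shape
    mean_phi = phi_samples.mean(axis=0)  # (K, V) posterior mean of each topic
    scores = np.empty(K)
    for k in range(K):
        sims = [
            np.dot(phi_samples[s, k], mean_phi[k])
            / (np.linalg.norm(phi_samples[s, k]) * np.linalg.norm(mean_phi[k]))
            for s in range(S)
        ]
        # average cosine similarity of each sampled topic to the mean topic
        scores[k] = np.mean(sims)
    return scores

# Toy usage: 20 posterior samples of 5 topics over a 1000-word vocabulary.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(1000) * 0.1, size=(20, 5))
print(topic_stability(samples))
```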

Available for download on Sunday, October 10, 2021
