Date of Award: 2018
Degree: Master of Science (MS)
Advisor: Michael J. Paul
Bayesian inference methods for probabilistic topic models can quantify uncertainty in the model parameters, a capability that has primarily been used to make parameter estimates more robust. In this dissertation, we explore the richer information that can be obtained by analyzing the posterior distributions of topic models. Experimenting with latent Dirichlet allocation (LDA) on several datasets, we propose ways to incorporate information about the posterior distributions at the topic, word, and document levels. At the topic level, we propose a metric called topic stability that measures the variability of the topic parameters under the posterior. We show that this metric correlates with human judgments of topic quality as well as with the consistency of topics appearing across multiple models. At the word level, we experiment with different methods for adjusting individual word probabilities within topics based on their uncertainty; humans prefer words ranked by our adjusted estimates nearly twice as often as words ranked by the traditional approach. At the document level, we incorporate topics' variability over documents into LDA-based document representations and observe that this can significantly improve their performance on classification tasks.
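The abstract does not spell out how topic stability is computed, so the following is only a rough sketch of one plausible formulation: given several posterior draws of a topic's distribution over the vocabulary (e.g. from different Gibbs samples), score the topic by the average pairwise cosine similarity of those draws. The function name `topic_stability` and all data here are hypothetical illustrations, not the thesis's actual definition.

```python
import numpy as np

def topic_stability(samples):
    """Hypothetical stability score for one topic: mean pairwise cosine
    similarity of its word-distribution vectors across posterior samples.

    samples: array of shape (S, V) -- S posterior draws of a topic's
    distribution over a vocabulary of V words. Higher means less
    posterior variability (more stable).
    """
    # Normalize each sampled distribution to unit length.
    unit = samples / np.linalg.norm(samples, axis=1, keepdims=True)
    sims = unit @ unit.T  # S x S cosine-similarity matrix
    s = samples.shape[0]
    # Average over distinct pairs (exclude the diagonal of ones).
    return (sims.sum() - s) / (s * (s - 1))

rng = np.random.default_rng(0)
# A "stable" topic: ten draws that are tiny perturbations of one distribution.
base = rng.dirichlet(np.ones(50))
stable = np.stack([base + rng.normal(0.0, 1e-4, 50) for _ in range(10)])
# An "unstable" topic: ten unrelated draws from a flat Dirichlet.
unstable = rng.dirichlet(np.ones(50), size=10)
print(topic_stability(stable) > topic_stability(unstable))  # True
```

Under this toy definition, a score near 1 indicates that the posterior draws of the topic barely move, while a lower score flags a topic whose word distribution varies substantially under the posterior.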
Xing, Linzi, "Analyzing Posterior Variability in Topic Models" (2018). Computer Science Graduate Theses & Dissertations. 167.
Available for download on Sunday, October 10, 2021