Date of Award

Spring 1-1-2019

Document Type


Degree Name

Master of Science (MS)



First Advisor

Edward Van Wesep

Second Advisor

Diego Garcia

Third Advisor

J. Anthony Cookson


This project highlights the difficulties of using required annual reports in analyses that attempt to link left-hand-side outcomes, whether past or future, to firms' discussion of regulators, regulation, laws, and other regulatory regimes, as measured by applying Natural Language Processing (NLP) techniques to the 10-K annual disclosures that U.S. public companies must file under reporting regimes governed by the U.S. Securities and Exchange Commission (SEC). NLP techniques have become popular for turning text into data, enabling many new analytical projects in academic corporate research. When applied specifically to regulation-oriented corporate speech, however, they encounter relative uniformity across filers and industries, which this small project highlights as a possible obstacle to using NLP in this particular extension of legal and financial reporting analysis. Using a small sample, simple word lists, and a naive comparison, the project shows that even an industry-by-industry analysis may fail to meaningfully capture the regulatory, industry-standard, and legal-practice nuances that define regulatory discussion. This is due, at least in part, to relatively uniform reporting summaries across firms, not only within a single industry but across SEC annual filers generally. The project concludes with a brief summary of possible causes of this uniformity. It does not imply that regulatory or legal comparison is, or should be, among the factors for which the SEC develops or seeks uniform 10-K design and response guidelines to help current or potential investors gauge current or future firm performance.
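The word-list comparison described above can be illustrated with a minimal sketch. It is not the project's actual method or word list: the term list `REG_TERMS`, the helper names, and the use of mean pairwise cosine similarity as a uniformity score are all assumptions made for illustration. Each filing is reduced to counts of regulatory terms, and a score near 1 means the filings discuss regulation with a similar mix of terms.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical regulatory word list (an assumption, NOT the project's list).
REG_TERMS = ["regulation", "regulator", "regulatory", "law", "compliance", "sec", "statute"]


def term_vector(text, terms=REG_TERMS):
    """Count exact whole-word hits for each term (lowercased, no stemming)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[t] for t in terms]


def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(texts):
    """Naive uniformity score: average cosine similarity over all pairs of filings."""
    if len(texts) < 2:
        raise ValueError("need at least two filings to compare")
    vectors = [term_vector(t) for t in texts]
    sims = [cosine(a, b) for a, b in combinations(vectors, 2)]
    return sum(sims) / len(sims)
```

Computing this score within each industry and again across all filers gives one naive way to probe the abstract's point: if within-industry scores are not noticeably higher than the all-filer score, an industry-by-industry split adds little.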

Included in

Finance Commons