Date of Award

Spring 1-10-2010

Document Type


Degree Name

Master of Science (MS)


Computer Science

First Advisor

James H. Martin

Second Advisor

Elizabeth R. Jessup

Third Advisor

James H. Curry


The complex interactions of software licensing and intellectual property present daunting hurdles for many individuals and businesses looking to adopt open source software solutions. The financial repercussions of misusing a piece of open source software are high and demand careful attention. Determining a software package's copyright holders and licensing information requires substantial resources.

Such an analysis may become too costly to justify using the open source solution. Existing tools for analyzing software projects' licenses and copyrights are lacking, and much manual vetting is still required. If these tools could be improved, free and open source software would become more transparent and less costly for companies and individuals looking for open source alternatives.

This thesis describes a new approach to automated software license and copyright analysis that, as our results show, is more accurate and easier to maintain than previous methods. Applying machine learning and information extraction yields algorithms that build abstract models of software licenses and copyrights from hand-labelled data. We will show that these models are more general and robust than previous techniques and achieve better accuracy.