Graduate Thesis Or Dissertation

 

Algorithmic Construction and Stochastic Analysis of Optimal Automata for Generalized Strings Público Deposited

Contenido Descargable

Descargar PDF
https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/7w62f8498
Abstract
  • In many applications, the need arises to search a text for appearances of a given set of keywords. As an example, in bioinformatics one may wish to search a DNA sequence to find so-called biological motifs. A standard approach to this problem is to leverage a deterministic finite automaton---a graph structure which is traversed as letters of the text are read in. However, depending on the number and length of the keywords being sought in the text, the graph may be too large to fit in computer memory, making this approach fruitless.In this thesis, we first present a novel algorithm that, under the assumption that the keywords take the form of a so-called generalized string, constructs the minimal DFA recognizing those keywords. Importantly, the algorithm is iterative and allows one to build the automaton directly, without any use of buffer memory. Not only does this mean that the algorithm is efficient regarding memory consumption, but it also provides useful insight to help facilitate analysis for the size of such DFA.Using this new algorithm and pairing it with the assumption that the generalized strings are drawn at random from some class of probability distributions, we develop bounds on the size of the minimal automaton that are true with high probability. Furthermore, using synthetic data, we provide evidence that the size of the minimal automaton grows linearly in expectation for many cases.
Creator
Date Issued
  • 2018
Academic Affiliation
Advisor
Committee Member
Degree Grantor
Commencement Year
Subject
Última modificación
  • 2019-11-17
Resource Type
Declaración de derechos
Language

Relaciones

Elementos