Date of Award

Spring 1-1-2019

Document Type


Degree Name

Master of Science (MS)

First Advisor

Michael C. Mozer

Second Advisor

Stephen R. Becker

Third Advisor

William Kleiber


In the modern digital environment, many data sources can be characterized as event sequences: series of events, each with an associated time of occurrence. Examples include the call log from a cell phone, an online purchase history, and a trace of musical selections. The influx of such data has led many researchers to develop deep architectures that can discover patterns in event sequences and predict future sequences. Many of these architectures (e.g., LSTM, GRU) tend to discard temporal data and treat the sequence as if all events were equally spaced. Previous work has also attempted to treat the temporal data as continuous (e.g., CT-GRU), but it was unable to show a benefit in prediction or classification over LSTM or GRU networks with temporal data appended to the input.

We propose a Lifetime-Limited Memory (LLM) architecture that operates under the notion that all information within a sequence is relevant for only a finite time period. The age of the information is then used to determine how much of the memory should be retained, via a hierarchy of leaky integrators with log-linearly spaced time constants. As the network trains, each cell linearly mixes the information from the different time scales and determines the most relevant time scales for each event. We believe that this architecture will be better equipped to handle this specific class of tasks than more traditional methods because it incorporates temporal dynamics into its neuron activation functions and permits the storage and utilization of information at multiple time scales.
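The idea of a bank of leaky integrators at several time scales can be illustrated with a minimal sketch. It is not the thesis's actual formulation: the exponential decay form, the function names, the range of time constants, and the uniform mixing weights are all assumptions chosen for illustration. In a trained network the mixing weights would be learned rather than fixed.

```python
import numpy as np

def log_spaced_time_constants(tau_min, tau_max, n_scales):
    """Return n_scales time constants evenly spaced on a log scale (assumed layout)."""
    return np.logspace(np.log10(tau_min), np.log10(tau_max), n_scales)

def decay_and_update(memory, dt, taus, new_input):
    """Decay each integrator by the elapsed time dt, then add the new input.

    memory: array of shape (n_scales,), one trace per time constant.
    Exponential decay exp(-dt / tau) is an assumed form of the leak.
    """
    retained = memory * np.exp(-dt / taus)
    return retained + new_input

def mix(memory, weights):
    """Linearly combine the traces from the different time scales."""
    return float(np.dot(weights, memory))

# Hypothetical usage: three time scales, events arriving at irregular intervals.
taus = log_spaced_time_constants(1.0, 100.0, 3)
memory = np.zeros(3)
weights = np.full(3, 1.0 / 3.0)  # placeholder for learned mixing weights
for dt, x in [(0.5, 1.0), (20.0, 1.0), (3.0, 0.0)]:
    memory = decay_and_update(memory, dt, taus, x)
summary = mix(memory, weights)
```

Fast integrators forget an event soon after it occurs, while slow ones retain it across long gaps, so the mixed value depends on the actual time elapsed between events and not only on their order.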

In this paper, we performed experiments on the LLM network alongside the LSTM net with appended time data to determine the strengths and weaknesses of the LLM net. We find that the LLM net is better suited to the tasks associated with two of the natural datasets we tested on, that the LSTM net performs better on two other datasets, and that the networks perform similarly on three other datasets. We find potential upside to using this architecture, but are unable to show better performance across the board.