Graduate Thesis Or Dissertation


The Role of Time in Machine Perception

https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/j67315166
Abstract
Artificial neural networks have become ubiquitous in machine perception, yielding unprecedented performance across an ever-increasing range of tasks and domains. However, the role of time in machine vision has been largely unexplored in the machine learning and computer vision communities. To investigate it, we situate our work in the context of real-world information processing, where the passage of time is inherent and any adaptive system operating in this context is necessarily forced to make speed-accuracy trade-offs. This dissertation addresses the role of time in anytime prediction, which requires models to continually produce output and for the quality of that output to improve with additional computation steps.

We first describe mechanisms of temporal information processing from a neuroscientific perspective, followed by a survey of the distinct uses of time in current machine perception systems. The survey organizes the space of methods along two dimensions: time and architecture. We specify three units of time (internal, external, and computational) and three architecture types (feedforward, feedback, and cascaded).

Our first thread of research, inspired by Helmholtz's notion of unconscious inference (the theory that human vision operates on ambiguous data, so perception requires completion or interpretation to understand the world), investigates attractor networks for image completion and super-resolution tasks. We propose a novel convolutional bipartite attractor network (CBAN) architecture that enables attractor networks to operate on higher-dimensional image data; specifically, we propose convolutional weight constraints, novel loss functions, and methods for preventing vanishing and exploding gradients. We demonstrate that CBAN achieves results on par with other state-of-the-art methods for image completion and super-resolution tasks.

Next, we leverage the cascaded dynamics and massively parallel nature of biological brains to propose cascaded networks for anytime prediction, along with a novel application of temporal difference (TD) learning to classification tasks. We demonstrate the superiority of the cascaded parallel architecture over serial architectures, indicating that parallelism can be exploited in a way not previously explored. We also show that the TD objective encourages the most accurate response as quickly as possible in both serial and cascaded regimes, with TD-trained cascaded models obtaining strictly superior speed-accuracy profiles compared to previously proposed anytime prediction models, all of which are based on a serial architecture. Motivated by resource-constrained environments, we then extend this work to the low-capacity model regime via cascaded distillation, and show that cascaded distillation outperforms standard distillation. Lastly, we leverage TD for fast pose estimation, demonstrating that iterative estimation via top-down recurrence coupled with TD yields significant improvements for low-capacity pose estimation models.

Altogether, the three research threads outlined above drive toward the same conclusion: designing model architectures and training objectives that account for the passage of time can yield speed-ups for a given level of accuracy, or accuracy improvements for a given amount of computational resources.
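The TD-learning idea in the abstract can be illustrated with a minimal sketch: a model that emits a class-score vector at every computation step can be trained against TD(λ)-style targets, where the target for an early step blends the model's own later (typically more accurate) output with the eventual ground-truth label. This is only an illustration of the general technique, not the dissertation's implementation; the function name `td_lambda_targets` and its arguments are hypothetical.

```python
def td_lambda_targets(preds, label, lam=0.9):
    """Compute TD(lambda)-style targets for a sequence of per-step outputs.

    preds : list of length T, each a list of C per-class scores --
            the model's prediction after each computation step.
    label : length-C one-hot vector for the true class (the final target).
    lam   : interpolation factor; lam=1 recovers the ordinary supervised
            target at every step, lam=0 gives pure one-step bootstrapping.

    Returns a list of T target vectors. Working backward from the label,
    each step's target mixes the next step's prediction with the remaining
    lambda-return, nudging early outputs toward what later steps produce.
    """
    T = len(preds)
    targets = [None] * T
    targets[-1] = list(label)  # final step is trained on the true label
    for t in range(T - 2, -1, -1):
        targets[t] = [(1.0 - lam) * p + lam * g
                      for p, g in zip(preds[t + 1], targets[t + 1])]
    return targets
```

With `lam=1.0` every step is pushed directly toward the label, while intermediate values of `lam` let the supervision signal for early, fast responses lean on the model's own more deliberate later outputs.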
Date Issued
  • 2022-01-11
Last Modified
  • 2022-07-07