Date of Award

Spring 6-5-2020

Document Type

Thesis

Degree Name

Master of Science (MS)

First Advisor

Vanja Dukic

Second Advisor

Jem Corcoran

Third Advisor

Manuel Lladser

Abstract

This paper constructs and analyzes several statistical models for an Opendoor dataset from Atlanta covering the second half of 2017. Opendoor is an iBuyer: an investment company that combines technology with decades of human real estate experience to offer homeowners cash for their houses. Such companies typically make minor repairs and perform maintenance, then try to re-list the home quickly and sell it for a profit.[1] First, we analyzed several regression models, including Simple Linear Regression (SLR), the Generalized Linear Model (GLM), and the Generalized Additive Model (GAM), to assess the effects of various house features on profit. The predictors are the number of days spent preparing a house for listing, the calendar quarter, the zip code, and the square footage of the house. After comparing these models in different settings, we found that the GAM with a linear term for square footage and a smooth function of preparation days produced the best result. Second, we made the response qualitative by converting the number of days from listing to sale into a binary or multi-category variable based on months. We then fitted an ordinary logistic regression and a GAM logistic model to the binary response, and a multinomial logistic regression to the multi-category response. Unfortunately, the results were not ideal because there were too few observations. Multinomial logistic regression nevertheless remains a promising approach to revisit when more observations are available. The third section is an introduction to survival analysis, in which the number of days from purchase to sale was treated as the survival time and the covariates were square footage, purchase price, and quarter. We mainly fitted the Cox proportional-hazards model, since the Gamma parametric survival model could not be fitted with the three covariates. The Cox model also did not give ideal results: individual observations are highly influential, and some problematic outliers are poorly predicted by the model. Overall, although each model has its own advantages and disadvantages, further analysis with a larger dataset is needed to improve the models.

Available for download on Friday, June 05, 2020
