I recently wrote an entry for CoreLogic’s Insights blog that examined realtor comments and property listings. The analysis was fairly high level, but it introduced how the information contained in listing agent comments could improve house price estimation via hedonic regression. Specifically, I regressed the log of house price on the number of bedrooms, the number of bathrooms, living space, and a bag of words built from the realtor comments. More information can be found on CoreLogic’s Insights blog page.
The R code estimating the regression (using glmnet) and generating word clouds of the most important words can be found here.
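For readers who want a feel for the setup before opening the R code, here is a minimal sketch of how the regression inputs might be assembled. It is written in plain Python rather than R, it is not the code linked above, and the listings, field names, and the `min_count` cutoff are all made up for illustration. It only builds the design matrix (structural features plus 0/1 word indicators) and the log-price response; the penalized fit itself, which the original does with glmnet, is left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy listings; the fields and values are invented for illustration.
listings = [
    {"price": 250000, "beds": 3, "baths": 2, "sqft": 1500,
     "comment": "Updated kitchen with granite counters and swamp cooler"},
    {"price": 410000, "beds": 4, "baths": 3, "sqft": 2400,
     "comment": "Granite kitchen, pool, and mountain views"},
    {"price": 180000, "beds": 2, "baths": 1, "sqft": 900,
     "comment": "Investor special, needs work"},
]


def tokenize(text):
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(comments, min_count=2):
    """Keep words appearing in at least min_count comments, sorted alphabetically."""
    doc_freq = Counter()
    for comment in comments:
        doc_freq.update(set(tokenize(comment)))  # count each word once per comment
    return sorted(w for w, n in doc_freq.items() if n >= min_count)


def design_matrix(rows, vocab):
    """Return (X, y): structural features plus 0/1 word indicators, and log price."""
    X, y = [], []
    for row in rows:
        words = set(tokenize(row["comment"]))
        indicators = [1 if w in words else 0 for w in vocab]
        X.append([row["beds"], row["baths"], row["sqft"]] + indicators)
        y.append(math.log(row["price"]))
    return X, y


vocab = build_vocabulary([row["comment"] for row in listings])
X, y = design_matrix(listings, vocab)
```

Each row of `X` holds bedrooms, bathrooms, and living space followed by one indicator per vocabulary word, and `y` holds the log prices. In practice you would probably also drop stop words such as "and" before fitting, since this sketch keeps every word that clears the frequency cutoff.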
I also wrote code to identify listings containing a specific word and then print the entire realtor comment for each match. This was useful for examining the context in which a word was used (e.g., “swamp” was used to describe a cooling method: swamp coolers). Code can be found here.
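The idea is simple enough to sketch in a few lines. The version below is a Python illustration, not the linked code; the function name `comments_containing` and the sample comments are hypothetical. It matches the word on word boundaries and ignores case, which is one reasonable choice among several (a plain substring search would also pick up words like "swampland").

```python
import re


def comments_containing(comments, word):
    """Return the comments that contain `word` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [c for c in comments if pattern.search(c)]


# Hypothetical realtor comments, invented for illustration.
comments = [
    "Home has a new swamp cooler and a fenced yard.",
    "Backs onto protected swampland; bring your boots.",
    "Granite counters and stainless appliances.",
]

# Print the full text of every comment that mentions the word.
for comment in comments_containing(comments, "swamp"):
    print(comment)
```

Reading the full comments for a handful of matches is usually enough to tell whether a word with a surprising coefficient is being used in the sense you assumed.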