Model

This page details the “quirks” of the model used for this website.

Predictions

This section describes the predictions made by the model.

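The predictions were created with the add_epred_draws() function from the tidybayes package, which draws from the expected value of the posterior predictive distribution. The predictions still carry some error, but because expected-value draws leave out observation-level noise, they are less variable than draws from counterparts such as add_predicted_draws().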

Here are some graphs detailing the predictions:

epred_mean is the predicted house price.
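To make the idea behind epred_mean concrete, here is a minimal Python sketch (not the site's actual R code) of what an expected-value draw is in a Gaussian model. The posterior draws are simulated stand-ins, and the names `posterior_draws`, `epred_draws`, `predicted_draws`, and `living_area` are hypothetical; on this site the real draws come from the fitted model through add_epred_draws().

```python
import random
import statistics

random.seed(1)

# Hypothetical stand-in for posterior draws of a one-predictor Gaussian model:
# each draw holds an intercept, a slope for living area, and a noise sd.
posterior_draws = [
    {
        "intercept": random.gauss(20_000, 5_000),
        "slope": random.gauss(100, 3),
        "sigma": abs(random.gauss(40_000, 1_000)),
    }
    for _ in range(4000)
]

living_area = 1500  # hypothetical house, in square feet

# Expected-value draws: one value of the linear predictor mu per posterior draw.
epred_draws = [d["intercept"] + d["slope"] * living_area for d in posterior_draws]

# Posterior predictive draws add observation noise on top of mu.
predicted_draws = [
    random.gauss(mu, d["sigma"]) for mu, d in zip(epred_draws, posterior_draws)
]

# Averaging the expected-value draws gives a single predicted price.
epred_mean = statistics.fmean(epred_draws)

print(round(epred_mean))
print(statistics.stdev(epred_draws) < statistics.stdev(predicted_draws))
```

The last line compares the spread of the two kinds of draws: the expected-value draws only reflect uncertainty in the coefficients, while the predictive draws also include the noise term.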

Models

This section describes both models used on this website.

Model 1

The first (and more accurate) model is a BRM (Bayesian Regression Model). It is the model behind all of the graphs. Its formula includes all the variables in the data, so the model estimates how much each variable affects the total price of the house. The formula is:

\[ \begin{align*}
\text{SalePrice}_i &\sim \mathcal{N}(\mu_i, \sigma^2) \\
\mu_i &= \beta_0 + \beta_1 \cdot \text{LotFrontage}_i + \beta_2 \cdot \text{LotArea}_i + \beta_3 \cdot \text{OverallQual}_i + \beta_4 \cdot \text{OverallCond}_i + \beta_5 \cdot \text{YearBuilt}_i + \beta_6 \cdot \text{YearRemodAdd}_i \\
&\quad + \beta_7 \cdot \text{MasVnrArea}_i + \beta_8 \cdot \text{BsmtFinSF1}_i + \beta_9 \cdot \text{BsmtFinSF2}_i + \beta_{10} \cdot \text{BsmtUnfSF}_i + \beta_{11} \cdot \text{TotalBsmtSF}_i \\
&\quad + \beta_{12} \cdot \text{X1stFlrSF}_i + \beta_{13} \cdot \text{X2ndFlrSF}_i + \beta_{14} \cdot \text{GrLivArea}_i + \beta_{15} \cdot \text{BsmtFullBath}_i + \beta_{16} \cdot \text{BsmtHalfBath}_i \\
&\quad + \beta_{17} \cdot \text{FullBath}_i + \beta_{18} \cdot \text{HalfBath}_i + \beta_{19} \cdot \text{BedroomAbvGr}_i + \beta_{20} \cdot \text{KitchenAbvGr}_i + \beta_{21} \cdot \text{TotRmsAbvGrd}_i \\
&\quad + \beta_{22} \cdot \text{Fireplaces}_i + \beta_{23} \cdot \text{GarageYrBlt}_i + \beta_{24} \cdot \text{GarageCars}_i + \beta_{25} \cdot \text{GarageArea}_i \\
&\quad + \beta_{26} \cdot \text{WoodDeckSF}_i + \beta_{27} \cdot \text{OpenPorchSF}_i + \beta_{28} \cdot \text{EnclosedPorch}_i + \beta_{29} \cdot \text{X3SsnPorch}_i + \beta_{30} \cdot \text{ScreenPorch}_i \\
&\quad + \beta_{31} \cdot \text{PoolArea}_i + \beta_{32} \cdot \text{MiscVal}_i
\end{align*} \]

Where:

\(\text{SalePrice}_i\) is the sale price of the \(i\)-th house.

\(\mu_i\) is the linear predictor for the \(i\)-th house.

\(\beta_{0}, \beta_{1}, \beta_{2}, \ldots, \beta_{32}\) are the coefficients.

\(\sigma^2\) is the variance of the Gaussian distribution.
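
The long sum in the formula can also be written compactly. The notation \(x_{ik}\), for the value of the \(k\)-th predictor of the \(i\)-th house, is introduced here only for illustration and is not part of the original formula:

\[ \begin{align*}
\text{SalePrice}_i &\sim \mathcal{N}(\mu_i, \sigma^2) \\
\mu_i &= \beta_0 + \sum_{k=1}^{32} \beta_k \, x_{ik}
\end{align*} \]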

Here’s a summary of the model:

Characteristic   Beta    95% CI [1]
LotFrontage      -0.81   -20, 18
LotArea          0.24    -0.01, 0.48
OverallQual      1.9     -18, 22
OverallCond      0.21    -19, 20
YearBuilt        32      13, 52
YearRemodAdd     24      3.4, 43
MasVnrArea       36      25, 48
BsmtFinSF1       23      12, 34
BsmtFinSF2       -0.23   -14, 13
BsmtUnfSF        10      -1.1, 21
TotalBsmtSF      32      20, 44
X1stFlrSF        23      10, 36
X2ndFlrSF        27      16, 39
GrLivArea        40      29, 52
BsmtFullBath     0.08    -19, 20
BsmtHalfBath     0.22    -19, 20
FullBath         0.40    -19, 21
HalfBath         0.16    -19, 20
BedroomAbvGr     -0.28   -20, 19
KitchenAbvGr     -0.21   -21, 20
TotRmsAbvGrd     0.03    -20, 20
Fireplaces       0.41    -20, 21
GarageYrBlt      22      2.3, 41
GarageCars       0.43    -20, 20
GarageArea       65      54, 76
WoodDeckSF       30      15, 44
OpenPorchSF      13      -4.5, 30
EnclosedPorch    -16     -33, 1.2
X3SsnPorch       2.8     -16, 22
ScreenPorch      9.2     -8.2, 26
PoolArea         -7.8    -26, 11
MiscVal          -1.3    -5.8, 3.2

[1] CI = Credible Interval

And these are the values of the coefficients:

                   Estimate    Est.Error          Q2.5         Q97.5
Intercept     -1.583418e+05 3.304375e+04 -2.217716e+05 -9.187376e+04
LotFrontage   -8.130507e-01 9.750489e+00 -1.979193e+01  1.832144e+01
LotArea        2.369383e-01 1.256924e-01 -1.052083e-02  4.849382e-01
OverallQual    1.913311e+00 9.955828e+00 -1.829744e+01  2.212766e+01
OverallCond    2.098226e-01 9.838119e+00 -1.935090e+01  1.961939e+01
YearBuilt      3.191768e+01 9.919258e+00  1.252290e+01  5.170135e+01
YearRemodAdd   2.367911e+01 9.868182e+00  3.433188e+00  4.277861e+01
MasVnrArea     3.600668e+01 5.870588e+00  2.463411e+01  4.757421e+01
BsmtFinSF1     2.302873e+01 5.716336e+00  1.214155e+01  3.428110e+01
BsmtFinSF2    -2.302255e-01 6.823848e+00 -1.378719e+01  1.294171e+01
BsmtUnfSF      1.004742e+01 5.650464e+00 -1.085197e+00  2.089712e+01
TotalBsmtSF    3.215961e+01 6.041360e+00  2.040138e+01  4.405331e+01
X1stFlrSF      2.277109e+01 6.614810e+00  1.028781e+01  3.562506e+01
X2ndFlrSF      2.744346e+01 6.073054e+00  1.557852e+01  3.936512e+01
GrLivArea      4.015883e+01 6.032323e+00  2.852445e+01  5.188612e+01
BsmtFullBath   8.142487e-02 9.905329e+00 -1.885224e+01  1.956437e+01
BsmtHalfBath   2.163349e-01 9.898350e+00 -1.926754e+01  2.002975e+01
FullBath       3.964888e-01 1.031029e+01 -1.912795e+01  2.090468e+01
HalfBath       1.588439e-01 9.933301e+00 -1.911647e+01  1.957041e+01
BedroomAbvGr  -2.820032e-01 1.008818e+01 -2.005821e+01  1.930324e+01
KitchenAbvGr  -2.149270e-01 1.040331e+01 -2.086868e+01  1.975379e+01
TotRmsAbvGrd   2.708342e-02 1.009653e+01 -1.955919e+01  1.962530e+01
Fireplaces     4.105798e-01 1.017830e+01 -1.997695e+01  2.062880e+01
GarageYrBlt    2.151385e+01 9.930975e+00  2.256832e+00  4.090899e+01
GarageCars     4.308454e-01 1.010864e+01 -1.952893e+01  2.017322e+01
GarageArea     6.493535e+01 5.656480e+00  5.394206e+01  7.598057e+01
WoodDeckSF     2.965804e+01 7.309662e+00  1.513341e+01  4.416553e+01
OpenPorchSF    1.303416e+01 8.808714e+00 -4.490927e+00  2.965228e+01
EnclosedPorch -1.595668e+01 8.871181e+00 -3.325431e+01  1.244544e+00
X3SsnPorch     2.757090e+00 9.732680e+00 -1.624005e+01  2.190627e+01
ScreenPorch    9.165835e+00 8.905417e+00 -8.157776e+00  2.597299e+01
PoolArea      -7.777164e+00 9.509116e+00 -2.561103e+01  1.130821e+01
MiscVal       -1.341129e+00 2.287116e+00 -5.790779e+00  3.192435e+00
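
One way to read the Q2.5 and Q97.5 columns is to check whether a coefficient's 95% credible interval lies entirely on one side of zero. The Python sketch below does this for a handful of rows copied from the table above; the helper name `interval_excludes_zero` is hypothetical, and this check is a reading aid rather than part of the site's own analysis.

```python
# A few rows copied from the coefficient table above: (name, Q2.5, Q97.5).
rows = [
    ("LotArea", -1.052083e-02, 4.849382e-01),
    ("GrLivArea", 2.852445e+01, 5.188612e+01),
    ("GarageArea", 5.394206e+01, 7.598057e+01),
    ("EnclosedPorch", -3.325431e+01, 1.244544e+00),
    ("MiscVal", -5.790779e+00, 3.192435e+00),
]


def interval_excludes_zero(lower, upper):
    """Return True when both ends of the interval share the same sign."""
    return lower > 0 or upper < 0


for name, lower, upper in rows:
    label = "excludes zero" if interval_excludes_zero(lower, upper) else "includes zero"
    print(f"{name:14s} [{lower:8.2f}, {upper:8.2f}] {label}")
```

An interval that includes zero means the data and prior together do not pin down the direction of that variable's effect.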

I’ve created a few graphs for this model, to show the “tech” stuff behind it.

Residuals are the difference between predicted and actual values.
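
As a small illustration, residuals can be computed as below. The prices are made-up numbers, and the sign convention (actual minus predicted) is an assumption about how the site's graphs were built, not something stated on this page.

```python
# Hypothetical sale prices and model predictions, in dollars.
actual = [208_500, 181_500, 223_500]
predicted = [199_000, 190_000, 215_000]

# Residual = actual - predicted (positive means the model under-predicted).
residuals = [a - p for a, p in zip(actual, predicted)]

# Root-mean-square error summarizes the typical size of the residuals.
rmse = (sum(r * r for r in residuals) / len(residuals)) ** 0.5

print(residuals)
print(round(rmse))
```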

Model 2

The second model is used only by the house price estimator. It is also a BRM, but it uses fewer variables, which makes it less accurate. The model was simplified so that its inputs are easy to type in (4 convenient inputs are much better than 32 scattered ones). However, the math still follows the same pattern. Observe:

\[ \begin{align*}
\text{SalePrice}_i &\sim \mathcal{N}(\mu_i, \sigma^2) \\
\mu_i &= \beta_0 + \beta_1 \cdot \text{LotArea}_i + \beta_2 \cdot \text{OverallQual}_i + \beta_3 \cdot \text{GrLivArea}_i + \beta_4 \cdot \text{YearBuilt}_i
\end{align*} \]

Where:

\(\text{SalePrice}_i\) is the sale price of the \(i\)-th house.

\(\mu_i\) is the linear predictor for the \(i\)-th house.

\(\beta_{0}, \ldots, \beta_{4}\) are the coefficients.

\(\sigma^2\) is the variance of the Gaussian distribution.

Here’s a summary of the model:

Characteristic   Beta    95% CI [1]
LotArea          0.77    0.48, 1.1
OverallQual      2.1     -18, 22
GrLivArea        96      90, 101
YearBuilt        43      24, 62

[1] CI = Credible Interval

And these are the values of the coefficients:

                 Estimate    Est.Error          Q2.5         Q97.5
Intercept   -5.726317e+04 1.961196e+04 -9.456801e+04 -17781.804972
LotArea      7.741071e-01 1.503027e-01  4.767223e-01      1.060128
OverallQual  2.076136e+00 1.011661e+01 -1.777991e+01     21.996001
GrLivArea    9.554055e+01 2.727789e+00  9.023605e+01    100.788161
YearBuilt    4.322867e+01 9.724108e+00  2.411546e+01     61.841288
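
To show how the four inputs turn into a price, the Python sketch below plugs the Estimate column from the table above into the linear predictor \(\mu_i\). The function name `predict_price` and the example house are hypothetical, and using only the posterior mean estimates is a simplification; the site's estimator may work from the full set of posterior draws instead.

```python
# Posterior mean estimates copied from the Estimate column above.
COEFFICIENTS = {
    "Intercept": -5.726317e+04,
    "LotArea": 7.741071e-01,
    "OverallQual": 2.076136e+00,
    "GrLivArea": 9.554055e+01,
    "YearBuilt": 4.322867e+01,
}


def predict_price(lot_area, overall_qual, gr_liv_area, year_built):
    """Evaluate mu = b0 + b1*LotArea + b2*OverallQual + b3*GrLivArea + b4*YearBuilt."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["LotArea"] * lot_area
        + COEFFICIENTS["OverallQual"] * overall_qual
        + COEFFICIENTS["GrLivArea"] * gr_liv_area
        + COEFFICIENTS["YearBuilt"] * year_built
    )


# Hypothetical house: 8,450 sq ft lot, quality 7, 1,710 sq ft living area, built 2003.
price = predict_price(lot_area=8450, overall_qual=7, gr_liv_area=1710, year_built=2003)
print(f"Predicted price: ${price:,.0f}")
```

Because the result is a single number, it hides the uncertainty shown in the Q2.5 and Q97.5 columns.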