# Uncertainty in choice modeling


# Introduction

Econometric models of consumer choice have been used to drive engineering optimization models toward profitable designs. However, every such model carries uncertainty that must be accounted for. Various statistical methods are available for quantifying the uncertainty in these choice models in order to determine how it affects design decisions. The method of quantifying uncertainty depends on the method used to estimate the econometric model. For most choice models, the method of maximum likelihood (ML) is used. However, an introduction to the uncertainty associated with the simpler ordinary least squares (OLS) method yields insight into statistical tools such as hypothesis testing and confidence intervals, which are also useful for maximum likelihood estimation.

# Ordinary Least Squares Estimation

A simple two variable linear regression equation is modeled by the parameters $\hat\beta_1$ and $\hat\beta_2$ such that,

$Y_i=\hat \beta_1 + \hat\beta_2X_i + \hat u_i$

where $\hat u_i$ is the residual, i.e., the error in the estimate for observation $i$. We seek the solution that minimizes the sum of the squared residuals. In other words, we want to find the $\hat \beta$ values that satisfy $\min \sum{{\hat u_i}^2}$, which can also be written as:

$\min \sum({Y_i - \hat\beta_1 - \hat\beta_2X_i})^2$
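For reference, setting the partial derivatives of this sum with respect to $\hat\beta_1$ and $\hat\beta_2$ to zero gives the standard closed-form solution. The sample means $\bar X$ and $\bar Y$ are notation introduced here for illustration and do not appear in the original text:

$\hat\beta_2 = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2}$

$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$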

Because the goal is to create the best model by minimizing the sum of the squared error terms, this method is called ordinary least squares (OLS) estimation. The uncertainty in the model is measured in terms of the individual uncertainties of the estimated $\hat \beta$ values, each expressed as a standard error. For a two variable linear regression model, the standard errors are calculated by the equations below.

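$se(\hat\beta_1)=\sqrt{\frac{\sum{X_i}^2}{n\sum{x_i}^2}}\sigma$

$se(\hat\beta_2)=\frac{\sigma}{\sqrt{\sum{x_i}^2}}$

Here $x_i = X_i - \bar X$ is the deviation of $X_i$ from the sample mean, $n$ is the number of observations, and $\sigma$ is the standard deviation of the error term. In practice $\sigma$ is unknown and is replaced by its estimate $\hat\sigma = \sqrt{\sum{{\hat u_i}^2}/(n-2)}$.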
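As a minimal sketch of these formulas, the Python snippet below fits a two variable model and evaluates both standard errors. The data values and the function name `ols_two_variable` are hypothetical, chosen only for illustration, and the unknown $\sigma$ is replaced by its usual estimate based on $n-2$ degrees of freedom.

```python
import math

def ols_two_variable(X, Y):
    """Fit Y = b1 + b2*X by OLS; return the estimates and their standard errors."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Sums built from deviations from the mean, x_i = X_i - x_bar
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    # Residuals and the estimate of sigma with n - 2 degrees of freedom
    residuals = [yi - b1 - b2 * xi for xi, yi in zip(X, Y)]
    sigma_hat = math.sqrt(sum(u ** 2 for u in residuals) / (n - 2))
    se_b1 = math.sqrt(sum(xi ** 2 for xi in X) / (n * sxx)) * sigma_hat
    se_b2 = sigma_hat / math.sqrt(sxx)
    return b1, b2, se_b1, se_b2

# Hypothetical data, for illustration only
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, b2, se_b1, se_b2 = ols_two_variable(X, Y)
print(f"b1 = {b1:.4f} (se {se_b1:.4f}), b2 = {b2:.4f} (se {se_b2:.4f})")
```

With only five observations the standard errors are driven heavily by individual points, so the numbers should be read as an illustration of the formulas rather than a realistic fit.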

## Hypothesis Testing

Once the standard errors are found, hypothesis testing can be used to assess whether $\hat\beta_1$ and $\hat\beta_2$ differ significantly from hypothesized values. The t-statistic is the most widely used measure for this purpose in choice modeling. It is used because the error variance is generally unknown and must be estimated from the sample, so the test statistic follows a t distribution rather than a normal distribution. This matters most for small sample sizes, where the z-test is inappropriate. Additionally, the t distribution approaches the normal distribution as the number of degrees of freedom increases. For $\hat\beta_2$, the t-statistic is given by:

$t = \frac{\hat\beta_2 - \beta_2}{se(\hat\beta_2)}$,

where $\beta_2$ is the value of the parameter under the null hypothesis. It is common practice to set $\beta_2 = 0$ and solve for $t$. This value of $t$ is compared to the critical value $t_{\alpha/2}$, which comes from the t distribution and is usually looked up in a table based on the desired confidence level $(1-\alpha)$ and the degrees of freedom in the problem. The degrees of freedom equal $n-k$, where $n$ is the number of data points available and $k$ is the number of parameters estimated in the model. Since the discussion above is limited to a two variable model, $k=2$ in this case. If $|t| > t_{\alpha/2}$, the null hypothesis is rejected and $\hat\beta_2$ is considered to be "statistically significant." If $|t| < t_{\alpha/2}$, the null hypothesis is not rejected: $\beta_2 = 0$ remains consistent with the data, and $\hat\beta_2$ is considered to be "statistically insignificant." The t-test can be used in the same way to assess each parameter in the model. In practice, hypothesis testing is a way of deciding whether the variable associated with the tested parameter should be included in the model.
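The test can be sketched in a few lines of Python. The estimates, standard errors, and sample size below are hypothetical, and the critical values are the usual two-sided 5% entries from a standard t table, hard-coded so that the example needs no external library.

```python
# Upper 2.5% critical values of the t distribution for a few degrees of freedom,
# copied from a standard t table (two-sided test at the 95% confidence level).
T_CRIT_025 = {3: 3.182, 5: 2.571, 8: 2.306, 10: 2.228, 20: 2.086, 30: 2.042}

def t_statistic(beta_hat, se, beta_null=0.0):
    """t = (estimate - null value) / standard error."""
    return (beta_hat - beta_null) / se

def is_significant(beta_hat, se, n, k, beta_null=0.0):
    """Return (reject_null, t) for a two-sided test with n - k degrees of freedom."""
    t = t_statistic(beta_hat, se, beta_null)
    t_crit = T_CRIT_025[n - k]
    return abs(t) > t_crit, t

# Hypothetical two variable model estimated from n = 5 points (k = 2, so 3 degrees of freedom)
slope_significant, slope_t = is_significant(beta_hat=1.99, se=0.0597, n=5, k=2)
intercept_significant, intercept_t = is_significant(beta_hat=0.05, se=0.198, n=5, k=2)

print("slope:", round(slope_t, 2), "reject null" if slope_significant else "do not reject")
print("intercept:", round(intercept_t, 2), "reject null" if intercept_significant else "do not reject")
```

In this hypothetical case the slope is significant while the intercept is not, which is the kind of outcome that would prompt a modeler to reconsider whether the constant term is needed.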

## Confidence Intervals

Another way to quantify the uncertainty in a model is to use confidence intervals, which give a range in which the true value of each parameter is expected to fall. Under the assumption that the population error term is normally distributed, the estimated standard errors can be used to create confidence intervals and conduct hypothesis tests about the population parameters $\beta_1$ and $\beta_2$. A 100(1-α)% confidence interval for β is given by: $\hat\beta \pm t_{\alpha/2}se(\hat\beta)$.

A confidence interval is constructed from the sample estimate $\hat{\beta}$ so that, over repeated samples, the interval contains the true value of the parameter, β, with a specified probability. Its endpoints are determined from the CDF of the t distribution. It is common practice to present results in the form of a 95% confidence interval. Below is a quick derivation from the calculation of the t statistic to the construction of a 95% confidence interval.

$t = \frac{\hat{\beta}-\beta}{se(\hat{\beta})}$
$.95 =P\left(-t_{.025} < \frac{\hat{\beta}-\beta}{se(\hat{\beta})} < t_{.025}\right)$
$.95 =P\left(-t_{.025}se(\hat{\beta}) < \hat{\beta}-\beta< t_{.025}se(\hat{\beta})\right)$
$.95 =P\left(-\hat{\beta}-t_{.025}se(\hat{\beta}) < -\beta < -\hat{\beta}+t_{.025}se(\hat{\beta})\right)$
$.95 =P\left(\hat{\beta}-t_{.025}se(\hat{\beta}) < \beta < \hat{\beta}+t_{.025}se(\hat{\beta})\right)$

The width of the confidence interval depends on the standard error of the parameter: the larger the standard error, the wider the confidence interval and the less certain we are about the value of the parameter. Hypothesis testing can also be done using confidence intervals. If the value specified by the null hypothesis falls outside the confidence interval, the null hypothesis is rejected. As previously mentioned, the null hypothesis value is often set to zero to determine the statistical significance of a parameter in the model. So, if zero does not fall within the confidence interval, the null hypothesis is rejected and the parameter in question is considered to be statistically significant.
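The interval and the equivalent significance check can be illustrated with a short Python sketch. The estimate, standard error, and degrees of freedom are hypothetical, and the critical value 3.182 is the standard table entry for $t_{.025}$ with 3 degrees of freedom.

```python
def confidence_interval(beta_hat, se, t_crit):
    """Return (lower, upper) = beta_hat -/+ t_crit * se."""
    half_width = t_crit * se
    return beta_hat - half_width, beta_hat + half_width

def contains(interval, value):
    """True if value lies inside the closed interval."""
    lower, upper = interval
    return lower <= value <= upper

# Hypothetical estimate with 3 degrees of freedom; t_.025 = 3.182 from a t table
T_CRIT = 3.182
interval = confidence_interval(beta_hat=1.99, se=0.0597, t_crit=T_CRIT)
wider = confidence_interval(beta_hat=1.99, se=0.70, t_crit=T_CRIT)

print("95% CI:", interval, "zero inside:", contains(interval, 0.0))
print("95% CI with larger se:", wider, "zero inside:", contains(wider, 0.0))
```

The second interval uses the same estimate with a much larger standard error: the interval widens enough to include zero, so the same point estimate would no longer be judged statistically significant.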

# Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a statistical method for fitting an assumed functional form in a probabilistic model to observed data.

The maximum likelihood approach is commonly used to fit simple discrete choice models such as the logit model; however, it can be impractical for fitting discrete choice models with greater complexity, and Bayesian estimation is typically called upon for such cases. Fitting real-world data by maximizing the likelihood offers a way of tuning the free parameters of the model to provide an optimum fit. The biggest benefit of MLE over OLS is that it is not limited to linear models. For many discrete choice modeling studies, the MLE method is preferred.

The likelihood function is given by:

$L(\theta) = f(y_1, \ldots, y_N ; \theta) = \prod_{i=1}^{N} f(y_i ; \theta)$,

where θ is the set of parameters describing the model and $f(y_i;\theta)$ is the probability density function of the random variable $y$, conditioned on θ. The likelihood function $L(\theta)$ is then the joint density of $N$ independent and identically distributed observations, i.e., the product of the individual densities. Because a sum is mathematically simpler to work with than a product, and because the logarithm is monotonic so that both functions are maximized by the same θ, the log of the likelihood function is used. It is given by:

$LL = \log L (\theta) = \sum_{i=1}^{N} LL_i (\theta)$, where $LL_i(\theta) = \log f(y_i ; \theta)$.
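To make the two forms concrete, the sketch below evaluates both the product form and the log form for a binary logit model. The logit specification, the data, and the function names are assumptions made for illustration; the text above does not prescribe a particular density.

```python
import math

def choice_probability(theta, x):
    """Binary logit probability P(y = 1 | x) with utility theta . x (assumed model)."""
    utility = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-utility))

def likelihood(theta, X, y):
    """L(theta): product of the individual densities f(y_i; theta)."""
    product = 1.0
    for x, yi in zip(X, y):
        p = choice_probability(theta, x)
        product *= p if yi == 1 else 1.0 - p
    return product

def log_likelihood(theta, X, y):
    """LL(theta): sum of the individual log densities log f(y_i; theta)."""
    total = 0.0
    for x, yi in zip(X, y):
        p = choice_probability(theta, x)
        total += math.log(p if yi == 1 else 1.0 - p)
    return total

# Hypothetical data: each row is (constant, attribute); y is the observed choice
X = [(1.0, -2.0), (1.0, -1.0), (1.0, -1.0), (1.0, 0.0),
     (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 0, 1, 0, 1, 1]

ll_a = log_likelihood((0.0, 0.0), X, y)    # every choice probability is 0.5
ll_b = log_likelihood((0.0, 0.75), X, y)   # attribute raises the choice probability
print(ll_a, ll_b, math.log(likelihood((0.0, 0.75), X, y)))
```

The parameter vector with the higher log-likelihood fits the observed choices better; maximum likelihood estimation searches for the θ that makes this value as large as possible.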

## Asymptotic Properties of Maximum Likelihood Estimation

• The MLE is asymptotically unbiased, i.e., its bias approaches zero as the number of samples increases to infinity.
• The MLE is asymptotically efficient, i.e., it achieves the Cramér-Rao lower bound[1] as the number of samples tends to infinity. This means that, asymptotically, no unbiased estimator has a lower variance than the MLE.
• The MLE is asymptotically normal. As the number of samples increases, the distribution of the MLE tends to a Gaussian distribution with mean θ and covariance matrix equal to the inverse of the Fisher information matrix[2].

## Uncertainty in Maximum Likelihood Estimation

The Fisher information measures the amount of information that an observable random variable X carries about an unknown parameter θ on which the likelihood function, L(θ) = f(X;θ), depends. The likelihood function is the joint probability of the data, the Xs, conditional on the value of θ, viewed as a function of θ. The score[3] is the derivative of the log of the likelihood function with respect to θ. The Fisher information is the variance of the score, and since the expected value of the score is zero, this variance is simply the second moment of the score. Hence the Fisher information matrix can be written

$\mathcal{I}(\theta)=\mathrm{E}\left[\frac{\partial\ln L(\theta)}{\partial\theta}\frac{\partial\ln L(\theta)}{\partial\theta'}\right].$

If the following condition is met,

$\int \frac{\partial^2}{\partial \theta^2}f(x ; \theta ) \, dx = 0,$

then the information matrix can be written as

$\mathcal{I}(\theta)=-\mathrm{E}\left[\frac{\partial^2\ln L(\theta)}{\partial\theta \partial\theta'}\right].$

For a model with N parameters (here N denotes the number of parameters rather than the number of observations), θ is an N×1 vector, $\theta = \begin{bmatrix}\theta_{1}, \theta_{2}, \cdots , \theta_{N} \end{bmatrix}'$, and the Fisher information takes the form of an N×N matrix as shown below.

$\mathcal{I}(\theta)= -\mathrm{E} \begin{bmatrix} \frac{\partial^2 \ln L(\theta)}{\partial {\theta_1}^2} & \frac{\partial^2 \ln L(\theta)}{\partial \theta_1 \partial \theta_2} & \cdots & \frac{\partial^2 \ln L(\theta)}{\partial \theta_1 \partial \theta_N} \\ \\ \frac{\partial^2 \ln L(\theta)}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 \ln L(\theta)}{\partial {\theta_2}^2} & \cdots & \frac{\partial^2 \ln L(\theta)}{\partial \theta_2 \partial \theta_N} \\ \\ \vdots & \vdots & \ddots & \vdots \\ \\ \frac{\partial^2 \ln L(\theta)}{\partial \theta_N \partial \theta_1} & \frac{\partial^2 \ln L(\theta)}{\partial \theta_N \partial \theta_2} & \cdots & \frac{\partial^2 \ln L(\theta)}{\partial {\theta_N}^2} \end{bmatrix}.$

As in the OLS method, uncertainty for MLE is measured by first finding the standard error of each parameter in the model. Due to the asymptotic normality of the MLE, the covariance matrix[4] of the estimates is equal to the inverse of the Fisher information matrix. The covariance matrix can be written as

$[\mathcal{I}(\theta)]^{-1}=\left(-E\left[\frac{\partial^2 \ln L(\theta)}{\partial\theta \partial\theta'}\right]\right)^{-1}$

The variances lie on the diagonal of the covariance matrix, such that $Var(\hat\theta_i)$ is the element in the ith row and ith column. Taking the square root of each variance gives the standard error of the corresponding parameter, $se(\hat\theta_i) = \sqrt{Var(\hat\theta_i)}$. Once the standard errors are found, hypothesis tests and confidence intervals can be constructed as in the case of ordinary least squares regression.
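The whole procedure can be sketched for a two-parameter binary logit model, which is an assumed example rather than a model specified in the text. The snippet maximizes the log-likelihood by Newton-Raphson, builds the information matrix from the negative second derivatives, inverts it, and reads the standard errors off the diagonal. For the logit model the second derivatives do not depend on the observed choices, so the expectation in the Fisher information is simply the matrix evaluated at the estimate. The data and function names are hypothetical.

```python
import math

def score_and_information(theta, X, y):
    """Score vector and information matrix of a two-parameter binary logit."""
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for x, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-(theta[0] * x[0] + theta[1] * x[1])))
        for a in range(2):
            score[a] += (yi - p) * x[a]                    # d lnL / d theta_a
            for b in range(2):
                info[a][b] += p * (1.0 - p) * x[a] * x[b]  # -d2 lnL / d theta_a d theta_b
    return score, info

def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def fit_logit(X, y, iterations=25):
    """Newton-Raphson MLE; returns estimates, covariance matrix, standard errors."""
    theta = [0.0, 0.0]
    for _ in range(iterations):
        score, info = score_and_information(theta, X, y)
        inv = invert_2x2(info)
        theta = [theta[a] + inv[a][0] * score[0] + inv[a][1] * score[1]
                 for a in range(2)]
    # Covariance matrix = inverse information matrix evaluated at the estimate
    _, info = score_and_information(theta, X, y)
    covariance = invert_2x2(info)
    std_errors = [math.sqrt(covariance[a][a]) for a in range(2)]
    return theta, covariance, std_errors

# Hypothetical data: each row is (constant, attribute); y is the observed choice
X = [(1.0, -2.0), (1.0, -1.0), (1.0, -1.0), (1.0, 0.0),
     (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (1.0, 2.0)]
y = [0, 0, 1, 0, 1, 0, 1, 1]

theta_hat, covariance, std_errors = fit_logit(X, y)
for name, est, se in zip(("constant", "attribute"), theta_hat, std_errors):
    print(f"{name}: estimate {est:.4f}, se {se:.4f}, t {est / se:.2f}")
```

With only eight observations the asymptotic approximation is rough, so the resulting t ratios should be read as an illustration of the mechanics rather than as reliable inference.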

# References

• Greene, W.H. (2003) Econometric Analysis, Prentice Hall, Upper Saddle River, NJ.
• Gujarati, D.N. (2004) Basic Econometrics, McGraw-Hill, Boston, MA.
• Train, K. (2003) Discrete Choice Methods with Simulation, Cambridge University Press. [5]