The chi-square goodness-of-fit test compares the expected, or theoretical, frequencies of categories from a population distribution to the observed, or actual, frequencies from a distribution to determine whether there is a difference between what was expected and what was observed. For example, airline industry officials might theorize that the ages of airline ticket purchasers are distributed in a particular way. To validate or reject this expected distribution, an actual sample of ticket purchaser ages can be gathered randomly, and the observed results can be compared to the expected results with the chi-square goodness-of-fit test.
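The comparison of observed and expected frequencies can be sketched in a few lines of Python. This is a minimal illustration, not a full test procedure: it uses the standard statistic, the sum over categories of (observed - expected)^2 / expected, and assumes that no population parameters are estimated from the sample, so the degrees of freedom are the number of categories minus one. The function name `chi_square_gof`, the three age groups, and all counts and proportions are hypothetical.

```python
def chi_square_gof(observed, expected_proportions):
    """Return (statistic, df) for a chi-square goodness-of-fit comparison."""
    if len(observed) != len(expected_proportions):
        raise ValueError("observed and expected_proportions must have equal length")
    n = sum(observed)
    # Expected frequency of a category = sample size * hypothesized proportion.
    expected = [n * p for p in expected_proportions]
    # Sum of (observed - expected)^2 / expected over all categories.
    statistic = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    # Assumption: no parameters estimated from the data, so df = k - 1.
    df = len(observed) - 1
    return statistic, df


# Hypothetical sample of 100 ticket purchasers in three age groups.
observed_counts = [30, 50, 20]
# Hypothetical theorized proportions for the same three age groups.
hypothesized = [0.25, 0.50, 0.25]

stat, df = chi_square_gof(observed_counts, hypothesized)
print(f"chi-square = {stat:.3f}, df = {df}")  # chi-square = 2.000, df = 2
```

With these made-up numbers the statistic is 2.0 on 2 degrees of freedom, which is below the tabled critical value of 5.991 at the 0.05 level, so the theorized distribution would not be rejected.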
This test also can be used to determine whether the observed arrivals at teller windows at a bank are Poisson distributed, as might be expected. In the paper industry, manufacturers can use the chi-square goodness-of-fit test to determine whether the demand for paper follows a uniform distribution throughout the year. Karl Pearson introduced the chi-square test in 1900. The chi-square distribution is the distribution of the sum of the squares of k independent standard normal random variables and therefore can never be less than zero; it extends indefinitely in the positive direction.
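A small simulation can make the sum-of-squares description concrete. The sketch below assumes the k variables are independent standard normal variables, which is the usual definition of the chi-square distribution; the helper name `chi_square_draw`, the seed, the choice k = 3, and the number of draws are all arbitrary illustrative choices.

```python
import random


def chi_square_draw(k, rng):
    """One draw: the sum of squares of k independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(42)  # fixed seed so the sketch is repeatable
k = 3                    # hypothetical number of squared variables
draws = [chi_square_draw(k, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
smallest = min(draws)

# Squares are never negative, so no draw can fall below zero.
# The theoretical mean of a chi-square distribution equals k.
print(f"smallest draw = {smallest:.4f}")
print(f"sample mean   = {sample_mean:.3f} (theoretical mean = {k})")
```

Every draw is non-negative because it is a sum of squares, and the sample mean should land close to k, the theoretical mean.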
Actually, the chi-square distributions constitute a family, with each distribution defined by the degrees of freedom (df) associated with it. For small df values, the chi-square distribution is skewed considerably to the right (toward positive values). As the df increase, the chi-square distribution begins to approach the normal curve. The chi-square goodness-of-fit test is used to analyze the distribution of frequencies for categories of one variable, such as age or number of bank arrivals, to determine whether the distribution of these frequencies is the same as some hypothesized or expected distribution.
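The shape claims in the paragraph above (strong right skew for small df, approach to the normal curve as df grows) can be checked numerically. The sketch below relies on the standard result that the skewness of a chi-square distribution is the square root of 8/df, whereas a normal curve has skewness 0. The function name and the particular df values are illustrative choices only.

```python
import math


def chi_square_skewness(df):
    """Theoretical skewness of a chi-square distribution: sqrt(8 / df)."""
    if df <= 0:
        raise ValueError("df must be positive")
    return math.sqrt(8.0 / df)


# Illustrative df values; skewness shrinks toward 0 (the normal curve's value).
skews = {df: chi_square_skewness(df) for df in (1, 2, 5, 10, 30, 100)}
for df, s in skews.items():
    print(f"df = {df:>3}: skewness = {s:.3f}")
```

The skewness falls steadily as df increases, which is the sense in which the distribution approaches the symmetric normal curve.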
However, the goodness-of-fit test cannot be used to analyze two variables simultaneously. A different chi-square test, the chi-square test of independence, can be used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent. Many times this type of analysis is desirable. The chi-square test of independence can be used to analyze any level of data measurement, but it is particularly useful in analyzing nominal data. Suppose a business researcher is interested in determining whether geographic region is independent of type of financial investment.
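A minimal sketch of the independence calculation follows. It uses the standard approach in which the expected frequency of each cell is (row total x column total) / grand total, and the degrees of freedom are (rows - 1) x (columns - 1). The function name `chi_square_independence`, the two regions, the three investment types, and every count in the table are hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a chi-square test of independence.

    `table` is a list of rows of observed counts (a contingency table).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected count if the two variables were independent.
            fe = row_totals[i] * col_totals[j] / n
            statistic += (fo - fe) ** 2 / fe

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows = two geographic regions,
# columns = three types of financial investment.
counts = [
    [10, 20, 30],
    [30, 20, 10],
]

stat, df = chi_square_independence(counts)
print(f"chi-square = {stat:.3f}, df = {df}")  # chi-square = 20.000, df = 2
```

For this made-up table the statistic is 20.0 on 2 degrees of freedom, well above the tabled critical value of 5.991 at the 0.05 level, so independence of region and investment type would be rejected.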