Joel H. Levine, Extended Correlation: Not Necessarily Quadratic or Quantitative, Sociological Methods & Research 2005 34: 31-75.

February 7, 2010

What is the correlation between two variables? Traditional answers offer summary assessments such as Pearson’s r and regression coefficients. But new computing techniques make it possible to construct conceptually simple hypotheses that describe the full joint distribution of two variables, making it possible to “mine” the correlation for information that was previously unused. This article begins with evidence of systematic anomalies in the empirical joint distribution of height-weight data and follows with a hypothesis that explains these anomalies in terms of a theoretical joint distribution relative to a linear equation. The hypothesis has serious consequences because even in traditional examples, while it offers an improved fit to the data, its estimates of the linear center do not correspond to traditional least squares estimates of the linear relation for the same two variables.

Key Words: correlation • association • regression • least squares • categorical data
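
As a small illustration of the traditional summaries the abstract mentions (Pearson's r and least squares regression coefficients), here is a minimal Python sketch. It is not the article's method, and it does not reproduce the article's data: the height-weight values are synthetic, and the means, slope, and noise level used to generate them are made-up assumptions. The helper names `summarize`, `b_yx`, and `b_xy` are hypothetical. The sketch only shows that least squares gives two different lines depending on which variable is treated as the response, and that the two slopes multiply to r squared.

```python
import random

def summarize(xs, ys):
    """Return (r, b_yx, b_xy): Pearson's r, the least squares slope of
    y regressed on x, and the least squares slope of x regressed on y."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / (sxx * syy) ** 0.5
    b_yx = sxy / sxx  # slope when predicting y from x
    b_xy = sxy / syy  # slope when predicting x from y
    return r, b_yx, b_xy

# Synthetic height (cm) and weight (kg) data; all parameters are
# illustrative assumptions, not values from the article.
random.seed(0)
heights = [random.gauss(170.0, 10.0) for _ in range(500)]
weights = [0.9 * h - 85.0 + random.gauss(0.0, 8.0) for h in heights]

r, b_yx, b_xy = summarize(heights, weights)

print(f"Pearson r            : {r:.3f}")
print(f"slope, weight on h   : {b_yx:.3f}")
print(f"slope, height on w   : {b_xy:.3f}")
# Expressed in the same coordinates, the second line has slope 1 / b_xy,
# which equals b_yx only when |r| = 1.
print(f"1 / b_xy             : {1.0 / b_xy:.3f}")
print(f"b_yx * b_xy vs r^2   : {b_yx * b_xy:.6f} vs {r * r:.6f}")
```

Each of these numbers compresses the joint distribution into a single value, which is the kind of summary the abstract contrasts with a hypothesis about the full joint distribution.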