Michael Greenacre and Rafael Pardo, Subset Correspondence Analysis: Visualizing Relationships Among a Selected Set of Response Categories From a Questionnaire Survey, Sociological Methods & Research 2006 35: 193-218.

February 7, 2010

This study shows how correspondence analysis may be applied to a subset of response categories from a questionnaire survey (e.g., the subset of undecided responses or the subset of responses for a particular category across several questions). The idea is to maintain the original relative frequencies of the categories and not reexpress them relative to totals within the subset, as would normally be done in a regular correspondence analysis of the subset. Furthermore, the masses and chi-square distances assigned to the subset of categories are the same as those in the correspondence analysis of the whole data set, which leads to a decomposition of total variance into parts if the whole data set is subdivided into disjoint subsets. This variant of the method, called subset correspondence analysis, is illustrated on data from the International Social Survey Programme’s Family and Changing Gender Roles survey.

Key Words: categorical data • correspondence analysis • missing data • principal component analysis • questionnaire survey • singular value decomposition
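
As a rough illustration of the idea in the abstract above, the following NumPy sketch computes the standardized residuals from the whole table, so that masses and chi-square distances come from the full data set, and only then selects the columns of interest before taking the singular value decomposition. The function name subset_ca, the toy table, and the column groupings are assumptions made for illustration; they are not taken from the article or from the ISSP data.

```python
import numpy as np

def subset_ca(table, cols):
    """Sketch of subset correspondence analysis on the columns `cols`.

    Masses are computed from the full table and are NOT re-expressed
    relative to totals within the subset. Assumes every row and column
    of the table has a positive total.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses (full table)
    c = P.sum(axis=0)                     # column masses (full table)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals, full table
    S_sub = S[:, cols]                    # keep only the selected categories
    U, sv, Vt = np.linalg.svd(S_sub, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]            # row principal coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c[cols])[:, None]   # column principal coordinates
    return sv ** 2, row_coords, col_coords  # principal inertias and coordinates

# Made-up 3 x 4 contingency table, for illustration only.
table = np.array([[20, 10, 5, 15],
                  [8, 12, 9, 6],
                  [4, 7, 14, 10]])

inertia_a, rows_a, cols_a = subset_ca(table, [0, 1])
inertia_b, rows_b, cols_b = subset_ca(table, [2, 3])
inertia_all, _, _ = subset_ca(table, [0, 1, 2, 3])

# Disjoint subsets split the total inertia into parts.
print(np.isclose(inertia_a.sum() + inertia_b.sum(), inertia_all.sum()))
```

Because the residuals are never renormalized within a subset, the inertias of disjoint column subsets add up to the inertia of the whole table, which is the decomposition of total variance the abstract refers to.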

Joel H. Levine, Extended Correlation: Not Necessarily Quadratic or Quantitative, Sociological Methods & Research 2005 34: 31-75.

February 7, 2010

What is the correlation between two variables? Traditional answers offer summary assessments such as Pearson’s r and regression coefficients. But new computing techniques make it possible to construct conceptually simple hypotheses that describe the full joint distribution of two variables, making it possible to “mine” the correlation for information that was previously unused. This article begins with evidence of systematic anomalies in the empirical joint distribution of height-weight data and follows with a hypothesis that explains these anomalies in terms of a theoretical joint distribution relative to a linear equation. The hypothesis has serious consequences because even in traditional examples, while it offers an improved fit to the data, its estimates of the linear center do not correspond to traditional least squares estimates of the linear relation for the same two variables.

Key Words: correlation • association • regression • least squares • categorical data