Factorial Methods: Core Concepts 1

This article is Part 2 in a 3-Part series called Factorial Analysis Intro.

Introduction

The first post in this series gave us a brief introduction to the different types of factorial analyses related to PCA, with a snapshot of where each fits. In this post, we are going to familiarise ourselves with the general structure and composition of a factorial analysis, using PCA as a general framework.

Throughout, I will use the term Analysis to refer generically to whichever factorial method a given concept applies to. This is important because, as we previously discovered, different analyses have different input data requirements. For convenience and brevity, the terms “Principal Component”, “Component” or any other description of the products of an Analysis are simply referred to as PC.

By way of reminder, the general principle behind this series of posts is that I aim to obtain a firm grasp of the concepts behind a technique before I use it; otherwise I cannot use it properly or fully interpret its results. This guiding principle is embodied by the following proverb (paraphrased):

“Without vision the people perish, but blessed is the one who heeds wisdom’s instruction1” (Proverbs 29:18)

1 or “cast off restraint”

With all this in mind, let us begin exploration :smile:

Analysis computation (PCA)

This section was one that I found necessary to tackle before fleshing out the other ideas in this post, because I realised that I needed a good conceptual grasp of how Analyses are calculated. Here, I aim to use PCA as a lens to understand the inner workings of factorial analyses and to allay potential confusion about their interpretation.

During the background reading that I did for the previous post, I discovered that PCA can be computed via a number of means. I identified four different methods by looking through the various sections of the PCA Wikipedia page and by checking out this answer at stackexchange, which in turn cited this paper. In addition, three of these methods are summarised here. Ignoring the fact that each method potentially manifests in “classical” or “kernel” forms, the methods are:

  • Eigenvalue decomposition (EVD)
  • Singular value decomposition (SVD)
  • Non-linear iterative partial least squares (NIPALS)
  • The Power method (power iteration)

The common theme: decomposition of the correlation, covariance or data matrix by some form of matrix decomposition to obtain pairs of entities that describe the principal components, such that each principal component is described by one value and one vector. As far as I can determine:

  • EVD creates the eigenvalues and eigenvectors used for PCA from a square correlation (or covariance) matrix.
  • SVD seems to be an extension of EVD to rectangular matrices, producing singular values and singular vectors that are related to the outputs of EVD.

Presently, I am not sure whether the NIPALS and Power methods produce eigenvalues and eigenvectors. However, NIPALS seems to be able to handle missing values (as stated here, and discussed here). Therefore, it seems to me that the computation of a PCA, regardless of the method employed, can be generally described as producing pairs of the following elements:

  • matrix decomposition value: represents the proportion of the variability (variance) explained by a given principal component.
  • matrix decomposition vector: represents the relative importance of each variable in a given principal component.

However, for simplicity I will use eigenvalue and eigenvector as the general terms.

Incidentally, this might be simplistic, but the way I now think of the pairing is that the eigenvalue is the scalar by which the decomposed matrix stretches its eigenvector: if $A$ is the matrix being decomposed, then $A\mathbf{v} = \lambda \mathbf{v}$ for each eigenvector $\mathbf{v}$ and its eigenvalue $\lambda$. I mention it because this made some sense to me as a neat rationalisation for why eigenvalues exist alongside eigenvectors, although I am sure there is more to the full mathematical picture.
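To make the EVD and SVD routes concrete, here is a minimal sketch using plain numpy on a toy data matrix (the data and variable names are mine, invented purely for illustration; it is not taken from any particular PCA package):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # toy data matrix: n = 50 observations, p = 4 variables
Xc = X - X.mean(axis=0)               # centre each variable (column)

# --- Route 1: EVD of the square (p x p) covariance matrix ---
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]     # order PCs by decreasing explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# --- Route 2: SVD of the rectangular (n x p) centred data matrix ---
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals_from_svd = s**2 / (Xc.shape[0] - 1)   # singular values map onto the eigenvalues

# The two routes agree (eigenvectors only match up to an arbitrary sign flip)
print(np.allclose(eigvals, eigvals_from_svd))        # expected: True
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))    # expected: True

# Scores: project the centred observations onto the eigenvectors (one column per PC)
scores = Xc @ eigvecs
```

The pairing is visible directly: each column of `eigvecs` (a vector) comes with one entry of `eigvals` (a value), and together they describe one PC.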

Analysis terminology

Key references used:

I first got the idea of checking out the terminology here; however, my subsequent search for a clear and standardised interpretation of these elements on the interwebs (sic) was more confusing than I had hoped. Consequently, I reverted to my academic bias and sought out a small number of authoritative sources that communicate the concepts I need to acquire with consistent semantics.

To this end, I decided to read through the work of Abdi and Williams (2010), Risvik (2007) and Wold et al. (1987) as a first step to familiarising myself with the essential characteristics of PCA and other factorial methods.

Again, the overall aim is to distill these concepts into their most basic definitions as a reference point for further exploration.

1) The basics

a. data matrix:

representation of the input dataset as a matrix object (n × p) consisting of:

  • n rows of observations
  • p columns of variables

b. inertia:

The inertia of a variable represents the amount of variance that is explained by that variable.

  • variables in question are the PCs
  • inertia embodied by the eigenvalue
  • total inertia = inertia of the whole dataset
  • conceptually: inertia = eigenvalue = sum of squares of all observation scores
  • also represented as a % of the total inertia

Basically, inertia measures the amount of information contained within a variable (as far as I can tell). Further, as noted here, if a few PCs account for most of the inertia in the dataset:

  • there is a lot of correlation between the input variables
  • i.e. there is a lot of redundant information
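To make inertia concrete, here is a small sketch in the same toy numpy setting as above (again my own illustrative example rather than any package's API), expressing each PC's inertia as a percentage of the total:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

total_inertia = eigvals.sum()                  # inertia of the whole dataset
pct_inertia = 100 * eigvals / total_inertia    # % of total inertia per PC
print(pct_inertia)                             # a steep drop-off would signal redundant (correlated) variables

# With the covariance (divide-by-(n-1)) convention used here, the
# "eigenvalue = sum of squared scores" identity picks up a factor of (n - 1):
scores = Xc @ eigvecs
print(np.allclose((scores**2).sum(axis=0), eigvals * (Xc.shape[0] - 1)))   # expected: True
```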

c. centre of gravity:

Represents the centre of gravity (mean) of the rows:

  • of the original data matrix
  • synonyms: barycentre or centroid
  • a vector of length p
  • can be used to represent group centres (e.g. here)?
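As a tiny illustration (toy numpy again, my own example), the centre of gravity is simply the vector of column means of the data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))      # n x p data matrix

g = X.mean(axis=0)                # centre of gravity / barycentre / centroid
print(g.shape)                    # (4,) — one mean per variable, i.e. a vector of length p
```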

d. loadings:

  1. loading:
    • the correlation between a given variable and a particular PC
    • represents how much of a variable was used to create that PC
  2. loading matrix:
    • a matrix (p × p) of the loadings that represents the construction of the PCs by the Analysis
    • rows (loading vectors): represent computed PCs (≤ p)
    • columns: represent the p input variable loadings for each PC

e. Scores

A vector that represents the transformation of the individual observations (rows) so that they can be plotted in the PC Cartesian space:

  • values of the PC variables for each row
  • Vector contains n elements (one element per observation)
  • 1 vector of scores per PC
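Here is a hedged sketch of how loadings and scores could be computed in the same toy numpy setting, following the "loading = correlation between a variable and a PC" definition used above (note that some implementations instead call the raw eigenvector coefficients the loadings):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                  # n = 50 observations, p = 4 variables
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Scores: one vector of n values per PC (stored here as one column per PC)
scores = Xc @ eigvecs                         # shape (n, p)

# Loading matrix: correlation between each input variable and each PC
p = X.shape[1]
loadings = np.array([[np.corrcoef(Xc[:, j], scores[:, l])[0, 1]
                      for j in range(p)]      # columns: the p input variables
                     for l in range(p)])      # rows: the computed PCs

# For active variables, the squared loadings of each variable sum to 1 over all PCs
print(np.allclose((loadings**2).sum(axis=0), 1.0))    # expected: True
```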

2) Contributions:

This section deals specifically with quantities that relate one aspect of the Analysis to another. Therefore, in this section the terms importance, contribution, influence and impact are treated as conceptually interchangeable. Importantly, these contributions are central to the interpretation of an Analysis.

a. variable contribution (loadings²)

  • the contribution (%) of a given variable to a particular PC
    • i.e. the share of that PC’s inertia attributable to the variable
    • derived from loadings²

b. observation contributions:

  • The importance of a specific observation $i$ to a particular PC ($l$)
  • calculated using:
    • the PC’s eigenvalue ($\lambda_l$)
    • the observation’s score, which we’ll call $o_{i,l}$ (observation $i$ for component $l$)

Note: $\lambda_l = \sum_{i=1}^{n} o_{i,l}^2$

c. PC contributions (cos²):

  • The importance of a specific PC to a particular observation
    • measured by the squared cosine (cos²)
    • i.e. correlation between an observation and a PC
  • indicates which PCs are important to interpreting an observation
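To tie the three kinds of contribution together, here is a hedged numpy sketch in the same toy setting (my own variable names; with correlation-style loadings, the variable contribution works out to loadings² divided by the PC's eigenvalue, which equals the squared eigenvector coefficient used below):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs                                     # o_{i,l}: score of observation i on PC l

# a. variable contributions: squared eigenvector coefficients; sum to 1 per PC
var_contrib = eigvecs**2
print(np.allclose(var_contrib.sum(axis=0), 1.0))          # expected: True

# b. observation contributions: each observation's share of a PC's inertia,
#    o_{i,l}^2 / sum_i o_{i,l}^2 (the denominator is the lambda_l of the note above,
#    up to the divide-by-(n-1) convention); sum to 1 per PC
obs_contrib = scores**2 / (scores**2).sum(axis=0)
print(np.allclose(obs_contrib.sum(axis=0), 1.0))          # expected: True

# c. cos^2: how much of an observation's squared distance from the centroid
#    each PC captures; sums to 1 over all PCs for each observation
cos2 = scores**2 / (scores**2).sum(axis=1, keepdims=True)
print(np.allclose(cos2.sum(axis=1), 1.0))                 # expected: True
```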

3) Active vs. Passive

The calculation and interpretation of a given Analysis potentially involve two types of inputs: active and passive. Therefore, it is useful to have a clear idea of what these constituents are.

Related to this, with regard to variables, I contrast the terms compatible and incompatible. These indicate whether a variable can be used in the computation of the PCs by the Analysis in question.

  • compatible: e.g. quantitative variables as input to PCA
  • incompatible: e.g. PCA input variables cannot be qualitative

a. active inputs:

Inputs used to calculate the PCs for a given Analysis:

  • Active variables:
    • Input variables compatible for computation of the PCs
    • $\sum loadings^2 = 1$
  • Active individuals:
    • Input observation rows for computation of the PCs
    • each row contains measurements of the p active variables

b. supplementary inputs:

Inputs included after the calculation of PCs by the Analysis. Also referred to as passive:

  • passive observations
    • not used to compute the PCs
    • each row contains measurements of the same p active variables
    • scores are calculated using the loading matrix derived from the active variables (see the sketch at the end of this section)
    • Machine learning note: passive observations must be preprocessed using the data centroids (means) and/or standard deviations used to preprocess the active variables (as explained in week 2 of this course).
  • passive variables
    • Additional variables measured for active (and possibly passive) observations
    • General aim: add further insight to the Analysis
      • add new insight to active variables
      • find where passive variables fit in the active analysis context
    • a) compatible variables:
      • Measures correlation between active PCs and variables
      • Loadings computed between passive variable(s) and active PCs
        • i.e. correlation coefficients
        • enable plotting of these variables
        • Supplementary $\sum loadings^2 \ne 1$
    • b) incompatible variables:
      • some analyses have input variables restrictions:
        • PCA: quantitative ONLY
        • MCA: qualitative ONLY
      • indirect integration:
        • cannot be computed directly by the Analysis
        • but can be superimposed onto the analysis, e.g. plotting:
          • colour by variable
          • shape by variable
          • group by variable (e.g. circle scores by group as shown here)
        • consider versatile Analysis alternative:
          • if seeking to superimpose many passive variables
          • options for mixed data: FAMD or MFA
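To close off the active/supplementary distinction, here is a hedged numpy sketch of projecting supplementary (passive) observations and a supplementary quantitative variable into a PCA computed from active inputs only (toy data and names of my own; dedicated packages handle this for you):

```python
import numpy as np

rng = np.random.default_rng(0)
X_active = rng.normal(size=(50, 4))       # active observations x active variables
X_passive = rng.normal(size=(10, 4))      # passive observations, same 4 active variables
y_passive = rng.normal(size=50)           # a passive quantitative variable, measured on the active rows

# PCA computed from the ACTIVE inputs only
centroid = X_active.mean(axis=0)
Xc = X_active - centroid
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
active_scores = Xc @ eigvecs

# Passive observations: centred with the ACTIVE centroid, then projected with the
# ACTIVE loading matrix (they play no part in computing the PCs themselves)
passive_scores = (X_passive - centroid) @ eigvecs

# Passive quantitative variable: correlate it with each active PC to obtain
# supplementary loadings; their squares need not sum to 1
supp_loadings = np.array([np.corrcoef(y_passive, active_scores[:, l])[0, 1]
                          for l in range(active_scores.shape[1])])
print((supp_loadings**2).sum())           # typically < 1 for a supplementary variable
```

Note how the passive observations are centred with the active centroid, echoing the machine-learning note above, and how the supplementary variable's squared loadings need not sum to 1.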

Conclusion

Previously, we were introduced to the main basic types of factorial analysis. In this post, we have delved a little into how these methods work, using PCA as a general framework. Personally, I found this process instructive, and I have gained insight and confidence that I feel I can apply to correctly interpreting factorial methods. In the next post we will cover a few more areas before diving into examples :smile:.

This article is Part 2 in a 3-Part series called Factorial Analysis Intro.

Written on April 8, 2017