Beautiful Tables for Exploratory Factor Analysis in R

The psych package for R provides great utilities for exploratory factor analysis (EFA). However, the way psych displays the results does not take advantage of visual cues to make grasping the factor solutions easier, nor is it straightforward to display the results in a way that can be shared easily with others. Several solutions to the problem have been proposed, such as the LaTeX-based psych::df2latex() or an implementation in the sjPlot package sjPlot::fa_tab(). Because these solutions have some drawbacks, here’s another take on the issue.

The solution I present here is consistent with the tidyverse and builds on the highly flexible and powerful gt package to provide beautiful and highly customizable tables that can be used in R Markdown approaches.

Factor Analysis Example

We need several packages to get started:

library(psych)
library(tidyverse)
library(gt)

As an example we use the IPIP Big Five Inventory from the psych package. First we run a factor analysis; I will not go into details here as these are discussed in-depth in the psych package and EFA best practice papers.

res <- fa(psychTools::bfi[1:25],5)

The standard way to display the EFA results are not very clean or legible inside a markdown output.

print(res)
## Factor Analysis using method =  minres
## Call: fa(r = psychTools::bfi[1:25], nfactors = 5)
## Standardized loadings (pattern matrix) based upon correlation matrix
##      MR2   MR1   MR3   MR5   MR4   h2   u2 com
## A1  0.21  0.17  0.07 -0.41 -0.06 0.19 0.81 2.0
## A2 -0.02  0.00  0.08  0.64  0.03 0.45 0.55 1.0
## A3 -0.03  0.12  0.02  0.66  0.03 0.52 0.48 1.1
## A4 -0.06  0.06  0.19  0.43 -0.15 0.28 0.72 1.7
## A5 -0.11  0.23  0.01  0.53  0.04 0.46 0.54 1.5
## C1  0.07 -0.03  0.55 -0.02  0.15 0.33 0.67 1.2
## C2  0.15 -0.09  0.67  0.08  0.04 0.45 0.55 1.2
## C3  0.03 -0.06  0.57  0.09 -0.07 0.32 0.68 1.1
## C4  0.17  0.00 -0.61  0.04 -0.05 0.45 0.55 1.2
## C5  0.19 -0.14 -0.55  0.02  0.09 0.43 0.57 1.4
## E1 -0.06 -0.56  0.11 -0.08 -0.10 0.35 0.65 1.2
## E2  0.10 -0.68 -0.02 -0.05 -0.06 0.54 0.46 1.1
## E3  0.08  0.42  0.00  0.25  0.28 0.44 0.56 2.6
## E4  0.01  0.59  0.02  0.29 -0.08 0.53 0.47 1.5
## E5  0.15  0.42  0.27  0.05  0.21 0.40 0.60 2.6
## N1  0.81  0.10  0.00 -0.11 -0.05 0.65 0.35 1.1
## N2  0.78  0.04  0.01 -0.09  0.01 0.60 0.40 1.0
## N3  0.71 -0.10 -0.04  0.08  0.02 0.55 0.45 1.1
## N4  0.47 -0.39 -0.14  0.09  0.08 0.49 0.51 2.3
## N5  0.49 -0.20  0.00  0.21 -0.15 0.35 0.65 2.0
## O1  0.02  0.10  0.07  0.02  0.51 0.31 0.69 1.1
## O2  0.19  0.06 -0.08  0.16 -0.46 0.26 0.74 1.7
## O3  0.03  0.15  0.02  0.08  0.61 0.46 0.54 1.2
## O4  0.13 -0.32 -0.02  0.17  0.37 0.25 0.75 2.7
## O5  0.13  0.10 -0.03  0.04 -0.54 0.30 0.70 1.2
## 
##                        MR2  MR1  MR3  MR5  MR4
## SS loadings           2.57 2.20 2.03 1.99 1.59
## Proportion Var        0.10 0.09 0.08 0.08 0.06
## Cumulative Var        0.10 0.19 0.27 0.35 0.41
## Proportion Explained  0.25 0.21 0.20 0.19 0.15
## Cumulative Proportion 0.25 0.46 0.66 0.85 1.00
## 
##  With factor correlations of 
##       MR2   MR1   MR3   MR5   MR4
## MR2  1.00 -0.21 -0.19 -0.04 -0.01
## MR1 -0.21  1.00  0.23  0.33  0.17
## MR3 -0.19  0.23  1.00  0.20  0.19
## MR5 -0.04  0.33  0.20  1.00  0.19
## MR4 -0.01  0.17  0.19  0.19  1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 5 factors are sufficient.
## 
## The degrees of freedom for the null model are  300  and the objective function was  7.23 with Chi Square of  20163.79
## The degrees of freedom for the model are 185  and the objective function was  0.65 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.04 
## 
## The harmonic number of observations is  2762 with the empirical chi square  1392.16  with prob <  5.6e-184 
## The total number of observations was  2800  with Likelihood Chi Square =  1808.94  with prob <  4.3e-264 
## 
## Tucker Lewis Index of factoring reliability =  0.867
## RMSEA index =  0.056  and the 90 % confidence intervals are  0.054 0.058
## BIC =  340.53
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy             
##                                                    MR2  MR1  MR3  MR5  MR4
## Correlation of (regression) scores with factors   0.92 0.89 0.88 0.88 0.84
## Multiple R square of scores with factors          0.85 0.79 0.77 0.77 0.71
## Minimum correlation of possible factor scores     0.70 0.59 0.54 0.54 0.42

Using gt() to build a beautiful EFA results table

Below is the fa_table function that takes the factor analysis object from psych::fa() as its input and returns a clean table. Some code elements were adapted from sjPlot::fa_tab() and a blogpost by Anthony Schmidt.

fa_table <- function(x, varlabels = NULL, title = "Factor analysis results", diffuse = .10, small = .30, cross = .20, sort = TRUE) {
  #get sorted loadings
  require(dplyr)
  require(purrr)
  require(tibble)
  require(gt)
  if(sort == TRUE) {
    x <- psych::fa.sort(x)
  }
  if(!is.null(varlabels)) {
    if(length(varlabels) != nrow(x$loadings)) { warning("Number of variable labels and number of variables are unequal. Check your input!",
                                                        call. = FALSE) }
    if(sort == TRUE) {
      varlabels <- varlabels[x$order]
      }
  }
  if(is.null(varlabels)) {varlabels <- rownames(x$loadings)}

  loadings <- data.frame(unclass(x$loadings))
  
  #make nice names
  factornamer <- function(nfactors) {
    paste0("Factor_", 1:nfactors)}
  
  nfactors <- ncol(loadings)
  fnames <- factornamer(nfactors)
  names(loadings) <- fnames
  
  # prepare locations
  factorindex <- apply(loadings, 1, function(x) which.max(abs(x)))
  
  # adapted from from sjplot: getremovableitems
  getRemovableItems <- function(dataframe, fctr.load.tlrn = diffuse) {
    # clear vector
    removers <- vector(length = nrow(dataframe))
    # iterate each row of the data frame. each row represents
    # one item with its factor loadings
    for (i in seq_along(removers)) {
      # get factor loadings for each item
      rowval <- as.numeric(abs(dataframe[i, ]))
      # retrieve highest loading
      maxload <- max(rowval)
      # retrieve 2. highest loading
      max2load <- sort(rowval, TRUE)[2]
      # check difference between both
      if (abs(maxload - max2load) < fctr.load.tlrn) {
        # if difference is below the tolerance,
        # remeber row-ID so we can remove that items
        # for further PCA with updated data frame
        removers[i] <- TRUE
      }
    }
    # return a vector with index numbers indicating which items
    # have unclear loadings
    return(removers)
  }
 if(nfactors > 1) {
   removable <- getRemovableItems(loadings)
   cross_loadings <- purrr::map2(fnames, seq_along(fnames), function(f, i) {
     (abs(loadings[,f] > cross)) & (factorindex != i) 
   })
 }

  small_loadings <- purrr::map(fnames, function(f) {
    abs(loadings[,f]) < small
  })
  
  ind_table <- dplyr::tibble(varlabels, loadings) %>%
    dplyr::rename(Indicator = varlabels) %>% 
    dplyr::mutate(Communality = x$communality, Uniqueness = x$uniquenesses, Complexity = x$complexity) %>% 
    dplyr::mutate(across(starts_with("Factor"), round, 3))  %>%
    dplyr::mutate(across(c(Communality, Uniqueness, Complexity), round, 2))
                    
  
  ind_table <- ind_table %>% gt(rowname_col = "Indicator") %>% tab_header(title = title)
  # mark small loadiongs
  for(f in seq_along(fnames)) {
    ind_table <- ind_table %>%  tab_style(style = cell_text(color = "#D3D3D3", style = "italic"),
                             locations = cells_body(columns = fnames[f], rows = small_loadings[[f]]))
  }
  # mark cross loadings
  
  if (nfactors > 1) {
    for (f in seq_along(fnames)) {
      ind_table <-
        ind_table %>%  tab_style(
          style = cell_text(style = "italic"),
          locations = cells_body(columns = fnames[f], rows = cross_loadings[[f]])
        )
    }
    # mark non-assignable indicators
    ind_table <-
      ind_table %>%  tab_style(style = cell_fill(color = "#D93B3B"),
                               locations = cells_body(rows = removable))
  }
  
  # adapted from https://www.anthonyschmidt.co/post/2020-09-27-efa-tables-in-r/
  Vaccounted <- x[["Vaccounted"]]
  colnames(Vaccounted) <- fnames 
  if (nfactors > 1) {
  Phi <- x[["Phi"]]
  rownames(Phi) <- fnames
  colnames(Phi) <- fnames
  f_table <- rbind(Vaccounted, Phi) %>%
    as.data.frame() %>% 
    rownames_to_column("Property") %>%
    mutate(across(where(is.numeric), round, 3)) %>%
    gt() %>% tab_header(title = "Eigenvalues, Variance Explained, and Factor Correlations for Rotated Factor Solution")
  }
  else if(nfactors == 1) {
    f_table <- rbind(Vaccounted) %>%
      as.data.frame() %>% 
      rownames_to_column("Property") %>%
      mutate(across(where(is.numeric), round, 3)) %>%
      gt() %>% tab_header(title = "Eigenvalues, Variance Explained, and Factor Correlations for Rotated Factor Solution")
  }

  return(list("ind_table" = ind_table, "f_table" = f_table))
  
}

We run the function on the results:

tables <- fa_table(res)
tables$ind_table
Factor analysis results
Factor_1Factor_2Factor_3Factor_4Factor_5CommunalityUniquenessComplexity
N10.8150.1030.004-0.111-0.0470.650.351.08
N20.7770.0400.011-0.0940.0150.600.401.04
N30.706-0.100-0.0350.0790.0230.550.451.07
N50.486-0.202-0.0040.207-0.1500.350.651.96
N40.474-0.386-0.1350.0950.0800.490.512.27
E20.099-0.676-0.016-0.048-0.0580.540.461.07
E40.0140.5910.0240.287-0.0770.530.471.49
E1-0.059-0.5570.106-0.083-0.1030.350.651.21
E50.1520.4210.2710.0520.2060.400.602.60
E30.0830.418-0.0010.2460.2830.440.562.55
C20.149-0.0850.6660.0810.0390.450.551.17
C40.1740.002-0.6140.040-0.0480.450.551.18
C30.034-0.0610.5670.092-0.0680.320.681.11
C50.189-0.142-0.5530.0180.0920.430.571.44
C10.069-0.0270.546-0.0230.1480.330.671.19
A3-0.0290.1160.0250.6600.0310.520.481.07
A2-0.023-0.0020.0770.6400.0320.450.551.04
A5-0.1120.2330.0060.5320.0440.460.541.49
A4-0.0570.0640.1930.433-0.1480.280.721.74
A10.2130.1660.067-0.414-0.0580.190.811.97
O30.0310.1520.0170.0830.6090.460.541.17
O50.1320.098-0.0250.043-0.5420.300.701.21
O10.0180.1030.0730.0150.5080.310.691.13
O20.1950.057-0.0780.163-0.4560.260.741.75
O40.126-0.323-0.0240.1740.3710.250.752.69
tables$f_table
Eigenvalues, Variance Explained, and Factor Correlations for Rotated Factor Solution
PropertyFactor_1Factor_2Factor_3Factor_4Factor_5
SS loadings2.5702.1992.0291.9851.586
Proportion Var0.1030.0880.0810.0790.063
Cumulative Var0.1030.1910.2720.3510.415
Proportion Explained0.2480.2120.1960.1910.153
Cumulative Proportion0.2480.4600.6560.8471.000
Factor_11.000-0.213-0.187-0.038-0.011
Factor_2-0.2131.0000.2300.3290.167
Factor_3-0.1870.2301.0000.2030.195
Factor_4-0.0380.3290.2031.0000.193
Factor_5-0.0110.1670.1950.1931.000

The function returns two tables: ind_table for the factor (pattern) loadings, and f_table for aspects at the factor-level. The former table displays the relevant information for each indicator; the use of italics and colors makes it easy to grasp the structure of factor (pattern) loadings. Highlighted in red, we see that N4 and 04 indicators are not represented well in the EFA because they load diffusely over multiple factors.

You can supply custom values for the conditional formatting of the table:

  • diffuse specifies the minimum difference an indicator needs to have between factor loadings in order to indicate a clear loading on just one factor, and not diffuse loadings over multiple factors. Diffuse indicators are labelled as red in the table.
  • small specifies when a loading is considered small; these loadings are printed in light gray.
  • cross specifies when a loading is considered to be a cross-loading; these loadings are printed in oblique black.

The fa_table() function is a work in progress and the latest version can be found at: https://github.com/franciscowilhelm/r-collection/blob/master/fa_table.R

Import it to R via: source("https://raw.githubusercontent.com/franciscowilhelm/r-collection/master/fa_table.R")

Francisco Wilhelm
Francisco Wilhelm
Career Researcher

I study the psychological processes underlying career development.