Factor Analysis

Description

Factor is a program developed to fit the Exploratory Factor Analysis model. Below we describe the methods used.

Missing values

Missing values in the dataset are allowed. Multiple imputation in exploratory factor analysis is implemented following the proposal of Lorenzo-Seva & Van Ginkel (2016). Missing values must be identified using a numerical code.

Univariate and multivariate descriptives of variables:

Univariate mean, variance, skewness, and kurtosis.
Multivariate skewness and kurtosis (Mardia, 1970).
Bar charts for ordinal variables.
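
As an illustration of the multivariate descriptives, the following is a minimal Python sketch of Mardia's (1970) coefficients. It assumes NumPy is available and uses the maximum-likelihood (divide by n) covariance matrix; the function name `mardia` and the toy data are hypothetical and are not taken from the program.

```python
import numpy as np

def mardia(X):
    """Return Mardia's multivariate skewness (b1p) and kurtosis (b2p)
    for an (n, p) data array."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)               # centre each variable
    S = (Xc.T @ Xc) / n                   # biased (ML) covariance matrix
    D = Xc @ np.linalg.inv(S) @ Xc.T      # D[i, j] = (x_i - m)' S^-1 (x_j - m)
    b1p = (D ** 3).sum() / n ** 2         # multivariate skewness
    b2p = (np.diag(D) ** 2).sum() / n     # multivariate kurtosis
    return b1p, b2p

# Hypothetical toy data: four points placed symmetrically around the origin.
mardia_demo_data = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b1p_demo, b2p_demo = mardia(mardia_demo_data)
```

For data that are symmetric around their mean, as in this toy example, the skewness coefficient is zero. Under multivariate normality the kurtosis coefficient is expected to be close to p(p + 2), where p is the number of variables.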

Dispersion matrices:

User-defined matrix.
Covariance matrix.
Pearson correlation matrix.
Tetrachoric and Polychoric correlation matrix: In the first stage, the thresholds are estimated from the marginal counts of the contingency table and taken as fixed and known (e.g. Olsson, 1979a, 1979b). In the second stage, the point estimate of the polychoric correlation coefficient (PCC) is obtained on the basis of the fixed thresholds by using a unified Bayes modal estimation (MAP) approach. The second stage is carried out in two steps: first, the correlation coefficient is estimated using an ML/non-informative-prior MAP with 40 nodes; second, the estimate obtained in step one is used to obtain the final estimate based on MAP with a strong prior (again with 40 nodes). In bootstrap analysis, the sample estimate is used as the first estimate for each bootstrap sample. The advantage of this approach is that it is non-iterative and therefore free from non-convergence problems. If the resulting matrix is not positive definite, it is corrected with a smoothing algorithm (Devlin, Gnanadesikan, & Kettenring, 1975, 1981).
In order to allow robust factor analysis, the asymptotic variance-covariance matrix for the correlation coefficients is computed based on (a) analytical estimates, or (b) bootstrap sampling.
When bootstrap sampling is requested by the user, confidence intervals for the correlation indices between variables are reported. Confidence intervals can be 90% or 95%, based on the direct percentile or the bias-corrected percentile method.
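
The first stage of the polychoric estimation described above (thresholds obtained from marginal counts) can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name `thresholds_from_counts` and the example counts are hypothetical, and the second-stage MAP estimation of the correlation is not reproduced here.

```python
from statistics import NormalDist

def thresholds_from_counts(counts):
    """Estimate the thresholds of one ordinal variable from its category
    counts (one marginal of the contingency table)."""
    total = sum(counts)
    std_normal = NormalDist()             # standard normal latent response
    cumulative = 0
    thresholds = []
    for count in counts[:-1]:             # the last category has no upper threshold
        cumulative += count
        # Threshold = normal quantile of the cumulative proportion.
        thresholds.append(std_normal.inv_cdf(cumulative / total))
    return thresholds

# Hypothetical marginal counts for a four-category item.
demo_counts = [20, 30, 30, 20]
demo_thresholds = thresholds_from_counts(demo_counts)
```

A variable with k categories yields k - 1 thresholds. The sketch does not handle empty first or last categories, where the cumulative proportion is 0 or 1 and the normal quantile is undefined.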

Procedures for determining the number of factors/components to be retained:

MAP: Minimum Average Partial Test (Velicer, 1976).
PA: Parallel Analysis (Horn, 1965).
Optimal PA: an implementation of Parallel Analysis that is computed on the same type of correlation matrix (i.e., Pearson or polychoric correlation) and the same type of underlying dimensions (i.e., components or factors) as defined for the whole analysis (Timmerman & Lorenzo-Seva, 2011).
Hull method for selecting the number of common factors: this method aims to find a model with an optimal balance between model fit and number of parameters (Lorenzo-Seva & Timmerman, 2011). The implementation of HULL has been revised in order to incorporate robust goodness-of-fit indices based on the corrected chi-square index.
BIC dimensionality test: Schwarz’s Bayesian Information Criterion is computed for models with different numbers of factors, so that the model with the optimal number of factors (i.e., the model with the lowest BIC value) is detected.
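
To make the logic of Parallel Analysis concrete, here is a minimal Python sketch of a Horn-style procedure, assuming NumPy, a Pearson correlation matrix, and component eigenvalues. The function name, the use of the 95th percentile of the random eigenvalues as reference (Horn's original proposal used their mean), and the simulated data are assumptions made for illustration; this is not the Optimal PA implementation of the program.

```python
import numpy as np

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Count the leading eigenvalues of the observed correlation matrix that
    exceed the reference eigenvalues obtained from random normal data."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    # Observed eigenvalues, sorted from largest to smallest.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        Z = rng.standard_normal((n, p))   # uncorrelated data of the same size
        simulated[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    reference = np.percentile(simulated, percentile, axis=0)
    n_retained = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:                    # stop at the first eigenvalue not above chance
            break
        n_retained += 1
    return n_retained, observed, reference

# Hypothetical data: six variables generated from two uncorrelated factors.
pa_rng = np.random.default_rng(1)
pa_factors = pa_rng.standard_normal((500, 2))
pa_loadings = np.array([[0.8, 0], [0.8, 0], [0.8, 0], [0, 0.8], [0, 0.8], [0, 0.8]])
pa_data = pa_factors @ pa_loadings.T + 0.6 * pa_rng.standard_normal((500, 6))
pa_retained, pa_observed, pa_reference = parallel_analysis(pa_data)
```

With a clear two-factor structure such as this one, the first two observed eigenvalues are well above the reference values and the remaining ones fall below them.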

Factor and component analysis:

PCA: Principal Component Analysis.
ULS: Unweighted Least Squares factor analysis (also MINRES and PAF).
EML: Exploratory Maximum Likelihood factor analysis.
MRFA: Minimum Rank Factor Analysis (ten Berge & Kiers, 1991).
RULS: Robust Unweighted Least Squares factor analysis.
RML: Robust exploratory Maximum Likelihood factor analysis.
DWLS: Diagonally Weighted Least Squares factor analysis.
Bootstrap confidence intervals for loading values and inter-factor correlations are reported. The confidence intervals for loading values can be computed based on direct percentiles or on the bias-corrected and accelerated (BCA) bootstrap (Lambert, Wildt, & Durand, 1991).
Item Response Theory (IRT) parameterization of the factor solution, and the corresponding bootstrap confidence intervals.
Semi-confirmatory factor analysis based on orthogonal and oblique rotation to a (partially) specified target (Browne, 1972a, 1972b).
Schmid-Leiman second-order solution (1957).
Factor scores for continuous and graded data are computed based on Bayes expected a-posteriori (EAP) estimation of latent trait scores (Ferrando & Lorenzo-Seva, 2016). The appropriate implementation of EAP score estimation in an oblique model involves: (a) obtaining point estimates that make use of the full prior information, (b) complementing the point estimates with measures of the reliability of these estimates: posterior standard deviations (PSDs), confidence/credibility intervals, and individual reliabilities, and (c) reporting marginal reliability estimates.
Person fit indices (Ferrando, 2009): Personal Correlation (rp) and Weighted Mean-Squared Index (WMSI) indices are computed using optimal threshold values to detect aberrant responses (Ferrando, Vigil-Colet, & Lorenzo-Seva, 2017).
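
The idea behind EAP score estimation can be sketched as follows. This is a deliberately simplified Python illustration, not the oblique multidimensional procedure of the program: it assumes a single factor, binary items, a normal-ogive response function, a standard normal prior, and a fixed grid of quadrature nodes. The function name, the item parameters, and the definition of individual reliability as 1 - PSD² are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def eap_score(responses, discriminations, difficulties, n_nodes=61, bound=4.0):
    """Return the EAP estimate, posterior SD (PSD) and individual reliability
    for one respondent on a set of binary items."""
    step = 2 * bound / (n_nodes - 1)
    nodes = [-bound + i * step for i in range(n_nodes)]
    weights = []
    for theta in nodes:
        w = STD_NORMAL.pdf(theta)                     # prior density at this node
        for x, a, b in zip(responses, discriminations, difficulties):
            p = STD_NORMAL.cdf(a * (theta - b))       # probability of a 1 response
            w *= p if x == 1 else 1.0 - p             # multiply in the likelihood
        weights.append(w)
    total = sum(weights)
    eap = sum(t * w for t, w in zip(nodes, weights)) / total          # posterior mean
    var = sum((t - eap) ** 2 * w for t, w in zip(nodes, weights)) / total
    psd = sqrt(var)
    reliability = 1.0 - var          # assumes prior variance 1, i.e. 1 - PSD^2
    return eap, psd, reliability

# Hypothetical items: two identical items, one endorsed and one not.
eap_mixed, psd_mixed, rel_mixed = eap_score([1, 0], [1.2, 1.2], [0.0, 0.0])
# Same items, both endorsed.
eap_high, psd_high, rel_high = eap_score([1, 1], [1.2, 1.2], [0.0, 0.0])
```

Because the point estimate is a posterior mean, it uses the full prior information, and the PSD gives a measure of its precision for each respondent.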

In ULS factor analysis, the Heywood case correction described in Mulaik (1972, page 153) is included: when an update produces a row of loadings whose sum of squares is larger than the observed variance of the variable, that row is updated by constrained regression using the procedure proposed by ten Berge and Nevels (1977).

Some of the rotation methods to obtain simplicity are:

Quartimax (Neuhaus & Wrigley, 1954).
Varimax (Kaiser, 1958).
Weighted Varimax (Cureton & Mulaik, 1975).
Orthomin (Bentler, 1977).
Direct Oblimin (Clarkson & Jennrich, 1988).
Weighted Oblimin (Lorenzo-Seva, 2000).
Promax (Hendrickson & White, 1964).
Promaj (Trendafilov, 1994).
Promin (Lorenzo-Seva, 1999).
Simplimax (Kiers, 1994).
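
As an example of how an orthogonal rotation to simplicity works, the following Python sketch implements raw Varimax (without Kaiser normalization) using the widely used SVD-based iteration. It assumes NumPy; the function names, the stopping rule, and the example loading matrix are illustrative assumptions rather than the implementation used by the program.

```python
import numpy as np

def varimax_criterion(L):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    return float(np.sum(np.var(np.asarray(L) ** 2, axis=0)))

def varimax(L, max_iter=500, tol=1e-10):
    """Return the rotated loadings and the orthogonal rotation matrix."""
    L = np.asarray(L, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the criterion with respect to the rotated loadings.
        gradient = L.T @ (Lr ** 3 - Lr @ np.diag(np.sum(Lr ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(gradient)
        R = u @ vt                        # closest orthogonal matrix to the gradient
        current = s.sum()
        if current < previous * (1 + tol):
            break                         # no further meaningful improvement
        previous = current
    return L @ R, R

# Hypothetical loadings: a simple structure spun by 30 degrees to hide it.
vm_simple = np.array([[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.7], [0, 0.6], [0, 0.5]])
vm_angle = np.pi / 6
vm_spin = np.array([[np.cos(vm_angle), -np.sin(vm_angle)],
                    [np.sin(vm_angle), np.cos(vm_angle)]])
vm_unrotated = vm_simple @ vm_spin
vm_rotated, vm_R = varimax(vm_unrotated)
```

The rotation leaves the communalities unchanged and only redistributes the variance across factors, so that each variable ends up with one salient loading in this example (possibly with the columns permuted or reflected).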

Some of the indices used in the analysis are:

Test on the dispersion matrix: Determinant, Bartlett's test and Kaiser-Meyer-Olkin (KMO).
Goodness of fit statistics: Chi-Square; Non-Normed Fit Index (NNFI; Tucker & Lewis); Comparative Fit Index (CFI); Goodness of Fit Index (GFI); Adjusted Goodness of Fit Index (AGFI); Root Mean Square Error of Approximation (RMSEA); Estimated Non-Centrality Parameter (NCP); and Schwarz’s Bayesian Information Criterion (BIC). Robust goodness-of-fit indices based on the corrected chi-square index: Robust Mean-Scaled Chi Square, and Robust Mean and Variance-Adjusted Chi Square. When bootstrap analyses are computed, bootstrap confidence intervals are reported for the goodness-of-fit indices.
Reliabilities of rotated components (ten Berge & Hofstee, 1999).
Simplicity indices: Bentler’s Simplicity index (1977) and Loading Simplicity index (Lorenzo-Seva, 2003).
Mean, variance and histogram of fitted and standardized residuals. Automatic detection of large standardized residuals.
The greatest lower bound (glb) to reliability (Woodhouse & Jackson, 1977). The glb represents the smallest reliability possible given the observed covariance matrix, under the restriction that the sum of error variances is maximized for errors that correlate 0 with other variables (ten Berge, Snijders, & Zegers, 1981).
McDonald's Omega. Omega can be interpreted as the square of the correlation between the scale score and the latent variable common to all the indicators in the infinite universe of indicators of which the scale indicators are a subset (McDonald, 1999, page 89).
Congruence index to assess the congruence between the rotated loading matrix and the user-provided target matrix (Lorenzo-Seva & ten Berge, 2006).
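
Two of the indices above are simple enough to sketch in plain Python. The example assumes that the congruence index is Tucker's congruence coefficient computed per column (the coefficient discussed by Lorenzo-Seva & ten Berge, 2006), and that Omega is computed for a one-factor model with standardized indicators, so that each uniqueness equals one minus the squared loading. The function names and the loading values are hypothetical.

```python
from math import sqrt

def congruence(x, y):
    """Tucker's congruence coefficient between two columns of loadings."""
    cross = sum(a * b for a, b in zip(x, y))
    return cross / sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def omega(loadings):
    """McDonald's omega for a one-factor model with standardized indicators."""
    common = sum(loadings) ** 2                       # variance due to the common factor
    unique = sum(1.0 - l * l for l in loadings)       # sum of the uniquenesses
    return common / (common + unique)

# Hypothetical rotated column compared with a target column.
rotated_column = [0.78, 0.69, 0.05, -0.03]
target_column = [1.0, 1.0, 0.0, 0.0]
phi_demo = congruence(rotated_column, target_column)

# Hypothetical one-factor solution with four equal loadings.
omega_demo = omega([0.7, 0.7, 0.7, 0.7])
```

The congruence coefficient ranges from -1 to 1 and is insensitive to the overall scale of either column, which makes it suitable for comparing a rotated solution with a target that only specifies the pattern of salient and zero loadings.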