Validating the Results of Factor Analysis
Finally, it is important to validate the results of the factor analysis by checking the assumptions of the technique, such as normality and linearity. It is also worth examining the factor structure for different subsets of the data to confirm that the results are consistent and stable.
R
# examine factor structure for
# different subsets of the data
subset1 <- subset(iris[, 1:4], iris$Sepal.Length < mean(iris$Sepal.Length))
fa1 <- fa(subset1, nfactors = 4)
print(fa1)
Output:
Factor Analysis using method = minres
Call: fa(r = subset1, nfactors = 4)
Standardized loadings (pattern matrix) based upon correlation matrix
               MR1  MR2   MR3 MR4   h2    u2 com
Sepal.Length  0.66 0.61 -0.12   0 0.82 0.178 2.1
Sepal.Width  -0.68 0.61  0.11   0 0.85 0.150 2.0
Petal.Length  1.00 0.00  0.00   0 1.00 0.005 1.0
Petal.Width   0.97 0.01  0.16   0 0.97 0.031 1.1

                       MR1  MR2  MR3  MR4
SS loadings           2.85 0.74 0.05 0.00
Proportion Var        0.71 0.18 0.01 0.00
Cumulative Var        0.71 0.90 0.91 0.91
Proportion Explained  0.78 0.20 0.01 0.00
Cumulative Proportion 0.78 0.99 1.00 1.00

Mean item complexity = 1.5
Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are 6 and the objective function was 4.57 with Chi Square of 351.02
The degrees of freedom for the model are -4 and the objective function was 0

The root mean square of the residuals (RMSR) is 0
The df corrected root mean square of the residuals is NA

The harmonic number of observations is 80 with the empirical chi square 0 with prob < NA
The total number of observations was 80 with Likelihood Chi Square = 0 with prob < NA

Tucker Lewis Index of factoring reliability = 1.018
Fit based upon off diagonal values = 1
Measures of factor score adequacy
                                                 MR1  MR2   MR3 MR4
Correlation of (regression) scores with factors 1.00 0.91  0.69   0
Multiple R square of scores with factors        1.00 0.82  0.47   0
Minimum correlation of possible factor scores   0.99 0.64 -0.05  -1
R
subset2 <- subset(iris[, 1:4], iris$Sepal.Length >= mean(iris$Sepal.Length))
fa2 <- fa(subset2, nfactors = 4)
print(fa2)
Output:
Factor Analysis using method = minres
Call: fa(r = subset2, nfactors = 4)
Standardized loadings (pattern matrix) based upon correlation matrix
              MR1   MR2   MR3 MR4   h2    u2 com
Sepal.Length 0.76 -0.37  0.26   0 0.78 0.222 1.7
Sepal.Width  0.50  0.36  0.34   0 0.49 0.507 2.6
Petal.Length 0.95 -0.23 -0.22   0 1.00 0.005 1.2
Petal.Width  0.82  0.39 -0.20   0 0.86 0.144 1.6

                       MR1  MR2  MR3  MR4
SS loadings           2.39 0.46 0.27 0.00
Proportion Var        0.60 0.12 0.07 0.00
Cumulative Var        0.60 0.71 0.78 0.78
Proportion Explained  0.76 0.15 0.09 0.00
Cumulative Proportion 0.76 0.91 1.00 1.00

Mean item complexity = 1.8
Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are 6 and the objective function was 1.97 with Chi Square of 131.96
The degrees of freedom for the model are -4 and the objective function was 0

The root mean square of the residuals (RMSR) is 0
The df corrected root mean square of the residuals is NA

The harmonic number of observations is 70 with the empirical chi square 0 with prob < NA
The total number of observations was 70 with Likelihood Chi Square = 0 with prob < NA

Tucker Lewis Index of factoring reliability = 1.05
Fit based upon off diagonal values = 1
Measures of factor score adequacy
                                                 MR1  MR2  MR3 MR4
Correlation of (regression) scores with factors 0.98 0.86 0.75   0
Multiple R square of scores with factors        0.96 0.75 0.57   0
Minimum correlation of possible factor scores   0.92 0.49 0.14  -1
R
# display variance explained by each factor
print(fa$Vaccounted)
Output:
                            MR1       MR2        MR3          MR4
SS loadings           2.8853608 0.5816336 0.09819492 4.000000e-30
Proportion Var        0.7213402 0.1454084 0.02454873 1.000000e-30
Cumulative Var        0.7213402 0.8667486 0.89129733 8.912973e-01
Proportion Explained  0.8093149 0.1631424 0.02754269 1.121960e-30
Cumulative Proportion 0.8093149 0.9724573 1.00000000 1.000000e+00
Factor Analysis Using the factanal() Function
The factanal() function performs maximum-likelihood factor analysis on a data set. It takes several arguments, described below.
Syntax:
factanal(x, factors, rotation, scores, covmat)
where,
- x – The data set to be analyzed.
- factors – The number of factors to extract.
- rotation – The rotation method to use. Popular rotation methods include varimax, oblimin, and promax.
- scores – Whether to compute factor scores for each observation.
- covmat – A covariance matrix to use instead of the default correlation matrix.
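As an illustration, a call combining these arguments might look like the following sketch; the scores = "regression" setting requests regression-based factor scores for each observation:

```r
# Hypothetical call illustrating the arguments above
fit <- factanal(x = mtcars, factors = 3,
                rotation = "varimax", scores = "regression")

# One row of factor scores per observation
head(fit$scores)
```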
The output of factanal() function includes several pieces of information, including:
- Uniquenesses: The amount of variance in each variable that is not accounted for by the factors.
- Loadings: The correlations between each variable and each factor.
- Communalities: The amount of variance in each variable that is accounted for by the factors.
- Eigenvalues: The amount of variance explained by each factor.
- Factor Correlations: The correlations between the factors.
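Several of these quantities can be read directly off the fitted object; a small sketch (the communality of a variable is one minus its uniqueness):

```r
fit <- factanal(mtcars, factors = 3, rotation = "varimax")

fit$uniquenesses          # uniqueness of each variable
fit$loadings              # factor loadings
1 - fit$uniquenesses      # communalities
colSums(fit$loadings^2)   # SS loadings: variance explained by each factor
```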
Here is an example code snippet that demonstrates how to use factanal() function in R:
R
# Install the required package
install.packages("psych")

# Load the psych package for
# data analysis and visualization
library(psych)

# Load the mtcars dataset
data(mtcars)

# Perform factor analysis on the mtcars dataset
factor_analysis <- factanal(mtcars, factors = 3, rotation = "varimax")

# Print the results
print(factor_analysis)
Output:
Call:
factanal(x = mtcars, factors = 3, rotation = "varimax")

Uniquenesses:
  mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
0.135 0.055 0.090 0.127 0.290 0.060 0.051 0.223 0.208 0.125 0.158

Loadings:
     Factor1 Factor2 Factor3
mpg   0.643  -0.478  -0.473
cyl  -0.618   0.703   0.261
disp -0.719   0.537   0.323
hp   -0.291   0.725   0.513
drat  0.804  -0.241
wt   -0.778   0.248   0.524
qsec -0.177  -0.946  -0.151
vs    0.295  -0.805  -0.204
am    0.880
gear  0.908   0.224
carb  0.114   0.559   0.719

               Factor1 Factor2 Factor3
SS loadings      4.380   3.520   1.578
Proportion Var   0.398   0.320   0.143
Cumulative Var   0.398   0.718   0.862

Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 30.53 on 25 degrees of freedom.
The p-value is 0.205
In this example, we load the psych package, which provides functions for data analysis and visualization, and the mtcars data set, which contains information about different car models. We then use the factanal() function to perform factor analysis on the mtcars data set, specifying that we want to extract three factors and use the varimax rotation method. Finally, we print the results of the factor analysis.
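To make the factor structure easier to read, small loadings can be suppressed when printing; the 0.3 cutoff below is a common convention, not a fixed rule:

```r
factor_analysis <- factanal(mtcars, factors = 3, rotation = "varimax")

# Hide loadings below 0.3 and sort variables by the factor they load on
print(factor_analysis$loadings, cutoff = 0.3, sort = TRUE)
```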
Conclusion
In conclusion, factor analysis is a useful statistical technique for identifying underlying factors or latent variables that explain the correlations among a set of observed variables. In R programming, the psych package provides a range of functions for conducting factor analysis, which can be used to extract meaningful insights from complex datasets.
Factor Analysis in R programming
Factor Analysis (FA) is a statistical method that is used to analyze the underlying structure of a set of variables. It is a method of data reduction that seeks to explain the correlations among many variables in terms of a smaller number of unobservable (latent) variables, known as factors. In R Programming Language, the psych package provides a variety of functions for performing factor analysis.
Factor analysis involves several steps:
- Data preparation: The data are usually standardized (i.e., scaled) to make sure that the variables are on a common scale and have equal weight in the analysis.
- Factor Extraction: The factors are identified based on their ability to explain the variance in the data. There are several methods for extracting factors, including principal components analysis (PCA), maximum likelihood estimation (MLE), and minimum residuals (minres).
- Factor Rotation: The factors are usually rotated to make their interpretation easier. The most common method of rotation is Varimax rotation, which tries to maximize the variance of the factor loadings.
- Factor interpretation: The final step involves interpreting the factors and their loadings (i.e., the correlation between each variable and each factor). The loadings represent the degree to which each variable is associated with each factor.
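The four steps above can be sketched end-to-end with the psych package on the built-in mtcars data (a minimal sketch, assuming psych is installed; the choice of two factors here is only illustrative):

```r
library(psych)

# 1. Data preparation: standardize the variables
mtcars_scaled <- scale(mtcars)

# 2. Factor extraction (minimum residual method) and
# 3. Varimax rotation, done in one call
fa_res <- fa(r = mtcars_scaled, nfactors = 2, fm = "minres", rotate = "varimax")

# 4. Interpretation: inspect the rotated loadings, hiding small values
print(fa_res$loadings, cutoff = 0.3)
```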
Loading the Data
First, we need to load the data that we want to analyze. For this example, we will use the iris dataset that comes with R. This dataset contains measurements of the sepal length, sepal width, petal length, and petal width of three different species of iris flowers.
R
# Load the dataset
data(iris)

# View the first few rows of the dataset
head(iris)
Output:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
Data Preparation
Before conducting factor analysis, we need to prepare the data by scaling the variables to have a mean of zero and a standard deviation of one. This is important because factor analysis is sensitive to differences in scale between variables.
R
# Scale the data
iris_scaled <- scale(iris[, 1:4])
Determining the Number of Factors
The next step is to determine the number of factors to extract from the data. This can be done using a variety of methods, such as the Kaiser criterion, scree plot, or parallel analysis. In this example, we will use the Kaiser criterion, which suggests extracting factors with eigenvalues greater than one.
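The Kaiser criterion can be checked directly from the eigenvalues of the correlation matrix before fitting any model; a small self-contained sketch:

```r
# Scale the iris measurements and compute the eigenvalues
# of their correlation matrix
iris_scaled <- scale(iris[, 1:4])
ev <- eigen(cor(iris_scaled))$values
ev

# Number of factors suggested by the Kaiser criterion
sum(ev > 1)
```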
R
# Perform factor analysis
library(psych)
fa <- fa(r = iris_scaled, nfactors = 4, rotate = "varimax")
summary(fa)
Output:
Factor analysis with Call: fa(r = iris_scaled, nfactors = 4, rotate = "varimax")

Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the model is -4 and the objective function was 0
The number of observations was 150 with Chi Square = 0 with prob < NA

The root mean square of the residuals (RMSA) is 0
The df corrected root mean square of the residuals is NA

Tucker Lewis Index of factoring reliability = 1.009
The output of the summary() function reports the fit of the factor analysis: the call, a test of the hypothesis that the requested 4 factors are sufficient, the root mean square of the residuals, and the Tucker Lewis Index of factoring reliability. Printing the full fitted object additionally shows the standardized loadings (the correlation between each variable and each factor), the eigenvalues, and the proportion of variance explained by each factor.
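To see how each measurement maps onto the factors, the rotated loadings and the eigenvalues of the original correlation matrix can be inspected from the fitted object (re-fitting here so the sketch is self-contained):

```r
library(psych)

fa <- fa(r = scale(iris[, 1:4]), nfactors = 4, rotate = "varimax")

# Rotated loadings, hiding small values for readability
print(fa$loadings, cutoff = 0.3)

# Eigenvalues of the original correlation matrix
fa$e.values
```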