Cluster Robust Standard Errors in Lavaan (SEM, CFA, Path Analysis)
by Arndt Regorz, MSc.
July 21, 2024
One assumption of lavaan’s ML estimation is the independence of residuals. In other words, the data must not be nested (hierarchical, multilevel). Otherwise you will get wrong (often too small) standard errors and thereby wrong p-values, leading to false conclusions about your hypotheses.
Nested data can arise in cross-sectional designs (e.g., students nested within schools) or in longitudinal designs with repeated measurement (e.g., time points nested within persons).
In lavaan there are different options available for a nested data structure.
You could use multilevel SEM, multilevel CFA or multilevel path analysis. Currently, lavaan supports models with two levels and random intercepts.
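For comparison, here is a minimal sketch of what the multilevel route looks like. It assumes the Demo.twolevel example dataset that ships with lavaan (its clustering variable is named "cluster"); the factor names fw and fb are just illustrative labels, not part of any study discussed here.

```r
library(lavaan)

# Two-level model: the same indicators load on a within-cluster factor (fw)
# and on a between-cluster factor (fb). The names fw and fb are illustrative.
ml_model <- '
  level: 1
    fw =~ y1 + y2 + y3
  level: 2
    fb =~ y1 + y2 + y3
'

# In a multilevel model the cluster argument identifies the level-2 units
fit_ml <- sem(ml_model, data = Demo.twolevel, cluster = "cluster")
summary(fit_ml)
```

Note that the model syntax itself has to change (one block per level), which is what makes this approach more demanding than the one described next.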
But there is an easier way to deal with nested data in a lavaan model: Cluster robust standard errors.
What Are Cluster Robust Standard Errors?
Cluster robust standard errors (CRSE) are used to adjust the standard errors of regression coefficients to account for potential correlations within clusters of data. If the clustering is ignored, these within-cluster correlations can lead to standard errors that are too small. CRSE correct for this by allowing for arbitrary correlation patterns within clusters, thus providing more reliable standard errors and thereby more reliable p-values.
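The general idea can be written as a "sandwich" formula. The following is a generic textbook form under assumed notation (G clusters, casewise log-likelihood contributions ℓ_i, parameter vector θ), not necessarily the exact computation lavaan performs, since software may apply additional scaling or small-sample corrections:

```latex
\widehat{\operatorname{Var}}(\hat{\theta})
  = A^{-1} \Bigl( \sum_{g=1}^{G} s_g s_g^{\top} \Bigr) A^{-1},
\qquad
s_g = \sum_{i \in g} \frac{\partial \ell_i(\hat{\theta})}{\partial \theta},
\qquad
A = -\sum_{i=1}^{N} \frac{\partial^2 \ell_i(\hat{\theta})}{\partial \theta \, \partial \theta^{\top}}
```

Because the score contributions are summed within each cluster before the outer product is taken, any correlation between cases of the same cluster is carried into the middle term, while cases from different clusters are still treated as independent.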
Using Cluster Robust Standard Errors in Lavaan
It is very easy to incorporate cluster robust standard errors in a lavaan model estimation. The model definition is the same as in a model without nested data. You simply add the argument cluster to the estimation function (e.g., cfa() or sem()).
Here is an example where the participants of the study are nested in classes, and the clustering variable in the example dataframe has the name “class”:
fit <- sem(path_model, data = popular2, cluster = "class")
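If you want to see the effect of the correction yourself, the following sketch fits the same model with and without the cluster argument and puts the standard errors side by side. Since the dataset used above may not be available to you, the sketch assumes lavaan's built-in Demo.twolevel data (clustering variable "cluster") and a made-up path model that serves only as an illustration.

```r
library(lavaan)

# Hypothetical path model on lavaan's built-in example data
path_model <- '
  y1 ~ x1 + x2
  y2 ~ y1 + x3
'

# Same model, once ignoring the clustering and once with cluster robust SEs
fit_naive   <- sem(path_model, data = Demo.twolevel)
fit_cluster <- sem(path_model, data = Demo.twolevel, cluster = "cluster")

# Collect the standard errors from both fits
pe_naive   <- parameterEstimates(fit_naive)[, c("lhs", "op", "rhs", "est", "se")]
pe_cluster <- parameterEstimates(fit_cluster)[, c("lhs", "op", "rhs", "se")]
names(pe_cluster)[4] <- "se_cluster"

# Match parameters by name (the two parameter tables need not have the same rows)
se_compare <- merge(pe_naive, pe_cluster, by = c("lhs", "op", "rhs"))

# Show the regression paths only
reg_paths <- se_compare[se_compare$op == "~", ]
print(reg_paths)
```

The point estimates should be essentially the same in both fits; what changes are the standard errors and everything derived from them.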
If you use this additional argument for the sem() function or the cfa() function you can see at the top of the parameter estimates that a cluster-robust estimator has been used:
Parameter Estimates:

  Standard errors                        Robust.cluster
  Information                                  Observed
  Observed information based on                 Hessian
And the standard errors, z-statistics and p-values you get are corrected for clustering.
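You can also check the type of standard errors programmatically instead of reading it off the summary. This is a small sketch, again assuming the built-in Demo.twolevel data and an illustrative path model rather than the data from the example above:

```r
library(lavaan)

# Hypothetical path model on lavaan's built-in example data
path_model <- '
  y1 ~ x1 + x2
  y2 ~ y1 + x3
'
fit <- sem(path_model, data = Demo.twolevel, cluster = "cluster")

# The fitted object stores the options that were actually used
se_type <- lavInspect(fit, "options")$se
print(se_type)
```

This can be handy in scripts where you want to make sure that the cluster argument was actually passed before you report the results.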
You can use cluster-robust standard errors with complete data but also with missing values, using the additional argument missing = "fiml".
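Here is a sketch of how the two arguments can be combined. It assumes the built-in Demo.twolevel data with some values of y2 deleted artificially (the seed and the number of deleted values are arbitrary choices for the illustration) and the same made-up path model as a stand-in for your own model:

```r
library(lavaan)

# Copy of the example data with artificially created missing values on y2
dat <- Demo.twolevel
set.seed(123)
miss_rows <- sample(nrow(dat), 200)
dat$y2[miss_rows] <- NA

# Hypothetical path model
path_model <- '
  y1 ~ x1 + x2
  y2 ~ y1 + x3
'

# FIML for the missing values, cluster robust standard errors for the nesting
fit_fiml <- sem(path_model, data = dat,
                cluster = "cluster", missing = "fiml")
summary(fit_fiml)
```

In this sketch only an outcome variable has missing values. If your exogenous predictors have missing values as well, check how lavaan treats those cases in your setup before relying on the results.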
Citation
Regorz, A. (2024, July 21). Cluster robust standard errors in lavaan (SEM, CFA, path analysis). Regorz Statistik. https://www.regorz-statistik.de/blog/lavaan_cluster_robust_se.html