Latent Profile Analysis (LPA) in R
Arndt Regorz, Dipl. Kfm. & M.Sc. Psychology, 08/25/2023
Researchers often want to uncover hidden structure in complex data sets. Latent Profile Analysis (LPA) is a tool for this: it identifies latent subgroups, or profiles, within a population based on a set of observed variables. This tutorial shows you how to run an LPA in R.
Understanding Latent Profile Analysis
Conceptual Framework:
Latent Profile Analysis is a latent variable model: it assumes that unobservable (latent) constructs influence the observed variables. In LPA the latent variable is categorical (membership in one of several profiles) and the observed variables are continuous, which makes LPA the continuous-indicator counterpart of latent class analysis. LPA aims to identify distinct groups within a population, assuming that individuals within each group share similar response patterns across the observed variables.
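To make the idea of a categorical latent variable concrete, the base-R sketch below simulates data the way LPA assumes they arise. First each person is assigned to an unobserved profile, then the observed scores are drawn from normal distributions specific to that profile. The two profiles, their proportions, means and SDs, and the variable names x1 to x3 are made-up values for illustration only; nothing here uses tidyLPA.

```r
# Hypothetical population: two profiles, three continuous indicators x1-x3
set.seed(123)
n          <- 500
proportion <- c(0.6, 0.4)              # profile sizes (mixing proportions)
means      <- rbind(c(-1, -1, -1),     # means of profile 1
                    c( 1,  1,  1))     # means of profile 2
sds        <- c(0.5, 0.5, 0.5)         # same SDs in both profiles

# Step 1: each person belongs to one (unobserved) profile
profile <- sample(1:2, size = n, replace = TRUE, prob = proportion)

# Step 2: observed scores are drawn from that profile's normal distributions,
# independently for each indicator (zero covariances within a profile)
y <- t(sapply(profile, function(k) rnorm(3, mean = means[k, ], sd = sds)))
colnames(y) <- c("x1", "x2", "x3")

# Profile sizes and within-profile means in the simulated data
table(profile)
group_means <- sapply(1:2, function(k) colMeans(y[profile == k, ]))
colnames(group_means) <- c("profile 1", "profile 2")
round(group_means, 2)
```

In a real analysis the vector profile is unknown; LPA tries to recover the profiles and their means from y alone.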
Key Assumptions:
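LPA assumes homogeneity within profiles and heterogeneity between profiles. Within each profile, the observed variables are assumed to be normally distributed around the profile means, with the residual variance representing any variability that profile membership does not account for.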
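Given estimated parameters, the model gives each case a posterior probability of belonging to each profile, and cases are usually allocated to their most probable profile. The sketch below computes these probabilities with Bayes' rule for a hypothetical two-profile solution, assuming normally distributed indicators that are uncorrelated within a profile (local independence, i.e. covariances fixed at zero). All parameter values and function names are invented for illustration.

```r
# Hypothetical parameters of a fitted two-profile model with three indicators
proportion <- c(0.6, 0.4)
means      <- rbind(c(-1, -1, -1),
                    c( 1,  1,  1))
sds        <- c(0.5, 0.5, 0.5)

# Within-profile density under local independence:
# a product of univariate normal densities
profile_density <- function(y, mu, sd) prod(dnorm(y, mean = mu, sd = sd))

# Posterior probability of each profile for one case (Bayes' rule)
posterior <- function(y) {
  joint <- sapply(1:2, function(k) proportion[k] * profile_density(y, means[k, ], sds))
  joint / sum(joint)
}

p_clear   <- posterior(c(-0.9, -1.2, -0.8))  # close to profile 1's means
p_unclear <- posterior(c( 0.0,  0.1, -0.1))  # halfway between the profiles
round(rbind(p_clear, p_unclear), 3)

# Modal assignment: the case goes to its most probable profile
which.max(p_clear)    # 1
```

The second case lies exactly halfway between the two profile means, so both profiles explain it equally well and its posterior probabilities simply equal the profile proportions (0.6 and 0.4). Cases like this are the ones with unclear class membership.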
Methodological Steps
Data Preparation:
Select a set of observed variables relevant to the constructs of interest. They should be continuous and reflect meaningful aspects of the phenomenon under investigation. Also decide how to handle missing values; the code below shows both listwise deletion and single imputation.
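A minimal sketch of these preparation steps in base R, using a small made-up data frame with hypothetical variables x1 to x3: check that the indicators are numeric, drop incomplete cases, and optionally standardize.

```r
# Toy data frame with three continuous indicators and some missing values
raw <- data.frame(
  x1 = c(3.2, 2.8,  NA, 3.9, 2.1, 3.3),
  x2 = c(3.0, 2.5, 3.1,  NA, 2.0, 3.6),
  x3 = c(2.9, 2.2, 3.0, 3.4, 1.8, 3.1)
)

# Check that every indicator is numeric (continuous)
stopifnot(all(vapply(raw, is.numeric, logical(1))))

# Listwise deletion, as done with complete.cases() in the tutorial code
complete <- raw[complete.cases(raw), ]
nrow(complete)               # 4 of the 6 rows remain

# Optional z-standardization: mean 0 and SD 1 for every indicator
z <- scale(complete)
round(colMeans(z), 10)
round(apply(z, 2, sd), 10)
```

Listwise deletion is the simplest option, but it discards every case with at least one missing value, which is why the tutorial also shows single imputation as an alternative.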
Model Specification:
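Decide on the number of latent profiles (classes) to extract, usually by estimating a range of candidate numbers and comparing the solutions. Also choose how the variances and covariances of the observed variables are constrained across profiles (for example, equal or varying variances, and covariances fixed to zero or freely estimated). In standard LPA the observed variables are assumed to be normally distributed within each profile.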
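The variance-covariance specification determines how many parameters have to be estimated. The helper below counts them for the four specifications used later in the code (models 1, 2, 3, and 6): profile means, profile proportions, and variance-covariance parameters. The function name n_parameters is hypothetical, and the counting rule is the standard one for Gaussian mixtures; software output may differ if it counts parameters differently.

```r
# Free parameters for k profiles and p indicators under the four
# variance-covariance specifications used in the tutorial code
n_parameters <- function(k, p, model) {
  means    <- k * p            # one mean per indicator and profile
  weights  <- k - 1            # profile proportions (they sum to 1)
  full_cov <- p * (p + 1) / 2  # variances + covariances of one matrix
  var_cov <- switch(as.character(model),
    "1" = p,                   # equal variances, covariances fixed at 0
    "2" = k * p,               # unequal variances, covariances fixed at 0
    "3" = full_cov,            # equal variances, equal covariances
    "6" = k * full_cov,        # unequal variances, unequal covariances
    stop("model must be 1, 2, 3 or 6"))
  means + weights + var_cov
}

# Three indicators, as in the tutorial; one to six profiles
counts <- sapply(c(1, 2, 3, 6), function(m) sapply(1:6, n_parameters, p = 3, model = m))
dimnames(counts) <- list(profiles = 1:6, model = c(1, 2, 3, 6))
counts
```

With three indicators and six profiles, model 6 needs 59 parameters compared with 26 for model 1. Information criteria such as BIC penalize each additional parameter, so the more flexible specifications must improve the log-likelihood considerably to be preferred.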
Parameter Estimation:
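Estimate the model parameters (profile proportions, means, variances, and covariances), typically by maximum likelihood using the expectation-maximization (EM) algorithm. Refine the model iteratively based on fit indices.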
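tidyLPA estimates the models by maximum likelihood, by default through the mclust package, which uses the EM algorithm. To show what happens during estimation, here is a minimal EM sketch in base R for profile-specific variances and zero covariances. It is not the code used by mclust or tidyLPA: the function name em_lpa, the k-means starting values, and the convergence rule are assumptions of this sketch.

```r
# Minimal EM sketch for a Gaussian mixture with zero within-profile covariances
# and profile-specific variances (the analogue of model 2).
# Illustration only, not the mclust or tidyLPA implementation.
em_lpa <- function(y, k, max_iter = 500, tol = 1e-8) {
  y <- as.matrix(y)
  n <- nrow(y)
  p <- ncol(y)

  # Starting values: hard assignment from k-means (an assumption of this sketch)
  start <- kmeans(y, centers = k, nstart = 5)$cluster
  z <- matrix(0, n, k)
  z[cbind(seq_len(n), start)] <- 1

  loglik <- numeric(0)
  for (iter in seq_len(max_iter)) {
    # M-step: profile proportions, means and variances, weighted by z
    nk     <- colSums(z)
    prop   <- nk / n
    mu     <- (t(z) %*% y) / nk                          # k x p means
    sigma2 <- matrix(sapply(seq_len(k), function(j)
      colSums(z[, j] * sweep(y, 2, mu[j, ])^2) / nk[j]), nrow = p)  # p x k

    # E-step: log(prop_j) + sum of log normal densities for every case
    log_joint <- sapply(seq_len(k), function(j) {
      dens <- dnorm(y, mean = rep(mu[j, ], each = n),
                    sd = rep(sqrt(sigma2[, j]), each = n), log = TRUE)
      log(prop[j]) + rowSums(matrix(dens, nrow = n))
    })

    # Log-sum-exp gives each case's log-likelihood without underflow
    m    <- apply(log_joint, 1, max)
    ll_i <- m + log(rowSums(exp(log_joint - m)))
    z    <- exp(log_joint - ll_i)                        # posterior probabilities

    loglik <- c(loglik, sum(ll_i))
    if (iter > 1 && abs(loglik[iter] - loglik[iter - 1]) < tol) break
  }
  list(proportions = prop, means = mu, variances = t(sigma2),
       posterior = z, loglik = loglik)
}

# Toy data: two well-separated groups of 100 cases on three indicators
set.seed(1)
y <- rbind(matrix(rnorm(300, mean = -1, sd = 0.5), ncol = 3),
           matrix(rnorm(300, mean =  1, sd = 0.5), ncol = 3))
fit <- em_lpa(y, k = 2)
round(fit$proportions, 2)
round(fit$means, 2)
length(fit$loglik)   # number of EM iterations used
```

EM alternates between computing each case's posterior profile probabilities (E-step) and re-estimating the proportions, means, and variances from those probabilities (M-step); the log-likelihood never decreases from one iteration to the next. Because EM can stop at a local maximum, it is common to compare solutions from several starting values.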
Model Fit Evaluation:
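Assess fit with information criteria such as AIC and BIC (lower values indicate better fit), and check how clearly cases are assigned to profiles, for example with entropy. Also consider the practical interpretability and theoretical relevance of the identified profiles.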
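For reference, AIC = -2 log L + 2q and BIC = -2 log L + q log(n), where log L is the maximized log-likelihood, q the number of free parameters, and n the sample size. Entropy summarizes how clearly cases are assigned to profiles; a commonly used normalized version is sketched below. The function names and all numbers are made up for illustration, and it is assumed here, not verified, that tidyLPA's entropy value follows the same definition.

```r
# Standard information criteria (lower values indicate the preferred model)
aic <- function(loglik, n_par)    -2 * loglik + 2 * n_par
bic <- function(loglik, n_par, n) -2 * loglik + log(n) * n_par

# Relative entropy of the classification: 1 = perfectly clear assignment,
# 0 = every case equally likely to belong to each profile
relative_entropy <- function(posterior) {
  n <- nrow(posterior)
  k <- ncol(posterior)
  p <- pmax(posterior, .Machine$double.xmin)   # avoid log(0)
  1 - sum(-p * log(p)) / (n * log(k))
}

# Made-up log-likelihoods for a 3- and a 4-profile solution (model 1,
# three indicators, so 14 and 18 parameters) fitted to n = 500 cases
n    <- 500
fits <- data.frame(profiles = c(3, 4),
                   loglik   = c(-1800, -1790),
                   n_par    = c(14, 18))
fits$AIC <- aic(fits$loglik, fits$n_par)
fits$BIC <- bic(fits$loglik, fits$n_par, n)
fits   # AIC prefers 4 profiles, BIC prefers 3

# Entropy for a small, made-up matrix of posterior probabilities
post <- rbind(c(0.95, 0.05), c(0.10, 0.90), c(0.60, 0.40))
round(relative_entropy(post), 3)
```

In this made-up example the 4-profile solution has the lower AIC and the 3-profile solution the lower BIC, because BIC's penalty of log(500) ≈ 6.2 per parameter is larger than AIC's penalty of 2. When the criteria disagree, interpretability and theory carry more weight in the decision.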
Practical Applications
Psychology and Behavioral Sciences:
Uncover distinct personality profiles or behavior patterns within a population. Explore underlying subgroups in clinical samples for personalized treatment strategies.
Education:
Identify different learning profiles among students and tailor educational interventions based on the identified profiles.
Market Research:
Segment consumers based on purchasing behavior and preferences. Customize marketing strategies for each identified consumer segment.
R Code for a Latent Profile Analysis
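Here is the R code for the YouTube tutorial about LPA. It uses the tidyLPA package, including its example data set pisaUSA15, and dplyr for data manipulation: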
library(tidyLPA)
library(dplyr)
# Keep only complete cases (listwise deletion) of the example data
pisaUSA15_complete <- pisaUSA15[complete.cases(pisaUSA15), ]
summary(pisaUSA15_complete)
# 1. Basic Estimation
# Estimation of profiles: 1 to 6 profiles, first 500 cases, three indicators
prof1 <- pisaUSA15_complete[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  estimate_profiles(1:6)
prof1
compare_solutions(prof1, statistics = c("AIC", "BIC"))
# Extending the range to 6 to 8 profiles
prof2 <- pisaUSA15_complete[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  estimate_profiles(6:8)
prof2
compare_solutions(prof2, statistics = c("AIC", "BIC"))
# More fit indices
get_fit(prof1)
# Plot of the profiles: all solutions, then only the 6-class solution
plot_profiles(prof1, rawdata = FALSE)
plot_profiles(prof1[[6]], rawdata = FALSE)
# Distribution plots (densities) for a specific solution, e.g. the 3-class solution
plot_density(prof1[[3]])
plot_bivariate(prof1[[3]])
# Estimates for the variables
get_estimates(prof1[[3]])
# Extracting data, e.g. for the 4-class solution
get_data(prof1) %>%
  filter(classes_number == 4) %>%
  arrange(id)
# Example of a case with unclear class membership
get_data(prof1) %>%
  filter(classes_number == 4, id == 6)
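# Preparing further analyses: one row per case with its assigned class
# (get_data() returns long-format data with one row per case and class;
# Class_prob numbers the classes, so Class_prob == 1 keeps one row per case)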
validation_data <- get_data(prof1) %>%
  filter(classes_number == 4) %>%
  filter(Class_prob == 1) %>%
  transmute(id, Class = factor(Class), broad_interest, enjoyment, self_efficacy) %>%
  arrange(id)
head(validation_data, 10)
# Comparison with another sample (cases 501 to 1000)
prof3 <- pisaUSA15_complete[501:1000, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  estimate_profiles(1:8)
prof3
compare_solutions(prof3, statistics = c("AIC", "BIC"))
# 2. Additional parameters
# Standardized variables
prof_z <- pisaUSA15_complete[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  scale() %>%
  estimate_profiles(1:6)
prof_z
# Imputation of missing data (starts from pisaUSA15, which still contains missing values)
prof_imp <- pisaUSA15[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  single_imputation() %>%
  estimate_profiles(1:6)
prof_imp
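# Model specification (variance-covariance structure across profiles)
# model 1: Equal variances, covariances fixed to 0
# model 2: Unequal variances, covariances fixed to 0
# model 3: Equal variances, equal covariances
# model 6: Unequal variances, unequal covariances
# (models 4 and 5 are only available when estimating with Mplus)
prof4 <- pisaUSA15_complete[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  estimate_profiles(1:6, models = c(1, 2, 3, 6))
# Display any warnings raised during estimation
warnings()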
prof4
compare_solutions(prof4, statistics = c("AIC", "BIC"))