Latent Profile Analysis (LPA) in R

Arndt Regorz, Dipl. Kfm. & M.Sc. Psychology, 08/25/2023

Researchers often need to uncover hidden structure in complex datasets. Latent Profile Analysis (LPA) addresses this by identifying latent subgroups (profiles) within a population based on a set of observed variables. This tutorial shows you how to run an LPA in R.


Understanding Latent Profile Analysis

Conceptual Framework:
Latent Profile Analysis is grounded in latent variable modeling, a statistical approach seeking to uncover unobservable (latent) constructs influencing observed variables. LPA specifically focuses on identifying distinct groups within a population, assuming that individuals within each group share similar response patterns across a set of observed variables.

Key Assumptions:
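LPA assumes that individuals are homogeneous within profiles and heterogeneous between profiles, and that the observed variables are normally distributed within each profile. The simplest specification additionally assumes that the observed variables are uncorrelated within a profile (local independence), so that profile membership accounts for their associations.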
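
To make these assumptions concrete, here is a minimal simulation sketch in base R. The two profiles, their means, and the common standard deviation are hypothetical values chosen purely for illustration; they do not come from the tutorial data. The sketch also assumes that the indicators are uncorrelated within each profile.

```r
set.seed(123)  # fixed seed so the sketch is reproducible

n_per_profile <- 200

# Hypothetical profile means on two indicators (illustrative values only)
means_low  <- c(x1 = 2, x2 = 2)
means_high <- c(x1 = 5, x2 = 6)
common_sd  <- 1  # same within-profile SD in both profiles

# Draw independent, normally distributed indicators for one profile
simulate_profile <- function(n, means, sd, label) {
  data.frame(
    profile = label,
    x1 = rnorm(n, mean = means[["x1"]], sd = sd),
    x2 = rnorm(n, mean = means[["x2"]], sd = sd)
  )
}

sim_data <- rbind(
  simulate_profile(n_per_profile, means_low,  common_sd, "low"),
  simulate_profile(n_per_profile, means_high, common_sd, "high")
)

# Heterogeneity between groups: the profile means differ clearly
aggregate(cbind(x1, x2) ~ profile, data = sim_data, FUN = mean)

# Within each profile the indicators are close to uncorrelated ...
within_cor <- sapply(split(sim_data, sim_data$profile),
                     function(d) cor(d$x1, d$x2))
within_cor

# ... but pooled over profiles they correlate strongly
pooled_cor <- cor(sim_data$x1, sim_data$x2)
pooled_cor
```

In this sketch the strong pooled correlation arises only from the mean differences between the two profiles, which is the kind of structure an LPA tries to recover.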

Methodological Steps

Data Preparation:
Select a set of observed variables relevant to the underlying constructs of interest, ensuring they are continuous and reflect meaningful aspects of the phenomenon under investigation.
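
A minimal data preparation sketch in base R might look as follows. The data frame `raw_data` and its column names are hypothetical and only serve as an illustration; listwise deletion and z-standardization are shown as two common options, not as required steps.

```r
# Hypothetical raw data: three continuous indicators plus one categorical variable
raw_data <- data.frame(
  interest  = c(3.2, 4.1, NA, 2.5, 3.8),
  enjoyment = c(2.9, 4.4, 3.1, NA, 3.6),
  efficacy  = c(3.0, 3.9, 2.7, 2.2, 4.0),
  gender    = factor(c("f", "m", "f", "m", "f"))
)

# Keep only the numeric (continuous) columns as candidate indicators
is_numeric <- vapply(raw_data, is.numeric, logical(1))
indicators <- raw_data[, is_numeric, drop = FALSE]

# Count missing values per indicator before deciding how to handle them
colSums(is.na(indicators))

# Listwise deletion: keep cases with complete data on all indicators
indicators_complete <- indicators[complete.cases(indicators), ]

# Optional: z-standardize so that all indicators share a common scale
indicators_z <- as.data.frame(scale(indicators_complete))
summary(indicators_z)
```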

Model Specification:
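Determine the number of latent profiles (classes) to be extracted from the data and choose an appropriate model for the observed variables. In practice, this mainly means deciding whether the variances and covariances of the observed variables are constrained to be equal across profiles or estimated freely for each profile.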
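
The choice of specification determines how many parameters have to be estimated. The following sketch counts the free parameters for the four variance-covariance specifications that appear in the R code further below (models 1, 2, 3 and 6). The helper `count_lpa_parameters()` is a hypothetical function written for this illustration and is not part of tidyLPA.

```r
# Count free parameters of a Gaussian mixture with a given specification.
# model 1: equal variances, covariances = 0
# model 2: unequal variances, covariances = 0
# model 3: equal variances, equal covariances
# model 6: unequal variances, unequal covariances
count_lpa_parameters <- function(n_profiles, n_vars, model) {
  n_means       <- n_profiles * n_vars          # one mean per variable and profile
  n_proportions <- n_profiles - 1               # profile proportions sum to 1
  n_cov_pairs   <- n_vars * (n_vars - 1) / 2    # unique covariances among variables
  n_var_cov <- switch(
    as.character(model),
    "1" = n_vars,
    "2" = n_profiles * n_vars,
    "3" = n_vars + n_cov_pairs,
    "6" = n_profiles * (n_vars + n_cov_pairs),
    stop("model must be 1, 2, 3 or 6")
  )
  n_means + n_proportions + n_var_cov
}

# Example: 4 profiles, 3 observed variables
param_counts <- sapply(c(1, 2, 3, 6),
                       function(m) count_lpa_parameters(4, 3, m))
param_counts
# 18 27 21 39
```

The less constrained specifications need considerably more parameters, which is one reason why they tend to require larger samples and produce estimation warnings more often.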

Parameter Estimation:
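Estimate the model parameters, typically by maximum likelihood, for each candidate number of profiles. Then compare the candidate models using fit indices and refine the specification as needed.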
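
To illustrate what parameter estimation involves, here is a minimal sketch of the EM algorithm for a mixture of two normal distributions with a single observed variable. It is a deliberately simplified illustration in base R with simulated, hypothetical data and freely estimated variances; it is not the routine that tidyLPA runs internally.

```r
set.seed(42)

# Simulated scores from two hypothetical profiles (illustrative values only)
y <- c(rnorm(150, mean = 0, sd = 1), rnorm(150, mean = 4, sd = 1))

# Starting values: split the sample at the median
mu     <- c(mean(y[y <= median(y)]), mean(y[y > median(y)]))
sigma  <- rep(sd(y), 2)
weight <- c(0.5, 0.5)

log_lik_old <- -Inf
for (iter in 1:500) {
  # E-step: posterior probability of each profile for every case
  dens <- cbind(weight[1] * dnorm(y, mu[1], sigma[1]),
                weight[2] * dnorm(y, mu[2], sigma[2]))
  post <- dens / rowSums(dens)

  # Log-likelihood at the current parameter values; stop when it no longer improves
  log_lik <- sum(log(rowSums(dens)))
  if (log_lik - log_lik_old < 1e-8) break
  log_lik_old <- log_lik

  # M-step: update proportions, means and SDs, weighted by the posteriors
  n_k    <- colSums(post)
  weight <- n_k / length(y)
  mu     <- colSums(post * y) / n_k
  sigma  <- sqrt(colSums(post * outer(y, mu, "-")^2) / n_k)
}

round(rbind(weight = weight, mean = mu, sd = sigma), 2)
```

The posterior probabilities in `post` correspond to the class probabilities that an LPA reports for each case.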

Model Fit Evaluation:
Assess model fit with information criteria such as AIC and BIC, where lower values indicate a better trade-off between fit and model complexity. Also consider the practical interpretability and theoretical relevance of the identified profiles.
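
As a small worked example, AIC and BIC can be computed from the log-likelihood, the number of free parameters and the sample size. The log-likelihood values below are hypothetical numbers made up for illustration; the parameter counts correspond to three observed variables with equal variances and zero covariances.

```r
# Hypothetical results for solutions with 1 to 4 profiles (n = 500 cases)
n_obs <- 500
fit_table <- data.frame(
  profiles = 1:4,
  log_lik  = c(-2100, -1950, -1900, -1895),
  n_params = c(6, 10, 14, 18)
)

# AIC = -2 * logLik + 2 * k;  BIC = -2 * logLik + log(n) * k
fit_table$AIC <- -2 * fit_table$log_lik + 2 * fit_table$n_params
fit_table$BIC <- -2 * fit_table$log_lik + log(n_obs) * fit_table$n_params
fit_table

# Solution preferred by each criterion (lower is better)
best_aic <- fit_table$profiles[which.min(fit_table$AIC)]
best_bic <- fit_table$profiles[which.min(fit_table$BIC)]
best_aic  # 4
best_bic  # 3
```

In this made-up example the two criteria disagree: BIC penalizes additional parameters more heavily and therefore prefers the more parsimonious three-profile solution. Such disagreements are one reason why interpretability should enter the decision as well.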

Practical Applications

Psychology and Behavioral Sciences:
Uncover distinct personality profiles or behavior patterns within a population. Explore underlying subgroups in clinical samples for personalized treatment strategies.

Education:
Identify different learning profiles among students and tailor educational interventions based on the identified profiles.

Market Research:
Segment consumers based on purchasing behavior and preferences. Customize marketing strategies for each identified consumer segment.

R Code for a Latent Profile Analysis

Here is the R code for the YouTube tutorial about LPA:

library(tidyLPA)
library(dplyr)

pisaUSA15_complete <- pisaUSA15[complete.cases(pisaUSA15), ]
summary(pisaUSA15_complete)

# 1. Basic Estimation

# Estimation of profiles
prof1 <- pisaUSA15_complete[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  estimate_profiles(1:6)
prof1
compare_solutions(prof1, statistics = c("AIC", "BIC"))

prof2 <- pisaUSA15_complete[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  estimate_profiles(6:8)
prof2
compare_solutions(prof2, statistics = c("AIC", "BIC"))

# More fit indices
get_fit(prof1)

# Plot of the profiles
plot_profiles(prof1, rawdata=FALSE)
plot_profiles(prof1[[6]], rawdata=FALSE)

# Distribution plot for a specific profile (density), e.g. 3 class solution
plot_density(prof1[[3]])
plot_bivariate(prof1[[3]])

# Estimates for the variables
get_estimates(prof1[[3]])

# Extracting data, e.g. 4 class solution
get_data(prof1) %>%
  filter(classes_number == 4) %>%
  arrange(id)

# Example with unclear grouping
get_data(prof1) %>%
  filter(classes_number == 4, id == 6)

# Preparing further analyses
# (the extracted data contain one row per case and class;
#  Class_prob == 1 keeps a single row per case)
validation_data <- get_data(prof1) %>%
  filter(classes_number == 4) %>%
  filter(Class_prob == 1) %>%
  transmute(id, Class = factor(Class), broad_interest, enjoyment, self_efficacy) %>%
  arrange(id)
head(validation_data, 10)

# Comparison with other sample
prof3 <- pisaUSA15_complete[501:1000, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  estimate_profiles(1:8)
prof3
compare_solutions(prof3, statistics = c("AIC", "BIC"))

# 2. Additional parameters

# Standardized variables
prof_z <- pisaUSA15_complete[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  scale() %>%
  estimate_profiles(1:6)
prof_z

# Imputation for missing data
prof_imp <- pisaUSA15[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  single_imputation() %>%
  estimate_profiles(1:6)
prof_imp

# Model specification

# model 1: Equal variances, covariances = 0
# model 2: Unequal variances, covariances = 0
# model 3: Equal variances, equal covariances
# model 6: Unequal variances, unequal covariances

prof4 <- pisaUSA15_complete[1:500, ] %>%
  select(broad_interest, enjoyment, self_efficacy) %>%
  estimate_profiles(1:6, models = c(1, 2, 3, 6))
warnings()
prof4
compare_solutions(prof4, statistics = c("AIC", "BIC"))