Co-occurrence networks analyze binary indicator data: situations where multiple states can be active (1) or inactive (0) simultaneously. They differ from transition networks in what they capture. Co-occurrence networks capture contemporaneous relationships (which states tend to occur together?), whereas transition networks capture sequences (which states follow one another?).
Nestimate provides two methods for binary data:
- Co-occurrence networks (method = "co_occurrence" or "cna") count how often pairs of states are both active at the same time point.
- Ising networks (method = "ising") use L1-regularized logistic regression to estimate conditional dependencies, producing sparse networks.
This vignette demonstrates both methods using the learning_activities dataset: binary indicators of 6 learning activities across 200 students and 30 time points.
The learning_activities dataset contains 6,000
observations (200 students x 30 time points). At each time point, each
of 6 learning activities is either active (1) or inactive (0).
library(Nestimate)
data(learning_activities)
head(learning_activities, 10)
#> student Reading Video Forum Quiz Coding Review
#> 1 1 1 0 0 1 0 0
#> 2 1 1 0 0 1 1 0
#> 3 1 1 0 0 0 1 1
#> 4 1 1 0 0 1 1 1
#> 5 1 1 1 0 0 1 0
#> 6 1 1 1 1 0 1 1
#> 7 1 1 1 1 0 1 1
#> 8 1 1 1 1 0 1 1
#> 9 1 1 1 1 0 1 1
#> 10 1 0 1 1 0 1 1
The 6 activities are:
activities <- c("Reading", "Video", "Forum", "Quiz", "Coding", "Review")
colSums(learning_activities[, activities])
#> Reading Video Forum Quiz Coding Review
#> 2542 2622 2424 2273 2488 2594
Multiple activities can be active simultaneously.
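One way to check this is to count the active indicators in each row. The sketch below uses a small made-up data frame (`toy`, hypothetical, not the packaged dataset) so that it runs without Nestimate; the same calls should apply to `learning_activities[, activities]`.

```r
# Toy binary indicator data (hypothetical values, for illustration only)
toy <- data.frame(
  Reading = c(1, 1, 0, 1, 0),
  Video   = c(0, 1, 0, 1, 0),
  Forum   = c(0, 1, 1, 0, 0)
)

# Number of activities active at each time point (one row per time point)
n_active <- rowSums(toy)
n_active

# Distribution of the number of simultaneously active activities
table(n_active)

# Share of time points with two or more activities active at once
mean(n_active >= 2)
```

Rows with a count of 2 or more are the ones that contribute to co-occurrence edges.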
Co-occurrence networks count how often pairs of states are both active at the same time point. An edge between A and B indicates they frequently co-occur.
net_cna <- build_network(learning_activities,
                         method = "co_occurrence",
                         codes = activities,
                         actor = "student")
net_cna
#> Co-occurrence Network [undirected]
#> Weights: [2681.000, 3290.000] | mean: 3047.333
#>
#> Weight matrix:
#> Reading Video Forum Quiz Coding Review
#> Reading 6100 3169 3211 2891 3138 3113
#> Video 3169 6262 3183 2903 3278 3290
#> Forum 3211 3183 5692 2942 2958 3045
#> Quiz 2891 2903 2942 5379 2681 2725
#> Coding 3138 3278 2958 2681 5886 3183
#> Review 3113 3290 3045 2725 3183 6186
Interpretation: Edge weights represent raw co-occurrence counts summed across all students and time points. Higher weights mean those activities frequently happen together.
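For intuition, same-time-point co-occurrence counts can be reproduced in base R as a cross-product of the indicator matrix. This is a minimal sketch on made-up data (`X` is hypothetical). It is not necessarily how `build_network()` computes its weights internally, so the numbers need not match the package output.

```r
# Toy indicator matrix: rows are time points, columns are states (hypothetical)
X <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

# t(X) %*% X: entry [i, j] counts the rows where states i and j are both 1
cooc <- crossprod(X)
cooc

# In this plain cross-product the diagonal equals each state's frequency
all(diag(cooc) == colSums(X))
```

Because the indicators are 0/1, multiplying two columns element by element gives 1 only where both states are active, so summing those products yields the pairwise count.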
Raw counts can be misleading—frequent activities will have high co-occurrence simply because they’re common. Normalizing by expected co-occurrence (under independence) reveals associations beyond base rates.
# View the co-occurrence matrix
round(net_cna$weights, 0)
#> Reading Video Forum Quiz Coding Review
#> Reading 6100 3169 3211 2891 3138 3113
#> Video 3169 6262 3183 2903 3278 3290
#> Forum 3211 3183 5692 2942 2958 3045
#> Quiz 2891 2903 2942 5379 2681 2725
#> Coding 3138 3278 2958 2681 5886 3183
#> Review 3113 3290 3045 2725 3183 6186
The diagonal shows how often each activity occurs (self-co-occurrence = frequency).
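The normalization by expected co-occurrence can be sketched in base R as an observed-to-expected ratio. Under independence, the expected count for a pair is n * p_i * p_j, where p_i is the proportion of time points in which state i is active. The data and the names `X` and `ratio` are illustrative assumptions, and the computation is done outside Nestimate rather than through a package option.

```r
# Toy indicator matrix: rows are time points, columns are states (hypothetical)
X <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    0, 0, 1,
    1, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

n <- nrow(X)
observed <- crossprod(X)        # raw co-occurrence counts
p <- colMeans(X)                # base rate of each state
expected <- n * outer(p, p)     # expected counts if states were independent

# Ratio > 1: pair co-occurs more often than base rates alone would predict
ratio <- observed / expected
diag(ratio) <- NA               # self-pairs are not meaningful here
round(ratio, 2)
```

A ratio near 1 suggests the raw count is explained by how common the two states are; values well above or below 1 point to association or avoidance.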
You can aggregate across time windows before computing co-occurrence. This captures activities that occur in the same temporal neighborhood, not just the exact same time point:
net_cna_windowed <- build_network(learning_activities,
                                  method = "co_occurrence",
                                  codes = activities,
                                  actor = "student",
                                  window_size = 10)
net_cna_windowed
#> Co-occurrence Network [undirected]
#> Weights: [9096.000, 11181.000] | mean: 10230.667
#>
#> Weight matrix:
#> Reading Video Forum Quiz Coding Review
#> Reading 16894 10639 10603 9735 10493 10699
#> Video 10639 17012 10511 9710 10981 11181
#> Forum 10603 10511 15090 9777 9992 10353
#> Quiz 9735 9710 9777 14495 9123 9096
#> Coding 10493 10981 9992 9123 15956 10567
#> Review 10699 11181 10353 9096 10567 16696
Larger windows capture broader temporal associations at the cost of temporal precision.
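To illustrate the idea of windowing, the sketch below collapses consecutive time points into non-overlapping windows and marks a state as active in a window if it was active at any point inside it. Both choices are assumptions made for illustration: the text above does not specify how `window_size` aggregates internally, and `collapse_windows()` is a hypothetical helper, not a Nestimate function.

```r
# Hypothetical helper: collapse rows into non-overlapping windows
collapse_windows <- function(X, window_size) {
  window_id <- ceiling(seq_len(nrow(X)) / window_size)
  # rowsum() adds up the rows within each window; > 0 means "active at least once"
  (rowsum(X, group = window_id) > 0) * 1
}

# Toy sequence for a single actor: A and B alternate but never overlap exactly
X <- matrix(
  c(1, 0,
    0, 1,
    1, 0,
    0, 0,
    0, 1,
    0, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("A", "B"))
)

exact    <- crossprod(X)                       # same-time-point co-occurrence
windowed <- crossprod(collapse_windows(X, 2))  # windows of 2 time points

exact["A", "B"]
windowed["A", "B"]
```

With several actors, the windows would need to be formed within each actor's sequence so that a window never spans two students.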
Ising networks use L1-regularized logistic regression to estimate conditional dependencies between binary variables. Each variable is regressed on all others, and the resulting coefficients form the network edges.
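The nodewise idea (regress each variable on all others and collect the coefficients) can be sketched with plain `glm()`. This is a simplified stand-in: it omits the L1 penalty and the model selection step, so unlike a regularized fit it does not shrink weak coefficients to exactly zero. The simulated data and all object names are assumptions for illustration.

```r
set.seed(1)
n <- 500
A <- rbinom(n, 1, 0.5)
B <- rbinom(n, 1, plogis(-1 + 2 * A))   # B depends on A
C <- rbinom(n, 1, 0.5)                  # C is independent of A and B
dat <- data.frame(A, B, C)

vars <- names(dat)
coefs <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))

# Regress each variable on all the others (unpenalized logistic regression)
for (v in vars) {
  fit <- glm(reformulate(setdiff(vars, v), response = v),
             data = dat, family = binomial())
  est <- coef(fit)[-1]            # drop the intercept
  coefs[v, names(est)] <- est     # row = outcome, columns = predictors
}

round(coefs, 2)
```

The resulting matrix is generally not symmetric: the coefficient for A predicting B comes from a different regression than the one for B predicting A.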
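Key advantages over simple co-occurrence: edges reflect conditional dependencies, that is, associations that remain after accounting for all other variables, and the L1 penalty shrinks weak associations to zero, which yields a sparse network.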
# Aggregate to student-level summaries for Ising (requires cross-sectional data)
student_summary <- aggregate(learning_activities[, activities],
                             by = list(student = learning_activities$student),
                             FUN = function(x) as.integer(mean(x) > 0.5))
student_summary <- student_summary[, -1]  # Remove student column
net_ising <- build_network(student_summary,
                           method = "ising",
                           params = list(gamma = 0.25))
net_ising
#> Ising Model Network [undirected]
#> Sample size: 200
#>
#> Weight matrix:
#> Reading Video Forum Quiz Coding Review
#> Reading 0 0 0 0 0 0
#> Video 0 0 0 0 0 0
#> Forum 0 0 0 0 0 0
#> Quiz 0 0 0 0 0 0
#> Coding 0 0 0 0 0 0
#> Review 0 0 0 0 0 0
#>
#> Gamma: 0.25 | Rule: AND
#> Thresholds: [-0.895, -0.532]
The gamma parameter controls sparsity via EBIC model selection:
- gamma = 0: Less sparse (BIC-like selection)
- gamma = 0.25: Moderate sparsity (default)
- gamma = 0.5: More sparse
Ising estimation produces asymmetric coefficients (A predicting B may differ from B predicting A). The rule parameter controls symmetrization:
- "AND" (default): Keep edge only if both directions are non-zero
- "OR": Keep edge if either direction is non-zero
net_and <- build_network(student_summary, method = "ising",
                         params = list(gamma = 0.25, rule = "AND"))
net_or <- build_network(student_summary, method = "ising",
                        params = list(gamma = 0.25, rule = "OR"))
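# Count each undirected edge once (upper triangle of the symmetric weight matrix)
cat("AND rule edges:", sum(net_and$weights[upper.tri(net_and$weights)] != 0), "\n")
#> AND rule edges: 0
cat("OR rule edges:", sum(net_or$weights[upper.tri(net_or$weights)] != 0), "\n")
#> OR rule edges: 0
"AND" is more conservative; "OR" retains more edges. In this example both rules return an empty network, so the difference does not show up in the counts.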
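The effect of the two rules is easier to see on a small asymmetric coefficient matrix. The sketch below uses made-up coefficients and a hypothetical helper, `symmetrize()`; averaging the two directional estimates is an assumption chosen for illustration and may differ from what Nestimate does internally.

```r
# Hypothetical nodewise coefficients: row = outcome, column = predictor
coefs <- matrix(
  c(0.0, 0.8, 0.0,
    0.6, 0.0, 0.3,
    0.0, 0.0, 0.0),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

symmetrize <- function(coefs, rule = c("AND", "OR")) {
  rule <- match.arg(rule)
  nonzero <- coefs != 0
  keep <- if (rule == "AND") nonzero & t(nonzero) else nonzero | t(nonzero)
  # Average the two directional estimates, then zero out dropped edges
  W <- (coefs + t(coefs)) / 2
  W[!keep] <- 0
  W
}

w_and <- symmetrize(coefs, "AND")
w_or  <- symmetrize(coefs, "OR")

# Count each undirected edge once via the upper triangle
n_and <- sum(w_and[upper.tri(w_and)] != 0)
n_or  <- sum(w_or[upper.tri(w_or)] != 0)
c(AND = n_and, OR = n_or)
```

Here the A-B edge survives under both rules because both directions are non-zero, while the B-C edge is non-zero in one direction only and is kept under "OR" alone.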