Getting Started with the SkipTrack Package

Model Overview

Welcome to the SkipTrack Package!

SkipTrack is a Bayesian hierarchical model for self-reported menstrual cycle length data on mobile health apps. The model is an extension of the hierarchical model presented in Li et al. (2022) that focuses on predicting an individual’s next menstrual cycle start date while accounting for cycle length inaccuracies introduced by non-adherence in user self-tracked data.

Li et al. (2022) note that apps designed to help users track their menstrual cycles “are subject to adherence artifacts that may obscure health-related conclusions: if a user forgets to track their period, their cycle length computations are inflated.” This is visualized in the illustration below, in which the numbers represent days after the initial bleeding day is recorded in the app, \(\color{red}{\text{red}}\) days are bleeding days recorded by the user, and \(\color{blue}{\text{blue}}\) days are bleeding days not recorded by the user.

\[\overbrace{\underbrace{\color{red}{1, 2, 3, 4}, 5, \dots, 29}_\text{True Cycle, 29 Days}}^\text{Recorded Cycle, 29 Days}, \overbrace{\underbrace{\color{red}{30, 31, 32, 33}, 34, \dots, 61}_\text{True Cycle, 32 Days}, \underbrace{\color{blue}{62, 63, 64, 65}, 66, \dots, 90}_\text{True Cycle, 29 Days}}^\text{Recorded Cycle, 61 Days}\]
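The arithmetic behind the illustration can be checked with a few lines of base R. This is only a sketch of the example above: the variable names are ours, and day 91 is assumed to be the first bleeding day of the cycle that follows the one shown.

```r
# First bleeding day of each true cycle in the illustration,
# plus the assumed start of the following cycle (day 91).
true_starts <- c(1, 30, 62, 91)
true_lengths <- diff(true_starts)

# The user does not record the bleeding that begins on day 62,
# so the app only sees the remaining start days.
recorded_starts <- true_starts[true_starts != 62]
recorded_lengths <- diff(recorded_starts)

print(true_lengths)      # lengths of the three true cycles
print(recorded_lengths)  # lengths the app would compute
```

The single missed period merges the 32-day and 29-day true cycles into one recorded cycle of 61 days.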

The SkipTrack model extends the model of Li et al. (2022) by specifying individual-level parameters for cycle length regularity as well as for cycle length mean, and by weakening the assumptions Li et al. make about the probability of failing to track a cycle.

In short, the modeling framework assumed by SkipTrack is as follows. The observed cycle lengths are denoted \(y_{ij}\), where \(1 \leq i \leq n\) indexes the individuals and \(1 \leq j \leq n_i\) indexes the \(n_i\) observations contributed by individual \(i\). We assume that

\[ y_{ij} \sim \text{LogNormal}\big(\mu_i + \log(c_{ij}), \tau_i\big), \] where \(\mu_i\) is an individual level mean parameter, \(\tau_i\) is an individual level precision parameter, and \(c_{ij}\) is an integer-valued parameter representing the number of true cycles present in the observed cycle \(y_{ij}\). That is, if \(c_{ij} = 1\) then \(y_{ij}\) is a true cycle, if \(c_{ij} = 2\) then \(y_{ij}\) gives the length of two true cycles added together, and so on.
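To make the role of \(c_{ij}\) concrete, here is a minimal base R sketch of this likelihood for a single individual. It assumes that \(\tau_i\) is the precision (inverse variance) on the log scale, so the log-scale standard deviation is \(1/\sqrt{\tau_i}\). The parameter values and the helper `draw_y` are hypothetical and are not part of the package.

```r
set.seed(1)

mu_i  <- log(30)  # hypothetical individual log-scale mean (median of 30 days)
tau_i <- 100      # hypothetical individual precision on the log scale

# Draw observed cycle lengths that each contain c_ij true cycles.
# Assumes precision = 1 / variance on the log scale.
draw_y <- function(n, c_ij, mu_i, tau_i) {
  rlnorm(n, meanlog = mu_i + log(c_ij), sdlog = 1 / sqrt(tau_i))
}

y_one <- draw_y(5000, c_ij = 1, mu_i, tau_i)  # true cycles
y_two <- draw_y(5000, c_ij = 2, mu_i, tau_i)  # one skipped period

# Adding log(c_ij) to the log-scale mean multiplies the median by c_ij.
median_ratio <- median(y_two) / median(y_one)
print(median_ratio)
```

Because \(\exp(\mu_i + \log c_{ij}) = c_{ij}\exp(\mu_i)\), the median of an observation that spans two true cycles is twice the median of a single true cycle.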

We then assume

\[ \mu_i \sim \text{Normal}(\mu, \rho) \mspace{100mu}\tau_i \sim \text{Gamma}(\theta, \phi) \]

where \(\rho\) is a precision parameter, and the Gamma distribution above is parameterized by its mean \(\theta\) and rate \(\phi\) (equivalently, shape \(\theta\phi\) and rate \(\phi\)).
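The individual-level layer can be sketched in base R as follows. The hyperparameter values are hypothetical, chosen only for illustration. The one translation step is that `rgamma` expects a shape and a rate, so a Gamma distribution with mean \(\theta\) and rate \(\phi\) is drawn with shape \(\theta\phi\).

```r
set.seed(2)

n     <- 2000      # number of individuals (hypothetical)
mu    <- log(30)   # population log-scale mean (hypothetical)
rho   <- 400       # precision of the individual means (hypothetical)
theta <- 150       # mean of the individual precisions (hypothetical)
phi   <- 0.1       # rate of the Gamma distribution (hypothetical)

# Normal with precision rho has standard deviation 1 / sqrt(rho).
mu_i <- rnorm(n, mean = mu, sd = 1 / sqrt(rho))

# Gamma with mean theta and rate phi has shape theta * phi.
tau_i <- rgamma(n, shape = theta * phi, rate = phi)

print(c(mean_mu_i = mean(mu_i), mean_tau_i = mean(tau_i)))
```

With these values the sample mean of `tau_i` should sit close to `theta`, which is a quick way to confirm that the mean and rate parameterization has been translated correctly.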

This is a fully interpretable model that identifies skipped cycles in tracking data while allowing each individual to have their own cycle regularity and accounting for uncertainty in the model. A paper discussing the full model details will be published soon.

Package Usage

The SkipTrack package provides functions for fitting the SkipTrack model, evaluating model run diagnostics, retrieving and visualizing model results, and simulating related data. We begin our tutorial by examining some simulated data.

library(skipTrack)

First, we simulate data on 100 individuals from the SkipTrack model where each observed \(y_{ij}\) value has a 75% probability of being a true cycle, a 20% probability of being two true cycles recorded as one, and a 5% probability of being three true cycles recorded as one.

#Simulate data
dat <- skipTrack.simulate(n = 100, model = 'skipTrack', skipProb = c(.75, .2, .05))

names(dat)
#> [1] "Y"          "cluster"    "X"          "Z"          "Beta"      
#> [6] "Gamma"      "NumTrue"    "Underlying"

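The result of the simulation function is simply a named list with various components. The components used in this tutorial are Y, the observed cycle lengths, and cluster, which identifies the individual who contributed each observation.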

Looking at the histogram of dat$Y, we can see a clear mixture of at least two distributions, one centered around 30 days, and another centered near 60 days (corresponding to the true cycles and observed cycles containing two true cycles respectively), which is what we expect based on our generation.

#Histogram of observed outcomes
hist(dat$Y, breaks = 10:150)
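The mixture visible in the histogram can be mimicked in base R without the package. The sketch below is a simplified stand-in, not a reimplementation of skipTrack.simulate: it uses the same skip probabilities but gives every individual the same hypothetical log-scale mean (log(30)) and standard deviation (0.1), so it ignores the individual-level variation of the full model.

```r
set.seed(3)

skip_prob <- c(0.75, 0.20, 0.05)  # P(c = 1), P(c = 2), P(c = 3)
n_obs     <- 10000

# Number of true cycles contained in each observed cycle.
c_ij <- sample(1:3, n_obs, replace = TRUE, prob = skip_prob)

# Observed lengths: a shared log-scale mean shifted by log(c_ij).
y <- rlnorm(n_obs, meanlog = log(30) + log(c_ij), sdlog = 0.1)

share_by_c  <- prop.table(table(c_ij))  # observed share of each c value
median_by_c <- tapply(y, c_ij, median)  # centre of each mixture component

print(round(share_by_c, 3))
print(round(median_by_c, 1))
```

The component medians fall near 30, 60, and 90 days. The third component holds only about 5% of the observations, which is why the histogram shows two clear modes and only a faint third.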

Fitting the SkipTrack model using this simulated data requires a call to the function skipTrack.fit. Note that because this is a Bayesian model and is fit with an MCMC algorithm, it can take some time with large datasets and a high number of MCMC reps and chains.

In this code we ask for 4 chains, each with 1000 iterations, run sequentially. Note that we recommend allowing the sampler to run longer than this (usually at least 5000 iterations per chain), but we use a short run here to save time.

If useParallel = TRUE, the MCMC chains will be evaluated in parallel, which helps with longer runs.

ft <- skipTrack.fit(Y = dat$Y, cluster = dat$cluster,
                    reps = 1000, chains = 4, useParallel = FALSE)

Once we have the model results we are able to examine model diagnostics, visualize results from the model, and view a model summary.

Diagnostics

Multivariate, multichain MCMC diagnostics, including traceplots, Gelman-Rubin diagnostics, and effective sample size, are all available for various parameters from the model fit. These are supplied by the genMCMCDiag package; see that package’s documentation for details.

Here we show the output of the diagnostics on the \(c_{ij}\) parameters, which show that (at least for the \(c_{ij}\) values) the algorithm is mixing effectively (or will be, once the algorithm runs a little longer).

skipTrack.diagnostics(ft, param = 'cijs')

#> ----------------------------------------------------
#> Generalized MCMC Diagnostics using lanfear Method 
#> ----------------------------------------------------
#> 
#> |Effective Sample Size: 
#> |---------------------------
#> | Chain 1| Chain 2| Chain 3| Chain 4|     Sum|
#> |-------:|-------:|-------:|-------:|-------:|
#> |  86.077|    81.6|  91.054| 114.178| 372.909|
#> 
#> |Gelman-Rubin Diagnostic: 
#> |---------------------------
#> | Point est.| Upper C.I.|
#> |----------:|----------:|
#> |      1.001|      1.005|
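For readers unfamiliar with the Gelman-Rubin diagnostic, the sketch below computes the classical version for a single scalar parameter in base R. It is not the generalized procedure that genMCMCDiag applies to the \(c_{ij}\) values, and the function name `gelman_rubin` and the toy chains are ours. It only illustrates why values near 1 indicate that the chains agree with each other.

```r
# Classical Gelman-Rubin statistic for one scalar parameter.
# chains: numeric matrix, one row per iteration, one column per chain.
gelman_rubin <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))  # average within-chain variance
  B <- n * var(colMeans(chains))    # between-chain variance
  var_hat <- (n - 1) / n * W + B / n
  sqrt(var_hat / W)
}

set.seed(4)
# Four toy chains of 1000 draws that all target the same distribution.
mixed <- matrix(rnorm(4000), ncol = 4)
# Same draws, but the fourth chain is shifted away from the others.
stuck <- mixed + rep(c(0, 0, 0, 3), each = 1000)

rhat_mixed <- gelman_rubin(mixed)
rhat_stuck <- gelman_rubin(stuck)
print(c(mixed = rhat_mixed, stuck = rhat_stuck))
```

When all chains explore the same distribution, the between-chain variance is small and the statistic stays close to 1. A chain stuck elsewhere inflates it well above 1.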

Visualization

To see the key plots for a SkipTrack model fit, simply use plot(ft); the individual plots are directly accessible using skipTrack.visualize(ft).

plot(ft)

Summary

A summary is available for the SkipTrack model fit with summary(ft), with more detailed results accessible through skipTrack.results(ft). Importantly, these results are based on a default chain burn-in value of 750 draws. This can be changed with the burnIn argument of either function.

summary(ft)
#> ----------------------------------------------------
#> Summary of skipTrack.fit using skipTrack model
#> ----------------------------------------------------
#> Mean Coefficients: 
#> 
#>             Estimate       95% CI Lower 95% CI Upper
#> (Intercept)    3.406              3.376        3.436
#> 
#> ----------------------------------------------------
#> Precision Coefficients: 
#> 
#>             Estimate       95% CI Lower 95% CI Upper
#> (Intercept)     5.36              5.134        5.593
#> 
#> ----------------------------------------------------
#> Diagnostics: 
#> 
#>        Effective Sample Size       Gelman-Rubin
#> Betas                 4004.0                  1
#> Gammas                  21.8                  1
#> cijs                   351.1                  1
#> 
#> ----------------------------------------------------
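One way to read the Mean Coefficients intercept is to move it back to the scale of days. The sketch below assumes that this intercept is on the log scale of the LogNormal model (that is, that it plays the role of \(\mu\)), which is consistent with the simulated histogram being centered around 30 days. The numbers are copied from the summary output above.

```r
# Intercept and 95% interval from the summary, assumed to be on the log scale.
mean_coef <- c(estimate = 3.406, lower = 3.376, upper = 3.436)

# Exponentiate to express the typical (median) true cycle length in days.
days <- exp(mean_coef)
print(round(days, 1))
```

Under that assumption the fitted typical cycle length is roughly 30 days, matching the first mode of the simulated data.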

summary(ft, burnIn = 500)
#> ----------------------------------------------------
#> Summary of skipTrack.fit using skipTrack model
#> ----------------------------------------------------
#> Mean Coefficients: 
#> 
#>             Estimate       95% CI Lower 95% CI Upper
#> (Intercept)    3.407              3.378        3.437
#> 
#> ----------------------------------------------------
#> Precision Coefficients: 
#> 
#>             Estimate       95% CI Lower 95% CI Upper
#> (Intercept)    5.342              5.125        5.569
#> 
#> ----------------------------------------------------
#> Diagnostics: 
#> 
#>        Effective Sample Size       Gelman-Rubin
#> Betas                4004.00                  1
#> Gammas                 21.77                  1
#> cijs                  460.23                  1
#> 
#> ----------------------------------------------------
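The effect of the burn-in setting can be illustrated with a toy chain in base R. This is a sketch only: it assumes that burn-in means discarding the first burnIn draws of each chain, and the helper `drop_burn_in` and the simulated chain are hypothetical rather than taken from the package.

```r
set.seed(5)
reps <- 1000

# A toy chain that starts far from its target value of 3.4 and then settles.
chain <- 3.4 + 2 * exp(-(1:reps) / 50) + rnorm(reps, sd = 0.02)

# Keep only the draws after the first `burnIn` iterations.
drop_burn_in <- function(draws, burnIn) {
  draws[seq_along(draws) > burnIn]
}

kept_750 <- drop_burn_in(chain, burnIn = 750)
kept_500 <- drop_burn_in(chain, burnIn = 500)

print(c(all_draws = mean(chain),
        burn_500  = mean(kept_500),
        burn_750  = mean(kept_750)))
print(c(n_750 = length(kept_750), n_500 = length(kept_500)))
```

A smaller burn-in keeps more draws per chain for estimation, but it risks including early draws from before the chain has settled. Here the mean over all draws is pulled away from 3.4 by the initial transient.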

This introduction provides enough information to start fitting the SkipTrack model. For further information regarding different methods of simulating data, additional model fitting, and tuning parameters for fitting the model, please see the help pages. Additional vignettes are forthcoming.

Bibliography

Li, Kathy, Iñigo Urteaga, Amanda Shea, Virginia J Vitzthum, Chris H Wiggins, and Noémie Elhadad. 2022. “A Predictive Model for Next Cycle Start Date That Accounts for Adherence in Menstrual Self-Tracking.” Journal of the American Medical Informatics Association 29 (1): 3–11.