PK(PD) dataset assembly with the apmx library

Prepare workspace and load data

This package contains randomly-generated source data for instructional purposes.

library(apmx)
library(dplyr)
library(tidyr)

EX <- as.data.frame(EX)
PC <- as.data.frame(PC)
DM <- as.data.frame(DM)
LB <- as.data.frame(LB)

Background

Clinical trial data is not collected in a way that automatically suits population pharmacometric work. Trial data is organized in a collection of datasets, one dataset per data type. These datasets are often called “domains”.

The FDA and other regulatory agencies require domains be formatted per CDISC standards for submission. There are two main types of CDISC datasets:

SDTM: study data tabulation model (a simple, organized, line-listing of each data point) https://www.cdisc.org/standards/foundational/sdtm
ADaM: analysis data model (datasets with derived values for analysis purposes) https://www.cdisc.org/standards/foundational/adam

Here are some examples of common CDISC SDTM domains (as they relate to pharmacometrics):

ex: exposure (data about administered and planned doses)
pc: pharmacokinetics (data about pharmacokinetic samples)
dm: demographics (general metadata about the subject)
lb: laboratory (chemistry, hematology, lipid, and other lab panel results)
vs: vital signs (height, weight, BMI, and other clinical tests)
cm: concomitant medications (additional medications taken prior to, during, and/or after treatment)
ae: adverse events (any untoward medical event that occurs after signing informed consent while on trial)
eg: EKG (ECG) readings
tr: tumor response (RECIST 1.1 or other tumor measurements)
rs: response (other response measurements, such as OS, PFS, etc.)

There are many other types of SDTM domains. Technically, the number of domains is unlimited, since you can create your own custom domains.

For every SDTM domain, there is usually an ADaM equivalent. All ADaM domain names start with ad, followed by the SDTM domain name:

adex: ADaM version of ex

Some domains are specific to ADaM:

adsl: subject-level (a compilation of many important variables, one row per subject)

Even though this data is well organized, there is no CDISC format for use in NONMEM or other population pharmacometric software. That is why we built an R package, apmx, to provide tools for building population PK(PD) datasets.

This training will walk you through the R package and help you learn about pharmacometric data. The data loaded above are randomly-generated SDTM-like datasets to support training. They are based on a simple study design:

IP: ABC999, oral tablet formulation
Study drug administered twice, on D1 and D15
Serial PK samples collected following both doses
Additional domains DM and LB are also provided

Currently, the package is limited to PK and PKPD datasets for analysis in NONMEM only. Additional tools for PK(PD) datasets, plus tools for other analysis types (TTE, logistic regression, QTc analysis), are under development and not available at this time. Datasets for analysis with other software, such as Monolix, are also unavailable at this time.

Dose event preparation

PK dataset assembly starts with preparing dose events. Dose events require several columns for assembly. Below are the apmx standard names, along with the typical SDTM name equivalent when applicable. Other variables, like DUR (infusion duration), may be required based on the analysis.

USUBJID: subject ID [character]
DTIM (EXSTDTC): date-time of dose administration [character]
VISIT: character visit label [character]
NDAY (EXSTDY): study day [numeric]
TPTC (EXTPT): dose timepoint label [character]
TPT (EXTPTNUM): dose timepoint [numeric]
CMT: assigned compartment for dose events [numeric]
AMT (EXDOSE): amount of drug administered [numeric]
DVID (EXTRT): dose event label [character]
ROUTE (EXROUTE): route of administration [character]
FRQ (EXDOSFRQ): dose frequency [character]
DVIDU (EXDOSU): dose units [character]

The analyst must confirm the ex domain contains all of this information for the package to work. This dataset contains all of the information we need except the compartment. CMT must always be programmed by the user based on the model design. In this case, CMT = 1 for the dose depot. We will also select only the columns that we need for the analysis, dropping the others.

ex <- EX %>%
  dplyr::mutate(CMT = 1) %>%
  dplyr::select(USUBJID, STUDYID, EXSTDTC, VISIT, EXSTDY, EXTPTNUM, EXDOSE,
                CMT, EXTRT, EXTPT, EXROUTE, EXDOSFRQ, EXDOSU)

That’s all we have to do to prepare the dose events for assembly.
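Before calling pk_build(), it can help to confirm that each required dose variable is present under either its apmx name or its SDTM name. The sketch below is only an illustration: missing_vars() and the toy data frame are hypothetical and not part of apmx, and the name pairs are copied from the list above.

```r
# Hypothetical pre-check, not part of apmx. Each required variable may appear
# under its apmx name or under the SDTM equivalent listed above.
required_ex <- list(
  USUBJID = "USUBJID",
  DTIM    = c("DTIM", "EXSTDTC"),
  VISIT   = "VISIT",
  NDAY    = c("NDAY", "EXSTDY"),
  TPTC    = c("TPTC", "EXTPT"),
  TPT     = c("TPT", "EXTPTNUM"),
  CMT     = "CMT",
  AMT     = c("AMT", "EXDOSE"),
  DVID    = c("DVID", "EXTRT"),
  ROUTE   = c("ROUTE", "EXROUTE"),
  FRQ     = c("FRQ", "EXDOSFRQ"),
  DVIDU   = c("DVIDU", "EXDOSU")
)

# Return the apmx names that have no matching column in df
missing_vars <- function(df, required) {
  present <- vapply(required, function(alts) any(alts %in% names(df)), logical(1))
  names(required)[!present]
}

# Toy dose record (made-up values) that has not been assigned a compartment
toy_ex <- data.frame(
  USUBJID = "SUBJ-001", EXSTDTC = "2022-01-01T08:00", VISIT = "Day 1",
  EXSTDY = 1, EXTPT = "Dose", EXTPTNUM = 0, EXDOSE = 100, EXTRT = "ABC999",
  EXROUTE = "oral", EXDOSFRQ = "QD", EXDOSU = "mg", stringsAsFactors = FALSE
)

missing_vars(toy_ex, required_ex)   # "CMT"
toy_ex$CMT <- 1
missing_vars(toy_ex, required_ex)   # character(0)
```

A check like this only confirms that columns exist; pk_build() still validates types and values itself.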

PK observation event preparation

Now, we are going to prepare the PK observations. Observation events require several columns for assembly:

USUBJID: subject ID [character]
DTIM (PCDTC): date-time of observation [character]
VISIT: character visit label [character]
NDAY (PCDY): study day [numeric]
TPTC (PCTPT): observation timepoint label [character]
TPT: observation timepoint [numeric]
CMT: assigned compartment for observation events [numeric]
ODV (PCSTRESN): observation value in original units [numeric]
LLOQ (PCLLOQ): observation lower limit of quantification [numeric]
DVID (PCTEST): observation label [character]
DVIDU (PCSTRESU): observation units [character]

The PC domain may have multiple DVIDs and CMTs, perhaps for multiple analytes. Once again, we need to confirm our dataset has all of this information. Are any variables missing?

CMT is missing; we will set CMT = 2 for the central compartment
TPT is missing; there is no numeric timepoint in the source data
We will have to derive both ourselves

pc <- PC %>%
  dplyr::filter(PCSTAT=="Y") %>%
  dplyr::mutate(CMT = 2,
                TPT = dplyr::case_when(PCTPT=="<1 hour Pre-dose" ~ 0,
                                       PCTPT=="30 minutes post-dose" ~ 0.5/24,
                                       PCTPT=="1 hour post-dose" ~ 1/24,
                                       PCTPT=="2 hours post-dose" ~ 2/24,
                                       PCTPT=="4 hours post-dose" ~ 4/24,
                                       PCTPT=="6 hours post-dose" ~ 6/24,
                                       PCTPT=="8 hours post-dose" ~ 8/24,
                                       PCTPT=="12 hours post-dose" ~ 12/24,
                                       PCTPT=="24 hours post-dose" ~ 24/24,
                                       PCTPT=="48 hours post-dose" ~ 48/24)) %>%
  dplyr::select(USUBJID, PCDTC, PCDY, VISIT, TPT, PCSTRESN,
                PCLLOQ, CMT, PCTEST, PCTPT, PCSTRESU)

That’s all we have to do to prepare the observation events for assembly.
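When there are many timepoint labels, writing one case_when() line per label becomes tedious. As an alternative sketch, the labels can be parsed. The helper label_to_days() below is hypothetical (not part of apmx) and assumes every label contains exactly one number followed by "minute(s)" or "hour(s)", and that any pre-dose label maps to 0, as in the code above.

```r
# Hypothetical helper, not part of apmx: convert a timepoint label to days.
label_to_days <- function(label) {
  # first number that appears in the label
  value <- as.numeric(sub("^[^0-9]*([0-9.]+).*$", "\\1", label))
  # number of label units per day: minutes or hours
  per_day <- ifelse(grepl("minute", label), 24 * 60, 24)
  # pre-dose samples are assigned time 0, matching the case_when() above
  ifelse(grepl("pre-dose", label, ignore.case = TRUE), 0, value / per_day)
}

labels <- c("<1 hour Pre-dose", "30 minutes post-dose",
            "1 hour post-dose", "48 hours post-dose")
data.frame(PCTPT = labels, TPT = label_to_days(labels))
```

Labels that do not contain a number come back as NA with a coercion warning, so the result should be checked before it is passed on.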

Simple dataset assembly

We have all of the information we need to build a simple PK dataset. Building a dataset is easy to do with apmx. Just feed the ex and pc domains into apmx::pk_build()!

df_simple <- apmx::pk_build(ex = ex, pc = pc)

This function does a lot! Let’s break down the new variables:

C: this flag comments out problematic records flagged by PDOSEF, TIMEF, AMTF, or DUPF
NSTUDY: numeric version of STUDYID
SUBJID: numeric version of USUBJID
ID: numeric version of USUBJID (counting from 1)
ATFD: actual time since first dose
ATLD: actual time since last dose
NTFD: nominal time since first dose
NTLC: nominal time since last cycle
NTLD: nominal time since last dose
EVID: event ID (NONMEM-required)
MDV: missing dependent variable (NONMEM-required)
DVID: numeric version of DVID
LDV: log-transformed ODV
BLQ: below-limit of quantification flag
DOSENUM: dose number (counting from 1)
DOSEA: most recent administered dose amount
NROUTE: numeric version of ROUTE
NFRQ: numeric version of FRQ
PDOSEF: flag for records that occur prior to the first dose
TIMEF: flag for records where ATFD = NA
AMTF: flag for dose events where AMT = NA
DUPF: flag for duplicated records (same USUBJID, ATFD, EVID, and CMT)
NOEXF: flag for subjects with no dose events
NODV1F: flag for subjects with no observations where DVID = 1
SDF: flag for single-dose subjects
PLBOF: flag for placebo records
SPARSEF: flag for records associated with sparse sampling
TREXF: flag for dose records occurring after the last observation
IMPEX: flag for records impacted by a dose event with imputed time
IMPDV: flag for an observation record with an imputed time
LINE: dataset row number
NSTUDYC: character version of STUDYID
DOMAIN: original domain of event
DVIDC: character version of DVID
TIMEU: time units of time variables
NROUTEC: character version of ROUTE
NFRQC: character version of FRQ
FDOSE: date-time of first dose
VERSN: apmx package version
BUILD: date of dataset creation
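To make the time variables more concrete, here is a minimal base R sketch of how an actual time since first dose could be derived from ISO 8601 date-times. It is an illustration on made-up records, not the internal code of pk_build(); the toy column values and the rounding to 3 decimals are assumptions.

```r
# Toy events for two subjects: EVID = 1 is a dose, EVID = 0 is an observation
events <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S2", "S2"),
  EVID    = c(1, 0, 0, 1, 0),
  DTIM    = c("2022-01-01T08:00", "2022-01-01T09:00", "2022-01-02T08:00",
              "2022-01-03T10:00", "2022-01-03T22:00"),
  stringsAsFactors = FALSE
)

# Seconds since the epoch for every event
secs <- as.numeric(as.POSIXct(events$DTIM, format = "%Y-%m-%dT%H:%M", tz = "UTC"))

# Time of each subject's first dose, repeated on every row of that subject
dose_secs  <- ifelse(events$EVID == 1, secs, NA)
first_dose <- ave(dose_secs, events$USUBJID, FUN = function(x) min(x, na.rm = TRUE))

# Actual time since first dose, in days
events$ATFD <- round((secs - first_dose) / 86400, 3)
events$ATFD   # 0.000 0.042 1.000 0.000 0.500
```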

pk_build() has optional parameters that customize the output dataset. The options that affect a simple dataset are shown below in their default state:

df_simple <- apmx::pk_build(ex = ex, #dataframe of prepared dose events
                            pc = pc, #dataframe of prepared pc observation events
                            time.units = "days", #can be set to days or hours.
                            #NOTE: units of TPT in ex and pc should match this unit
                            cycle.length = NA, #must be in units of days, will reset NTLC to 0
                            na = -999, #replaces missing nominal times and covariates with a numeric value
                            time.rnd = NULL, #rounds all time values to x decimal places
                            amt.rnd = NULL, #rounds calculated dose values to x decimal places
                            dv.rnd = NULL, #rounds observation columns to x decimal places
                            impute = NA, #imputation method for missing times
                            sparse = 3) #threshold for calculating sparse/serial distinctions

We recommend setting time.rnd = 3 to make the dataset easier to read.

df_simple <- apmx::pk_build(ex, pc, time.rnd = 3)

Sometimes, you will want a more complicated dataset. Let’s explore additional functionalities of pk_build().

Covariate preparation

For the most part, all covariates can be divided into four categories:

Subject-level, categorical covariates
Subject-level, continuous covariates
Time-varying, categorical covariates
Time-varying, continuous covariates

apmx has a few requirements to help keep track of different kinds of covariates. When you program covariates, you have to follow these rules:

Categorical covariates must be programmed as character-type.
Continuous covariates must be programmed as numeric-type.
Continuous covariates also require a unit variable (character-type).
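These rules can be checked before assembly. The sketch below assumes the unit column is named after the covariate with a U suffix (AGE and AGEU), as in the examples in this document; missing_units() is a hypothetical helper and not part of apmx.

```r
# Hypothetical helper, not part of apmx: list numeric covariates that have
# no matching unit column (<covariate name> + "U").
missing_units <- function(cov_df, id_cols = "USUBJID") {
  covs <- setdiff(names(cov_df), id_cols)
  numeric_covs <- covs[vapply(cov_df[covs], is.numeric, logical(1))]
  numeric_covs[!paste0(numeric_covs, "U") %in% names(cov_df)]
}

# Toy subject-level covariates (made-up values)
toy_dm <- data.frame(
  USUBJID = c("S1", "S2"),
  AGE     = c(34, 51),        # continuous: numeric, needs a unit
  SEX     = c("F", "M"),      # categorical: character, no unit needed
  stringsAsFactors = FALSE
)

missing_units(toy_dm)    # "AGE"
toy_dm$AGEU <- "years"
missing_units(toy_dm)    # character(0)
```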

Let’s start by preparing some subject-level covariates from dm and lb. All subject-level covariate data frames require a USUBJID column. There must only be one row per subject. Covariate names should be clear and easy to interpret.

dm <- DM %>%
  dplyr::select(USUBJID, AGE, SEX, RACE, ETHNIC) %>%
  dplyr::mutate(AGEU = "years") #AGE is continuous and requires a unit

lb <- LB %>% #select the desired labs
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBVST %in% c("Baseline (D1)", "Screening")) %>%
  dplyr::filter(LBPARAMCD %in% c("ALB", "AST", "ALT", "BILI", "CREAT")) %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES))

lb <- lb %>% #select the lab collected immediately prior to first dose
  dplyr::arrange(USUBJID, LBPARAMCD, LBDT) %>%
  dplyr::group_by(USUBJID, LBPARAMCD) %>%
  dplyr::filter(row_number()==max(row_number())) %>%
  dplyr::ungroup()

lb <- lb %>% #finish formatting and add units since all labs are continuous
  dplyr::select(USUBJID, LBPARAMCD, LBORRES) %>%
  tidyr::pivot_wider(names_from = "LBPARAMCD", values_from = "LBORRES") %>%
  dplyr::mutate(ALBU = "g/dL",
                ASTU = "IU/L",
                ALTU = "IU/L",
                BILIU = "mg/dL",
                CREATU = "mg/dL")
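The "last record per subject and parameter" step is easier to verify on a small example. The base R sketch below uses made-up lab records and reproduces only that selection step; it assumes ISO-formatted dates so that sorting the character column sorts chronologically.

```r
# Toy lab records (made-up values); S1 has two ALT results before first dose
toy_lb <- data.frame(
  USUBJID   = c("S1", "S1", "S1", "S2"),
  LBPARAMCD = c("ALT", "ALT", "AST", "ALT"),
  LBDT      = c("2021-12-20", "2022-01-01", "2022-01-01", "2021-12-28"),
  LBORRES   = c(30, 28, 22, 41),
  stringsAsFactors = FALSE
)

# Sort by subject, parameter, and date, then keep the last row in each group
toy_lb <- toy_lb[order(toy_lb$USUBJID, toy_lb$LBPARAMCD, toy_lb$LBDT), ]
keys <- toy_lb[c("USUBJID", "LBPARAMCD")]
baseline <- toy_lb[!duplicated(keys, fromLast = TRUE), ]

baseline$LBORRES   # 28 22 41
```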

Next, let’s prepare some time-varying covariates from lb. All time-varying covariate data frames require a USUBJID and DTIM column.

tast <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="AST") %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, DTIM = LBDT, AST = LBORRES) %>%
  dplyr::mutate(ASTU = "IU/L")

talt <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="ALT") %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, DTIM = LBDT, ALT = LBORRES) %>%
  dplyr::mutate(ALTU = "IU/L")

PD observation preparation

You may want to add PD observations to your dataset. PD observations have the same requirements as PK observations. Unfortunately, apmx does not recognize SDTM/ADaM names for PD observations, because there are many types of PD events with many possible formats. You must convert all column names to apmx column names.

For this analysis, we will pretend glucose observations from lb are a meaningful biomarker. Let’s set CMT = 3 for the PD compartment.

pd <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAM=="glucose") %>%
  dplyr::mutate(DTIM = paste(LBDT, "00:00"),
                VISIT = LBVST,
                NDAY = case_when(VISIT=="Screening" ~ -15,
                                 VISIT=="Baseline (D1)" ~ 1,
                                 VISIT=="Visit 2 (D8)" ~ 8,
                                 VISIT=="Visit 3 (D15)" ~ 15,
                                 VISIT=="Visit 4 (D29)" ~ 29,
                                 VISIT=="End of Treatment" ~ 45),
                TPT = 0,
                TPTC = LBTPT,
                ODV = as.numeric(LBORRES),
                DVIDU = LBORRESU,
                LLOQ = NA,
                CMT = 3,
                DVID = LBPARAM) %>%
  dplyr::select(USUBJID, DTIM, NDAY, VISIT, TPT,
                ODV, LLOQ, CMT, DVID, TPTC, DVIDU)

Full dataset assembly

Let’s add all of the new events and covariates to the dataset.

df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3)
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005

First, you’ll notice a warning was issued in the console. We will revisit the warnings later in this document. For now, let’s focus on the dataset itself.

There is a new type of row where EVID = 2.

unique(df_simple$EVID)
#> [1] 0 1
unique(df_full$EVID)
#> [1] 2 0 1

These rows capture the date-time and values of time-varying covariates. Sometimes, we want to retain the exact date-time of each time-varying covariate.

The DVID column also changed relative to the simple dataset.

unique(df_simple$DVID)
#> [1]  1 NA
unique(df_full$DVID)
#> [1] NA  2  1

unique(df_simple$DVIDC)
#> [1] "ABC999"
unique(df_full$DVIDC)
#> [1] NA        "glucose" "ABC999"

There are now two observation events, ABC999 and glucose. The NA rows are for dose and other events.

You’ll notice that all of the covariate names changed a bit. They all received a prefix, and some received a suffix. Why do we do this? Prefixes and suffixes can identify the type of covariate:

Prefix N: categorical, subject-level
Prefix B: continuous, subject-level (baseline)
Prefix T: categorical or continuous, time-varying
Suffix C: character-type for categorical variables
Suffix U: units for continuous variables
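The naming rules above can be written out as a small function. This is a sketch of the convention as described in this document, not code from apmx; cov_names() is a hypothetical helper, and the handling of time-varying categorical covariates (prefix T with suffix C) is an assumption based on the list above.

```r
# Hypothetical helper, not part of apmx: build the two output column names
# that the prefix/suffix rules imply for one covariate.
cov_names <- function(name,
                      kind  = c("categorical", "continuous"),
                      level = c("subject", "time-varying")) {
  kind  <- match.arg(kind)
  level <- match.arg(level)
  prefix <- if (level == "time-varying") "T" else if (kind == "categorical") "N" else "B"
  suffix <- if (kind == "categorical") "C" else "U"
  c(paste0(prefix, name), paste0(prefix, name, suffix))
}

cov_names("SEX", "categorical")                   # "NSEX" "NSEXC"
cov_names("AGE", "continuous")                    # "BAGE" "BAGEU"
cov_names("AST", "continuous", "time-varying")    # "TAST" "TASTU"
```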

If you can’t remember the prefixes and suffixes, that’s OK! We have an additional function to help with that. apmx::cov_find() will return all covariates of particular types in a PK dataset.

apmx::cov_find(df_full, cov = "categorical", type = "numeric")
#> [1] "NSTUDY"  "NROUTE"  "NFRQ"    "NSEX"    "NRACE"   "NETHNIC"
apmx::cov_find(df_full, cov = "categorical", type = "character")
#> [1] "NSTUDYC"  "NROUTEC"  "NFRQC"    "NSEXC"    "NRACEC"   "NETHNICC"
apmx::cov_find(df_full, cov = "continuous", type = "numeric")
#> [1] "BAGE"   "BALB"   "BALT"   "BAST"   "BBILI"  "BCREAT" "TAST"   "TALT"
apmx::cov_find(df_full, cov = "units", type = "character")
#> [1] "BAGEU"   "BALBU"   "BALTU"   "BASTU"   "BBILIU"  "BCREATU" "TASTU"  
#> [8] "TALTU"

Let’s explore the rest of the optional parameters in pk_build().

df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3,
                          cov.rnd = NULL, #rounds covariate columns to x decimal places
                          BDV = FALSE, #calculates baseline dependent variable for PD events
                          DDV = FALSE, #calculates change (delta) from baseline for PD events
                          PDV = FALSE, #calculates percent change from baseline for PD events
                          demo.map = TRUE, #adds specific numeric mapping for SEX, RACE, and ETHNIC variables
                          tv.cov.fill = "downup", #fill pattern for time-varying covariates
                          keep.other = TRUE) #keep or drop all EVID = 2 rows
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005
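The "downup" fill pattern deserves a short illustration. The sketch below assumes it means "carry the last value forward, then fill any leading gaps backward", which is what the "downup" direction of tidyr::fill() does; whether tv.cov.fill behaves identically is an assumption. The helpers fill_down() and fill_downup() are hypothetical and written in base R.

```r
# Hypothetical helpers, not part of apmx.
# Carry the most recent non-missing value forward
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-missing values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # idx = 0 maps to NA (nothing to carry yet)
}

# Fill forward first, then fill the remaining leading gaps backward
fill_downup <- function(x) rev(fill_down(rev(fill_down(x))))

ast <- c(NA, 20, NA, NA, 35, NA)  # toy AST values for one subject, in time order
fill_down(ast)                    # NA 20 20 20 35 35
fill_downup(ast)                  # 20 20 20 20 35 35
```

In a real dataset the fill would be applied within each subject, after sorting by time.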

The dataset is a bit easier to read if we drop the other events. We will do that moving forward for the rest of the tutorial.

df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3, dv.rnd = 3,
                          BDV = TRUE, DDV = TRUE, PDV = TRUE,
                          keep.other = FALSE)
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005
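As a worked example of what BDV, DDV, and PDV represent, the sketch below computes them for one toy subject. It assumes the usual definitions (baseline is the last observation at or before the first dose, change is the observation minus baseline, percent change is 100 times change divided by baseline); the exact rules inside pk_build() are not shown in this document.

```r
# Toy PD observations for one subject (made-up values)
pd_obs <- data.frame(
  ATFD = c(-1, 0, 7, 14),        # days since first dose
  ODV  = c(95, 100, 90, 110)     # observed value
)

# Baseline: last observation at or prior to first dose (assumed definition)
at_or_before <- pd_obs[pd_obs$ATFD <= 0, ]
bdv <- at_or_before$ODV[which.max(at_or_before$ATFD)]

pd_obs$BDV <- bdv                          # baseline value
pd_obs$DDV <- pd_obs$ODV - bdv             # change from baseline
pd_obs$PDV <- 100 * pd_obs$DDV / bdv       # percent change from baseline

pd_obs$DDV   # -5 0 -10 10
pd_obs$PDV   # -5 0 -10 10
```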

Other covariate methods

Time-varying covariates can be challenging to work with. The pk_build() function can only fill them by date-time. What if date-time is not available in the source data?

The apmx::cov_apply() function will add covariates to a dataset built by pk_build(). It will add time-varying covariates by any time variable, including:

DTIM
ATFD
ATLD
NTFD
NTLC
NTLD
NDAY

Let’s add TAST (time-varying AST) by nominal time instead of actual time.

tast <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="AST") %>%
  dplyr::mutate(NTFD = case_when(LBVST=="Screening" ~ -15, #calculate NTFD from visit code
                                 LBVST=="Baseline (D1)" ~ 1,
                                 LBVST=="Visit 2 (D8)" ~ 8,
                                 LBVST=="Visit 3 (D15)" ~ 15,
                                 LBVST=="Visit 4 (D29)" ~ 29,
                                 LBVST=="End of Treatment" ~ 45)) %>%
  dplyr::mutate(AST = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, NTFD, AST, ASTU = LBORRESU)

df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               sl.cov = list(dm, lb),
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(tast, time.by = "NTFD")

cov_apply() can also add subject-level covariates by any subject identifier.

df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(dm) %>%
  apmx::cov_apply(lb) %>%
  apmx::cov_apply(talt, time.by = "DTIM") %>%
  apmx::cov_apply(tast, time.by = "NTFD")
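Conceptually, adding a covariate by a time variable is a keyed lookup on subject and time. The base R sketch below shows only exact matching on USUBJID and NTFD with made-up data; it is not the implementation of cov_apply(), which may also fill values between measurements.

```r
# Toy event rows and a toy time-varying covariate (made-up values)
events <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S2"),
  NTFD    = c(0, 7, 14, 0),
  stringsAsFactors = FALSE
)
tast_toy <- data.frame(
  USUBJID = c("S1", "S1", "S2"),
  NTFD    = c(0, 14, 0),
  AST     = c(22, 30, 18),
  stringsAsFactors = FALSE
)

# Look up each event's subject/time key in the covariate table
row_in_cov  <- match(paste(events$USUBJID, events$NTFD),
                     paste(tast_toy$USUBJID, tast_toy$NTFD))
events$TAST <- tast_toy$AST[row_in_cov]

events$TAST   # 22 NA 30 18
```

The NA at NTFD = 7 shows why a fill rule is needed when covariates are measured less often than events occur.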

cov_apply() can also add empirical Bayes estimates or exposure metrics. Notice these also get their own prefixes.

Prefix C: exposure metric
Prefix I: empirical Bayes estimate

cov_apply() cannot handle units for these parameters at this time.

Let’s try adding exposure metrics and parameter estimates to the dataset. First, we will generate dummy exposures and parameter estimates.

exposure <- data.frame(ID = 1:22, #exposure metrics
                       MAX = 1001:1022,
                       MIN = 101:122,
                       AVG = 501:522)

parameters <- data.frame(ID = 1:22, #individual clearance and central volume estimates
                         CL = seq(0.1, 2.2, 0.1),
                         VC = seq(1, 11.5, 0.5))

df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(dm) %>%
  apmx::cov_apply(lb) %>%
  apmx::cov_apply(talt, time.by = "DTIM", keep.other = FALSE) %>%
  apmx::cov_apply(tast, time.by = "NTFD", keep.other = FALSE) %>%
  apmx::cov_apply(exposure, id.by = "ID", exp = TRUE) %>%
  apmx::cov_apply(parameters, id.by = "ID", ebe = TRUE)

It is recommended you always use pk_build() or cov_apply() to add covariates instead of adding them in yourself. That ensures cov_find() always finds the covariates correctly.

apmx::cov_find(df_cov_apply, cov = "categorical", type = "numeric")
#> [1] "NSTUDY"  "NROUTE"  "NFRQ"    "NSEX"    "NRACE"   "NETHNIC"
apmx::cov_find(df_cov_apply, cov = "categorical", type = "character")
#> [1] "NSTUDYC"  "NROUTEC"  "NFRQC"    "NSEXC"    "NRACEC"   "NETHNICC"
apmx::cov_find(df_cov_apply, cov = "continuous", type = "numeric")
#> [1] "BAGE"   "BALB"   "BALT"   "BAST"   "BBILI"  "BCREAT" "TALT"   "TAST"
apmx::cov_find(df_cov_apply, cov = "units", type = "character")
#> [1] "BAGEU"   "BALBU"   "BALTU"   "BASTU"   "BBILIU"  "BCREATU" "TALTU"  
#> [8] "TASTU"
apmx::cov_find(df_cov_apply, cov = "exposure", type = "numeric")
#> [1] "CMAX" "CMIN" "CAVG"
apmx::cov_find(df_cov_apply, cov = "empirical bayes estimate", type = "numeric")
#> [1] "ICL" "IVC"

Errors and warnings

pk_build() and other apmx functions issue errors/warnings for problematic data. What is the warning we have been receiving this whole time? First, let’s filter our dataset to the one subject triggering the warning:

warning <- df_full %>%
  dplyr::filter(USUBJID=="ABC102-01-005")

nrow(warning)
#> [1] 1
warning$DVIDC
#> [1] "glucose"

This subject has one PD observation and no dose or PK observations. Because there is no dose, ATFD (actual time since first dose) cannot be calculated. The warning tells you which subjects have this particular problem, which helps you diagnose potential issues in your data. Notice that in this instance, the record is flagged by C and TIMEF.

warning$C
#> [1] "C"
warning$TIMEF
#> [1] 1
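The relationship between C and the individual flags can be sketched as follows. This follows the description given earlier (C is driven by PDOSEF, TIMEF, AMTF, or DUPF) and uses made-up flag values; the value stored in C for unflagged rows (NA here) is an assumption.

```r
# Toy flag columns for four records (made-up values)
toy <- data.frame(
  PDOSEF = c(0, 1, 0, 0),
  TIMEF  = c(0, 0, 1, 0),
  AMTF   = c(0, 0, 0, 0),
  DUPF   = c(0, 0, 0, 0)
)

# A record is commented out when any of the four flags is raised
flag_cols <- c("PDOSEF", "TIMEF", "AMTF", "DUPF")
toy$C <- ifelse(rowSums(toy[flag_cols]) > 0, "C", NA_character_)

toy$C   # NA "C" "C" NA
```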

There are other errors and warnings to help you diagnose your data as well. There is a key difference between the two:

Errors inform you the input data cannot be used to build a dataset. This will require you to review the data and re-format it.
Warnings inform you the data can be used to build a dataset, but there may be problems with it. You should review the data to determine why the warnings are occurring. You don’t need to make them all disappear for the dataset to work in NONMEM successfully.

Errors

What if you are missing a required column in your input domain?

ex_error <- ex[, -5]

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): Column NDAY is missing from the ex dataset.

What if the variable types are incorrect?

ex_error <- ex
ex_error$USUBJID <- 1:42

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): Column USUBJID in ex is not character type.

What if a required value is missing?

ex_error <- ex
ex_error$USUBJID[5] <- NA

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): USUBJID missing in ex for at least 1 row.

What if we program ADDL but not II for dose events?

ex_error <- ex
ex_error$ADDL <- 1

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): If ex contains ADDL, it must contain II

What if date-time is not formatted correctly?

ex_error <- ex
ex_error$EXSTDTC <- substr(ex_error$EXSTDTC, 1, 10)

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): DTIM in ex is not ISO 8601 format.

What if the baseline nominal day NDAY == 0 instead of 1?

ex_error <- ex
ex_error$EXSTDY <- 0

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): NDAY in ex has a 0 measurement. Please confirm day of first dose is nominal day 1 and the day prior to first dose is nominal day -1.

Nominal days can be tricky. The day a patient takes their first dose is day 1. The day before their first dose is day -1. Therefore, there is no study day 0.
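The "no day 0" rule can be expressed as a small function. The sketch below derives a study day from calendar dates under that rule; study_day() is a hypothetical helper, not part of apmx, and the dates are made up.

```r
# Hypothetical helper, not part of apmx: study day with no day 0.
# The day of first dose is day 1; the day before it is day -1.
study_day <- function(date, first_dose_date) {
  diff_days <- as.integer(as.Date(date) - as.Date(first_dose_date))
  ifelse(diff_days >= 0, diff_days + 1L, diff_days)
}

study_day(c("2022-01-14", "2022-01-15", "2022-01-16"), "2022-01-15")   # -1 1 2
```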

What if ADDL and II are both present, but one of them is NA?

ex_error <- ex
ex_error$ADDL <- 1
ex_error$II <- c(rep(1, 41), NA)

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): At least one row in ex has a documented ADDL when II is NA.

What if you only enter a dose domain?

apmx::pk_build(ex)
#> Error in apmx::pk_build(ex): Please enter a pc or pd domain.

What if a pc observation is 0 or negative?

pc_error <- pc
pc_error$PCSTRESN[10] <- 0

apmx::pk_build(ex, pc_error)
#> Error in apmx::pk_build(ex, pc_error): At least one dependent variable in PC is less than or equal to 0.

What if the study code is not included in ex or sl.cov? Note that you can pass the study code variable through sl.cov or ex.

ex_error <- ex %>%
  select(-STUDYID)

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): STUDY column must be included in ex or sl.cov.

What if you have multiple values for a subject-level covariate within one subject?

dm_error <- dm
dm_error$USUBJID[2] <- "ABC102-01-001"

apmx::pk_build(ex, pc, sl.cov=dm_error)
#> Error in apmx::pk_build(ex, pc, sl.cov = dm_error): sl.cov has duplicate USUBJID rows.

What if you select a time unit not supported by pk_build?

apmx::pk_build(ex, pc, time.units="minutes")
#> Error in apmx::pk_build(ex, pc, time.units = "minutes"): time.units parameter must be in days or hours.

What if you program DDV and/or PDV without calculating BDV?

apmx::pk_build(ex, pc, pd, DDV = TRUE, PDV = TRUE)

Because BDV is left at FALSE in this call, pk_build() stops with an error. Set BDV = TRUE whenever DDV or PDV is requested.

What if you pass the same covariate through multiple dataframes?

ex_error <- ex
ex_error$NSEX <- 0

apmx::pk_build(ex_error, pc, sl.cov = dm)
#> Error in apmx::pk_build(ex_error, pc, sl.cov = dm): NSEX column is duplicated in sl.cov and another dataset. Please include this column in one dataset only.

Note you are allowed to pass other columns through the ex, pc, and pd domains. For example, try adding the column SEX instead of NSEX. If you pass an extra column through ex, pc, or pd, it will not be impacted by the function.

What if you provide a continuous covariate but forget to provide units?

dm_error <- dm %>%
  select(-AGEU)

apmx::pk_build(ex, pc, sl.cov = dm_error)
#> Error in apmx::pk_build(ex, pc, sl.cov = dm_error): All numerical covariates in sl.cov need units.

Warnings

These datasets will build, but pk_build() will inform you of potential problems. What if a subject has no covariates, but others do?

dm_warning <- dm
dm_warning <- dm_warning[1:4,]

df_warning <- apmx::pk_build(ex, pc, sl.cov=dm_warning)
#> Warning in apmx::pk_build(ex, pc, sl.cov = dm_warning): The following
#> USUBJID(s) have PKPD events but are not in sl.cov: ABC102-01-006,
#> ABC102-02-001, ABC102-02-002, ABC102-02-003, ABC102-02-004, ABC102-03-001,
#> ABC102-03-002, ABC102-03-003, ABC102-03-004, ABC102-04-001, ABC102-04-002,
#> ABC102-04-003, ABC102-04-004, ABC102-04-005, ABC102-04-006, ABC102-04-007,
#> ABC102-04-008

df_warning <- apmx::pk_build(ex, pc, sl.cov = list(dm_warning, lb))

Notice the warning is only triggered if a subject has NO covariates. In the second case, all subjects are included in lb, while only some are in dm. The warning is not issued if the subject has at least 1 covariate. All missing covariates are filled with the value of the na parameter (default -999).

What if a subject does not have any baseline PD events and BDV|DDV|PDV == TRUE? Notice the warning is only issued if BDV, DDV, or PDV are calculated.

pd_warning <- pd
pd_warning <- pd[3:nrow(pd_warning), ]

df_warning <- apmx::pk_build(ex, pc, pd_warning, BDV=TRUE)
#> Warning in apmx::pk_build(ex, pc, pd_warning, BDV = TRUE): The following
#> USUBJID(s) do not have a baseline glucose observation at or prior to first dose
#> (BDV, DDV, PDV not calculated): ABC102-01-001
#> Warning in apmx::pk_build(ex, pc, pd_warning, BDV = TRUE): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-005

df_warning <- apmx::pk_build(ex, pc, pd_warning)
#> Warning in apmx::pk_build(ex, pc, pd_warning): The following USUBJID(s) have at
#> least one event with missing ATFD: ABC102-01-005

What if the source data events occurred out of order? You’ll notice the NTFD of the first observation falls after the next event.

pc_warning <- pc
pc_warning$TPT[1] <- 0.07

df_warning <- apmx::pk_build(ex, pc_warning,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_warning, time.rnd = 3): The following
#> USUBJID(s) have at least one event that occurred out of protocol order (NTFD is
#> not strictly increasing): ABC102-01-001

What if a dose event is missing AMT? The record is automatically C-flagged and a warning is issued. Note that the PK records for this subject are not C-flagged.

ex_warning <- ex
ex_warning$EXDOSE[1] <- NA

df_warning <- apmx::pk_build(ex_warning, pc,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex_warning, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one dose event with missing AMT: ABC102-01-001

What if there are two events that occur at the same time? Notice how the duplicated events are C-flagged and a warning is issued.

pc_warning <- pc
pc_warning[2, ] <- pc_warning[1, ]
pc_warning$PCSTRESN[2] <- 1400

df_warning <- apmx::pk_build(ex, pc_warning,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_warning, time.rnd = 3): The following
#> USUBJID(s) have at least one duplicate event: ABC102-01-001
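The duplicate rule (same USUBJID, ATFD, EVID, and CMT) can be reproduced on a toy dataset. The sketch below flags every member of a duplicated group; whether pk_build() flags all members or only the later ones is an assumption here.

```r
# Toy records (made-up values): rows 2 and 3 share subject, time, EVID, and CMT
toy <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S2"),
  ATFD    = c(0, 0.042, 0.042, 0.042),
  EVID    = c(1, 0, 0, 0),
  CMT     = c(1, 2, 2, 2),
  DV      = c(NA, 1200, 1400, 900),
  stringsAsFactors = FALSE
)

# Flag a row if its key also appears on an earlier or a later row
key <- toy[c("USUBJID", "ATFD", "EVID", "CMT")]
toy$DUPF <- as.integer(duplicated(key) | duplicated(key, fromLast = TRUE))

toy$DUPF   # 0 1 1 0
```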

What if you have long column names? This warning informs you that some column names are longer than 8 characters. This will prevent you from converting the dataset to a .xpt file if desired.

dm_warning <- dm %>%
  rename(ETHNICITY = ETHNIC)

df_warning <- apmx::pk_build(ex, pc, sl.cov = dm_warning)
#> Warning in apmx::pk_build(ex, pc, sl.cov = dm_warning): The following column
#> name(s) are longer than 8 characters: NETHNICITY, NETHNICITYC

What if your baseline covariates and time-varying covariates are not equivalent at baseline? In theory, all baseline covariates and time-varying covariates should agree at NTFD == 0.

lb_warning <- lb
lb_warning$ALT[1] <- 31

df_warning <- apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt)
#> Warning in apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt): The
#> following USUBJID(s) have at least one event with missing ATFD: ABC102-01-005
#> Warning in apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt): BALT and
#> TALT are not equivalent at first dose (baseline).

Time imputations

Some of our errors and warnings discuss problems with date/time elements of ex and pc. What do you do when you have an event, but the date/time information is missing? pk_build provides two methods for imputing missing times:

Method 1: imputes the nominal time as the actual time. This method is good for simple imputations or for pre-clinical records when date-time was never collected.
Method 2: imputes an estimate of ATFD relative to other events occurring at the same visit. This method is good for phase I/II/III trials.

Let’s experiment with these two methods. First, we will drop some date-times from pc and replace them with NA.

pc_impute <- pc
pc_impute$PCDTC[c(4, 39, 73, 128)] <- NA

df_impute <- apmx::pk_build(ex, pc_impute,
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_impute, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001,
#> ABC102-01-002, ABC102-01-004, ABC102-02-002

This triggers the warning for missing ATFD as expected. Now, let’s try imputation method 1.

df_impute_1 <- apmx::pk_build(ex, pc_impute,
                              time.rnd = 3, impute = 1)
#> Warning in apmx::pk_build(ex, pc_impute, time.rnd = 3, impute = 1): The
#> following USUBJID(s) have at least one event that occurred out of protocol
#> order (NTFD is not strictly increasing): ABC102-01-004

First, notice we have a new warning. We’ll come back to that later. You should also notice that all events have times and the missing-time warning has disappeared. Imputed records are flagged in the IMPEX and IMPDV columns.

nrow(df_impute_1[is.na(df_impute_1$ATFD),]) #number of rows with missing ATFD
#> [1] 0

imputed_events_1 <- df_impute_1 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)

IMPDV will flag observation records with an imputed time. IMPEX will flag all records impacted by an imputed dose. You’ll notice we still have a warning for one subject. Let’s find out why.

times_check_1 <- df_impute_1 %>%
  dplyr::filter(USUBJID=="ABC102-01-004")

Notice row 12 has an imputed time ATFD = 14.042. That is because NTFD = 14.042 for that record. However, the dose for this visit was administered a few days late, at time ATFD = 16.053. This imputation puts the post-dose sample two days ahead of the dose. Imputation method 1 is a poor assumption for this missing date.
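The out-of-order condition can be reproduced on two records. The sketch below is a base R illustration with a hypothetical data frame that holds only the dose and the imputed sample described above; it sorts by actual time and asks whether nominal time still increases.

```r
# Toy records for one subject: a late dose and a post-dose sample
# whose time was imputed as ATFD = NTFD (method 1)
toy <- data.frame(EVID = c(1, 0),
                  NTFD = c(14, 14.042),
                  ATFD = c(16.053, 14.042))

# Sort by actual time, then test whether nominal time is strictly increasing
by_actual <- toy[order(toy$ATFD), ]
out_of_order <- is.unsorted(by_actual$NTFD, strictly = TRUE)
out_of_order
```

Here the sample sorts ahead of its own dose, so the check returns TRUE.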

Let’s try method 2 to see if that assumption is better. Method 2 takes the late dose into account by estimating the time of the sample relative to the other events that day.

df_impute_2 <- apmx::pk_build(ex, pc_impute,
                              time.rnd = 3, impute = 2)

imputed_events_2 <- df_impute_2 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)

You’ll notice the warning disappears. Let’s check that subject again.

times_check_2 <- df_impute_2 %>%
  dplyr::filter(USUBJID=="ABC102-01-004")

You’ll notice that under this method, when NTFD = 14.042, ATFD = 16.094. Why?

Method 2 will compare the NTFD of the record with missing time to the NTFD of the most recent dose or post-dose observation with a known date/time.
ATFD for the missing event is estimated as the ATFD of the most recent dose + the difference between their NTFD.
For example, the most recent dose occurs at NTFD = 14, ATFD = 16.053.
For the imputation at NTFD = 14.042, ATFD = 16.053 + (14.042 - 14) = 16.094 (the number may round a thousandth of a day off).
This is why method 2 is the better method for large studies, such as phase II/III trials.
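The arithmetic in the steps above can be written as a one-line function. impute_atfd is a hypothetical helper written for this illustration; it is not part of apmx and only mirrors the formula described here.

```r
# Hypothetical helper: shift the reference event's actual time by the
# difference between the two nominal times
impute_atfd <- function(ntfd_missing, ntfd_ref, atfd_ref) {
  atfd_ref + (ntfd_missing - ntfd_ref)
}

# Values from the example: most recent dose at NTFD = 14, ATFD = 16.053
imputed <- impute_atfd(ntfd_missing = 14.042, ntfd_ref = 14, atfd_ref = 16.053)
imputed
```

With the rounded inputs shown here the result is about 16.095. The dataset reports 16.094 because the calculation there uses unrounded times.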

What if we are missing a date/time for a dose event? Let’s repeat the experiment.

ex_impute <- ex
ex_impute$EXSTDTC[2] <- NA

df_impute <- apmx::pk_build(ex_impute, pc, #no imputation method
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001

df_impute_1 <- apmx::pk_build(ex_impute, pc, #imputation method 1
                              time.rnd = 3, impute = 1)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3, impute = 1): The
#> following USUBJID(s) have at least one event that occurred out of protocol
#> order (NTFD is not strictly increasing): ABC102-01-001

imputed_events_1 <- df_impute_1 %>% #imputed records
  dplyr::filter(IMPDV==1 | IMPEX==1)

Now, a lot of records for subject 1 have IMPEX == 1. This is because all of these observations are associated with a dose with an imputed time. Is method 1 a good assumption?

Imputation method 1 assigns the dose time as ATFD = NTFD = 14.
However, the PK observation times start around ATFD = 12.9.
Because the dose event is out of order, the ATLD is calculated incorrectly.
This assumption places the dose too late and is a poor assumption.

Let’s try method 2 to see the difference. You’ll notice the events are in the correct order and times are imputed successfully.

df_impute_2 <- apmx::pk_build(ex_impute, pc,
                              time.rnd = 3, impute = 2)

imputed_events_2 <- df_impute_2 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)

What if the first dose is missing instead of the second dose? Let’s repeat the experiment, this time with method 2 only since we can assume method 1 won’t work well in this scenario.

ex_impute <- ex
ex_impute$EXSTDTC[1] <- NA

df_impute <- apmx::pk_build(ex_impute, pc, # No imputation method, expect a warning
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001

df_impute_2 <- apmx::pk_build(ex_impute, pc, #imputation method 2
                              time.rnd = 3, impute = 2)

imputed_events_2 <- df_impute_2 %>% #imputed events
  dplyr::filter(IMPDV==1 | IMPEX==1 | IMPFEX==1)

Notice an extra column was created, IMPFEX.

IMPFEX: imputed time of first dose.
It is only activated when a first dose has an imputed time.
It is applied to all records within a subject.
IMPEX applies only to records up to the next dose with a known date-time.
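A quick way to review the flags is to count flagged records per subject. The sketch below uses a made-up data frame (toy is hypothetical, and its flag values are invented to match the behavior described above); the same call should work on a real pk_build output that contains these columns.

```r
# Toy output with imputation flags; values are made up for illustration
toy <- data.frame(USUBJID = c("S1", "S1", "S1", "S2", "S2"),
                  IMPEX = c(1, 1, 0, 0, 0),
                  IMPDV = c(0, 0, 0, 1, 0),
                  IMPFEX = c(1, 1, 1, 0, 0),
                  stringsAsFactors = FALSE)

# Number of flagged records per subject for each flag
flag_counts <- aggregate(cbind(IMPEX, IMPDV, IMPFEX) ~ USUBJID,
                         data = toy, FUN = sum)
flag_counts
```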

One final experiment: what if we are missing date-times from both ex and pc? Note that with method 2, all times are imputed successfully and all warnings disappear.

ex_impute <- ex
ex_impute$EXSTDTC[1:2] <- NA

df_impute <- apmx::pk_build(ex = ex_impute, pc = pc_impute, #no imputation method
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex = ex_impute, pc = pc_impute, time.rnd = 3): The
#> following USUBJID(s) have at least one event with missing ATFD: ABC102-01-001,
#> ABC102-01-002, ABC102-01-004, ABC102-02-002

df_impute_2 <- apmx::pk_build(ex = ex_impute, pc = pc_impute, #imputation method 2
                              time.rnd = 3, impute = 2)

Dataset combination

What if we have multiple studies we want to analyze at once? We could create one large ex, pc, etc. input with each study, or we could use apmx::pk_combine() to combine two datasets built by pk_build().

Let’s create a copy of df_full and change it slightly. We’ll pretend it’s built from a second study, ABC103.

df_full2 <- df_full %>%
  dplyr::filter(DOMAIN!="PD") %>% #remove glucose observations
  dplyr::filter(ID<19) %>% #remove subject 19
  dplyr::group_by(ID) %>%
  dplyr::mutate(NSTUDYC = "ABC103", #update study ID
                USUBJID = gsub("ABC102", "ABC103", USUBJID),
                BAGE = round(rnorm(1, 45, 10)), #re-create all continuous covariates
                BALB = round(rnorm(1, 4, 0.5), 1),
                BALT = round(rnorm(1, 30, 5)),
                BAST = round(rnorm(1, 33, 5)),
                BBILI = round(rnorm(1, 0.7, 0.2), 3),
                BCREAT = round(rnorm(1, 0.85, 0.2), 3),
                TAST = ifelse(NTFD==0, BAST, round(rnorm(1, 33, 5))),
                TALT = ifelse(NTFD==0, BALT, round(rnorm(1, 30, 5)))) %>%
  dplyr::ungroup()
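Because the covariates above are drawn with rnorm(), the second study will differ on every run. If you want a reproducible copy, one option is to call set.seed() first. The seed value below is arbitrary and chosen only for this illustration.

```r
# The same seed gives the same random draws
set.seed(102)  # arbitrary seed, an assumption for this example
first_draw <- round(rnorm(3, 45, 10))

set.seed(102)
second_draw <- round(rnorm(3, 45, 10))

identical(first_draw, second_draw)
```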

Now, we can combine these two studies together.

df_combine <- apmx::pk_combine(df_full, df_full2)
#> Warning in apmx::pk_combine(df_full, df_full2): Datasets have different number
#> of DVIDs.
#> Warning in apmx::pk_combine(df_full, df_full2): CMT = 3 not included in df2

You’ll notice we have a few more warnings issued with this function. That is because our DVID assignments are different.

unique(df_full$DVID)
#> [1]  2  1 NA
unique(df_full2$DVID)
#> [1]  1 NA

If you forgot to add PD events for study 2, this warning will remind you. For this tutorial, we will continue to exclude them.
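To see exactly which observation types differ between two datasets, you can compare their DVID values directly. This base R sketch copies the values printed above into two vectors rather than relying on the built datasets.

```r
# DVID values copied from the output above
dvid_1 <- c(2, 1, NA)
dvid_2 <- c(1, NA)

# Observation types present in the first dataset but not the second
missing_in_2 <- setdiff(dvid_1, dvid_2)
missing_in_2
```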

Once we are done creating our dataset, we can write it out with the function apmx::pk_write(). This ensures the dataset is written in a NONMEM-usable format.

name <- "PK_ABC101_V01.csv"
apmx::pk_write(df_combine, file.path(tempdir(), name))
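For context, a NONMEM-style CSV conventionally has no quoted strings, no row names, and a "." for missing values. The base R sketch below writes a made-up data frame that way. It illustrates the convention only and is an assumption about the general format, not a description of what pk_write() does internally.

```r
# Toy dataset with one missing DV
toy <- data.frame(ID = c(1, 1), TIME = c(0, 1), DV = c(NA, 4.2))

# Write with "." for missing values, no quotes, no row names
out <- file.path(tempdir(), "toy_nonmem.csv")
write.csv(toy, out, na = ".", quote = FALSE, row.names = FALSE)

# Inspect the raw text that was written
csv_lines <- readLines(out)
csv_lines
```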

Dataset documentation

Documenting a dataset is important when working with a team and when sharing work with outside organizations or regulatory agencies. For example, the FDA requires that all population pharmacometric analysis datasets be accompanied by a definition file. apmx provides tools to help you document your dataset.

We will start by exploring the definition file feature. The definition file sources variable names from a dataframe of definitions created with apmx::variable_list_create(). It comes pre-filled with definitions for standard apmx variables, and gives you the ability to add your own for covariates and other custom variables. NOTE: you do not have to add prefixes and suffixes to this list, just the root term of each covariate (SEX instead of NSEX and NSEXC).

vl <- apmx::variable_list_create(variable = c("SEX", "RACE", "ETHNIC", "AGE",
                                              "ALB", "ALT", "AST", "BILI", "CREAT"),
                           categorization = rep("Covariate", 9),
                           description = c("sex", "race", "ethnicity", "age",
                                           "albumin", "alanine aminotransferase",
                                           "aspartate aminotransferase",
                                           "total bilirubin", "serum creatinine"))

Now, let’s create the definition file.

define <- apmx::pk_define(df = df_combine,
                          variable.list=vl)

You can export the definition file to a Word document using the file argument. The project and data parameters can be used to add a custom project name and dataset name to the header of the document. To use this feature, you must use a Word document template with the words “Project” and “Dataset” in the header. You can provide the template of the Word document with the template parameter.

define <- apmx::pk_define(df = df_combine,
                          file = file.path(tempdir(), "definition_file.docx"),
                          variable.list=vl,
                          project = "Sponsor Name",
                          data = "Dataset Name")

Next, let’s create a version log. Version logs are important when we have multiple datasets over a project duration. Datasets can be updated for all sorts of reasons:

Additional covariates added or removed
Additional studies, subjects or events added
Errors are corrected

Similar to the definition function, we can provide a template for formatting. You can also provide a comment to describe the source data. The version log is easiest to use when you write it out as a Word document using the file parameter.

vrlg <- apmx::version_log(df = df_combine,
                          name = name,
                          file = file.path(tempdir(), "version_log.docx"),
                          src_data = "original test data")

Open the version log document and take a look around. Notice that there is a column called “Comments”. You can add a comment there in the Word document, and the function will not overwrite it. When you produce a new dataset, call apmx::version_log() again with the new dataset, the most recent dataset, the new dataset name, and the same filepath as the previous log. You will need to use comp_var to group the rows for comparison. For PKPD datasets, we recommend grouping by USUBJID, ATFD, EVID, and DVID. This function will update the version log by adding a new row to the Word document.
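To make the idea of grouping rows for comparison concrete, the sketch below matches two made-up dataset versions on the recommended key columns and finds the rows that were added. It is a conceptual base R illustration with hypothetical data frames (prev, new) and a hypothetical helper (key); it does not reproduce the internals of apmx::version_log().

```r
# Toy previous and new dataset versions; values are made up
prev <- data.frame(USUBJID = c("S1", "S1"), ATFD = c(0, 1),
                   EVID = c(1, 0), DVID = c(NA, 1),
                   stringsAsFactors = FALSE)
new <- rbind(prev,
             data.frame(USUBJID = "S1", ATFD = 2, EVID = 0, DVID = 1,
                        stringsAsFactors = FALSE))

# Build one key per row from the comparison variables
comp_var <- c("USUBJID", "ATFD", "EVID", "DVID")
key <- function(d) do.call(paste, c(d[comp_var], sep = "|"))

# Rows in the new version with no match in the previous version
added <- new[!key(new) %in% key(prev), ]
nrow(added)
```

If the key columns do not uniquely identify a record, unrelated rows can be matched to each other, which is why the choice of comparison variables matters.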

Lastly, apmx can help you produce summary tables of your datasets. apmx::pk_summarize() produces three types of summary tables:

BLQ summary
categorical covariate summary
continuous covariate summary

Tables can be stratified by any other categorical covariate in the dataset.

sum1 <- apmx::pk_summarize(df = df_combine)

The summary function has other parameters to help you document the dataset:

strat.by will stratify the dataset by any variable.
ignore.C will remove all C-flagged records from the analysis. This parameter is on by default.
docx will produce Word document versions of the summary tables.
pptx will produce PowerPoint slides of the summary tables. NOTE: the pptx feature is still under development.
ignore.request will filter out records matching an expression passed through this parameter.

sum2 <- apmx::pk_summarize(df = df_combine,
                           strat.by = c("NSTUDYC", "NSEXC"),
                           ignore.request = "NRACE == 2")
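The effect of the two filters can be pictured with a small example. This base R sketch applies an expression string and a C-flag filter to a made-up data frame. The column name C for the flag and the way the string is evaluated are assumptions made for illustration; pk_summarize() may handle both differently.

```r
# Toy dataset: one record matches the request, one is C-flagged
toy <- data.frame(ID = 1:4,
                  NRACE = c(1, 2, 1, 3),
                  C = c(NA, NA, "C", NA),
                  stringsAsFactors = FALSE)

# Evaluate the request string against the columns of the dataset
request <- "NRACE == 2"
matches_request <- eval(parse(text = request), envir = toy)

# Drop records that match the request or carry a C flag
kept <- toy[!matches_request & is.na(toy$C), ]
kept$ID
```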