Evaluating Treatment Tolerability Using the Toxicity Index With Patient-Reported Outcomes Data

Context. Summarizing longitudinal symptomatic adverse events during clinical trials is necessary for understanding treatment tolerability. The Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) provides insight for capturing treatment tolerability within trials. Tolerability summary measures, such as the maximum score, are often used to communicate the potential negative symptoms both in the medical literature and directly to patients. Commonly, the proportions of present and severe symptomatic adverse events are used and reported between treatment arms among adverse event types. The toxicity index is also a summary measure previously applied to clinician-reported CTCAE data. Objectives. Apply the toxicity index to PRO-CTCAE data from the COMET-2 trial alongside the maximum score, then present and discuss considerations for using the toxicity index as a summary measure for communicating tolerability to patients and clinicians.


Introduction
Cancer clinical trials have utilized the National Cancer Institute's (NCI) Common Terminology Criteria for Adverse Events (CTCAE) for decades to facilitate a standardized process for clinicians to observe and rate therapeutic toxicities, or side effects, impacting patient health. In this setting, toxicity generally refers to the level of damage an experimental treatment can have on the body's organs or entire system. Though the CTCAE is critical for tracking toxicity, clinicians may miss up to half of the symptomatic burden related to treatment side effects compared to routine patient self-reporting.1,2 As such, a Patient-Reported Outcomes (PRO) version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) was developed. The PRO-CTCAE is a self-administered library of 124 items evaluating the frequency, severity, interference, or presence of 78 symptomatic adverse events. More broadly, patient-reported outcomes (PROs) are reports made directly by the patient about their health status without inference from an observing clinician.3 In cases where PROs encompass symptoms related to treatment side effects, as with the PRO-CTCAE, these PROs provide valuable information about toxicity. In a trial, information about toxicity is one of the contributing factors in understanding the extent to which overt adverse events affect a patient's willingness and ability to continue the treatment regimen.4,5 This is referred to as treatment tolerability. Implementing a trial-specific subset of the PRO-CTCAE library gives clinicians and investigators a more comprehensive understanding of tolerability from a patient's point of view, specifically the negative impacts of treatment, so that these can be weighed against demonstrated efficacy.
Continued use of patient-centered health outcome measures motivated the NCI Cancer Moonshot Initiative to accelerate improvements in toxicity and tolerability reporting and analysis methods. Traditional methods of reporting clinical trial tolerability consist of aggregating a patient's overall adverse event experience into a single numeric value. Typically, these tolerability summary measures reflect the single most severe symptomatic adverse event during a trial period for each patient. Conveying tolerability by treatment arm can then be achieved by reporting the proportion of patients who, at worst, responded with present or severe symptomatic adverse events (e.g., "45% of patients experienced grade 3 or higher pain severity while on Treatment X"). These unambiguous dichotomizations of severity levels provide clinicians an easily interpretable metric for communicating with patients in a way they are likely to understand. This may enable patients to set appropriate expectations, empower them to take part in the treatment plan, and help them anticipate potential symptom management.
There is a tradeoff in using a simple proportion for the purposes of interpretability. From an analytical point of view, this single worst adverse event summary measure can fall short in reflecting the fluctuation of symptoms during treatment. This can be most evident when statistically discriminating tolerability between arms. Temporal profiles of symptomatic adverse event burden are often indistinguishable among acute, chronic, cumulative, cyclic, or late incipient treatment toxicity.5 Incorporating these longitudinal profiles can be critical for fully characterizing tolerability, as both isolated-severe and persistent-moderate symptoms have been shown to correlate with decrements in quality of life.6 There are various graphical techniques and statistical strategies proposed in the literature that attempt to capture broader aspects of adverse event data beyond the maximum grade.7 Graphical approaches enable a deep dive into longitudinal profiles via visual inspection within individual adverse event categories. A more holistic interpretation of tolerability can be accessible this way; however, it is not suitable for a succinct reporting of all adverse events. Although necessary, even the more complex statistical strategies can be constrained by the difficult or narrow interpretation of the results. A summary measure that aims to overcome these challenges is the toxicity index. The toxicity index was designed to incorporate the most severe grade and the frequency of all lower grade adverse events, resulting in a single summary measure for each patient.8 Recent applications of the toxicity index using CTCAE data show potential gains in statistical power when using a probability index modeling approach.9,10 To date, the toxicity index has not been applied to PRO-CTCAE data. Given the high rate of symptoms reported by patients at baseline, methods for accounting for pre-existing symptoms are likely needed when applying the toxicity index to PRO-CTCAE data.11
In this study, we present an application of the toxicity index to PRO-CTCAE tolerability data alongside a typical application of the maximum score. We apply a standard baseline adjustment approach to account for pre-existing symptoms to both summary measures, side-by-side with the unadjusted results. Finally, we discuss considerations for reporting and interpreting the toxicity index as a summary measure when applied to PRO-CTCAE tolerability data.

Study Data
The toxicity index summary measure was investigated using PRO-CTCAE data from the COMET-2 trial, a phase 3, 1:1 randomized, double-blind, placebo-controlled trial with a primary pain endpoint comparing cabozantinib and mitoxantrone-prednisone among men with previously treated symptomatic castration-resistant prostate cancer. In one arm, cabozantinib was administered as the experimental treatment with mitoxantrone-matched placebo infusion, plus prednisone-matched placebo. In the other arm, mitoxantrone was administered, plus prednisone and cabozantinib-matched placebo. Details on clinical findings and trial design are reported elsewhere.12 PRO-CTCAE items were assessed at baseline, one and two months, and every two months thereafter over the study period. PRO-CTCAE items included constipation, decreased appetite, diarrhea, fatigue, insomnia, nausea, numbness or tingling in the hands or feet, pain, shortness of breath, and vomiting. Respective symptom item frequency, severity, and/or interference attributes were evaluated as specified per NCI PRO-CTCAE Item Library Version 1.0. PRO-CTCAE composite grades were computed from the individual item scores.13 The PRO-CTCAE composite grades create a single grade per PRO-CTCAE symptom item group on a scale akin to other common adverse event tools like CTCAE or MedDRA.

Evaluating Tolerability
The toxicity index is a summary measure aimed at ranking patients within a clinical trial by their respective adverse event experience over the trial. Those with more severe and frequent adverse events will have a higher toxicity index than those with less severe and infrequent adverse events. To construct the toxicity index for an individual patient, their observed adverse event grades over the study period are first ordered descending in severity, then Formula 1 is applied to these ordered data.
Formula 1. Toxicity Index Statistic

$$TI = \sum_{i=1}^{m} x_i \prod_{j=1}^{i-1} \frac{1}{1 + x_j}$$

Where m is the number of observed adverse events for a given patient, x_1 is the largest adverse event grade, x_2 is the second largest adverse event grade, and so on, up to the smallest adverse event grade, x_m (the empty product for i = 1 equals 1). A detailed example of this calculation has been reported previously.8 The resulting toxicity index statistic has two components: an integer portion and a decimal portion. The integer portion is the patient's maximum adverse event grade. The decimal portion represents the additional adverse event experience that this summary measure seeks to capture, and it allows patients to be ranked accordingly. For example, Patient A with observed PRO-CTCAE pain severity scores of 3, 3, 4, and 2 will have a toxicity index of 4.775, and Patient B with pain severity scores of 2, 3, and 4 will have a toxicity index of 4.700 (example calculation shown in Table 1). Thus, Patient A is ranked as having worse pain severity over the trial than Patient B. With possible PRO-CTCAE item scores of 0, 1, 2, 3, or 4, and composite grades of 0, 1, 2, or 3, the toxicity index statistic ranges from 0 to 4.999... for PRO-CTCAE items and from 0 to 3.999... for composite grades. By design of the toxicity index, the accrual of additional adverse event experience will never result in the toxicity index increasing to the next whole unit score or grade above the maximum score or grade. This is convenient, as interpreting a grade above the natural range of an adverse event measure (e.g., PRO-CTCAE, CTCAE, MedDRA) is not meaningful. As a result, PRO-CTCAE item and composite grade toxicity index estimates reported here are not rounded to their respective upper bounds (e.g., a toxicity index estimate of 4.999 will be reported as 4.99 in lieu of rounding to 5.00). Those reporting the toxicity index may consider following this rounding exception to avoid interpretation-related confusion.
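The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study: the function names `toxicity_index` and `report` are hypothetical, and `report` is only one possible way to implement the rounding exception (round normally, but truncate whenever rounding would reach the next whole score).

```python
from math import floor

def toxicity_index(grades):
    """Toxicity index for one patient's observed adverse event grades."""
    ordered = sorted(grades, reverse=True)  # order descending in severity
    ti = 0.0
    weight = 1.0  # running product of 1 / (1 + x_j) over all higher-ranked grades
    for x in ordered:
        ti += x * weight
        weight /= 1 + x
    return ti

def report(ti, digits=2):
    """Round for reporting, but never up to the next whole score (assumed rule)."""
    rounded = round(ti, digits)
    if floor(rounded) > floor(ti):
        scale = 10 ** digits
        return floor(ti * scale) / scale  # truncate instead of rounding up
    return rounded

patient_a = toxicity_index([3, 3, 4, 2])  # 4 + 3/5 + 3/20 + 2/80
patient_b = toxicity_index([2, 3, 4])     # 4 + 3/5 + 2/20
print(round(patient_a, 3), round(patient_b, 3))
print(report(4.999))
```

The two patients reproduce the values of 4.775 and 4.700 given in the text, and the integer portion of each result equals the patient's maximum score.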

Adjusting for Baseline Symptoms
A variety of methods are available to account for pretreatment symptomatic burden measured at baseline using summary measures.14 The typical approach is to compare the adverse event rates of only the most severe grade per patient after the baseline trial time point (post-baseline maximum). This approach incorporates all symptoms experienced during treatment, regardless of presumed pathology. Another method is to adjust this post-baseline maximum for pre-existing adverse events (baseline-adjusted maximum). Here, emphasis is placed on adverse events that worsen during treatment relative to the baseline trial time point; these are deemed treatment emergent. The baseline-adjusted maximum is defined as the maximum grade post-baseline if there was at least a 1-grade increase in adverse event grade from baseline; otherwise a baseline-adjusted grade of 0 is given. With CTCAE grading, clinicians typically do not report an adverse event unless it is new or has worsened from baseline. Thus, this direct adjustment for patients' presenting symptomatic burden closely mimics how clinical adverse events are collected. This approach has previously been applied to these PRO-CTCAE data in the COMET-2 trial by Dueck and colleagues.15 Along with the unadjusted toxicity index, novel post-baseline and baseline-adjusted versions of the toxicity index were evaluated. The post-baseline toxicity index only included observed PRO-CTCAE scores after the baseline trial time point. The baseline-adjusted toxicity index was defined as expressed in Formula 1 after including only those grades that were worse than the adverse event grade at baseline (similarly to the baseline-adjusted maximum). The baseline grade and subsequent grades that were not worse than the baseline value are excluded from the calculation of the baseline-adjusted toxicity index. An example of this calculation is shown in Table 2.
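The three versions of each summary measure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the patient's scores are invented for the example (they are not the values of Table 2), and the adjustment is applied to the raw scores before the summary measure is computed.

```python
def toxicity_index(grades):
    """Toxicity index: grades ordered descending, each weighted by 1/(1 + x_j) of all prior grades."""
    ti, weight = 0.0, 1.0
    for x in sorted(grades, reverse=True):
        ti += x * weight
        weight /= 1 + x
    return ti

def baseline_adjusted_max(baseline, post_baseline):
    """Post-baseline maximum if it exceeds the baseline grade, otherwise 0."""
    worst = max(post_baseline, default=0)
    return worst if worst > baseline else 0

def baseline_adjusted_ti(baseline, post_baseline):
    """Toxicity index over only those post-baseline grades worse than baseline."""
    return toxicity_index([g for g in post_baseline if g > baseline])

# Hypothetical patient: baseline score 2, then five follow-up scores
baseline, follow_up = 2, [1, 3, 2, 4, 3]

unadjusted = toxicity_index([baseline] + follow_up)   # all observed scores
post_baseline = toxicity_index(follow_up)             # baseline score dropped
adjusted = baseline_adjusted_ti(baseline, follow_up)  # only scores 3, 4, 3 remain

print(round(unadjusted, 3), round(post_baseline, 3), round(adjusted, 3))
print(baseline_adjusted_max(baseline, follow_up))
```

For a patient whose follow-up scores never exceed baseline (e.g., baseline 3 and follow-up scores 3, 2, 3), both baseline-adjusted measures in this sketch evaluate to 0.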

Statistical Analysis
The proportions of patients with post-baseline and baseline-adjusted maximum PRO-CTCAE score greater than 0, and 3 or higher, were compared between treatment arms using Fisher's exact tests. Nonparametric methods (Wilcoxon rank-sum tests) were utilized to compare toxicity indexes between arms, due to their multimodal distribution and inherent rank nature. For the same reason, the median was chosen to convey central tendency when summarizing at the arm level. The distributions of the decimal portion of the toxicity index were evaluated within each integer portion using Kolmogorov-Smirnov tests where appropriate. All presented P values are unadjusted for multiple testing and are provided for reference only. The intention is a side-by-side presentation of the toxicity index summary measure applied in various fashions. Analyses were performed using the statistical software SAS version 9.4 (SAS Institute Inc., Cary, NC). PRO-CTCAE composite grades and longitudinal bar charts presented later and in Supplemental 1 were created using the statistical software R and the ProAE package.16
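For readers who wish to see the two main comparisons in code form, the following standard-library Python sketch illustrates a two-sided Fisher's exact test and a rank-sum comparison. It is not the SAS code used for the analyses: the function names and all input values are hypothetical, and the rank-sum P value uses a tie-corrected normal approximation without continuity correction, which is an assumption of this sketch.

```python
from collections import Counter
from math import comb, erfc, sqrt

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):  # hypergeometric probability of k events in the first row
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    p_value = 0.0
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(k)
        if p <= p_observed * (1 + 1e-9):  # tables no more likely than observed
            p_value += p
    return min(p_value, 1.0)

def rank_sum_test(x, y):
    """Mann-Whitney U for x versus y and a normal-approximation P value."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    tie_sizes = Counter(list(x) + list(y)).values()
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance == 0:  # every value tied: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return u, erfc(abs(z) / sqrt(2))

# Hypothetical counts: 12 of 20 versus 5 of 20 patients with maximum score >= 3
print(fisher_exact_two_sided(12, 8, 5, 15))

# Hypothetical toxicity index values in two arms
arm_1 = [4.775, 3.5, 3.75, 2.667, 4.8]
arm_2 = [1.5, 2.0, 3.5, 0.0, 1.75]
print(rank_sum_test(arm_1, arm_2))
```

Dividing U by the product of the two sample sizes gives an estimate of the probability that a randomly chosen patient in the first arm has a higher toxicity index than one in the second arm, with ties counted as one half.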

Results
The COMET-2 trial enrolled a total of 119 male participants randomized to study treatment (cabozantinib n=61 or mitoxantrone-prednisone n=58). Of those enrolled, 107 completed a baseline PRO-CTCAE evaluation and at least one follow-up PRO-CTCAE evaluation (cabozantinib n=53 and mitoxantrone-prednisone n=54). Results here reflect these 107 participants. Among them, the number of PRO-CTCAE questionnaires completed ranged from 2 to 17 per participant, with 75% of participants completing five or more questionnaires. Figures showing all PRO-CTCAE individual item and composite grade distributions across trial time points can be found in Supplemental 1, as well as violin plots with overlaid density histograms displaying the toxicity index summary measure distributions. Demographic and disease-related characteristics for the COMET-2 trial are available elsewhere.12,15 PRO-CTCAE tolerability rates for present and severe adverse events (scores > 0 and scores ≥ 3, respectively) are reported by arm in Table 3. The mitoxantrone-prednisone arm showed generally favorable tolerability among individual PRO-CTCAE items compared to cabozantinib in both present and severe PRO adverse event rates. Tolerability rates among PRO-CTCAE composite grades were significantly higher (worse) in the cabozantinib arm for decreased appetite, diarrhea, nausea, and vomiting, for both post-baseline and baseline-adjusted rates. Differences in toxicity index between treatment arms shown in Table 4 were directionally consistent with tolerability rate comparisons, as expected. Among significantly different toxicity index distributions across PRO-CTCAE item groups, higher median toxicity indexes were observed in the cabozantinib arm for constipation, decreased appetite, diarrhea, numbness or tingling in hands or feet, and vomiting. Similar to rate comparisons, the cabozantinib arm had higher median toxicity index among composite grades for decreased appetite, diarrhea, nausea, and vomiting, in both post-baseline and baseline-adjusted versions.
The median toxicity index was substantially reduced between post-baseline and baseline-adjustment methods within the constipation, fatigue, insomnia, and pain PRO-CTCAE item groups. For example, the post-baseline median pain severities were 3.50 and 3.67 for cabozantinib and mitoxantrone-prednisone, respectively. This indicates at least 50% of all participants reported multiple pain episodes with at least one being severe after the baseline visit. However, the baseline-adjusted medians for pain severity are each 0, indicating that 50% or more of participants did not experience treatment emergent pain. Unsurprisingly, this differing impact of baseline adjustment methods is equivalently observed in Table 3 using the dichotomous tolerability rates (i.e., scores > 0 and scores ≥ 3). The post-baseline tolerability rates for pain severity with maximum score 3 or higher were 60% and 67% for cabozantinib and mitoxantrone-prednisone, respectively. Again, this is consistent with the toxicity index result, as roughly 50% or more reported at least one severe pain episode after baseline. Among the baseline-adjusted rates, the proportions of participants with maximum score greater than 0 were 19% and 30% for cabozantinib and mitoxantrone-prednisone, respectively (each arm below 50% incidence of treatment emergent pain). Reading the COMET-2 participants' symptomatic pain in this way shows that similar percentile-based information can be gathered from the toxicity index and the PRO adverse event rate, and that both are equivalently impacted by the baseline adjustment methods. Fig. 1 shows the longitudinal profiles of pain frequency, severity, and interference scores, as well as composite grade during the trial. Fig. 2 shows the distributions of the toxicity index summary measure.
To further evaluate the characteristics of the toxicity index, the distribution of the decimal portion among unadjusted toxicity index estimates was assessed within each integer portion (Fig. 3). The histograms in Fig. 3 incorporate 3210 toxicity index estimates (one estimate for each of 30 PRO-CTCAE items and composite grades among the 107 participants). This graphically demonstrates how the toxicity index accumulates the additional toxicity (decimal portion) at differing rates within each maximum score (integer portion). Specifically, the set of possible ranks varies within each integer portion. Because the decimal portions are scaled differently, interpreting arm medians with differing integer portions may be precarious; comparisons of decimal portion distributions between treatment arms were therefore carried out individually within integer groups. In the COMET-2 trial, the only statistically significant differences seen in decimal portion between treatment arms were those for decreased appetite within the maximum score groups of 3 (for severity, interference, and composite) and 4 (for interference).
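The dependence of the decimal portion on the integer portion can be made concrete by enumerating which decimal values are attainable for a given maximum score. The sketch below is illustrative only: the function name `attainable_decimals` and the choice of how many additional scores to enumerate are assumptions, not part of the reported analysis.

```python
from itertools import combinations_with_replacement

def toxicity_index(grades):
    """Toxicity index: grades ordered descending, each weighted by 1/(1 + x_j) of all prior grades."""
    ti, weight = 0.0, 1.0
    for x in sorted(grades, reverse=True):
        ti += x * weight
        weight /= 1 + x
    return ti

def attainable_decimals(max_score, n_extra):
    """Distinct decimal portions when the maximum is max_score and up to
    n_extra further scores (each between 0 and max_score) are observed."""
    values = set()
    for k in range(n_extra + 1):
        for extra in combinations_with_replacement(range(max_score + 1), k):
            ti = toxicity_index((max_score,) + extra)
            values.add(round(ti - max_score, 6))
    return sorted(values)

# With one additional score, a maximum of 4 allows steps of 1/5,
# whereas a maximum of 1 allows only a step of 1/2.
for score in (1, 2, 3, 4):
    print(score, attainable_decimals(score, 1))
```

Under Formula 1, a patient with n observations all equal to the maximum score x attains the largest decimal portion for that n, namely 1 - (1 + x)^(-(n - 1)), so higher maximum scores approach the upper bound in fewer observations.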

Discussion
In this study, the toxicity index was applied to PRO-CTCAE data for the first time with adjustment for each patient's pre-existing symptoms and evaluated as a tolerability outcome in univariate analyses. Broad agreement was observed between the toxicity index and more typical summary measures like the maximum score. However, the median and range of the toxicity index reported by arm were often challenging to interpret. We see this when comparing values between Tables 3 and 4. Admittedly, care must be taken when reporting typical group estimates of the toxicity index directly (mean, median, etc.). Since the distribution of the decimal portion varies within each integer portion, a representation of the decimal portion such as the median toxicity index is not necessarily interpreted equivalently across integer portions. It remains unclear what group-level summary estimates of the toxicity index are most interpretable.
Capturing the longitudinal toxicity experience remains an emerging area of methodological research in treatment tolerability analysis. The toxicity index introduced by Rogatko and colleagues in 2004 is an innovative summary measure accounting for both the multiplicity and severity of adverse events. Rogatko demonstrated that it has useful potential in early-phase clinical trials by creating more sensitive dose-limiting toxicity thresholds.8 Some purposive methods have since been demonstrated to highlight amenable approaches accommodating the rank and multimodal nature of the toxicity index. For example, using CTCAE grades, Gresham et al showed that the toxicity index has increased power when using a probability index modeling approach,9 and Razaee et al present a novel framework showing increased power when using their derived method testing for a difference in mean Poisson-limit vector parameters between treatment arms.10 Each of these methodologies is a valued addition to the adverse event literature and may be considered when statistically discriminating treatment arms is of paramount concern. However, specifically for the purposes of communicating comparative tolerability with patients and reporting to a wider scientific audience, we feel the trade-off between interpretability and statistical complexity for the sake of increased power is substantial. Simpler dichotomizations or categorizations of tolerability may have more practical communicative utility than the toxicity index.
It is evident that this ranking measure is statistically convenient where patients with similar adverse event profiles can be precisely ordered by rank. The toxicity index appears most useful for statistical comparisons between treatment arms or between subgroups where interpreting tolerability is of lesser importance relative to statistical power. However, additional work is needed for more comprehensive applications of the toxicity index and to assess its ability to support clinical decision making. Direct interpretation and associated effect size recommendations need to be outlined. Considerations should also be defined for applying the toxicity index to serial versus episodic adverse event evaluations in the clinical trial setting. For example, PRO-CTCAE evaluations are more likely to record nonzero scores at scheduled visits, while CTCAE evaluations (when captured in a log-style format) typically record a single toxicity grade until the adverse event worsens or reoccurs after resolution. Approaches for handling missing data should also be investigated. A patient with any missing non-zero adverse event scores will have a lower toxicity index than if those data were observed. This implies that the existence of any missing symptomatic adverse event data will result in an underestimation of the toxicity index. Simulations of the toxicity index's decimal portion accrual may illuminate these interests in repeated PRO-CTCAE evaluations, interpretable effect sizes, and missing data impacts.
This evaluation of the toxicity index has some limitations, several stemming from the characteristics that distinguish PRO-CTCAE from CTCAE. For example, the maximum score summary measure is computed from a single observation, whereas a summary measure like the toxicity index is computed from a series of observations. This raises computational questions about applying the baseline adjustment approach, including whether it should be applied at the summary measure level or to the raw data prior to the calculation of the summary measure. Adding to this, patients on study for longer periods or having more frequent serial PRO-CTCAE evaluations will inherently have more opportunity to accrue toxicity index (e.g., weekly versus monthly evaluations per annum). Coupled with the self-reported nature of PRO data, this means that trial participants with more frequent visits and better adherence to fully completing PRO-CTCAE questionnaires may be biased towards a higher toxicity index, specifically within the decimal portion of the statistic where toxicity is accrued. The impact of missing data on the toxicity index was also not evaluated here. As referred to previously, PRO-CTCAE and CTCAE evaluations observe the absence of symptomatic adverse events differently. This inconsistency in how the respective tools collect data also extends to the means by which missing data are generated. We believe these potential impacts, which are not yet addressed in the literature, do not jeopardize this study's evaluation of treatment tolerability using the toxicity index.
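Two of the points above, the effect of missing non-zero scores and the effect of assessment frequency, can be illustrated with a small numerical check. This sketch is not a substitute for the simulations suggested in the discussion: the helper names are hypothetical, the check is an exhaustive enumeration over short score sequences only, and the persistent grade 2 symptom is an invented example.

```python
from itertools import product

def toxicity_index(grades):
    """Toxicity index: grades ordered descending, each weighted by 1/(1 + x_j) of all prior grades."""
    ti, weight = 0.0, 1.0
    for x in sorted(grades, reverse=True):
        ti += x * weight
        weight /= 1 + x
    return ti

def missing_score_lowers_index(max_length=4, scores=range(5)):
    """Check every score sequence up to max_length: removing a non-zero score
    lowers the toxicity index, and removing a zero never raises it."""
    for length in range(1, max_length + 1):
        for seq in product(scores, repeat=length):
            full = toxicity_index(seq)
            for i, dropped in enumerate(seq):
                reduced = toxicity_index(seq[:i] + seq[i + 1:])
                if dropped > 0 and not reduced < full:
                    return False
                if dropped == 0 and reduced > full + 1e-12:
                    return False
    return True

print(missing_score_lowers_index())

# Same persistent grade 2 symptom, assessed 3 times versus 12 times
few_visits = toxicity_index([2] * 3)
many_visits = toxicity_index([2] * 12)
print(round(few_visits, 4), round(many_visits, 4))
```

In this example the integer portion is 2 under both schedules, so the difference between the two patients arises entirely in the decimal portion and reflects the number of assessments rather than a difference in symptom severity.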

Fig. 1. Distribution of the PRO-CTCAE Pain item group at successive time points during the COMET-2 trial and maximum score post-baseline without and with baseline adjustment.

Fig. 2. Violin plots with overlaid density histograms of the unadjusted toxicity index summary measure distribution for the PRO-CTCAE Pain item group.

Fig. 3. Histograms of the toxicity index decimal portion by integer portion, across all patients and PRO-CTCAE items and composites.

Table 1
Example Calculation of the Toxicity Index

Table 2
Example of Baseline-Adjusted Toxicity Index Procedure (1) and Subsequent Calculation (2)
PRO = Patient-reported outcome. Within sub-table (2), adverse event scores are shown ordered descending in severity to further illustrate the calculation, with each associated time point in parentheses; i.e., score (time point).

Table 3
Rates of PRO-CTCAE Item Scores and Composite Grades Greater Than 0 and 3 or Higher, by Treatment Arm
PRO-CTCAE = Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events. P values reflect Fisher's exact tests comparing frequencies between treatment arms. P values less than 0.05 are bolded.

Table 4
Toxicity Index for PRO-CTCAE Item Scores and Composite Grades, by Treatment Arm
Column headings: Value, followed by two column groups each giving n and median (range) per treatment arm and a P value.