Biostatistics


Theme 1: Personalized Dynamic Predictions for Longitudinal Outcomes

Dr. Eleni-Rosalina Andrinopoulou

Prediction models are widely used in medicine to support medical decision-making and health systems planning. With the increasing availability of clinical measures (e.g., through electronic medical records), different types of information are collected at each visit. Incorporating this information into prediction models, however, poses new and significant challenges.

Currently, predictions are often based on simplistic approaches, such as regression models, that ignore the dynamic nature of the outcome. A further challenge in most real-world data sets is that some important variables that influence the outcome of interest are time-dependent (possibly measured at different time points) and measured with error. Mixed-effects models are a popular framework for analyzing repeated measures. However, including time-dependent covariates in mixed-effects models to obtain dynamic predictions that are updated at each visit can be challenging.

Multivariate mixed-effects models might be a useful tool to obtain dynamic predictions when such variables are present. We aim to investigate the predictive performance of different models. In particular, we will compare simplified approaches such as regression models with more complex models such as univariate and multivariate mixed-effects models.
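To make the idea of a dynamic prediction concrete, the sketch below uses the simplest possible mixed-effects model, a random-intercept model with known parameters, and updates a patient's predicted next value after each visit. This is an illustration only and not the project's R code: the function name `dynamic_prediction`, the parameter names and all numeric values are hypothetical, and the model deliberately omits time trends and time-dependent covariates.

```python
# Minimal sketch, assuming a random-intercept model
#   y_ij = mu + b_i + e_ij,  b_i ~ N(0, tau2),  e_ij ~ N(0, sigma2)
# with mu, tau2 and sigma2 treated as known (hypothetical values below).

def dynamic_prediction(measurements, mu, tau2, sigma2):
    """Predict a patient's next measurement from the measurements seen so far.

    The patient-specific intercept is estimated by its posterior mean,
    which shrinks the patient's observed average towards the population mean.
    """
    n = len(measurements)
    if n == 0:
        # No patient data yet: fall back on the population mean.
        return mu
    patient_mean = sum(measurements) / n
    # Shrinkage weight: approaches 1 as more visits accumulate.
    shrink = n * tau2 / (n * tau2 + sigma2)
    return mu + shrink * (patient_mean - mu)


# Hypothetical patient whose values lie above the population mean of 100.
visits = [110.0, 112.0, 108.0]
history = []
print("before any visit:", dynamic_prediction(history, mu=100.0, tau2=25.0, sigma2=25.0))
for value in visits:
    history.append(value)
    prediction = dynamic_prediction(history, mu=100.0, tau2=25.0, sigma2=25.0)
    print(f"after {len(history)} visit(s): predicted next value = {prediction:.2f}")
```

With each additional visit the prediction moves further from the population mean towards the patient's own average, which is the updating behavior that the more complex models generalize.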

Research steps:

Fit different models (regression models, univariate and multivariate mixed models).

Obtain dynamic predictions for new patients from all models.

Compare predictions from different models using an internal validation procedure.

Note: Part of the R code (especially for the more complex models) is already available.

Theme 2: Using historical data to improve the analysis of clinical studies

Dr. Joost van Rosmalen

When analyzing data from randomized trials, we often have data from previous trials available, which could conceivably be incorporated into the analysis of the current study. Adding these historical data to the analysis may improve the precision of the estimates and the power of the analysis, and thereby reduce sample size requirements. However, care must be taken to ensure that the historical data are sufficiently comparable to the data of the current study, to avoid bias in the resulting estimates. A number of statistical methods have been developed that include the historical data when they are sufficiently similar to the current data, but downweight or even discard them in case of substantial differences. These methods include a) meta-analytic-predictive approaches that estimate a parameter for the between-study heterogeneity and b) the power prior, a Bayesian method that downweights the historical data by raising their likelihood function to the power of a weight parameter that is estimated from the data.
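The downweighting performed by the power prior can be illustrated in the simplest conjugate setting: a normal mean with known variance and a flat initial prior. The sketch below keeps the weight `a0` fixed rather than estimating it from the data, which is a simplification of the approach described above. The function name `power_prior_posterior` and all numeric values are hypothetical.

```python
# Minimal sketch, assuming normally distributed outcomes with known variance
# sigma2, a flat initial prior on the mean, and a FIXED power-prior weight a0
# (the methods described in the text estimate this weight from the data).

def power_prior_posterior(hist_mean, hist_n, cur_mean, cur_n, sigma2, a0):
    """Return the posterior mean and variance of the normal mean.

    Raising the historical likelihood to the power a0 is equivalent, in this
    model, to counting the historical study as a0 * hist_n observations.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie between 0 and 1")
    if cur_n <= 0:
        raise ValueError("the current study must contain observations")
    effective_n = a0 * hist_n + cur_n
    post_mean = (a0 * hist_n * hist_mean + cur_n * cur_mean) / effective_n
    post_var = sigma2 / effective_n
    return post_mean, post_var


# Hypothetical historical trial (n = 100, mean 10) and current trial (n = 50, mean 12).
for a0 in (0.0, 0.5, 1.0):
    mean, var = power_prior_posterior(10.0, 100, 12.0, 50, sigma2=4.0, a0=a0)
    print(f"a0 = {a0:.1f}: posterior mean = {mean:.3f}, posterior variance = {var:.4f}")
```

A weight of 0 discards the historical data entirely, a weight of 1 pools both studies, and intermediate values trade a gain in precision against the risk of bias when the two data sources differ.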

We have evaluated and compared several approaches for including historical data in a variety of settings, and we have developed methods that allow us to extend the power prior to more complex and realistic models. Future work consists of extending these methodologies to better handle observational (real-world) data, applying the methods in real life, and developing more insight into the circumstances in which the use of historical data can be justified.

Theme 3: Bayesian Modelling of Time-to-event Data in the Presence of Time-varying Covariates, Missing Values and a Hierarchical Structure

Dr. Nicole Erler (Department of Biostatistics)

The project is motivated by a study that investigates the survival of patients who received a liver transplant for Primary Sclerosing Cholangitis (PSC). Due to recurrence of the disease or other indications, several patients undergo one or more re-transplantations. Both the recurrence of the disease and the graft number thus change over time, which needs to be taken into account in the analysis. Moreover, the data were obtained from a large number of centers located in different countries, which introduces a hierarchical structure. The model therefore needs to account for the correlation that may be present between observations within the same patient, center and country. The analysis is further complicated by missing values in several of the baseline covariates. Due to the complexity of the data, conventional methods for dealing with missing values may not be well suited. In the Bayesian framework, however, it is possible to incorporate all the necessary features into the model.

Besides the analysis of the PSC data, the aim of the project is to explore what methodology (besides the Bayesian approach) is available to appropriately handle data in the given setting. Advantages and disadvantages of the different approaches should be investigated on a theoretical and practical level.

Theme 4: (Not) all patients are equal, but some patients are more equal than others

Dr. Nicole Erler (Department of Biostatistics)

At the beginning of each study, researchers face the often frustrating task of determining the sample size necessary to have sufficient power to answer their research question. And often enough, there are just not enough patients. Luckily, there is a simple solution: extend the study, ask other medical centers to participate, maybe even extend the study to multiple countries, and conduct a multicenter study. More hospitals, more patients, more power. And, as a bonus, greater generalizability of the results, since the patient group in a multicenter study is usually more diverse than that of a single center. But this apparent upgrade to a multicenter study comes at a cost many researchers are not aware of: it complicates the statistical analysis. Or at least it would, if the resulting hierarchical structure of the data were taken into account properly. How are multicenter studies analyzed in practice? Do researchers take into account that patients within one center may be more similar to each other than to patients from other centers? And what are the implications of not taking this into account? The aim of this research project is to answer these questions by means of a literature search, a methodological review and a simulation study.
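One standard way to quantify what is lost when patients within a center resemble each other is the design effect, 1 + (m - 1) * ICC, where m is the number of patients per center and ICC is the intraclass correlation. The sketch below applies it to the estimation of an overall mean from equally sized centers. It is an illustration under these simplifying assumptions, not an analysis from the project, and the function names and numeric values are hypothetical.

```python
# Minimal sketch, assuming equally sized centers and a single intraclass
# correlation (ICC) describing how similar patients within a center are.

def design_effect(patients_per_center, icc):
    """Variance inflation of an overall mean caused by within-center correlation."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (patients_per_center - 1) * icc


def effective_sample_size(n_centers, patients_per_center, icc):
    """Number of independent patients carrying the same information."""
    total = n_centers * patients_per_center
    return total / design_effect(patients_per_center, icc)


# Hypothetical multicenter study: 20 centers with 50 patients each.
for icc in (0.0, 0.01, 0.05):
    deff = design_effect(50, icc)
    n_eff = effective_sample_size(20, 50, icc)
    print(f"ICC = {icc:.2f}: design effect = {deff:.2f}, effective sample size = {n_eff:.0f}")
```

Even a modest intraclass correlation substantially reduces the effective sample size in this setting, so an analysis that treats all patients as independent would overstate the precision of its estimates.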

Theme 5: Chronic HCV infection – Connecting the dots to better understand the natural disease progression

Dr. N.S. Erler (Department of Biostatistics) and Dr. A.J.P. van der Meer (Department of Gastroenterology and Hepatology)

Chronic hepatitis C virus (HCV) infection is a major global health problem with around 71 million people infected, resulting in ~350,000 deaths annually. These deaths are predominantly caused by the development of cirrhosis, at which stage patients are at risk of liver failure and hepatocellular carcinoma (HCC). Fortunately, antiviral therapy has developed tremendously in recent years. Short-term treatment with various direct-acting antivirals (DAAs) now cures >95% of patients of their chronic HCV infection, thereby reducing their risk of liver failure, HCC and all-cause mortality. The majority of patients with chronic HCV infection remain undiagnosed, however, and thus untreated. In addition, due to the high costs of the DAAs, some areas still restrict antiviral therapy to those patients with the most advanced liver disease. This is problematic because virological cure lowers but does not eradicate the risk of complications in patients with severely damaged livers, who therefore need to remain in lifelong intensive and costly surveillance programs.

Older natural history studies indicated that approximately 20% of patients develop cirrhosis within 20 years. Due to changing epidemiological characteristics of the population with HCV infection, the rate of disease progression is probably higher today. Besides obtaining an updated estimate of this overall rate, it would be desirable to identify subgroups of patients at low or high risk of fast disease progression. To this end, we have gathered an international retrospective dataset of >5000 patients with chronic HCV infection who have been followed for a median of ~10 years. All laboratory results over time have been collected, as well as clinical events (both liver-related and non-liver-related). First results indicate that the rate of disease progression is indeed much higher than previously anticipated. To make the best use of this wealth of data, sophisticated statistical methods are needed that can fully utilize the information contained in baseline, repeatedly measured and time-to-event data. Missing values in several variables pose an additional challenge and need to be handled appropriately in order to obtain valid estimates of the relation between time-varying laboratory values and clinical outcomes.