Simulating Data with Leaspy
This example demonstrates how to use Leaspy to simulate longitudinal data based on a fitted model.
The following imports bring in the required modules and load the synthetic Parkinson dataset from Leaspy. A logistic model will be fitted on this dataset and then used to simulate new longitudinal data.
from leaspy.datasets import load_dataset
from leaspy.io.data import Data
df = load_dataset("parkinson")
The clinical and imaging features of interest are selected and the DataFrame is converted
into a Leaspy Data object that can be used for model fitting.
data = Data.from_dataframe(
    df[
        [
            "MDS1_total",
            "MDS2_total",
            "MDS3_off_total",
            "SCOPA_total",
            "MOCA_total",
            "REM_total",
            "PUTAMEN_R",
            "PUTAMEN_L",
            "CAUDATE_R",
            "CAUDATE_L",
        ]
    ]
)
A logistic model with a two-dimensional latent space is initialized.
from leaspy.models import LogisticModel
model = LogisticModel(name="test-model", source_dimension=2)
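To give a feel for what "logistic" means here, the following minimal sketch evaluates a textbook logistic curve over time. It is only an illustration: the function `logistic` and its parameters `t0` and `speed` are hypothetical names, and this is not Leaspy's actual parametrization of the model.

```python
import math


def logistic(t, t0=0.0, speed=1.0):
    """Textbook logistic curve: bounded in (0, 1), equal to 0.5 at t = t0."""
    return 1.0 / (1.0 + math.exp(-speed * (t - t0)))


# Evaluate the curve at a few ages around a hypothetical midpoint of 60.
ages = [40, 50, 60, 70, 80]
values = [logistic(age, t0=60.0, speed=0.2) for age in ages]
```

A curve of this shape stays between 0 and 1, which is why a logistic model is typically applied to features rescaled to that range.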
The model is fitted to the data using the MCMC-SAEM algorithm for 100 iterations. No seed is passed in the call shown, so results may vary between runs unless one is set.
model.fit(
    data,
    "mcmc_saem",
    n_iter=100,
    progress_bar=False,
)
Fit with `AlgorithmName.FIT_MCMC_SAEM` took: 6.77s
The parameters for simulating patient visits are defined. These parameters specify the number of patients, the visit spacing, and the timing variability.
visit_params = {
    "patient_number": 5,
    "visit_type": "random",  # The visit type could also be 'dataframe' with df_visits.
    # "df_visits": df_test  # Example for custom visit schedule.
    "first_visit_mean": 0.0,  # The mean of the first visit age/time.
    "first_visit_std": 0.4,  # The standard deviation of the first visit age/time.
    "time_follow_up_mean": 11,  # The mean follow-up time.
    "time_follow_up_std": 0.5,  # The standard deviation of the follow-up time.
    "distance_visit_mean": 2 / 12,  # The mean spacing between visits in years.
    "distance_visit_std": 0.75 / 12,  # The standard deviation of the spacing between visits in years.
    "min_spacing_between_visits": 1,  # The minimum allowed spacing between visits.
}
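The sketch below shows one plausible way such parameters could be turned into a random visit schedule. It is not Leaspy's implementation. The helper `draw_visit_times` is hypothetical, and it assumes that the first visit, follow-up duration, and visit spacing are each drawn from a Gaussian, with any spacing below the minimum raised to that minimum.

```python
import random


def draw_visit_times(params, rng):
    """Draw one patient's visit times (illustrative assumption, not Leaspy's algorithm)."""
    min_gap = params["min_spacing_between_visits"]
    if min_gap <= 0:
        raise ValueError("this sketch needs a strictly positive minimum spacing")

    # First visit and total follow-up duration are drawn from Gaussians.
    first = rng.gauss(params["first_visit_mean"], params["first_visit_std"])
    follow_up = max(0.0, rng.gauss(params["time_follow_up_mean"], params["time_follow_up_std"]))
    last_allowed = first + follow_up

    times = [first]
    while True:
        # Draw a spacing and raise it to the minimum if it is too small.
        gap = rng.gauss(params["distance_visit_mean"], params["distance_visit_std"])
        next_time = times[-1] + max(gap, min_gap)
        if next_time > last_allowed:
            break
        times.append(next_time)
    return times


visit_params = {
    "patient_number": 5,
    "first_visit_mean": 0.0,
    "first_visit_std": 0.4,
    "time_follow_up_mean": 11,
    "time_follow_up_std": 0.5,
    "distance_visit_mean": 2 / 12,
    "distance_visit_std": 0.75 / 12,
    "min_spacing_between_visits": 1,
}

rng = random.Random(0)  # seeded so the sketch is reproducible
schedules = [draw_visit_times(visit_params, rng) for _ in range(visit_params["patient_number"])]
```

Under this reading, a mean spacing of 2/12 combined with a minimum spacing of 1 means almost every drawn gap is raised to 1, so the minimum effectively sets the visit interval.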
A new longitudinal dataset is simulated from the fitted model using the specified parameters.
df_sim = model.simulate(
    algorithm="simulate",
    features=[
        "MDS1_total",
        "MDS2_total",
        "MDS3_off_total",
        "SCOPA_total",
        "MOCA_total",
        "REM_total",
        "PUTAMEN_R",
        "PUTAMEN_L",
        "CAUDATE_R",
        "CAUDATE_L",
    ],
    visit_parameters=visit_params,
)
Simulate with `simulate` took: 0.05s
The simulated data is then inspected as a pandas DataFrame. The first ten rows of the simulated longitudinal dataset are displayed below.
df_sim.head(10)
| | ID | TIME | MDS1_total | MDS2_total | MDS3_off_total | SCOPA_total | MOCA_total | REM_total | PUTAMEN_R | PUTAMEN_L | CAUDATE_R | CAUDATE_L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 53.0 | 0.155898 | 0.118853 | 0.351597 | 0.368132 | 0.053468 | 0.086003 | 0.714066 | 0.738918 | 0.670682 | 0.422618 |
| 1 | 0 | 54.0 | 0.103933 | 0.143772 | 0.261258 | 0.102074 | 0.034996 | 0.275067 | 0.863962 | 0.870770 | 0.363265 | 0.525199 |
| 2 | 0 | 55.0 | 0.217049 | 0.197348 | 0.219519 | 0.222060 | 0.071889 | 0.178917 | 0.699757 | 0.722475 | 0.532379 | 0.609583 |
| 3 | 0 | 56.0 | 0.085104 | 0.052183 | 0.195426 | 0.169035 | 0.055745 | 0.244480 | 0.837818 | 0.749473 | 0.546638 | 0.622183 |
| 4 | 0 | 57.0 | 0.295789 | 0.090025 | 0.367959 | 0.168639 | 0.191922 | 0.201405 | 0.866743 | 0.862373 | 0.642726 | 0.600590 |
| 5 | 0 | 58.0 | 0.345741 | 0.244821 | 0.172055 | 0.311372 | 0.116332 | 0.250161 | 0.592222 | 0.855079 | 0.659069 | 0.587703 |
| 6 | 0 | 59.0 | 0.268063 | 0.386708 | 0.215542 | 0.356714 | 0.313937 | 0.217469 | 0.785240 | 0.698636 | 0.501990 | 0.522023 |
| 7 | 0 | 60.0 | 0.171233 | 0.155168 | 0.270350 | 0.202249 | 0.195739 | 0.267264 | 0.695824 | 0.621570 | 0.763887 | 0.569596 |
| 8 | 0 | 61.0 | 0.130114 | 0.258922 | 0.213576 | 0.219509 | 0.146115 | 0.315394 | 0.892384 | 0.861153 | 0.761038 | 0.526635 |
| 9 | 0 | 62.0 | 0.113763 | 0.132007 | 0.175709 | 0.179988 | 0.175688 | 0.580131 | 0.880047 | 0.883341 | 0.761242 | 0.642723 |
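A quick sanity check on simulated output is to look at the spacing between consecutive visits for each patient. The sketch below does this with plain Python on `(ID, TIME)` pairs. The helper `visit_gaps` is a hypothetical name, and the input reproduces only the ID and TIME values of the ten rows displayed above.

```python
from collections import defaultdict


def visit_gaps(rows):
    """Group (patient_id, time) pairs by patient and return gaps between consecutive visits."""
    times_by_patient = defaultdict(list)
    for patient_id, time in rows:
        times_by_patient[patient_id].append(time)

    gaps = {}
    for patient_id, times in times_by_patient.items():
        ordered = sorted(times)
        gaps[patient_id] = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return gaps


# ID and TIME from the ten displayed rows: patient 0, visits at 53.0 through 62.0.
rows = [(0, 53.0 + i) for i in range(10)]
gaps = visit_gaps(rows)
```

For these rows every gap is exactly one year. With the full simulated DataFrame, the same pairs could be passed in from the ID and TIME columns, assuming they are available as regular columns.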
This concludes the simulation example using Leaspy. Stay tuned for more examples on model fitting and analysis!