Simulating Data with Leaspy
This example demonstrates how to use Leaspy to simulate longitudinal data based on a fitted model.
The following imports bring in the required modules and load the synthetic Parkinson dataset from Leaspy. A logistic model will be fitted on this dataset and then used to simulate new longitudinal data.
from leaspy.datasets import load_dataset
from leaspy.io.data import Data
df = load_dataset("parkinson")
The clinical and imaging features of interest are selected and the DataFrame is converted into a Leaspy Data object that can be used for model fitting.
data = Data.from_dataframe(
    df[
        [
            "MDS1_total",
            "MDS2_total",
            "MDS3_off_total",
            "SCOPA_total",
            "MOCA_total",
            "REM_total",
            "PUTAMEN_R",
            "PUTAMEN_L",
            "CAUDATE_R",
            "CAUDATE_L",
        ]
    ]
)
A logistic model with a two-dimensional latent space is initialized.
from leaspy.models import LogisticModel
model = LogisticModel(name="test-model", source_dimension=2)
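To give some intuition for what a logistic model describes, the sketch below evaluates a plain sigmoid curve and a shifted copy of it for one hypothetical individual. It is a generic illustration written for this page: the function names, the `midpoint`, `rate`, `time_shift` and `acceleration` parameters, and all numeric values are assumptions, and this is not Leaspy's actual parametrization.

```python
import math


def logistic_trajectory(t, midpoint, rate):
    """Population-level sigmoid: rises from 0 to 1 and equals 0.5 at `midpoint`."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


def individual_trajectory(t, midpoint, rate, time_shift=0.0, acceleration=1.0):
    """Hypothetical individual curve: the population curve read on a shifted, rescaled clock."""
    individual_time = midpoint + acceleration * (t - midpoint - time_shift)
    return logistic_trajectory(individual_time, midpoint, rate)


# Illustrative ages and parameters only (not fitted values).
ages = [50.0, 60.0, 70.0]
population = [logistic_trajectory(a, midpoint=60.0, rate=0.2) for a in ages]

# A positive time_shift delays the curve: this individual reaches 0.5 at age 65.
late_onset = [individual_trajectory(a, 60.0, 0.2, time_shift=5.0) for a in ages]

print(population)
print(late_onset)
```

With the default `time_shift=0.0` and `acceleration=1.0`, the individual curve coincides with the population curve, which is a convenient way to check the helper.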
The model is fitted to the data with the MCMC-SAEM algorithm, running 100 iterations with the progress bar disabled. The call shown here does not pass a seed, so fitted values may differ slightly between runs.
model.fit(
    data,
    "mcmc_saem",
    n_iter=100,
    progress_bar=False,
)
Fit with `AlgorithmName.FIT_MCMC_SAEM` took: 6.81s
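The parameters for simulating patient visits are defined. They set the number of patients, how visits are scheduled (here, drawn at random), and the mean and standard deviation of the first-visit time, the follow-up duration, and the spacing between visits.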
visit_params = {
    "patient_number": 5,
    "visit_type": "random",  # The visit type could also be 'dataframe' with df_visits.
    # "df_visits": df_test  # Example for custom visit schedule.
    "first_visit_mean": 0.0,  # The mean of the first visit age/time.
    "first_visit_std": 0.4,  # The standard deviation of the first visit age/time.
    "time_follow_up_mean": 11,  # The mean follow-up time.
    "time_follow_up_std": 0.5,  # The standard deviation of the follow-up time.
    "distance_visit_mean": 2 / 12,  # The mean spacing between visits in years.
    "distance_visit_std": 0.75 / 12,  # The standard deviation of the spacing between visits in years.
    "min_spacing_between_visits": 1,  # The minimum allowed spacing between visits.
}
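One plausible reading of these parameters is sketched below: a first visit and a follow-up duration are each drawn from a Gaussian, then visits are added with Gaussian gaps that are never shorter than the minimum spacing. This is an illustration of the idea only, assuming Gaussian draws and simple clamping; the helper `sample_visit_times` is hypothetical and is not Leaspy's implementation, which may reference the first visit differently or sample in another way.

```python
import random


def sample_visit_times(params, rng):
    """Draw one hypothetical visit schedule from Gaussian visit parameters."""
    min_gap = params["min_spacing_between_visits"]
    if min_gap <= 0:
        # A positive minimum gap guarantees that the loop below terminates.
        raise ValueError("min_spacing_between_visits must be positive in this sketch")

    first = rng.gauss(params["first_visit_mean"], params["first_visit_std"])
    follow_up = max(
        0.0, rng.gauss(params["time_follow_up_mean"], params["time_follow_up_std"])
    )

    times = [first]
    while True:
        gap = rng.gauss(params["distance_visit_mean"], params["distance_visit_std"])
        gap = max(gap, min_gap)  # enforce the minimum spacing
        next_time = times[-1] + gap
        if next_time > first + follow_up:
            break  # the follow-up window is exhausted
        times.append(next_time)
    return times


# Same values as the visit parameters above, restated so the sketch runs on its own.
example_params = {
    "first_visit_mean": 0.0,
    "first_visit_std": 0.4,
    "time_follow_up_mean": 11,
    "time_follow_up_std": 0.5,
    "distance_visit_mean": 2 / 12,
    "distance_visit_std": 0.75 / 12,
    "min_spacing_between_visits": 1,
}

times = sample_visit_times(example_params, random.Random(0))
print(len(times), [round(t, 2) for t in times])
```

Under this reading, a mean spacing of 2 / 12 combined with a minimum spacing of 1 means that almost every drawn gap is raised to the minimum, so the visits end up evenly spaced.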
A new longitudinal dataset is simulated from the fitted model using the specified parameters.
df_sim = model.simulate(
    algorithm="simulate",
    features=[
        "MDS1_total",
        "MDS2_total",
        "MDS3_off_total",
        "SCOPA_total",
        "MOCA_total",
        "REM_total",
        "PUTAMEN_R",
        "PUTAMEN_L",
        "CAUDATE_R",
        "CAUDATE_L",
    ],
    visit_parameters=visit_params,
)
Simulate with `simulate` took: 0.05s
The simulated data is converted back to a pandas DataFrame for inspection.
The simulated longitudinal dataset is displayed below.
df_sim.head(10)
| | ID | TIME | MDS1_total | MDS2_total | MDS3_off_total | SCOPA_total | MOCA_total | REM_total | PUTAMEN_R | PUTAMEN_L | CAUDATE_R | CAUDATE_L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 48.0 | 0.082620 | 0.060121 | 0.196018 | 0.113921 | 0.072225 | 0.294356 | 0.912546 | 0.839195 | 0.568094 | 0.403453 |
| 1 | 0 | 49.0 | 0.049329 | 0.065133 | 0.364824 | 0.123405 | 0.109159 | 0.507428 | 0.882876 | 0.714090 | 0.589740 | 0.594417 |
| 2 | 0 | 50.0 | 0.055948 | 0.056322 | 0.159568 | 0.188488 | 0.051696 | 0.301199 | 0.759015 | 0.713410 | 0.730305 | 0.677861 |
| 3 | 0 | 51.0 | 0.085227 | 0.108017 | 0.263267 | 0.121330 | 0.084365 | 0.368686 | 0.838947 | 0.777509 | 0.586022 | 0.545453 |
| 4 | 0 | 52.0 | 0.073539 | 0.044140 | 0.383070 | 0.152411 | 0.011024 | 0.297112 | 0.784642 | 0.569088 | 0.828638 | 0.777341 |
| 5 | 0 | 53.0 | 0.053657 | 0.114640 | 0.265941 | 0.056333 | 0.176415 | 0.336634 | 0.846059 | 0.759150 | 0.707306 | 0.560700 |
| 6 | 0 | 54.0 | 0.164731 | 0.085557 | 0.154359 | 0.079580 | 0.183357 | 0.298657 | 0.747321 | 0.865993 | 0.706671 | 0.664610 |
| 7 | 0 | 55.0 | 0.049243 | 0.197857 | 0.300925 | 0.180697 | 0.124273 | 0.461153 | 0.754181 | 0.828697 | 0.739238 | 0.640222 |
| 8 | 0 | 56.0 | 0.061109 | 0.121964 | 0.238648 | 0.162513 | 0.131874 | 0.412425 | 0.883213 | 0.804650 | 0.588844 | 0.688872 |
| 9 | 0 | 57.0 | 0.152195 | 0.282577 | 0.229020 | 0.081195 | 0.126841 | 0.362385 | 0.927698 | 0.767671 | 0.716296 | 0.652593 |
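A quick sanity check on simulated visits is to look at the gaps between consecutive visit times for each patient. The sketch below does this in plain Python on the ID and TIME values of the ten rows displayed above; the helper `visit_gaps` is a hypothetical name introduced for this illustration and is not part of Leaspy.

```python
from collections import defaultdict


def visit_gaps(rows):
    """Map each patient ID to the gaps between its consecutive visit times.

    `rows` is an iterable of (patient_id, time) pairs.
    """
    times_by_patient = defaultdict(list)
    for patient_id, time in rows:
        times_by_patient[patient_id].append(time)

    gaps = {}
    for patient_id, times in times_by_patient.items():
        ordered = sorted(times)
        gaps[patient_id] = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return gaps


# ID and TIME of the ten rows displayed above: patient 0, visits at 48.0 through 57.0.
rows = [(0, 48.0 + k) for k in range(10)]
gaps = visit_gaps(rows)
print(gaps[0])
```

For these rows every gap is 1.0, which appears consistent with the `min_spacing_between_visits` value of 1 set in the visit parameters.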
This concludes the simulation example using Leaspy. Stay tuned for more examples on model fitting and analysis!