conninfpy.synth_datasets
Synthetic connectivity datasets for benchmarking and testing.
- conninfpy.synth_datasets.generate_fc_matrices(N, effect_size, mask=None, n_samples_group1=50, n_samples_group2=50, repeated_measures=False, seed=None)[source]
Generate synthetic functional connectivity correlation matrices for groupwise comparisons or repeated measures.
- Parameters:
N (int) – Number of ROIs (regions of interest), i.e. an N x N matrix.
effect_size (float) – Magnitude of correlation difference between groups.
mask (np.ndarray, optional) – Binary mask of shape (N, N) to apply correlation differences.
n_samples_group1 (int) – Number of matrices in group 1 (default: 50).
n_samples_group2 (int) – Number of matrices in group 2 (default: 50).
repeated_measures (bool) – If True, generate within-subject repeated-measures (paired) data, otherwise independent groups (default: False).
seed (int, optional) – Random seed for reproducibility.
- Returns:
group1 (np.ndarray) – Array of FC matrices for group 1, shape (n_samples_group1, N, N).
group2 (np.ndarray) – Array of FC matrices for group 2, shape (n_samples_group2, N, N).
(base_cov, mod_cov) (tuple of np.ndarray) – Original covariance matrix and modified covariance matrix with the effect_size introduced; used for group 1 and group 2 matrices respectively.
Examples
>>> import numpy as np
>>> from conninfpy.synth_datasets import generate_fc_matrices
>>> N = 6; e = 0.2; mask = np.zeros((N, N))
>>> mask[0:2, 0:2] = 1; mask[2:4, 2:4] = -1
>>> g1, g2, (c1, c2) = generate_fc_matrices(N, e, mask, 5, 10, seed=0)
>>> g1.shape == (5, 6, 6)
True
>>> g2.shape == (10, 6, 6)
True
>>> np.allclose(c1, c1.T)
True
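The signed mask in the example above suggests that effect_size is added on the masked edges with the sign of the mask entry. The following is a minimal sketch of that idea, not the library's implementation: the helper name apply_effect, the identity base covariance, and the eigenvalue-clipping repair step are all assumptions made for illustration.

```python
import numpy as np

def apply_effect(base_cov, mask, effect_size, eps=1e-6):
    """Hypothetical helper: shift masked off-diagonal entries by effect_size."""
    # Treat the mask as undirected and never touch the diagonal.
    sym_mask = (mask + mask.T) / 2.0
    np.fill_diagonal(sym_mask, 0.0)
    mod = base_cov + effect_size * sym_mask
    # Assumed repair step: clip eigenvalues so the result stays positive definite.
    w, v = np.linalg.eigh(mod)
    w = np.clip(w, eps, None)
    mod = (v * w) @ v.T
    return (mod + mod.T) / 2.0

N = 6
mask = np.zeros((N, N))
mask[0:2, 0:2] = 1    # increase correlation among ROIs 0-1
mask[2:4, 2:4] = -1   # decrease correlation among ROIs 2-3
base_cov = np.eye(N)  # placeholder base covariance (assumption)
mod_cov = apply_effect(base_cov, mask, effect_size=0.2)
```

With an identity base, the shifted matrix is already positive definite, so the clipping leaves it unchanged; it only matters when a large effect_size pushes an eigenvalue below zero.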
- conninfpy.synth_datasets.generate_multisite_glm_dataset(n_subjects: int = 30, N: int = 100, n_sites: int = 3, effect_size: float = 0.0, site_shift_sigma: float = 0.2, corr_site_interest: float = 0.0, n_signal_edges: int = 0, base_corr_sparsity: float = 0.9, seed: int | None = None) dict[source]
Multi-site GLM connectivity dataset, sized to mimic a Schaefer-100 study.
Built for the v2.1 full-pipeline calibration tests (tests/test_full_pipeline.py) and for the matrix-level phase of the SC-prior validation work in Projects/NetworkStatistics/_wiki/pseudo_real_validation.md.
Output is already in Fisher-z units, so callers should pass fisher_z=False to analyze(). Site effects are additive on Fisher-z; signal is linear in the regressor of interest at a fixed set of signal edges.
- Parameters:
n_subjects (int, default 30) – Total subjects, evenly distributed across n_sites.
N (int, default 100) – Number of ROIs. Default 100 mimics Schaefer-100.
n_sites (int, default 3) – Number of distinct sites; each contributes a fixed symmetric Fisher-z offset on all edges.
effect_size (float, default 0.0) – Slope of the linear-in-interest perturbation at signal edges. Set to 0.0 for an H₀ dataset (no group / regressor effect).
site_shift_sigma (float, default 0.2) – Scale of the per-site additive Fisher-z offset. Each site’s offset is drawn from N(0, σ_site²) once per simulation.
corr_site_interest (float in [0, 1], default 0.0) – Population correlation between sites (as an integer code) and the interest regressor. 0.0 = independent (the H₀-friendly regime); 0.6 = the regime where unrestricted permutation after harmonisation leaks Type-I error.
n_signal_edges (int, default 0) – Number of edges carrying the planted signal. Ignored when effect_size == 0. Edges are drawn uniformly from the upper triangle (excluding diagonal).
base_corr_sparsity (float, default 0.9) – alpha for make_sparse_spd_matrix controlling baseline connectivity sparsity (higher → sparser).
seed (int, optional) – Reproducibility handle.
- Returns:
"Y" – (n_subjects, N, N) Fisher-z connectivity tensor, symmetric with zero diagonal.
"interest" – (n_subjects,) regressor of interest.
"sites" – (n_subjects,) integer site labels.
"signal_mask" – (N, N) boolean upper-triangle mask of true positive edges (all False when effect_size == 0 or n_signal_edges == 0).
- Return type:
dict with keys
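The generative model described above (additive site offsets on Fisher-z, plus a signal that is linear in the regressor of interest at a fixed set of edges) can be sketched roughly as follows. This is an illustration under stated assumptions, not the function's actual code: the baseline connectivity is omitted, the per-site offset is assumed to be a per-edge symmetric matrix, and the noise scale of 0.1 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, N, n_sites = 12, 8, 3
effect_size, site_shift_sigma, n_signal_edges = 0.5, 0.2, 4

def symmetric_zero_diag(a):
    """Mirror the strict upper triangle; the diagonal stays at zero."""
    upper = np.triu(a, k=1)
    return upper + upper.T

# Evenly distributed integer site labels and a regressor of interest.
sites = np.arange(n_subjects) % n_sites
interest = rng.standard_normal(n_subjects)

# Signal edges drawn uniformly from the upper triangle (diagonal excluded).
rows, cols = np.triu_indices(N, k=1)
chosen = rng.choice(rows.size, size=n_signal_edges, replace=False)
signal_mask = np.zeros((N, N), dtype=bool)
signal_mask[rows[chosen], cols[chosen]] = True
signal_sym = signal_mask | signal_mask.T  # apply the signal to both triangles

# One fixed offset per site, drawn once (assumed per-edge, symmetric).
site_offsets = np.stack([
    symmetric_zero_diag(rng.normal(0.0, site_shift_sigma, (N, N)))
    for _ in range(n_sites)
])

Y = np.empty((n_subjects, N, N))
for s in range(n_subjects):
    noise = symmetric_zero_diag(rng.normal(0.0, 0.1, (N, N)))  # assumed noise scale
    Y[s] = site_offsets[sites[s]] + effect_size * interest[s] * signal_sym + noise
```

Setting effect_size to 0.0 in this sketch leaves only site offsets and noise, which corresponds to the H₀ regime described for the effect_size parameter.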
Notes
Designed for end-to-end exercise of analyze(Y, interest=..., sites=...): it triggers the auto-preserve, auto-strata, ComBat-with-preserve, Freedman-Lane with within-block exchangeability path in a single call. Calibration tests use effect_size=0.0 (H₀); strata-vs-no-strata sanity tests use corr_site_interest > 0 to bring the leak into view.
Examples
>>> data = generate_multisite_glm_dataset(
...     n_subjects=24, N=20, n_sites=3,
...     site_shift_sigma=0.3, corr_site_interest=0.6, seed=0,
... )
>>> data["Y"].shape
(24, 20, 20)
>>> data["sites"].shape, data["interest"].shape
((24,), (24,))
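The within-block exchangeability mentioned in the Notes means that permutations shuffle subjects only inside their own site. A minimal sketch of that restricted shuffling step is given below; it is not the full Freedman-Lane procedure, and permute_within_blocks is a hypothetical helper rather than part of conninfpy.

```python
import numpy as np

def permute_within_blocks(values, blocks, rng):
    """Shuffle values only among entries that share the same block label."""
    permuted = values.copy()
    for b in np.unique(blocks):
        idx = np.flatnonzero(blocks == b)
        permuted[idx] = values[rng.permutation(idx)]
    return permuted

rng = np.random.default_rng(0)
sites = np.repeat(np.arange(3), 8)                  # 3 sites, 8 subjects each
interest = sites + rng.standard_normal(sites.size)  # regressor correlated with site
shuffled = permute_within_blocks(interest, sites, rng)
```

Because each site keeps its own set of values, the association between site and regressor survives every permutation, which is what an unrestricted shuffle would destroy when corr_site_interest > 0.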
- class conninfpy.synth_datasets.ModularDatasetGenerator(N: int, n_modules: int = 5, intra_corr: float = 0.6, inter_corr: float = 0.1, noise_level: float = 0.05, seed: int = None)[source]
Bases: object
A generator for synthetic functional connectivity data with a modular (block) structure.
This class simulates brain connectivity matrices where nodes are organized into distinct functional modules (e.g., Visual, DMN, Motor). It allows for the injection of specific topological effects (within-module or between-module changes) to simulate pathological conditions.
- get_mask_within_module(module_idx: int) ndarray[source]
Returns a binary mask for all edges WITHIN a specific module.
- get_mask_between_modules(module_idx_A: int, module_idx_B: int) ndarray[source]
Returns a binary mask for all edges connecting module A and module B.
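As a rough illustration of the block structure and the two mask helpers, the sketch below builds a modular base matrix and the corresponding masks with plain NumPy. It assumes contiguous, equally sized modules and masks that exclude the diagonal; the functions mask_within and mask_between are hypothetical stand-ins, not the class methods themselves.

```python
import numpy as np

N, n_modules = 10, 5
intra_corr, inter_corr = 0.6, 0.1

# Assumed layout: contiguous modules of equal size -> [0, 0, 1, 1, ..., 4, 4].
labels = np.arange(N) * n_modules // N

# Base matrix: intra_corr inside a module, inter_corr across modules, 1 on the diagonal.
same_module = labels[:, None] == labels[None, :]
base = np.where(same_module, intra_corr, inter_corr)
np.fill_diagonal(base, 1.0)

def mask_within(labels, module_idx):
    """Binary mask of edges whose two endpoints both lie in module_idx."""
    in_m = labels == module_idx
    mask = np.outer(in_m, in_m).astype(float)
    np.fill_diagonal(mask, 0.0)  # self-connections are not edges
    return mask

def mask_between(labels, module_idx_a, module_idx_b):
    """Binary symmetric mask of edges linking module A and module B."""
    in_a, in_b = labels == module_idx_a, labels == module_idx_b
    return (np.outer(in_a, in_b) | np.outer(in_b, in_a)).astype(float)

within0 = mask_within(labels, 0)
between01 = mask_between(labels, 0, 1)
```

Masks of this form can be passed as an effect mask, and multiplying one by -1 flips the direction of the injected effect.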
- generate_data(effect_mask: ndarray, effect_size: float, n_samples_g1: int = 50, n_samples_g2: int = 50, time_points: int = 200)[source]
Generates sample correlation matrices for two groups.
Group 1 is sampled from the base modular covariance. Group 2 is sampled from a modified covariance (base + effect).
- Parameters:
effect_mask (np.ndarray) –
Effect mask matrix of shape (N, N).
0 means “no effect” for that edge.
Non-zero values scale the effect magnitude per edge.
The sign of the value controls effect direction (positive/negative).
The matrix is treated as undirected: it will be symmetrized internally and its diagonal will be set to 0.
effect_size (float) – Magnitude of the effect (Cohen’s d-like shift in correlation). Positive values increase correlation, negative values decrease it.
n_samples_g1 (int) – Number of subjects in Group 1 (Control).
n_samples_g2 (int) – Number of subjects in Group 2 (Test).
time_points (int) – Length of the simulated BOLD time-series. Higher values reduce sampling noise.
- Returns:
g1_data (np.ndarray (n_samples_g1, N, N)) – Correlation matrices for Group 1.
g2_data (np.ndarray (n_samples_g2, N, N)) – Correlation matrices for Group 2.
labels (np.ndarray (N,)) – Vector of module assignments for nodes.
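The role of time_points can be illustrated by sampling Gaussian time series from a covariance matrix and computing one sample correlation matrix per subject. This is a sketch of the general technique under the assumption of i.i.d. Gaussian time points; sample_corr_matrices is a hypothetical helper and is not claimed to match the internals of generate_data.

```python
import numpy as np

def sample_corr_matrices(cov, n_samples, time_points, rng):
    """Draw n_samples time series from N(0, cov) and return their correlation matrices."""
    n = cov.shape[0]
    out = np.empty((n_samples, n, n))
    for i in range(n_samples):
        ts = rng.multivariate_normal(np.zeros(n), cov, size=time_points)
        out[i] = np.corrcoef(ts, rowvar=False)  # columns are ROIs
    return out

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])

short_run = sample_corr_matrices(cov, n_samples=200, time_points=50, rng=rng)
long_run = sample_corr_matrices(cov, n_samples=200, time_points=800, rng=rng)

# Spread of the estimated correlation across subjects shrinks with longer series.
spread_short = short_run[:, 0, 1].std()
spread_long = long_run[:, 0, 1].std()
```

Comparing spread_short with spread_long shows why higher time_points values reduce sampling noise: the standard error of a sample correlation falls roughly with the square root of the series length.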