Usage
=====

ConnInfPy provides several top-level inferential pipelines, all sharing a
common contract: edge-wise statistics → topology-aware enhancement →
permutation null → FWER / FDR-corrected p-values, with positive and
negative effects always tested separately.

Quick reference:

============================================================  ============================================================
Pipeline                                                       Use case
============================================================  ============================================================
:func:`conninfpy.compute_p_val`                                Group / paired / one-sample t-test
:func:`conninfpy.compute_p_val_glm`                            Continuous predictors with confound regression (Freedman–Lane)
:func:`conninfpy.compute_p_val_glm_multi`                      Several contrasts under a shared nuisance model in **one** permutation pass
:func:`conninfpy.compute_p_val_paired_glm`                     Paired A vs B with Δ-level confounds
:func:`conninfpy.analyze`                                      One-shot Fisher-z → ComBat → GLM/t-test → :class:`InferenceResult`
============================================================  ============================================================

Two helper modules support them:

- :mod:`conninfpy.harmonize` — multi-site ComBat harmonization + design diagnostics
- :mod:`conninfpy.acceleration` — GPD/gamma tail approximation for permutation acceleration

Input conventions
-----------------

- Connectivity tensors: shape ``(n_subjects, N, N)``, symmetric, zero diagonal.
- Edge weights: Fisher-z transformed correlation coefficients. Use
  :func:`conninfpy.fisher_r_to_z` before any inference call.
- The package returns :class:`~conninfpy.InferenceResult` objects, which
  behave as ``{'positive', 'negative'}`` dicts (canonical keys since
  v2.0) and additionally carry attributes ``positive`` / ``negative`` /
  ``method`` / ``n_permutations`` / ``acceleration`` / ``wall_time_s``.
  The legacy v1.x keys ``'g2>g1'`` / ``'g1>g2'`` (t-test family) remain
  readable but emit a :class:`DeprecationWarning` and will be removed in
  v2.1. F-stat omnibus tests still return ``{'omnibus': arr}``.
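
The input conventions above can be checked before calling a pipeline.
The following is a minimal NumPy sketch under stated assumptions:
``check_connectivity`` is a hypothetical helper written for this page,
not part of ConnInfPy, and ``np.arctanh`` stands in for the Fisher
transform that :func:`conninfpy.fisher_r_to_z` is assumed to apply.

.. code-block:: python

   import numpy as np

   def check_connectivity(Y, atol=1e-8):
       """Raise ValueError unless Y is (n_subjects, N, N), symmetric, zero-diagonal."""
       Y = np.asarray(Y, dtype=float)
       if Y.ndim != 3 or Y.shape[1] != Y.shape[2]:
           raise ValueError(f"expected (n_subjects, N, N), got {Y.shape}")
       if not np.allclose(Y, np.swapaxes(Y, 1, 2), atol=atol):
           raise ValueError("matrices are not symmetric")
       if not np.allclose(np.diagonal(Y, axis1=1, axis2=2), 0.0, atol=atol):
           raise ValueError("diagonal is not zero")
       return Y

   # Toy data: 4 subjects, 200 time points, 5 nodes (illustrative only)
   rng = np.random.default_rng(0)
   ts = rng.standard_normal((4, 200, 5))
   r = np.stack([np.corrcoef(x, rowvar=False) for x in ts])   # (4, 5, 5)
   idx = np.arange(5)
   r[:, idx, idx] = 0.0              # zero the diagonal so arctanh stays finite
   z = check_connectivity(np.arctanh(r))

Zeroing the diagonal before the transform matters because
``arctanh(1)`` is infinite.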

t-test pipeline — :func:`compute_p_val`
---------------------------------------

The simplest entry point. Computes per-edge t-statistics, applies an
enhancement operator, builds a permutation null of per-tail max
statistics, and returns FWER- or FDR-corrected p-maps with the +1
Phipson–Smyth correction.

.. code-block:: python

   from conninfpy import compute_p_val, fisher_r_to_z

   group1_z = fisher_r_to_z(group1_corr)   # (n1, N, N)
   group2_z = fisher_r_to_z(group2_corr)

   p = compute_p_val(
       group1_z, group2_z,
       test_type='two-sample',             # or 'paired', 'one-sample'
       method='tfnbs',
       n_permutations=1000,
       e=0.3, h=3.0, n=10,                 # FDR-calibrated regime
       use_mp=True, rng=42,
   )
   # p['positive'] : (N, N) p-map for group2 > group1
   # p['negative'] : (N, N) p-map for group1 > group2
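
To make the permutation step concrete, here is a minimal NumPy sketch of
a per-tail max-statistic null with the +1 correction, run on vectorised
edges and with no enhancement operator. The helpers ``tstat_edges`` and
``maxstat_pvals`` are hypothetical names used for illustration; this is
not the package's implementation.

.. code-block:: python

   import numpy as np

   def tstat_edges(a, b):
       """Pooled-variance two-sample t per column; positive when b > a."""
       na, nb = len(a), len(b)
       sp2 = ((na - 1) * a.var(0, ddof=1) + (nb - 1) * b.var(0, ddof=1)) / (na + nb - 2)
       return (b.mean(0) - a.mean(0)) / np.sqrt(sp2 * (1 / na + 1 / nb))

   def maxstat_pvals(g1, g2, n_permutations=500, seed=0):
       """Per-tail max-statistic FWER p-values; inputs are (n_subjects, n_edges)."""
       rng = np.random.default_rng(seed)
       data = np.vstack([g1, g2])
       n1 = len(g1)
       t_obs = tstat_edges(g1, g2)
       null_pos = np.empty(n_permutations)
       null_neg = np.empty(n_permutations)
       for i in range(n_permutations):
           perm = rng.permutation(len(data))          # shuffle group labels
           t = tstat_edges(data[perm[:n1]], data[perm[n1:]])
           null_pos[i], null_neg[i] = t.max(), (-t).max()
       # +1 in numerator and denominator, so p is never exactly zero
       p_pos = ((null_pos[:, None] >= t_obs).sum(0) + 1) / (n_permutations + 1)
       p_neg = ((null_neg[:, None] >= -t_obs).sum(0) + 1) / (n_permutations + 1)
       return {"positive": p_pos, "negative": p_neg}

   rng = np.random.default_rng(1)
   g1 = rng.standard_normal((20, 30))
   g2 = rng.standard_normal((20, 30))
   g2[:, 0] += 3.0                                    # one edge with a true effect
   p = maxstat_pvals(g1, g2)

Each tail is compared against its own null of maxima, which is what lets
the two directions be reported separately.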

GLM pipeline — :func:`compute_p_val_glm`
----------------------------------------

Edge-wise GLM with Freedman–Lane permutation. Use this when you have a
continuous predictor and want to control for nuisance regressors
(motion, age, sex, site).

Convenience API (recommended):

.. code-block:: python

   import numpy as np
   from conninfpy import compute_p_val_glm, fisher_r_to_z

   Y = fisher_r_to_z(connectivity_matrices)              # (n, N, N)
   age = np.array([25, 30, 28])                          # (n,); toy values for n = 3 subjects
   confounds = np.column_stack([motion, sex])            # (n, p); motion and sex are (n,) arrays

   p = compute_p_val_glm(
       Y, interest=age, confounds=confounds,
       stat_type='tstat',                                 # or 'beta', 'fstat'
       method='tfnbs',
       n_permutations=1000,
       e=0.3, h=3.0, n=10,
       use_mp=True, rng=42,
   )
   # p['positive'] : edges where age ↑ → connectivity ↑
   # p['negative'] : edges where age ↑ → connectivity ↓

Advanced API (full design matrix + contrast):

.. code-block:: python

   from conninfpy import build_design_matrix, compute_p_val_glm

   # Y and age as in the previous example; motion_sex is an (n, 2) confound array
   X, contrast = build_design_matrix(interest=age, confounds=motion_sex)
   p = compute_p_val_glm(
       Y, design_matrix=X, contrast=contrast,
       stat_type='tstat', method='tfnbs', n_permutations=1000,
   )

F-contrast (omnibus) for joint multi-row tests — e.g. ≥3-condition
designs or testing several predictors jointly:

.. code-block:: python

   import numpy as np

   n = Y.shape[0]                                        # number of subjects
   X = np.column_stack([np.ones(n), age, age**2, sex, motion])
   contrast = np.array([
       [0, 1, 0, 0, 0],       # beta_age
       [0, 0, 1, 0, 0],       # beta_age_squared
   ])
   p = compute_p_val_glm(
       Y, design_matrix=X, contrast=contrast,
       stat_type='fstat', method='tfnbs', n_permutations=1000,
   )
   # F-pipeline returns a single non-negative tail:
   # p['omnibus'] : (N, N) FWER-corrected p-map
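
The Freedman–Lane scheme named at the top of this section can be
sketched in plain NumPy. This is an illustration, not the package's
code. It assumes a full-rank design and a t-contrast that selects
columns directly, so that the columns where the contrast is zero form
the reduced (nuisance) model. ``freedman_lane_tstats`` is a hypothetical
name.

.. code-block:: python

   import numpy as np

   def freedman_lane_tstats(Y, X, c, n_permutations=200, seed=0):
       """Observed and permuted t-stats for contrast c. Y: (n, n_edges), X: (n, p)."""
       rng = np.random.default_rng(seed)
       n, p = X.shape
       Z = X[:, c == 0]                         # nuisance columns (reduced model)
       Hz = Z @ np.linalg.pinv(Z)               # hat matrix of the reduced model
       fitted = Hz @ Y
       resid = Y - fitted
       X_pinv = np.linalg.pinv(X)
       var_c = c @ np.linalg.pinv(X.T @ X) @ c  # contrast variance factor

       def tstat(Yp):
           beta = X_pinv @ Yp
           sigma2 = ((Yp - X @ beta) ** 2).sum(0) / (n - p)
           return (c @ beta) / np.sqrt(sigma2 * var_c)

       t_obs = tstat(Y)
       t_perm = np.empty((n_permutations, Y.shape[1]))
       for i in range(n_permutations):
           # permute reduced-model residuals, add the nuisance fit back, refit
           t_perm[i] = tstat(fitted + resid[rng.permutation(n)])
       return t_obs, t_perm

   rng = np.random.default_rng(2)
   n = 40
   age = rng.standard_normal(n)
   motion = rng.standard_normal(n)
   X = np.column_stack([np.ones(n), age, motion])
   Y = rng.standard_normal((n, 10))
   Y[:, 0] += 1.5 * age                         # edge 0 depends on age
   t_obs, t_perm = freedman_lane_tstats(Y, X, np.array([0.0, 1.0, 0.0]))

Permuting only the reduced-model residuals keeps the nuisance structure
intact in every permuted data set.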

Multi-contrast GLM in one pass — :func:`compute_p_val_glm_multi`
----------------------------------------------------------------

If you need to test several contrasts of interest under the same
nuisance model — e.g. ``age``, ``sex``, and ``mean_fd`` separately
while treating the others as nuisance — calling
:func:`compute_p_val_glm` once per contrast wastes work: the
reduced-model residual fit and the per-permutation
``X_pinv @ Y_perm`` matrix multiplication are identical. The
multi-contrast wrapper does it in one pass.

.. code-block:: python

   import numpy as np
   from conninfpy import compute_p_val_glm_multi

   n = Y.shape[0]                                        # number of subjects
   X = np.column_stack([np.ones(n), age, sex, mean_fd])  # (n, 4)
   contrasts = {
       "age":     np.array([0.0, 1.0, 0.0, 0.0]),
       "sex":     np.array([0.0, 0.0, 1.0, 0.0]),
       "motion":  np.array([0.0, 0.0, 0.0, 1.0]),
   }

   results = compute_p_val_glm_multi(
       Y, design_matrix=X, contrasts=contrasts,
       method="tfnbs", n_permutations=5000,
       acceleration="gpd", rng=42,
   )
   # results = {'age': InferenceResult, 'sex': ..., 'motion': ...}
   results["age"].positive          # (N, N) FWER p-map
   results["age"].n_significant(0.05)

For ``K`` contrasts the wall-time is roughly that of a single
:func:`compute_p_val_glm` call rather than ``K`` calls, which is
typically about a 3× speedup for the canonical age + sex + motion design.

By default the reduced model excludes any column touched by *any*
contrast in the dictionary. Pass ``nuisance_contrast=`` to override
explicitly, e.g. when you want sex and motion treated as nuisance for
the age contrast even though you also test them separately. F-stat /
multi-row contrasts are unsupported in this wrapper — call
:func:`compute_p_val_glm` once per omnibus test.
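
Why one pass is enough can be seen in a small NumPy sketch: the
pseudo-inverse product is computed once, and every contrast is then a
cheap row operation on the same coefficients. This illustrates the idea
only, on toy data with hypothetical variable names; it is not the
wrapper's actual code.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(3)
   n, n_edges = 50, 8
   X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])  # intercept + 3 predictors
   Y = rng.standard_normal((n, n_edges))
   contrasts = {
       "age":    np.array([0.0, 1.0, 0.0, 0.0]),
       "sex":    np.array([0.0, 0.0, 1.0, 0.0]),
       "motion": np.array([0.0, 0.0, 0.0, 1.0]),
   }

   X_pinv = np.linalg.pinv(X)                        # computed once
   C = np.stack(list(contrasts.values()))            # (K, p)
   xtx_inv = np.linalg.pinv(X.T @ X)
   var_c = np.einsum("kp,pq,kq->k", C, xtx_inv, C)   # c' (X'X)^-1 c per contrast

   def all_tstats(Y_perm):
       beta = X_pinv @ Y_perm                        # the expensive product, shared by all K
       sigma2 = ((Y_perm - X @ beta) ** 2).sum(0) / (n - X.shape[1])
       return (C @ beta) / np.sqrt(np.outer(var_c, sigma2))   # (K, n_edges)

   t_all = all_tstats(Y)

Row ``k`` of ``t_all`` matches what a separate single-contrast fit would
give for contrast ``k``.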

Paired A vs B with Δ-level confounds — :func:`compute_p_val_paired_glm`
------------------------------------------------------------------------

When you have a within-subject contrast (task A vs B) and confounds
that differ between conditions (e.g. condition-level motion):

.. code-block:: python

   from conninfpy import compute_p_val_paired_glm

   p = compute_p_val_paired_glm(
       Y_task_A, Y_task_B,                            # both (n, N, N)
       confounds_A=np.column_stack([motion_A, drowsiness_A]),
       confounds_B=np.column_stack([motion_B, drowsiness_B]),
       method='tfnbs', n_permutations=1000,
       e=0.3, h=3.0, n=10,
   )

With no confounds, this delegates to ``compute_p_val(test_type='paired')``
(sign-flip permutation, the exact non-asymptotic null). With Δ-level
confounds it constructs Δ_Y = Y^A − Y^B, Δ_C = C^A − C^B and runs a
one-sample GLM on the differences.
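
The difference construction can be written out in a few lines of NumPy.
This is a sketch on toy data, assuming the effect of interest is the
intercept of the difference model (the mean A − B difference at
Δ_C = 0). How the package builds its design internally may differ.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(4)
   n, n_edges = 30, 6
   Y_A = rng.standard_normal((n, n_edges))
   Y_B = rng.standard_normal((n, n_edges))
   C_A = rng.standard_normal((n, 2))            # e.g. motion, drowsiness in condition A
   C_B = rng.standard_normal((n, 2))
   Y_A[:, 0] += 2.0                             # edge 0: A > B

   dY = Y_A - Y_B                               # Δ_Y
   dC = C_A - C_B                               # Δ_C
   X = np.column_stack([np.ones(n), dC])        # intercept + difference confounds
   contrast = np.array([1.0, 0.0, 0.0])         # test the intercept

   beta = np.linalg.pinv(X) @ dY
   resid = dY - X @ beta
   sigma2 = (resid ** 2).sum(0) / (n - X.shape[1])
   se = np.sqrt(sigma2 * (contrast @ np.linalg.pinv(X.T @ X) @ contrast))
   t = (contrast @ beta) / se                   # (n_edges,) t-statistics on the differences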

Methods (enhancement operators)
-------------------------------

``compute_p_val(..., method=...)`` and ``compute_p_val_glm(..., method=...)``
both accept:

==================  ================================================================
``method``          Operator
==================  ================================================================
``'tstat'``         No enhancement; per-tail max-statistic FWER on raw t / β / F
``'tfnbs'``         Threshold-Free NBS (Baggio 2018)
``'nbs'``           Classical Network-Based Statistic (Zalesky 2010); supply
                    ``threshold=2.0`` and ``nbs_stat='extent'`` or ``'intensity'``
``'cnbs'``          Constrained NBS (Noble & Scheinost 2020); supply ``net_labels=``
``'ni_tfnbs'``      Network-Informed TFNBS (this work); soft block-density prior;
                    supply ``net_labels=``
``'fbc_tfnbs'``     Functional-Block-Clustering TFNBS (this work); hard block
                    prior; supply ``net_labels=``, ``min_cluster_size=3``
``'bh_fdr'``        Parametric Benjamini–Hochberg FDR (no permutation)
``'bh_fdr_perm'``   Empirical edge-wise BH-FDR via permutation null
==================  ================================================================
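
As a reference point for the ``'bh_fdr'`` row, the Benjamini–Hochberg
step-up adjustment itself is short. The sketch below is a generic NumPy
version applied to the upper-triangle edges of a small matrix;
``bh_adjust`` is a hypothetical helper, and the package's own routine
may handle ties or masking differently.

.. code-block:: python

   import numpy as np

   def bh_adjust(pvals):
       """Benjamini-Hochberg adjusted p-values (step-up), same shape as input."""
       p = np.asarray(pvals, dtype=float).ravel()
       m = p.size
       order = np.argsort(p)
       scaled = p[order] * m / np.arange(1, m + 1)
       # enforce monotonicity, working down from the largest p-value
       adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
       adj = np.empty(m)
       adj[order] = np.minimum(adj_sorted, 1.0)
       return adj.reshape(np.shape(pvals))

   N = 4
   iu = np.triu_indices(N, k=1)                 # each undirected edge once
   p_edges = np.array([0.001, 0.008, 0.039, 0.041, 0.2, 0.6])
   q_edges = bh_adjust(p_edges)
   q_map = np.ones((N, N))
   q_map[iu] = q_edges
   q_map.T[iu] = q_edges                        # mirror into the lower triangle

Adjusting only the upper triangle avoids counting each symmetric edge
twice.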

Multi-site harmonization — :mod:`conninfpy.harmonize`
-----------------------------------------------------

Native NumPy implementation of parametric empirical-Bayes ComBat.
No ``neuroHarmonize`` or ``neurocombat`` dependency.

End-to-end with confound-aware GLM:

.. code-block:: python

   import numpy as np
   from conninfpy import (
       fisher_r_to_z,
       combat_harmonize, design_diagnostics,
       compute_p_val_glm,
   )

   # Y_corr : (n, N, N) per-subject Pearson correlation matrices
   # sites  : (n,) acquisition-site labels
   # age, sex, mean_fd : per-subject phenotype
   Y = fisher_r_to_z(Y_corr)

   # 1. Harmonize site effects, preserving age + sex + motion as biology
   preserve = np.column_stack([age, sex, mean_fd])
   Y = combat_harmonize(Y, sites=sites, preserve=preserve).Y_adjusted

   # 2. Sanity-check the design before running anything expensive
   X = np.column_stack([np.ones_like(age), age, sex, mean_fd])
   diag = design_diagnostics(X, names=["intercept", "age", "sex", "mean_fd"])
   # reports condition number + per-column VIF + plain-English flags

   # 3. TFNBS on age, partialling out sex + motion (Freedman–Lane)
   p = compute_p_val_glm(
       Y, interest=age,
       confounds=np.column_stack([sex, mean_fd]),
       stat_type='tstat', method='tfnbs',
       e=0.3, h=3.0, n=10,
       n_permutations=200, acceleration='gpd',
       use_mp=True, rng=42,
   )

For cross-site machine-learning transfer (fit ComBat on training
cohorts, freeze, apply to held-out sites at test time), use
:func:`combat_fit` / :func:`combat_apply` separately.

Permutation acceleration — :mod:`conninfpy.acceleration`
--------------------------------------------------------

Set ``acceleration='gpd'`` (or ``'gamma'``) on a pipeline call to
replace the empirical permutation p-value formula with a fitted
parametric tail (Generalized Pareto Distribution, Winkler 2016b).
This reduces the permutation budget from ~5000 to ~200, a ~25×
wall-clock saving, and reproduces the empirical FWER-corrected p-values
to within ``|Δ(-log10 p)| ≤ 0.001`` on >99% of edges in the ConnInfPy
ABIDE Age validation. A goodness-of-fit guard (Anderson–Darling on tail
exceedances) falls back to empirical p-values when the GPD does not
fit cleanly.
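
The tail-approximation idea can be illustrated with a short NumPy
sketch. It fits a GPD to the largest null values by the method of
moments, chosen here for brevity, and omits the goodness-of-fit guard.
``gpd_tail_pvalue`` and its ``tail_fraction`` parameter are hypothetical
and are not the package's estimator.

.. code-block:: python

   import numpy as np

   def gpd_tail_pvalue(null, t_obs, tail_fraction=0.25):
       """Tail p-value from a GPD fitted to the upper tail of a permutation null."""
       null = np.sort(np.asarray(null, dtype=float))
       n_tail = int(tail_fraction * null.size)
       u = null[-n_tail - 1]                    # threshold just below the tail
       exc = null[-n_tail:] - u                 # exceedances over the threshold
       if t_obs <= u:                           # not in the tail: empirical p-value
           return (np.sum(null >= t_obs) + 1) / (null.size + 1)
       m, v = exc.mean(), exc.var(ddof=1)
       xi = 0.5 * (1.0 - m * m / v)             # shape (method of moments)
       sigma = 0.5 * m * (m * m / v + 1.0)      # scale (method of moments)
       x = t_obs - u
       if abs(xi) < 1e-8:
           sf = np.exp(-x / sigma)              # exponential limit of the GPD
       else:
           sf = max(1.0 + xi * x / sigma, 0.0) ** (-1.0 / xi)
       return (n_tail / null.size) * sf

   rng = np.random.default_rng(5)
   null = rng.exponential(size=5000)            # stand-in for a max-statistic null
   p_hat = gpd_tail_pvalue(null, t_obs=6.0)     # true tail probability is exp(-6)

Unlike the empirical formula, the fitted tail can return p-values below
``1 / n_permutations``, which is what makes a small permutation budget
usable.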

Topology-aware default integration steps
----------------------------------------

The TFCE integral is evaluated at ``n`` thresholds. Defaults:

- ``n=100`` for direct scoring (single-shot, high resolution; via
  :func:`conninfpy.get_tfnbs_score`)
- ``n=10`` inside the permutation loop (Hao 2024 reports n=10 is
  sufficient for FDR control on network data)

The exponents ``(e, h)`` accept either scalars or equal-length lists.
Lists are zipped pairwise into a single 3-D batched call, so a 16×11
(E, H) grid, flattened into two length-176 lists, runs all 176 combos in
one ``compute_p_val`` invocation without multiplying the runtime.
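
Because the lists are paired element by element, a rectangular grid has
to be flattened first. A minimal sketch follows, with illustrative grid
values that are not recommended settings:

.. code-block:: python

   import itertools
   import numpy as np

   e_values = np.linspace(0.25, 4.0, 16)        # 16 illustrative E values
   h_values = np.linspace(1.0, 6.0, 11)         # 11 illustrative H values

   pairs = list(itertools.product(e_values, h_values))
   e_list = [float(e) for e, _ in pairs]
   h_list = [float(h) for _, h in pairs]
   # e_list[k] is paired with h_list[k]; both lists have 16 * 11 = 176 entries
   # and would be passed as compute_p_val(..., e=e_list, h=h_list)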

See :mod:`conninfpy.defaults` for the full list of constants and
citations (Smith & Nichols 2009, Baggio 2018, Vinokur 2023, Hao 2024).