User guide
==========

This guide summarises how to use **Generalized Riesz Regression (GRR)** in this
package.  The main entry point is :func:`genriesz.grr_functional`.

Conceptual procedure
--------------------

To estimate a target parameter :math:`\theta` written as a linear functional of the
outcome regression :math:`\gamma(x) = \mathbb{E}[Y\mid X=x]`, you provide:

- a *functional* ``m`` (either a built-in :class:`genriesz.LinearFunctional` or a plain callable ``m(x_row, gamma)``),
- a feature map (basis) ``phi(X)``, and
- either a Bregman generator object ``generator`` **or** a generator function ``g``.

If you pass a plain callable ``m``, :func:`genriesz.grr_functional` wraps it as
:class:`genriesz.CallableFunctional`.  The callable must be **linear in** the
function argument ``gamma``.

The package then:

1. builds the link function induced by the generator (automatic regressor balancing),
2. fits a Riesz representer model :math:`\hat\alpha(x)`,
3. optionally fits an outcome model :math:`\hat\gamma(x)`,
4. reports RA, RW, ARW, and TMLE estimates with standard errors, confidence intervals,
   and p-values.


Bases
-----

All bases in this library implement the same interface:

- batched input: ``basis(X)`` with ``X.shape == (n, d)`` returns ``(n, p)``,
- single-row input: ``basis(x)`` with ``x.shape == (d,)`` returns ``(p,)``,
- ``basis.fit(X, y=None)`` fits any data-dependent parameters (centers, standardisation, etc.),
- ``basis.copy()`` returns a deep copy (used internally by cross-fitting).

Bases passed to :func:`genriesz.grr_functional` (and all convenience wrappers) are
**copied and re-fitted** inside each cross-fitting fold.  You do not need to call
``fit`` manually before passing a basis to a ``grr_*`` function.


Polynomial
^^^^^^^^^^

Use :class:`genriesz.PolynomialBasis` for polynomial expansions up to a given total
degree.

.. code-block:: python

   from genriesz import PolynomialBasis

   psi = PolynomialBasis(degree=2, include_bias=True)
   psi.fit(X)       # only needed when calling the basis directly
   Phi = psi(X)     # (n, p)

``PolynomialBasis`` implements ``derivative()`` analytically and is compatible with
:func:`genriesz.grr_ame`.


Treatment interactions
^^^^^^^^^^^^^^^^^^^^^^

For binary-treatment causal estimands (ATE, ATT, and DID), interact a base basis
:math:`\psi(Z)` with the treatment :math:`D`:

.. math::

   \phi(X) = [D \cdot \psi(Z),\; (1-D)\cdot\psi(Z)].

Use :class:`genriesz.TreatmentInteractionBasis`:

.. code-block:: python

   from genriesz import PolynomialBasis, TreatmentInteractionBasis

   psi = PolynomialBasis(degree=2, include_bias=True)   # base basis on Z
   phi = TreatmentInteractionBasis(base_basis=psi)      # [D*psi(Z), (1-D)*psi(Z)]

``TreatmentInteractionBasis`` delegates ``derivative()`` to the base basis, so it
inherits AME compatibility from the base.


RKHS bases (RBF kernel)
^^^^^^^^^^^^^^^^^^^^^^^

Three classes approximate the Gaussian (RBF-kernel) RKHS:

- :class:`genriesz.RBFRandomFourierBasis` — random Fourier features (Rahimi & Recht),
- :class:`genriesz.RBFNystromBasis` — Nyström feature map with eigendecomposition whitening,
- :class:`genriesz.GaussianRKHSBasis` — explicit kernel feature map (one basis function per center).

.. code-block:: python

   from genriesz import RBFRandomFourierBasis, RBFNystromBasis, GaussianRKHSBasis

   rff = RBFRandomFourierBasis(n_features=500, sigma=1.0, standardize=True, random_state=0)
   nys = RBFNystromBasis(n_centers=300, sigma=1.0, standardize=True, random_state=0)
   krn = GaussianRKHSBasis(n_centers=300, sigma=1.0, standardize=True, random_state=0)

.. note::

   :class:`genriesz.RBFRandomFourierBasis` implements ``derivative()`` analytically
   (the derivative of :math:`\cos(\omega^\top x + b)` is :math:`-\omega \sin(\ldots)`),
   and is therefore compatible with :func:`genriesz.grr_ame`.
   :class:`genriesz.RBFNystromBasis` and :class:`genriesz.GaussianRKHSBasis` do not
   currently implement ``derivative()`` and cannot be used with ``grr_ame``.


kNN nearest-neighbor indicator basis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Nearest-neighbor matching can be interpreted as a squared-loss Riesz (LSIF)
construction with a **nearest-neighbor indicator** basis:

.. math::

   \phi_j(z) = \mathbf{1}\{c_j \in \mathrm{NN}_k(z)\},

where :math:`\{c_j\}` are fitted centers and :math:`\mathrm{NN}_k(z)` denotes the
:math:`k`-nearest centers of :math:`z`.

For ATE, ATT, and DID, combine with :class:`genriesz.TreatmentInteractionBasis`:

.. code-block:: python

   import numpy as np
   from genriesz import (
       KNNCatchmentBasis, TreatmentInteractionBasis,
       grr_ate, SquaredGenerator,
   )

   gen = SquaredGenerator(C=0.0).as_generator()

   # The basis is fitted on the training fold inside each cross-fitting iteration.
   basis_knn = KNNCatchmentBasis(n_neighbors=5, include_bias=False)
   phi       = TreatmentInteractionBasis(base_basis=basis_knn)

   res = grr_ate(X=X, Y=Y, basis=phi, generator=gen,
                 cross_fit=True, folds=5)

When cross-fitting, each training fold becomes the set of centers, so the feature
dimension :math:`p = n_\text{train}`.  The internal outcome model uses a dual
(Woodbury) solve when :math:`p > n_\text{test}`, keeping cost at :math:`O(n^3)`.

.. note::

   ``KNNCatchmentBasis`` is piecewise-constant and does **not** implement
   ``derivative()``.  It cannot be used with :func:`genriesz.grr_ame`.


Random forest leaf basis
^^^^^^^^^^^^^^^^^^^^^^^^

:class:`genriesz.sklearn_basis.RandomForestLeafBasis` turns any scikit-learn
``RandomForest*`` model into a feature map via one-hot leaf indicators.

.. code-block:: python

   from sklearn.ensemble import RandomForestRegressor
   from genriesz import TreatmentInteractionBasis
   from genriesz.sklearn_basis import RandomForestLeafBasis

   rf  = RandomForestRegressor(n_estimators=200, max_depth=6, random_state=0)
   psi = RandomForestLeafBasis(rf)          # include_bias=False, normalize=True (defaults)
   phi = TreatmentInteractionBasis(base_basis=psi)

This keeps GRR convex (linear in parameters) while using a nonparametric partition
of the covariate space.

.. note::

   ``RandomForestLeafBasis`` normalizes each leaf-indicator row by
   :math:`1/\!\sqrt{T}` (where :math:`T` is the number of trees) by default
   (``normalize=True``).  Without this, the row :math:`\ell_2`-norm grows as
   :math:`\sqrt{T}`, making the effective regularisation scale tree-count dependent.

   Tree-based bases are **piecewise-constant** and do **not** implement
   ``derivative()``.  They cannot be used with :func:`genriesz.grr_ame`.


Neural network feature maps (PyTorch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you install PyTorch (optional), you can use a neural network as a **basis function**
via :class:`genriesz.torch_basis.TorchEmbeddingBasis`.

.. important::

   If you train the neural network jointly inside the GRR objective, you leave the
   convex (GLM) setting.  The recommended approach is:

   1) train the embedding network separately (e.g., supervised pre-training),
   2) call ``phi.fit(X, y, ...)`` to run the optional training loop,
   3) use the frozen embedding as a feature map in GRR.

.. code-block:: python

   from genriesz.torch_basis import MLPEmbeddingNet, TorchEmbeddingBasis

   net = MLPEmbeddingNet(input_dim=X.shape[1], hidden_dims=(64, 32), output_dim=16)

   phi = TorchEmbeddingBasis(net=net, include_bias=True)
   phi.fit(X, y, epochs=10, lr=1e-3, verbose=True)  # optional supervised pretraining

   Phi = phi(X)   # (n, 17)  — 16 embedding dims + 1 bias

.. note::

   ``TorchEmbeddingBasis`` does not implement ``derivative()`` and cannot be used
   with :func:`genriesz.grr_ame` without additional autograd integration.


AME-compatible bases summary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`genriesz.grr_ame` requires ``basis.derivative(X, coordinate)``.

.. list-table::
   :header-rows: 1
   :widths: 40 20 40

   * - Basis
     - AME-compatible
     - Note
   * - :class:`~genriesz.PolynomialBasis`
     - Yes
     - Analytical derivative
   * - :class:`~genriesz.RBFRandomFourierBasis`
     - Yes
     - Analytical derivative of :math:`\cos(\omega^\top x + b)`
   * - :class:`~genriesz.TreatmentInteractionBasis`
     - Depends on base
     - Delegates to base basis
   * - :class:`~genriesz.KNNCatchmentBasis`
     - No
     - Piecewise-constant
   * - :class:`~genriesz.sklearn_basis.RandomForestLeafBasis`
     - No
     - Piecewise-constant
   * - :class:`~genriesz.torch_basis.TorchEmbeddingBasis`
     - No
     - No autograd bridge


Density ratio estimation
------------------------

The function :func:`genriesz.fit_density_ratio` estimates the covariate-shift
density ratio

.. math::

   r(x) = p(x)/q(x)

from two samples ``X_num ~ p`` and ``X_den ~ q``.

* You choose a generator (either a built-in name like ``generator='ukl'`` or a
  custom ``g``).
* The function constructs the corresponding link function
  :math:`(\partial g)^{-1}` automatically.
* It fits the ratio by minimising the induced convex objective.

By default a Gaussian-kernel RKHS basis is used.  You can optionally cross-validate
the bandwidth ``sigma`` and regularisation ``lam``.

.. code-block:: python

   from genriesz import fit_density_ratio

   res = fit_density_ratio(
       X_num,
       X_den,
       generator="ukl",      # "sq", "bkl", "bp", "pu", or a generator instance
       n_centers=200,
       cv=True,
       folds=5,
       sigma_grid=[0.1, 0.3, 1.0, 3.0],
       lam_grid=[1e-3, 1e-2, 1e-1],
       random_state=0,
   )

   r_hat = res.predict_ratio(X_test)

.. important::

   For the **squared generator** (``generator='sq'``, i.e. :class:`genriesz.SquaredGenerator`),
   the fit uses a closed-form ridge solution.  For the **binary KL generator**,
   it uses classification-based density ratio estimation.  For all other generators,
   a numerical optimizer (L-BFGS-B) is used.


Generators and automatic links
-------------------------------

A Bregman generator defines both the loss and the induced link function used for
automatic regressor balancing.

The easiest option is to use one of the built-in generator objects:

- :class:`genriesz.SquaredGenerator` (SQ-Riesz)
- :class:`genriesz.UKLGenerator` (UKL-Riesz)
- :class:`genriesz.BKLGenerator` (BKL-Riesz)
- :class:`genriesz.BPGenerator` (BP-Riesz)
- :class:`genriesz.PUGenerator` (PU-Riesz)

Call ``.as_generator()`` to get a :class:`genriesz.BregmanGenerator` instance:

.. code-block:: python

   from genriesz import SquaredGenerator, UKLGenerator

   gen_sq  = SquaredGenerator(C=0.0).as_generator()

   # UKL/BP with a branch function: use + branch for treated, - for control.
   branch = lambda x: int(x[0] == 1.0)
   gen_ukl = UKLGenerator(C=1.0, branch_fn=branch).as_generator()

You can also define a completely custom generator:

.. code-block:: python

   from genriesz import BregmanGenerator

   gen = BregmanGenerator(g=g, grad=grad_g, inv_grad=inv_grad_g, grad2=grad2_g)

Custom generator call signatures
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A custom generator can be regressor-dependent:

- ``g(x, alpha)``
- optional: ``grad_g(x, alpha)`` (first derivative w.r.t. ``alpha``)
- optional: ``inv_grad_g(x, v)``  (inverse derivative map)
- optional: ``grad2_g(x, alpha)`` (second derivative w.r.t. ``alpha``)

If derivatives are omitted, the package falls back to finite differences and a
Newton solver.

.. important::

   For speed and numerical stability, providing ``inv_grad_g`` (and ideally
   ``grad_g`` and ``grad2_g``) is strongly recommended for custom generators.

Branch functions for UKL and BP
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`genriesz.UKLGenerator` and :class:`genriesz.BPGenerator` are defined on
:math:`\{|\alpha| > C\}`, which has two branches (positive and negative).  When the
sign of :math:`\alpha(x)` is known in advance (e.g., positive for treated, negative
for control in ATE/ATT/DID), pass a ``branch_fn`` to avoid ambiguity:

.. code-block:: python

   branch = lambda x: int(x[0] == 1.0)   # 1 = positive branch (treated), 0 = negative
   gen    = UKLGenerator(C=1.0, branch_fn=branch).as_generator()

Without ``branch_fn``, :math:`\text{sign}(v)` is used, which is correct only when
:math:`|\alpha| > C + 1`.  A :class:`UserWarning` is raised if ``branch_fn`` is
omitted.


Estimators, cross fitting, and outcome models
---------------------------------------------

The high-level function :func:`genriesz.grr_functional` can report multiple estimators
at once via ``estimators=(...)``:

- ``"ra"``:   regression adjustment (plug-in)
- ``"rw"``:   Riesz weighting (weighting only)
- ``"arw"``:  augmented Riesz weighting (doubly robust)
- ``"tmle"``: targeted maximum likelihood estimation (one-step fluctuation)

Cross fitting
^^^^^^^^^^^^^

Set ``cross_fit=True`` (default) to enable K-fold cross fitting.  The number of
folds is controlled by ``folds`` (default 5).

With cross fitting:

1. the data are split into ``folds`` training and test splits,
2. all nuisance models (Riesz representer and outcome regression) are fitted on the
   **training** fold and evaluated on the **test** fold,
3. estimates are computed from the full-sample out-of-fold predictions.

This removes the overfitting bias that would arise if the nuisance models were
evaluated on the same data used to fit them.

Outcome models
^^^^^^^^^^^^^^

For RA, ARW, and TMLE you need an outcome regression :math:`\hat\gamma`.  Control it
via ``outcome_models``:

- ``"shared"``  — use the **same basis** (and penalty settings) as the Riesz model.
  When ``riesz_method='grr'`` (the default), the already-fitted Riesz basis is
  **reused** directly — no second fit is performed.  This is important for stochastic
  bases (RBF, Nyström, KNN) where a separate fit would draw different random features.
- ``"separate"`` — use a user-provided ``outcome_basis``.
- ``"both"``    — fit both and report both versions.
- ``"none"``    — skip outcome modeling (then only ``"rw"`` is available).

The outcome link function is specified by ``outcome_link`` (``"identity"`` or
``"logit"``).  TMLE infers the likelihood from this link:

- ``outcome_link="identity"`` ⟹ Gaussian targeting
- ``outcome_link="logit"``    ⟹ Bernoulli targeting

Regularization: :math:`\ell_p`
-------------------------------

For the Riesz model, set:

- ``riesz_penalty="l2"`` for ridge,
- ``riesz_penalty="l1"`` for lasso,
- ``riesz_penalty="lp"`` with ``riesz_p_norm=p`` for general :math:`p \ge 1`,
- shorthand: ``riesz_penalty="l1.5"`` is equivalent to ``"lp"`` with ``p_norm=1.5``.

The outcome model (linear or logistic regression on a basis) supports the same
interface via ``outcome_penalty`` and ``outcome_p_norm``.

When the basis dimension :math:`p` exceeds the number of training observations
:math:`n`, the ridge outcome regression automatically switches to a dual
(Woodbury kernel-ridge) solve:

.. math::

   \hat\theta = \Phi^\top \bigl(\tfrac{1}{n}\Phi\Phi^\top + \lambda I_n\bigr)^{-1}
                \tfrac{\mathbf{y}}{n},

which costs :math:`O(n^3)` instead of :math:`O(p^3)`.  This is triggered
automatically for ``outcome_penalty="l2"`` (``"ridge"``).


Diagnostics and balance checks
-------------------------------

After estimation, ``result.diagnostics`` is populated with several quantities.
See :doc:`diagnostics` for a detailed guide.

Alpha tail statistics (all estimands)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``alpha_abs_mean`` — :math:`n^{-1}\sum_i |\hat\alpha(X_i)|`
- ``alpha_abs_p95``  — 95th percentile of :math:`|\hat\alpha(X_i)|`
- ``alpha_abs_max``  — :math:`\max_i |\hat\alpha(X_i)|`

Large values (relative to the outcome scale) suggest that the estimated Riesz
representer has heavy tails.  This inflates the variance of the RW and ARW estimators
and may indicate that ``riesz_lam`` should be increased or the basis reconsidered.

Covariate balance (ATE, ATT, and DID)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``max_abs_smd_unweighted`` — maximum absolute standardised mean difference before weighting
- ``max_abs_smd_weighted``   — maximum absolute SMD after weighting
- ``ess_treated``, ``ess_control`` — Kish effective sample sizes

A graphical balance check (Love plot) is available via :meth:`genriesz.FunctionalEstimate.love_plot`:

.. code-block:: python

   fig, ax = result.love_plot()