.. _nested_fitting:

Nested Fitting
==============

.. note::

   We use :term:`NFP` (Nested Fitting Procedure) and :term:`HPO`
   (Hyperparameter Optimization) interchangeably to refer to this feature.
   The main :term:`CLI` entry point is ``tadah hpo``. *NFP* is preferred in
   prose to emphasise the two-level fitting structure that is specific to
   Tadah!MLIP; *HPO* is the spelling used by every keyword and source-file
   identifier.

.. figure:: images/nested_fitting.jpg
   :alt: Nested Fitting Procedure
   :width: 95%
   :align: center

   Workflow of the :term:`nested fitting procedure`: the outer optimiser
   proposes new :term:`hyperparameters`, the model is retrained, and each
   trial potential is validated against :term:`performance constraints` that
   feed back into the global loss.

Nested fitting is Tadah!MLIP’s automated, two-level fitting workflow:

* **inner loop** — ordinary regression that determines the
  :term:`learned parameters` :math:`\mathbf w` for a fixed model definition;
* **outer loop** — a global optimiser that samples from the
  :term:`search space constraints` (SSC), retrains the model, and judges the
  result against user-defined :term:`performance constraints` (PC).

Letting a computer search the :term:`hyperparameter` space helps to

* escape the “good on the validation set, unstable in MD” trap;
* trade accuracy for speed (or vice versa) in a reproducible way;
* fold real-world performance constraints — elastic constants, phase
  stability, surface energies, … — directly into the loss function.

Background
----------

A traditional MLIP workflow stops once regression converges on a
training/validation split. Unfortunately, a potential that interpolates
perfectly may still

* disintegrate during an MD run,
* produce an incorrect equation of state,
* predict the wrong ground-state crystal,
* extrapolate poorly beyond the training set.

Manual hyperparameter tuning and simply enlarging the training set are both
tedious, and neither is guaranteed to succeed.

Nested fitting tackles the problem by *measuring* emergent properties during
the fit. Each trial potential is dropped into a short LAMMPS run; the
resulting quantities are compared to their targets and the discrepancy
contributes to a **global loss**

.. math::

   L_{\text{total}}
     = \sum_{i} w_i\; \mathcal L_i\!\bigl(|y_i-\hat y_i|\bigr),

where :math:`w_i` is the user-supplied weight, :math:`y_i` the target,
:math:`\hat y_i` the prediction, and :math:`\mathcal L_i` one of the built-in
loss functions (L1, L2, Huber, Tukey, …).

The outer optimiser explores the hyperparameter space :math:`\Theta` declared
with the ``OPTIM`` directive:

.. math::

   \theta^\star = \operatorname*{arg\,min}_{\theta\in\Theta}
                  L_{\text{total}}(\theta).

Because the model is retrained for every candidate :math:`\theta`, the
procedure is computationally expensive but also powerful: it can expose
regions of hyperparameter space that yield stable, accurate, and fast
potentials. It is not a silver bullet — **success still depends on sensible
choices of performance constraints and search space limits.**

.. _nfp_quick_start:

Quick Start
-----------

Nested fitting needs three input files:

#. A :term:`training configuration` — the same configuration file you would
   use for ``tadah train``. See :ref:`ConfigSection` for the full key list.
#. A *validation* file — a list of validation datasets passed via the
   ``DBFILE`` keyword.
#. An :term:`HPO configuration` — the file documented on this page.
   Conventionally named ``config.hpo``.

The driving command line is ::

   tadah hpo --config --hpotarget --validation

A minimal ``config.hpo``:

.. code-block:: none

   OPTIMIZER
     STRATEGY
       TYPE PLAIN
       LIB DLIB
       ALGO BFGS
       MAXEVAL 50
     ENDSTRATEGY
   ENDOPTIMIZER

   LOSS L2
   PC_ERMSE 0 1.0
   OPTIM RCUT2B (1) 4.0 6.0

See :ref:`cli_example_2` for a slightly bigger walkthrough, and
:ref:`hpo_examples` for a graded series of fifteen worked examples.

.. _nfp_manual:

Configuration file
------------------

This section is the reference for the HPO configuration file. For an even
terser keyword reference — every flag, every default, plus the *OPTIM-able
context keys* table — see the in-tree `HPO/README.md `_.

Block hierarchy
...............

The HPO configuration is a sequence of *root-level* keywords (``LOSS``,
``OPTIM``, ``PC_*``, ``OUTPUT``, …) plus a single ``OPTIMIZER``
``ENDOPTIMIZER`` block. The optimiser block has nested blocks, listed in the
order they normally appear:

.. code-block:: none

   OPTIMIZER
     INIT     ... ENDINIT       # optional: seeding strategy chain
     NOISE    ... ENDNOISE      # optional: objective-noise calibration
     STRATEGY ... ENDSTRATEGY   # required: TYPE + driver + decorators
     EXPLORE  ... ENDEXPLORE    # required for TYPE PRESTAGE
     REFINE   ... ENDREFINE     # required for TYPE PRESTAGE
     INNER    ... ENDINNER      # optional: NLopt MLSL/AUGLAG inner
     RESTART  ... ENDRESTART    # optional decorator
   ENDOPTIMIZER

Block keywords are case-sensitive. Comments start with ``#``; long lines may
be continued with a trailing ``\``.

A legacy form without a ``STRATEGY`` block — ``LIB``/``ALGO``/``MAXEVAL``
written directly inside ``OPTIMIZER`` — is still accepted and behaves exactly
like ``STRATEGY { TYPE PLAIN ... }``. It is fine for short single-shot runs
but cannot express ``MULTISTART``, ``PRESTAGE``, init chains, or ``RESTART``
decorators; those features require the explicit grammar.

.. _loss_functions:

LOSS — global loss function
...........................

::

   LOSS []

The default loss applied to every metric that does not override it.
``PC_LAMMPS`` per-variable lines may pick a different loss locally.

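To illustrate how the weighted sum from the Background section combines
per-metric losses, here is a minimal Python sketch. It is not Tadah!MLIP code:
the function names, the ``LOSSES`` registry, and the tuple layout are
hypothetical, and the Huber form shown is the common textbook scaling, which
may differ from the built-in one.

```python
from typing import Callable, Dict, Iterable, Tuple


def l1(x: float) -> float:
    """Absolute-value loss."""
    return abs(x)


def l2(x: float) -> float:
    """Squared loss."""
    return x * x


def huber(x: float, delta: float = 1.0) -> float:
    """Textbook Huber loss: quadratic near zero, linear beyond delta.

    Assumption: the 0.5 scaling is the standard convention, not
    necessarily the one used by the built-in HUBER loss.
    """
    a = abs(x)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


# Hypothetical name -> function registry for this sketch only.
LOSSES: Dict[str, Callable[[float], float]] = {
    "L1": l1,
    "L2": l2,
    "HUBER": huber,
}


def total_loss(terms: Iterable[Tuple[float, float, float, str]]) -> float:
    """Sum of w_i * L_i(|y_i - y_hat_i|).

    Each term is (weight, target, prediction, loss_name).
    """
    return sum(
        w * LOSSES[name](abs(y - y_hat))
        for w, y, y_hat, name in terms
    )


if __name__ == "__main__":
    terms = [
        (1.0, 0.0, 0.02, "L2"),      # e.g. an error metric with target 0
        (10.0, 145.0, 143.0, "L1"),  # e.g. a physical property with target 145
    ]
    print(total_loss(terms))
```

The second term dominates the sum, which shows why the weights need to be
chosen with the natural scale of each metric in mind.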
================ =============== ==============================================
Name             Extra params    Comment
================ =============== ==============================================
``L1``           —               :math:`|x|`
``L2``           —               :math:`x^2`
``HUBER``        ``δ``           quadratic near zero, linear beyond δ
``TUKEY``        ``c``           redescending, zero influence for ``|x| > c``
``LOG_COSH``     —               smooth alternative to L1
``RMSLE``        —               log-scaled L2 (non-negative targets)
================ =============== ==============================================

Root-level loss control:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Key (type)
     - Description
   * - ``FAILSCORE`` (float)
     - Cap on total loss; values above are clipped (default: max double).
       May be overridden per-script with ``--failscore`` on a ``PC_LAMMPS``
       line.

.. _search_space_constraints:

OPTIM — search space constraints
................................

The ``OPTIM`` directive declares which configuration keys the outer optimiser
may vary, and over what bounds:

.. code-block:: none

   OPTIM (indices)

* ```` — any optimisable key from the training configuration. For the full
  list of OPTIM-safe keys (and the tightly enumerated set of *unsafe* ones,
  e.g. ``RCUT2B`` with ``D2_EAM``), see the *OPTIM-able context keys:
  reinit-safety reference* table in `HPO/README.md `_.
* ``(indices)`` — selects a subset of the values bound to ````. Indices follow
  the order in the training configuration and start from **1**.
  Comma-separated lists, ranges ``a-b``, and strides ``a-b:s`` are all
  accepted (e.g. ``(1,4,7-10:2)``).
* ````, ```` — floating-point bounds. ``high`` must be strictly greater than
  ``low``.

``OPTIM`` lines may be repeated for the same key with different indices.

.. _performance_constraints:

PC_* — performance constraints
..............................

Energy / force / stress validation metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Validation-set metrics are evaluated on every ``DBFILE`` listed in the
``--validation`` file. Each line takes a target value and a weight:

.. code-block:: none

   PC_

================== =================================================================
Key                Quantity
================== =================================================================
``PC_EMAE``        Energy mean absolute error (per atom)
``PC_ERMSE``       Energy root-mean-square error (per atom)
``PC_ErRMSE``      Energy *relative* RMSE
``PC_ERSQ``        Coefficient of determination (R²) for energies
``PC_FRMSE``       Force component RMSE
``PC_SRMSE``       Stress component RMSE
================== =================================================================

These constraints use the global ``LOSS`` selected above.

Physics-informed constraints (PC_LAMMPS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``PC_LAMMPS`` runs a regular LAMMPS script against each trial potential and
feeds one or more LAMMPS *equal-style* variables back into the loss:

.. code-block:: none

   PC_LAMMPS --script in.mysim \
             --varloss myVar 0 100 \
             --varloss myOtherVar 145 10

A long ``PC_LAMMPS`` line may be split with a trailing back-slash (``\``).

Tadah!MLIP creates an isolated ``LAMMPS_NS::LAMMPS`` instance per script per
worker thread; instances are reused across iterations (the script's own
``clear`` directive resets simulation state between runs, so existing scripts
work unmodified).

Required options
^^^^^^^^^^^^^^^^

``--script ``
   LAMMPS input file containing the variable definitions.

``--varloss [loss-type p₁ p₂ …]``
   The variable ```` (defined inside the LAMMPS script) is read at the end of
   the run and combined with ```` and ```` via the global ``LOSS`` (or the
   per-line override). Optional trailing tokens override the loss for this
   variable only.

Additional options
^^^^^^^^^^^^^^^^^^

``--invar ``
   Inject a variable (equivalent to LAMMPS' ``-var`` flag), letting the same
   script be reused for several structures or pressures.

``--outvar ``
   Record the variable in ``outvar.tadah`` without contributing to the loss.
   Useful for diagnostic plots.

``--failscore ``
   Override the global ``FAILSCORE`` for this script. If LAMMPS crashes or
   the loss exceeds this value the optimiser receives ``FAILSCORE`` for the
   whole iteration.

``--ncpu ``
   Request *m* MPI ranks for this script in MPI builds. In the default serial
   build the option is parsed and accepted but does not change runtime; HPO
   logs a stderr warning when ``--ncpu > 1`` is set in a non-MPI build. See
   :ref:`performance_parallelism`.

``--hotvar []``
   Declare this ``PC_LAMMPS`` line as *hot*: a two-phase evaluator for
   parameters that other scripts depend on. Hot scripts run alone in
   **phase A** of each iteration (in parallel with other hot scripts), expose
   ``variable string …`` from the LAMMPS script (see
   :ref:`warmstart_protocol` for the required ``string``-style readback), and
   the returned value is validated against ``[min, max]``. Validated values
   are cached per ``(script, name)`` and propagated as ``${}`` strings into
   every cold (no-``--hotvar``) ``PC_LAMMPS`` script in **phase B**. ```` is
   optional; when omitted the midpoint of ``[, ]`` seeds the first iteration.
   If a hot value lands outside ``[min, max]`` HPO fail-scores the iteration
   and skips phase B.

Per-metric bad-pot pre-filter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``PC_ --skip-above `` and ``PC_ --skip-rel `` attach a per-metric pre-filter
on the dataset error counters (``PC_EMAE``, ``PC_ERMSE``, ``PC_ErRMSE``,
``PC_ERSQ``, ``PC_FRMSE``, ``PC_SRMSE``). Both gates fire **before** any
``PC_LAMMPS`` work for the iteration:

* ``--skip-above `` fail-scores the iteration when the metric exceeds the
  absolute ````.
* ``--skip-rel `` fail-scores when ``metric > ratio × running_best`` for that
  same metric.

Both filters can be applied to the same metric line. They short-cut obviously
bad potentials so wall-time is not spent on LAMMPS phases that would never
compete.

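The two gates described above can be sketched in a few lines of Python. This
is an illustration of the decision rule only, not the Tadah!MLIP
implementation: the ``SkipGate`` class and its field names are hypothetical,
the metric is assumed to be "lower is better", and the running best is assumed
to be updated only by iterations that pass both gates.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkipGate:
    """Hypothetical per-metric pre-filter state for one PC_ line."""

    skip_above: Optional[float] = None  # absolute threshold (--skip-above)
    skip_rel: Optional[float] = None    # ratio vs. running best (--skip-rel)
    running_best: float = math.inf      # lowest accepted metric so far

    def should_skip(self, metric: float) -> bool:
        """Return True when the iteration should be fail-scored early."""
        # Gate 1: absolute threshold.
        if self.skip_above is not None and metric > self.skip_above:
            return True
        # Gate 2: relative to the best value seen so far. It is inactive
        # until at least one iteration has been accepted.
        if (
            self.skip_rel is not None
            and math.isfinite(self.running_best)
            and metric > self.skip_rel * self.running_best
        ):
            return True
        # Assumption: only accepted iterations update the running best.
        self.running_best = min(self.running_best, metric)
        return False


if __name__ == "__main__":
    gate = SkipGate(skip_above=0.5, skip_rel=2.0)
    for value in (0.1, 0.15, 0.25, 0.6):
        print(value, gate.should_skip(value))
```

Returning before any expensive work is the point of the filter: in this
sketch, a caller would check ``should_skip`` first and launch the LAMMPS
phases only when it returns ``False``.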
Result contract for user scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LAMMPS scripts driven by HPO must communicate their results back *only*
through ``--varloss`` / ``--outvar`` (LAMMPS ``equal``-style variables). They
should avoid writing files (dumps, restarts, ``write_data``, custom logs)
unless filenames are made unique, because all OpenMP threads share the
process working directory.

If ``-log `` or ``-screen `` is supplied explicitly inside a ``PC_LAMMPS``
entry, HPO appends ``.t`` per OpenMP thread (so e.g. ``-log run.log`` becomes
``run.log.t0``, ``run.log.t1``, …). ``-log none`` and ``-screen none`` are
left untouched.

OPTIMIZER block — common attributes
...................................

The driver-attribute keys below are accepted wherever an ``OptimizerSpec`` is
the active receiver: at the top level of ``STRATEGY`` (when ``TYPE`` is
``PLAIN`` or ``MULTISTART``) and inside ``EXPLORE``, ``REFINE``, and
``INNER`` sub-blocks.

.. list-table::
   :header-rows: 1
   :widths: 22 78

   * - Key (type)
     - Description
   * - ``LIB`` (string)
     - Optimisation library: ``DLIB``, ``NLOPT``, ``TADAH``, ``CERES``.
   * - ``ALGO`` (string)
     - Algorithm (see table below).
   * - ``MAXEVAL`` (uint)
     - Maximum number of evaluations.
   * - ``MAXTIME`` (uint)
     - Maximum wall-time in seconds (also accepted at root level).
   * - ``STOPVAL`` (float)
     - Stop when the loss falls below this value.
   * - ``FTOL_REL`` (float)
     - Relative tolerance on the loss.
   * - ``FTOL_ABS`` (float)
     - Absolute tolerance on the loss.
   * - ``XTOL_REL`` (float)
     - Relative tolerance on hyperparameters.
   * - ``XTOL_ABS`` (float)
     - Absolute tolerance on hyperparameters.
   * - ``POPULATION`` (uint)
     - Population size for population-based algorithms.
   * - ``STEP`` (float)
     - Initial step size.
   * - ``SEED`` (uint)
     - Random seed (default: current time). Inside an ``INIT`` block the
       same key seeds the stochastic init strategies instead.
   * - ``VECTOR_ARR`` (uint)
     - Storage size required by some NLopt algorithms.
   * - ``THREADS`` (uint)
     - Outer thread count for ``DLIB:GFS`` (default: 1).
   * - ``PARAM`` (string float)
     - Generic key/value passed to the algorithm (e.g.
       ``PARAM fd_step 1e-3``).

.. list-table:: Algorithms by library
   :header-rows: 1
   :widths: 12 88

   * - LIB
     - ALGO (algorithms from the selected library)
   * - ``TADAH``
     - ``RANDOM`` (uniform random search), ``GRID``, ``ANNEAL`` (classic
       Kirkpatrick/Metropolis Simulated Annealing — see
       :ref:`tadah_anneal_block` below).
   * - ``NLOPT``
     - All gradient-free NLopt globals (``GN_*``) and locals (``LN_*``) —
       see the `NLopt algorithm catalogue `_ for the full list.
       Gradient-using algorithms (``GD_*`` / ``LD_*``) route through the
       shared finite-difference gradient seam used by ``DLIB:BFGS`` and the
       Ceres drivers; ``LD_LBFGS`` is the only member exercised by the
       bundled examples.

       .. note::

          **Analytical loss gradients are work in progress** and are not
          enabled in this beta release. Every gradient-using algorithm
          (``LD_*``, ``GD_*``, ``DLIB:BFGS``, all CERES drivers) currently
          obtains its gradient through numerical finite differences (see
          ``PARAM fd_step`` below).
   * - ``DLIB``
     - ``BFGS``, ``LBFGS``, ``CG``, ``BOBYQA`` (local), ``GFS`` (the dlib
       MaxLIPO+TR global *Global Function Search*).
   * - ``CERES``
     - ``LBFGS`` (limited-memory BFGS, Wolfe line search), ``WOLFE`` (full
       BFGS, Wolfe line search), ``NLCG`` (nonlinear conjugate gradient),
       ``STEEPEST_DESCENT`` (Armijo line search).

       All four use Ceres' ``GradientProblemSolver``; the gradient is
       computed numerically via Ceres' ``DynamicNumericDiffCostFunction``
       (central differences; step size from ``PARAM fd_step``).

       The Ceres NLLS family (``LEVENBERG_MARQUARDT``, ``DOGLEG``,
       ``SUBSPACE_DOGLEG``) is **deferred** — mathematically degenerate over
       a scalar HPO loss; it will be enabled in a later phase together with
       residual-mode loss exposure.

       Ceres optimisers do **not** enforce box constraints; use
       ``DLIB:BFGS`` or ``NLOPT:LD_LBFGS`` for bounds-enforced gradient
       optimisation.

CERES PARAM keys
~~~~~~~~~~~~~~~~

The four CERES algorithms accept these algorithm-specific ``PARAM`` entries
(all keys are optional unless noted):

================== ============================================================================
Key                Effect
================== ============================================================================
``fd_step``        Relative finite-difference step for the numerical gradient (default
                   ``1e-6``).
``diff_method``    Numeric-diff method: ``0`` = CENTRAL (default), ``1`` = FORWARD,
                   ``2`` = RIDDERS. RIDDERS extrapolates assuming a smooth Taylor
                   expansion; not recommended near ``FAIL_SCORE`` plateaus or for noisy
                   LAMMPS-loss components.
``tol_gradinf``    Gradient-infinity-norm convergence tolerance (Ceres
                   ``gradient_tolerance``, default ``1e-10``).
``max_iter``       Override on Ceres' ``max_num_iterations``. By default the iteration cap
                   is set to a large sentinel and ``MAXEVAL`` is enforced externally on
                   host evaluations.
``max_ls_iter``    Maximum line-search step-size iterations.
``max_lbfgs_rank`` (LBFGS only) limited-memory rank, default ``20``.
``nlcg_type``      (NLCG only) variant: ``0`` = FLETCHER_REEVES (default),
                   ``1`` = POLAK_RIBIERE, ``2`` = HESTENES_STIEFEL.
================== ============================================================================

The ``MAXEVAL`` cap is enforced at the host-evaluation level by an external
counter shared with the ``GradientProvider``. Ceres' internal iteration
counting is intentionally given a high sentinel so that one Ceres iteration's
worth of finite-diff samples (2 × N under CENTRAL) is correctly accounted for
against the user budget.

The ``DLIB:BFGS`` path also routes its gradient through the same seam.
``PARAM fd_step`` keeps its existing semantics; the chief observable change
is that gradient evaluations now count toward ``MAXEVAL`` via the same shared
counter (previously dlib's internal central-diff samples bypassed the cap by
``2 × N`` per iteration).

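The accounting described above, where every finite-difference sample is
charged against the same evaluation budget, can be illustrated with a short
Python sketch. It is not the Tadah!MLIP ``GradientProvider``: the
``CountedObjective`` class, the ``BudgetExhausted`` exception, and the way the
relative step is scaled are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


class BudgetExhausted(RuntimeError):
    """Raised when the shared evaluation budget is used up (hypothetical)."""


class CountedObjective:
    """Wrap an objective so that every call is counted against max_eval."""

    def __init__(self, fn: Callable[[Sequence[float]], float], max_eval: int):
        self.fn = fn
        self.max_eval = max_eval
        self.n_eval = 0

    def __call__(self, x: Sequence[float]) -> float:
        if self.n_eval >= self.max_eval:
            raise BudgetExhausted(f"budget of {self.max_eval} evaluations used")
        self.n_eval += 1
        return self.fn(x)


def central_diff_gradient(
    obj: CountedObjective, x: Sequence[float], fd_step: float = 1e-6
) -> List[float]:
    """Central-difference gradient; costs 2 * N objective evaluations."""
    grad = []
    for i, xi in enumerate(x):
        # Assumption: "relative" step means scaled by |x_i|, with a floor
        # of 1.0 so that the step does not vanish at x_i == 0.
        h = fd_step * max(abs(xi), 1.0)
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] = xi + h
        x_minus[i] = xi - h
        grad.append((obj(x_plus) - obj(x_minus)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    def loss(x: Sequence[float]) -> float:
        return x[0] ** 2 + 3.0 * x[1]

    counted = CountedObjective(loss, max_eval=10)
    print(central_diff_gradient(counted, [1.0, 2.0]))
    print("evaluations charged:", counted.n_eval)
```

With N = 2 hyperparameters, one gradient charges four evaluations to the
counter, so a budget that looks generous in optimiser iterations can be spent
quickly when gradients are numerical.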
INIT block — seeding strategies
...............................

Outside an ``INIT { ... } ENDINIT`` block the optimiser starts at the single
point taken from the values next to your ``OPTIM`` lines (the implicit
``CONFIG`` seed). Inside an ``INIT`` block you may declare a *chain* of
strategies that produces multiple starting points.

Chain entries
~~~~~~~~~~~~~

Each ``STRATEGY `` line appends one entry to the chain. The chain is resolved
into a flat list of starting points before the optimiser runs.

================ ============== ============= ===========================================================
Method           Args           Default K     Effect
================ ============== ============= ===========================================================
``CONFIG``       (none)         1             Take the value next to each ``OPTIM`` key.
``RANDOM K``     ``K`` (uint)   ``K``         ``K`` uniform samples within (transformed) bounds.
``LHS K``        ``K`` (uint)   ``K``         Latin-hypercube sample (one per stratum per dim).
``SOBOL K``      ``K`` (uint)   ``K``         Sobol low-discrepancy sequence.
``WARM ``        path           1             Read a previous ``pot.tadah`` and pull values for
                                              every ``OPTIM`` key.
================ ============== ============= ===========================================================

Top-level keys inside the ``INIT`` block
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 22 16 22 40

   * - Key
     - Type
     - Default
     - Effect
   * - ``STRATEGY``
     - repeatable
     - single ``CONFIG`` point when no ``STRATEGY`` line is given
     - Append a strategy entry to the chain (see table above).
   * - ``K_TOTAL``
     - uint
     - sum of strategies' default K
     - Cap on total starting points produced.
   * - ``SEED``
     - uint
     - random
     - Seed for stochastic strategies (RANDOM / LHS / SOBOL).
   * - ``WARM_ON_MISMATCH``
     - ``warn`` / ``error`` / ``skip``
     - ``warn``
     - What to do when ``WARM`` cannot fill every ``OPTIM`` key.

There is no ``INCLUDE_CONFIG`` keyword. The in-config point is part of the
chain only when ``STRATEGY CONFIG`` is listed explicitly; an ``INIT`` block
with no ``STRATEGY`` line (or no ``INIT`` block at all) resolves to a single
``CONFIG`` point. The removed ``INCLUDE_CONFIG`` and ``INIT_INCLUDE_CONFIG``
keywords are rejected with a message pointing at ``STRATEGY CONFIG``.

Two implementation details affect reproducibility:

* The chain is resolved in the exact order it is written — no entry is
  reordered. To make the warm point iter 0, list ``STRATEGY WARM`` first in
  the ``INIT`` block.
* When the writer that produced the ``WARM`` file dumped ``# HOT_A0