Configuration File

Note

This page is auto-generated from the blueprint TOML on every docs build. If you spot a mismatch between this reference and runtime behaviour, please report it on the Tadah!MLIP GitLab.

Note

This configuration file is NOT used by the HPO (Hyperparameter Optimization) module of Tadah!MLIP. The HPO module uses a different configuration file format, which is documented under Nested Fitting.

This section describes the format of the configuration file used by Tadah!MLIP.

For example, the configuration file can control the training process, specifying one or more datasets for use during the training stage. It defines cutoff functions and corresponding radii along with the regression model and descriptor choices.

Important

Indexing of items that take positional indices (e.g., INDEX 1, CLI flag --index 1) starts from 1, not 0.

Key/Value Pairs

The primary structure in a configuration file is the KEY/VALUE pair. Each KEY/VALUE pair must be on a separate line, with the KEY appearing first. The KEY is always a string, followed by its VALUE. The format and type of a VALUE depend on the specific KEY.

Common Usage

Typically, only a subset of KEYS is needed to train a model. Tadah!MLIP will use default values for some keys. An error will occur if a required KEY with no default value is missing:

[user@host:~] $ tadah train -c config.train
terminate called after throwing an instance of 'std::runtime_error'
  what():  Key not found: DBFILE
Aborted (core dumped)

This message indicates that the dbfile KEY was not specified in the config.train file. To resolve this, add the dbfile key and its corresponding value to config.train.

Key Specifics

The meaning of some keys can vary with the chosen command. Check the documentation for that specific command, model or descriptor to see which keys are required and how they are interpreted. While we strive to keep key meanings consistent across Tadah!MLIP, occasional differences may still occur.

Comments

Use the # symbol to add comments in the configuration file. Both inline and full-line comments are supported.

Key Values and Formats

Some KEYS can have multiple values, specified in one of two ways:

Single line:
```
KEY VALUE1 VALUE2 VALUE3
```
Multiple lines:
```
KEY VALUE1
KEY VALUE2
KEY VALUE3
```
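The equivalence of the two forms can be illustrated with a small reader. This is a minimal sketch, not Tadah!MLIP's actual parser: the function name `parse_config` is hypothetical, and the sketch assumes whitespace-separated tokens and `#` comments as described above. It flattens repeated keys into one list, so it ignores any per-line grouping that meta descriptors such as D2_mJoin may rely on.

```python
from collections import defaultdict


def parse_config(text):
    """Collect KEY/VALUE pairs; repeated keys accumulate their values."""
    values = defaultdict(list)
    for raw in text.splitlines():
        # Drop full-line and inline comments, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, *tokens = line.split()
        values[key].extend(tokens)
    return dict(values)


single = parse_config("KEY VALUE1 VALUE2 VALUE3  # inline comment")
multi = parse_config("# full-line comment\nKEY VALUE1\nKEY VALUE2\nKEY VALUE3")
print(single == multi)
```

Both inputs yield the same mapping from `KEY` to its three values.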

Value Limits

Each keyword takes a fixed number of values. Passing the wrong count can raise an error, but enforcement is not yet fully consistent. While this is being improved, keep these points in mind:

Too many values – Tadah!MLIP may reject the input with a clear message or quietly discard the extras; the latter can later surface as obscure run-time failures (even an occasional segmentation fault).
Too few values – usually triggers an error, although a crash remains possible in rare corner cases.

In short, give every keyword exactly the number of values it expects—no more, no less—to avoid unpleasant surprises.

Supported KEYS

This section contains all KEYS currently used by Tadah!MLIP.

ALPHA

[DOUBLE] <double>

Max number of values: 1 Default: 1.0

Description:

Weight precision hyper-parameter. This is the starting guess for the evidence approximation algorithm.

Example 1:

ALPHA 0.23

AMPGRIDMB

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

DM_REAM density-expansion amplitudes a_n (rho(r) = sum_n a_n psi_n(r)). Optimizable nonlinear HPO parameters (OPTIM AMPGRIDMB). Amplitudes a_n of the density expansion rho(r) = sum_n a_n psi_n(r) in the DM_REAM descriptor. They are excluded from the LINEAR weight solve by the EAM bilinearity (the linear weights live on the embedding side), but they are first-class NONLINEAR HPO parameters: OPTIM AMPGRIDMB (i) lo hi searches them like any other shape parameter. Same length as CGRIDMB. Written by ‘tadah refit’.

Example 1:

AMPGRIDMB 0.21 0.05 -0.01 0.0

ATOM

[{STRING}] <element> [<element> …]

Max number of values: 118

Description:

Chemical elements.

Example 1:

"Kr"

AUDIT

[STRING] <string>

Max number of values: 1 Default: off

Description:

Pre-flight audit mode. ‘off’ (default) skips the audit. ‘warn’ emits diagnostics but proceeds. ‘error’ makes any FAIL-level finding fatal. The audit’s data scan is sampled by default (see AUDIT_SAMPLE).

Example 1:

off

Example 2:

warn

Example 3:

error

AUDIT_SAMPLE

[INT] <integer>

Max number of values: 1 Default: 256

Description:

Number of training structures sampled (deterministic random) for the pre-flight audit’s dataset stats. 0 means use the entire StructureDB. Has no effect when AUDIT is ‘off’.

Example 1:

Example 2:

Example 3:

BASIS

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Basis vectors for non-linear Kernel Ridge Regression. They represent the features or functions used to map input data into a higher-dimensional feature space.

Example 1:

2.0 -4.65 0.4

Example 2:

-1.0

BETA

[DOUBLE] <double>

Max number of values: 1 Default: 1.0

Description:

Noise precision hyper-parameter. This is the starting guess for the evidence approximation algorithm.

Example 1:

BETA 0.0001

CEMBFUNC

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Position parameters for an embedding function. Used by certain many-body descriptors (e.g., F_RLR). When using DM_mJoin, supply one or more lists of parameters matching those in SEMBFUNC.

Example 1:

CEMBFUNC 0.14 0.45 1.00 1.1

CGRID2B

[{DOUBLE}] <double> [<double> …], [{STRING INT DOUBLE DOUBLE}] (<algorithm> <n> <start> <stop>) […]

Max number of values: 2147483647

Description:

Controls the center positions for radial basis functions (two-body). The parameter list may be provided manually or generated automatically. When using the meta descriptor D2_mJoin, specify one or more lists of centers corresponding to each descriptor. The number of centers should typically match the number of width parameters (SGRID2B) and remain below the cutoff distance. Alternatively, use the algorithm keyword followed by parameters to generate centers automatically (e.g., LOG or LIN).

Example 1:

CGRID2B LIN 10 0 6

Example 2:

CGRID2B 1.0 2.0

Example 3:

CGRID2B   1.0 2.0
CGRID2B   1.5 2.5
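As an illustration of automatic generation, the sketch below expands a `LIN <n> <start> <stop>` request into evenly spaced centers. It rests on an assumption that is not confirmed by this reference: that both endpoints are included in the grid. The helper name `lin_grid` is hypothetical, and the actual spacing rules should be checked against the descriptor documentation.

```python
def lin_grid(n, start, stop):
    """Return n evenly spaced values from start to stop (endpoints assumed inclusive)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [float(start)]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


# What "CGRID2B LIN 10 0 6" might expand to under the assumption above.
centers = lin_grid(10, 0.0, 6.0)
print(centers)
```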

CGRIDMB

[{DOUBLE}] <center> …, [{STRING INT DOUBLE DOUBLE}] <algorithm> <N> <START> <STOP>

Max number of values: 2147483647

Description:

Specifies the center positions for many-body radial basis functions. Centers may be provided manually or generated automatically. When using the DM_mJoin meta descriptor, supply one or more lists of centers for each concatenated descriptor. Alternatively, include an algorithm such as ‘L’ (logarithmic) or ‘U’ (uniform spacing) followed by parameters.

Example 1:

CGRIDMB LIN 4 0 6.2

Example 2:

CGRIDMB 0.5 0.7

Example 3:

CGRIDMB   0.5 0.7
CGRIDMB   0.6 0.8

DIMER

[BOOL DOUBLE BOOL] <boolean> <double> <boolean>

Max number of values: 3 Default: false / 0 / false

Description:

Control for DIMER models. Users should not modify this key.

Example 1:

DIMER true 1.104 true

EWEIGHT

[DOUBLE] <double>

Max number of values: 1 Default: 1.0

Description:

Global energy scaling factor. Energies are always scaled by 1/(number of atoms). Additional configuration-level scaling factors can apply. Combined factor = EWEIGHT*(config eweight)/(#atoms).

Example 1:

EWEIGHT 0.96

FIXINDEX

[{INDEX_PATTERN}] <index>[,<index>…], [**] <start>-<stop>, [**] <start>-<stop>:<step>

Max number of values: 2147483647

Description:

Indices of weights to be fixed in optimization. Must be used with FIXWEIGHT; a lone FIXINDEX (or lone FIXWEIGHT) is a hard error. dim(FIXINDEX) = dim(FIXWEIGHT); the i-th index names the model weight pinned by the i-th FIXWEIGHT value (declaration order defines the pairing). Allows flexible selection of indices. Supports single indices, ranges (e.g., start-stop), lists, or intervals (start-stop:step). Indices are 1-based. Repeated indices are removed automatically. Without a training DBFILE, FIXINDEX must cover every model weight (there is no data to fit the rest).

Example 1:

1,3,5

Example 2:

1-4,7,9

Example 3:

1-10:2
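The three pattern forms can be expanded as in the sketch below. It is illustrative only: `expand_indices` is a hypothetical helper, the stop value of a range is assumed to be inclusive, and first-seen order is kept because declaration order defines the pairing with FIXWEIGHT.

```python
def expand_indices(pattern):
    """Expand a FIXINDEX-style pattern into 1-based indices, keeping first-seen order."""
    ordered = []
    for item in pattern.split(","):
        span, _, step = item.strip().partition(":")
        start, _, stop = span.partition("-")
        lo = int(start)
        hi = int(stop) if stop else lo  # a bare index is a one-element range
        stride = int(step) if step else 1
        if lo < 1 or hi < lo or stride < 1:
            raise ValueError(f"bad index pattern: {item!r}")
        for index in range(lo, hi + 1, stride):  # stop is assumed inclusive
            if index not in ordered:  # repeated indices are dropped
                ordered.append(index)
    return ordered


print(expand_indices("1,3,5"))
print(expand_indices("1-4,7,9"))
print(expand_indices("1-10:2"))
```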

FIXWEIGHT

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Values for weights to be fixed in optimization. Must be used with FIXINDEX; a lone FIXWEIGHT (or lone FIXINDEX) is a hard error, and there is no auto-seeding from WEIGHTS. dim(FIXINDEX) = dim(FIXWEIGHT). The i-th value in FIXWEIGHT pins the model weight named by the i-th index in FIXINDEX (declaration order defines the pairing). Pinned weights are excluded from the least-squares solve; the remaining weights are fitted against the residual target. Incompatible with NORM=true (the values would be interpreted in the normalized feature space).

In tadah hpo, OPTIM FIXWEIGHT (<weight-index>) <low> <high> lets the nonlinear optimiser search a pinned weight: the (indices) name MODEL WEIGHTS (values of FIXINDEX), not positions in this list. When FIXINDEX covers every model weight, the LS solve is skipped entirely; in that case a training DBFILE becomes optional (the loss must then come from validation metrics, PC_OBSERVABLE, or PC_LAMMPS, and the element set must be declared via ATOM).

Example 1:

0.12 1.0 2.00
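The pairing of FIXINDEX with FIXWEIGHT and the residual target can be sketched as follows. This is a toy illustration in plain Python, not the Tadah!MLIP solver: `residual_target`, the design matrix and the target vector are all made up for the example. The pinned columns' contribution is subtracted from the target, and only the remaining columns would enter the least-squares solve.

```python
def residual_target(design, target, fixindex, fixweight):
    """Subtract the contribution of pinned weights from the target vector."""
    if len(fixindex) != len(fixweight):
        raise ValueError("dim(FIXINDEX) must equal dim(FIXWEIGHT)")
    pinned = dict(zip(fixindex, fixweight))  # i-th index pairs with i-th value
    n_columns = len(design[0])
    free_columns = [j for j in range(1, n_columns + 1) if j not in pinned]
    residual = [
        t - sum(row[j - 1] * w for j, w in pinned.items())  # indices are 1-based
        for row, t in zip(design, target)
    ]
    return free_columns, residual


design = [[1.0, 2.0, 3.0], [0.0, 1.0, 4.0]]  # hypothetical 2x3 design matrix
target = [10.0, 9.0]
free, resid = residual_target(design, target, fixindex=[1, 3], fixweight=[0.5, 2.0])
print(free, resid)
```

Here weights 1 and 3 are pinned, so only weight 2 remains to be fitted against `resid`.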

FWEIGHT

[DOUBLE] <double>

Max number of values: 1 Default: 1.0

Description:

Global force scaling factor. Each force component is scaled by 1/(#atoms)/3. Additional config-level scaling factors can apply. Combined factor = FWEIGHT*(config fweight)/(#atoms)/3.

Example 1:

FWEIGHT 1e-2

HEALTH_LOG

[STRING] <string>

Max number of values: 1 Default: summary

Description:

HPO per-iteration health monitor verbosity. ‘summary’ (default) rate-limits warnings to ~1 per 100 evaluations per (block, kind). ‘full’ emits every offending iteration. ‘off’ disables the monitor.

Example 1:

summary

Example 2:

full

Example 3:

off

INIT2B

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

If set to true, the two-body descriptor will be calculated.

Example 1:

INIT2B true

INITMB

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

If set to true, the many-body descriptor will be calculated.

Example 1:

INITMB true

KDEG2B

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Per-atom knot degree DATA (3|5) for D2_Knot5 (optional; absent = all quintic). NOT optimizable. Optional per-basis-function degree list for the D2_Knot5 two-body descriptor: 5 = quintic knot atom (t-r)_+^5 (C^4), 3 = cubic knot atom (t-r)_+^3 (C^2 - an exact f’’’ corner, the Mendelev/Ackland atom). Same length as CGRID2B; absent = all quintic. A data_keys entry: split by the mJoin meta-descriptors but NOT an optimisation dimension - values outside {3,5} are rejected at construction, so an OPTIM block on it fails loudly. Written by ‘tadah refit’ for REFIT_BASIS Knot3/KnotMix.

Example 1:

KDEG2B 5 5 3 5

KDEGEMB

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Per-atom knot degree DATA (3|5) for the DM_REAM embedding expansion (optional; absent = all quintic). NOT optimizable. Optional per-basis-function degree list for the DM_REAM embedding expansion under basis code 2 (knot functions). Same length as CEMBFUNC; absent = all quintic. A data_keys entry: mJoin-splittable, never OPTIM. Written by ‘tadah refit’ for REFIT_BASIS Knot3/KnotMix.

Example 1:

KDEGEMB 5 5 3

KDEGMB

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Per-atom knot degree DATA (3|5) for the DM_REAM density expansion (optional; absent = all quintic). NOT optimizable. Optional per-basis-function degree list for the DM_REAM density expansion under basis code 2 (knot functions). Same length as CGRIDMB; absent = all quintic. A data_keys entry: mJoin-splittable, never OPTIM (values outside {3,5} are rejected at construction). Written by ‘tadah refit’ for REFIT_BASIS Knot3/KnotMix.

Example 1:

KDEGMB 5 3 5

LAMBDA

[INT] <int>, [DOUBLE] <double>, [INT DOUBLE] <int> <double>

Max number of values: 2 Default: 0

Description:

Controls the regularization parameter λ for BLR and KRR. Here N denotes the first value: if N=0, no regularization is applied; if N>0, λ is set to that value; if N<0, an evidence approximation is used to estimate λ. For LAMBDA 0, you can provide a second number (double) that sets the effective rank threshold (default 1e-8).

Example 1:

LAMBDA -1

Example 2:

LAMBDA 1e-4

Example 3:

LAMBDA 0

Example 4:

LAMBDA 0 1e-12
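The three regimes can be summarised in code. The sketch below only restates the rules from the description; `interpret_lambda` and the mode labels it returns are hypothetical and do not correspond to anything in Tadah!MLIP.

```python
def interpret_lambda(value, rank_threshold=1e-8):
    """Map the first LAMBDA value to a (mode, parameter) pair."""
    if value == 0:
        # No regularization; the optional second value is the rank threshold.
        return ("none", rank_threshold)
    if value > 0:
        return ("fixed", float(value))  # lambda is used as given
    return ("evidence", None)  # lambda is estimated by the evidence approximation


print(interpret_lambda(-1))
print(interpret_lambda(1e-4))
print(interpret_lambda(0, 1e-12))
```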

MBLOCK

[UINT] <unsigned integer>

Max number of values: 1 Default: 64

Description:

ScaLAPACK row block size MB.

Example 1:

MODEL

[STRING STRING] MODEL FUNCTION, [STRING STRING STRING] MODEL FUNCTION OPTION, [STRING STRING UINT] MODEL FUNCTION OPTION

Max number of values: 3

Description:

Defines the training model and function. MODEL can be any class inheriting from M_Base (e.g., M_KRR, M_BLR). FUNCTION must be a valid child class of Function_Base (e.g., Kern_Linear, BF_Linear, BF_Polynomial2). Various combinations (KRR with different kernels, BLR with various basis functions) are possible.

Example 1:

MODEL M_BLR BF_Linear

Example 2:

MODEL M_BLR BF_Polynomial2

Example 3:

MODEL M_KRR Kern_Linear

MPARAMS

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Specifies additional numeric parameters for certain models. Some models require extra parameters. Refer to the model-specific documentation for details. Many models do not need any extra parameters.

Example 1:

MPARAMS 0.1

Example 2:

MPARAMS 0.1 0.2 0.3

MPIWPCKG

[UINT] <unsigned integer>

Max number of values: 1 Default: 50

Description:

The number of structures in a single MPI work package.

Example 1:

NBLOCK

[UINT] <unsigned integer>

Max number of values: 1 Default: 64

Description:

ScaLAPACK column block size NB.

Example 1:

NMEAN

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Mean values from descriptor normalization. Obtained after standardizing the columns of the DesignMatrix (see NORM).

Example 1:

2.0 -4.65 0.4

Example 2:

-1.0

NORM

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Set to true to standardize the descriptors, that is, the columns of the DesignMatrix. This is typically relevant if energies are used for fitting.

Example 1:

true

Example 2:

false

NSTDEV

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Standard deviations from descriptor normalization. Obtained after standardizing the columns of the DesignMatrix (see NORM). The vector size equals the number of columns.

Example 1:

2.0 -4.65 0.4

Example 2:

-1.0
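The relationship between NORM, NMEAN and NSTDEV can be sketched as column-wise standardization. This is a minimal illustration under stated assumptions: `standardize_columns` is a hypothetical helper, and the population standard deviation is used here, whereas Tadah!MLIP may use a different estimator.

```python
from statistics import fmean, pstdev


def standardize_columns(design):
    """Return the standardized matrix plus per-column means and standard deviations."""
    columns = list(zip(*design))
    means = [fmean(column) for column in columns]
    stdevs = [pstdev(column) for column in columns]  # population estimator (assumed)
    if any(s == 0 for s in stdevs):
        raise ValueError("a constant column cannot be standardized")
    scaled = [
        [(x - m) / s for x, m, s in zip(row, means, stdevs)]
        for row in design
    ]
    return scaled, means, stdevs


design = [[1.0, 10.0], [3.0, 30.0]]  # hypothetical 2x2 design matrix
scaled, nmean, nstdev = standardize_columns(design)
print(nmean, nstdev)
```

Each of the two returned vectors has one entry per column, as stated for NSTDEV.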

OALGO

[INT] <option>, [INT DOUBLE] <option> <value>

Max number of values: 2 Default: 1

Description:

This key controls the optimization algorithm used to train the model. The default algorithm is Option 1 (GELSD) with a conditioning parameter (\(rcond\)) of \(1 \times 10^{-8}\). The regularization parameter is handled separately by the LAMBDA key.

Available options:

1 - GELSD: Utilizes the LAPACK routine DGELSD to solve linear least squares problems using the Singular Value Decomposition (SVD). This method is robust and can handle rank-deficient systems. It computes the minimum-norm solution and is suitable for ill-conditioned problems. The parameter M controls the reciprocal of the conditioning number (\(\text{rcond}\)). It defaults to \(1 \times 10^{-8}\) if not specified. Setting M = -1 uses machine precision, which may be unstable for ill-conditioned problems.

2 - GELS: Uses the LAPACK routine DGELS to solve linear least squares problems via QR or LQ factorization. This method assumes that the matrix has full rank and is the fastest among the options. It is suitable for well-conditioned problems but less robust for ill-conditioned or rank-deficient matrices.

3 - Custom Implementation Similar to DGELS (Uses SVD): Employs a custom algorithm similar to DGELS but utilizes SVD. This method allows the use of the evidence approximation algorithm and computes the covariance matrix \(\Sigma\), providing additional statistical information about the solution. Like option 1, the parameter M controls the reciprocal of the conditioning number (\(\text{rcond}\)), defaulting to \(1 \times 10^{-8}\). Setting M = -1 uses machine precision, which might be unstable for ill-conditioned problems.

4 - Cholesky Decomposition: Solves the normal equations \(A^\top A x = A^\top b\) using Cholesky decomposition. This method is efficient for well-conditioned, full-rank matrices but may be less stable for ill-conditioned or rank-deficient problems because it squares the condition number of the matrix. It does not require additional parameters.

Performance Comparison:

Option 2 is the fastest but assumes a full-rank matrix and is less robust for ill-conditioned problems.
Option 4 is also efficient but may suffer from numerical instability in ill-conditioned or rank-deficient problems due to squaring of the condition number.
Option 1 offers a balance between speed and robustness, handling rank-deficient and ill-conditioned problems better than options 2 and 4.
Option 3 is the slowest due to the additional computations but provides valuable extra information like the covariance matrix.

Usage:

To select an algorithm, set the OALGO key followed by the option number.
For options 1 and 3, you can specify the conditioning parameter M after the option number.
Option 4 does not require any additional parameters.

Regularization Parameter:

The regularization parameter is controlled by the LAMBDA key, which should be set separately to apply regularization to the model.

Examples:

OALGO 1 # Uses GELSD with default conditioning parameter (M = 1e-8)
OALGO 1 -1 # Uses GELSD with machine precision (M = -1)
OALGO 2 # Uses GELS
OALGO 3 1e-10 # Uses custom implementation with M = 1e-10
OALGO 4 # Uses Cholesky Decomposition

Notes:

Conditioning Parameter M: Controls the reciprocal of the condition number (\(\text{rcond}\)). A smaller M (closer to machine precision) includes more singular values in the solution, which may be necessary for certain problems but can introduce instability if the matrix is ill-conditioned.
Machine Precision (M = -1): Using machine precision can lead to numerical instability in ill-conditioned problems. It is recommended to use a positive M value to exclude negligible singular values that could adversely affect the solution.
Covariance Matrix \(\Sigma\): Option 3 provides the covariance matrix, which can be useful for statistical analyses and understanding the uncertainty in the estimated parameters.
Cholesky Decomposition: Option 4 solves the normal equations using Cholesky decomposition. It is efficient but may be numerically unstable for ill-conditioned or rank-deficient problems due to squaring the condition number. It is best used when the matrix is well-conditioned and of full rank.
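The effect of the conditioning parameter can be sketched on a list of singular values. The rule used here is the one documented for LAPACK's DGELSD: singular values not larger than rcond times the largest one are treated as zero, and a negative rcond means machine precision. Whether the custom implementation of option 3 applies exactly the same rule is an assumption, and `effective_rank` is a hypothetical helper.

```python
import sys


def effective_rank(singular_values, rcond=1e-8):
    """Count singular values kept when those <= rcond * s_max are treated as zero."""
    if not singular_values:
        return 0
    if rcond < 0:
        rcond = sys.float_info.epsilon  # M = -1: use machine precision
    cutoff = rcond * max(singular_values)
    return sum(1 for s in singular_values if s > cutoff)


s = [4.0, 1e-3, 1e-9, 1e-12]  # hypothetical singular values
print(effective_rank(s))          # default M = 1e-8
print(effective_rank(s, 1e-10))   # smaller M keeps more singular values
print(effective_rank(s, -1))      # machine precision keeps all four here
```

A smaller M retains smaller singular values, which is where the instability for ill-conditioned matrices comes from.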

Additional Information:

Regularization with LAMBDA:
- The LAMBDA key controls the regularization parameter used in the training process. It should be specified separately to apply regularization to the model.
- Regularization helps prevent overfitting by adding a penalty for larger parameter values.
LAPACK Routines:
- DGELSD: Computes the minimum-norm solution to a real linear least squares problem using the SVD of the coefficient matrix.
- DGELS: Solves overdetermined or underdetermined real linear systems involving an \(M \times N\) matrix, using a QR or LQ factorization of the matrix.
- DPOTRF and DPOTRS: Used in the Cholesky decomposition to factorize a symmetric positive-definite matrix and solve the resulting linear system.
When to Use Each Option:
- Use Option 1 (GELSD) when you need robustness against rank deficiency and moderate performance.
- Use Option 2 (GELS) for the fastest performance on well-conditioned, full-rank matrices where robustness is less of a concern.
- Use Option 3 (Custom SVD Implementation) when you require additional outputs like the covariance matrix and are willing to trade off performance for more comprehensive results.
- Use Option 4 (Cholesky Decomposition) when you have a well-conditioned, full-rank matrix and need an efficient solution, but be cautious of potential numerical instability in ill-conditioned or rank-deficient problems.

Example 1:

OALGO 1

Example 2:

OALGO 1 -1

Example 3:

OALGO 2

Example 4:

OALGO 3 1e-10

Example 5:

OALGO 4

RCTYPE2B

[{STRING}] Cut_<name>

Max number of values: 2147483647

Description:

Specifies the cutoff function type(s) for two-body descriptor(s). Provide a single cutoff type (e.g., Cut_Cos) for one descriptor or multiple types corresponding to each descriptor when using the D2_mJoin meta descriptor.

Example 1:

RCTYPE2B Cut_Cos

Example 2:

RCTYPE2B Cut_Cos Cut_Tanh

RCTYPEMB

[{STRING}] Cut_<name>

Max number of values: 2147483647

Description:

Specifies the cutoff function type(s) for many-body descriptor(s). Provide a single cutoff type (e.g., Cut_Cos) for one descriptor or a series of types—each for a corresponding descriptor when using the DM_mJoin meta descriptor.

Example 1:

RCTYPEMB Cut_Cos

Example 2:

RCTYPEMB Cut_Cos Cut_Tanh

RCUT2B

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Specifies the cutoff distance(s) for two-body descriptor(s). Provide a single value for one descriptor or multiple values—one for each descriptor when using the D2_mJoin meta descriptor.

Example 1:

RCUT2B 3.0

Example 2:

RCUT2B 3.0 7.5

RCUTENV

[DOUBLE] <double>

Max number of values: 1

Description:

Envelope cutoff distance used by F_mEnv (DM_mBlipEnv). Width of the inverse-cutoff envelope (Cut_SinInv) wrapped around the F_Blip embedding by the DM_mBlipEnv descriptor: the envelope ramps from 0 at rho=0 to 1 at rho=RCUTENV, suppressing the embedding contribution where the underlying Gaussian basis is largest.

Example 1:

RCUTENV 1.5

RCUTMB

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Specifies the cutoff distance(s) for many-body descriptor(s). Provide a single value for a standalone descriptor or multiple values corresponding to each descriptor when using the DM_mJoin meta descriptor.

Example 1:

RCUTMB 4.9

Example 2:

RCUTMB 4.9 8.0

SBASIS

[UINT] <unsigned integer>

Max number of values: 1

Description:

Number of basis functions for the DesignMatrix. Many models do not require this. If specified, it sets the number of basis functions used in the design matrix.

Example 1:

Example 2:

SEMBFUNC

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Shape parameters for an embedding function. Used by certain many-body descriptors (e.g., F_RLR). When using the DM_mJoin descriptor, provide lists of parameters corresponding to each descriptor, ensuring consistency with CEMBFUNC.

Example 1:

SEMBFUNC 0.14 0.45 1.00 1.1

SETFL

[STRING] <string>

Max number of values: 1

Description:

Path to the setfl file with the EAM potential.

Example 1:

Ta1_Ravelo_2013.eam.alloy

SGRID2B

[{DOUBLE}] <double> [<double> …], [{STRING INT DOUBLE DOUBLE}] (<algorithm> <n> <start> <stop>) […]

Max number of values: 2147483647

Description:

Specifies the width parameters for two-body radial basis functions. These widths can be supplied manually or auto-generated. When using the meta descriptor D2_mJoin, provide one or more lists of width values for each concatenated descriptor. The number of widths must match the number of centers (CGRID2B). Alternatively, specify the algorithm keyword with parameters to generate widths automatically (e.g., LOG or LIN).

Example 1:

SGRID2B LIN 3 0 1.0

Example 2:

SGRID2B GEOM 6 0.1 10

Example 3:

SGRID2B   0.01 0.02 0.03
SGRID2B   GEOM 6 0.1 10

SGRIDMB

[{DOUBLE}] <width> …, [{STRING INT DOUBLE DOUBLE}] <algorithm> <N> <START> <STOP>

Max number of values: 2147483647

Description:

Specifies the width parameters for many-body radial basis functions. Values may be provided manually or generated automatically. When using the DM_mJoin meta descriptor, provide one or more lists of widths for each descriptor. Ensure consistency with the centers defined in CGRIDMB. Alternatively, use an algorithm (e.g., LOG or LIN) and its parameters to generate widths automatically.

Example 1:

SGRIDMB LIN 3 0 1.0

Example 2:

SGRIDMB 0.01 0.02 0.03

Example 3:

SGRIDMB   0.01 0.02 0.03
SGRIDMB   0.02 0.03 0.04

SIGMA

[INT {DOUBLE}] <integer> <double> …

Max number of values: 2147483647

Description:

The Σ matrix used in Bayesian Linear Regression. An N×N matrix in column-major order. Applies to M_BLR.

Example 1:

2 1.2 2.2 2.3 3.3
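The column-major layout can be made concrete with a short sketch. It assumes that the leading integer is the matrix dimension N and that the following N×N doubles are the entries; `parse_sigma` is a hypothetical helper, not part of Tadah!MLIP.

```python
def parse_sigma(tokens):
    """Rebuild an N x N matrix (list of rows) from column-major SIGMA values."""
    n = int(tokens[0])
    data = [float(t) for t in tokens[1:]]
    if len(data) != n * n:
        raise ValueError(f"expected {n * n} values, got {len(data)}")
    # Column-major: element (row i, column j) is stored at position j * n + i.
    return [[data[j * n + i] for j in range(n)] for i in range(n)]


sigma = parse_sigma("2 1.2 2.2 2.3 3.3".split())
print(sigma)
```

Under these assumptions the first column is (1.2, 2.2) and the second is (2.3, 3.3).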

SWEIGHT

[DOUBLE] <double>

Max number of values: 1 Default: 1.0

Description:

Global stress scaling factor. Each stress component is scaled by 1/6. Additional config-level scaling can apply. Combined factor = SWEIGHT*(config sweight)/6.

Example 1:

SWEIGHT 1e-1
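The combined factors given for EWEIGHT, FWEIGHT and SWEIGHT can be collected in one place. The sketch below only transcribes the three formulas; `combined_weights` and its argument names are hypothetical, and the per-configuration factors default to 1.0 for the purpose of the example.

```python
def combined_weights(n_atoms, eweight=1.0, fweight=1.0, sweight=1.0,
                     config_eweight=1.0, config_fweight=1.0, config_sweight=1.0):
    """Return the combined energy, force and stress scaling factors."""
    return {
        "energy": eweight * config_eweight / n_atoms,
        "force": fweight * config_fweight / n_atoms / 3,
        "stress": sweight * config_sweight / 6,
    }


# Global values taken from the examples in this reference, for a 4-atom structure.
w = combined_weights(n_atoms=4, eweight=0.96, fweight=1e-2, sweight=1e-1)
print(w)
```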

TYPE2B

[STRING {[INT | DOUBLE]} {STRING STRING}] D2_<name> {[param]} {ELEMENT ELEMENT}, [STRING {STRING STRING}], [STRING]

Max number of values: 2147483647

Description:

Specifies the two-body descriptor type(s) to be used. For a single descriptor, provide its type (e.g., D2_LJ). To concatenate multiple descriptors, use the meta descriptor D2_mJoin followed by the individual descriptor parameters. Elements should be provided in pairs.

Example 1:

TYPE2B D2_LJ Kr Kr

Example 2:

TYPE2B    D2_mJoin
  TYPE2B    D2_MIE 11 6 Ti Ti
  TYPE2B    D2_Blip 6 6 Ti Nb Nb Nb

TYPEMB

[STRING UINT {STRING STRING}] DM_<name> {[param]} {ELEMENT ELEMENT}, [STRING UINT UINT {STRING STRING}], [STRING UINT UINT UINT {STRING STRING}], [STRING UINT UINT UINT UINT UINT {STRING STRING}], [STRING UINT UINT UINT UINT UINT UINT UINT UINT {STRING STRING}], [STRING]

Max number of values: 2147483647

Description:

Specifies the many-body descriptor type(s) to be used. For a single descriptor, provide its type (e.g., DM_EAD). To combine multiple descriptors, use the meta descriptor DM_mJoin followed by the individual descriptor parameters. Elements should be provided in pairs.

Example 1:

TYPEMB DM_Blip 0 6 6 Ti Ti

Example 2:

TYPEMB    DM_mJoin
  TYPEMB    DM_Blip 1 6 6 Ti Ti
  TYPEMB    DM_Blip 0 6 6 Ti Nb Nb Nb

WATOM

[{STRING STRING DOUBLE} *] {<element> <element> <weight>}, [*{DOUBLE}] {<weight>}

Max number of values: 118

Description:

Atom-pair weights. Two formats accepted:

Per-species (parallel to ATOM): WATOM w1 w2 …: Pair weights are computed as geometric mean: w(Zi,Zj) = sqrt(wi*wj).
Explicit pair triples: WATOM sym1 sym2 val sym1 sym2 val …: Directly sets symmetric pair weights: w(sym1,sym2) = w(sym2,sym1) = val.

Example 1:

WATOM 1.0 1.5

Example 2:

WATOM Kr Kr 0.5

Example 3:

WATOM Kr Kr 0.5 Ar Ar 0.8 Kr Ar 0.6
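Both formats can be turned into a pair-weight table as sketched below. The function names are hypothetical and the element list is made up for the example; the geometric-mean rule and the symmetry of explicit triples are taken from the description above.

```python
from math import sqrt


def pair_weights_from_species(atoms, weights):
    """Per-species format: w(Zi, Zj) = sqrt(wi * wj)."""
    if len(atoms) != len(weights):
        raise ValueError("WATOM must be parallel to ATOM")
    return {
        (a, b): sqrt(wa * wb)
        for a, wa in zip(atoms, weights)
        for b, wb in zip(atoms, weights)
    }


def pair_weights_from_triples(tokens):
    """Explicit format: sym1 sym2 val triples set symmetric pair weights."""
    if len(tokens) % 3:
        raise ValueError("expected <element> <element> <weight> triples")
    table = {}
    for a, b, val in zip(tokens[0::3], tokens[1::3], tokens[2::3]):
        table[(a, b)] = table[(b, a)] = float(val)
    return table


per_species = pair_weights_from_species(["Kr", "Ar"], [1.0, 1.5])
explicit = pair_weights_from_triples("Kr Kr 0.5 Ar Ar 0.8 Kr Ar 0.6".split())
print(per_species[("Kr", "Ar")], explicit[("Ar", "Kr")])
```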

WEIGHTS

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Machine-learned coefficients for the model. These are species-dependent weights, obtained during optimization. Defaults to atomic numbers if unspecified.

OPTIM WEIGHTS is rejected by tadah hpo: the least-squares solve re-fits WEIGHTS on every evaluation, so optimising them directly would be a silent no-op. To search linear coefficients, pin them with FIXINDEX/FIXWEIGHT and use OPTIM FIXWEIGHT.

Example 1:

WEIGHTS 0.12 1.2 0.3

add

[DOUBLE] <centre>

Max number of values: 1

Description:

tadah refit --edit: insert a basis function at this centre (r in Angstrom for phi/rho, electron density for F) with coefficient 0; the potential is unchanged until HPO varies the new coefficient. Insert one basis function into the curve selected by --func, centred at the given position. The new coefficient (weight for phi/F, density amplitude for rho) is 0, so the edited potential predicts EXACTLY like the input; the new basis function only adds capacity for a subsequent tadah hpo run (its OPTIM line is emitted in the regenerated hpotarget).

Knot families (Knot5/Knot3): the knot position is the only shape parameter (--eta is rejected; widths are a pure gauge). --deg 3|5 selects the knot degree (default 5). Adding a radial knot beyond the current largest knot EXTENDS the effective cutoff; RCUT2B/RCUTMB are updated accordingly.

Blip families (Blip/Blip5): the inverse width defaults to the mean of the two nearest neighbours’ widths (--eta overrides). A blip whose support lies entirely beyond RCUT is rejected; one whose support crosses RCUT is truncated there (with a warning). RCUT is never changed for blip potentials.

Example 1:

2.75
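Why a zero coefficient leaves the potential unchanged can be seen on a toy knot expansion. The sketch uses the truncated power form (t-r)_+^degree quoted for the knot families; the knot positions, coefficients and helper names are invented for the example and are not taken from any real potential.

```python
def knot(t, r, degree=5):
    """Truncated power function (t - r)_+^degree."""
    return max(t - r, 0.0) ** degree


def curve(r, knots, coeffs, degrees):
    """Evaluate sum_i c_i * (t_i - r)_+^d_i."""
    return sum(c * knot(t, r, d) for t, c, d in zip(knots, coeffs, degrees))


knots, coeffs, degrees = [2.0, 3.0, 4.0], [0.8, -0.3, 0.1], [5, 5, 3]
radii = (1.5, 2.5, 3.5)
before = [curve(r, knots, coeffs, degrees) for r in radii]

# Insert a quintic knot at 2.75 with coefficient 0, as an add operation would.
after = [curve(r, knots + [2.75], coeffs + [0.0], degrees + [5]) for r in radii]
print(before == after)
```

The new term contributes exactly zero at every radius, so the two evaluations coincide; only a later optimisation that changes the new coefficient alters the curve.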

analytics

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Perform analytics.

Example 1:

true

Example 2:

false

append

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Append to the existing file.

Example 1:

true

Example 2:

false

atompair

[{STRING}] <element1> <element2>

Max number of values: 2

Description:

Pair of chemical elements.

Example 1:

"Kr Kr"

bias

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Per-species intercept. When true, adds N=size(ATOM) one-hot intercept columns to the design matrix; the regression learns one constant energy per species and LAMMPS adds it back per atom. Required when NORM=true with MODEL=BF_Linear.

Example 1:

true

Example 2:

false

bondenergy

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Calculate the bond energy instead of the per-atom value.

Example 1:

true

Example 2:

false

chunk

[{UINT}] <unsigned integer> [<unsigned integer> …]

Max number of values: 2147483647

Description:

Specify chunk sizes.

Example 1:

20 5 3

Example 2:

config

[STRING] <file>

Max number of values: 1

Description:

Path to a configuration file.

Example 1:

config.tadah

Example 2:

../config.tadah

Example 3:

/path/to/config.tadah

dbfile

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

Path(s) to Tadah! database file(s). Absolute or relative path to the Tadah! database file(s). The relative path is interpreted relative to the current working directory. Multiple dataset paths can be provided either as space-separated tokens or by repeating this key.

Example 1:

dbfile /path/to/dbfile

Example 2:

dbfile /path/to/dbfile1 /path/to/dbfile2

deg

[UINT] {3|5}

Max number of values: 1

Description:

tadah refit --edit --add, knot families only: degree of the inserted knot, either 5 (quintic, C^4, default) or 3 (cubic, C^2; materialises the per-knot KDEG* degree array). Rejected for blip families.

Example 1:

Example 2:

derivative

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Calculate the derivative of the function.

Example 1:

true

Example 2:

false

dft-file

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

Input DFT file(s). A single file or multiple files (space-separated). Used to extract reference data for training. Supported formats: VASP (OUTCAR, vasprun.xml), CASTEP (.castep, .md, .geom).

Example 1:

run1.outcar

Example 2:

run1.outcar run2.outcar

eam

[STRING] <file>

Max number of values: 1

Description:

Input tabulated EAM potential (setfl: eam/alloy or eam/fs) for tadah refit.

Example 1:

Ta2_Ravelo_2013.eam.alloy

Example 2:

Fe_2.eam.fs

eamfast

[BOOL] <boolean>

Max number of values: 1 Default: true

Description:

Table-collapse fast path in the LAMMPS pair style, pair_style tadah. For a single-species linear model the potential collapses exactly to scalar radial functions: a radial two-body block alone (INITMB false, any radial D2 descriptor) collapses to a pair function phi(r); a two-body block plus an EAM-form many-body block (the tadah refit family, e.g. D2_Knot5 + DM_REAM) additionally collapses to density and embedding functions.

The pair style then evaluates cubic-Hermite tables sampled from the descriptors themselves (plus the analytic embedding and scalar ghost communication in the EAM case; the pure two-body case needs no ghost communication at all), which typically makes force calls 5-10x faster. The result agrees with the generic kernel to about 1e-9 relative accuracy.

Set false for bitwise comparison against the generic kernel or for drivers that capture adjoint state on EAM-form models (pure two-body adjoint capture works under the fast path via the generic kernel). The key is silently ignored, and the generic kernel used, for models that do not collapse: nonlinear or normalised models, per-species bias, multi-species, dimers, or many-body descriptors that are not EAM-form.

Example 1:

true

Example 2:

false

edit

[STRING] <file>

Max number of values: 1

Description:

Edit mode for tadah refit: path to an existing refit potential (pot.tadah) to add/remove one basis function (use with --func and --add or --remove). Switches tadah refit into potential-editing mode: instead of refitting a tabulated EAM file, load an existing knot/blip potential and add or remove ONE basis function per invocation.

Supported potentials: the pair descriptor must be a refit family (D2_Knot5 / D2_Blip / D2_Blip5), with BIAS false and NORM false. The many-body term is optional:

- none (a pure two-body potential): --func phi edits the pair curve;
- DM_REAM (a tadah refit EAM): all three curves are editable;
- any other many-body descriptor: only --func phi (the pair channel) is editable; the many-body keys and WEIGHTS block are preserved untouched.

Use with:

- --func {phi|rho|F}: which curve to edit (pair / density / embedding; rho and F require a DM_REAM many-body term).
- --add <centre>: insert a basis function at this position with coefficient 0. The potential’s predictions are BIT-IDENTICAL to the input (value, all derivatives, extrapolation); the new coefficient becomes an extra search dimension in the regenerated hpotarget file.
- --remove <centre>: delete the basis function whose centre is nearest to <centre>; the two neighbouring coefficients are least-squares adjusted to track the original curve’s value, first and second derivative over the affected region. Removal is lossy by design, so expect small changes.

Outputs: an edited potential (-o; defaults to pot_edited.tadah so the input is never silently overwritten) and a regenerated hpotarget (--hpofileout; defaults to refit_edited.hpotarget). For knot-family channels RCUT2B/RCUTMB are recomputed from the basis (effective cutoff = largest knot), so adding a knot beyond the current largest one widens the effective cutoff.

Example 1:

pot.tadah

efilter

[DOUBLE DOUBLE] <E_min_per_atom> <E_max_per_atom>

Max number of values: 2 / 2

Description:

Drop configurations whose per-atom energy is outside [E_min, E_max] (eV). Outlier filter applied at load time before any energy-shift derivation or training-weight assignment, so outliers do not poison ESHIFT_ATOM / ESHIFT_DBATOM / EWEIGHT_TEMP. The threshold is compared against E/N_atoms (per-atom energy). Both bounds must be supplied. To disable, omit the key.

Example 1:

-12.0 -2.0
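
The filter rule can be sketched in a few lines of Python. This is an illustration only, not Tadah!MLIP code: the dictionary layout and the function name are assumptions, and the bounds are treated as inclusive, following the closed interval [E_min, E_max] in the description.

```python
def efilter(configs, e_min, e_max):
    """Keep configurations whose per-atom energy E/N lies in [e_min, e_max]."""
    return [c for c in configs if e_min <= c["energy"] / c["natoms"] <= e_max]


# Hypothetical dataset: total energies in eV.
configs = [
    {"name": "bulk", "energy": -32.0, "natoms": 4},    # -8.0 eV/atom, kept
    {"name": "broken", "energy": -1.0, "natoms": 4},   # -0.25 eV/atom, dropped
    {"name": "deep", "energy": -60.0, "natoms": 4},    # -15.0 eV/atom, dropped
]

kept = efilter(configs, -12.0, -2.0)
print([c["name"] for c in kept])
```

Because the comparison uses E/N_atoms, a large cell with a very negative total energy is not penalised for its size.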

error

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Generate error estimates.

Example 1:

true

Example 2:

false

eshift

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Per-element reference energies (one value per atom of that element) to subtract from each configuration. If there are multiple species, the number of values must match the number of species (sorted by Z). At load time the total energy of each configuration is reduced by sum_Z N_Z * ESHIFT[Z], so an isolated-atom config with energy E_atom and ESHIFT[Z]=E_atom yields a post-shift energy of zero. Used by tadah train, tadah predict, tadah hpo, and tadah data balance. Persisted into pot.tadah for prediction round-trip.

Example 1:

0.5

Example 2:

0.5 -0.1
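
As a worked illustration of the shift formula, the Python sketch below applies sum_Z N_Z * ESHIFT[Z] to one configuration. The function name, the dictionary layout and the choice of species (Z = 13 and Z = 28) are assumptions made for the example; only the formula comes from the description above.

```python
def shifted_energy(total_energy, counts, eshift):
    """Return E - sum_Z N_Z * ESHIFT[Z] for one configuration.

    counts maps atomic number Z to the number of atoms N_Z;
    eshift maps Z to the reference energy for that element.
    """
    return total_energy - sum(n * eshift[z] for z, n in counts.items())


# Values from Example 2, assigned to two hypothetical species sorted by Z.
eshift = {13: 0.5, 28: -0.1}

# Three atoms of Z=13 and one of Z=28: shift = 3*0.5 + 1*(-0.1) = 1.4 eV.
e_mixed = shifted_energy(-20.0, {13: 3, 28: 1}, eshift)

# An isolated atom whose energy equals its ESHIFT ends up at zero.
e_atom = shifted_energy(0.5, {13: 1}, eshift)

print(e_mixed, e_atom)
```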

eshift_atom

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Derive ESHIFT from isolated-atom configurations in the dataset (mean per Z). Scans the loaded dataset for single-atom configurations (natoms == 1), groups them by atomic number, and sets ESHIFT[Z] to the mean per-Z energy. If a species has no isolated-atom config in the dataset, ESHIFT[Z] = 0 for that species and a WARNING is logged. If multiple isolated-atom configs of the same Z disagree by more than 1e-3 eV, an INFO line records the spread. Mutually exclusive with explicit ESHIFT and ESHIFT_DBATOM.

Example 1:

true
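
The derivation rule can be sketched as follows. This is a minimal Python illustration under stated assumptions: the configuration layout (a list of atomic numbers plus a total energy) and the function name are hypothetical, and the logging described above is reduced to a comment.

```python
from collections import defaultdict


def derive_eshift_atom(configs, species):
    """Mean isolated-atom energy per atomic number; 0.0 when none is present."""
    by_z = defaultdict(list)
    for c in configs:
        if len(c["z"]) == 1:          # single-atom configuration (natoms == 1)
            by_z[c["z"][0]].append(c["energy"])
    eshift = {}
    for z in species:
        energies = by_z.get(z, [])
        # Per the description, a missing species gets 0 and a WARNING is logged.
        eshift[z] = sum(energies) / len(energies) if energies else 0.0
    return eshift


configs = [
    {"z": [13], "energy": -0.25},
    {"z": [13], "energy": -0.75},
    {"z": [13, 13, 28], "energy": -9.0},   # not an isolated atom, ignored
]
eshift = derive_eshift_atom(configs, species=[13, 28])
print(eshift)
```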

eshift_dbatom

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Derive ESHIFT by least-squares atomic-energy fit over the database. Fits per-element reference energies by least squares: minimise ||y - M beta||^2 where y[i] is the total energy of configuration i and M[i, k] is the count of species k in configuration i. The fitted beta_k becomes ESHIFT[Z(k)]. More robust than ESHIFT_ATOM when the dataset has no isolated-atom configs but does have compositional diversity. Mutually exclusive with explicit ESHIFT and ESHIFT_ATOM.

Example 1:

true
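
The least-squares problem above is small (one unknown per species), so it can be illustrated with the normal equations M^T M beta = M^T y in pure Python. This is a sketch for intuition, not the solver Tadah!MLIP uses; all names are hypothetical and the synthetic energies are constructed to be exactly linear in the species counts.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_atomic_energies(counts, energies):
    """Minimise ||y - M beta||^2 where M[i][k] counts species k in config i."""
    nspec = len(counts[0])
    mtm = [[sum(row[i] * row[j] for row in counts) for j in range(nspec)]
           for i in range(nspec)]
    mty = [sum(row[i] * e for row, e in zip(counts, energies))
           for i in range(nspec)]
    return solve(mtm, mty)


# Four configurations of a two-species system (columns sorted by Z).
counts = [[2, 0], [1, 1], [0, 4], [3, 1]]
true_beta = [-1.5, -3.0]   # synthetic per-element energies in eV
energies = [sum(n * b for n, b in zip(row, true_beta)) for row in counts]

beta = fit_atomic_energies(counts, energies)
print(beta)
```

The fit needs compositional diversity: if every configuration has the same species ratio, M^T M is singular and the per-element energies cannot be separated.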

eta

[DOUBLE] <inverse width>

Max number of values: 1

Description:

tadah refit --edit --add, blip families only: inverse width of the inserted basis function (default: mean of the neighbouring widths). Rejected for knot families (width is a pure gauge).

Example 1:

1.8

even

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Equal-size partition.

Example 1:

true

Example 2:

false

eweight_temp

[DOUBLE] <double>

Max number of values: 1

Description:

Boltzmann reweighting temperature in Kelvin (multiplies eweight). After ESHIFT is applied, multiplies each configuration’s eweight by exp(-(E/N - E_min)/(kB * T)) where E_min is the minimum per-atom energy in the dataset and kB = 8.617333262e-5 eV/K. Emphasises low-energy configurations. Composes multiplicatively with the per-structure eweight already in the dataset file. Omit the key to disable.
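
The reweighting factor can be evaluated directly from the formula above. The Python sketch below is illustrative: the function name and the sample energies are assumptions, while the expression and the value of kB are taken from the description.

```python
import math

KB = 8.617333262e-5  # Boltzmann constant in eV/K, as quoted above


def boltzmann_factors(per_atom_energies, temperature):
    """exp(-(E/N - E_min) / (kB * T)) for each configuration."""
    e_min = min(per_atom_energies)
    return [math.exp(-(e - e_min) / (KB * temperature))
            for e in per_atom_energies]


# Hypothetical per-atom energies (eV/atom) after ESHIFT, at T = 1000 K.
factors = boltzmann_factors([-8.00, -7.95, -7.50], 1000.0)
for f in factors:
    print(f"{f:.4f}")
```

The lowest-energy configuration always gets a factor of exactly 1, and each factor then multiplies the eweight already stored for that structure.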

Example 1:

Example 2:

explore

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

tadah refit EXPLORE stage block (EXPLORE … ENDEXPLORE): structure search + basis growth (absorbs the old GLOBAL stage at rung 0). Takes MAX_ADD <n> (max basis added per function) and a nested OPTIMIZER block (default global MLSL with an INNER local SBPLX). Captured verbatim; parsed by the refit engine.

Example 1:

EXPLORE / MAX_ADD 38 / OPTIMIZER ... ENDOPTIMIZER / ENDEXPLORE

ffilter

[DOUBLE] <double>

Max number of values: 1

Description:

Drop configurations where any atomic force magnitude exceeds this value (eV/Å). Outlier filter applied at load time. A configuration is dropped if any single atom has ‖F‖ > FFILTER. Useful for catching unconverged SCF or otherwise broken DFT runs.

Example 1:

20.0
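
The test is on the Euclidean norm of each atomic force vector, not on individual components. A minimal Python sketch, with a hypothetical data layout and function names:

```python
import math


def max_force_norm(forces):
    """Largest force magnitude |F| over all atoms in a configuration."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def ffilter(configs, fmax):
    """Drop a configuration if any single atom has |F| > fmax (eV/Å)."""
    return [c for c in configs if max_force_norm(c["forces"]) <= fmax]


configs = [
    {"name": "ok", "forces": [(3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]},      # max |F| = 5
    {"name": "spike", "forces": [(0.0, 0.0, 25.0), (0.1, 0.0, 0.0)]},  # max |F| = 25
    # Every component is below 20, but the magnitude is about 20.8.
    {"name": "diagonal", "forces": [(12.0, 12.0, 12.0)]},
]

kept = ffilter(configs, 20.0)
print([c["name"] for c in kept])
```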

force

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Include forces.

Example 1:

true

Example 2:

false

format

[STRING] <fmt>

Max number of values: 1

Description:

Output format (e.g., vasp, castep, lammps).

Example 1:

castep

Example 2:

lammps

Example 3:

vasp

func

[STRING] {phi|rho|F}

Max number of values: 1

Description:

Curve selector for tadah refit --edit: phi (pair, CGRID2B), rho (density, CGRIDMB/AMPGRIDMB) or F (embedding, CEMBFUNC).

Example 1:

phi

Example 2:

rho

Example 3:

fuse

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

tadah refit FUSE stage block (FUSE … ENDFUSE): final joint refinement of all three functions against curve + crystal energy-volume residuals (least-squares only: Ceres LM/DOGLEG or NONE). Takes a nested OPTIMIZER block. Captured verbatim; parsed by the refit engine.

Example 1:

FUSE / OPTIMIZER ... ENDOPTIMIZER / ENDFUSE

hpofileout

[STRING] <file>

Max number of values: 1

Description:

Output file for the tadah hpo starter target written by tadah refit.

Example 1:

refit.hpotarget

hpotarget

[STRING] <file>

Max number of values: 1

Description:

HPO target file.

Example 1:

hpotargets.txt

in_place

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

tadah refit --retarget: overwrite the input hpotarget file (atomic temp-write + rename) instead of writing a new file. Mutually exclusive with -o.

Example 1:

--in-place

index

[{INDEX_PATTERN}] <index>[,<index>…], [**] <start>-<stop>, [**] <start>-<stop>:<step>

Max number of values: 2147483647

Description:

Index pattern. Allows flexible selection of dataset indices. Supports single indices, ranges (e.g., start-stop), lists, or intervals (start-stop:step). Indices are 1-based. Repeated indices are removed automatically.

Example 1:

1,3,5

Example 2:

1-4,7,9

Example 3:

1-10:2
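
To make the pattern grammar concrete, here is a small Python sketch that expands a pattern into a list of 1-based indices. It is not the Tadah!MLIP parser. Two points are assumptions: ranges are taken to include the stop value, and duplicates are dropped while keeping first-seen order.

```python
def expand_index_pattern(pattern):
    """Expand e.g. '1-4,7,9' or '1-10:2' into a list of 1-based indices."""
    seen, out = set(), []
    for token in pattern.split(","):
        span, _, step = token.partition(":")
        start, _, stop = span.partition("-")
        first = int(start)
        last = int(stop) if stop else first
        stride = int(step) if step else 1
        if first < 1 or stride < 1:
            raise ValueError(f"indices are 1-based and steps positive: {token!r}")
        for i in range(first, last + 1, stride):   # stop assumed inclusive
            if i not in seen:                      # repeated indices removed
                seen.add(i)
                out.append(i)
    return out


for p in ("1,3,5", "1-4,7,9", "1-10:2", "1-3,2-4"):
    print(p, "->", expand_index_pattern(p))
```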

init

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

INIT / PHI XMIN PIN / ENDINIT

lscale

[DOUBLE] <double>

Max number of values: 1 Default: 1.0

Description:

Uniform length rescale factor applied to atomic positions, cell, and reference forces at load time. Multiplies atomic positions and cell vectors by this factor at the moment a dataset is loaded for training, prediction, or HPO. Reference forces are divided by the factor (chain rule on E(r)); stresses (stored as virial in energy units) are invariant under uniform length rescaling. The chosen factor is persisted into pot.tadah so future tadah predict and tadah hpo runs apply the same transformation. Use --no-lscale at predict time to override.

LSCALE is a training-side concept: the LAMMPS pair_style does NOT re-apply LSCALE. The user is expected to provide LAMMPS positions at the scale that matches the trained model (e.g. experimental lattice).

Example 1:

1.0030
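
The transformation described above can be sketched in Python as follows. The function names and the nested-list layout are assumptions for illustration; the rule itself (positions and cell multiplied by the factor, forces divided by it, energies and virial stresses left unchanged) is taken from the description.

```python
def _scaled(rows, factor):
    """Multiply every component of a list of 3-vectors by factor."""
    return [[x * factor for x in row] for row in rows]


def apply_lscale(positions, cell, forces, lscale):
    """Return rescaled (positions, cell, forces) for one configuration."""
    return (
        _scaled(positions, lscale),      # lengths scale with the factor
        _scaled(cell, lscale),
        _scaled(forces, 1.0 / lscale),   # F = -dE/dr scales inversely
    )


positions = [[1.0, 2.0, 3.0]]
cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
forces = [[0.5, 0.0, -1.0]]

# A factor of 2 is used only to keep the arithmetic easy to follow.
new_pos, new_cell, new_forces = apply_lscale(positions, cell, forces, 2.0)
print(new_pos, new_cell[0], new_forces)
```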

merge

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Merge deduplication results into one file. Example 1:

true

Example 2:

false

n2b

[UINT] <unsigned integer>

Max number of values: 1

Description:

Override: number of two-body basis functions in tadah refit.

Example 1:

nf_basis

[UINT] <unsigned integer>

Max number of values: 1

Description:

Override: number of embedding basis functions in tadah refit.

Example 1:

no_eshift

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

(predict) Ignore any ESHIFT recorded in the loaded potential file. At predict time, override the ESHIFT values stored in pot.tadah. Use when the dataset you are predicting on is already at the shifted baseline (or you just want raw model output without any reference energy subtraction).

Example 1:

true

no_lscale

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

(predict) Ignore any LSCALE recorded in the loaded potential file. At predict time, override the LSCALE value stored in pot.tadah. Use when the dataset you are predicting on is already at the trained-model scale.

Example 1:

true

nrho_basis

[UINT] <unsigned integer>

Max number of values: 1

Description:

Override: number of density basis functions in tadah refit.

Example 1:

numeric

[UINT] <unsigned integer>

Max number of values: 1 Default: 12

Description:

Numeric output precision. Sets the number of decimal places for output.

Example 1:

optim_pct

[DOUBLE] <double>

Max number of values: 1

Description:

Default +-percent half-width for the OPTIM bounds emitted by tadah refit.

Example 1:

5.0

optimizer

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

Optimiser sub-block for a tadah refit stage (EXPLORE/REFINE/FUSE). A multi-line OPTIMIZER … ENDOPTIMIZER block nested INSIDE a stage block; takes LIB <name> / ALGO <name> plus stopping criteria (MAXEVAL, FTOL_REL/FTOL_ABS, XTOL_REL/XTOL_ABS, GTOL, …) in the same syntax as the tadah hpo --hpotarget OPTIMIZER block, and an optional nested INNER … ENDINNER for the per-rung local search. A top-level OPTIMIZER block is rejected (it must be nested in a stage).

Example 1:

see HPO_REFIT_2B_EAM/REDESIGN/configs (OPTIMIZER nested in EXPLORE/REFINE/FUSE)

option

[STRING] <arg>

Max number of values: 1

Description:

Positional argument.

Example 1:

arg

outfile

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

Output file to be written. Multiple files can be specified if the command produces more than one output.

Example 1:

output.tadah

output_grid_scale

[DOUBLE] <double>

Max number of values: 1

Description:

Scale factor for the output setfl grid density (1.0 = same as input).

Example 1:

1.0

Example 2:

2.0

pct

[{[DOUBLE | STRING]}] <pct> and/or <KEY>=<pct> […]

Max number of values: 2147483647

Description:

tadah refit --retarget: +-percent half-width for the rewritten OPTIM bounds. A bare number sets the global default; KEY=N overrides one key group, e.g. --pct 10 FIXWEIGHT=2.

Example 1:

Example 2:

FIXWEIGHT=2

Example 3:

10 CGRID2B=2
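
The sketch below shows one way to read a --pct value list and turn a percentage into a search box. It is an illustration under assumptions, not the tadah refit implementation: the half-width is taken as |value| * pct / 100 around the current value, the fallback of 5 follows the default quoted for --retarget, and all function names are hypothetical.

```python
def parse_pct(tokens, default=5.0):
    """Split tokens into a global percentage and per-key overrides."""
    global_pct, per_key = default, {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:                      # KEY=N form
            per_key[key] = float(value)
        else:                        # bare number sets the global default
            global_pct = float(token)
    return global_pct, per_key


def bounds(value, pct):
    """Assumed rule: symmetric box of half-width |value| * pct / 100."""
    half = abs(value) * pct / 100.0
    return value - half, value + half


global_pct, per_key = parse_pct("10 CGRID2B=2".split())

# A CGRID2B entry uses its override; any other key uses the global value.
print(bounds(2.0, per_key.get("CGRID2B", global_pct)))
print(bounds(-40.0, per_key.get("FIXWEIGHT", global_pct)))
```

How a parameter whose value is exactly zero is bounded is not stated in this reference, so the sketch does not address it.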

percent

[{UINT}] <unsigned integer> [<unsigned integer> …]

Max number of values: 2147483647

Description:

Specify percentage partition.

Example 1:

20 5 3

Example 2:

potential

[STRING] <file>

Max number of values: 1

Description:

Trained model file.

Example 1:

pot.tadah

quantity

[STRING] <string>

Max number of values: 1

Description:

Generic quantity.

Example 1:

validString

Example 2:

/path/to/file

random

[UINT] <unsigned integer>

Max number of values: 1

Description:

Randomly sample N entries.

Example 1:

range

[DOUBLE DOUBLE INT] <START> <STOP> <NPOINTS>

Max number of values: 3 / 3

Description:

Plotting range [start stop npoints].

Example 1:

0.1 9.5 100

refine

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

tadah refit REFINE stage block (REFINE … ENDREFINE): per-rung high-precision local polish (Ceres LM/DOGLEG, or NLOPT/DLIB local). Takes a nested OPTIMIZER block. Captured verbatim; parsed by the refit engine.

Example 1:

REFINE / OPTIMIZER ... ENDOPTIMIZER / ENDREFINE

refit_basis

[STRING] <string>

Max number of values: 1

Description:

Flexible-basis family used by tadah refit.

Default: ‘Knot5’ in compact mode, ‘Blip’ in dense mode (and as the automatic fallback when the compact fitter is unavailable, i.e. non-Ceres builds). An explicit REFIT_BASIS always wins.

Family x mode matrix:

- ‘Blip’: cubic B-spline bumps, C2; dense AND compact modes; the most validated family.
- ‘Blip5’: M6 quintic B-spline bumps, C4, local support; dense AND compact. Measured: best Fe elastic constants of all families but material-dependent; one reference element (Al) degraded to ~4 GPa C11; always verify with the faithfulness battery.
- ‘Knot3’: one-sided CUBIC knot functions (t-r)_+^3, the Mendelev/Ackland form, exact for knotted-cubic source tables; compact only.
- ‘Knot5’: one-sided QUINTIC knot functions, C4; the RECOMMENDED opt-in: best measured worst-case elastic constants of all families (1.6 vs Blip’s 2.3 GPa across the 5 reference EAMs) with every E-V observable in tolerance; compact only.
- ‘KnotMix’: EXPERIMENTAL auto mode: quintic knot atoms by default, cubic atoms inserted where they win the per-insertion energy contest. Measured: best-in-class E-V faithfulness, but the greedy detector deploys cubics liberally (~40-60%) and the localized C2 sawtooth degrades elastic constants vs pure Knot5; prefer Knot5 unless E-V is your only target; compact only.
- ‘Gaussian’: EXPERIMENTAL/UNVALIDATED for refit: never gated through the LAMMPS faithfulness battery; hard-cutoff truncation leaves a value discontinuity at rcut; dense mode only; prefer any other family.

Compact-only families need the Ceres-enabled HPO build.

NOTE for HPO: SGRID* width entries are a pure gauge for the Knot* families (pinned 1.0); do not add OPTIM blocks on them.

Example 1:

Blip

Example 2:

Blip5

Example 3:

Knot3

Example 4:

Knot5

Example 5:

KnotMix

refit_compact_insert_topk

[UINT] <unsigned integer>

Max number of values: 1

Description:

tadah refit compact mode, Knot5 family: energy-aware insertion. Evaluates the top-k separated curve-residual peaks and inserts at the one that most improves the crystal energy metric. Default 3 for Knot5, 1 (curve peak only) for Blip.

Example 1:

Example 2:

refit_compact_jitters

[UINT] <unsigned integer>

Max number of values: 1

Description:

tadah refit compact mode: number of randomised warm restarts per ladder rung. Each rung runs jitters+1 candidates (1 warm start + this many perturbed restarts) through the full EXPLORE -> REFINE pipeline and keeps the energy-metric-best; this exploration stops the local search locking into a bad basin. It is also why the coalescence guard can fire several times within one rung (once per candidate). Default 2 (3 candidates/rung). 0 = warm start only (cheaper, less robust).

Example 1:

Example 2:

Example 3:

refit_compact_merge

[STRING] <string>

Max number of values: 1 Default: auto

Description:

tadah refit compact mode, Knot families: coalescence guard. ‘true’ applies a soft anti-coalescence penalty during EXPLORE plus a post-REFINE merge of near-coincident knot pairs that re-inserts a fresh knot at the worst residual. ‘auto’ (default) sets it OFF for REFIT_OBJECTIVE curve and ON for REFIT_OBJECTIVE energy. ‘false’ keeps it off. Knot families only.

Example 1:

auto

Example 2:

true

Example 3:

false

refit_compact_nmax

[{UINT}] <unsigned integer> [<unsigned integer> …]

Max number of values: 2147483647

Description:

tadah refit compact mode: per-function basis (blip/knot) ceiling for the EXPLORE ladder. Give ONE value (applies to all three functions) or THREE values ‘phi rho F’ (separate per-function ceilings, e.g. 39 5 27). Default 40. Only EXPLORE grows the basis; REFINE/FUSE only refine the existing basis.

Example 1:

Example 2:

39 5 27

refit_compact_opt_every

[UINT] <unsigned integer>

Max number of values: 1

Description:

tadah refit compact mode, Knot5 family: run the full per-rung optimisation (explore + polish) only every K-th ladder rung; intermediate rungs grow by energy-aware insertion + linear solve, with one final full optimisation of the best fit. Default 1 (optimise every rung; matrix-measured most consistent). 999 = insertion-only growth + final optimisation: best results when no ladder stalls (measured Ta: Cij <= 0.09 GPa), but can underfit if a ladder stalls during growth.

Example 1:

Example 2:

refit_compact_samples

[UINT] <unsigned integer>

Max number of values: 1

Description:

tadah refit compact mode: number of uniform sample points per function across its fit window. The VALUE RMSE, the 1st/2nd-derivative (d1/d2) RMSEs and the curve objective (REFIT_OBJECTIVE curve) are all evaluated on THIS shared grid: at each point the EAM cubic-spline value and its analytic 1st/2nd derivatives are the targets. More points = finer RMSE estimate and finer curve fit but slower (each closed-form VARPRO solve is O(samples * basis)); fewer = faster, coarser. Default 800. Clamped up to max(50, 4 x REFIT_COMPACT_NMAX). Independent of the energy E-V scan and the OUTPUT_GRID_SCALE setfl output grid.

Example 1:

Example 2:

Example 3:

refit_compact_seed

[UINT] <unsigned integer>

Max number of values: 1

Description:

tadah refit compact mode: RNG seed for the jittered ladder restarts (deterministic reruns). Default 12345.

Example 1:

refit_compact_stall

[UINT] <unsigned integer>

Max number of values: 1

Description:

tadah refit compact mode: stop the ladder after this many consecutive rungs without energy-metric improvement. Default 6 (Blip) / 12 (Knot5).

Example 1:

Example 2:

refit_constrain_f0

[BOOL] <boolean>

Max number of values: 1 Default: true

Description:

tadah refit: pin the embedding to the table’s F(0) (isolated-atom reference) with a constraint row. Default true.

Example 1:

true

Example 2:

false

refit_curve_deriv_weight

[DOUBLE] <double>

Max number of values: 1 Default: 0.0

Description:

tadah refit compact mode, REFIT_OBJECTIVE curve only: weight of EACH derivative block (1st and 2nd) RELATIVE to the value block in the curve objective (both the stacked coefficient solve and the rung-selection score).

0 (DEFAULT) = pure VALUE fit, which is crystal-faithful: LAMMPS computes the crystal energy from the curve VALUES, so a value-only fit reproduces the crystal as well as energy mode (measured Ti: cohesive +3.8 meV/atom, E-V 5-6 meV).

ANY nonzero weight degrades the crystal sharply (measured: 0.01 -> -15, 0.1 -> -69, 1.0 -> +37 meV/atom cohesive) because the relative normalisation lets even a 1% derivative term trade away the large ABSOLUTE value error of a high-magnitude function like F. Raise it ONLY to capture the 1st/2nd-derivative SHAPE at the EXPENSE of crystal energy. 1.0 = value/d1/d2 equal (the original, crystal-wrecking curve mode).

Ignored unless REFIT_OBJECTIVE=curve.

Example 1:

0.0

Example 2:

0.1

Example 3:

1.0

refit_curve_relax

[DOUBLE] <double>

Max number of values: 1 Default: 0.05

Description:

tadah refit compact mode, REFIT_OBJECTIVE curve with REFIT_CURVE_WEIGHTING on: the floor weight (a fraction greater than 0 and at most 1) that the pressure-relaxed profile ramps DOWN to in the compressed / high-pressure regions (phi/rho below r_amb, F above rho_amb). 1.0 reproduces uniform weighting, smaller values relax the fit harder away from ambient (default 0.05). The ramp is a C1 smoothstep between the ambient band (weight 1) and the window edge (weight equal to this floor). Rejected in energy mode.

Example 1:

0.05

Example 2:

0.1

Example 3:

0.02
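
To visualise the ramp, the Python sketch below builds a weight profile for phi(r) or rho(r) using the classic cubic smoothstep 3t^2 - 2t^3, which is C1. The exact ramp used by tadah refit, the width of the ambient band and the window edge are not specified in this reference, so the function names, the single-point ambient band at r_amb and the sample radii are all assumptions.

```python
def smoothstep(t):
    """Cubic smoothstep: 0 for t <= 0, 1 for t >= 1, zero slope at both ends."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def pair_weight(r, r_edge, r_amb, floor=0.05):
    """Weight 1 at and above r_amb, ramping down to `floor` at r_edge."""
    if not 0.0 < floor <= 1.0:
        raise ValueError("floor must be greater than 0 and at most 1")
    t = (r - r_edge) / (r_amb - r_edge)
    return floor + (1.0 - floor) * smoothstep(t)


# Hypothetical window edge at 1.0 Å and ambient neighbour distance at 2.5 Å.
for r in (1.0, 1.75, 2.5, 4.0):
    print(r, round(pair_weight(r, 1.0, 2.5), 4))
```

With floor = 1.0 the profile is flat, which matches the statement that 1.0 reproduces uniform weighting.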

refit_curve_weighting

[BOOL] <boolean>

Max number of values: 1 Default: true

Description:

tadah refit compact mode, REFIT_OBJECTIVE curve only: pressure-relaxed curve weighting (ON by default). Concentrates the fit where the crystal actually probes and relaxes it toward the high-pressure extremes, so a fixed knot budget spends its accuracy where it matters:

- phi(r) and rho(r) are weighted 1 between the ambient nearest-neighbour distance r_amb and the cutoff, ramping smoothly down to REFIT_CURVE_RELAX toward smaller r (compression);
- F(rho) is weighted 1 from rho=0 up to the ambient density rho_amb, ramping down to the floor for higher rho (compression).

r_amb and rho_amb are derived self-contained from the lattice geometry and the EAM tables (no crystal energy-volume scan dependency). The weights apply to BOTH the stacked value+d1+d2 coefficient solve AND the rung-selection metrics, so fit and selection agree.

false = uniform weighting (legacy curve mode). This key is rejected in energy mode. Multiplies with REFIT_CURVE_DERIV_WEIGHT (orthogonal: deriv_weight scales d1/d2 vs value; this scales per-region).

Example 1:

true

Example 2:

false

refit_level

[STRING] <string>

Max number of values: 1

Description:

Grid-accuracy preset for tadah refit: coarse | balanced | fine | accurate.

Example 1:

balanced

Example 2:

fine

refit_log

[STRING] <string>

Max number of values: 1

Description:

tadah refit optimisation-log file (default refit_opt.log): one timestamped, immediately-flushed line per ladder rung (the rung reports its winning candidate’s metric/rmse/d1/d2 progression across the insert->explore->refine->merge checkpoints, any coalescence-guard merge count, and each stage’s stop reason + work done) plus the global seed, joint-refinement (FUSE) summary and crystal self-test, inspectable DURING the fit.

Example 1:

refit_opt.log

refit_max_escalations

[UINT] <unsigned integer>

Max number of values: 1

Description:

tadah refit maximum automatic basis-escalation rounds when the self-test is above tolerance (0 disables escalation). Default 4.

Example 1:

Example 2:

refit_mode

[STRING] <string>

Max number of values: 1

Description:

tadah refit fitting mode: ‘compact’ (DEFAULT; optimised blip centres/widths via ladder + analytic-gradient Ceres polish, ~30-120 params, the HPO-friendly search space; requires the Ceres-enabled HPO build, else falls back to dense with a warning) or ‘dense’ (region-uniform grids + auto-escalation, ~300-450 params, fastest fit).

Example 1:

dense

Example 2:

compact

refit_objective

[STRING] <string>

Max number of values: 1 Default: energy

Description:

tadah refit compact mode: what the WHOLE fitting chain (seed, EXPLORE, REFINE, FUSE, and the rung selection / stall / stop decisions) optimises toward.

- ‘energy’ (default): the crystal energy metric (current behaviour).
- ‘curve’: a normalised combined RMSE of the fitted VALUE + 1st-derivative + 2nd-derivative against the tabulated function (each term divided by the table’s own RMS over the window, then RMS-combined).

In curve mode the coefficients are solved from a stacked value+d1+d2 least-squares, the knot positions are searched with the configured EXPLORE optimiser on that objective, FUSE is a per-function curve refine that drops the energy term, and the energy metric is unused end-to-end. The target 1st/2nd derivatives are the EXACT analytic derivatives of the setfl cubic spline (no finite differencing).

OPTIMISER POLICY: curve mode HONOURS the configured optimiser at every stage (EXPLORE inner, REFINE, FUSE; ENABLED false or ALGO NONE skips a stage); the per-stage legal set is enforced at parse time and the actual stack is logged.

The pressure-relaxed weighting (REFIT_CURVE_WEIGHTING, ON by default) concentrates the fit where the crystal actually probes (ambient r/rho) so an aggressive least-squares optimiser does not over-fit the unprobed high-magnitude regions and wreck the crystal. FUSE keeps NO crystal E-V coupling (curve mode is independent of energy mode). Note the ladder grows until stall / n_max (the curve score has no tolerance threshold).

Example 1:

energy

Example 2:

curve

refit_placement

[STRING] <string>

Max number of values: 1

Description:

tadah refit basis placement: ‘window’ (default; basis concentrated on the physically reachable pair-distance/density window, small uniform tail allocation) or ‘uniform’ (legacy single uniform grid over the full tabulated range).

Example 1:

window

Example 2:

uniform

refit_r_lo_frac

[DOUBLE] <double>

Max number of values: 1

Description:

tadah refit inner radial fit bound as a fraction of rcut (default 0.20). Below it the round-trip setfl tabulates a smooth C2-matched repulsive continuation and the native pot.tadah is outside its validity domain.

Example 1:

0.20

Example 2:

0.15

refit_reach_margin

[DOUBLE] <double>

Max number of values: 1

Description:

tadah refit safety margin multiplying the maximum reachable embedding density when choosing the F(rho) fit window. Default 1.25.

Example 1:

1.25

refit_reach_smin

[DOUBLE] <double>

Max number of values: 1

Description:

tadah refit reachability probe: smallest linear compression a/a0 the fit must stay faithful at (0.75 ~ V/V0=0.42, multi-hundred-GPa regime). Sets the reachable pair-distance/density window and the self-test span. Default 0.75.

Example 1:

0.75

Example 2:

0.65
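
The volume ratio quoted in the description follows from cubing the linear compression, V/V0 = (a/a0)^3, assuming the cell is compressed isotropically. A quick Python check (the function name is illustrative):

```python
def volume_ratio(linear_compression):
    """V/V0 for an isotropic linear compression a/a0."""
    return linear_compression ** 3


for s in (0.75, 0.65):
    print(f"a/a0 = {s:.2f} -> V/V0 = {volume_ratio(s):.3f}")
```

For the default of 0.75 this gives about 0.42, in line with the figure above; lowering the key to 0.65 pushes the probe to roughly 0.27 of the equilibrium volume.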

refit_tol_ambient

[DOUBLE] <double>

Max number of values: 1

Description:

tadah refit faithfulness tolerance (meV/atom) for the ambient crystal self-test (E-V, a/a0 0.95-1.10). The basis auto-escalates until met. Default 0.1.

Example 1:

0.5

refit_tol_compressed

[DOUBLE] <double>

Max number of values: 1

Description:

tadah refit faithfulness tolerance (meV/atom) for the COMPRESSED crystal self-test (a/a0 0.75-0.95, V/V0 down to ~0.42). The basis auto-escalates until met. Default 1.0.

Example 1:

5.0

remove

[DOUBLE] <centre>

Max number of values: 1

Description:

tadah refit --edit: remove the basis function of the --func curve whose centre is nearest to the given position (the chosen centre is reported). Unless the removed coefficient is exactly 0, the two remaining basis functions nearest in centre are least-squares adjusted against the original curve’s value, first and second derivative, sampled densely over the region the removed function controlled. Removal is inherently lossy: the report prints the residual before/after compensation; expect model performance to change and re-validate.

Removing the OUTERMOST radial knot of a knot-family potential shrinks the effective cutoff (RCUT2B/RCUTMB follow the largest remaining knot); the tail between the new and old cutoff cannot be compensated (warning).

Example 1:

2.75

rescale

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Enable rescaling of training weights.

Example 1:

true

Example 2:

false

retarget

[STRING] <file>

Max number of values: 1

Description:

Retarget mode for tadah refit: rewrite the OPTIM search bounds of an existing hpotarget file (use with --pct, --set, -o or --in-place). Switches tadah refit into hpotarget-retargeting mode: read an existing hpotarget file (as written by tadah refit or a previous --retarget run) and rewrite the bounds of every OPTIM line (active AND commented, preserving the comment state) around the recovered parameter values.

Bound width control:

- --pct <N>: global +-percent half-width (default: OPTIM_PCT, else 5).
- --pct KEY=N: per-key override, e.g. --pct FIXWEIGHT=2 (repeatable; mix with the global value: --pct 10 CGRID2B=2).
- --set ‘KEY(i)=lo,hi’: absolute bounds for one parameter (repeatable, applied last).
- --rcut2b <r> / --rcutmb <r>: declare a bump channel’s fixed cutoff in the same run: recentre the file’s OPTIM RCUT reference on r and truncate the rewritten CGRID/SGRID boxes to the bump-support rule c + w/eta <= r (active OPTIM RCUT lines are commented out with a warning; set the same RCUT in the training config).

Everything that is not an OPTIM line is preserved byte-for-byte. The output goes to -o (default: <input>_edited.hpotarget) or back into the input with --in-place. For knot-family potentials the commented RCUT2B/RCUTMB lines are recomputed from the largest knot position (the true effective cutoff).

Example 1:

refit.hpotarget

retarget_rcut2b

[DOUBLE] <double>

Max number of values: 1

Description:

tadah refit --retarget: declare the pair channel’s fixed cutoff in ONE step: recentre the file’s OPTIM RCUT2B reference line on this value and bind the bump-support cap against it, so the rewritten CGRID2B/SGRID2B boxes are truncated to c + w/eta <= this value. Any ACTIVE OPTIM RCUT2B line is commented out with a warning: on a bump channel the support gate keeps every bump inside the cutoff’s LOW bound, so searching the cutoff cannot affect the model. Remember to set the same RCUT2B in the training config. Ignored with a warning on knot channels, where RCUT2B is derived from the largest knot.

Example 1:

6.0

retarget_rcutmb

[DOUBLE] <double>

Max number of values: 1

Description:

tadah refit --retarget: the density channel’s twin of --rcut2b: recentre the OPTIM RCUTMB reference and truncate the CGRIDMB/SGRIDMB boxes to the declared cutoff.

Example 1:

6.0

set

[{STRING}] ‘<KEY>(<i>)=<lo>,<hi>’ […]

Max number of values: 2147483647

Description:

tadah refit --retarget: absolute bounds for one OPTIM parameter, e.g. --set ‘CGRID2B(3)=2.1,2.4’ (repeatable; applied after --pct; comment state preserved).

Example 1:

'CGRID2B(3)=2.1,2.4'

Example 2:

'FIXWEIGHT(7)=-50,-10'
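
The KEY(i)=lo,hi syntax can be picked apart with a short regular expression. The Python sketch below is illustrative rather than the tadah refit parser; the 1-based index check and the lo <= hi check are assumptions (the former based on the 1-based indexing convention stated at the top of this page).

```python
import re

# KEY, a positive integer index in parentheses, then two comma-separated numbers.
_SET_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\((\d+)\)=([^,]+),([^,]+)$")


def parse_set(spec):
    """Parse 'KEY(i)=lo,hi' into (key, index, lo, hi)."""
    match = _SET_RE.match(spec.strip().strip("'"))
    if match is None:
        raise ValueError(f"expected KEY(i)=lo,hi, got {spec!r}")
    key, index = match.group(1), int(match.group(2))
    lo, hi = float(match.group(3)), float(match.group(4))
    if index < 1:
        raise ValueError("index is assumed to be 1-based")
    if lo > hi:
        raise ValueError("lower bound exceeds upper bound")
    return key, index, lo, hi


print(parse_set("'CGRID2B(3)=2.1,2.4'"))
print(parse_set("'FIXWEIGHT(7)=-50,-10'"))
```

The single quotes in the examples protect the parentheses from the shell; the sketch strips them in case they survive into the value.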

setfl_out

[STRING] <file>

Max number of values: 1

Description:

Output setfl file written from the native fitted functions (tadah refit).

Example 1:

refit.eam.alloy

shuffle

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Randomize entries before splitting.

Example 1:

true

Example 2:

false

stress

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Include stresses.

Example 1:

true

Example 2:

false

structure

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

Unified structural input(s). Supported file formats: .cif (Crystallographic Information File), VASP (POSCAR/CONTCAR), and CASTEP (.cell). The online option fetches structures from databases (MP, COD, NOMAD). Multiple structures can be space-separated or repeated. A mix of files and online sources is allowed.

Example 1:

crystal.cif

Example 2:

crystal1.cif crystal2.cell

Example 3:

mp-42 crystal.cif

task

Description:

A file containing the task(s) to be executed. A task file is a convenient way to specify multiple tasks without providing all the command-line arguments for each one. It uses the same format as the configuration file, with the addition of task names and task-specific parameters. Each task begins with the keyword TASK followed by the task name, which is the command to execute, optionally followed by a subcommand. The lines after the TASK keyword contain the parameters for that task. For example, the CLI option --verbose 2 becomes VERBOSE 2 in the task file.

# Example TASK file containing two tasks:
# Global options
NUMERIC 14    # output precision
VERBOSE 2     # verbosity level

TASK predict
DBFILE db1.tadah db2.tadah db3.tadah
DBFILE db4.tadah db5.tadah db6.tadah
FORCE true
ANALYTICS true

TASK data print
STRUCTURE crystal1.cif crystal2.cif

Example 1:

path/to/tasks.tadah
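
To make the layout of a task file concrete, the following sketch splits one into global options and per-task parameters. It is not the parser used by Tadah!MLIP: parse_task_file is a hypothetical name, and the handling of # comments, the upper-casing of keys, and the merging of repeated keys such as DBFILE are assumptions made for illustration.

```python
def parse_task_file(text):
    """Split task-file text into (global_options, tasks).

    Assumptions (not confirmed by the reference above):
    - '#' starts a comment that runs to the end of the line;
    - keys are case-insensitive and stored upper-case;
    - a repeated key appends to the values already collected.
    """
    global_options = {}
    tasks = []                      # list of (task name, parameters)
    current = global_options        # lines before the first TASK are global
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, *values = line.split()
        if key.upper() == "TASK":
            current = {}
            tasks.append((" ".join(values), current))
        else:
            current.setdefault(key.upper(), []).extend(values)
    return global_options, tasks


example = """\
# Global options
NUMERIC 14    # output precision
VERBOSE 2     # verbosity level

TASK predict
DBFILE db1.tadah db2.tadah db3.tadah
DBFILE db4.tadah db5.tadah db6.tadah
FORCE true

TASK data print
STRUCTURE crystal1.cif crystal2.cif
"""

global_options, tasks = parse_task_file(example)
for name, params in tasks:
    print(name, params)
```

Here the second task name, data print, is a command followed by a subcommand, which is why the whole remainder of the TASK line is kept as the name.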

threshold

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Floating point comparison threshold. Example 1:

1e-4

type

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

Generic types. Example 1:

string1 string2

Example 2:

string1 /path/to/file

uncertainty

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Output uncertainty estimates. Example 1:

true

Example 2:

false

uniform

[UINT] <unsigned integer>

Max number of values: 1

Description:

Sample uniformly every N-th entry. Example 1:

validation

[{STRING}] <string> [<string> …]

Max number of values: 2147483647

Description:

Validation dataset file(s). Example 1:

valid.tadah

verbose

[UINT] <unsigned integer>

Max number of values: 1 Default: 1

Description:

Verbosity level: 0 = ERROR, 1 = WARNING, 2 = INFO. Higher levels print more detailed output during execution.

Example 1:

wdbfile

[{DOUBLE}] <double> [<double> …]

Max number of values: 2147483647

Description:

Per-dataset weight multipliers, one per DBFILE entry. Multiplies eweight, fweight, and sweight of every configuration in the corresponding DBFILE by the given factor. Use to bias training toward or away from particular datasets. Composes multiplicatively with WDBFILE_AUTO.

Example 1:

1.0 0.5 0.1

wdbfile_auto

[DOUBLE] <double>

Max number of values: 1 Default: 0.0

Description:

Automatically size-balance datasets by multiplying each configuration's weight by N_i^(-alpha), where N_i is the number of (post-filter) configurations in dataset i. This rebalances per-dataset contributions to the training loss. alpha=0 disables rebalancing (default). alpha=0.5 is the recommended starting point (inverse square root, soft balance). alpha=1 fully equalises the aggregate contribution of each dataset. Composes multiplicatively with the user-given WDBFILE.

Example 1:

0.5

Example 2:

1.0
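
The way WDBFILE and WDBFILE_AUTO combine can be checked with a few lines of arithmetic. The sketch below only reproduces the formula stated in the descriptions (user multiplier times N_i^(-alpha)); dataset_weight_multipliers is a hypothetical helper and the dataset sizes are made-up illustration values, not Tadah!MLIP internals.

```python
def dataset_weight_multipliers(config_counts, alpha=0.0, wdbfile=None):
    """Per-dataset multiplier applied to every configuration's weights.

    config_counts: number of (post-filter) configurations N_i per dataset,
                   each of which must be positive.
    alpha:         WDBFILE_AUTO exponent; 0 leaves the weights unchanged.
    wdbfile:       optional user multipliers, one per dataset (WDBFILE).
    """
    if wdbfile is None:
        wdbfile = [1.0] * len(config_counts)
    if len(wdbfile) != len(config_counts):
        raise ValueError("need exactly one WDBFILE value per dataset")
    # User multiplier composed multiplicatively with N_i^(-alpha).
    return [w * n ** (-alpha) for n, w in zip(config_counts, wdbfile)]


counts = [10000, 100, 4]            # illustrative dataset sizes
soft = dataset_weight_multipliers(counts, alpha=0.5, wdbfile=[1.0, 0.5, 0.1])
full = dataset_weight_multipliers(counts, alpha=1.0)

# Aggregate contribution of each dataset: N_i times its multiplier.
aggregate_full = [n * m for n, m in zip(counts, full)]
```

With alpha=1 every entry of aggregate_full is approximately 1, which is what "fully equalises aggregate dataset contribution" means: a dataset of 10000 configurations and one of 4 carry the same total weight in the loss.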

zero_com_force

[BOOL] <boolean>

Max number of values: 1 Default: false

Description:

Per configuration, subtracts the mean force from each atom so that the forces over the configuration sum to exactly zero. This is a standard DFT post-processing step that removes residual translational forces left by incomplete relaxation/SCF.

Example 1:

true
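
The correction described above amounts to subtracting the per-configuration mean force vector from every atom. A minimal sketch, assuming forces are given as (fx, fy, fz) tuples; zero_net_force is a hypothetical name and this is not the Tadah!MLIP implementation.

```python
def zero_net_force(forces):
    """Return forces shifted so that their sum over all atoms vanishes.

    forces: list of (fx, fy, fz) tuples for one configuration.
    """
    n_atoms = len(forces)
    if n_atoms == 0:
        return []
    # Mean force per Cartesian component.
    mean = [sum(f[k] for f in forces) / n_atoms for k in range(3)]
    # Subtract the same mean vector from every atom.
    return [tuple(f[k] - mean[k] for k in range(3)) for f in forces]


# Illustrative values: the net force here is (0.3, 0.0, 0.3).
raw = [(0.1, 0.0, 0.0), (0.3, -0.2, 0.0), (-0.1, 0.2, 0.3)]
corrected = zero_net_force(raw)
net = [sum(f[k] for f in corrected) for k in range(3)]
```

In floating-point arithmetic the components of net come out at zero up to rounding error. Differences between forces on any two atoms are unchanged, since every atom is shifted by the same vector.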