Configuration File
Note
This page is auto-generated from the blueprint TOML on every docs build. If you spot a mismatch between this reference and runtime behaviour, please report it on the Tadah!MLIP GitLab.
Note
This configuration file is NOT used by the HPO (Hyperparameter Optimization) module of Tadah!MLIP. The HPO module uses a different configuration file format, which is documented in the Nested Fitting section.
This section describes the format of the configuration file used by Tadah!MLIP.
For example, the configuration file can control the training process, specifying one or more datasets for use during the training stage. It defines cutoff functions and corresponding radii along with the regression model and descriptor choices.
Important
Indexing of items that take positional indices (e.g., INDEX 1, CLI flag --index 1) starts from 1, not 0.
Key/Value Pairs
The primary structure in a configuration file is the KEY/VALUE pair. Each KEY/VALUE pair must be on a separate line, with the KEY appearing first. The KEY is always a string, followed by its VALUE. The format and type of a VALUE depend on the specific KEY.
Common Usage
Typically, only a subset of KEYS is needed to train a model. Tadah!MLIP will use default values for some keys. An error will occur if a required KEY with no default value is missing:
[user@host:~] $ tadah train -c config.train
terminate called after throwing an instance of 'std::runtime_error'
what(): Key not found: DBFILE
Aborted (core dumped)
This message indicates that the dbfile KEY was not specified in the config.train file. To resolve this, add the dbfile key and its corresponding value to config.train.
Key Specifics
The meaning of some keys can vary with the chosen command. Check the documentation for that specific command, model or descriptor to see which keys are required and how they are interpreted. While we strive to keep key meanings consistent across Tadah!MLIP, occasional differences may still occur.
Key Values and Formats
Some KEYS can have multiple values, specified in one of two ways:
Single line:
KEY VALUE1 VALUE2 VALUE3
Multiple lines:
KEY VALUE1
KEY VALUE2
KEY VALUE3
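The equivalence of the two forms can be illustrated with a small reader. This is a minimal sketch, not the Tadah!MLIP parser: the function name parse_config is hypothetical, and it assumes that '#' starts a comment and that keys are case-insensitive (both inferred from the task-file example and the DBFILE error message, not confirmed).

```python
from collections import defaultdict


def parse_config(text):
    """Collect KEY/VALUE pairs; a repeated KEY accumulates its values."""
    values = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        key, *vals = line.split()
        values[key.upper()].extend(vals)  # assumption: keys are case-insensitive
    return dict(values)


single_line = parse_config("dbfile db1.tadah db2.tadah")
multi_line = parse_config("dbfile db1.tadah\ndbfile db2.tadah")
print(single_line == multi_line)
```

Under these assumptions both spellings produce the same list of values for the key.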
Value Limits
Each keyword takes a fixed number of values. Passing the wrong count can raise an error, but enforcement is not yet fully consistent. While this is being improved, keep these points in mind:
Too many values – Tadah!MLIP may reject the input with a clear message or quietly discard the extras; the latter can later surface as obscure run-time failures (even an occasional segmentation fault).
Too few values – usually triggers an error, although a crash remains possible in rare corner cases.
In short, give every keyword exactly the number of values it expects—no more, no less—to avoid unpleasant surprises.
Supported KEYS
This section contains all KEYS currently used by Tadah!MLIP.
ALPHA
- [DOUBLE] <double>
Max number of values: 1 Default: 1.0
Description:
Weight precision hyper-parameter. This is the starting guess for the evidence approximation algorithm.
Example 1:
ALPHA 0.23
ATOM
- [{STRING}] <element> [<element> …]
Max number of values: 118
Description:
Chemical elements.
Example 1:
"Kr"
AUDIT
- [STRING] <string>
Max number of values: 1 Default: off
Description:
Pre-flight audit mode. ‘off’ (default) skips the audit. ‘warn’ emits diagnostics but proceeds. ‘error’ makes any FAIL-level finding fatal. The audit’s data scan is sampled by default (see AUDIT_SAMPLE).
Example 1:
off
Example 2:
warn
Example 3:
error
AUDIT_SAMPLE
- [INT] <integer>
Max number of values: 1 Default: 256
Description:
Number of training structures sampled (deterministic random) for the pre-flight audit’s dataset stats. 0 means use the entire StructureDB. Has no effect when AUDIT is ‘off’.
Example 1:
256
Example 2:
1024
Example 3:
0
BASIS
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Basis vectors for non-linear Kernel Ridge Regression. They represent the features or functions used to map input data into a higher-dimensional feature space.
Example 1:
2.0 -4.65 0.4
Example 2:
-1.0
BETA
- [DOUBLE] <double>
Max number of values: 1 Default: 1.0
Description:
Noise precision hyper-parameter. This is the starting guess for the evidence approximation algorithm.
Example 1:
BETA 0.0001
CEMBFUNC
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Position parameters for an embedding function. Used by certain many-body descriptors (e.g., F_RLR). When using DM_mJoin, supply one or more lists of parameters matching those in SEMBFUNC.
Example 1:
CEMBFUNC 0.14 0.45 1.00 1.1
CGRID2B
- [{DOUBLE}] <double> [<double> …], [{STRING INT DOUBLE DOUBLE}] (<algorithm> <n> <start> <stop>) […]
Max number of values: 2147483647
Description:
Controls the center positions for radial basis functions (two-body). The parameter list may be provided manually or generated automatically. When using the meta descriptor D2_mJoin, specify one or more lists of centers corresponding to each descriptor. The number of centers should typically match the number of width parameters (SGRID2B) and remain below the cutoff distance. Alternatively, use the algorithm keyword followed by parameters to generate centers automatically (e.g., LOG or LIN).
Example 1:
CGRID2B LIN 10 0 6
Example 2:
CGRID2B 1.0 2.0
Example 3:
CGRID2B 1.0 2.0
CGRID2B 1.5 2.5
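To make the automatic form concrete, the sketch below expands a LIN specification into explicit centers. It is an illustration only: the helper name linear_grid is hypothetical, and it assumes that LIN means n equally spaced values with both start and stop included, which this reference does not state.

```python
def linear_grid(n, start, stop):
    """Return n equally spaced values from start to stop (both included)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [float(start)]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


# Under the stated assumption, 'LIN 3 0 1.0' would expand to three values.
print(linear_grid(3, 0, 1.0))
```

Other algorithm keywords (such as LOG) are left out because their exact spacing rule is not described here.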
CGRIDMB
- [{DOUBLE}] <center> …, [{STRING INT DOUBLE DOUBLE}] <algorithm> <N> <START> <STOP>
Max number of values: 2147483647
Description:
Specifies the center positions for many-body radial basis functions. Centers may be provided manually or generated automatically. When using the DM_mJoin meta descriptor, supply one or more lists of centers for each concatenated descriptor. Alternatively, use an algorithm keyword (e.g., LOG or LIN) followed by its parameters to generate centers automatically.
Example 1:
CGRIDMB LIN 4 0 6.2
Example 2:
CGRIDMB 0.5 0.7
Example 3:
CGRIDMB 0.5 0.7
CGRIDMB 0.6 0.8
DIMER
- [BOOL DOUBLE BOOL] <boolean> <double> <boolean>
Max number of values: 3 Default: false / 0 / false
Description:
Control for DIMER models. Users should not modify this key.
Example 1:
DIMER true 1.104 true
EWEIGHT
- [DOUBLE] <double>
Max number of values: 1 Default: 1.0
Description:
Global energy scaling factor. Energies are always scaled by 1/(number of atoms). Additional configuration-level scaling factors can apply. Combined factor = EWEIGHT*(config eweight)/(#atoms).
Example 1:
EWEIGHT 0.96
FIXINDEX
- [{INDEX_PATTERN}] <index>[,<index>…], [**] <start>-<stop>, [**] <start>-<stop>:<step>
Max number of values: 2147483647
Description:
Indices of weights to be fixed in optimization. Must be used with FIXWEIGHT. dim(FIXINDEX) = dim(FIXWEIGHT). Allows flexible selection of indices. Supports single indices, ranges (e.g., start-stop), lists, or intervals (start-stop:step). Indices are 1-based. Repeated indices are removed automatically.
Example 1:
1,3,5
Example 2:
1-4,7,9
Example 3:
1-10:2
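The index pattern syntax can be expanded as in the sketch below. This is not the Tadah!MLIP implementation: the function name expand_index_pattern is hypothetical, and it assumes that the stop value of a range is included and that the result is returned in ascending order.

```python
def expand_index_pattern(pattern):
    """Expand e.g. '1-4,7,9' or '1-10:2' into a sorted list of unique indices."""
    indices = set()  # a set drops repeated indices
    for token in pattern.split(","):
        token = token.strip()
        if "-" in token:
            span, _, step_text = token.partition(":")
            start_text, stop_text = span.split("-")
            step = int(step_text) if step_text else 1
            # assumption: the stop index is part of the range
            indices.update(range(int(start_text), int(stop_text) + 1, step))
        else:
            indices.add(int(token))
    if any(i < 1 for i in indices):
        raise ValueError("indices are 1-based")
    return sorted(indices)


print(expand_index_pattern("1-10:2"))
```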
FIXWEIGHT
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Values for weights to be fixed in optimization. Must be used with FIXINDEX. dim(FIXINDEX) = dim(FIXWEIGHT). The i-th value in FIXWEIGHT corresponds to the i-th index in FIXINDEX.
Example 1:
0.12 1.0 2.00
FWEIGHT
- [DOUBLE] <double>
Max number of values: 1 Default: 1.0
Description:
Global force scaling factor. Each force component is scaled by 1/(#atoms)/3. Additional config-level scaling factors can apply. Combined factor = FWEIGHT*(config fweight)/(#atoms)/3.
Example 1:
FWEIGHT 1e-2
HEALTH_LOG
- [STRING] <string>
Max number of values: 1 Default: summary
Description:
HPO per-iter health monitor verbosity. ‘summary’ (default) rate-limits warnings to ~1 per 100 evals per (block, kind). ‘full’ emits every offending iter. ‘off’ disables the monitor.
Example 1:
summary
Example 2:
full
Example 3:
off
INIT2B
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
If set to true, the two-body descriptor will be calculated.
Example 1:
INIT2B true
INITMB
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
If set to true, the many-body descriptor will be calculated.
Example 1:
INITMB true
LAMBDA
- [INT] <int>, [DOUBLE] <double>, [INT DOUBLE] <int> <double>
Max number of values: 2 Default: 0
Description:
Controls the regularization parameter λ for BLR and KRR. Let N be the first value: if N=0, no regularization is applied; if N>0, λ is set to N; if N<0, an evidence approximation is used to estimate λ. With LAMBDA 0, an optional second value (double) sets the effective rank threshold (default 1e-8).
Example 1:
LAMBDA -1
Example 2:
LAMBDA 1e-4
Example 3:
LAMBDA 0
Example 4:
LAMBDA 0 1e-12
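The three regimes described above can be summarised as a small decision function. This is only a reading aid: the name interpret_lambda and the returned dictionary layout are hypothetical, and the sketch ignores any second value given together with a non-zero first value.

```python
def interpret_lambda(*values):
    """Map the LAMBDA values to the regularization mode described in the text."""
    n = float(values[0])
    if n < 0:
        return {"mode": "evidence"}  # lambda estimated by the evidence approximation
    if n == 0:
        # optional second value: effective rank threshold (documented default 1e-8)
        threshold = float(values[1]) if len(values) > 1 else 1e-8
        return {"mode": "none", "rank_threshold": threshold}
    return {"mode": "fixed", "lambda": n}


print(interpret_lambda(0, 1e-12))
```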
MBLOCK
- [UINT] <unsigned integer>
Max number of values: 1 Default: 64
Description:
ScaLAPACK row block size MB.
Example 1:
20
MODEL
- [STRING STRING] MODEL FUNCTION, [STRING STRING STRING] MODEL FUNCTION OPTION, [STRING STRING UINT] MODEL FUNCTION OPTION
Max number of values: 3
Description:
Defines the training model and function. MODEL can be any class inheriting from M_Base (e.g., M_KRR, M_BLR). FUNCTION must be a valid child class of Function_Base (e.g., Kern_Linear, BF_Linear, BF_Polynomial2). Various combinations (KRR with different kernels, BLR with various basis functions) are possible.
Example 1:
MODEL M_BLR BF_Linear
Example 2:
MODEL M_BLR BF_Polynomial2
Example 3:
MODEL M_KRR Kern_Linear
MPARAMS
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Specifies additional numeric parameters for certain models. Some models require extra parameters. Refer to the model-specific documentation for details. Many models do not need any extra parameters.
Example 1:
MPARAMS 0.1
Example 2:
MPARAMS 0.1 0.2 0.3
MPIWPCKG
- [UINT] <unsigned integer>
Max number of values: 1 Default: 50
Description:
The number of structures in a single MPI work package.
Example 1:
20
NBLOCK
- [UINT] <unsigned integer>
Max number of values: 1 Default: 64
Description:
ScaLAPACK column block size NB.
Example 1:
20
NMEAN
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Mean values from descriptor normalization. Obtained after standardizing the columns of the DesignMatrix (see NORM).
Example 1:
2.0 -4.65 0.4
Example 2:
-1.0
NORM
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Standardize descriptors. Set to true to standardize descriptors, typically relevant if energies are used for fitting.
Example 1:
true
Example 2:
false
NSTDEV
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Standard deviations from descriptor normalization. Obtained after standardizing the columns of the DesignMatrix (see NORM). The vector size equals the number of columns.
Example 1:
2.0 -4.65 0.4
Example 2:
-1.0
OALGO
- [INT] <option>, [INT DOUBLE] <option> <value>
Max number of values: 2 Default: 1
Description:
This key controls the optimization algorithm used to train the model. The default algorithm is Option 1 (GELSD) with a conditioning parameter (\(rcond\)) of \(1 \times 10^{-8}\). The regularization parameter is handled separately by the LAMBDA key.
Available options:
1 - GELSD: Utilizes the LAPACK routine DGELSD to solve linear least squares problems using the Singular Value Decomposition (SVD). This method is robust and can handle rank-deficient systems. It computes the minimum-norm solution and is suitable for ill-conditioned problems. The parameter M controls the reciprocal of the condition number (\(\text{rcond}\)). It defaults to \(1 \times 10^{-8}\) if not specified. Setting M = -1 uses machine precision, which may be unstable for ill-conditioned problems.
2 - GELS: Uses the LAPACK routine DGELS to solve linear least squares problems via QR or LQ factorization. This method assumes that the matrix has full rank and is the fastest among the options. It is suitable for well-conditioned problems but less robust for ill-conditioned or rank-deficient matrices.
3 - Custom Implementation Similar to DGELS (Uses SVD): Employs a custom algorithm similar to DGELS but utilizes SVD. This method allows the use of the evidence approximation algorithm and computes the covariance matrix \(\Sigma\), providing additional statistical information about the solution. Like option 1, the parameter M controls the reciprocal of the condition number (\(\text{rcond}\)), defaulting to \(1 \times 10^{-8}\). Setting M = -1 uses machine precision, which might be unstable for ill-conditioned problems.
4 - Cholesky Decomposition: Solves the normal equations \(A^\top A x = A^\top b\) using Cholesky decomposition. This method is efficient for well-conditioned, full-rank matrices but may be less stable for ill-conditioned or rank-deficient problems because it squares the condition number of the matrix. It does not require additional parameters.
Performance Comparison:
Option 2 is the fastest but assumes a full-rank matrix and is less robust for ill-conditioned problems.
Option 4 is also efficient but may suffer from numerical instability in ill-conditioned or rank-deficient problems due to squaring of the condition number.
Option 1 offers a balance between speed and robustness, handling rank-deficient and ill-conditioned problems better than options 2 and 4.
Option 3 is the slowest due to the additional computations but provides valuable extra information like the covariance matrix.
Usage:
To select an algorithm, set the OALGO key followed by the option number.
For options 1 and 3, you can specify the conditioning parameter M after the option number.
Option 4 does not require any additional parameters.
Regularization Parameter:
The regularization parameter is controlled by the LAMBDA key, which should be set separately to apply regularization to the model.
Examples:
OALGO 1 # Uses GELSD with default conditioning parameter (M = 1e-8)
OALGO 1 -1 # Uses GELSD with machine precision (M = -1)
OALGO 2 # Uses GELS
OALGO 3 1e-10 # Uses custom implementation with M = 1e-10
OALGO 4 # Uses Cholesky Decomposition
Notes:
Conditioning Parameter M: Controls the reciprocal of the condition number (\(\text{rcond}\)). A smaller M (closer to machine precision) includes more singular values in the solution, which may be necessary for certain problems but can introduce instability if the matrix is ill-conditioned.
Machine Precision (M = -1): Using machine precision can lead to numerical instability in ill-conditioned problems. It is recommended to use a positive M value to exclude negligible singular values that could adversely affect the solution.
Covariance Matrix \(\Sigma\): Option 3 provides the covariance matrix, which can be useful for statistical analyses and understanding the uncertainty in the estimated parameters.
Cholesky Decomposition: Option 4 solves the normal equations using Cholesky decomposition. It is efficient but may be numerically unstable for ill-conditioned or rank-deficient problems due to squaring the condition number. It is best used when the matrix is well-conditioned and of full rank.
Additional Information:
Regularization with LAMBDA:
The LAMBDA key controls the regularization parameter used in the training process. It should be specified separately to apply regularization to the model.
Regularization helps prevent overfitting by adding a penalty for larger parameter values.
LAPACK Routines:
DGELSD: Computes the minimum-norm solution to a real linear least squares problem using the SVD of the coefficient matrix.
DGELS: Solves overdetermined or underdetermined real linear systems involving an \(M \times N\) matrix, using a QR or LQ factorization of the matrix.
DPOTRF and DPOTRS: Used in the Cholesky decomposition to factorize a symmetric positive-definite matrix and solve the resulting linear system.
When to Use Each Option:
Use Option 1 (GELSD) when you need robustness against rank deficiency and moderate performance.
Use Option 2 (GELS) for the fastest performance on well-conditioned, full-rank matrices where robustness is less of a concern.
Use Option 3 (Custom SVD Implementation) when you require additional outputs like the covariance matrix and are willing to trade off performance for more comprehensive results.
Use Option 4 (Cholesky Decomposition) when you have a well-conditioned, full-rank matrix and need an efficient solution, but be cautious of potential numerical instability in ill-conditioned or rank-deficient problems.
Example 1:
OALGO 1
Example 2:
OALGO 1 -1
Example 3:
OALGO 2
Example 4:
OALGO 3 1e-10
Example 5:
OALGO 4
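The remark that option 4 squares the condition number, and the role of the conditioning parameter in options 1 and 3, can be made explicit with the singular value decomposition. The following is a standard linear-algebra sketch. The first identity assumes that \(A\) has full column rank, with singular values \(\sigma_1 \ge \dots \ge \sigma_n > 0\) and singular vectors \(u_i\), \(v_i\). The truncation rule in the second line is the one documented for LAPACK's DGELSD; that option 3 applies exactly the same rule is an assumption.

```latex
A = U \Sigma V^\top
\quad\Longrightarrow\quad
A^\top A = V \Sigma^\top \Sigma V^\top ,
\qquad
\kappa_2(A^\top A) = \frac{\sigma_1^2}{\sigma_n^2} = \kappa_2(A)^2 .

x_{\text{rcond}} \;=\; \sum_{i \,:\, \sigma_i > \text{rcond}\,\sigma_1}
\frac{u_i^\top b}{\sigma_i}\, v_i .
```

A smaller rcond therefore keeps more terms with small \(\sigma_i\) in the sum, and each such term amplifies noise in \(b\) by a factor \(1/\sigma_i\).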
RCTYPE2B
- [{STRING}] Cut_<name>
Max number of values: 2147483647
Description:
Specifies the cutoff function type(s) for two-body descriptor(s). Provide a single cutoff type (e.g., Cut_Cos) for one descriptor or multiple types corresponding to each descriptor when using the D2_mJoin meta descriptor.
Example 1:
RCTYPE2B Cut_Cos
Example 2:
RCTYPE2B Cut_Cos Cut_Tanh
RCTYPEMB
- [{STRING}] Cut_<name>
Max number of values: 2147483647
Description:
Specifies the cutoff function type(s) for many-body descriptor(s). Provide a single cutoff type (e.g., Cut_Cos) for one descriptor or a series of types—each for a corresponding descriptor when using the DM_mJoin meta descriptor.
Example 1:
RCTYPEMB Cut_Cos
Example 2:
RCTYPEMB Cut_Cos Cut_Tanh
RCUT2B
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Specifies the cutoff distance(s) for two-body descriptor(s). Provide a single value for one descriptor or multiple values—one for each descriptor when using the D2_mJoin meta descriptor.
Example 1:
RCUT2B 3.0
Example 2:
RCUT2B 3.0 7.5
RCUTENV
- [DOUBLE] <double>
Max number of values: 1
Description:
Envelope cutoff distance used by F_mEnv (DM_mBlipEnv). Width of the inverse-cutoff envelope (Cut_SinInv) wrapped around the F_Blip embedding by the DM_mBlipEnv descriptor: the envelope ramps from 0 at rho=0 to 1 at rho=RCUTENV, suppressing the embedding contribution where the underlying Gaussian basis is largest.
Example 1:
RCUTENV 1.5
RCUTMB
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Specifies the cutoff distance(s) for many-body descriptor(s). Provide a single value for a standalone descriptor or multiple values corresponding to each descriptor when using the DM_mJoin meta descriptor.
Example 1:
RCUTMB 4.9
Example 2:
RCUTMB 4.9 8.0
SBASIS
- [UINT] <unsigned integer>
Max number of values: 1
Description:
Number of basis functions for the DesignMatrix. Many models do not require this. If specified, it sets the number of basis functions used in the design matrix.
Example 1:
10
Example 2:
102
SEMBFUNC
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Shape parameters for an embedding function. Used by certain many-body descriptors (e.g., F_RLR). When using the DM_mJoin descriptor, provide lists of parameters corresponding to each descriptor, ensuring consistency with CEMBFUNC.
Example 1:
SEMBFUNC 0.14 0.45 1.00 1.1
SETFL
- [STRING] <string>
Max number of values: 1
Description:
Path to the setfl file with the EAM potential.
Example 1:
Ta1_Ravelo_2013.eam.alloy
SGRID2B
- [{DOUBLE}] <double> [<double> …], [{STRING INT DOUBLE DOUBLE}] (<algorithm> <n> <start> <stop>) […]
Max number of values: 2147483647
Description:
Specifies the width parameters for two-body radial basis functions. These widths can be supplied manually or auto-generated. When using the meta descriptor D2_mJoin, provide one or more lists of width values for each concatenated descriptor. The number of widths must match the number of centers (CGRID2B). Alternatively, specify the algorithm keyword with parameters to generate widths automatically (e.g., LOG or LIN).
Example 1:
SGRID2B LIN 3 0 1.0
Example 2:
SGRID2B GEOM 6 0.1 10
Example 3:
SGRID2B 0.01 0.02 0.03
SGRID2B GEOM 6 0.1 10
SGRIDMB
- [{DOUBLE}] <width> …, [{STRING INT DOUBLE DOUBLE}] <algorithm> <N> <START> <STOP>
Max number of values: 2147483647
Description:
Specifies the width parameters for many-body radial basis functions. Values may be provided manually or generated automatically. When using the DM_mJoin meta descriptor, provide one or more lists of widths for each descriptor. Ensure consistency with the centers defined in CGRIDMB. Alternatively, use an algorithm (e.g., LOG or LIN) and its parameters to generate widths automatically.
Example 1:
SGRIDMB LIN 3 0 1.0
Example 2:
SGRIDMB 0.01 0.02 0.03
Example 3:
SGRIDMB 0.01 0.02 0.03
SGRIDMB 0.02 0.03 0.04
SIGMA
- [INT {DOUBLE}] <integer> <double> …
Max number of values: 2147483647
Description:
The Σ matrix used in Bayesian Linear Regression. An N×N matrix in column-major order. Applies to M_BLR.
Example 1:
2 1.2 2.2 2.3 3.3
SWEIGHT
- [DOUBLE] <double>
Max number of values: 1 Default: 1.0
Description:
Global stress scaling factor. Each stress component is scaled by 1/6. Additional config-level scaling can apply. Combined factor = SWEIGHT*(config sweight)/6.
Example 1:
SWEIGHT 1e-1
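The combined factors quoted for EWEIGHT, FWEIGHT and SWEIGHT can be evaluated together as follows. The formulas are taken from the descriptions of those keys; the function name and argument names are hypothetical and only serve the illustration.

```python
def combined_weights(n_atoms, eweight=1.0, fweight=1.0, sweight=1.0,
                     config_eweight=1.0, config_fweight=1.0, config_sweight=1.0):
    """Combine global and configuration-level weights for one configuration."""
    return {
        # EWEIGHT * (config eweight) / (#atoms)
        "energy": eweight * config_eweight / n_atoms,
        # FWEIGHT * (config fweight) / (#atoms) / 3
        "force": fweight * config_fweight / n_atoms / 3,
        # SWEIGHT * (config sweight) / 6
        "stress": sweight * config_sweight / 6,
    }


# A 4-atom configuration with EWEIGHT 0.96 and all other weights left at 1.0.
print(combined_weights(4, eweight=0.96))
```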
TYPE2B
- [STRING {[INT | DOUBLE]} {STRING STRING}] D2_<name> {[param]} {ELEMENT ELEMENT}, [STRING {STRING STRING}], [STRING]
Max number of values: 2147483647
Description:
Specifies the two-body descriptor type(s) to be used. For a single descriptor, provide its type (e.g., D2_LJ). To concatenate multiple descriptors, use the meta descriptor D2_mJoin followed by the individual descriptor parameters. Elements should be provided in pairs.
Example 1:
TYPE2B D2_LJ Kr Kr
Example 2:
TYPE2B D2_mJoin
TYPE2B D2_MIE 11 6 Ti Ti
TYPE2B D2_Blip 6 6 Ti Nb Nb Nb
TYPEMB
- [STRING UINT UINT {STRING STRING}] DM_<name> {[param]} {ELEMENT ELEMENT}, [STRING UINT UINT UINT {STRING STRING}], [STRING UINT UINT UINT UINT UINT {STRING STRING}], [STRING]
Max number of values: 2147483647
Description:
Specifies the many-body descriptor type(s) to be used. For a single descriptor, provide its type (e.g., DM_EAD). To combine multiple descriptors, use the meta descriptor DM_mJoin followed by the individual descriptor parameters. Elements should be provided in pairs.
Example 1:
TYPEMB DM_Blip 0 6 6 Ti Ti
Example 2:
TYPEMB DM_mJoin
TYPEMB DM_Blip 1 6 6 Ti Ti
TYPEMB DM_Blip 0 6 6 Ti Nb Nb Nb
WATOM
- [{DOUBLE}] <double> [<double> …]
Max number of values: 118
Description:
Weights sorted by atomic number, from lowest Z to highest. The number of WATOM values must match the number of ATOM values.
Example 1:
2.0 -4.65 0.4
Example 2:
-1.0
WEIGHTS
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Machine-learned coefficients for the model. These are species-dependent weights, obtained during optimization. Defaults to atomic numbers if unspecified.
Example 1:
WEIGHTS 0.12 1.2 0.3
analytics
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Perform analytics.
Example 1:
true
Example 2:
false
append
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Append to the existing file.
Example 1:
true
Example 2:
false
atompair
- [{STRING}] <element1> <element2>
Max number of values: 2
Description:
Pair of chemical elements.
Example 1:
"Kr Kr"
bias
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Per-species intercept. When true, adds N=size(ATOM) one-hot intercept columns to the design matrix; the regression learns one constant energy per species and LAMMPS adds it back per atom. Required when NORM=true with MODEL=BF_Linear.
Example 1:
true
Example 2:
false
bondenergy
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Calculate bond energy instead of per atom value.
Example 1:
true
Example 2:
false
chunk
- [{UINT}] <unsigned integer> [<unsigned integer> …]
Max number of values: 2147483647
Description:
Specify chunk sizes.
Example 1:
20 5 3
Example 2:
10
config
- [STRING] <file>
Max number of values: 1
Description:
Path to a configuration file.
Example 1:
config.tadah
Example 2:
../config.tadah
Example 3:
/path/to/config.tadah
dbfile
- [{STRING}] <string> [<string> …]
Max number of values: 2147483647
Description:
Path(s) to Tadah! database file(s). Absolute or relative path to the Tadah! database file(s). The relative path is interpreted relative to the current working directory. Multiple dataset paths can be provided either as space-separated tokens or by repeating this key.
Example 1:
dbfile /path/to/dbfile
Example 2:
dbfile /path/to/dbfile1 /path/to/dbfile2
derivative
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Calculate derivative of the function.
Example 1:
true
Example 2:
false
dft-file
- [{STRING}] <string> [<string> …]
Max number of values: 2147483647
Description:
Input DFT file(s). A single file or multiple files (space-separated). Used to extract reference data for training. Supported formats: VASP (OUTCAR, vasprun.xml), CASTEP (.castep, .md, .geom).
Example 1:
run1.outcar
Example 2:
run1.outcar run2.outcar
efilter
- [DOUBLE DOUBLE] <E_min_per_atom> <E_max_per_atom>
Max number of values: 2 / 2
Description:
Drop configurations whose per-atom energy is outside [E_min, E_max] (eV). Outlier filter applied at load time before any energy-shift derivation or training-weight assignment, so outliers do not poison ESHIFT_ATOM / ESHIFT_DBATOM / EWEIGHT_TEMP. The threshold is compared against E/N_atoms (per-atom energy). Both bounds must be supplied. To disable, omit the key.
Example 1:
-12.0 -2.0
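The filtering rule can be written in a few lines. This is a sketch rather than the loader's code: the dictionary layout with "energy" and "natoms" fields is hypothetical, and the bounds are treated as inclusive, as the bracket notation [E_min, E_max] suggests.

```python
def apply_efilter(configs, e_min, e_max):
    """Keep configurations whose per-atom energy lies inside [e_min, e_max]."""
    return [c for c in configs if e_min <= c["energy"] / c["natoms"] <= e_max]


configs = [
    {"energy": -20.0, "natoms": 4},  # -5.0 eV/atom, kept
    {"energy": -1.0, "natoms": 1},   # -1.0 eV/atom, above E_max, dropped
    {"energy": -26.0, "natoms": 2},  # -13.0 eV/atom, below E_min, dropped
]
kept = apply_efilter(configs, -12.0, -2.0)
print(len(kept))
```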
error
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Generate error estimates.
Example 1:
true
Example 2:
false
eshift
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Per-atom reference energy to subtract from each configuration. Per-element reference energies. If there are multiple species, the number of values must match the number of species (sorted by Z). At load time the total energy of each configuration is reduced by sum_Z N_Z * ESHIFT[Z], so an isolated-atom config with energy E_atom and ESHIFT[Z]=E_atom yields a post-shift energy of zero. Used by tadah train, tadah predict, tadah hpo, and tadah data balance. Persisted into pot.tadah for prediction round-trip.
Example 1:
0.5
Example 2:
0.5 -0.1
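The shift applied at load time, E minus the sum over Z of N_Z * ESHIFT[Z], can be sketched as follows. The function name and the use of a dictionary keyed by atomic number are hypothetical choices for the illustration; the numbers are made up.

```python
from collections import Counter


def shifted_energy(total_energy, atomic_numbers, eshift):
    """Subtract sum_Z N_Z * ESHIFT[Z] from the total energy of a configuration."""
    counts = Counter(atomic_numbers)  # N_Z for every species present
    return total_energy - sum(n * eshift[z] for z, n in counts.items())


eshift = {22: -3.0, 41: -2.0}  # illustrative reference energies for Ti and Nb
print(shifted_energy(-10.0, [22, 22, 41], eshift))
```

An isolated atom whose energy equals its reference energy ends up at zero after the shift, as the description states.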
eshift_atom
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Derive ESHIFT from isolated-atom configurations in the dataset (mean per Z). Scans the loaded dataset for single-atom configurations (natoms == 1), groups them by atomic number, and sets ESHIFT[Z] to the mean per-Z energy. If a species has no isolated-atom config in the dataset, ESHIFT[Z] = 0 for that species and a WARNING is logged. If multiple isolated-atom configs of the same Z disagree by more than 1e-3 eV, an INFO line records the spread. Mutually exclusive with explicit ESHIFT and ESHIFT_DBATOM.
Example 1:
true
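The derivation described above (single-atom configurations grouped by atomic number, mean energy per group, zero for species without such a configuration) can be sketched like this. The data layout and function name are hypothetical, and the sketch omits the WARNING and INFO log lines mentioned in the description.

```python
from collections import defaultdict
from statistics import mean


def derive_eshift_atom(configs, species):
    """Return {Z: mean isolated-atom energy}, or 0.0 when none is available."""
    energies_by_z = defaultdict(list)
    for config in configs:
        if len(config["atomic_numbers"]) == 1:  # natoms == 1
            energies_by_z[config["atomic_numbers"][0]].append(config["energy"])
    return {z: mean(energies_by_z[z]) if z in energies_by_z else 0.0
            for z in species}


dataset = [
    {"atomic_numbers": [22], "energy": -1.5},
    {"atomic_numbers": [22], "energy": -1.7},
    {"atomic_numbers": [22, 41], "energy": -9.0},  # not an isolated atom
]
print(derive_eshift_atom(dataset, species=[22, 41]))
```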
eshift_dbatom
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Derive ESHIFT by least-squares atomic-energy fit over the database. Fits per-element reference energies by least squares: minimise ||y - M beta||^2 where y[i] is the total energy of configuration i and M[i, k] is the count of species k in configuration i. The fitted beta_k becomes ESHIFT[Z(k)]. More robust than ESHIFT_ATOM when the dataset has no isolated-atom configs but does have compositional diversity. Mutually exclusive with explicit ESHIFT and ESHIFT_ATOM.
Example 1:
true
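The least-squares fit of per-element reference energies can be illustrated with a dependency-free sketch. It solves the normal equations by Gauss-Jordan elimination, which is chosen here only for brevity and is not claimed to be the method Tadah!MLIP uses; the function name and the toy data are hypothetical.

```python
def fit_reference_energies(counts, energies):
    """Minimise ||y - M beta||^2 via the normal equations (M^T M) beta = M^T y."""
    k = len(counts[0])
    # Augmented matrix [M^T M | M^T y].
    aug = [[sum(row[i] * row[j] for row in counts) for j in range(k)]
           + [sum(row[i] * y for row, y in zip(counts, energies))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("dataset lacks compositional diversity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# M[i][k]: count of species k in configuration i; y[i]: total energy.
M = [[2, 0], [1, 1], [0, 2], [3, 1]]
y = [-6.0, -5.0, -4.0, -11.0]
beta = fit_reference_energies(M, y)
print(beta)
```

The toy energies were generated from reference energies of -3 and -2, so the fit should recover those values up to rounding.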
even
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Equal-size partition.
Example 1:
true
Example 2:
false
eweight_temp
- [DOUBLE] <double>
Max number of values: 1
Description:
Boltzmann reweighting temperature in Kelvin (multiplies eweight). After ESHIFT is applied, multiplies each configuration’s eweight by exp(-(E/N - E_min)/(kB * T)) where E_min is the minimum per-atom energy in the dataset and kB = 8.617333262e-5 eV/K. Emphasises low-energy configurations. Composes multiplicatively with the per-structure eweight already in the dataset file. Omit the key to disable.
Example 1:
300
Example 2:
1000
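The Boltzmann factor exp(-(E/N - E_min)/(kB * T)) can be evaluated as in the sketch below. The constant is the value quoted in the description; the function name is hypothetical and the input is assumed to be the per-atom energies after ESHIFT has been applied.

```python
import math

KB_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K, as quoted above


def boltzmann_factors(per_atom_energies, temperature):
    """Return the multiplicative eweight factor for every configuration."""
    e_min = min(per_atom_energies)
    return [math.exp(-(e - e_min) / (KB_EV_PER_K * temperature))
            for e in per_atom_energies]


# The second configuration lies exactly kB*T above the minimum at 300 K.
factors = boltzmann_factors([-5.0, -5.0 + KB_EV_PER_K * 300], 300)
print(factors)
```

The lowest-energy configuration keeps its weight, and every other configuration is down-weighted.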
ffilter
- [DOUBLE] <double>
Max number of values: 1
Description:
Drop configurations where any atomic force magnitude exceeds this value (eV/Å). Outlier filter applied at load time. A configuration is dropped if any single atom has ‖F‖ > FFILTER. Useful for catching unconverged SCF or otherwise broken DFT runs.
Example 1:
20.0
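The per-atom check behind this filter amounts to comparing the Euclidean norm of each force vector with the threshold. A minimal sketch, with a hypothetical function name and forces given as (fx, fy, fz) tuples in eV/Å:

```python
import math


def passes_ffilter(forces, ffilter):
    """True if no atom has a force magnitude above the threshold."""
    return all(math.sqrt(fx * fx + fy * fy + fz * fz) <= ffilter
               for fx, fy, fz in forces)


forces = [(3.0, 4.0, 0.0), (0.1, 0.0, 0.0)]  # largest magnitude is 5.0
print(passes_ffilter(forces, 20.0), passes_ffilter(forces, 4.9))
```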
force
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Include forces.
Example 1:
true
Example 2:
false
format
- [STRING] <fmt>
Max number of values: 1
Description:
Output format (e.g., vasp, castep, lammps).
Example 1:
castep
Example 2:
lammps
Example 3:
vasp
hpotarget
- [STRING] <file>
Max number of values: 1
Description:
HPO target file.
Example 1:
hpotargets.txt
index
- [{INDEX_PATTERN}] <index>[,<index>…], [**] <start>-<stop>, [**] <start>-<stop>:<step>
Max number of values: 2147483647
Description:
Index pattern. Allows flexible selection of dataset indices. Supports single indices, ranges (e.g., start-stop), lists, or intervals (start-stop:step). Indices are 1-based. Repeated indices are removed automatically.
Example 1:
1,3,5
Example 2:
1-4,7,9
Example 3:
1-10:2
lscale
- [DOUBLE] <double>
Max number of values: 1 Default: 1.0
Description:
Uniform length rescale factor applied to atomic positions, cell, and reference forces at load time. Multiplies atomic positions and cell vectors by this factor at the moment a dataset is loaded for training, prediction, or HPO. Reference forces are divided by the factor (chain rule on E(r)); stresses (stored as virial in energy units) are invariant under uniform length rescaling. The chosen factor is persisted into pot.tadah so future tadah predict and tadah hpo runs apply the same transformation. Use --no-lscale at predict time to override.
LSCALE is a training-side concept: the LAMMPS pair_style does NOT re-apply LSCALE. The user is expected to provide LAMMPS positions at the scale that matches the trained model (e.g. experimental lattice).
Example 1:
1.0030
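The transformation described above (lengths multiplied by the factor, reference forces divided by it) can be sketched as follows. The function name and the nested-list layout are hypothetical; stresses are left out because the description states they are invariant.

```python
def apply_lscale(positions, cell, forces, lscale):
    """Return rescaled copies: lengths are multiplied, forces divided by lscale."""
    scaled_positions = [[x * lscale for x in atom] for atom in positions]
    scaled_cell = [[x * lscale for x in vector] for vector in cell]
    scaled_forces = [[f / lscale for f in atom] for atom in forces]
    return scaled_positions, scaled_cell, scaled_forces


pos, cell, frc = apply_lscale(
    positions=[[1.0, 2.0, 3.0]],
    cell=[[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
    forces=[[4.0, 0.0, -2.0]],
    lscale=2.0,
)
print(pos, frc)
```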
merge
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Merge deduplication results into one file.
Example 1:
true
Example 2:
false
no_eshift
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
(predict) Ignore any ESHIFT recorded in the loaded potential file. At predict time, override the ESHIFT values stored in pot.tadah. Use when the dataset you are predicting on is already at the shifted baseline (or you just want raw model output without any reference energy subtraction).
Example 1:
true
no_lscale
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
(predict) Ignore any LSCALE recorded in the loaded potential file. At predict time, override the LSCALE value stored in pot.tadah. Use when the dataset you are predicting on is already at the trained-model scale.
Example 1:
true
numeric
- [UINT] <unsigned integer>
Max number of values: 1 Default: 12
Description:
Numeric output precision. Sets the number of decimal places for output.
Example 1:
12
option
- [STRING] <arg>
Max number of values: 1
Description:
Positional argument.
Example 1:
arg
outfile
- [{STRING}] <string> [<string> …]
Max number of values: 2147483647
Description:
Output file. The output file to be written. Multiple files can be specified if the command produces more than one output.
Example 1:
output.tadah
percent
- [{UINT}] <unsigned integer> [<unsigned integer> …]
Max number of values: 2147483647
Description:
Specify percentage partition.
Example 1:
20 5 3
Example 2:
10
potential
- [STRING] <file>
Max number of values: 1
Description:
Trained model file.
Example 1:
pot.tadah
quantity
- [STRING] <string>
Max number of values: 1
Description:
Generic quantity.
Example 1:
validString
Example 2:
/path/to/file
random
- [UINT] <unsigned integer>
Max number of values: 1
Description:
Randomly sample N entries.
Example 1:
5
range
- [DOUBLE DOUBLE INT] <START> <STOP> <NPOINTS>
Max number of values: 3 / 3
Description:
Plotting range [start stop npoints].
Example 1:
0.1 9.5 100
rescale
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Enable rescaling of training weights.
Example 1:
true
Example 2:
false
shuffle
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Randomize entries before splitting.
Example 1:
true
Example 2:
false
stress
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Include stresses.
Example 1:
true
Example 2:
false
structure
- [{STRING}] <string> [<string> …]
Max number of values: 2147483647
Description:
Unified structural input(s). Supported file formats: .cif (Crystallographic Information File), VASP (POSCAR/CONTCAR), and CASTEP (.cell). The online option fetches structures from databases (MP, COD, NOMAD). Multiple structures can be space-separated or repeated. A mix of files and online sources is allowed.
Example 1:
crystal.cif
Example 2:
crystal1.cif crystal2.cell
Example 3:
mp-42 crystal.cif
task
Description:
A file containing task(s) to be executed. The task file is a convenient way to specify multiple tasks without having to provide all the command-line arguments for each task. It uses the same format as the configuration file, with additional information such as the task name and any parameters specific to that task. A task begins with the keyword TASK followed by the task name. The task name is the command to execute, optionally followed by its subcommand. The lines following the TASK keyword contain the parameters required for that task. For example, the CLI flag --verbose 2 is written as VERBOSE 2 in the task file.
# Example TASK file containing two tasks:
# Global options
NUMERIC 14 # output precision
VERBOSE 2 # verbosity level

TASK predict
DBFILE db1.tadah db2.tadah db3.tadah
DBFILE db4.tadah db5.tadah db6.tadah
FORCE true
ANALYTICS true

TASK data print
STRUCTURE crystal1.cif crystal2.cif
Example 1:
path/to/tasks.tadah
threshold
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Floating point comparison threshold.
Example 1:
1e-4
type
- [{STRING}] <string> [<string> …]
Max number of values: 2147483647
Description:
Generic types.
Example 1:
string1 string2
Example 2:
string1 /path/to/file
uncertainty
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Output uncertainty estimates.
Example 1:
true
Example 2:
false
uniform
- [UINT] <unsigned integer>
Max number of values: 1
Description:
Sample uniformly every N-th entry.
Example 1:
10
validation
- [{STRING}] <string> [<string> …]
Max number of values: 2147483647
Description:
Validation dataset file(s).
Example 1:
valid.tadah
verbose
- [UINT] <unsigned integer>
Max number of values: 1 Default: 1
Description:
Verbosity level. 0: ERROR, 1: WARNING, 2: INFO. The verbosity level controls the amount of information printed during execution. Higher levels provide more detailed output.
Example 1:
2
wdbfile
- [{DOUBLE}] <double> [<double> …]
Max number of values: 2147483647
Description:
Per-dataset weight multipliers, one per DBFILE entry. Multiplies eweight, fweight, and sweight of every configuration in the corresponding DBFILE by the given factor. Use to bias training toward or away from particular datasets. Composes multiplicatively with WDBFILE_AUTO.
Example 1:
1.0 0.5 0.1
wdbfile_auto
- [DOUBLE] <double>
Max number of values: 1 Default: 0.0
Description:
Auto size-balance datasets: per-config weight multiplied by 1/N_i^alpha. Rebalances per-dataset contributions to the training loss by multiplying each configuration’s weight by N_i^(-alpha), where N_i is the number of (post-filter) configurations in dataset i. alpha=0 disables (default). alpha=0.5 is the recommended starting point (sqrt-inverse, soft balance). alpha=1 fully equalises aggregate dataset contribution. Composes multiplicatively with user-given WDBFILE.
Example 1:
0.5
Example 2:
1.0
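The arithmetic behind WDBFILE and WDBFILE_AUTO can be illustrated with a short Python sketch. This is not Tadah!MLIP code: the function name `dataset_weight_factors` and the dataset sizes are hypothetical, and it assumes that a dataset's aggregate contribution is proportional to its configuration count times the per-configuration factor. It only reproduces the N_i^(-alpha) rule and the multiplicative composition described above.

```python
def dataset_weight_factors(sizes, alpha=0.0, wdbfile=None):
    """Return one multiplier per dataset: WDBFILE_i * N_i**(-alpha).

    sizes   -- number of (post-filter) configurations in each dataset
    alpha   -- the WDBFILE_AUTO value (0.0 leaves weights unchanged)
    wdbfile -- optional user-given WDBFILE factors, one per dataset
    """
    if wdbfile is None:
        wdbfile = [1.0] * len(sizes)
    if len(wdbfile) != len(sizes):
        raise ValueError("need exactly one WDBFILE factor per dataset")
    return [w * n ** (-alpha) for w, n in zip(wdbfile, sizes)]


# Hypothetical example: one large and one small dataset.
sizes = [10000, 100]

for alpha in (0.0, 0.5, 1.0):
    factors = dataset_weight_factors(sizes, alpha)
    # Assumed aggregate contribution: N_i * factor_i = N_i**(1 - alpha).
    aggregate = [n * f for n, f in zip(sizes, factors)]
    print(f"alpha={alpha}: factors={factors}, aggregate={aggregate}")

# User-given WDBFILE factors compose multiplicatively with the auto factor.
combined = dataset_weight_factors(sizes, alpha=0.5, wdbfile=[1.0, 0.5])
```

Under these assumptions, alpha=0 leaves the 100:1 size imbalance untouched, alpha=0.5 reduces it to roughly 10:1, and alpha=1 gives both datasets approximately equal aggregate weight.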
zero_com_force
- [BOOL] <boolean>
Max number of values: 1 Default: false
Description:
Subtract per-config mean force so each configuration has zero net force. Per configuration, subtracts the mean force from each atom so that the sum of forces over the configuration is exactly zero. Standard DFT post-processing trick to remove residual translational forces from incomplete relaxation/SCF.
Example 1:
true
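The correction described above amounts to removing the mean force vector from every atom in a configuration. The following Python sketch shows the idea on made-up numbers. The function name `zero_net_force` and the force values are hypothetical, and this is an illustration of the technique, not the Tadah!MLIP implementation.

```python
def zero_net_force(forces):
    """Subtract the per-configuration mean force from every atom.

    forces -- list of [fx, fy, fz] entries, one per atom
    Returns a new list whose components sum to zero (up to rounding).
    """
    n_atoms = len(forces)
    mean = [sum(f[k] for f in forces) / n_atoms for k in range(3)]
    return [[f[k] - mean[k] for k in range(3)] for f in forces]


# Hypothetical forces for a three-atom configuration with a small net drift.
forces = [
    [0.12, -0.30, 0.05],
    [-0.10, 0.33, -0.01],
    [0.01, 0.00, -0.02],
]

corrected = zero_net_force(forces)

# Net force per Cartesian component after the correction.
net = [sum(f[k] for f in corrected) for k in range(3)]
print(net)
```

In floating-point arithmetic the net force is zero only to within rounding error, so a check against a small tolerance is more robust than an exact comparison.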
Comments
Use the # symbol to add comments in the configuration file. Both inline and full-line comments are supported.