.. This file, including introduction, is auto-generated from the TOML blueprint file. Do not edit it directly.

.. _ConfigSection:

Configuration File
==================

.. warning::

   This section is under construction and may not be complete or accurate.
   If you find any issues, please report them on the `Tadah!MLIP GitLab `_.

.. note::

   This configuration file is NOT used by the HPO (Hyperparameter Optimization)
   module of Tadah!MLIP. The HPO module uses a different configuration file
   format, which is documented in the :ref:`nested_fitting`.

This section describes the format of the configuration file used by Tadah!MLIP.
For example, the configuration file can control the training process,
specifying one or more datasets for use during the training stage.
It defines cutoff functions and corresponding radii along with the regression
model and descriptor choices.

.. important::

   Indexing of items that take *positional indices* (e.g., ``INDEX 1``,
   CLI flag ``--index 1``) starts from **1**, not 0.

Key/Value Pairs
---------------

The primary structure in a configuration file is the KEY/VALUE pair.
Each KEY/VALUE pair must be on a separate line, with the KEY appearing first.
The KEY is always a string, followed by its VALUE.
The format and type of a VALUE depend on the specific KEY.

Common Usage
------------

Typically, only a subset of KEYS is needed to train a model.
Tadah!MLIP will use default values for some keys.
An error will occur if a required KEY with no default value is missing:

.. code-block:: bash

   [user@host:~] $ tadah train -c config.train
   terminate called after throwing an instance of 'std::runtime_error'
     what():  Key not found: DBFILE
   Aborted (core dumped)

This message indicates that the :ref:`DBFILE` KEY was not specified in the
:file:`config.train` file. To resolve this, add the :ref:`DBFILE` key and its
corresponding value to :file:`config.train`.

Key Specifics
-------------

The meaning of some keys can vary with the chosen command.
Check the documentation for that specific command, model or descriptor to see
which keys are required and how they are interpreted. While we strive to keep
key meanings consistent across Tadah!MLIP, occasional differences may still
occur.

Comments
--------

Use the ``#`` symbol to add comments in the configuration file.
Both inline and full-line comments are supported.

Key Values and Formats
----------------------

Some KEYS can have multiple values, specified in one of two ways:

- Single line:

  .. code-block:: bash

     KEY VALUE1 VALUE2 VALUE3

- Multiple lines:

  .. code-block:: bash

     KEY VALUE1
     KEY VALUE2
     KEY VALUE3

Value Limits
------------

Each keyword takes a fixed number of values. Passing the wrong count can raise
an error, but enforcement is not yet fully consistent. While this is being
improved, keep these points in mind:

* **Too many values** – Tadah!MLIP may reject the input with a clear message or
  quietly discard the extras; the latter can later surface as obscure run-time
  failures (even an occasional segmentation fault).
* **Too few values** – usually triggers an error, although a crash remains
  possible in rare corner cases.

In short, give every keyword exactly the number of values it expects, no more
and no less, to avoid unpleasant surprises.

.. _SupportedKEYS:

Supported KEYS
--------------

This section contains all KEYS currently used by Tadah!MLIP.

.. _ALPHA:

ALPHA
^^^^^

[*DOUBLE*]

*Max number of values*: 1

*Default*: 1.0

*Description*: Weight precision hyper-parameter. This is the starting guess
for the evidence approximation algorithm.

*Example* 1::

    ALPHA 0.23

.. _BASIS:

BASIS
^^^^^

[*{DOUBLE}*] [ ...]

*Max number of values*: 2147483647

*Description*: Basis vectors for non-linear Kernel Ridge Regression. They
represent the features or functions used to map input data into a
higher-dimensional feature space.

*Example* 1::

    2.0 -4.65 0.4

*Example* 2::

    -1.0

.. _BETA:

BETA
^^^^

[*DOUBLE*]

*Max number of values*: 1

*Default*: 1.0

*Description*: Noise precision hyper-parameter. This is the starting guess for
the evidence approximation algorithm.

*Example* 1::

    BETA 0.0001

.. _CEMBFUNC:

CEMBFUNC
^^^^^^^^

[*{DOUBLE}*] [ ...]

*Max number of values*: 2147483647

*Description*: Position parameters for an embedding function. Used by certain
many-body descriptors (e.g., F_RLR). When using DM_mJoin, supply one or more
lists of parameters matching those in SEMBFUNC.

*Example* 1::

    CEMBFUNC 0.14 0.45 1.00 1.1

.. _CGRID2B:

CGRID2B
^^^^^^^

[*{DOUBLE}*] [ ...], [*{STRING INT DOUBLE DOUBLE}*] ( ) [...]

*Max number of values*: 2147483647

*Description*: Controls the center positions for radial basis functions
(two-body). The parameter list may be provided manually or generated
automatically. When using the meta descriptor D2_mJoin, specify one or more
lists of centers corresponding to each descriptor. The number of centers should
typically match the number of width parameters (SGRID2B) and remain below the
cutoff distance. Alternatively, use the algorithm keyword followed by
parameters to generate centers automatically (e.g., LOG or LIN).

*Example* 1::

    CGRID2B LIN 10 0 6

*Example* 2::

    CGRID2B 1.0 2.0

*Example* 3::

    CGRID2B 1.0 2.0
    CGRID2B 1.5 2.5

.. _CGRIDMB:

CGRIDMB
^^^^^^^

[*{DOUBLE}*]
..., [*{STRING INT DOUBLE DOUBLE}*]

*Max number of values*: 2147483647

*Description*: Specifies the center positions for many-body radial basis
functions. Centers may be provided manually or generated automatically. When
using the DM_mJoin meta descriptor, supply one or more lists of centers for
each concatenated descriptor. Alternatively, include an algorithm such as 'L'
(logarithmic) or 'U' (uniform spacing) followed by parameters.

*Example* 1::

    CGRIDMB LIN 4 0 6.2

*Example* 2::

    CGRIDMB 0.5 0.7

*Example* 3::

    CGRIDMB 0.5 0.7
    CGRIDMB 0.6 0.8

.. _DIMER:

DIMER
^^^^^

[*BOOL DOUBLE BOOL*]

*Max number of values*: 3

*Default*: false / 0 / false

*Description*: Control for DIMER models. Users should not modify this key.

*Example* 1::

    DIMER true 1.104 true

.. _EWEIGHT:

EWEIGHT
^^^^^^^

[*DOUBLE*]

*Max number of values*: 1

*Default*: 1.0

*Description*: Global energy scaling factor. Energies are always scaled by
1/(number of atoms). Additional configuration-level scaling factors can apply.
Combined factor = EWEIGHT*(config eweight)/(#atoms).

*Example* 1::

    EWEIGHT 0.96

.. _FIXINDEX:

FIXINDEX
^^^^^^^^

[*{UINT}*] [ ...]

*Max number of values*: 2147483647

*Description*: Indices of weights to be fixed in optimization. Must be used
with FIXWEIGHT.

*Example* 1::

    1 4 5

.. _FIXWEIGHT:

FIXWEIGHT
^^^^^^^^^

[*{UINT}*] [ ...]

*Max number of values*: 2147483647

*Description*: Values for weights to be fixed in optimization. Must be used
with FIXINDEX. The i-th value in FIXWEIGHT corresponds to the i-th index in
FIXINDEX.

*Example* 1::

    1 5 9

.. _FWEIGHT:

FWEIGHT
^^^^^^^

[*DOUBLE*]

*Max number of values*: 1

*Default*: 1.0

*Description*: Global force scaling factor. Each force component is scaled by
1/(#atoms)/3. Additional config-level scaling factors can apply.
Combined factor = FWEIGHT*(config fweight)/(#atoms)/3.

*Example* 1::

    FWEIGHT 1e-2

.. _INIT2B:

INIT2B
^^^^^^

[*BOOL*]

*Max number of values*: 1

*Default*: false

*Description*: If set to true, the two-body descriptor will be calculated.

*Example* 1::

    INIT2B true

.. _INITMB:

INITMB
^^^^^^

[*BOOL*]

*Max number of values*: 1

*Default*: false

*Description*: If set to true, the many-body descriptor will be calculated.

*Example* 1::

    INITMB true

.. _LAMBDA:

LAMBDA
^^^^^^

[*INT*] , [*DOUBLE*] , [*INT DOUBLE*]

*Max number of values*: 2

*Default*: 0

*Description*: Controls the regularization parameter λ for BLR and KRR.
Let N be the first value given. If N=0, no regularization is applied. If N>0,
λ is set to that value. If N<0, an evidence approximation is used to estimate
λ. For LAMBDA 0, you can provide a second number (double) that sets the
effective rank threshold (default 1e-8).

*Example* 1::

    LAMBDA -1

*Example* 2::

    LAMBDA 1e-4

*Example* 3::

    LAMBDA 0

*Example* 4::

    LAMBDA 0 1e-12

.. _MBLOCK:

MBLOCK
^^^^^^

[*UINT*]

*Max number of values*: 1

*Default*: 64

*Description*: ScaLAPACK row block size MB.

*Example* 1::

    20

.. _MODEL:

MODEL
^^^^^

[*STRING STRING*] MODEL FUNCTION, [*STRING STRING STRING*] MODEL FUNCTION OPTION, [*STRING STRING UINT*] MODEL FUNCTION OPTION

*Max number of values*: 3

*Description*: Defines the training model and function. MODEL can be any class
inheriting from M_Base (e.g., M_KRR, M_BLR). FUNCTION must be a valid child
class of Function_Base (e.g., Kern_Linear, BF_Linear, BF_Polynomial2). Various
combinations (KRR with different kernels, BLR with various basis functions) are
possible.

*Example* 1::

    MODEL M_BLR BF_Linear

*Example* 2::

    MODEL M_BLR BF_Polynomial2

*Example* 3::

    MODEL M_KRR Kern_Linear

.. _MPARAMS:

MPARAMS
^^^^^^^

[*{DOUBLE}*] [ ...]

*Max number of values*: 2147483647

*Description*: Specifies additional numeric parameters for certain models.
Some models require extra parameters; many need none. Refer to the
model-specific documentation for details.

*Example* 1::

    MPARAMS 0.1

*Example* 2::

    MPARAMS 0.1 0.2 0.3

.. _MPIWPCKG:

MPIWPCKG
^^^^^^^^

[*UINT*]

*Max number of values*: 1

*Default*: 50

*Description*: The number of structures in a single MPI work package.

*Example* 1::

    20

.. _NBLOCK:

NBLOCK
^^^^^^

[*UINT*]

*Max number of values*: 1

*Default*: 64

*Description*: ScaLAPACK column block size NB.

*Example* 1::

    20

.. _NMEAN:

NMEAN
^^^^^

[*{DOUBLE}*] [ ...]

*Max number of values*: 2147483647

*Description*: Mean values from descriptor normalization. Obtained after
standardizing the columns of the DesignMatrix (see NORM).

*Example* 1::

    2.0 -4.65 0.4

*Example* 2::

    -1.0

.. _NORM:

NORM
^^^^

[*BOOL*]

*Max number of values*: 1

*Default*: false

*Description*: Set to true to standardize descriptors. This is typically
relevant if energies are used for fitting.

*Example* 1::

    true

*Example* 2::

    false

.. _NSTDEV:

NSTDEV
^^^^^^

[*{DOUBLE}*] [ ...]

*Max number of values*: 2147483647

*Description*: Standard deviations from descriptor normalization. Obtained
after standardizing the columns of the DesignMatrix (see NORM). The vector size
equals the number of columns.

*Example* 1::

    2.0 -4.65 0.4

*Example* 2::

    -1.0

.. _OALGO:

OALGO
^^^^^

[*INT*]