Quick Start

This section covers the fundamentals and the command line interface (CLI) of Tadah!MLIP. The CLI provides the tools needed to train and test models. A trained model (interatomic potential) can then be used to run molecular dynamics simulations with LAMMPS via the Tadah!LAMMPS plugin.

For installation instructions, see Installation. For CLI examples, see Examples.

If you plan to use Tadah! as a C++ library, we recommend completing this Quick-Start guide first; once you have the basics, head over to the API Examples and the full API reference.

Overview

To train a model, a config file (see Config File) and a dataset (see Dataset Format) are required.

For prediction, a potential (trained model) and a dataset are required. The potential file is an output of the training process. It is an ASCII text file containing key-value pairs that fully determine the potential.

Built-in Help

Tadah! provides some basic help. Try running it with the -h flag.

tadah -h

To read more about a particular subcommand, try:

tadah train -h

Training

To train a model, run the following command in a terminal:

tadah train -c config.train -v 2

-v 2: Enables verbosity INFO level, resulting in detailed output being printed to the screen during the training process.

This trains a model using energies only. To also train on forces, add the -F flag; to train on stresses, add the -S flag. Alternatively, set the FORCE true and STRESS true keys in the config file.

Note: Command line flags take precedence over those defined in the config file.

The config.train file is a Config File. It specifies the training datasets, descriptors, cutoffs, model, and all other parameters. The output of this command is a trained model, written as a pot.tadah file.

Here is a minimal example of the config.train file:

DBFILE     db.train           # Ta dataset
INIT2B     true               # Use two-body descriptor
RCUT2B     6.5                # Cutoff distance
TYPE2B     D2_LJ Ta Ta        # Lennard-Jones descriptor
RCTYPE2B   Cut_Dummy          # No cutoff function
MODEL      M_KRR Kern_Linear  # Kernel Ridge Regression with a linear kernel

The DBFILE key specifies the dataset to be used for training. The path can be either absolute or relative; a relative path is resolved with respect to the working directory. INIT2B controls whether the two-body descriptor is used. TYPE2B, RCUT2B, and RCTYPE2B set the two-body descriptor type, cutoff distance, and cutoff function type, respectively. Finally, MODEL selects the regression model; in this case, Kernel Ridge Regression with a linear kernel, chosen for simplicity.

For a detailed explanation of KEYS and a general discussion about the configuration files, see Configuration File.

An example dataset can be downloaded from here.

Prediction

To predict energies for a prediction dataset using an existing pot.tadah model, run:

tadah predict -p pot.tadah -d db.predict -v 2 -A

-v 2: Enables verbosity INFO level, resulting in detailed output being printed to the screen.
-A: Provides some basic analysis of the prediction, such as the RMSE (Root Mean Square Error) between true and predicted values.

To also predict forces, add the -F flag; for stresses, add the -S flag.
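When many training or prediction runs have to be launched from a script, it can help to assemble the command lines programmatically. The sketch below only builds argument lists from the flags described in this guide (-c, -p, -d, -F, -S, -A, -v); the helper names tadah_train_args and tadah_predict_args are hypothetical and not part of Tadah!.

```python
import shlex


def tadah_train_args(config, forces=False, stresses=False, verbosity=None):
    """Build the argument list for `tadah train` (hypothetical helper)."""
    args = ["tadah", "train", "-c", str(config)]
    if forces:
        args.append("-F")  # also train on forces
    if stresses:
        args.append("-S")  # also train on stresses
    if verbosity is not None:
        args += ["-v", str(verbosity)]
    return args


def tadah_predict_args(potential, dbfile, forces=False, stresses=False,
                       analytics=False, verbosity=None):
    """Build the argument list for `tadah predict` (hypothetical helper)."""
    args = ["tadah", "predict", "-p", str(potential), "-d", str(dbfile)]
    if forces:
        args.append("-F")  # also predict forces
    if stresses:
        args.append("-S")  # also predict stresses
    if analytics:
        args.append("-A")  # basic analysis such as RMSE
    if verbosity is not None:
        args += ["-v", str(verbosity)]
    return args


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(args, check=True) to execute it when tadah is installed.
    print(shlex.join(tadah_train_args("config.train", forces=True, verbosity=2)))
    print(shlex.join(tadah_predict_args("pot.tadah", "db.predict", analytics=True)))
```

Keeping the arguments as a list, rather than a single string, avoids shell quoting problems when file names contain spaces.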

The output of the predict subcommand includes three files:

energy.pred: Three columns: the first is an index, the second is the dataset energy/atom, and the third is the predicted energy/atom. Rows follow the order in which the datasets are given in the config file or with the -d flag.
forces.pred: Similar idea as above, but now forces are listed. The first row is the force on the first atom in the x-direction from the first dataset, the second row is the force on the first atom in the y-direction, and so on.
stress.pred: The first six rows list components of the stress tensor from the first configuration, followed by six components from the second configuration, and so on. The ordering is xx, xy, xz, yy, yz, zz.
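The file layouts described above can be post-processed with a few lines of Python. The following is a minimal sketch, assuming energy.pred is whitespace-separated with exactly three columns per row; the function names are hypothetical, the sample numbers are made up, and the row-to-component helpers simply restate the ordering given above using 0-based indices.

```python
import math
import tempfile
from pathlib import Path


def read_energy_pred(path):
    """Return (reference, predicted) energy/atom lists from a three-column file."""
    reference, predicted = [], []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines (assumed layout: index ref pred)
        try:
            ref, pred = float(fields[1]), float(fields[2])
        except ValueError:
            continue  # skip non-numeric rows, e.g. a header line if one exists
        reference.append(ref)
        predicted.append(pred)
    return reference, predicted


def rmse(reference, predicted):
    """Root mean square error between two equally long, non-empty sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(total / len(reference))


def force_row_to_atom_component(row):
    """Map a 0-based row of forces.pred to (0-based atom index, 'x'|'y'|'z')."""
    return row // 3, "xyz"[row % 3]


def stress_row_to_config_component(row):
    """Map a 0-based row of stress.pred to (0-based configuration index, component)."""
    return row // 6, ("xx", "xy", "xz", "yy", "yz", "zz")[row % 6]


if __name__ == "__main__":
    # Made-up sample content, written to a temporary file for the demo.
    sample = "1 -11.80 -11.79\n2 -11.75 -11.77\n3 -11.60 -11.60\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "energy.pred"
        path.write_text(sample)
        ref, pred = read_energy_pred(path)
    print(f"RMSE = {rmse(ref, pred):.4f} (dataset energy units per atom)")
    print(force_row_to_atom_component(4))
    print(stress_row_to_config_component(7))
```

The atom index returned by force_row_to_atom_component counts atoms across the whole file, so it has to be combined with the per-structure atom counts to locate a specific structure.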

Task File

A task file is a plain-text wrapper around Tadah!MLIP’s config syntax. You can use it for a single command (as an alternative to a CLI invocation) or chain several TASK blocks in the same file to execute them sequentially. All key–value pairs inside the file follow the normal config rules. The CONFIG key can be used to nest config files, allowing for reuse of settings or to keep a full training or prediction setup in a dedicated config file.

Basic rules

Each command starts with a line beginning with the keyword TASK followed by the sub-command name exactly as you would type it on the CLI.
The lines that follow are ordinary KEY/VALUE pairs.
- Keys written above all TASK blocks act as defaults for every task.
- Keys appearing inside a task block override global keys written above the first TASK line.
Boolean values are written in lowercase true / false.
There is a one-to-one mapping between CLI flags/options and configuration keys:

CLI (short/long)     Config-file key
-c, --config         CONFIG
-d, --dbfile         DBFILE
-p, --potential      POTENTIAL
-v, --verbose        VERBOSE
-A, --analytics      ANALYTICS
-F, --force          FORCE
…                    …

In other words, anything you can set on the command line can also be set with an upper-case key in the task file.

Example

The small task file below trains a model and then immediately uses it for a prediction run:

# --- GLOBAL DEFAULTS --------------------------------------------------
DBFILE     db.train           # Shared unless overridden
VERBOSE    2                  # Print INFO messages

# --- FIRST TASK: TRAINING --------------------------------------------
TASK       train
  INIT2B     true               # Use two-body descriptor
  RCUT2B     6.5                # Cut-off distance
  TYPE2B     D2_LJ Ta Ta        # Lennard-Jones descriptor
  RCTYPE2B   Cut_Dummy          # No cut-off function
  MODEL      M_KRR Kern_Linear  # Kernel Ridge Regression (linear)
  VERBOSE    0                  # Override global: silent training

# --- SECOND TASK: PREDICTION -----------------------------------------
TASK       predict
  DBFILE     db.predict         # New dataset overrides the global one
  POTENTIAL  pot.tadah          # Same as CLI -p/--potential
  ANALYTICS  true               # Same as CLI -A
  VERBOSE    1                  # Show warnings during prediction

You would execute it with

tadah --task taskfile.txt

and Tadah!MLIP will first run the training step, produce pot.tadah, and then feed that potential into the subsequent prediction step, all in one go.
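The way global defaults and task-level keys combine in this example can be modelled in a few lines of Python. This is only an illustration of the override rule stated under Basic rules, not a description of how Tadah!MLIP parses task files internally; effective_settings is a hypothetical name.

```python
def effective_settings(global_keys, task_keys):
    """Task-block keys override global defaults; all other keys are inherited."""
    merged = dict(global_keys)
    merged.update(task_keys)
    return merged


# Keys copied from the example task file above.
GLOBAL_KEYS = {"DBFILE": "db.train", "VERBOSE": "2"}
TRAIN_KEYS = {
    "INIT2B": "true",
    "RCUT2B": "6.5",
    "TYPE2B": "D2_LJ Ta Ta",
    "RCTYPE2B": "Cut_Dummy",
    "MODEL": "M_KRR Kern_Linear",
    "VERBOSE": "0",
}
PREDICT_KEYS = {
    "DBFILE": "db.predict",
    "POTENTIAL": "pot.tadah",
    "ANALYTICS": "true",
    "VERBOSE": "1",
}

for name, task_keys in (("train", TRAIN_KEYS), ("predict", PREDICT_KEYS)):
    settings = effective_settings(GLOBAL_KEYS, task_keys)
    print(f"{name}: DBFILE={settings['DBFILE']} VERBOSE={settings['VERBOSE']}")
```

In this model the training task inherits DBFILE db.train from the global section and silences output with VERBOSE 0, while the prediction task replaces both values with its own.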

Hints

Use CONFIG other_file.txt inside a task block if you prefer to keep a full training or prediction setup in a dedicated config file.
Indexing of positional items (e.g. --index 1 -> INDEX 1) starts from 1.
Comments start with # and run to the end of the line, just like in ordinary config files.

Units and Tadah!

Tadah! is numerically unit-agnostic during training: whatever numbers appear in the dataset are learned as-is, so any internally consistent set of units (e.g. Hartree/Bohr, kJ mol⁻¹/nm, …) will yield a valid potential. In day-to-day work, however, most companion tools assume the “standard materials-science” system of

energy in electron-volts (eV)
length in Ångström (Å)
force in eV Å⁻¹
stress in eV Å⁻³.

These are therefore the units of choice.

Practical rules

Training on a dataset expressed in any units still works, provided all quantities in all structures use the same unit system.
The generated model keeps those units. When you couple it to LAMMPS you must pick a compatible units style—e.g. metal for eV / Å, real for kcal mol⁻¹ / Å, and so on.
Some post-processing and analysis helpers (notably the --analytics flag) currently assume eV / Å. Using a model trained in different units may trigger an error or, worse, silently produce meaningless numbers.

Important

Unless you have a compelling reason to deviate, use eV for energy and Å for length. That choice is tested across the entire Tadah! tool-chain and matches the metal unit style in LAMMPS out of the box.
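If reference data comes from a code that reports Hartree atomic units, it can be converted to eV and Å before the dataset is written. The sketch below is one way to do this, using CODATA 2018 conversion factors; the function names are hypothetical, and whether your source data really is in Hartree/Bohr is an assumption you need to check.

```python
HARTREE_TO_EV = 27.211386245988    # CODATA 2018
BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018
EV_PER_A3_TO_GPA = 160.2176634     # follows from the exact SI value of the eV


def energy_to_ev(energy_hartree):
    """Hartree -> eV."""
    return energy_hartree * HARTREE_TO_EV


def length_to_angstrom(length_bohr):
    """Bohr -> Angstrom."""
    return length_bohr * BOHR_TO_ANGSTROM


def force_to_ev_per_angstrom(force_hartree_per_bohr):
    """Hartree/Bohr -> eV/Angstrom."""
    return force_hartree_per_bohr * HARTREE_TO_EV / BOHR_TO_ANGSTROM


def stress_to_ev_per_a3(stress_hartree_per_bohr3):
    """Hartree/Bohr^3 -> eV/Angstrom^3."""
    return stress_hartree_per_bohr3 * HARTREE_TO_EV / BOHR_TO_ANGSTROM ** 3


def stress_ev_per_a3_to_gpa(stress_ev_per_a3):
    """eV/Angstrom^3 -> GPa, handy when comparing with experimental pressures."""
    return stress_ev_per_a3 * EV_PER_A3_TO_GPA


if __name__ == "__main__":
    print(f"1 Hartree      = {energy_to_ev(1.0):.6f} eV")
    print(f"1 Bohr         = {length_to_angstrom(1.0):.6f} Angstrom")
    print(f"1 Hartree/Bohr = {force_to_ev_per_angstrom(1.0):.6f} eV/Angstrom")
```

Converting everything once, before training, keeps the resulting potential compatible with the metal unit style in LAMMPS.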

Config File

The Tadah! configuration file manages the training process by specifying datasets, cutoff functions, radii, regression models, and descriptor choices. It supports two- and many-body descriptors. The file’s structure consists of KEY/VALUE pairs, with each pair on a separate line. The KEY is a string followed by its VALUE, and the format of the VALUE depends on the KEY.

For a list and explanation of supported KEY-VALUE pairs, see Configuration File.
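To make the KEY/VALUE structure concrete, here is a minimal reader for such files written in Python. It is a sketch for illustration only and not the parser used by Tadah!; parse_config is a hypothetical name, and the handling of comments and repeated keys is inferred from the examples in this guide.

```python
from collections import defaultdict

EXAMPLE = """\
DBFILE     db.train           # Ta dataset
INIT2B     true               # Use two-body descriptor
RCUT2B     6.5                # Cutoff distance
TYPE2B     D2_LJ Ta Ta        # Lennard-Jones descriptor
RCTYPE2B   Cut_Dummy          # No cutoff function
MODEL      M_KRR Kern_Linear  # Kernel Ridge Regression with a linear kernel
"""


def parse_config(text):
    """Map each KEY to a list of VALUE strings, one entry per occurrence of the key."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the comment and outer whitespace
        if not line:
            continue  # blank or comment-only line
        parts = line.split(None, 1)  # KEY, then the rest of the line as VALUE
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key].append(value)
    return dict(settings)


if __name__ == "__main__":
    for key, values in parse_config(EXAMPLE).items():
        print(f"{key:10s} {values}")
```

Values are stored in lists because a key such as DBFILE may appear more than once, for example when several datasets are used for training.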

Dataset Format

Datasets are included using the DBFILE key in a Config File. More than one dataset can be specified.

There is no restriction on the number of atoms in different structures, so it’s okay to have a structure with 12 atoms and another one with 24 atoms in the same dataset.

The dataset has the following structure:

Comment line
eweight fweight sweight
ENERGY
cell vector a
cell vector b
cell vector c
stress tensor row s_1
stress tensor row s_2
stress tensor row s_3
Element px py pz fx fy fz
...
<blank line>

The first line is a comment line; it is used as a label for the structure. It must not be left blank.
eweight, fweight, sweight are optional weighting parameters used for training. If this line is omitted, the weights default to 1.0 1.0 1.0. Do not put a blank line in its place.
Each cell vector contains 3 numbers.
Each stress tensor row contains 3 numbers.
The number of lines beginning with Element is equal to the number of atoms in a structure.
Element is an atom label, a chemical element symbol, such as Fe, Ti.
px, py, and pz are Cartesian coordinates of the atom position.
fx, fy, and fz are components of the force vector acting on the atom.
Each configuration is separated by a blank line.

If forces and/or stresses are not available, they can be set to zero to satisfy the parser.
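The layout above is simple enough to generate from a script. The following Python sketch formats one structure as a dataset entry; format_structure is a hypothetical helper, the numeric formatting is an assumption, and the example values are made up. Missing stresses are written as zeros, as suggested above.

```python
def format_structure(label, energy, cell, atoms, stress=None, weights=None):
    """Return one dataset entry as text, terminated by the separating blank line.

    cell and stress are three rows of three numbers each;
    atoms is a list of (element, (px, py, pz), (fx, fy, fz)).
    """
    if not label.strip() or "\n" in label:
        raise ValueError("the label must be a single, non-blank line")
    if stress is None:
        stress = [[0.0, 0.0, 0.0] for _ in range(3)]  # placeholder when unavailable
    lines = [label]
    if weights is not None:  # the eweight fweight sweight line is optional
        lines.append(" ".join(f"{w:g}" for w in weights))
    lines.append(f"{energy:.8f}")
    for row in list(cell) + list(stress):
        if len(row) != 3:
            raise ValueError("cell vectors and stress rows need 3 numbers each")
        lines.append(" ".join(f"{x:.8f}" for x in row))
    for element, position, force in atoms:
        values = list(position) + list(force)
        if len(values) != 6:
            raise ValueError("each atom needs 3 coordinates and 3 force components")
        lines.append(element + " " + " ".join(f"{x:.8f}" for x in values))
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Made-up two-atom bcc Ta cell; energy and forces are placeholders.
    entry = format_structure(
        label="bcc Ta, illustrative values only",
        energy=-23.6,
        cell=[[3.3, 0.0, 0.0], [0.0, 3.3, 0.0], [0.0, 0.0, 3.3]],
        atoms=[
            ("Ta", (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
            ("Ta", (1.65, 1.65, 1.65), (0.0, 0.0, 0.0)),
        ],
    )
    print(entry, end="")
```

Several entries can be concatenated and written to a single file, since each one already ends with the blank line that separates configurations.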

Also see Units and Tadah!.

Tadah! and LAMMPS

Once trained, the pot.tadah file can be used with LAMMPS like any other pair potential.

pair_style      tadah
pair_coeff      * * pot.tadah ELEMENT1 ELEMENT2

Here is an example LAMMPS script file and pot.file.

See Installing ML‑TADAH for LAMMPS (Tadah!LAMMPS) for Tadah!LAMMPS interface installation instructions.

Support for Multi-species Systems

Tadah! can generate machine-learned potentials for both single-component and multi-component systems. The interatomic potential adapts based on the species present in the training dataset, allowing for accurate modeling of different material compositions.