# UQ-uMLIP

**Repository Path**: pfsuo/UQ-uMLIP

## Basic Information

- **Project Name**: UQ-uMLIP
- **Description**: UQ-uMLIP from github
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-26
- **Last Updated**: 2026-01-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Uncertainty Quantification for Universal Machine Learning Interatomic Potentials

This repository provides datasets, scripts, and trained interatomic potentials used in the study of **uncertainty quantification (UQ)** for **universal machine-learning interatomic potentials (uMLIPs)**.  
It accompanies the methodology proposed in:

> **K. Liu et al.**, *Heterogeneous ensemble enables a universal uncertainty metric for atomistic foundation models*,  
> **npj Computational Materials (2025)**.

The repository consists of two main parts:
1. Benchmarking uncertainty metrics on the **OMat24 dataset**
2. Application to **pure tungsten (W)** with mixed-precision ACE potentials

---

## 1. Benchmarking Uncertainty \( U \) on the OMat24 Dataset

This part evaluates the correlation between the proposed uncertainty metric \( U \) and prediction errors on the **OMat24 test set**, and determines optimal weights for a **heterogeneous ensemble** of uMLIPs.

### Directory Contents

#### `0-GRACE-2L-OMAT/` (and related folders)
Energy and force predictions produced by uMLIPs using the **ASE calculator interface**.

- A total of **21 uMLIPs** are included
- Due to GitHub file-size limitations, **only ~3% of the full dataset** used in the study is provided here
- The complete dataset is available upon request or via the original publication

#### `RMSE_of_Energy_Force_Predicted_by_UMLIP.ipynb`
Computes **energy and force RMSEs** of uMLIP predictions with respect to DFT reference values.

#### `Uncertainty Quantification.ipynb`
Computes the uncertainty metric \( U \) from uMLIP ensemble predictions.

- Demonstrates the correlation between \( U \) and prediction error
- Identifies the **optimal ensemble size (11 models)** for uncertainty estimation

---

## 2. Pure W Dataset and Potentials

This section contains datasets and trained interatomic potentials for **pure tungsten (W)**, used to demonstrate uncertainty-guided **mixed-precision training** of system-specific ACE models.

### `Pure_W/`

All data related to the pure W system.

---

### `W_uMLIP/`

Predictions from general-purpose uMLIPs:

- Energies and atomic forces for approximately **1000 atomic configurations**
- Each configuration is evaluated independently by **20 uMLIPs**
- Used for:
  - uncertainty estimation
  - ensemble statistics
  - comparison with DFT reference data

---

### `potentials/`

This directory contains **ACE interatomic potentials** trained on datasets with different levels of reference accuracy.

#### Naming convention

- The **first digit** of the filename indicates the **training-set precision**
- The **last digit** denotes the **replicate index**

| Label | Training dataset description |
|------:|-----------------------------|
| `0` | 100% DFT data (full DFT reference set) |
| `1` | Mixed-precision dataset with uncertainty cutoff \( U_c = 0.1 \), ~95% DFT |
| `2` | Mixed-precision dataset with \( U_c = 0.5 \), ~39% DFT |
| `3` | Mixed-precision dataset with \( U_c = 1.0 \), ~4% DFT |
| `4` | Mixed-precision dataset with \( U_c = 10.0 \), ~3% DFT |
| `5` | 100% uMLIP-generated data (no DFT) |

For each dataset, **five independently trained ACE potentials** were generated to assess training variability.  
The **last digit of the filename** denotes the **replicate index (0–4)**.

---

### `Testset/`

An independent W test set taken from:

> https://doi.org/10.1038/s41524-025-01599-1

It includes configurations involving:
- 2D grain boundaries
- 3D grain boundaries
- crack geometries
- liquid structures

---

### `W_DFT.pckl.gzip`

Compressed pickle file containing the **DFT reference dataset** for pure tungsten.

- DFT energies
- DFT atomic forces
- Corresponding atomic configurations

This dataset serves as the **ground truth** for validation and benchmarking.

---

### `Build_dataset_from_uMLIP_and_DFT.ipynb`

- Evaluates uncertainty \( U \) for configurations in the W dataset
- Constructs mixed-precision training sets using different uncertainty thresholds \( U_c \)
- Compares the accuracy of mixed datasets against the full DFT dataset

---

### `Test_W_ACE.ipynb`

Evaluates the performance of ACE potentials trained on different datasets using the independent W test set from  
https://doi.org/10.1038/s41524-025-01599-1.

---

## Notes

1. For the **MoNbTaW** system discussed in the article, the scripts for mixed-dataset construction and ACE testing are **identical** to those used for the W system.  
   The corresponding dataset is taken from:  
   https://doi.org/10.1016/j.ijplas.2025.104308

2. ACE potential testing follows scripts provided in:  
   https://github.com/Addition-P/Cu-W-ACE

3. If you use this repository or the mixed-precision dataset construction strategy, please cite:

```bibtex
@article{liu2025heterogeneous,
  title={Heterogeneous ensemble enables a universal uncertainty metric for atomistic foundation models},
  author={Liu, Kai and Wei, Zixiong and Gao, Wei and Dey, Poulumi and Sluiter, Marcel HF and Shuang, Fei},
  journal={npj Computational Materials},
  year={2025},
  publisher={Nature Publishing Group UK London}
}