# DpAnalyse Documentation

## Introduction

DpAnalyse is a Python toolkit designed to automate Deep Potential-related scientific calculation processes (i.e. "workflows"). With DpAnalyse you can easily customize calculations such as the RDF (radial distribution function) or heat conductivity; a single terminal command then triggers the corresponding workflow, and the results are presented when the workflow finishes.

Note that DpAnalyse is designed not only for running calculations but as a multifunctional analysis tool. A built-in visualization module is currently available, with which you can intuitively inspect the fitting error of the forces.

## Dependencies and Installation

### Create a fresh Conda environment and install the Deepmd software series

It is strongly recommended to install DpAnalyse in a Conda environment, even though the toolkit itself is a pip package.
To set up all necessary dependencies before installation, you can make use of the provided environment.yml. First create the environment:

```bash
cd dpanalyse-kit
conda env create -f environment.yml
```
An environment named dpenv will be created. Note that this environment contains everything (including deepmd-kit and dpdata, so you can also use these official packages on their own in this environment), except DpAnalyse itself.
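
The shipped environment.yml is the authoritative reference; purely for orientation, it provides roughly the following (the exact package list, channels and versions may differ):

```yaml
# Hypothetical sketch of environment.yml; the file in dpanalyse-kit is authoritative.
name: dpenv
channels:
  - deepmodeling
  - conda-forge
dependencies:
  - deepmd-kit
  - dpdata
  - airflow
  - pip
```
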
Now install DpAnalyse itself via pip:

```bash
pip install .
```

Et voilà. You can check the installation with:

```bash
conda activate dpenv
dpanalyse -h
```
If the installation succeeded, a message similar to the following will appear:

```
usage: DpAnalyse, the analysis package designed for Deep Potential [-h] {rdf,vis} ...

optional arguments:
  -h, --help  show this help message and exit

Valid subcommands:
  {rdf,vis}
```

A fresh conda environment can consume a lot of disk space. If you are already a user of Deepmd-kit, it may be more natural to install DpAnalyse in your current environment. DpAnalyse essentially depends on two packages, dpdata and airflow, which you need to install before installing DpAnalyse:

```bash
conda install -c deepmodeling dpdata
conda install -c conda-forge airflow
```

Then install DpAnalyse via pip:

```bash
pip install .
```

## Configure your Apache Airflow

DpAnalyse is based on Apache Airflow. No matter which way you install DpAnalyse, you will have to configure Airflow after installation.

Although you can follow Airflow's official manual, we recommend a more robust way to run the airflow webserver and scheduler: as systemd services.

After initializing the airflow database and creating a user, enter dpanalyse-kit/examples/airflow_systemd, where you will find two provided templates, one for the webserver and one for the scheduler. Make the replacements indicated by the prompts in the templates, then symlink (or simply copy) the two files to /lib/systemd/system/. When that is done, execute the following commands:

```bash
sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver
sudo systemctl enable airflow-scheduler
sudo systemctl start airflow-webserver
sudo systemctl start airflow-scheduler
```
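
The shipped templates are the authoritative reference; purely for orientation, an airflow-webserver unit typically looks roughly like the following sketch (the user, group and paths below are placeholders you must adapt to your own installation):

```ini
# Hypothetical airflow-webserver.service sketch; adapt User, Group,
# AIRFLOW_HOME and the airflow executable path to your installation.
[Unit]
Description=Airflow webserver daemon
After=network.target

[Service]
Environment="AIRFLOW_HOME=/home/youruser/airflow"
User=youruser
Group=youruser
ExecStart=/home/youruser/miniconda3/envs/dpenv/bin/airflow webserver
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
```
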

Registering airflow as systemd services makes it easy to reload the configuration or restart airflow with systemctl. For example, the following two commands stop and restart airflow-webserver, respectively:

```bash
sudo systemctl stop airflow-webserver
sudo systemctl restart airflow-webserver
```

You can now check the configuration at localhost:8088 (see the guidance below for a detailed explanation). For more information, please read Airflow's official manual.

## A Simple Guidance

Using DpAnalyse is easy. You only need to provide three files, no matter which task you would like to perform: RDF, heat conductivity (called "kappa" in DpAnalyse) or fitting-error visualization. Let's take the RDF calculation as an example:

  1. An initial structure file for which you would like to calculate the RDF. In the examples folder we provide Initial.lmp, a Te-Pb binary compound system.
  2. A frozen Deep Potential suitable for your initial structure. In the examples folder we provide graph.pb.
  3. A user configuration file in JSON format, in which you specify all parameters needed for your calculation. In examples we provide user_config.json; most parameters are self-explanatory thanks to their "LAMMPS style", and all "_comment" keys serve as human-readable comments (see the sketch after this list).
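
The shipped user_config.json is the authoritative reference; purely for illustration, such a file might look roughly like the following (every key below except "_comment" is a hypothetical placeholder, not the actual schema):

```json
{
    "_comment": "keys ending in _comment are human-readable notes and are ignored",
    "temperature": 300,
    "timestep": 0.001,
    "run_steps": 100000
}
```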

Remember to create a new folder anywhere you like, and copy the files above into it. This folder will be DpAnalyse's working directory.
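
For example (the folder name below is arbitrary, and the three files are assumed to live in the repository's examples folder):

```bash
mkdir ~/rdf_workdir
cp examples/Initial.lmp examples/graph.pb examples/user_config.json ~/rdf_workdir/
```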

Now you are ready. Enter the working directory, activate the environment, and type the following command in your terminal:

```bash
dpanalyse rdf user_config.json -s Initial.lmp -p graph.pb
```

(here -s is short for --structure and -p for --potential)

DpAnalyse will then start. It is convenient to monitor your workflow in Apache Airflow's GUI: open your browser and go to localhost:8088, since 8088 is DpAnalyse's default port. If you are working on a remote server, you may need to forward the remote server's port 8088 to a local port on your own computer (for example with VS Code, MobaXTerm, or even by configuring your router manually).
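
With OpenSSH, for instance, the forwarding can be done like this (user and host are placeholders):

```bash
ssh -L 8088:localhost:8088 user@remote-server
```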

When the workflow ends, your working directory will look something like this:
```
├── af_startup.json
├── graph.pb
├── Initial.lmp
├── rdf
│   ├── graph.pb
│   ├── log.cite
│   ├── log.lammps
│   ├── process_rdf.in
│   ├── rdf11.data
│   ├── rdf12.data
│   ├── rdf22.data
│   └── structure.restart
├── relax
│   ├── graph.pb
│   ├── Initial.lmp
│   ├── log.cite
│   ├── log.lammps
│   ├── process_relax.in
│   ├── relaxed_structure.lmp
│   └── structure.restart
└── user_config.json
```
The two folders rdf and relax are created automatically. They hold all the intermediate results of the calculation, so you can inspect the data manually. In our example, however, the files you really care about are the three RDF data files: rdf11.data, rdf12.data and rdf22.data. The numbers (1 and 2) are exactly the atom types you specified in Initial.lmp, that is, 1 is Te and 2 is Pb. So if you would like to plot the Te-Te radial distribution function, rdf11.data is the file you need.

With Matplotlib you can easily plot these RDFs (the original README shows the three resulting figures):

  1. RDF11 (Te-Te)
  2. RDF12 (Te-Pb)
  3. RDF22 (Pb-Pb)
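
A minimal plotting sketch, assuming each rdf*.data file is plain text with the radial distance in the first column and g(r) in the second (adjust the column indices if the actual files contain extra columns or header lines):

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed layout: column 0 = r, column 1 = g(r); lines starting with '#'
# are treated as comments by np.loadtxt.
for fname, label in [("rdf11.data", "Te-Te"),
                     ("rdf12.data", "Te-Pb"),
                     ("rdf22.data", "Pb-Pb")]:
    r, g = np.loadtxt(fname, usecols=(0, 1), unpack=True)
    plt.plot(r, g, label=label)

plt.xlabel("r")
plt.ylabel("g(r)")
plt.legend()
plt.show()
```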

## Credits

This project was initially launched by the Deep Messing Around Group:

- Ding Changjie (丁昌杰), Institute of Solid State Physics, Chinese Academy of Sciences
- Xia Wenming (夏文明), Institute of Solid State Physics, Chinese Academy of Sciences
- Qin Mi (秦密), Institute of Solid State Physics, Chinese Academy of Sciences
- Zhang Pan (张攀), Wuhan University

We also give special thanks to the DpTechnology company and its committee for their great dedication to the Deepmodeling Hackathon 2021.

You are welcome to email us if you have any questions about this project (in Chinese or English):

- Ding2020@mail.ustc.edu.cn
- wmxia@mail.ustc.edu.cn
- qmxxf@mail.ustc.edu.cn