PROJECT NOT UNDER ACTIVE MANAGEMENT

This project will no longer be maintained by Intel.

Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.

Intel no longer accepts patches to this project.

If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

Contact: webadmin@linux.intel.com

# Customer Care Chatbot

## Introduction

Customers across various industries expect quick and accurate responses to their queries. Artificial Intelligence (AI)-powered Customer Care Chatbots aim to provide this, but building efficient chatbots that can understand user intent and entities in real-time queries is challenging.

This workflow demonstrates how to construct an AI-Powered Customer Care Chatbot using [Intel's oneAPI AI Analytics Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html) to predict user intent and entities in queries. By leveraging Intel's hardware and optimized software, it accelerates the performance of the chatbot. This results in faster and more accurate responses, leading to improved customer satisfaction and more efficient customer support operations.

>Check out more workflow examples in the [Developer Catalog](https://developer.intel.com/aireferenceimplementations).

## Solution Technical Overview

This workflow provides a high-level technical overview of building an AI-Powered Customer Care Chatbot using the [Intel® oneAPI AI Analytics Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html). Developers can understand why this workflow is relevant, its benefits, and what they will learn by trying it:

- **Relevance to Developers**:
  - This workflow is essential for Natural Language Processing (NLP) and chatbot developers.
  - Developers interested in harnessing Intel's hardware acceleration, especially [Intel® Extension for PyTorch*](https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html), will find it valuable.
- **Chosen Workflow**:
  - The workflow covers the complete chatbot lifecycle, from training to real-time prediction.
  - It emphasizes integrating Intel's technologies for optimized Machine Learning (ML).
- **What Developers Will Learn**:
  - Setting up an optimized environment for Intel®-accelerated ML.
  - Training NLP chatbots for intent classification and named entity recognition.
  - Leveraging Intel's hardware acceleration for efficient model training and inference (see the sketch just below this list).
  - Constructing chatbots that deliver fast and precise responses to customer queries.
  - Hands-on experience with the [Intel® oneAPI AI Analytics Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html) and PyTorch*.
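
As a small preview of the Intel-specific piece of that list, the sketch below shows the general `ipex.optimize()` pattern used to prepare a PyTorch* model for Intel®-accelerated training and inference. The model here is a placeholder, not the chatbot model from this repository, so treat it as an illustration of the pattern rather than this workflow's code.

```python
# Minimal sketch of the Intel® Extension for PyTorch* optimize() pattern.
# The model below is a placeholder, not the chatbot model used in this repository.
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training: ipex.optimize returns a model/optimizer pair tuned for Intel® CPUs.
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer)

# Inference: switch to eval mode and run under no_grad as usual.
model.eval()
with torch.no_grad():
    logits = model(torch.randn(8, 64))
```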

This workflow equips developers with the knowledge and tools to create high-performance AI-Powered Customer Care Chatbots, enhancing customer service across various industries.

>For more details, visit the [AI-Powered Customer Care Chatbots](https://github.com/oneapi-src/customer-chatbot) GitHub repository.

## Solution Technical Details

In this section, we describe the code base and how to replicate the results. The included code demonstrates a complete framework for:

1. Setting up a virtual environment for Intel®-accelerated ML
2. Training an NLP AI-Powered Customer Care Chatbot for intent classification and named entity recognition using PyTorch*/[Intel® Extension for PyTorch*](https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html)
3. Predicting from the trained model on new data using PyTorch*/[Intel® Extension for PyTorch*](https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html)

### **Use Case E2E flow**

![Use_case_flow](assets/conversationai-e2e-flow.PNG)

#### *Intel® Extension for PyTorch**

The [Intel® Extension for PyTorch*](https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html) extends PyTorch* with optimizations for an extra performance boost on Intel® hardware. Most of the optimizations will eventually be included in stock PyTorch* releases; the intention of the extension is to deliver up-to-date features and optimizations for PyTorch* on Intel® hardware. Examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).

#### *Intel® Neural Compressor*

[Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) (INC) is an open-source Python* library designed to help you quickly deploy low-precision inference solutions on popular deep-learning frameworks such as TensorFlow*, PyTorch*, MXNet*, and the ONNX* (Open Neural Network Exchange) runtime. The tool automatically optimizes low-precision recipes for deep-learning models to achieve optimal product objectives, such as inference performance and memory usage, while meeting expected accuracy criteria.

## Validated Hardware Details

There are workflow-specific hardware and software setup requirements depending on how the workflow is run.

| Recommended Hardware                                             | Precision  |
| ---------------------------------------------------------------- | ---------- |
| CPU: Intel® 2nd Gen Xeon® Platinum 8280 CPU @ 2.70GHz or higher  | FP32, INT8 |
| RAM: 187 GB                                                      |            |
| Recommended Free Disk Space: 20 GB or more                       |            |

#### Minimal Requirements

* RAM: 16 GB total memory
* CPUs: 4
* Storage: 20 GB
* Operating system: Ubuntu\* 22.04 LTS

## How it Works

Intel® oneAPI is used to accelerate results for critical low-latency applications. It provides the capability to reuse code across different languages so that hardware utilization is optimized to deliver these results.

To reproduce the results in this repository, we describe the following tasks:

1. How to create an execution environment which utilizes Intel® versions of libraries
2. How to run the code to benchmark model training
3. How to run the code to benchmark model inference
4. How to quantize trained models using INC
5. How to benchmark concurrency

## Get Started

Start by **defining an environment variable** that will store the workspace path. This can be an existing directory or one to be created in further steps. This environment variable will be used for all the commands executed using absolute paths.

[//]: # (capture: baremetal)

```bash
export WORKSPACE=$PWD/customer-chatbot
```

Set the following environment variables:

[//]: # (capture: baremetal)

```bash
export DATA_DIR=$WORKSPACE/data
export OUTPUT_DIR=$WORKSPACE/output
export CONFIG_DIR=$WORKSPACE/config
```

### Download the Workflow Repository

Create a working directory for the workflow and clone the [Main Repository](https://github.com/oneapi-src/frameworks.ai.platform.sample-apps.customer-chatbot) into your working directory.

```bash
mkdir -p $WORKSPACE && cd $WORKSPACE
git clone https://github.com/oneapi-src/customer-chatbot.git $WORKSPACE
```

Create the following directories:

[//]: # (capture: baremetal)

```bash
mkdir -p $OUTPUT_DIR/saved_models/ $DATA_DIR/atis-2/ $OUTPUT_DIR/logs
```

### Set Up Conda*

1. Download the appropriate Miniconda Installer for Linux.

   ```bash
   wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
   ```

2. In your terminal window, run:

   ```bash
   bash Miniconda3-latest-Linux-x86_64.sh
   ```

3. Delete the downloaded file.

   ```bash
   rm Miniconda3-latest-Linux-x86_64.sh
   ```

To learn more about Conda* installation, see the [Conda* Linux installation instructions](https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html).

### Set Up Environment

Before creating the environments, if you don't already have Anaconda*, install and set up Anaconda* for Linux following this [link](https://www.anaconda.com/products/distribution).

Install and set the libmamba solver as the default solver. Run the following commands:

```bash
# If you want to set libmamba as conda's default solver for the base
# environment, run the following two lines; otherwise, skip them.
# Newer versions of Anaconda ship with libmamba already installed, and it
# becomes the default solver as of September 2023.
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
```

The `$WORKSPACE/env/intel_env.yml` file contains all the dependencies needed to create the Intel environment necessary for running the workflow. Execute the next command to create the Conda* environment.

```bash
conda env create -f $WORKSPACE/env/intel_env.yml
conda activate customer_chatbot_intel
```

Environment setup is required only once. This step does not clean up an existing environment with the same name, so make sure there is no Conda* environment with the same name before running it. During this setup, the `customer_chatbot_intel` Conda* environment will be created with the dependencies listed in the YAML configuration.
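
Optionally, you can sanity-check the new environment before continuing. The short snippet below only assumes that the environment created from `intel_env.yml` provides importable `torch` and `intel_extension_for_pytorch` packages; run it from a `python` prompt inside the activated `customer_chatbot_intel` environment.

```python
# Quick sanity check: run inside the activated customer_chatbot_intel environment.
import torch
import intel_extension_for_pytorch as ipex

print("PyTorch* version:", torch.__version__)
print("Intel® Extension for PyTorch* version:", ipex.__version__)
```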

#### For Concurrency Benchmarking

Running the concurrency benchmarks requires installing additional dependencies.

Apache* Utils will also be needed:

```bash
sudo apt-get install apache2-utils git
```

Model Archiver will be used to produce `.mar` files (these files can then be redistributed and served by anyone using TorchServe*):

[//]: # (capture: baremetal)

```bash
python -m pip install torch-model-archiver captum
```

You then need to clone the TorchServe* repo:

[//]: # (capture: baremetal)

```bash
export TORCH_SERVE_DIR=$WORKSPACE/src/concurrency_benchmarking/serve
git clone https://github.com/pytorch/serve.git --branch v0.9.0 $TORCH_SERVE_DIR
```

Once the repo has been cloned, follow the next steps or the steps described at [Quick start with TorchServe*](https://github.com/pytorch/serve#-quick-start-with-torchserve):

[//]: # (capture: baremetal)

```bash
cd $TORCH_SERVE_DIR
python ./ts_scripts/install_dependencies.py
python -m pip install torch==2.1.1 torchserve==0.9.0 torch-model-archiver==0.9.0 torch-workflow-archiver==0.2.11 click-config-file==0.6.0
```

After installing TorchServe*, Apache* Bench is needed in order to run the [benchmarks](https://github.com/pytorch/serve/tree/master/benchmarks#benchmarking-with-apache-bench). Follow the next instructions to install the pip dependencies:

[//]: # (capture: baremetal)

```bash
cd $TORCH_SERVE_DIR/benchmarks/
python -m pip install -r requirements-ab.txt
```

### Download the Dataset

The dataset used for this demo is the commonly used Airline Travel Information Systems (ATIS) dataset, which consists of ~5000 utterances of customer requests for flight-related details. Each of these utterances is annotated with the intent of the query and the entities involved within the query. For example, the phrase

> I want to fly from Baltimore to Dallas round trip.

would be classified with the intent of `atis_flight`, corresponding to a flight reservation, and the entities would be `Baltimore (fromloc.city_name)`, `Dallas (toloc.city_name)`, and `round_trip (round_trip)`.

Preprocessing code and data for this repository were originally sourced from https://github.com/sz128/slot_filling_and_intent_detection_of_SLU/tree/master/data/atis-2.

> *Please see this data set's applicable license for terms and conditions. Intel does not own the rights to this data set and does not confer any rights to it.*

The benchmarking scripts expect all of the data files to be present in the `data/atis-2/` directory. Create the `atis-2/` directory if it is not present in `$DATA_DIR`.

[//]: # (capture: baremetal)

```bash
mkdir -p $DATA_DIR/atis-2/
```

To set up the data for benchmarking under these requirements, do the following:

1. Download all of the files from https://github.com/sz128/slot_filling_and_intent_detection_of_SLU/tree/master/data/atis-2 and save them into the `atis-2` directory.

   > *Please see this data set's applicable license for terms and conditions. Intel does not own the rights to this data set and does not confer any rights to it.*

   [//]: # (capture: baremetal)

   ```
   cd $DATA_DIR/atis-2/
   wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/train
   wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/test
   wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/valid
   wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/vocab.intent
   wget https://raw.githubusercontent.com/sz128/slot_filling_and_intent_detection_of_SLU/master/data/atis-2/vocab.slot
   ```

2. Combine the `atis-2/train` and `atis-2/valid` files into one called `atis-2/train_all`. In Linux, this can be done from the current directory using:

   [//]: # (capture: baremetal)

   ```shell
   cat train valid > train_all
   cd $WORKSPACE
   ```

## Supported Runtime Environment

You can execute the reference pipelines using the following environments:

* Bare Metal
* Jupyter Notebook

---

### Run Using Bare Metal

Follow these instructions to set up and run this workflow on your own development system.

#### Set Up System Software

Our examples use the ``conda`` package and environment manager on your local computer. If you don't already have ``conda`` installed, go to [Set Up Conda*](#set-up-conda) or see the [Conda* Linux installation instructions](https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html).

#### Run Workflow

To run the benchmarks on a selected configuration, the corresponding environment needs to be set up and activated. For example, to benchmark the model training with ***Intel® oneAPI technologies***, the environment `customer_chatbot_intel` should be activated using:

```bash
conda activate customer_chatbot_intel
```

##### **Running the Benchmarks for Training**

Benchmarking for training can be done using the python script `run_training.py`. The script *reads and preprocesses the data*, *trains a joint classification and entity recognition model*, and *predicts on unseen test data* using the trained model, while also reporting the execution time for these 3 steps. ***Optionally, the script can also save the trained model weights, which is necessary to run the inference benchmarks***.

The run benchmark script takes the following arguments:

```shell
usage: run_training.py [-h] [-l LOGFILE] [-s SAVE_MODEL_DIR] -d DATASET_DIR [--save_onnx]

optional arguments:
  -h, --help            show this help message and exit
  -l LOGFILE, --logfile LOGFILE
                        log file to output benchmarking results to
  -s SAVE_MODEL_DIR, --save_model_dir SAVE_MODEL_DIR
                        directory to save model under
  -d DATASET_DIR, --dataset_dir DATASET_DIR
                        directory to dataset
  --save_onnx           also export an ONNX model
```

Execute the `run_training.py` script as follows:

[//]: # (capture: baremetal)

```
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_training.py --logfile $OUTPUT_DIR/logs/intel_train.log -s $OUTPUT_DIR/saved_models/intel -d $DATA_DIR/atis-2/
```

The saved model weights are independent of the technology used. The model is trained using a Bidirectional Encoder Representations from Transformers (BERT) pretrained model with sequence_length = 64, batch_size = 20, epochs = 3. These can be changed within the script.
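
The exact model definition lives in `src/run_training.py`; purely as an illustration of what a joint intent-classification and entity-tagging head on top of a BERT encoder can look like, here is a hedged sketch using Hugging Face `transformers` (an assumed dependency for this illustration only, not necessarily the one pinned in `intel_env.yml`):

```python
# Illustrative sketch only -- the real model is defined in src/run_training.py.
# Assumes the Hugging Face `transformers` package; names here are hypothetical.
import torch.nn as nn
from transformers import BertModel


class JointIntentSlotModel(nn.Module):
    """BERT encoder with two heads: intent classification and slot (entity) tagging."""

    def __init__(self, num_intents: int, num_slots: int, pretrained: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)  # one label per utterance
        self.slot_head = nn.Linear(hidden, num_slots)      # one label per token

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)    # [batch, num_intents]
        slot_logits = self.slot_head(out.last_hidden_state)    # [batch, seq_len, num_slots]
        return intent_logits, slot_logits
```

A model of this shape produces one intent prediction per utterance and one slot label per token, which corresponds to the two accuracy figures (CLS and NER) reported in the training log shown later in this README.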

**Note:** [Intel® Extension for PyTorch*](https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html) contains many environment-specific configuration parameters which can be set using the included CPU launcher tool. Further details can be found at https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/performance_tuning/launch_script.html. While the above command sets many parameters automatically, for our specific environment (D4v5), we benchmark with the following command.

[//]: # (capture: baremetal)

```shell
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_training.py --logfile $OUTPUT_DIR/logs/intel_train.log -s $OUTPUT_DIR/saved_models/intel -d $DATA_DIR/atis-2/
```

##### **Running the Benchmarks for Inference**

Benchmarking for inference for PyTorch* (.pt) models can be done using the python script `run_inference.py`.

`run_inference.py`: runs inference benchmarks using models optimized by [Intel® Extension for PyTorch*](https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html).

The `run_inference.py` script takes the following arguments:

```shell
usage: run_inference.py [-h] -s SAVED_MODEL_DIR [--is_jit] [--is_inc_int8] [-b BATCH_SIZE] -d DATASET_DIR [-l LENGTH] [--logfile LOGFILE] [-n N_RUNS]

optional arguments:
  -h, --help            show this help message and exit
  -s SAVED_MODEL_DIR, --saved_model_dir SAVED_MODEL_DIR
                        directory of saved model to benchmark.
  --is_jit              if the model is torchscript. defaults to False.
  --is_inc_int8         saved model dir is a quantized int8 model. defaults to False.
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        batch size to use. defaults to 200.
  -d DATASET_DIR, --dataset_dir DATASET_DIR
                        directory to dataset
  -l LENGTH, --length LENGTH
                        sequence length to use. defaults to 512.
  --logfile LOGFILE     logfile to use.
  -n N_RUNS, --n_runs N_RUNS
                        number of trials to test. defaults to 100.
```

As attention-based models are independent of the sequence length, we can test on different sequence lengths without introducing new parameters. The script runs `n` times and prints the average time taken to call predict on a batch of size `b` with sequence length `l`.

To run benchmarks on the oneAPI PyTorch* execution engine, use:

[//]: # (capture: baremetal)

```shell
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 200 --length 512 --n_runs 5 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
```

**Note:** [Intel® Extension for PyTorch*](https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html) contains many environment-specific configuration parameters which can be set using the included CPU launcher tool. Further details can be found at https://intel.github.io/intel-extension-for-pytorch/1.11.200/tutorials/performance_tuning/launch_script.html. While the above command sets many parameters automatically, for our specific environment (D4v5), we benchmark with the following commands.

[//]: # (capture: baremetal)

```shell
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 200 --length 512 --n_runs 5 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
OMP_NUM_THREADS=4 KMP_BLOCKTIME=50 python -m intel_extension_for_pytorch.cpu.launch --disable_numactl $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel --batch_size 1 --length 512 --n_runs 1000 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/
```

##### **Quantization**

Quantization is the practice of converting the FP32 weights in deep neural networks to a lower precision, such as INT8, in order **to accelerate computation time and reduce the storage space of trained models**. This may be useful if latency and throughput are critical. Intel® offers multiple algorithms and packages for quantizing trained models. In this repo, we include scripts to quantize the AI Chatbot model using [Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html).

##### Intel® Neural Compressor Quantization

A trained model from the `run_training.py` script above can be quantized using [Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html) through the `run_quantize_inc.py` script. This converts the model from FP32 to INT8 while trying to maintain a level of accuracy specified via a `config.yaml` file. A simple `config.yaml` has been provided for basic accuracy-aware quantization, though several further options exist and can be explored in the link above.

```shell
usage: run_quantize_inc.py [-h] -s SAVED_MODEL -o OUTPUT_DIR [-l LENGTH] [-q QUANT_SAMPLES] -c INC_CONFIG -d DATASET_DIR

optional arguments:
  -h, --help            show this help message and exit
  -s SAVED_MODEL, --saved_model SAVED_MODEL
                        saved pytorch (.pt) model to quantize.
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        directory to save quantized model to.
  -l LENGTH, --length LENGTH
                        sequence length to use. defaults to 512.
  -q QUANT_SAMPLES, --quant_samples QUANT_SAMPLES
                        number of samples to use for quantization. defaults to 100.
  -c INC_CONFIG, --inc_config INC_CONFIG
                        INC conf yaml.
  -d DATASET_DIR, --dataset_dir DATASET_DIR
                        directory to dataset
```
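
For context on what `run_quantize_inc.py` does conceptually, the sketch below shows a YAML-driven, accuracy-aware post-training quantization flow with Intel® Neural Compressor. The API names follow the INC 1.x `experimental` interface and the model, dataloader, and paths are placeholders; the INC version pinned in `intel_env.yml` may expose a different interface, so treat this as an assumption-laden illustration rather than the script's actual code.

```python
# Conceptual sketch of accuracy-aware INT8 quantization with Intel® Neural Compressor.
# API names assume the INC 1.x "experimental" interface; the version pinned in
# intel_env.yml may differ. See src/run_quantize_inc.py for the real implementation.
from neural_compressor.experimental import Quantization, common


def quantize_fp32_model(fp32_model, calib_dataloader, inc_config_yaml, output_dir):
    """Quantize a trained FP32 PyTorch* model to INT8 under the accuracy criteria
    defined in the YAML config, then save the tuned model into output_dir."""
    quantizer = Quantization(inc_config_yaml)       # tuning strategy + accuracy target from YAML
    quantizer.model = common.Model(fp32_model)      # wrap the framework model for INC
    quantizer.calib_dataloader = calib_dataloader   # a handful of samples used for calibration
    q_model = quantizer.fit()                       # tunes until the accuracy criteria are met
    q_model.save(output_dir)
    return q_model
```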

A workflow of "training -> INC quantization -> inference" benchmarking may look like:

[//]: # (capture: baremetal)

```shell
# run training, outputs as $OUTPUT_DIR/saved_models/intel/convai.pt
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_training.py -s $OUTPUT_DIR/saved_models/intel --logfile $OUTPUT_DIR/logs/intel_train.log -d $DATA_DIR/atis-2/

# quantize the trained model, outputs into the $OUTPUT_DIR/saved_models/intel_int8/best_model.pt directory
python $WORKSPACE/src/run_quantize_inc.py -s $OUTPUT_DIR/saved_models/intel/convai.pt -o $OUTPUT_DIR/saved_models/intel_int8/ -c $CONFIG_DIR/config.yml -d $DATA_DIR/atis-2/

# benchmark the non-quantized model using intel
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel/ -b 1 -n 1000 --logfile $OUTPUT_DIR/logs/intel_bench.log -d $DATA_DIR/atis-2/

# benchmark the quantized model using intel
python -m intel_extension_for_pytorch.cpu.launch $WORKSPACE/src/run_inference.py -s $OUTPUT_DIR/saved_models/intel_int8/ -b 1 -n 1000 --is_inc_int8 --logfile $OUTPUT_DIR/logs/intel_bench_quant.log -d $DATA_DIR/atis-2/
```

##### Concurrency

A critical aspect of good AI Chatbots is their ability to quickly respond to multiple independent customer queries. From a technical perspective, this is a question of how well these models can be run to handle concurrency on a single server. In order to benchmark this, we need to do the following:

1. Package trained/optimized models using torch-model-archiver
2. Deploy a trained model to use TorchServe*
3. Run the TorchServe* benchmarks using Apache* Bench
4. Collect the reports of the TorchServe* benchmark

##### **Preparing the model for benchmarking**

#### **1. Convert the model to TorchScript***

To use the trained models in TorchServe*, they first need to be converted to a TorchScript* model. To do this, use the `convert_jit.py` script:

```shell
usage: convert_jit.py [-h] -s SAVED_MODEL_DIR -o OUTPUT_MODEL [--is_inc_int8]

optional arguments:
  -h, --help            show this help message and exit
  -s SAVED_MODEL_DIR, --saved_model_dir SAVED_MODEL_DIR
                        directory of saved model to benchmark.
  -o OUTPUT_MODEL, --output_model OUTPUT_MODEL
                        saved torchscript (.pt) model
  -d DATASET_DIR, --dataset_dir DATASET_DIR
                        directory to dataset
  --is_inc_int8         saved model dir is a quantized int8 model. defaults to False.
```

If the model is not quantized using INC, and assuming the saved model is stored in the `$OUTPUT_DIR/saved_models/intel` directory, run:

[//]: # (capture: baremetal)

```shell
python $WORKSPACE/src/convert_jit.py -s $OUTPUT_DIR/saved_models/intel -o $OUTPUT_DIR/saved_models/intel/convai_jit.pt -d $DATA_DIR/atis-2/
```

which will convert the saved model into a TorchScript* model called `convai_jit.pt`.

If the model is quantized using INC, we need to specify the flag `--is_inc_int8` and then use:

[//]: # (capture: baremetal)

```shell
python $WORKSPACE/src/convert_jit.py -s $OUTPUT_DIR/saved_models/intel_int8 -o $OUTPUT_DIR/saved_models/intel_int8/convai_jit.pt --is_inc_int8 -d $DATA_DIR/atis-2/
```
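
For background, `convert_jit.py` ultimately produces a serialized TorchScript* module; the conversion conceptually follows the standard `torch.jit.trace` pattern sketched below (placeholder model and example input, not the script's actual code).

```python
# Conceptual sketch of TorchScript* export via tracing; see src/convert_jit.py for the real logic.
import torch

# Placeholder stand-in for the trained chatbot model loaded from the saved_models directory.
model = torch.nn.Sequential(torch.nn.Linear(64, 2))
model.eval()

example_input = torch.randn(1, 64)          # placeholder example batch with the expected shape
with torch.no_grad():
    traced = torch.jit.trace(model, example_input)

traced.save("convai_jit.pt")                # same file name the commands above produce
```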

#### **2. Package the TorchScript* model using torch-model-archiver**

After creating a TorchScript* model, the trained model needs to be packaged into a `.mar` file using torch-model-archiver. Assuming the serialized model is saved as `convai_jit.pt` in the current directory, a sample command to do this is:

[//]: # (capture: baremetal)

```shell
torch-model-archiver --model-name convai --export-path $OUTPUT_DIR/saved_models/intel --version 1.0 --serialized-file $OUTPUT_DIR/saved_models/intel/convai_jit.pt --handler $WORKSPACE/src/concurrency_benchmarking/custom_handler.py
```

Or, if working with the quantized model, use:

[//]: # (capture: baremetal)

```bash
torch-model-archiver --model-name convai --export-path $OUTPUT_DIR/saved_models/intel_int8 --version 1.0 --serialized-file $OUTPUT_DIR/saved_models/intel_int8/convai_jit.pt --handler $WORKSPACE/src/concurrency_benchmarking/custom_handler.py
```

This will create a file called `convai.mar` which can be used to deploy to TorchServe*.

### Benchmarking using the TorchServe* benchmarking script

To benchmark this model using the [TorchServe* benchmarking tools](https://github.com/pytorch/serve/tree/master/benchmarks#benchmarking-with-apache-bench):

1. Copy the `config.json` file and the `config.properties` file into the cloned `serve/benchmarks` directory:

   [//]: # (capture: baremetal)

   ```shell
   cp $CONFIG_DIR/config.properties $TORCH_SERVE_DIR/benchmarks/config.properties
   cp $CONFIG_DIR/config.json $TORCH_SERVE_DIR/benchmarks/config.json
   ```

2. Modify the [config.json](#configjson) and [config.properties](#configproperties) to point to the relevant files and the desired experimental parameters, e.g.:

   [//]: # (capture: baremetal)

   ```bash
   sed -i "s|file:///PATH_TO_MAR|file://${OUTPUT_DIR}/saved_models/intel/convai.mar|" $TORCH_SERVE_DIR/benchmarks/config.json
   sed -i "s|PATH_TO_INPUT_FILE|${WORKSPACE}/src/concurrency_benchmarking/input_data.json|" $TORCH_SERVE_DIR/benchmarks/config.json
   sed -i "s|PATH_TO_CONFIG_PROPERTIES|${WORKSPACE}/src/concurrency_benchmarking/serve/benchmarks/config.properties|" $TORCH_SERVE_DIR/benchmarks/config.json
   ```

   Or if using the quantized model:

   ```bash
   sed -i "s|file:///PATH_TO_MAR|file://${OUTPUT_DIR}/saved_models/intel_int8/convai.mar|" $TORCH_SERVE_DIR/benchmarks/config.json
   sed -i "s|PATH_TO_INPUT_FILE|${WORKSPACE}/src/concurrency_benchmarking/input_data.json|" $TORCH_SERVE_DIR/benchmarks/config.json
   sed -i "s|PATH_TO_CONFIG_PROPERTIES|${WORKSPACE}/src/concurrency_benchmarking/serve/benchmarks/config.properties|" $TORCH_SERVE_DIR/benchmarks/config.json
   ```

   We included a simple `input_data.json` file to provide a test input for running the benchmarks.

3. Run the benchmark using:

   [//]: # (capture: baremetal)

   ```shell
   PATH=$CONDA_PREFIX/bin/:$PATH python $TORCH_SERVE_DIR/benchmarks/benchmark-ab.py --config $TORCH_SERVE_DIR/benchmarks/config.json
   ```

The reports should be stored in the temporary directory `/tmp/benchmark`. Measurements for latency and throughput can be found in the file `/tmp/benchmark/ab_report.csv`.

#### config.json

The available fields for the `config.json` file, as an example, are:

```python
{'url': "file:///PATH_TO_MAR",
 'gpus': '',
 'exec_env': 'local',
 'batch_size': 1,
 'batch_delay': 200,
 'workers': 1,
 'concurrency': 10,
 'requests': 100,
 'input': 'PATH_TO_INPUT',
 'content_type': 'application/json',
 'image': '',
 'docker_runtime': '',
 'backend_profiling': False,
 'config_properties': 'PATH_TO_CONFIG_PROPERTIES',
 'inference_model_url': 'predictions/benchmark',
```

#### config.properties

The `config.properties` file adjusts the parameters for the TorchServe* server.
The two most important fields either enable or disable the Intel® Extension for PyTorch* optimizations:

```shell
ipex_enable=true
cpu_launcher_enable=true
```

#### Clean Up Bare Metal

Follow these steps to restore your ``$WORKSPACE`` directory to its initial state. Please note that all downloaded dataset files, the Conda* environment created, and the logs created by the workflow will be deleted. Back up your important files before executing the next steps.

```bash
conda deactivate
conda remove --name customer_chatbot_intel --all -y
```

```bash
rm -rf $OUTPUT_DIR/saved_models/ $DATA_DIR/atis-2/ $OUTPUT_DIR/logs $TORCH_SERVE_DIR
```

### Run using Jupyter Notebook

Follow the instructions described in [Get Started](#get-started) to set the required environment variables. Execute the [Set Up Conda*](#set-up-conda) and [Set Up Environment](#set-up-environment) steps.

To be able to run GettingStarted.ipynb, the Conda* environment must have additional packages installed:

```bash
conda activate customer_chatbot_intel
conda install -c intel nb_conda_kernels jupyter notebook -y
cd $WORKSPACE
jupyter notebook
```

Open Jupyter Notebook in a web browser, select GettingStarted.ipynb, and select `conda env:customer_chatbot_intel` as the Jupyter kernel. Now you can follow the notebook's instructions step by step.

#### Clean Up Jupyter Notebook

To clean up after the Jupyter Notebook run, follow the instructions described in [Clean Up Bare Metal](#clean-up-bare-metal).

### Expected Output

Training output is stored in the `$OUTPUT_DIR/logs` directory. You can see information on training time as well as training loss and accuracy per epoch. The final information should look similar to the following:

```bash
INFO - =======> Test Accuracy on NER : 0.94
INFO - =======> Test Accuracy on CLS : 0.91
INFO - =======> Training Time : 309.539 secs
INFO - =======> Inference Time : 5.648 secs
INFO - =======> Total Time: 315.187 secs
```

Benchmark results are stored in the `$OUTPUT_DIR/logs` directory. The log includes a progress bar of the benchmark progress followed by the average time per batch, like below:

```bash
INFO - Avg time per batch : 19.659 s
```

Quantization results are stored in the `$OUTPUT_DIR/logs` directory.
These include statistics of the quantized model's accuracy and latency compared to the baseline model, such as below:

```bash
[INFO] FP32 baseline is: [Accuracy: 0.9443, Duration (seconds): 15.7158]
[INFO] |******Mixed Precision Statistics******|
[INFO] +-----------------+----------+---------+
[INFO] |     Op Type     |  Total   |  INT8   |
[INFO] +-----------------+----------+---------+
[INFO] |    Embedding    |    3     |    3    |
[INFO] |     Linear      |    75    |   75    |
[INFO] +-----------------+----------+---------+
[INFO] Pass quantize model elapsed time: 1495.84 ms
[INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.9302|0.9443, Duration (seconds) (int8|fp32): 7.5332|15.7158], Best tune result is: [Accuracy: 0.9302, Duration (seconds): 7.5332]
[INFO] |**********************Tune Result Statistics**********************|
[INFO] +--------------------+----------+---------------+------------------+
[INFO] |     Info Type      | Baseline | Tune 1 result | Best tune result |
[INFO] +--------------------+----------+---------------+------------------+
[INFO] |      Accuracy      | 0.9443   |    0.9302     |      0.9302      |
[INFO] | Duration (seconds) | 15.7158  |    7.5332     |      7.5332      |
[INFO] +--------------------+----------+---------------+------------------+
```

## Summary and Next Steps

In this example, we focus on leveraging the [Intel® oneAPI AI Analytics Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html) for the task of training and deploying an accurate AI system to predict the Intent and Entities of a user query. Using Intel® technologies can result in more efficient model experimentation and more robust deployed AI solutions, even when using state-of-the-art Deep Learning based NLP models.

## Learn More

For more information about this workflow or to read about other relevant workflow examples, see these guides and software resources:

- [PyTorch*](https://pytorch.org/get-started/locally/)
- [TorchServe* benchmarking tools](https://github.com/pytorch/serve/tree/master/benchmarks#benchmarking-with-apache-bench)
- [Conda* Linux installation instructions](https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html)
- [Intel® AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html)
- [Intel® oneAPI AI Analytics Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html)
- [Intel® Extension for PyTorch*](https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html)
- [Intel® Neural Compressor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/neural-compressor.html)
- [Intel® Distribution for Python*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html)

## Support

If you have questions or issues about this use case, want help with troubleshooting, want to report a bug, or want to submit enhancement requests, please submit a GitHub issue.

## Appendix

\*Other names and brands that may be claimed as the property of others. [Trademarks](https://www.intel.com/content/www/us/en/legal/trademarks.html).

To the extent that any public or non-Intel datasets or models are referenced by or accessed using tools or code on this site, those datasets or models are provided by the third party indicated as the content source. Intel does not create the content and does not warrant its accuracy or quality.
By accessing the public content, or using materials trained on or with such content, you agree to the terms associated with that content and that your use complies with the applicable license. Intel expressly disclaims the accuracy, adequacy, or completeness of any such public content, and is not liable for any errors, omissions, or defects in the content, or for any reliance on the content. Intel is not liable for any liability or damages relating to your use of public content.