```
You can then import and use the converter in Python.
```python
import os
from isolationforestonnx.isolation_forest_converter import IsolationForestConverter
# Same path used in the Scala example above that saves the model.
path = '/user/testuser/isolationForestWriteTest'
# Get the model data file path (assumes the directory's first entry is the Avro model file)
data_dir_path = os.path.join(path, 'data')
avro_model_file = os.listdir(data_dir_path)
model_file_path = os.path.join(data_dir_path, avro_model_file[0])
# Get the model metadata file path (same assumption for the metadata directory)
metadata_dir_path = os.path.join(path, 'metadata')
metadata_file = os.listdir(metadata_dir_path)
metadata_file_path = os.path.join(metadata_dir_path, metadata_file[0])
# Convert the model to ONNX format (this will return the ONNX model in memory)
converter = IsolationForestConverter(model_file_path, metadata_file_path)
onnx_model = converter.convert()
# Convert and save the model in ONNX format (this will save the ONNX model to disk)
onnx_model_path = '/user/testuser/isolationForestWriteTest.onnx'
converter.convert_and_save(onnx_model_path)
```
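The `os.listdir(...)[0]` lookups above take whichever entry the directory listing returns first. Spark-written directories often also contain marker or checksum files such as `_SUCCESS` or `.part-00000.crc`; whether they are present depends on how the model was saved and copied. A minimal sketch of a more defensive lookup follows. `find_single_part_file` is a hypothetical helper (not part of `isolationforestonnx`), and the file-naming convention it relies on is an assumption:

```python
import os


def find_single_part_file(dir_path):
    """Return the path of the only data file in dir_path.

    Hypothetical helper: assumes marker and checksum files start with
    '_' or '.', as Spark/Hadoop output typically does.
    """
    candidates = sorted(
        name for name in os.listdir(dir_path)
        if not name.startswith(('_', '.'))
        and os.path.isfile(os.path.join(dir_path, name))
    )
    if len(candidates) != 1:
        raise ValueError(
            f'Expected exactly one file in {dir_path}, found {candidates}'
        )
    return os.path.join(dir_path, candidates[0])
```

Under these assumptions, `find_single_part_file(path + '/data')` could stand in for the `os.listdir` lookup of the model file, and likewise for the metadata directory.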
### Using the ONNX model for inference (example in Python)
```python
import numpy as np
import onnx
from onnxruntime import InferenceSession
# `onnx_model_path` is the same path used above in the convert-and-save step
onnx_model_path = '/user/testuser/isolationForestWriteTest.onnx'
dataset_path = 'isolation-forest-onnx/test/resources/shuttle.csv'
num_examples_to_print = 10
# Load data
input_data = np.loadtxt(dataset_path, delimiter=',')
num_features = input_data.shape[1] - 1
last_col_index = num_features
print(f'Number of features: {num_features}')
# The last column is the label column
input_dict = {'features': np.delete(input_data, last_col_index, 1).astype(dtype=np.float32)}
actual_labels = input_data[:, last_col_index]
# Load the ONNX model from local disk and do inference
onx = onnx.load(onnx_model_path)
sess = InferenceSession(onx.SerializeToString())
res = sess.run(None, input_dict)
# Print scores
actual_outlier_scores = res[0]
print('ONNX Converter outlier scores:')
print(np.transpose(actual_outlier_scores[:num_examples_to_print])[0])
```
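The scores can also be checked against the label column, for example by computing AUROC. Below is a minimal NumPy-only sketch using the rank-sum (Mann-Whitney) formulation. `auroc` is a hypothetical helper, and it assumes that a label of 1 marks an outlier and that a higher score means "more anomalous":

```python
import numpy as np


def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) statistic.

    Assumes labels are 0/1 with 1 marking outliers, and that a higher
    score means "more anomalous". Tied scores receive their average rank.
    """
    scores = np.asarray(scores, dtype=np.float64).ravel()
    is_outlier = np.asarray(labels).ravel() == 1
    n_pos = int(is_outlier.sum())
    n_neg = scores.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError('AUROC needs at least one outlier and one inlier')

    # Average 1-based rank of each score, with ties sharing a rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    avg_rank = (last_rank - counts + 1 + last_rank) / 2.0
    ranks = avg_rank[inverse]

    u_stat = ranks[is_outlier].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Toy example with made-up scores and labels.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

With the variables from the inference example, this would be called as `auroc(actual_outlier_scores, actual_labels)`.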
## Performance and benchmarks
We benchmarked the standard Isolation Forest (`StandardIF`), the Extended Isolation Forest at extension
level 0 (`ExtendedIF_0`), and the fully extended variant (`ExtendedIF_max`) against the results in the
Liu et al. 2008 paper and the
[reference Python EIF implementation](https://github.com/sahandha/eif) (`Ref. Python`). All runs
use 100 trees and 256 samples per tree. Each value is the mean ± standard error of the mean over
10 trials with unique random seeds. The `Ref. Python` columns show EIF results at the corresponding
extension level, not standard IF. A `-` means no value is reported for that cell.
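
For reference, the mean ± standard error of the mean over trials can be computed as in the sketch below. The per-trial values are made-up placeholders, not benchmark results, and the use of the sample standard deviation (`ddof=1`) is an assumption about how the reported numbers were computed:

```python
import numpy as np


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for per-trial metric values."""
    values = np.asarray(values, dtype=np.float64)
    # SEM = sample standard deviation / sqrt(number of trials)
    sem = values.std(ddof=1) / np.sqrt(values.size)
    return values.mean(), sem


# Placeholder AUROC values for 10 hypothetical trials.
trial_aurocs = [0.81, 0.82, 0.80, 0.83, 0.81, 0.82, 0.80, 0.81, 0.82, 0.81]
mean, sem = mean_and_sem(trial_aurocs)
print(f'{mean:.3f} ± {sem:.3f}')
```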
| Dataset | Dim | Model | AUROC | AUPRC | Liu et al. AUROC (IF) | Ref. Python AUROC (EIF) | Ref. Python AUPRC (EIF) |
|---|--:|---|---|---|--:|---|---|
| [Annthyroid](http://odds.cs.stonybrook.edu/annthyroid-dataset/) | 6 | StandardIF | 0.813 ± 0.004 | 0.312 ± 0.004 | 0.82 | - | - |
| | | ExtendedIF_0 | 0.813 ± 0.004 | 0.307 ± 0.004 | - | 0.822 ± 0.004 | 0.314 ± 0.007 |
| | | ExtendedIF_max | 0.646 ± 0.002 | 0.1791 ± 0.0017 | - | 0.651 ± 0.003 | 0.183 ± 0.005 |
| [Arrhythmia](http://odds.cs.stonybrook.edu/arrhythmia-dataset/) | 274 | StandardIF | 0.8064 ± 0.0019 | 0.494 ± 0.006 | 0.80 | - | - |
| | | ExtendedIF_0 | 0.802 ± 0.002 | 0.478 ± 0.004 | - | 0.796 ± 0.004 | 0.462 ± 0.005 |
| | | ExtendedIF_max | 0.810 ± 0.004 | 0.495 ± 0.005 | - | 0.803 ± 0.003 | 0.490 ± 0.004 |
| [Breastw](http://odds.cs.stonybrook.edu/breast-cancer-wisconsin-original-dataset/) | 9 | StandardIF | 0.9864 ± 0.0003 | 0.9684 ± 0.0008 | 0.99 | - | - |
| | | ExtendedIF_0 | 0.9878 ± 0.0003 | 0.9726 ± 0.0008 | - | 0.9873 ± 0.0005 | 0.9704 ± 0.0016 |
| | | ExtendedIF_max | 0.9835 ± 0.0004 | 0.9569 ± 0.0015 | - | 0.9841 ± 0.0006 | 0.959 ± 0.002 |
| [Cardio](http://odds.cs.stonybrook.edu/cardiotocography-dataset/) | 21 | StandardIF | 0.928 ± 0.002 | 0.565 ± 0.008 | - | - | - |
| | | ExtendedIF_0 | 0.921 ± 0.002 | 0.553 ± 0.009 | - | 0.918 ± 0.003 | 0.546 ± 0.013 |
| | | ExtendedIF_max | 0.933 ± 0.002 | 0.541 ± 0.006 | - | 0.931 ± 0.002 | 0.547 ± 0.009 |
| [ForestCover](http://odds.cs.stonybrook.edu/forestcovercovertype-dataset/) | 10 | StandardIF | 0.882 ± 0.006 | 0.051 ± 0.003 | 0.88 | - | - |
| | | ExtendedIF_0 | 0.865 ± 0.008 | 0.050 ± 0.005 | - | 0.872 ± 0.010 | 0.049 ± 0.004 |
| | | ExtendedIF_max | 0.688 ± 0.008 | 0.0138 ± 0.0003 | - | 0.662 ± 0.009 | 0.0129 ± 0.0004 |
| [Http (KDDCUP99)](http://odds.cs.stonybrook.edu/http-kddcup99-dataset/) | 3 | StandardIF | 0.99970 ± 0.00010 | 0.93 ± 0.02 | 1.00 | - | - |
| | | ExtendedIF_0 | 0.99410 ± 0.00010 | 0.392 ± 0.004 | - | 0.99390 ± 0.00010 | 0.379 ± 0.004 |
| | | ExtendedIF_max | 0.99410 ± 0.00010 | 0.379 ± 0.006 | - | 0.9939 ± 0.0003 | 0.371 ± 0.009 |
| [Ionosphere](http://odds.cs.stonybrook.edu/ionosphere-dataset/) | 33 | StandardIF | 0.8443 ± 0.0002 | 0.8014 ± 0.0003 | 0.85 | - | - |
| | | ExtendedIF_0 | 0.8568 ± 0.0006 | 0.8108 ± 0.0007 | - | 0.8556 ± 0.0016 | 0.808 ± 0.002 |
| | | ExtendedIF_max | 0.9075 ± 0.0002 | 0.8804 ± 0.0002 | - | 0.9061 ± 0.0014 | 0.876 ± 0.002 |
| [Mammography](http://odds.cs.stonybrook.edu/mammography-dataset/) | 6 | StandardIF | 0.8649 ± 0.0015 | 0.218 ± 0.007 | 0.86 | - | - |
| | | ExtendedIF_0 | 0.865 ± 0.002 | 0.220 ± 0.006 | - | 0.868 ± 0.002 | 0.229 ± 0.013 |
| | | ExtendedIF_max | 0.8630 ± 0.0010 | 0.190 ± 0.003 | - | 0.8639 ± 0.0016 | 0.184 ± 0.004 |
| [Mulcross](https://www.openml.org/d/40897) | 4 | StandardIF | 0.9910 ± 0.0009 | 0.852 ± 0.014 | 0.97 | - | - |
| | | ExtendedIF_0 | 0.938 ± 0.002 | 0.428 ± 0.009 | - | 0.960 ± 0.003 | 0.538 ± 0.017 |
| | | ExtendedIF_max | 0.940 ± 0.003 | 0.442 ± 0.011 | - | 0.941 ± 0.005 | 0.45 ± 0.02 |
| [Pima](http://odds.cs.stonybrook.edu/pima-indians-diabetes-dataset/) | 8 | StandardIF | 0.668 ± 0.004 | 0.490 ± 0.003 | 0.67 | - | - |
| | | ExtendedIF_0 | 0.667 ± 0.004 | 0.507 ± 0.004 | - | 0.675 ± 0.005 | 0.514 ± 0.005 |
| | | ExtendedIF_max | 0.644 ± 0.003 | 0.498 ± 0.002 | - | 0.640 ± 0.004 | 0.493 ± 0.004 |
| [Satellite](http://odds.cs.stonybrook.edu/satellite-dataset/) | 36 | StandardIF | 0.717 ± 0.008 | 0.672 ± 0.008 | 0.71 | - | - |
| | | ExtendedIF_0 | 0.715 ± 0.004 | 0.675 ± 0.003 | - | 0.700 ± 0.004 | 0.664 ± 0.006 |
| | | ExtendedIF_max | 0.725 ± 0.003 | 0.704 ± 0.004 | - | 0.740 ± 0.005 | 0.711 ± 0.005 |
| [Shuttle](http://odds.cs.stonybrook.edu/shuttle-dataset/) | 9 | StandardIF | 0.9971 ± 0.0002 | 0.9742 ± 0.0017 | 1.00 | - | - |
| | | ExtendedIF_0 | 0.9974 ± 0.0002 | 0.9789 ± 0.0014 | - | 0.99750 ± 0.00010 | 0.9805 ± 0.0010 |
| | | ExtendedIF_max | 0.9934 ± 0.0002 | 0.822 ± 0.004 | - | 0.9932 ± 0.0002 | 0.818 ± 0.003 |
| [Smtp (KDDCUP99)](http://odds.cs.stonybrook.edu/smtp-kddcup99-dataset/) | 3 | StandardIF | 0.9099 ± 0.0014 | 0.00450 ± 0.00010 | 0.88 | - | - |
| | | ExtendedIF_0 | 0.896 ± 0.002 | 0.00400 ± 0.00010 | - | 0.897 ± 0.002 | 0.00410 ± 0.00010 |
| | | ExtendedIF_max | 0.858 ± 0.003 | 0.0098 ± 0.0011 | - | 0.857 ± 0.003 | 0.014 ± 0.003 |
**Key observations:**
* **StandardIF** results agree with the original Liu et al. paper to within about 0.03 AUROC on the 12 datasets for which a published value is listed.
* **ExtendedIF_max closely matches the reference Python EIF** across all 13 datasets.
* **EIF improves on high-dimensional datasets**, e.g., **ionosphere** (ExtendedIF_max vs StandardIF: AUROC 0.907 vs 0.844, AUPRC 0.880 vs 0.801) and **satellite**
(AUROC 0.725 vs 0.717, AUPRC 0.704 vs 0.672). It underperforms standard IF on others, e.g., **annthyroid** (AUROC 0.646 vs 0.813) and **forestcover** (AUROC 0.688 vs 0.882).
* **ExtendedIF_0 is not equivalent to StandardIF.** Both use axis-aligned splits, but standard IF
retries on constant features while EIF does not (matching the reference Python and C++
implementations). ExtendedIF_0 closely matches the reference Python EIF on all datasets, with the largest AUROC gap on mulcross (0.938 vs 0.960).
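
The constant-feature difference in the last observation can be illustrated with a toy sketch. This is not the library's code: `pick_feature_standard` and `pick_feature_eif0` are hypothetical helpers that only mimic the described behavior when choosing the split feature for a single tree node.

```python
import numpy as np


def pick_feature_standard(node_data, rng):
    """Toy standard-IF choice: try features in random order until one varies."""
    for feature in rng.permutation(node_data.shape[1]):
        if node_data[:, feature].min() < node_data[:, feature].max():
            return int(feature)
    return None  # every feature is constant, so the node cannot be split


def pick_feature_eif0(node_data, rng):
    """Toy extension-level-0 choice: draw one axis and never retry."""
    return int(rng.integers(node_data.shape[1]))


rng = np.random.default_rng(0)
# Feature 0 is constant within this node; feature 1 varies.
node_data = np.array([[5.0, 1.0], [5.0, 2.0], [5.0, 3.0]])

standard_picks = {pick_feature_standard(node_data, rng) for _ in range(200)}
eif0_picks = {pick_feature_eif0(node_data, rng) for _ in range(200)}
print(standard_picks)  # only the varying feature is ever chosen
print(eif0_picks)      # the constant feature can be chosen too
```

When the toy EIF variant lands on the constant feature, no threshold on that axis can separate the points, which is one way the two variants can end up building different trees.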
## Copyright and license
Copyright 2019 LinkedIn Corporation
All Rights Reserved.
Licensed under the BSD 2-Clause License (the "License").
See [License](LICENSE) in the project root for license information.
## Contributing
If you would like to contribute to this project, please review the instructions [here](CONTRIBUTING.md).
## Citing this project
If you use this library in your research or project, please cite it using the metadata in
[CITATION.cff](CITATION.cff), or use the following BibTeX entry:
```bibtex
@software{isolation_forest,
author = {Verbus, James},
title = {isolation-forest},
year = {2019},
url = {https://github.com/linkedin/isolation-forest},
license = {BSD-2-Clause}
}
```
## References
* F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation forest,” in 2008 Eighth IEEE International Conference on Data Mining, 2008, pp. 413–422.
* F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation-based anomaly detection,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 6, no. 1, p. 3, 2012.
* S. Hariri, M. Carrasco Kind, and R. J. Brunner, “Extended Isolation Forest,” IEEE Transactions on Knowledge and Data Engineering, 2019. [DOI:10.1109/TKDE.2019.2947676](https://doi.org/10.1109/TKDE.2019.2947676), [arXiv:1811.02141](https://arxiv.org/abs/1811.02141).
* S. Hariri, “eif: Extended Isolation Forest for Anomaly Detection,” [https://github.com/sahandha/eif](https://github.com/sahandha/eif).
* S. Rayana, “ODDS Library,” Stony Brook University, Department of Computer Science, 2016. [http://odds.cs.stonybrook.edu](http://odds.cs.stonybrook.edu).