# path_explain

**Repository Path**: coracoding/path_explain

## Basic Information

- **Project Name**: path_explain
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-03-24
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Path Explain

A repository for explaining feature importances and feature interactions in deep neural networks using path attribution methods.

This repository contains tools to interpret and explain machine learning models using [Integrated Gradients](https://arxiv.org/abs/1703.01365) and [Expected Gradients](https://arxiv.org/abs/1906.10670). In addition, it contains code to explain _interactions_ in deep networks using Integrated Hessians and Expected Hessians - methods that we introduced in our most recent paper: ["Explaining Explanations: Axiomatic Feature Interactions for Deep Networks"](https://arxiv.org/abs/2002.04138). If you use our work to explain your networks, please cite this paper.

```
@misc{janizek2020explaining,
    title={Explaining Explanations: Axiomatic Feature Interactions for Deep Networks},
    author={Joseph D. Janizek and Pascal Sturmfels and Su-In Lee},
    year={2020},
    eprint={2002.04138},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

This repository contains two important directories: the `path_explain` directory, which contains the package used to interpret and explain machine learning models, and the `examples` directory, which contains many examples of using the `path_explain` module to explain different models on different data types.

## Installation

The easiest way to install this package is using pip:

```
pip install path-explain
```

Alternatively, you can clone this repository to re-run and explore the examples provided.

## Compatibility

This package was written to support TensorFlow 2.0 (in eager execution mode) with Python 3. We have no current plans to support earlier versions of TensorFlow or Python.

## API

Although we don't yet have formal API documentation, the underlying code does a pretty good job of explaining the API. See the code for generating [attributions](https://github.com/suinleelab/path_explain/blob/master/path_explain/explainers/path_explainer_tf.py#L302) and [interactions](https://github.com/suinleelab/path_explain/blob/master/path_explain/explainers/path_explainer_tf.py#L445) to better understand what the arguments to these functions mean.

## Examples

For a simple, quick example to get started with this repository, see the `example_usage.ipynb` notebook in its top-level directory. It gives an overview of the functionality this repository provides. For more advanced examples, keep reading.

### Tabular Data using Expected Gradients and Expected Hessians

Our repository can easily be adapted to explain attributions and interactions learned on tabular data.

```python
# other import statements...
from path_explain import PathExplainerTF, scatter_plot, summary_plot

### Code to train a model would go here
x_train, y_train, x_test, y_test = dataset()
model = ...
model.fit(x_train, y_train, ...)
###

### Generating attributions using expected gradients
explainer = PathExplainerTF(model)
attributions = explainer.attributions(inputs=x_test,
                                      baseline=x_train,
                                      batch_size=100,
                                      num_samples=200,
                                      use_expectation=True,
                                      output_indices=0)
###

### Generating interactions using expected hessians
interactions = explainer.interactions(inputs=x_test,
                                      baseline=x_train,
                                      batch_size=100,
                                      num_samples=200,
                                      use_expectation=True,
                                      output_indices=0)
###
```
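The same explainer also computes plain Integrated Gradients if you pass a single baseline and set `use_expectation=False` (the DistilBERT example below does exactly this). Here is a minimal sketch, assuming an all-zeros baseline of the appropriate shape; the zero baseline and `float32` dtype are illustrative assumptions, not recommendations:

```python
import numpy as np

# Integrated Gradients: with use_expectation=False the explainer integrates
# along the straight-line path from one baseline, rather than averaging over
# baselines sampled from x_train as Expected Gradients does.
zero_baseline = np.zeros((1,) + x_test.shape[1:], dtype=np.float32)
ig_attributions = explainer.attributions(inputs=x_test,
                                         baseline=zero_baseline,
                                         batch_size=100,
                                         num_samples=200,
                                         use_expectation=False,
                                         output_indices=0)
```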
Once we've generated attributions and interactions, we can use the provided plotting modules to help visualize them. First we plot a summary of the top features and their attribution values:

```python
### First we need a list of strings denoting the name of each feature
feature_names = ...
###

summary_plot(attributions=attributions,
             feature_values=x_test,
             feature_names=feature_names,
             plot_top_k=10)
```

![Heart Disease Summary Plot](/images/heart_disease.png)

Second, we plot an interaction our model has learned between maximum achieved heart rate and gender:

```python
scatter_plot(attributions=attributions,
             feature_values=x_test,
             feature_index='max. achieved heart rate',
             interactions=interactions,
             color_by='is male',
             feature_names=feature_names,
             scale_y_ind=True)
```

![Interaction: Heart Rate and Gender](/images/max_heart_rate.png)

The model used to generate the above interactions is a two-layer neural network trained on the [UCI Heart Disease Dataset](https://archive.ics.uci.edu/ml/datasets/Heart+Disease). Interactions learned by this model were featured in our paper. To learn more about this particular model and the experimental setup, see [the notebook used to train and explain the model](https://github.com/suinleelab/path_explain/blob/master/examples/tabular/heart_disease/attributions.ipynb).

### Explaining an NLP model using Integrated Gradients and Integrated Hessians

As discussed in our paper, we can use Integrated Hessians to get interactions in language models. Here we explain a transformer from the [HuggingFace Transformers Repository](https://github.com/huggingface/transformers).

```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets

from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification, \
                         DistilBertConfig, glue_convert_examples_to_features, \
                         glue_processors

# This is a custom explainer to explain huggingface models
from path_explain import EmbeddingExplainerTF, text_plot, matrix_interaction_plot, bar_interaction_plot

num_labels = 2  # SST-2 is a binary sentiment classification task
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
config = DistilBertConfig.from_pretrained('distilbert-base-uncased', num_labels=num_labels)
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', config=config)

### Some custom code to fine-tune the model on a sentiment analysis task...
max_length = 128
data, info = tensorflow_datasets.load('glue/sst-2', with_info=True)
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length, 'sst-2')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, max_length, 'sst-2')
...
### We won't include the whole fine-tuning code. See the HuggingFace repository for more.

### Here we define functions that represent two pieces of the model:
### embedding and prediction
def embedding_model(batch_ids):
    batch_embedding = model.distilbert.embeddings(batch_ids)
    return batch_embedding

def prediction_model(batch_embedding):
    # Note: this isn't exactly the right way to use the attention mask.
    # It should actually indicate which words are real words. This
    # makes the coding easier however, and the output is fairly similar,
    # so it suffices for this tutorial.
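    # A more faithful mask (a sketch, not part of this tutorial) would be
    # built from the token ids so that padding tokens are masked out, and
    # passed into this function alongside the embeddings, e.g.:
    #     attention_mask = tf.cast(batch_ids != tokenizer.pad_token_id, tf.float32)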
    attention_mask = tf.ones(batch_embedding.shape[:2])
    attention_mask = tf.cast(attention_mask, dtype=tf.float32)
    head_mask = [None] * model.distilbert.num_hidden_layers

    transformer_output = model.distilbert.transformer([batch_embedding, attention_mask, head_mask],
                                                      training=False)[0]
    pooled_output = transformer_output[:, 0]
    pooled_output = model.pre_classifier(pooled_output)
    logits = model.classifier(pooled_output)
    return logits
###

### We need some data to explain
for batch in valid_dataset.take(1):
    batch_input = batch[0]

batch_ids = batch_input['input_ids']
batch_embedding = embedding_model(batch_ids)

baseline_ids = np.zeros((1, 128), dtype=np.int64)
baseline_embedding = embedding_model(baseline_ids)
###

### We are finally ready to explain our model
explainer = EmbeddingExplainerTF(prediction_model)

attributions = explainer.attributions(inputs=batch_embedding,
                                      baseline=baseline_embedding,
                                      batch_size=32,
                                      num_samples=256,
                                      use_expectation=False,
                                      output_indices=1)
###

### For interactions, the hessian is rather large so we use a very small batch size
interactions = explainer.interactions(inputs=batch_embedding,
                                      baseline=baseline_embedding,
                                      batch_size=1,
                                      num_samples=256,
                                      use_expectation=False,
                                      output_indices=1)
###
```

We can plot the learned attributions and interactions as follows. First we plot the attributions:

```python
### First we need to decode the tokens from the batch ids.
batch_sentences = ...
### Doing so will depend on how you tokenized your model!

text_plot(batch_sentences[0],
          attributions[0],
          include_legend=True)
```

![Showing feature attributions in text](/images/little_to_love_text.png)

Then we plot the interactions:

```python
bar_interaction_plot(interactions[0],
                     batch_sentences[0],
                     top_k=5)
```

![Showing feature interactions in text](/images/little_to_love_bar.png)

If you would rather see the full matrix of interactions than only the top interactions in a bar plot, our package also supports this. First we show the attributions:

```python
text_plot(batch_sentences[1],
          attributions[1],
          include_legend=True)
```

![Showing additional attributions](/images/painfully_funny_text.png)

And then we show the full interaction matrix. Here we've zeroed out the diagonals so you can better see the off-diagonal terms.

```python
matrix_interaction_plot(interactions[1],
                        batch_sentences[1])
```

![Showing the full matrix of feature interactions](/images/painfully_funny_matrix.png)

This example - interpreting [DistilBERT](https://arxiv.org/abs/1910.01108) - was also featured in our paper. You can examine the setup in more detail [here](https://github.com/suinleelab/path_explain/tree/master/examples/natural_language/transformers).

For more examples, see the `examples` directory in this repository.