# Kernel-Based-Neural-Ranking-Models

**Repository Path**: thunlp/Kernel-Based-Neural-Ranking-Models

## Basic Information

- **Project Name**: Kernel-Based-Neural-Ranking-Models
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-05-29
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Kernel-Based-Neural-Ranking-Models

This is the repository of the codes of the **Neural Kernel Match IR** methods on the [MSMARCO Passage Reranking Task](http://www.msmarco.org/leaders.aspx).

| Rank (Jan 25th 2019) | MSMARCO Passage Re-Ranking         | Eval MRR@10 | Eval MRR@10 |
| -------------------- | ---------------------------------- | ----------- | ----------- |
| 4th                  | Neural Kernel Match IR (Conv-KNRM) | 27.12       | 29.02       |
| 5th                  | Neural Kernel Match IR (KNRM)      | 19.82       | 21.84       |

### Environment Requirement

- Python3

- PyTorch 0.4.1

### Data Download & Preparation

To download and prepare the training data, see [here](https://github.com/thunlp/Kernel-Based-Neural-Ranking-Models/tree/master/data).

### Model

The main codes and running instructions can be found [here](https://github.com/thunlp/Kernel-Based-Neural-Ranking-Models/tree/master/src).

The codes provide models including `KNRM`, `CKNRM`, `MAXPOOL`, `AVGPOOL`, `LSTM`.

- KNRM: [End-to-End Neural Ad-hoc Ranking with Kernel Pooling](https://arxiv.org/abs/1706.06613). Additionally introduced idf information.
- CKNRM: [Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search](https://dl.acm.org/citation.cfm?doid=3159652.3159659). Additionally introduced idf information.
- MAXPOOL: Calculate the **max** value on the query embedding vectors and document embedding vectors, then use cos_similarity to measure the similarity.
- AVGPOOL: Calculate the **mean** value on the query embedding vectors and document embedding vectors, then use cos_similarity to measure the similarity.
- LSTM: Encode the query embedding vectors and document embedding vectors using RNN, then use cos_similarity to measure the similarity.

### Reproduce the Leaderboard Result

- Neural Kernel Match IR (Conv-KNRM)

  This is the result on ensembling 8 Conv-KNRM models. The code for ensembling is located [here](https://github.com/thunlp/Kernel-Based-Neural-Ranking-Models/tree/master/src#ensembling-model).

  Checkpoints can be found [here](https://github.com/thunlp/Kernel-Based-Neural-Ranking-Models/tree/master/chkpt).


- Neural Kernel Match IR (KNRM)

  This is the result on the KNRM model with `glove.6b.300d` pretrained embedding. You can use the `-embed` option to load the pretrained embedding file:

  ```shell
  # eg.
  CUDA_VISIBLE_DEVICES=0 python main.py -train_data ../data/train.txt -val_data ../data/dev_part.txt -task KNRM -batch_size 64 -save_model CKNRM -vocab_size 315370 -embed ../chkpt/embed.npy
  ```

### Contact

If you have any questions, suggestions or bug reports, please email at qiaoyf96@gmail.com.