# Multi-CPR

**Repository Path**: ma-lechi/Multi-CPR

## Basic Information

- **Project Name**: Multi-CPR
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-03-30
- **Last Updated**: 2022-03-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval

This repo contains the annotated datasets introduced in our resource paper, "Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval". [[Paper]](https://arxiv.org/pdf/2203.03367.pdf)

## Introduction

Multi-CPR is a multi-domain Chinese dataset for passage retrieval. The data is collected from three domains: E-commerce, Entertainment video, and Medical. Each domain's dataset contains millions of passages and a set of human-annotated relevant query-passage pairs.

Examples of annotated query-passage pairs from the three domains:

| Domain | Query | Passage |
| ---- | ---- | ---- |
| E-commerce | 尼康z62 (Nikon z62) |