Dataset Card for GLUE

annotations_creators

language_creators

language

license

multilinguality

size_categories

source_datasets

task_categories

task_ids

paperswithcode_id

pretty_name

config_names

tags

dataset_info

configs

train-eval-index

other

en

other

monolingual

10K<n<100K

original

text-classification

acceptability-classification

natural-language-inference

semantic-similarity-scoring

sentiment-classification

text-scoring

glue

GLUE (General Language Understanding Evaluation benchmark)

ax

cola

mnli

mnli_matched

mnli_mismatched

mrpc

qnli

qqp

rte

sst2

stsb

wnli

qa-nli

coreference-nli

paraphrase-identification

config_name

features

splits

download_size

dataset_size

ax

name	dtype
premise	string

name	dtype
hypothesis	string

name

dtype

label

class_label

names

0	1	2
entailment	neutral	contradiction

name	dtype
idx	int32

name	num_bytes	num_examples
test	237694	1104

80767

237694

config_name

features

splits

download_size

dataset_size

cola

name	dtype
sentence	string

name

dtype

label

class_label

names

0	1
unacceptable	acceptable

name	dtype
idx	int32

name	num_bytes	num_examples
train	484869	8551

name	num_bytes	num_examples
validation	60322	1043

name	num_bytes	num_examples
test	60513	1063

326394

605704

config_name

features

splits

download_size

dataset_size

mnli

name	dtype
premise	string

name	dtype
hypothesis	string

name

dtype

label

class_label

names

0	1	2
entailment	neutral	contradiction

name	dtype
idx	int32

name	num_bytes	num_examples
train	74619646	392702

name	num_bytes	num_examples
validation_matched	1833783	9815

name	num_bytes	num_examples
validation_mismatched	1949231	9832

name	num_bytes	num_examples
test_matched	1848654	9796

name	num_bytes	num_examples
test_mismatched	1950703	9847

57168425

82202017

config_name

features

splits

download_size

dataset_size

mnli_matched

name	dtype
premise	string

name	dtype
hypothesis	string

name

dtype

label

class_label

names

0	1	2
entailment	neutral	contradiction

name	dtype
idx	int32

name	num_bytes	num_examples
validation	1833783	9815

name	num_bytes	num_examples
test	1848654	9796

2435055

3682437

config_name

features

splits

download_size

dataset_size

mnli_mismatched

name	dtype
premise	string

name	dtype
hypothesis	string

name

dtype

label

class_label

names

0	1	2
entailment	neutral	contradiction

name	dtype
idx	int32

name	num_bytes	num_examples
validation	1949231	9832

name	num_bytes	num_examples
test	1950703	9847

2509009

3899934

config_name

features

splits

download_size

dataset_size

mrpc

name	dtype
sentence1	string

name	dtype
sentence2	string

name

dtype

label

class_label

names

0	1
not_equivalent	equivalent

name	dtype
idx	int32

name	num_bytes	num_examples
train	943843	3668

name	num_bytes	num_examples
validation	105879	408

name	num_bytes	num_examples
test	442410	1725

1033400

1492132

config_name

features

splits

download_size

dataset_size

qnli

name	dtype
question	string

name	dtype
sentence	string

name

dtype

label

class_label

names

0	1
entailment	not_entailment

name	dtype
idx	int32

name	num_bytes	num_examples
train	25612443	104743

name	num_bytes	num_examples
validation	1368304	5463

name	num_bytes	num_examples
test	1373093	5463

19278324

28353840

config_name

features

splits

download_size

dataset_size

qqp

name	dtype
question1	string

name	dtype
question2	string

name

dtype

label

class_label

names

0	1
not_duplicate	duplicate

name	dtype
idx	int32

name	num_bytes	num_examples
train	50900820	363846

name	num_bytes	num_examples
validation	5653754	40430

name	num_bytes	num_examples
test	55171111	390965

73982265

111725685

config_name

features

splits

download_size

dataset_size

rte

name	dtype
sentence1	string

name	dtype
sentence2	string

name

dtype

label

class_label

names

0	1
entailment	not_entailment

name	dtype
idx	int32

name	num_bytes	num_examples
train	847320	2490

name	num_bytes	num_examples
validation	90728	277

name	num_bytes	num_examples
test	974053	3000

1274409

1912101

config_name

features

splits

download_size

dataset_size

sst2

name	dtype
sentence	string

name

dtype

label

class_label

names

0	1
negative	positive

name	dtype
idx	int32

name	num_bytes	num_examples
train	4681603	67349

name	num_bytes	num_examples
validation	106252	872

name	num_bytes	num_examples
test	216640	1821

3331080

5004495

config_name

features

splits

download_size

dataset_size

stsb

name	dtype
sentence1	string

name	dtype
sentence2	string

name	dtype
label	float32

name	dtype
idx	int32

name	num_bytes	num_examples
train	754791	5749

name	num_bytes	num_examples
validation	216064	1500

name	num_bytes	num_examples
test	169974	1379

766983

1140829

config_name

features

splits

download_size

dataset_size

wnli

name	dtype
sentence1	string

name	dtype
sentence2	string

name

dtype

label

class_label

names

0	1
not_entailment	entailment

name	dtype
idx	int32

name	num_bytes	num_examples
train	107109	635

name	num_bytes	num_examples
validation	12162	71

name	num_bytes	num_examples
test	37889	146

63522

157160

config_name

data_files

ax

split	path
test	ax/test-*

config_name

data_files

cola

split	path
train	cola/train-*

split	path
validation	cola/validation-*

split	path
test	cola/test-*

config_name

data_files

mnli

split	path
train	mnli/train-*

split	path
validation_matched	mnli/validation_matched-*

split	path
validation_mismatched	mnli/validation_mismatched-*

split	path
test_matched	mnli/test_matched-*

split	path
test_mismatched	mnli/test_mismatched-*

config_name

data_files

mnli_matched

split	path
validation	mnli_matched/validation-*

split	path
test	mnli_matched/test-*

config_name

data_files

mnli_mismatched

split	path
validation	mnli_mismatched/validation-*

split	path
test	mnli_mismatched/test-*

config_name

data_files

mrpc

split	path
train	mrpc/train-*

split	path
validation	mrpc/validation-*

split	path
test	mrpc/test-*

config_name

data_files

qnli

split	path
train	qnli/train-*

split	path
validation	qnli/validation-*

split	path
test	qnli/test-*

config_name

data_files

qqp

split	path
train	qqp/train-*

split	path
validation	qqp/validation-*

split	path
test	qqp/test-*

config_name

data_files

rte

split	path
train	rte/train-*

split	path
validation	rte/validation-*

split	path
test	rte/test-*

config_name

data_files

sst2

split	path
train	sst2/train-*

split	path
validation	sst2/validation-*

split	path
test	sst2/test-*

config_name

data_files

stsb

split	path
train	stsb/train-*

split	path
validation	stsb/validation-*

split	path
test	stsb/test-*

config_name

data_files

wnli

split	path
train	wnli/train-*

split	path
validation	wnli/validation-*

split	path
test	wnli/test-*

config

task

task_id

splits

col_mapping

cola

text-classification

binary_classification

train_split	eval_split
train	validation

sentence	label
text	target

config

task

task_id

splits

col_mapping

sst2

text-classification

binary_classification

train_split	eval_split
train	validation

sentence	label
text	target

config

task

task_id

splits

col_mapping

mrpc

text-classification

natural_language_inference

train_split	eval_split
train	validation

sentence1	sentence2	label
text1	text2	target

config

task

task_id

splits

col_mapping

qqp

text-classification

natural_language_inference

train_split	eval_split
train	validation

question1	question2	label
text1	text2	target

config

task

task_id

splits

col_mapping

stsb

text-classification

natural_language_inference

train_split	eval_split
train	validation

sentence1	sentence2	label
text1	text2	target

config

task

task_id

splits

col_mapping

mnli

text-classification

natural_language_inference

train_split	eval_split
train	validation_matched

premise	hypothesis	label
text1	text2	target

config

task

task_id

splits

col_mapping

mnli_mismatched

text-classification

natural_language_inference

train_split	eval_split
train	validation

premise	hypothesis	label
text1	text2	target

config

task

task_id

splits

col_mapping

mnli_matched

text-classification

natural_language_inference

train_split	eval_split
train	validation

premise	hypothesis	label
text1	text2	target

config

task

task_id

splits

col_mapping

qnli

text-classification

natural_language_inference

train_split	eval_split
train	validation

question	sentence	label
text1	text2	target

config

task

task_id

splits

col_mapping

rte

text-classification

natural_language_inference

train_split	eval_split
train	validation

sentence1	sentence2	label
text1	text2	target

config

task

task_id

splits

col_mapping

wnli

text-classification

natural_language_inference

train_split	eval_split
train	validation

sentence1	sentence2	label
text1	text2	target

Dataset Card for GLUE

Dataset Card for GLUE
- Table of Contents
- Dataset Description
  - Dataset Summary
  - Supported Tasks and Leaderboards
    - ax
    - cola
    - mnli
    - mnli_matched
    - mnli_mismatched
    - mrpc
    - qnli
    - qqp
    - rte
    - sst2
    - stsb
    - wnli
  - Languages
- Dataset Structure
  - Data Instances
    - ax
    - cola
    - mnli
    - mnli_matched
    - mnli_mismatched
    - mrpc
    - qnli
    - qqp
    - rte
    - sst2
    - stsb
    - wnli
  - Data Fields
    - ax
    - cola
    - mnli
    - mnli_matched
    - mnli_mismatched
    - mrpc
    - qnli
    - qqp
    - rte
    - sst2
    - stsb
    - wnli
  - Data Splits
    - ax
    - cola
    - mnli
    - mnli_matched
    - mnli_mismatched
    - mrpc
    - qnli
    - qqp
    - rte
    - sst2
    - stsb
    - wnli
- Dataset Creation
- Considerations for Using the Data
- Additional Information

Dataset Description

Homepage: https://gluebenchmark.com/
Repository: https://github.com/nyu-mll/GLUE-baselines
Paper: https://arxiv.org/abs/1804.07461
Leaderboard: https://gluebenchmark.com/leaderboard
Point of Contact: More Information Needed
Size of downloaded dataset files: 1.00 GB
Size of the generated dataset: 240.84 MB
Total amount of disk used: 1.24 GB

Dataset Summary

GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

Supported Tasks and Leaderboards

The leaderboard for the GLUE benchmark can be found at https://gluebenchmark.com/leaderboard. It comprises the following tasks:

ax

A manually-curated evaluation dataset for fine-grained analysis of system performance on a broad range of linguistic phenomena. This dataset evaluates sentence understanding through Natural Language Inference (NLI) problems. Use a model trained on MultiNLI to produce predictions for this dataset.

cola

The Corpus of Linguistic Acceptability consists of English acceptability judgments drawn from books and journal articles on linguistic theory. Each example is a sequence of words annotated with whether it is a grammatical English sentence.

mnli

The Multi-Genre Natural Language Inference Corpus is a crowdsourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. The authors of the benchmark use the standard test set, for which they obtained private labels from the corpus authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections. They also use and recommend the SNLI corpus as 550k examples of auxiliary training data.
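Reporting results on both sections can be sketched as below. This is a minimal illustration, not the benchmark's official scoring code: it assumes plain accuracy as the metric, and the function names and the "matched"/"mismatched" dictionary keys are hypothetical.

```python
def accuracy(predictions, references):
    """Fraction of positions where the predicted label equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty section")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def evaluate_mnli(preds_by_section, labels_by_section):
    """Score the matched and mismatched sections separately (assumed keys)."""
    return {
        section: accuracy(preds_by_section[section], labels_by_section[section])
        for section in ("matched", "mismatched")
    }


# Toy label ids using the card's mapping: 0 entailment, 1 neutral, 2 contradiction.
scores = evaluate_mnli(
    {"matched": [0, 1, 2, 0], "mismatched": [0, 0]},
    {"matched": [0, 1, 1, 0], "mismatched": [1, 0]},
)
print(scores)
```

Keeping the two scores separate makes the gap between in-domain and cross-domain performance visible, which a single pooled number would hide.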

mnli_matched

The matched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.

mnli_mismatched

The mismatched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.

mrpc

The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.

qnli

The Stanford Question Answering Dataset is a question-answering dataset consisting of question-paragraph pairs, where one of the sentences in the paragraph (drawn from Wikipedia) contains the answer to the corresponding question (written by an annotator). The authors of the benchmark convert the task into sentence pair classification by forming a pair between each question and each sentence in the corresponding context, and filtering out pairs with low lexical overlap between the question and the context sentence. The task is to determine whether the context sentence contains the answer to the question. This modified version of the original task removes the requirement that the model select the exact answer, but also removes the simplifying assumptions that the answer is always present in the input and that lexical overlap is a reliable cue.
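The conversion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark authors' pipeline: the regex sentence splitter, the word-set overlap measure, the `min_overlap` threshold, the substring test for "contains the answer", and all function names are hypothetical stand-ins. The toy question and paragraph are invented for the example.

```python
import re


def tokenize(text):
    """Lowercased word set; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def make_qnli_pairs(question, paragraph, answer, min_overlap=1):
    """Pair the question with each context sentence, dropping low-overlap pairs.

    Labels follow this card: 0 = entailment (sentence contains the answer),
    1 = not_entailment.
    """
    question_tokens = tokenize(question)
    pairs = []
    for sentence in split_sentences(paragraph):
        overlap = len(question_tokens & tokenize(sentence))
        if overlap < min_overlap:
            continue  # filtered out: too little lexical overlap
        label = 0 if answer.lower() in sentence.lower() else 1
        pairs.append({"question": question, "sentence": sentence, "label": label})
    return pairs


pairs = make_qnli_pairs(
    question="Where is the Eiffel Tower?",
    paragraph="The Eiffel Tower is in Paris. The tower opened in 1889. "
              "Many visitors come each year.",
    answer="Paris",
)
for pair in pairs:
    print(pair["label"], pair["sentence"])
```

The second sentence survives the filter because it shares words with the question, yet it does not contain the answer. Pairs like this are why lexical overlap alone is not a reliable cue in the converted task.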

qqp

The Quora Question Pairs dataset is a collection of question pairs from the community question-answering website Quora. The task is to determine whether a pair of questions are semantically equivalent.

rte

The Recognizing Textual Entailment (RTE) datasets come from a series of annual textual entailment challenges. The authors of the benchmark combined the data from RTE1 (Dagan et al., 2006), RTE2 (Bar Haim et al., 2006), RTE3 (Giampiccolo et al., 2007), and RTE5 (Bentivogli et al., 2009). Examples are constructed based on news and Wikipedia text. The authors of the benchmark convert all datasets to a two-class split, where for three-class datasets they collapse neutral and contradiction into not entailment, for consistency.
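The two-class conversion can be written as a small mapping. A minimal sketch, assuming the three-class source labels are the strings "entailment", "neutral" and "contradiction"; the function name is hypothetical, and the integer ids are the ones this card lists for the rte configuration.

```python
# Integer ids for the rte configuration, as listed in this card.
RTE_LABEL_IDS = {"entailment": 0, "not_entailment": 1}


def collapse_to_two_class(label_name):
    """Map a three-class NLI label name onto the two-class RTE scheme."""
    if label_name == "entailment":
        return "entailment"
    if label_name in ("neutral", "contradiction"):
        return "not_entailment"
    raise ValueError(f"unknown label: {label_name!r}")


for name in ("entailment", "neutral", "contradiction"):
    collapsed = collapse_to_two_class(name)
    print(name, "->", collapsed, RTE_LABEL_IDS[collapsed])
```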

sst2

The Stanford Sentiment Treebank consists of sentences from movie reviews and human annotations of their sentiment. The task is to predict the sentiment of a given sentence. It uses the two-way (positive/negative) class split, with only sentence-level labels.

stsb

The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 0 to 5.

wnli

The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task in which a system must read a sentence with a pronoun and select the referent of that pronoun from a list of choices. The examples are manually constructed to foil simple statistical methods: each one is contingent on contextual information provided by a single word or phrase in the sentence. To convert the problem into sentence pair classification, the authors of the benchmark construct sentence pairs by replacing the ambiguous pronoun with each possible referent. The task is to predict whether the sentence with the pronoun substituted is entailed by the original sentence. They use a small evaluation set consisting of new examples derived from fiction books that was shared privately by the authors of the original corpus. While the included training set is balanced between two classes, the test set is imbalanced between them (65% not entailment). Also, due to a data quirk, the development set is adversarial: hypotheses are sometimes shared between training and development examples, so if a model memorizes the training examples, it will predict the wrong label on the corresponding development set example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence between a model's score on this task and its score on the unconverted original task. The authors of the benchmark call the converted dataset WNLI (Winograd NLI).
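The pronoun substitution can be illustrated with the carrot sentence used as the wnli data instance in this card. This is a sketch, not the authors' conversion script: the function name, the `clause` parameter (the part of the sentence containing the pronoun) and the capitalization step are assumptions made for the illustration. Labels follow this card's wnli mapping, where 1 is entailment and 0 is not_entailment.

```python
import re


def make_wnli_pairs(sentence, clause, pronoun, candidates, correct):
    """Build one sentence pair per candidate referent.

    The hypothesis is `clause` with the first occurrence of `pronoun`
    replaced by the candidate.
    """
    pattern = re.compile(r"\b" + re.escape(pronoun) + r"\b", flags=re.IGNORECASE)
    pairs = []
    for candidate in candidates:
        # A function replacement avoids backslash handling in the candidate text.
        hypothesis, n_subs = pattern.subn(lambda m: candidate, clause, count=1)
        if n_subs == 0:
            raise ValueError(f"pronoun {pronoun!r} not found in clause")
        hypothesis = hypothesis[0].upper() + hypothesis[1:]
        label = 1 if candidate == correct else 0
        pairs.append({"sentence1": sentence, "sentence2": hypothesis, "label": label})
    return pairs


pairs = make_wnli_pairs(
    sentence="I stuck a pin through a carrot. When I pulled the pin out, it had a hole.",
    clause="it had a hole.",
    pronoun="it",
    candidates=["the pin", "the carrot"],
    correct="the carrot",
)
for pair in pairs:
    print(pair["label"], pair["sentence2"])
```

Each candidate yields its own independently scored example, which is why a model's score on WNLI does not map directly onto its score on the original referent-selection task.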

Languages

The language data in GLUE is in English (BCP-47 en).

Dataset Structure

Data Instances

ax

Size of downloaded dataset files: 0.22 MB
Size of the generated dataset: 0.24 MB
Total amount of disk used: 0.46 MB

An example of 'test' looks as follows.

{
  "premise": "The cat sat on the mat.",
  "hypothesis": "The cat did not sit on the mat.",
  "label": -1,
  "idx": 0
}

cola

Size of downloaded dataset files: 0.38 MB
Size of the generated dataset: 0.61 MB
Total amount of disk used: 0.99 MB

An example of 'train' looks as follows.

{
  "sentence": "Our friends won't buy this analysis, let alone the next one we propose.",
  "label": 1,
  "idx": 0
}

mnli

Size of downloaded dataset files: 312.78 MB
Size of the generated dataset: 82.47 MB
Total amount of disk used: 395.26 MB

An example of 'train' looks as follows.

{
  "premise": "Conceptually cream skimming has two basic dimensions - product and geography.",
  "hypothesis": "Product and geography are what make cream skimming work.",
  "label": 1,
  "idx": 0
}

mnli_matched

Size of downloaded dataset files: 312.78 MB
Size of the generated dataset: 3.69 MB
Total amount of disk used: 316.48 MB

An example of 'test' looks as follows.

{
  "premise": "Hierbas, ans seco, ans dulce, and frigola are just a few names worth keeping a look-out for.",
  "hypothesis": "Hierbas is a name worth looking out for.",
  "label": -1,
  "idx": 0
}

mnli_mismatched

Size of downloaded dataset files: 312.78 MB
Size of the generated dataset: 3.91 MB
Total amount of disk used: 316.69 MB

An example of 'test' looks as follows.

{
  "premise": "What have you decided, what are you going to do?",
  "hypothesis": "So what's your decision?",
  "label": -1,
  "idx": 0
}

mrpc

Size of downloaded dataset files: ??
Size of the generated dataset: 1.5 MB
Total amount of disk used: ??

An example of 'train' looks as follows.

{
  "sentence1": "Amrozi accused his brother, whom he called \"the witness\", of deliberately distorting his evidence.",
  "sentence2": "Referring to him as only \"the witness\", Amrozi accused his brother of deliberately distorting his evidence.",
  "label": 1,
  "idx": 0
}

qnli

Size of downloaded dataset files: ??
Size of the generated dataset: 28 MB
Total amount of disk used: ??

An example of 'train' looks as follows.

{
  "question": "When did the third Digimon series begin?",
  "sentence": "Unlike the two seasons before it and most of the seasons that followed, Digimon Tamers takes a darker and more realistic approach to its story featuring Digimon who do not reincarnate after their deaths and more complex character development in the original Japanese.",
  "label": 1,
  "idx": 0
}

qqp

Size of downloaded dataset files: ??
Size of the generated dataset: 107 MB
Total amount of disk used: ??

An example of 'train' looks as follows.

{
  "question1": "How is the life of a math student? Could you describe your own experiences?",
  "question2": "Which level of prepration is enough for the exam jlpt5?",
  "label": 0,
  "idx": 0
}

rte

Size of downloaded dataset files: ??
Size of the generated dataset: 1.9 MB
Total amount of disk used: ??

An example of 'train' looks as follows.

{
  "sentence1": "No Weapons of Mass Destruction Found in Iraq Yet.",
  "sentence2": "Weapons of Mass Destruction Found in Iraq.",
  "label": 1,
  "idx": 0
}

sst2

Size of downloaded dataset files: ??
Size of the generated dataset: 4.9 MB
Total amount of disk used: ??

An example of 'train' looks as follows.

{
  "sentence": "hide new secretions from the parental units",
  "label": 0,
  "idx": 0
}

stsb

Size of downloaded dataset files: ??
Size of the generated dataset: 1.2 MB
Total amount of disk used: ??

An example of 'train' looks as follows.

{
  "sentence1": "A plane is taking off.",
  "sentence2": "An air plane is taking off.",
  "label": 5.0,
  "idx": 0
}

wnli

Size of downloaded dataset files: ??
Size of the generated dataset: 0.18 MB
Total amount of disk used: ??

An example of 'train' looks as follows.

{
  "sentence1": "I stuck a pin through a carrot. When I pulled the pin out, it had a hole.",
  "sentence2": "The carrot had a hole.",
  "label": 1,
  "idx": 0
}

Data Fields

The data fields are the same among all splits.

ax

premise: a string feature.
hypothesis: a string feature.
label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
idx: an int32 feature.

cola

sentence: a string feature.
label: a classification label, with possible values including unacceptable (0), acceptable (1).
idx: an int32 feature.

mnli

premise: a string feature.
hypothesis: a string feature.
label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
idx: an int32 feature.

mnli_matched

premise: a string feature.
hypothesis: a string feature.
label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
idx: an int32 feature.

mnli_mismatched

premise: a string feature.
hypothesis: a string feature.
label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
idx: an int32 feature.

mrpc

sentence1: a string feature.
sentence2: a string feature.
label: a classification label, with possible values including not_equivalent (0), equivalent (1).
idx: an int32 feature.

qnli

question: a string feature.
sentence: a string feature.
label: a classification label, with possible values including entailment (0), not_entailment (1).
idx: an int32 feature.

qqp

question1: a string feature.
question2: a string feature.
label: a classification label, with possible values including not_duplicate (0), duplicate (1).
idx: an int32 feature.

rte

sentence1: a string feature.
sentence2: a string feature.
label: a classification label, with possible values including entailment (0), not_entailment (1).
idx: an int32 feature.

sst2

sentence: a string feature.
label: a classification label, with possible values including negative (0), positive (1).
idx: an int32 feature.

stsb

sentence1: a string feature.
sentence2: a string feature.
label: a float32 regression label, with possible values from 0 to 5.
idx: an int32 feature.

wnli

sentence1: a string feature.
sentence2: a string feature.
label: a classification label, with possible values including not_entailment (0), entailment (1).
idx: an int32 feature.
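Because the integer ids do not mean the same thing in every configuration (entailment is 0 in qnli and rte but 1 in wnli), a lookup table helps avoid silent label mix-ups. The sketch below copies the class names from the field lists above. The helper name is hypothetical, and treating -1 as "no public label" is an assumption based on the test-split examples shown in this card.

```python
# Class names per configuration, in id order, copied from the field lists above.
LABEL_NAMES = {
    "ax": ["entailment", "neutral", "contradiction"],
    "cola": ["unacceptable", "acceptable"],
    "mnli": ["entailment", "neutral", "contradiction"],
    "mnli_matched": ["entailment", "neutral", "contradiction"],
    "mnli_mismatched": ["entailment", "neutral", "contradiction"],
    "mrpc": ["not_equivalent", "equivalent"],
    "qnli": ["entailment", "not_entailment"],
    "qqp": ["not_duplicate", "duplicate"],
    "rte": ["entailment", "not_entailment"],
    "sst2": ["negative", "positive"],
    "wnli": ["not_entailment", "entailment"],
}


def label_to_name(config, label):
    """Return the class name for an integer label, or None for -1.

    Assumption: -1 marks an example without a public label, as in the
    test-split examples in this card. stsb is excluded because its label
    is a float score, not a class id.
    """
    if config == "stsb":
        raise ValueError("stsb labels are float similarity scores, not class ids")
    names = LABEL_NAMES[config]
    if label == -1:
        return None
    if not 0 <= label < len(names):
        raise ValueError(f"label {label} out of range for {config}")
    return names[label]


print(label_to_name("qnli", 0), label_to_name("wnli", 0))
```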

Data Splits

ax

	test
ax	1104

cola

	train	validation	test
cola	8551	1043	1063

mnli

	train	validation_matched	validation_mismatched	test_matched	test_mismatched
mnli	392702	9815	9832	9796	9847

mnli_matched

	validation	test
mnli_matched	9815	9796

mnli_mismatched

	validation	test
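mnli_mismatched	9832	9847

mrpc

	train	validation	test
mrpc	3668	408	1725

qnli

	train	validation	test
qnli	104743	5463	5463

qqp

	train	validation	test
qqp	363846	40430	390965

rte

	train	validation	test
rte	2490	277	3000

sst2

	train	validation	test
sst2	67349	872	1821

stsb

	train	validation	test
stsb	5749	1500	1379

wnli

	train	validation	test
wnli	635	71	146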
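One way to request a specific configuration and split is through the Hugging Face `datasets` library. The sketch below is hedged: the split names are copied from the metadata and split tables of this card, the helper name is hypothetical, and the repository identifier "glue" is an assumption (the Hub may list the dataset under an organization namespace such as `nyu-mll/glue`). The library is imported lazily, so the name checks run even where `datasets` is not installed.

```python
# Split names per configuration, copied from this card's metadata.
GLUE_SPLITS = {
    "ax": ["test"],
    "cola": ["train", "validation", "test"],
    "mnli": ["train", "validation_matched", "validation_mismatched",
             "test_matched", "test_mismatched"],
    "mnli_matched": ["validation", "test"],
    "mnli_mismatched": ["validation", "test"],
    "mrpc": ["train", "validation", "test"],
    "qnli": ["train", "validation", "test"],
    "qqp": ["train", "validation", "test"],
    "rte": ["train", "validation", "test"],
    "sst2": ["train", "validation", "test"],
    "stsb": ["train", "validation", "test"],
    "wnli": ["train", "validation", "test"],
}


def load_glue_split(config, split, repo_id="glue"):
    """Validate the names against the table above, then load the split.

    `repo_id` is an assumed Hub identifier; adjust it if the dataset
    is hosted under a different name.
    """
    if config not in GLUE_SPLITS:
        raise ValueError(f"unknown GLUE configuration: {config!r}")
    if split not in GLUE_SPLITS[config]:
        raise ValueError(f"{config!r} has no split {split!r}")
    from datasets import load_dataset  # requires `pip install datasets`
    return load_dataset(repo_id, config, split=split)


# Example call (downloads data, so it is left commented out here):
# mrpc_train = load_glue_split("mrpc", "train")
```

Validating names first gives a clear error for mistakes such as asking for a train split of ax, which only ships a test split.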

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

Licensing Information

The primary GLUE tasks are built on and derived from existing datasets. We refer users to the original licenses accompanying each dataset.

Citation Information

If you use GLUE, please cite all the datasets you use.

In addition, we encourage you to use the following BibTeX citation for GLUE itself:

@inproceedings{wang2019glue,
  title={{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

If you evaluate using GLUE, we also highly recommend citing the papers that originally introduced the nine GLUE tasks, both to give the original authors their due credit and because venues will expect papers to describe the data they evaluate on. The following provides BibTeX for all of the GLUE tasks, except QQP, for which we recommend adding a footnote to this page: https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs

@article{warstadt2018neural,
  title={Neural Network Acceptability Judgments},
  author={Warstadt, Alex and Singh, Amanpreet and Bowman, Samuel R.},
  journal={arXiv preprint 1805.12471},
  year={2018}
}
@inproceedings{socher2013recursive,
  title={Recursive deep models for semantic compositionality over a sentiment treebank},
  author={Socher, Richard and Perelygin, Alex and Wu, Jean and Chuang, Jason and Manning, Christopher D and Ng, Andrew and Potts, Christopher},
  booktitle={Proceedings of EMNLP},
  pages={1631--1642},
  year={2013}
}
@inproceedings{dolan2005automatically,
  title={Automatically constructing a corpus of sentential paraphrases},
  author={Dolan, William B and Brockett, Chris},
  booktitle={Proceedings of the International Workshop on Paraphrasing},
  year={2005}
}
@book{agirre2007semantic,
  editor    = {Agirre, Eneko and M{\`a}rquez, Llu{\'i}s and Wicentowski, Richard},
  title     = {Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)},
  month     = {June},
  year      = {2007},
  address   = {Prague, Czech Republic},
  publisher = {Association for Computational Linguistics},
}
@inproceedings{williams2018broad,
  author    = {Williams, Adina and Nangia, Nikita and Bowman, Samuel R.},
  title = {A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference},
  booktitle = {Proceedings of NAACL-HLT},
  year = 2018
}
@inproceedings{rajpurkar2016squad,
  author = {Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy},
  title = {{SQ}u{AD}: 100,000+ Questions for Machine Comprehension of Text},
  booktitle = {Proceedings of EMNLP},
  year = {2016},
  publisher = {Association for Computational Linguistics},
  pages = {2383--2392},
  location = {Austin, Texas},
}
@incollection{dagan2006pascal,
  title={The {PASCAL} recognising textual entailment challenge},
  author={Dagan, Ido and Glickman, Oren and Magnini, Bernardo},
  booktitle={Machine learning challenges. evaluating predictive uncertainty, visual object classification, and recognising tectual entailment},
  pages={177--190},
  year={2006},
  publisher={Springer}
}
@article{bar2006second,
  title={The second {PASCAL} recognising textual entailment challenge},
  author={Bar Haim, Roy and Dagan, Ido and Dolan, Bill and Ferro, Lisa and Giampiccolo, Danilo and Magnini, Bernardo and Szpektor, Idan},
  year={2006}
}
@inproceedings{giampiccolo2007third,
  title={The third {PASCAL} recognizing textual entailment challenge},
  author={Giampiccolo, Danilo and Magnini, Bernardo and Dagan, Ido and Dolan, Bill},
  booktitle={Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing},
  pages={1--9},
  year={2007},
  organization={Association for Computational Linguistics},
}
@inproceedings{bentivogli2009fifth,
  title={The Fifth {PASCAL} Recognizing Textual Entailment Challenge},
  author={Bentivogli, Luisa and Dagan, Ido and Dang, Hoa Trang and Giampiccolo, Danilo and Magnini, Bernardo},
  booktitle={TAC},
  year={2009}
}
@inproceedings{levesque2011winograd,
  title={The {W}inograd schema challenge},
  author={Levesque, Hector J and Davis, Ernest and Morgenstern, Leora},
  booktitle={{AAAI} Spring Symposium: Logical Formalizations of Commonsense Reasoning},
  volume={46},
  pages={47},
  year={2011}
}

Contributions

Thanks to @patpizio, @jeswan, @thomwolf, @patrickvonplaten, @mariamabarham for adding this dataset.
