Dataset Card for top-american-universities-on-reddit

annotations_creators: lexyr
language_creators: crowdsourced
language: en
license: cc-by-4.0
multilinguality: monolingual
size_categories: 100K
source_datasets: original
paperswithcode_id:

Dataset Description

Homepage: https://socialgrep.com/datasets
Point of Contact: Website

Dataset Summary

This corpus contains all posts and comments from the subreddits of the top 10 US colleges, as ranked in the 2019 Forbes listing.

Languages

Mainly English.

Dataset Structure

Data Instances

Each data point is either a post or a comment. Because the two differ in structure, they are stored in separate files, although many fields are shared.
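Because posts and comments live in separate files, a combined view has to be assembled by the reader. The following is a minimal sketch, assuming the files are CSV with a header row; this card does not state the file format or file names, so the in-memory sample text and the helper names `read_rows` and `merge` are hypothetical.

```python
import csv
import io

# Hypothetical stand-ins for the two files; the real names and format are not given in this card.
POSTS_CSV = "type,id,title\npost,abc,Hello\n"
COMMENTS_CSV = "type,id,body\ncomment,abc,Hi there\n"


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def merge(posts, comments):
    """Combine both record lists; fields a record lacks are filled with None."""
    all_fields = sorted({key for row in posts + comments for key in row})
    return [{field: row.get(field) for field in all_fields} for row in posts + comments]


rows = merge(read_rows(POSTS_CSV), read_rows(COMMENTS_CSV))
```

Filling the type-specific fields with None keeps every record on the same set of keys, which makes the merged list easier to filter by 'type' later.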

Data Fields

'type': the type of the data point. Can be 'post' or 'comment'.
'id': the base-36 Reddit ID of the data point. Unique when combined with type.
'subreddit.id': the base-36 Reddit ID of the data point's host subreddit. Unique.
'subreddit.name': the human-readable name of the data point's host subreddit.
'subreddit.nsfw': a boolean marking the data point's host subreddit as NSFW or not.
'created_utc': a UTC timestamp for the data point.
'permalink': a reference link to the data point on Reddit.
'score': score of the data point on Reddit.
'domain': (Post only) the domain of the data point's link.
'url': (Post only) the destination of the data point's link, if any.
'selftext': (Post only) the self-text of the data point, if any.
'title': (Post only) the title of the post data point.
'body': (Comment only) the body of the comment data point.
'sentiment': (Comment only) the result of an in-house sentiment analysis pipeline. Used for exploratory analysis.
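
The field descriptions above imply a few handling rules, sketched here under stated assumptions. An 'id' is only unique together with 'type', so records are keyed on both; base-36 IDs can be decoded with Python's built-in int. The card does not say how 'created_utc' is encoded, so treating it as Unix epoch seconds is an assumption, and the sample rows and helper names are invented for illustration.

```python
from datetime import datetime, timezone


def record_key(row):
    """IDs are only unique together with the type, so key on both."""
    return (row["type"], row["id"])


def decode_id(base36_id):
    """Convert a base-36 Reddit ID to an integer."""
    return int(base36_id, 36)


def parse_created(value):
    """Assumption: created_utc holds Unix epoch seconds."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


# Invented sample rows that share one ID across the two types.
post = {"type": "post", "id": "abc", "created_utc": "1546300800"}
comment = {"type": "comment", "id": "abc", "created_utc": "1546300800"}

# Both records survive de-duplication because the type differs.
keys = {record_key(post), record_key(comment)}
```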

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

CC-BY v4.0

Contributions

[Needs More Information]
