Hugging Face Datasets Mirror / top-american-universities-on-reddit

annotations_creators: lexyr
language_creators: crowdsourced
language: en
license: cc-by-4.0
multilinguality: monolingual
size_categories: 100K
source_datasets: original
paperswithcode_id:

Dataset Card for top-american-universities-on-reddit

Dataset Description

Dataset Summary

This corpus contains the complete activity (posts and comments) of the subreddits of the top 10 US colleges, as ranked in the 2019 Forbes listing.

Languages

Mainly English.

Dataset Structure

Data Instances

Each data point is either a post or a comment. Because the two differ in nature, they are stored in two separate files, even though they share many fields.
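Since posts and comments live in separate files but share many fields, it can be convenient to merge them into one collection. The following is a minimal sketch under several assumptions: the card does not state the file format or file names, so CSV is assumed here, and the two in-memory strings with their rows are made-up stand-ins rather than real data. The helper names `read_rows` and `merge_rows` are hypothetical.

```python
import csv
import io

# Made-up stand-ins for the two files. The real file names, format,
# and rows are not given in this card; CSV is an assumption.
POSTS_CSV = """type,id,subreddit.name,score,title
post,abc123,exampleuniversity,12,Example post title
"""
COMMENTS_CSV = """type,id,subreddit.name,score,body
comment,def456,exampleuniversity,3,Example comment body
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data point."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_rows(posts, comments):
    """Combine both record types, filling fields a record lacks with None."""
    all_fields = []
    for row in posts + comments:
        for key in row:
            if key not in all_fields:
                all_fields.append(key)
    return [{field: row.get(field) for field in all_fields}
            for row in posts + comments]


posts = read_rows(POSTS_CSV)
comments = read_rows(COMMENTS_CSV)
merged = merge_rows(posts, comments)
```

In the merged list, post-only fields are `None` on comments and comment-only fields are `None` on posts, so the 'type' field remains the reliable way to tell the two apart.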

Data Fields

  • 'type': the type of the data point. Can be 'post' or 'comment'.

  • 'id': the base-36 Reddit ID of the data point. Unique when combined with type.

  • 'subreddit.id': the base-36 Reddit ID of the data point's host subreddit. Unique.

  • 'subreddit.name': the human-readable name of the data point's host subreddit.

  • 'subreddit.nsfw': a boolean marking the data point's host subreddit as NSFW or not.

  • 'created_utc': a UTC timestamp for the data point.

  • 'permalink': a reference link to the data point on Reddit.

  • 'score': score of the data point on Reddit.

  • 'domain': (Post only) the domain of the data point's link.

  • 'url': (Post only) the destination of the data point's link, if any.

  • 'selftext': (Post only) the self-text of the data point, if any.

  • 'title': (Post only) the title of the post data point.

  • 'body': (Comment only) the body of the comment data point.

  • 'sentiment': (Comment only) the result of an in-house sentiment analysis pipeline. Used for exploratory analysis.
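A few of the fields above lend themselves to small helpers. The sketch below shows one way to build a unique key from 'type' and 'id', decode a base-36 ID into an integer, and parse 'created_utc'. It assumes that 'created_utc' holds Unix epoch seconds, which this card does not confirm. The function names and the sample record are hypothetical and invented for illustration.

```python
from datetime import datetime, timezone


def reddit_id_to_int(base36_id):
    """Decode a base-36 Reddit ID into an integer."""
    return int(base36_id, 36)


def unique_key(record):
    """An 'id' is only unique together with 'type', so key on both."""
    return (record["type"], record["id"])


def created_at(record):
    """Parse 'created_utc' as a timezone-aware datetime.

    Assumption: the value is Unix epoch seconds; the card only says
    it is a UTC timestamp.
    """
    return datetime.fromtimestamp(int(record["created_utc"]), tz=timezone.utc)


# Made-up record for illustration only.
sample = {"type": "comment", "id": "zz", "created_utc": "1577836800"}

key = unique_key(sample)
numeric_id = reddit_id_to_int(sample["id"])
timestamp = created_at(sample)
```

Keying on the `(type, id)` pair avoids collisions when posts and comments are loaded into the same structure, since the same base-36 ID may appear in both files.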

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

CC-BY v4.0

Contributions

[Needs More Information]

Introduction

Mirror of https://huggingface.co/datasets/SocialGrep/top-american-universities-on-reddit