This corpus contains the complete activity (posts and comments) of the subreddits of the top 10 US colleges, as ranked in the 2019 Forbes listing.
Mainly English.
A data point is a post or a comment. Because the two differ in structure, they are stored in two separate files, even though many fields are shared.
'type': the type of the data point. Can be 'post' or 'comment'.
'id': the base-36 Reddit ID of the data point. Unique when combined with type.
'subreddit.id': the base-36 Reddit ID of the data point's host subreddit. Unique.
'subreddit.name': the human-readable name of the data point's host subreddit.
'subreddit.nsfw': a boolean marking the data point's host subreddit as NSFW or not.
'created_utc': a UTC timestamp for the data point.
'permalink': a reference link to the data point on Reddit.
'score': score of the data point on Reddit.
'domain': (Post only) the domain of the data point's link.
'url': (Post only) the destination of the data point's link, if any.
'selftext': (Post only) the self-text of the data point, if any.
'title': (Post only) the title of the post data point.
'body': (Comment only) the body of the comment data point.
'sentiment': (Comment only) the result of an in-house sentiment analysis pipeline. Used for exploratory analysis.
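The field list above can be illustrated with a minimal loading sketch. It rests on several assumptions that this card does not confirm: that the two files are CSV, that the columns use the dotted names listed above, that 'created_utc' is a Unix epoch in seconds, and that booleans are serialized as "true"/"false". The sample rows, the `parse_row` and `load` helpers, and the example.invalid links are hypothetical and only stand in for the real files.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample rows standing in for the two files (format assumed to be CSV).
POSTS_CSV = """type,id,subreddit.id,subreddit.name,subreddit.nsfw,created_utc,permalink,score,domain,url,selftext,title
post,abc123,s1,examplecollege,false,1577836800,https://example.invalid/p/abc123,12,self.examplecollege,,Hello,First post
"""
COMMENTS_CSV = """type,id,subreddit.id,subreddit.name,subreddit.nsfw,created_utc,permalink,score,body,sentiment
comment,abc123,s1,examplecollege,false,1577836860,https://example.invalid/c/abc123,3,Nice post,0.5
"""


def parse_row(row):
    """Convert the shared string fields of one row into typed values."""
    parsed = dict(row)
    # 'id' is only unique together with 'type', so use the pair as the key.
    parsed["key"] = (row["type"], row["id"])
    # Base-36 IDs can be decoded to integers with int(..., 36).
    parsed["id_int"] = int(row["id"], 36)
    # Assumption: booleans are stored as the strings "true" / "false".
    parsed["subreddit.nsfw"] = row["subreddit.nsfw"].strip().lower() == "true"
    # Assumption: 'created_utc' is a Unix timestamp in seconds.
    parsed["created_utc"] = datetime.fromtimestamp(
        int(row["created_utc"]), tz=timezone.utc
    )
    parsed["score"] = int(row["score"])
    return parsed


def load(text):
    """Parse CSV text into a list of typed row dictionaries."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


posts = load(POSTS_CSV)
comments = load(COMMENTS_CSV)

# A post and a comment may share the same 'id'; the (type, id) key keeps both.
by_key = {row["key"]: row for row in posts + comments}
```

With the real files, `io.StringIO(text)` would be replaced by an open file handle. Post-only fields ('domain', 'url', 'selftext', 'title') and comment-only fields ('body', 'sentiment') are left as strings here because the card does not specify their encoding.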
License: CC-BY v4.0