license | multilinguality | source_datasets | task_categories | task_ids |
---|---|---|---|---|
mit | | | | |

This dataset is part of the Bittensor Subnet 13 decentralized network, containing preprocessed data from X (formerly Twitter). The data is continuously updated by network miners, providing a real-time stream of tweets for various analytical and machine learning tasks. For more information about the dataset, please visit the official repository.
The dataset lets researchers and data scientists explore many aspects of social media dynamics and build applications on top of it. Users are encouraged to adapt the data to their specific research or business needs.
Primary language: The data is mostly English, but can be multilingual because it is collected in a decentralized way.
Each instance represents a single tweet with the following fields:
- `text` (string): The main content of the tweet.
- `label` (string): Sentiment or topic category of the tweet.
- `tweet_hashtags` (list): A list of hashtags used in the tweet. May be empty if no hashtags are present.
- `datetime` (string): The date when the tweet was posted.
- `username_encoded` (string): An encoded version of the username to maintain user privacy.
- `url_encoded` (string): An encoded version of any URLs included in the tweet. May be empty if no URLs are present.

This dataset is continuously updated and does not have fixed splits. Users should create their own splits based on their requirements and the data's timestamp.
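
Since the dataset has no fixed splits, one common approach is a time-based split on the `datetime` field. The sketch below is only an illustration: it assumes `datetime` holds ISO 8601 strings (the exact format is not stated above), and the sample rows, the `parse_timestamp` and `temporal_split` helpers, and the cutoff date are all hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string (assumed format); naive values are treated as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def temporal_split(rows, cutoff: datetime):
    """Put rows posted before `cutoff` in train and all later rows in test."""
    train, test = [], []
    for row in rows:
        target = train if parse_timestamp(row["datetime"]) < cutoff else test
        target.append(row)
    return train, test


# Hypothetical rows that mimic the documented fields.
rows = [
    {"text": "first example", "datetime": "2025-01-20T10:00:00Z"},
    {"text": "second example", "datetime": "2025-02-01T08:30:00Z"},
    {"text": "third example", "datetime": "2025-02-15"},
]

cutoff = datetime(2025, 2, 1, tzinfo=timezone.utc)
train, test = temporal_split(rows, cutoff)
print(len(train), len(test))
```

Splitting on time rather than at random keeps later tweets out of the training portion, which matters for a stream that keeps growing.
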
Data is collected from public tweets on X (Twitter), adhering to the platform's terms of service and API usage guidelines.
All usernames and URLs are encoded to protect user privacy. The dataset does not intentionally include personal or sensitive information.
Users should be aware of potential biases inherent in X (Twitter) data, including demographic and content biases. This dataset reflects the content and opinions expressed on X and should not be considered a representative sample of the general population.
The dataset is released under the MIT license. The use of this dataset is also subject to X Terms of Use.
If you use this dataset in your research, please cite it as follows:
@misc{momo19422025datauniversex_dataset_59332,
title={The Data Universe Datasets: The finest collection of social media data the web has to offer},
author={momo1942},
year={2025},
url={https://huggingface.co/datasets/momo1942/x_dataset_59332},
}
To report issues or contribute to the dataset, please contact the miner or use the Bittensor Subnet 13 governance mechanisms.
[This section is automatically updated]
For full statistics, please refer to the `stats.json` file in the repository.
Rank | Topic | Total Count | Percentage |
---|---|---|---|
1 | NULL | 36254573 | 60.81% |
2 | #riyadh | 356224 | 0.60% |
3 | #zelena | 269147 | 0.45% |
4 | #tiktok | 228996 | 0.38% |
5 | #bbb25 | 205819 | 0.35% |
6 | #ad | 130757 | 0.22% |
7 | #jhope_at_galadespiècesjaunes | 112491 | 0.19% |
8 | #bbmzansi | 85594 | 0.14% |
9 | #sanremo2025 | 71178 | 0.12% |
10 | #pr | 70187 | 0.12% |
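
A ranking like the one above could be reproduced from the `tweet_hashtags` field roughly as follows. This is a sketch under assumptions that the card does not confirm: that `NULL` stands for tweets without hashtags, that hashtags are compared in lowercase, that a tweet counts once per distinct hashtag, and that percentages are taken over the number of tweets. The sample rows and the `topic_counts` helper are hypothetical.

```python
from collections import Counter


def topic_counts(rows):
    """Count tweets per hashtag; tweets without hashtags go under 'NULL' (assumed meaning)."""
    counts = Counter()
    for row in rows:
        hashtags = row.get("tweet_hashtags") or []
        if not hashtags:
            counts["NULL"] += 1
            continue
        # Count each distinct hashtag once per tweet, case-insensitively.
        for tag in {tag.lower() for tag in hashtags}:
            counts[tag] += 1
    return counts


# Hypothetical rows that mimic the documented field.
rows = [
    {"tweet_hashtags": []},
    {"tweet_hashtags": ["#TikTok"]},
    {"tweet_hashtags": ["#tiktok", "#ad"]},
    {"tweet_hashtags": []},
]

counts = topic_counts(rows)
total = len(rows)
for rank, (topic, count) in enumerate(counts.most_common(10), start=1):
    print(f"{rank} | {topic} | {count} | {count / total:.2%} |")
```
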
Date | New Instances | Total Instances |
---|---|---|
2025-01-27T01:36:35Z | 2263860 | 2263860 |
2025-01-30T13:48:42Z | 12228672 | 14492532 |
2025-02-03T01:51:52Z | 9023929 | 23516461 |
2025-02-06T13:55:18Z | 8719773 | 32236234 |
2025-02-10T01:59:01Z | 9780958 | 42017192 |
2025-02-13T14:04:00Z | 6763066 | 48780258 |
2025-02-17T03:24:16Z | 9506366 | 58286624 |
2025-02-18T03:08:16Z | 692083 | 58978707 |
2025-02-18T18:58:23Z | 639920 | 59618627 |
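
The "Total Instances" column is the running sum of "New Instances", which can be checked directly from the values in the table above. The last lines also assume, as the numbers suggest but the card does not state, that the topic percentages use the final total as their denominator.

```python
from itertools import accumulate

# Values copied from the update history table.
new_instances = [
    2263860, 12228672, 9023929, 8719773, 9780958,
    6763066, 9506366, 692083, 639920,
]
reported_totals = [
    2263860, 14492532, 23516461, 32236234, 42017192,
    48780258, 58286624, 58978707, 59618627,
]

# Each reported total should equal the cumulative sum up to that update.
running_totals = list(accumulate(new_instances))
assert running_totals == reported_totals

# Share of the NULL topic (36254573 tweets), assuming the final total is the denominator.
null_share = 36254573 / running_totals[-1]
print(f"{null_share:.2%}")  # prints 60.81%
```
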