  • License: mit
  • Multilinguality: multilingual
  • Source datasets: original
  • Task categories: text-classification, token-classification, question-answering, summarization, text-generation
  • Task IDs: sentiment-analysis, topic-classification, named-entity-recognition, language-modeling, text-scoring, multi-class-classification, multi-label-classification, extractive-qa, news-articles-summarization

Bittensor Subnet 13 X (Twitter) Dataset

Data-universe: The finest collection of social media data the web has to offer

Dataset Description

  • Repository: momo1942/x_dataset_59332
  • Subnet: Bittensor Subnet 13
  • Miner Hotkey: 5HakFWgDJq6VD7cMBJ9Qhc6GN1kJzwd17Mofw4vQmw4iACVV

Dataset Summary

This dataset is part of the Bittensor Subnet 13 decentralized network, containing preprocessed data from X (formerly Twitter). The data is continuously updated by network miners, providing a real-time stream of tweets for various analytical and machine learning tasks. For more information about the dataset, please visit the official repository.

Supported Tasks

The versatility of this dataset allows researchers and data scientists to explore various aspects of social media dynamics and develop innovative applications. Users are encouraged to leverage this data creatively for their specific research or business needs. For example:

  • Sentiment Analysis
  • Trend Detection
  • Content Analysis
  • User Behavior Modeling

Languages

Primary language: The dataset is mostly English, but it can be multilingual because it is created in a decentralized way.

Dataset Structure

Data Instances

Each instance represents a single tweet with the following fields:

Data Fields

  • text (string): The main content of the tweet.
  • label (string): Sentiment or topic category of the tweet.
  • tweet_hashtags (list): A list of hashtags used in the tweet. May be empty if no hashtags are present.
  • datetime (string): The date when the tweet was posted.
  • username_encoded (string): An encoded version of the username to maintain user privacy.
  • url_encoded (string): An encoded version of any URLs included in the tweet. May be empty if no URLs are present.
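The field list above can be mirrored in a small typed record when reading rows, for example plain dictionaries parsed from the dataset files. This is a minimal Python sketch: `Tweet`, `from_row` and `parse_timestamp` are hypothetical helpers, and the ISO 8601 format of the `datetime` field is an assumption based on the timestamps shown under Dataset Statistics. Empty or missing hashtags and URLs are normalised to empty values, since not every tweet has them.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Any, Mapping, Optional


def parse_timestamp(value: str) -> dt.datetime:
    """Parse an ISO 8601 string (assumed format) into an aware UTC datetime."""
    # Python < 3.11 rejects a trailing "Z", so rewrite it as an explicit offset.
    parsed = dt.datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: values without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=dt.timezone.utc)
    return parsed.astimezone(dt.timezone.utc)


@dataclass(frozen=True)
class Tweet:
    """One dataset instance; field names follow the Data Fields list."""

    text: str
    label: str
    tweet_hashtags: tuple[str, ...]
    datetime: str
    username_encoded: str
    url_encoded: str

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "Tweet":
        # Missing or None values become empty strings / an empty tuple.
        return cls(
            text=row.get("text") or "",
            label=row.get("label") or "",
            tweet_hashtags=tuple(row.get("tweet_hashtags") or ()),
            datetime=row.get("datetime") or "",
            username_encoded=row.get("username_encoded") or "",
            url_encoded=row.get("url_encoded") or "",
        )

    def posted_at(self) -> Optional[dt.datetime]:
        # Returns None when the row carries no timestamp.
        return parse_timestamp(self.datetime) if self.datetime else None
```

Using a frozen dataclass with a tuple for hashtags keeps each parsed instance immutable, which suits a dataset that is only ever appended to.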

Data Splits

This dataset is continuously updated and does not have fixed splits. Users should create their own splits based on their requirements and the data's timestamp.
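Because new tweets keep arriving, splitting on the timestamp (train on older tweets, evaluate on newer ones) avoids leaking later content into training. A minimal Python sketch, assuming the `datetime` field holds an ISO 8601 string; `temporal_split` and the cutoff value are hypothetical.

```python
import datetime as dt
from typing import Any, Iterable, Mapping


def _to_utc(value: str) -> dt.datetime:
    # Assumption: 'datetime' holds an ISO 8601 string such as "2025-01-21T00:00:00Z".
    parsed = dt.datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=dt.timezone.utc)


def temporal_split(
    rows: Iterable[Mapping[str, Any]], cutoff: str
) -> tuple[list[Mapping[str, Any]], list[Mapping[str, Any]]]:
    """Return (train, test): rows posted before `cutoff`, and rows at or after it."""
    boundary = _to_utc(cutoff)
    train: list[Mapping[str, Any]] = []
    test: list[Mapping[str, Any]] = []
    for row in rows:
        # Rows without a timestamp cannot be placed in time, so they are skipped.
        if not row.get("datetime"):
            continue
        if _to_utc(row["datetime"]) < boundary:
            train.append(row)
        else:
            test.append(row)
    return train, test
```

A cutoff inside the date range listed under Dataset Statistics, such as `"2025-02-01T00:00:00Z"`, puts earlier tweets in the training set and later ones in the test set.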

Dataset Creation

Source Data

Data is collected from public tweets on X (Twitter), adhering to the platform's terms of service and API usage guidelines.

Personal and Sensitive Information

All usernames and URLs are encoded to protect user privacy. The dataset does not intentionally include personal or sensitive information.

Considerations for Using the Data

Social Impact and Biases

Users should be aware of potential biases inherent in X (Twitter) data, including demographic and content biases. This dataset reflects the content and opinions expressed on X and should not be considered a representative sample of the general population.

Limitations

  • Data quality may vary due to the decentralized nature of collection and preprocessing.
  • The dataset may contain noise, spam, or irrelevant content typical of social media platforms.
  • Temporal biases may exist due to real-time collection methods.
  • The dataset is limited to public tweets and does not include private accounts or direct messages.
  • Not all tweets contain hashtags or URLs.

Additional Information

Licensing Information

The dataset is released under the MIT license. The use of this dataset is also subject to the X Terms of Use.

Citation Information

If you use this dataset in your research, please cite it as follows:

@misc{momo19422025datauniversex_dataset_59332,
    title={The Data Universe Datasets: The finest collection of social media data the web has to offer},
    author={momo1942},
    year={2025},
    url={https://huggingface.co/datasets/momo1942/x_dataset_59332},
}

Contributions

To report issues or contribute to the dataset, please contact the miner or use the Bittensor Subnet 13 governance mechanisms.

Dataset Statistics

[This section is automatically updated]

  • Total Instances: 59618627
  • Date Range: 2025-01-21T00:00:00Z to 2025-02-13T00:00:00Z
  • Last Updated: 2025-02-18T18:58:23Z

Data Distribution

  • Tweets with hashtags: 39.19%
  • Tweets without hashtags: 60.81%
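The two percentages above are complementary shares of the total instance count. A minimal Python sketch of how they could be recomputed from rows; `count_hashtag_presence` and `hashtag_percentages` are hypothetical helpers, and treating a missing `tweet_hashtags` value as an empty list is an assumption.

```python
from typing import Any, Iterable, Mapping


def count_hashtag_presence(rows: Iterable[Mapping[str, Any]]) -> tuple[int, int]:
    """Return (tweets with at least one hashtag, total tweets)."""
    with_tags = total = 0
    for row in rows:
        total += 1
        # A missing or empty 'tweet_hashtags' list counts as "without hashtags".
        if row.get("tweet_hashtags"):
            with_tags += 1
    return with_tags, total


def hashtag_percentages(with_tags: int, total: int) -> tuple[float, float]:
    """Return (% with hashtags, % without), each rounded to two decimals."""
    if total == 0:
        return 0.0, 0.0
    pct_with = round(100 * with_tags / total, 2)
    # Derive the complement so the two figures always sum to 100.
    return pct_with, round(100 - pct_with, 2)
```

With the counts implied by the statistics (59618627 total instances, of which the NULL row of the Top 10 Hashtags table reports 36254573 without a hashtag, leaving 23364054 with at least one), `hashtag_percentages(23364054, 59618627)` returns `(39.19, 60.81)`, matching the figures above.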

Top 10 Hashtags

For full statistics, please refer to the stats.json file in the repository.

Rank  Topic                          Total Count  Percentage
1     NULL (no hashtags)             36254573     60.81%
2     #riyadh                        356224       0.60%
3     #zelena                        269147       0.45%
4     #tiktok                        228996       0.38%
5     #bbb25                         205819       0.35%
6     #ad                            130757       0.22%
7     #jhope_at_galadespiècesjaunes  112491       0.19%
8     #bbmzansi                      85594        0.14%
9     #sanremo2025                   71178        0.12%
10    #pr                            70187        0.12%
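In this table each percentage is the count divided by the total number of instances, which is why the NULL row equals the 60.81% share of tweets without hashtags. A minimal Python sketch of how such a ranking could be produced from rows; `top_hashtags` is hypothetical, and case-folding hashtags and counting each hashtag at most once per tweet are assumptions, not documented behaviour of the Subnet 13 pipeline.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

NULL_TOPIC = "NULL"  # bucket for tweets that carry no hashtag


def top_hashtags(
    rows: Iterable[Mapping[str, Any]], k: int = 10
) -> list[tuple[int, str, int, float]]:
    """Return (rank, topic, count, percent of all tweets) for the k most common topics."""
    counts: Counter[str] = Counter()
    total = 0
    for row in rows:
        total += 1
        tags = row.get("tweet_hashtags") or []
        if not tags:
            counts[NULL_TOPIC] += 1
        else:
            # Assumption: hashtags are case-folded and counted once per tweet.
            counts.update({tag.lower() for tag in tags})
    # Percentages use the total tweet count, not the total number of hashtags.
    return [
        (rank, topic, n, round(100 * n / total, 2))
        for rank, (topic, n) in enumerate(counts.most_common(k), start=1)
    ]
```

Dividing by the tweet count rather than the hashtag count keeps the NULL row comparable with the Data Distribution figures; note that the percentages of the listed rows need not sum to 100, since a tweet can carry several hashtags.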

Update History

Date (UTC)            New Instances  Total Instances
2025-01-27T01:36:35Z  2263860        2263860
2025-01-30T13:48:42Z  12228672       14492532
2025-02-03T01:51:52Z  9023929        23516461
2025-02-06T13:55:18Z  8719773        32236234
2025-02-10T01:59:01Z  9780958        42017192
2025-02-13T14:04:00Z  6763066        48780258
2025-02-17T03:24:16Z  9506366        58286624
2025-02-18T03:08:16Z  692083         58978707
2025-02-18T18:58:23Z  639920         59618627
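Each Total Instances value is the previous total plus that row's New Instances, so the history can be checked mechanically. A small Python sketch; `HISTORY` simply transcribes the rows of the table above, and `check_running_totals` is a hypothetical helper.

```python
# (timestamp, new instances, reported total), transcribed from the Update History table
HISTORY = [
    ("2025-01-27T01:36:35Z", 2263860, 2263860),
    ("2025-01-30T13:48:42Z", 12228672, 14492532),
    ("2025-02-03T01:51:52Z", 9023929, 23516461),
    ("2025-02-06T13:55:18Z", 8719773, 32236234),
    ("2025-02-10T01:59:01Z", 9780958, 42017192),
    ("2025-02-13T14:04:00Z", 6763066, 48780258),
    ("2025-02-17T03:24:16Z", 9506366, 58286624),
    ("2025-02-18T03:08:16Z", 692083, 58978707),
    ("2025-02-18T18:58:23Z", 639920, 59618627),
]


def check_running_totals(history: list[tuple[str, int, int]]) -> list[str]:
    """Return the timestamps whose reported total differs from the running sum."""
    running = 0
    mismatches: list[str] = []
    for timestamp, new, reported in history:
        running += new
        if running != reported:
            mismatches.append(timestamp)
    return mismatches
```

For the table above, `check_running_totals(HISTORY)` returns an empty list, and the final running sum, 59618627, equals the Total Instances figure under Dataset Statistics.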

Mirror of https://huggingface.co/datasets/momo1942/x_dataset_59332