Bittensor Subnet 13 X (Twitter) Dataset

license: mit
multilinguality: multilingual
source_datasets: original
task_categories: text-classification, token-classification, question-answering, summarization, text-generation
task_ids: sentiment-analysis, topic-classification, named-entity-recognition, language-modeling, text-scoring, multi-class-classification, multi-label-classification, extractive-qa, news-articles-summarization

Data-universe: The finest collection of social media data the web has to offer

Dataset Description

Repository: momo1942/x_dataset_59332
Subnet: Bittensor Subnet 13
Miner Hotkey: 5HakFWgDJq6VD7cMBJ9Qhc6GN1kJzwd17Mofw4vQmw4iACVV

Dataset Summary

This dataset is part of the Bittensor Subnet 13 decentralized network, containing preprocessed data from X (formerly Twitter). The data is continuously updated by network miners, providing a real-time stream of tweets for various analytical and machine learning tasks. For more information about the dataset, please visit the official repository.
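A minimal loading sketch follows. It assumes the Hugging Face `datasets` library is installed and the Hub is reachable. The repository ID comes from the Dataset Description. Because the card defines no fixed splits, the sketch assumes that the Hub serves every data file under one default `train` split. Streaming avoids downloading all of the roughly 59.6 million rows up front.

```python
REPO_ID = "momo1942/x_dataset_59332"  # from the Dataset Description section


def load_tweets(streaming: bool = True):
    """Load the dataset from the Hugging Face Hub.

    Requires `pip install datasets` and network access. Assumption: the Hub
    serves all data files under a single default "train" split, since the
    card defines no fixed splits.
    """
    # Imported inside the function so this module can be imported without `datasets`.
    from datasets import load_dataset

    # streaming=True returns an IterableDataset that yields one dict per tweet.
    return load_dataset(REPO_ID, split="train", streaming=streaming)
```

For example, `next(iter(load_tweets()))` would return the first record as a dict, provided the split assumption holds.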

Supported Tasks

The versatility of this dataset allows researchers and data scientists to explore various aspects of social media dynamics and develop innovative applications. Users are encouraged to leverage this data creatively for their specific research or business needs. For example:

Sentiment Analysis
Trend Detection
Content Analysis
User Behavior Modeling

Languages

Primary language: Mostly English. Because the data is collected in a decentralized way, it can also be multilingual.

Dataset Structure

Data Instances

Each instance represents a single tweet with the following fields:

Data Fields

text (string): The main content of the tweet.
label (string): Sentiment or topic category of the tweet.
tweet_hashtags (list): A list of hashtags used in the tweet. May be empty if no hashtags are present.
datetime (string): The date when the tweet was posted.
username_encoded (string): An encoded version of the username to maintain user privacy.
url_encoded (string): An encoded version of any URLs included in the tweet. May be empty if no URLs are present.
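The field list above maps directly to a typed record. The sketch below assumes that `tweet_hashtags` is a list of strings and that every other field is a string, as the descriptions state. It checks a record against those types at runtime. The sample values are hypothetical.

```python
from typing import Any, Dict, List, TypedDict


class Tweet(TypedDict):
    """One instance; types follow the Data Fields list (hashtags assumed to be str)."""

    text: str
    label: str
    tweet_hashtags: List[str]
    datetime: str
    username_encoded: str
    url_encoded: str


# Runtime types to check; TypedDict annotations are not enforced at runtime.
EXPECTED_TYPES = {
    "text": str,
    "label": str,
    "tweet_hashtags": list,
    "datetime": str,
    "username_encoded": str,
    "url_encoded": str,
}


def schema_problems(record: Dict[str, Any]) -> List[str]:
    """Return mismatches; an empty list means the record fits the documented fields."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# Hypothetical record: every value is made up for illustration.
sample: Tweet = {
    "text": "Example tweet #example",
    "label": "",
    "tweet_hashtags": ["#example"],
    "datetime": "2025-02-01T12:00:00Z",
    "username_encoded": "<encoded-username>",
    "url_encoded": "",
}
```

Empty `tweet_hashtags` lists and empty `url_encoded` strings still pass, which matches the note that both may be empty.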

Data Splits

This dataset is continuously updated and does not have fixed splits. Users should create their own splits based on their requirements and the data's timestamp.
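A timestamp-based split can be sketched as follows. The sketch assumes that `datetime` values are ISO 8601 strings, like the timestamps in Dataset Statistics. Values without a time zone are treated as UTC. The cutoff is whatever boundary suits your task.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def parse_timestamp(value: str) -> datetime:
    """Parse "2025-02-01T12:00:00Z" or "2025-02-01" into a UTC-aware datetime.

    Assumption: the `datetime` field uses ISO 8601.
    """
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat naive values (e.g. bare dates) as UTC so comparisons never mix naive and aware.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def temporal_split(records: Iterable[Dict], cutoff: str) -> Tuple[List[Dict], List[Dict]]:
    """Split records into (before cutoff, at or after cutoff) by their `datetime` field."""
    boundary = parse_timestamp(cutoff)
    earlier: List[Dict] = []
    later: List[Dict] = []
    for record in records:
        bucket = earlier if parse_timestamp(record["datetime"]) < boundary else later
        bucket.append(record)
    return earlier, later
```

Splitting on time rather than at random keeps newer tweets out of the training portion. This is a common choice for data that keeps growing.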

Dataset Creation

Source Data

Data is collected from public tweets on X (Twitter), adhering to the platform's terms of service and API usage guidelines.

Personal and Sensitive Information

All usernames and URLs are encoded to protect user privacy. The dataset does not intentionally include personal or sensitive information.

Considerations for Using the Data

Social Impact and Biases

Users should be aware of potential biases inherent in X (Twitter) data, including demographic and content biases. This dataset reflects the content and opinions expressed on X and should not be considered a representative sample of the general population.

Limitations

Data quality may vary due to the decentralized nature of collection and preprocessing.
The dataset may contain noise, spam, or irrelevant content typical of social media platforms.
Temporal biases may exist due to real-time collection methods.
The dataset is limited to public tweets and does not include private accounts or direct messages.
Not all tweets contain hashtags or URLs.
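The dataset card does not specify any cleaning step. The sketch below is one hypothetical, minimal heuristic for the noise mentioned above, not the dataset's own preprocessing. It drops records with blank text and records whose text duplicates one already seen, after whitespace is collapsed and the text is lowercased.

```python
from typing import Dict, Iterable, Iterator


def basic_clean(records: Iterable[Dict]) -> Iterator[Dict]:
    """Yield records with non-blank text, skipping exact (normalized) duplicates.

    Hypothetical heuristic: duplicates are compared after collapsing whitespace
    and lowercasing. Records without hashtags or URLs are kept.
    """
    seen = set()
    for record in records:
        normalized = " ".join(record["text"].split()).lower()
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        yield record
```

The `seen` set grows with every unique text, so across tens of millions of rows it can use a lot of memory. Storing fixed-size digests instead of full strings is one way to reduce that.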

Additional Information

Licensing Information

The dataset is released under the MIT license. Use of this dataset is also subject to the X Terms of Use.

Citation Information

If you use this dataset in your research, please cite it as follows:

@misc{momo19422025datauniversex_dataset_59332,
        title={The Data Universe Datasets: The finest collection of social media data the web has to offer},
        author={momo1942},
        year={2025},
        url={https://huggingface.co/datasets/momo1942/x_dataset_59332},
        }

Contributions

To report issues or contribute to the dataset, please contact the miner or use the Bittensor Subnet 13 governance mechanisms.

Dataset Statistics

[This section is automatically updated]

Total Instances: 59618627
Date Range: 2025-01-21T00:00:00Z to 2025-02-13T00:00:00Z
Last Updated: 2025-02-18T18:58:23Z

Data Distribution

Tweets with hashtags: 39.19%
Tweets without hashtags: 60.81%
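The two percentages follow from the published counts. The sketch below assumes that the NULL row in the Top 10 Hashtags table counts tweets with no hashtags, which is consistent with its 60.81% share. It also shows how to compute the same share from records.

```python
from typing import Dict, Iterable

# Published figures from Dataset Statistics.
TOTAL_INSTANCES = 59_618_627
WITHOUT_HASHTAGS = 36_254_573  # the NULL row in Top 10 Hashtags


def hashtag_share(records: Iterable[Dict]) -> float:
    """Fraction of records whose tweet_hashtags list is non-empty (0.0 if none)."""
    total = with_tags = 0
    for record in records:
        total += 1
        if record["tweet_hashtags"]:
            with_tags += 1
    return with_tags / total if total else 0.0


with_hashtags = TOTAL_INSTANCES - WITHOUT_HASHTAGS  # 23,364,054 tweets
print(f"with: {with_hashtags / TOTAL_INSTANCES:.2%}")  # with: 39.19%
print(f"without: {WITHOUT_HASHTAGS / TOTAL_INSTANCES:.2%}")  # without: 60.81%
```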

Top 10 Hashtags

For full statistics, please refer to the stats.json file in the repository.

Rank	Topic	Total Count	Percentage
1	NULL (no hashtags)	36254573	60.81%
2	#riyadh	356224	0.60%
3	#zelena	269147	0.45%
4	#tiktok	228996	0.38%
5	#bbb25	205819	0.35%
6	#ad	130757	0.22%
7	#jhope_at_galadespiècesjaunes	112491	0.19%
8	#bbmzansi	85594	0.14%
9	#sanremo2025	71178	0.12%
10	#pr	70187	0.12%
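A table like the one above can be approximated from records. This sketch makes several assumptions that the card does not confirm. Tweets without hashtags fall into a "NULL" bucket. Each hashtag counts at most once per tweet. Hashtags are lowercased, as they appear in the table. The percentage is taken over all tweets, which reproduces the table's ratios (for example, 356224 / 59618627 ≈ 0.60%).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_hashtags(records: Iterable[Dict], k: int = 10) -> List[Tuple[str, int, float]]:
    """Return (hashtag, count, share of all tweets) for the k most common entries.

    Assumptions: hashtag-free tweets count under "NULL", each hashtag counts
    at most once per tweet, and hashtags are lowercased before counting.
    """
    counts = Counter()
    total = 0
    for record in records:
        total += 1
        tags = {tag.lower() for tag in record["tweet_hashtags"]}
        if tags:
            counts.update(tags)
        else:
            counts["NULL"] += 1
    return [(tag, n, n / total) for tag, n in counts.most_common(k)]
```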

Update History

Date	New Instances	Total Instances
2025-01-27T01:36:35Z	2263860	2263860
2025-01-30T13:48:42Z	12228672	14492532
2025-02-03T01:51:52Z	9023929	23516461
2025-02-06T13:55:18Z	8719773	32236234
2025-02-10T01:59:01Z	9780958	42017192
2025-02-13T14:04:00Z	6763066	48780258
2025-02-17T03:24:16Z	9506366	58286624
2025-02-18T03:08:16Z	692083	58978707
2025-02-18T18:58:23Z	639920	59618627
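In the Update History table, each Total Instances value should equal the running sum of New Instances. The check below copies the rows from the table and reports any rows where the two disagree. For the published data, it reports none. The final total matches the 59618627 listed in Dataset Statistics.

```python
from typing import List, Tuple

# (date, new instances, total instances) copied from the Update History table.
UPDATE_HISTORY: List[Tuple[str, int, int]] = [
    ("2025-01-27T01:36:35Z", 2_263_860, 2_263_860),
    ("2025-01-30T13:48:42Z", 12_228_672, 14_492_532),
    ("2025-02-03T01:51:52Z", 9_023_929, 23_516_461),
    ("2025-02-06T13:55:18Z", 8_719_773, 32_236_234),
    ("2025-02-10T01:59:01Z", 9_780_958, 42_017_192),
    ("2025-02-13T14:04:00Z", 6_763_066, 48_780_258),
    ("2025-02-17T03:24:16Z", 9_506_366, 58_286_624),
    ("2025-02-18T03:08:16Z", 692_083, 58_978_707),
    ("2025-02-18T18:58:23Z", 639_920, 59_618_627),
]


def inconsistent_rows(history: List[Tuple[str, int, int]]) -> List[str]:
    """Return dates whose total differs from the running sum of new instances."""
    running = 0
    bad = []
    for date, new, total in history:
        running += new
        if running != total:
            bad.append(date)
    return bad
```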
