★★★ Multi-view dataset collection

Datasets used in my own semi-supervised paper

To verify the effectiveness of the algorithm, this paper uses five simulation datasets, uci-digit[1], 3-Sources[2], MSRC_v1[3], BBCSport[4], and BBC[4], and one real dataset, Elevator, as shown in Table 1.

• uci-digit:

This dataset consists of 2000 examples of handwritten digits (0-9) extracted from Dutch utility maps. There are 200 examples in each class, each represented with six feature sets.

• 3-Sources:

This dataset contains 948 news articles collected from three well-known online news sources, where each source is treated as one view. The 169 articles that appear in all three sources are selected as the multi-view dataset to validate our proposed method.
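Keeping only the articles reported by all three sources amounts to intersecting the article IDs of the views. The following is a minimal sketch under stated assumptions: each source is assumed to be available as a mapping from article ID to feature vector, and the IDs and values are hypothetical toy data, not the actual 3-Sources files.

```python
def common_articles(*views):
    """Return the sorted IDs that appear in every view."""
    ids = set(views[0])
    for view in views[1:]:
        ids &= set(view)
    return sorted(ids)


def align_views(*views):
    """Keep only the shared articles, in the same order for every view."""
    shared = common_articles(*views)
    return shared, [[view[i] for i in shared] for view in views]


# Hypothetical toy views: article ID -> feature vector.
bbc = {1: [0.1, 0.0], 2: [0.3, 0.2], 4: [0.0, 0.9]}
reuters = {2: [0.5], 3: [0.7], 4: [0.1]}
guardian = {1: [1.0], 2: [0.2], 4: [0.4], 5: [0.6]}

shared_ids, aligned = align_views(bbc, reuters, guardian)
print(shared_ids)  # [2, 4]
```

After alignment, row k of every view describes the same article, which is what multi-view clustering methods generally expect.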

• MSRC_v1:

It consists of 210 images from 7 distinct subjects. We extract five types of features (i.e., CM, GIST, HOG, CENT, and LBP features) to construct a five-view dataset for the experiments.

• BBCSport:

It contains 544 sports news articles collected from five topical areas, which correspond to five classes (athletics, cricket, football, rugby, and tennis). Each document is described by two views, whose dimensions are 3283 and 3183, respectively.

• BBC:

It contains 685 documents from BBC News, which can be grouped into five classes (business, entertainment, politics, sport, and tech). Each document is described by four views, whose dimensions are 4659, 4633, 4665, and 4684, respectively.
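The statistics stated above can be collected in a small catalog for sanity checks when loading the data. This is only an illustrative sketch: `DatasetSpec` and `SPECS` are hypothetical names, the numbers are copied from the descriptions above, and fields the descriptions do not state are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DatasetSpec:
    name: str
    n_samples: int
    n_views: int
    n_classes: Optional[int] = None              # None: not stated above
    view_dims: Optional[Tuple[int, ...]] = None  # None: not stated above

    def __post_init__(self):
        # The number of listed dimensions must agree with the number of views.
        if self.view_dims is not None and len(self.view_dims) != self.n_views:
            raise ValueError(f"{self.name}: view_dims does not match n_views")


SPECS = [
    DatasetSpec("uci-digit", 2000, 6, n_classes=10),
    DatasetSpec("3-Sources", 169, 3),
    DatasetSpec("MSRC_v1", 210, 5, n_classes=7),
    DatasetSpec("BBCSport", 544, 2, n_classes=5, view_dims=(3283, 3183)),
    DatasetSpec("BBC", 685, 4, n_classes=5, view_dims=(4659, 4633, 4665, 4684)),
]

for spec in SPECS:
    print(f"{spec.name:10s} n={spec.n_samples:5d} views={spec.n_views} classes={spec.n_classes}")
```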

Abbreviated descriptions:

· uci-digit: Contains 2000 samples of the handwritten digits 0 to 9 (10 classes).

· 3-Sources: Contains news articles from BBC, Reuters, and The Guardian; 169 of them are used.

· MSRC_v1: Contains 210 images in 7 categories.

· BBCSport: Contains 544 sports news articles from 5 topical areas.

· BBC: Contains 685 documents from BBC News, which can be divided into 5 categories.

[1] http://archive.ics.uci.edu/ml/datasets

[2] http://mlg.ucd.ie/datasets/3sources.html

[3] https://www.cnblogs.com/picassooo/p/12890078.html

[4] http://mlg.ucd.ie/datasets/segment.html

The following dataset information is excerpted from: Semi-supervised multi-view clustering based on constrained nonnegative matrix factorization

Newsgroup dataset:

It contains approximately 20,000 newsgroup documents, classified into twenty classes. Here, we select five classes, each containing 100 documents, with 3 different preprocessing methods. The preprocessing steps can be found in the ICMLA 2010 publication. After this preprocessing, we obtain a dataset of 500 samples with three views.

http://lig-membres.imag.fr/grimal/data.html

The following dataset information is excerpted from: Multi-View Learning a Decomposable Affinity Matrix via Tensor Self-Representation on Grassmann Manifold

Caltech-7:

This dataset contains 1474 pictures of objects belonging to 7 classes (i.e., Face, Dollar-Bill, Motorbikes, Stop-Sign, Garfield, Snoopy, and Windsor-Chair). All images are described with six types of features (254 CENTRIST, 48 Gabor, 512 GIST, 1984 HOG, 928 LBP, and 40 wavelet moments).

http://www.vision.caltech.edu/archive.html

COIL-20:

This dataset is composed of 1440 images covering 20 categories. These images are represented by three kinds of features (i.e., 1024 Intensity, 3304 LBP, and 6750 Gabor).

http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php

ORL:

This dataset is composed of 40 distinct subjects with 10 different images for each. These images are taken at different times with varied lighting, facial expressions, and facial details. Three types of feature sets (i.e., 4096 Intensity, 3304 LBP, and 6750 Gabor) are utilized for the experiment.

https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

Extended Yale-B:

This dataset comprises 38 individuals. Each individual has 64 near frontal images under different illuminations. We use the first 10 classes for the experiment, which has 640 frontal face images in total.

http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html
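Restricting Extended Yale-B to its first 10 classes is a simple filter on the labels. The sketch below assumes a hypothetical label list in which each of the 38 individuals contributes exactly 64 consecutive images, as described above; the actual file layout may differ.

```python
N_SUBJECTS = 38
IMAGES_PER_SUBJECT = 64

# Hypothetical label list: subject 0 repeated 64 times, then subject 1, ...
labels = [s for s in range(N_SUBJECTS) for _ in range(IMAGES_PER_SUBJECT)]


def select_first_classes(labels, n_classes):
    """Return indices of samples whose label is among the first n_classes."""
    keep = set(sorted(set(labels))[:n_classes])
    return [i for i, y in enumerate(labels) if y in keep]


subset = select_first_classes(labels, 10)
print(len(labels), len(subset))  # 2432 640
```

The resulting index list can be applied to every feature view so that the selected rows stay aligned.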

The following dataset information is excerpted from: Multi-view spectral clustering via integrating nonnegative embedding and spectral embedding

Outdoor Scene:

The dataset has 2688 images consisting of 8 groups. For each image, we extract four different feature vectors including 512-D GIST, 432-D color moment, 256-D HOG, and 48-D LBP.

The following dataset information is excerpted from: Semi-supervised multi-view clustering based on constrained nonnegative matrix factorization

Reuters dataset:

It contains 6 samples of 1200 documents, and each document is described in five different languages (English, French, German, Spanish, and Italian). Here, each sample corresponds to one class, and each language is treated as one view. We can randomly select 600 documents (100 per class) in each

http://lig-membres.imag.fr/grimal/data.html [Also includes: Reuters, Cora, CiteSeer, WebKB, Newsgroup]
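Drawing 100 documents per class at random is a stratified sample. The sketch below is one possible way to do it, not the procedure of the cited paper: the labels are hypothetical (6 classes with 200 documents each), the views are placeholders, and the language codes are only illustrative keys.

```python
import random
from collections import defaultdict


def sample_per_class(labels, per_class, seed=0):
    """Return sorted indices holding `per_class` random samples of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen = []
    for y in sorted(by_class):
        chosen.extend(rng.sample(by_class[y], per_class))
    return sorted(chosen)


# Hypothetical labels: 6 classes with 200 documents each.
labels = [c for c in range(6) for _ in range(200)]
idx = sample_per_class(labels, per_class=100)

# Apply the same indices to every view so the languages stay aligned.
# Placeholder views: each "row" is just its own index.
views = {lang: list(range(len(labels))) for lang in ("en", "fr", "de", "es", "it")}
subset = {lang: [rows[i] for i in idx] for lang, rows in views.items()}
print(len(idx))  # 600
```

Fixing the seed keeps the selected subset reproducible across runs.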

The following dataset information is excerpted from: A study of graph-based system for multi-view clustering

WebKB data set (WebKB)

It consists of 203 web pages from 4 classes. Each web page is described by the content of the page, the anchor text of the hyperlink, and the text in its title.

https://linqs.soe.ucsc.edu/data. [includes many datasets]

One-hundred plant species leaves (100leaves)

It is from the UCI repository. The dataset consists of 1600 samples with three views. Each sample belongs to one of the one hundred plant species.

https://archive.ics.uci.edu/ml/datasets/One-hundred+plant+species+leaves+data+set.

Handwritten digit (HW)

It is from the UCI repository. The dataset consists of 2000 samples with six views. Each sample is one of the handwritten digits (0–9).

http://archive.ics.uci.edu/ml/datasets/Multiple+Features
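When assembling the six views, it helps to check that every view has one row per label. The following sketch assumes the samples are stored class by class with 200 samples per digit; that ordering is an assumption to verify against the downloaded files, and the views here are hypothetical placeholders rather than real features.

```python
N_CLASSES = 10
PER_CLASS = 200

# Assumption: samples are stored class by class (all 0s, then all 1s, ...).
labels = [digit for digit in range(N_CLASSES) for _ in range(PER_CLASS)]


def check_views(views, labels):
    """Raise if any view has a different number of rows than there are labels."""
    for name, rows in views.items():
        if len(rows) != len(labels):
            raise ValueError(f"view {name!r}: {len(rows)} rows, expected {len(labels)}")


# Hypothetical placeholder views; real feature rows would go here.
views = {f"view{k}": [[0.0]] * len(labels) for k in range(1, 7)}
check_views(views, labels)
print(len(labels), labels[199], labels[200])  # 2000 0 1
```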

The following dataset information is excerpted from: Multi-view Graph Learning by Joint Modeling of Consistency and Inconsistency

ALOI

It is a collection of 110250 images of 1000 small objects [33].

https://elki-project.github.io/datasets/multiview

Caltech101

It contains pictures of objects belonging to 101 categories.

http://www.vision.caltech.edu/ImageDatasets/Caltech101/
