Datasets used in my own semi-supervised paper
To verify the effectiveness of the algorithm, this paper uses five benchmark datasets, uci-digit [1], 3-sources [2], MSRC_v1 [3], BBCSport [4], and BBC [4], and one real-world dataset, Elevator, as shown in Table 1.
uci-digit: This dataset consists of 2000 examples of handwritten digits (0-9) extracted from Dutch utility maps. There are 200 examples in each class, and each example is represented by six feature sets.
3-sources: A dataset of 948 news articles from three well-known online news sources, where each source is treated as one view. The 169 articles that appear in all three sources are selected as the multi-view dataset to validate our proposed method.
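Selecting the articles that appear in all three sources amounts to intersecting the article IDs of the views. The following is a minimal sketch of that step, not the original preprocessing: the function name `common_articles`, the toy IDs, and the one-dimensional feature vectors are assumptions for illustration.

```python
def common_articles(views):
    """Return the sorted IDs that appear in every view.

    `views` maps a source name to a dict {article_id: feature_vector}.
    """
    id_sets = [set(articles) for articles in views.values()]
    return sorted(set.intersection(*id_sets))


# Toy data (hypothetical IDs and features, not the real 3-sources files).
views = {
    "bbc": {1: [0.1], 2: [0.3], 3: [0.5]},
    "reuters": {2: [0.2], 3: [0.4], 4: [0.6]},
    "guardian": {2: [0.9], 3: [0.8], 5: [0.7]},
}
shared = common_articles(views)
# Keep only the shared articles, in the same order for every view.
aligned = {name: [arts[i] for i in shared] for name, arts in views.items()}
print(shared)  # [2, 3]
```

Keeping the same ID order in every view matters because multi-view methods usually assume that row i refers to the same sample in each view.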
MSRC_v1: It consists of 210 images from 7 distinct classes. We extract five types of features (CM, GIST, HOG, CENT, and LBP) to construct a five-view dataset in the experiments.
BBCSport: It contains 544 sports news articles collected from five topical areas, which correspond to five classes (athletics, cricket, football, rugby, and tennis). Each document is described by two views, whose dimensions are 3283 and 3183 respectively.
BBC: It contains 685 documents from BBC News, which can be grouped into five classes (business, entertainment, politics, sport, tech). Each document is described by four views, whose dimensions are 4659, 4633, 4665, and 4684 respectively.
Abbreviated descriptions:
· uci-digit: Contains 2000 samples of the handwritten digits 0 to 9 (10 classes).
· 3-sources: Contains news articles from BBC, Reuters, and The Guardian; 169 of them are used.
· MSRC_v1: Contains 210 images in 7 categories.
· BBCSport: Contains 544 sports news articles from 5 topical areas, described by 2 views.
· BBC: Contains 685 documents from BBC News, which can be divided into 5 categories.
[1] http://archive.ics.uci.edu/ml/datasets
[2] http://mlg.ucd.ie/datasets/3sources.html
[3] https://www.cnblogs.com/picassooo/p/12890078.html
[4] http://mlg.ucd.ie/datasets/segment.html
The following dataset information is excerpted from: Semi-supervised multi-view clustering based on constrained nonnegative matrix factorization
Newsgroup: It contains approximately 20,000 newsgroup documents, classified into twenty classes. Here, we select five classes, each containing 100 documents, with three different preprocessing methods. The preprocessing steps can be found in the ICMLA 2010 publication. After this preprocessing, we obtain a dataset of 500 samples with three views.
http://lig-membres.imag.fr/grimal/data.html
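Building a subset of five classes with 100 documents each can be sketched as per-class subsampling. This is an illustrative sketch only: the source does not say how the 100 documents per class were chosen, so random sampling is an assumption here, and the function name `sample_per_class`, the seed, and the toy label sizes are hypothetical.

```python
import numpy as np


def sample_per_class(labels, classes, n_per_class, seed=0):
    """Return sorted indices holding `n_per_class` random samples of each class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    picked = []
    for c in classes:
        idx = np.flatnonzero(labels == c)  # positions of class c
        picked.append(rng.choice(idx, size=n_per_class, replace=False))
    return np.sort(np.concatenate(picked))


# Toy labels: 20 classes with 150 documents each (hypothetical sizes).
labels = np.repeat(np.arange(20), 150)
chosen = sample_per_class(labels, classes=[0, 1, 2, 3, 4], n_per_class=100)
print(chosen.shape)  # (500,)
```

The same index array would then be applied to every view so that the samples stay aligned across views.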
The following dataset information is excerpted from: Multi-View Learning a Decomposable Affinity Matrix via Tensor Self-Representation on Grassmann Manifold
Caltech101-7: This dataset contains 1474 pictures of objects belonging to 7 classes (i.e., Face, Dollar-Bill, Motorbikes, Stop-Sign, Garfield, Snoopy, and Windsor-Chair). All images are described with six types of features (254 CENTRIST, 48 Gabor, 512 GIST, 1984 HOG, 928 LBP, and 40 wavelet moments).
http://www.vision.caltech.edu/archive.html
COIL-20: This dataset is composed of 1440 images covering 20 categories. These images are represented by three kinds of features (i.e., 1024 Intensity, 3304 LBP, and 6750 Gabor).
http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php
ORL: This dataset is composed of 40 distinct subjects with 10 different images each. The images were taken at different times with varied lighting, facial expressions, and facial details. Three types of feature sets (i.e., 4096 Intensity, 3304 LBP, and 6750 Gabor) are used for the experiment.
https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
Extended Yale B: This dataset comprises 38 individuals. Each individual has 64 near-frontal images under different illuminations. We use the first 10 classes for the experiment, which gives 640 frontal face images in total.
http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html
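The figure of 640 images follows from 10 individuals times 64 images each. A small sketch of that selection, assuming integer class labels 0 to 37 ordered by individual (the label layout and the 4-dimensional placeholder features are assumptions, not the real data format):

```python
import numpy as np

# Toy labels: 38 individuals with 64 images each, as in the description above.
face_labels = np.repeat(np.arange(38), 64)
# Placeholder feature matrix, one row per image (hypothetical dimension).
face_features = np.zeros((face_labels.size, 4))

mask = face_labels < 10              # keep the first 10 individuals
subset_features = face_features[mask]
subset_labels = face_labels[mask]
print(subset_features.shape[0])  # 640
```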
The following dataset information is excerpted from: Multi-view spectral clustering via integrating nonnegative embedding and spectral embedding
The dataset has 2688 images consisting of 8 groups. For each image, we extract four different feature vectors including 512-D GIST, 432-D color moment, 256-D HOG, and 48-D LBP.
The following dataset information is excerpted from: Semi-supervised multi-view clustering based on constrained nonnegative matrix factorization
Reuters: Contains 6 samples of 1200 documents; each document is described in five different languages (English, French, German, Spanish, and Italian). Here, each sample corresponds to one class, and each language is treated as one view. We can randomly select 600 documents (100 per class) in each
http://lig-membres.imag.fr/grimal/data.html [also includes: Reuters, Cora, CiteSeer, WebKB, Newsgroup]
The following dataset information is excerpted from: A study of graph-based system for multi-view clustering
WebKB: Consists of 203 web pages in 4 classes. Each web page is described by the content of the page, the anchor text of the hyperlink, and the text in its title.
https://linqs.soe.ucsc.edu/data. [includes many datasets]
100leaves: From the UCI repository. The dataset consists of 1600 samples with three views. Each sample belongs to one of one hundred plant species.
https://archive.ics.uci.edu/ml/datasets/One-hundred+plant+species+leaves+data+set.
Mfeat: From the UCI repository. The dataset consists of 2000 samples with six views. Each sample is one of the handwritten digits (0–9).
http://archive.ics.uci.edu/ml/datasets/Multiple+Features
The following dataset information is excerpted from: Multi-view Graph Learning by Joint Modeling of Consistency and Inconsistency
ALOI: A collection of 110250 images of 1000 small objects [33].
https://elki-project.github.io/datasets/multiview
Caltech101: Contains pictures of objects belonging to 101 categories.