# Pebble-Dataset

**Repository Path**: bit2atom/Pebble-Dataset

## Basic Information

- **Project Name**: Pebble-Dataset
- **Description**: A machine learning dataset consisting of 5000 images of pebbles
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-25
- **Last Updated**: 2020-12-19

## Categories & Tags

- **Categories**: Uncategorized
- **Tags**: None

## README

# PEBBLE DATASET

> "Hoard and praise the verity of gravel. / Gems for the undeluded. Milt of the earth."
> — Seamus Heaney, *The Gravel Walks*, 1996

> "We must always bear in mind that a pebble is a transient thing. It is the half-way stage of a long existence. Beginning as a fragment of rock, which itself is millions of years old, it ends its existence by being pounded into minute particles or grains."
> — Clarence Ellis, *Pebbles on the Beach*, 1954

To train a machine learning system to recognize an object, you need lots and lots of examples to train it on. The [ImageNet dataset](https://en.wikipedia.org/wiki/ImageNet), a project unveiled in 2009 by researchers at Princeton University, is now the de facto standard for training and testing object-recognition systems. It contains URLs to more than 14 million hand-annotated images in over 20,000 categories (its widely used benchmark subset has 1000), ranging from animals (dozens of dog breeds and bird species) to everyday objects. Datasets abound across the internet, from massive collections of tweets to videos of people performing household tasks.

If a neural network is a *system* wherein we do abstract work, a dataset is an *ecosystem* made up of things of and embedded in the world. (Or, in the [words of UC Berkeley professor Alexei Efros](https://www.newyorker.com/magazine/2018/11/12/in-the-age-of-ai-is-seeing-still-believing): "data, data, data...
the gunk, the dirt, the complexity of the world.")

Of course, what we choose to gather data of, and subsequently train our networks on, amounts to a list of the things we care about. We put our labor towards teaching computers the difference between the 23 kinds of terrier in the ImageNet dataset, but what about the things we haven't yet compiled datasets of?

This project is a machine learning dataset consisting of 5000 images of pebbles gathered in Cambridge, England, in the fall of 2018. Pebbles are literally cast-offs from something else: not the stone of buildings or triumphal sculptures, but what is left behind in the mud by scraping glaciers, left on the beach by rolling waves, and collected along roadside ditches. A dataset of pebbles is a poetic addition to the overwhelmingly utilitarian datasets that already exist.

*This project was developed at the University of Cambridge, at both King's College and the Computer Laboratory, where I was a Visiting Fellow and artist-in-residence, respectively.*