# gharchive-languages

**Repository Path**: mirrors_mikeal/gharchive-languages

## Basic Information

- **Project Name**: gharchive-languages
- **Description**: Pull language data for all repos mentioned in gharchive data.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-09
- **Last Updated**: 2025-08-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# gharchive languages

This repository contains programming language information from GitHub's API. Every hour of gharchive activity references a number of repositories, and all of those repositories can be found in the corresponding hour of data in this repository.

## About this Data

Hourly collection began on September 27th, 2019. From that point forward, the repos for an hour of activity are collected from the API less than 24 hours after the activity. This means that the language data is fairly consistent with the state of the repository at that time.

All data prior to September 27th, 2019 was backfilled (a process that is still ongoing) and is **not** the language data at the time the activity took place, but is instead from some time after September 27th. The languages used in a repository don't shift that much over time, so this old data is still useful for approximating macro trends like language market share, even though the data is not perfect.

Another factor to consider is that the older the data being backfilled, the more likely a repository is to have been deleted, moved, or renamed, which gives a null value for the language data in the API. When calculating market share estimates, it is very important to exclude this null data, since it distorts old data more than new data and because changes in GitHub's product could affect it as well.
In other words, a null value means "no data available" and not "cannot determine languages."
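
As a rough illustration of the point about nulls, here is a minimal Python sketch of a market share calculation that leaves null records out of both the numerator and the denominator. The record layout is an assumption made for this example and is not taken from this repository's files: each record is a dict with a hypothetical `languages` key holding either `None` or a mapping of language name to bytes of code (the shape GitHub's languages endpoint returns). The function name `primary_language_share` and the `NO_LANGUAGE` label are likewise hypothetical.

```python
from collections import Counter

# Hypothetical label for repos whose language mapping is present but empty.
NO_LANGUAGE = "(none detected)"


def primary_language_share(records):
    """Return each primary language's share of repos, skipping null records.

    Assumes each record is a dict whose "languages" value is either None
    (no data available) or a mapping of language name to bytes of code.
    """
    counts = Counter()
    for record in records:
        languages = record.get("languages")
        if languages is None:
            # Null means "no data available": exclude it entirely so it
            # does not dilute the shares of hours with many deleted repos.
            continue
        if languages:
            # Treat the language with the most bytes as the primary one.
            primary = max(languages, key=languages.get)
        else:
            # Data exists but no language was detected; keep it in the total.
            primary = NO_LANGUAGE
        counts[primary] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {language: n / total for language, n in counts.items()}


# Made-up sample records, for illustration only.
sample = [
    {"repo": "a/one", "languages": {"JavaScript": 900, "HTML": 100}},
    {"repo": "b/two", "languages": {"Python": 500}},
    {"repo": "c/three", "languages": None},
    {"repo": "d/four", "languages": {"JavaScript": 42}},
]
shares = primary_language_share(sample)
print(shares)
```

Counting one primary language per repository is only one possible definition of market share; weighting by bytes or by activity events would be equally reasonable, and the null-handling step stays the same in each case.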