# Final

**Repository Path**: zhang----yan/final

## Basic Information

- **Project Name**: Final
- **Description**: Final Project Template Repo for DA402 2025
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 73
- **Created**: 2025-06-19
- **Last Updated**: 2025-06-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Final Project: Gitee Repository Analysis & Visualization

This final project requires your team to programmatically analyze multiple Gitee repositories of its own choosing. You will use Python with the GitPython, Seaborn, and Matplotlib libraries to extract data, generate insightful visualizations, and produce comprehensive Markdown reports.

**The project emphasizes:**

- Deep repository analysis
- Effective data visualization
- Comparative analysis across projects
- Team collaboration
- Adherence to proper Git workflow
- Creative exploration beyond core requirements

**25% of your grade will be based on creative extensions beyond base requirements, such as:**

- Implementing additional insightful statistics
- Creating supplementary visualizations
- Enhancing report style and presentation

You may refer to `https://gitee.com/ArsalaBangash/RepoReport` for sample code and reports.

## **Repository Selection**

### **Team Choice**: Your team will select the Gitee repositories to analyze.

* **2-member groups**: Select and analyze **two (2)** distinct Gitee repositories.
* **3-member groups**: Select and analyze **three (3)** distinct Gitee repositories.

### **Selection Criteria**: Chosen repositories must:

* Be publicly accessible on Gitee.
* Have at least 1 year of activity.
* Contain over 250 commits.
* Have more than 10 authors.

## Project README Content

Your team's main project `README.md` (the one in your forked assignment repository) must be updated to include the following sections, written by your team:

### 1. Chosen Repositories

* List the full Gitee URL for each chosen repository.
* Specify the short `reponame` your team will use for each repository (e.g., `captcha`, `jeepay`) for file naming purposes. This `reponame` must be used consistently.
* Provide a brief justification (2-3 sentences per repository) for why your team selected each repository.

### 2. Comparative Analysis & Key Findings

* Your team will write this section collaboratively, after all scripts have run and the reports and visualizations have been generated.
* Discuss any interesting patterns, significant differences, or similarities observed across the analyzed repositories, drawing upon the data in `comparison_report.md` and the individual repository reports.
* Highlight 2-3 key insights or major takeaways from your overall project analysis.

## **Project Structure and Script Execution**

* **Individual Repository Analysis Scripts (`report-{reponame}.py`)**: For each chosen repository, create a Python script named `report-{reponame}.py`.
  This script contains the logic to analyze one repository and generate its report.
* **Individual Repository Reports (`report-{reponame}.md`)**: Each `report-{reponame}.py` script must generate a corresponding Markdown report named `report-{reponame}.md`.
* **Cross-Repository Comparison Script (`compare.py`)**: Your team will create one Python script named `compare.py`. This script contains the logic to perform cross-repository comparisons.
* **Cross-Repository Comparison Report (`comparison_report.md`)**: The `compare.py` script must generate a Markdown report named `comparison_report.md`.
* **Visualizations**: All generated chart images must be stored in a `visuals/` subdirectory within your project.

## **Team Implementation and Task Division (General Guidance)**

* **For 3-member groups**: Each member is primarily responsible for developing the analysis logic and report generation for one `report-{reponame}.py` script and its corresponding `report-{reponame}.md`. The `compare.py` script and `comparison_report.md` should be collaborative efforts.
* **For 2-member groups**: Each member is primarily responsible for developing the analysis logic and report generation for one `report-{reponame}.py` script and its corresponding `report-{reponame}.md`. The `compare.py` script and `comparison_report.md` should be collaborative efforts.
* The `.gitignore` file and overall project structure are shared responsibilities.
* The content for the main project `README.md` ("Chosen Repositories" and "Comparative Analysis & Key Findings") is a collaborative writing effort.
* All code contributions (Python scripts, `.gitignore` updates, etc.), generated report files (`.md`), and generated image files (`visuals/*.png`) must be submitted via Pull Requests (PRs) within your group's forked repository.
* Develop new work on feature branches.
* Ensure clear commit messages and PR descriptions.

## **Visualization Requirements**

- All generated charts must be saved as PNG files in the `visuals/` subdirectory.
- Chart titles, axis labels, legends (where appropriate), and overall presentation should be clear, professional, and informative.

#### **A. For Each Individual Repository Report (`report-{reponame}.md`)**:

1. **Top 10 Contributors:**
   * **Description:** A horizontal bar chart displaying the top 10 contributors to the repository.
   * **Y-axis:** Contributor names.
   * **X-axis:** Number of commits made by each contributor.
   * **Ordering:** Contributors should be ordered from most commits to fewest commits (descending).
2. **Commit Activity Over Last 12 Recorded Months:**
   * **Description:** A line chart illustrating the trend of commit activity per month.
   * **X-axis:** Months, covering the last 12 months for which commit data exists in the repository. Format as "Month YYYY" (e.g., "Jun 2025").
   * **Y-axis:** Total number of commits made in each respective month.
   * **Markers:** Data points on the line should be clearly marked.
3. **Commit Activity by Day of the Week:**
   * **Description:** A bar chart showing the distribution of total commits across the days of the week.
   * **X-axis:** Days of the week (e.g., Monday, Tuesday, ..., Sunday).
   * **Y-axis:** Total number of commits made on each respective day of the week over the project's history.
4. **Cumulative Commit Growth:**
   * **Description:** A line chart depicting the cumulative growth of commits over the entire history of the project.
   * **X-axis:** Time (dates of commits, chronologically ordered).
   * **Y-axis:** The cumulative (total) number of commits up to each point in time. This line should always be non-decreasing.
5. **Distribution of Lines Changed Per Commit:**
   * **Description:** A box plot summarizing the distribution of the total number of lines (added + deleted) changed per commit.
   * **Y-axis:** Number of lines changed.
   * **Details:** The box should represent the interquartile range (IQR), with a line for the median. Whiskers should extend to show the range of the data. If extreme outliers heavily skew the plot, consider excluding them from the main plot scale so the core distribution stays readable (e.g., by using Seaborn's `showfliers=False` parameter or a similar technique).
6. **Distribution of Files Changed Per Commit:**
   * **Description:** A box plot summarizing the distribution of the number of files changed per commit.
   * **Y-axis:** Number of files changed.
   * **Details:** As with the lines-changed box plot, this chart should clearly show the median, IQR, and range, and may hide or limit extreme outliers for visual clarity.

#### **B. For `comparison_report.md`**:

1. **Cumulative Commit Growth Comparison:**
   * **Description:** A single line chart comparing the cumulative commit growth of all analyzed repositories.
   * **X-axis:** Time (dates, normalized or aligned appropriately if start dates differ significantly, or plotted against a common time axis).
   * **Y-axis:** The cumulative (total) number of commits.
   * **Legend:** A clear legend is required to distinguish the lines for each repository.
2. **Total Commits Comparison:**
   * **Description:** A grouped bar chart comparing the total number of commits for each analyzed repository.
   * **X-axis:** Repository names.
   * **Y-axis:** Total number of commits.

## **Technical Requirements**

### **Core Libraries**

* You **must** use the `git` (GitPython) Python library to interact with Git repositories.
* You **must** use `seaborn` and `matplotlib` for generating all data visualizations.
* You **must** use `numpy` for any statistical calculations you need.
* You will construct the Markdown content directly using Python strings.

### **Repository Cloning**

* Your Python scripts will need to clone the target Gitee repositories.
* These cloned repositories should reside as subdirectories within your main project folder.
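
The cloning requirement above could be sketched roughly as follows. This is a minimal illustration rather than a prescribed implementation: the helper names `local_clone_dir` and `ensure_clone` and the owner name in the usage note are hypothetical. It assumes GitPython is installed and relies on its `Repo.clone_from` call.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_clone_dir(url: str, project_root: Path) -> Path:
    """Map a repository URL to a subdirectory of the project folder (hypothetical helper)."""
    name = Path(urlparse(url).path).name  # e.g. "jeepay.git" or "jeepay"
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return project_root / name


def ensure_clone(url: str, project_root: Path):
    """Clone the repository on the first run and reuse the existing clone afterwards."""
    # GitPython is imported here so the path helper above works without it installed.
    from git import Repo

    target = local_clone_dir(url, project_root)
    if (target / ".git").exists():
        return Repo(target)  # already cloned: open the local copy
    return Repo.clone_from(url, target)  # first run: clone into the project folder
```

A script might call `ensure_clone("https://gitee.com/example-owner/jeepay.git", Path("."))`, where `example-owner` is a placeholder. Reusing an existing clone avoids downloading the full history each time the report script runs.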

### **Data Aggregation for Comparison**

* The `compare.py` script must programmatically access key summary statistics from each of the repositories analyzed by the team. Your individual `report-{reponame}.py` scripts must save small, structured temporary data files with key metrics that `compare.py` can then use. These temporary files **must be ignored by `.gitignore`**.

### **Visualization Requirements**

* All charts must be saved as PNG files in the `visuals/` subdirectory.
* Chart titles, axis labels, legends (where appropriate), and overall presentation should be clear, professional, and informative.

### **`.gitignore` Configuration**

* Your project must include a `.gitignore` file.
* This file must be configured to ignore:
  * The subdirectories containing the cloned repositories (e.g., `jeepay/`).
  * The `.idea/` directory (used by PyCharm).
  * Any temporary intermediate data files.
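
One possible way to hand metrics from the `report-{reponame}.py` scripts to `compare.py` is a small JSON file per repository. The sketch below is only an assumption about how this could look: the `tmp_metrics` folder, the function names, and the `total_commits` key are all hypothetical, and whatever folder you choose would need its own entry in `.gitignore`.

```python
import json
from pathlib import Path

# Hypothetical folder for temporary metric files; it must be listed in .gitignore.
METRICS_DIR = Path("tmp_metrics")


def save_metrics(reponame: str, metrics: dict, out_dir: Path = METRICS_DIR) -> Path:
    """Write one repository's summary metrics to <out_dir>/<reponame>.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{reponame}.json"
    path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return path


def load_all_metrics(in_dir: Path = METRICS_DIR) -> dict:
    """Read every metrics file back, keyed by reponame (the file name without .json)."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(in_dir.glob("*.json"))
    }
```

Under this layout, each report script would call something like `save_metrics("captcha", {"total_commits": total})` after its analysis, and `compare.py` would call `load_all_metrics()` to build its charts without re-walking each repository's history.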