# Final

**Repository Path**: gxt_1/final

## Basic Information

- **Project Name**: Final
- **Description**: Final Project Template Repo for DA402 2025
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 73
- **Created**: 2025-06-12
- **Last Updated**: 2025-06-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Final Project: Gitee Repository Analysis & Visualization

This final project requires your team to programmatically analyze multiple Gitee repositories chosen by your team. You will use Python with the GitPython, Seaborn, and Matplotlib libraries to extract data, generate insightful visualizations, and produce comprehensive Markdown reports.

**The project emphasizes:**

- Deep repository analysis
- Effective data visualization
- Comparative analysis across projects
- Team collaboration
- Adherence to proper Git workflow
- Creative exploration beyond core requirements

**25% of your grade will be based on creative extensions beyond base requirements, such as:**

- Implementing additional insightful statistics
- Creating supplementary visualizations
- Enhancing report style and presentation

You may refer to `https://gitee.com/ArsalaBangash/RepoReport` for sample code and reports.

## **Repository Selection**

### **Team Choice**: Your team will select the Gitee repositories to analyze.

* **2-member groups**: Select and analyze **two (2)** distinct Gitee repositories.
* **3-member groups**: Select and analyze **three (3)** distinct Gitee repositories.

### **Selection Criteria**: Chosen repositories must:

* Be publicly accessible on Gitee.
* Have at least 1 year of activity.
* Contain over 250 commits.
* Have more than 10 authors.

## Project README Content

Your team's main project `README.md` (the one in your forked assignment repository) must be updated to include the following sections, written by your team:

### 1. Chosen Repositories

* List the full Gitee URL for each chosen repository.
  * https://gitee.com/dromara/sa-token
  * https://gitee.com/milvus-io/milvus
* Specify the short `reponame` your team will use for each repository for file naming purposes. This `reponame` is used consistently.
  * `sa-token`, analyzed by `report-sa-token.py`
  * `milvus`, analyzed by `report-milvus.py`
* Reasons for choosing each repository.

  **sa-token**

  * **Ease of use:** Sa-Token provides a concise API that is easy to integrate into existing projects. Developers can get started quickly and implement basic authentication features.
  * **Powerful features:** Sa-Token supports several authentication methods, including token authentication and session authentication, to meet the needs of different scenarios. It also supports permission control and role management, which helps developers manage user permissions.

  **milvus**

  * **Performance:** Milvus is designed specifically for large-scale vector data and performs efficient vector similarity searches. It supports high-throughput, low-latency queries, which makes it suitable for real-time application scenarios.
  * **Ease of use:** Milvus provides a clean API and easy-to-use client libraries for multiple programming languages (e.g., Python, Java, and Go). Its documentation is detailed and comprehensive, which helps developers get started quickly and integrate it into existing projects.

### 2. Comparative Analysis & Key Findings

* Discuss any interesting patterns, significant differences, or similarities observed across the analyzed repositories, drawing upon the data in `comparison_report.md` and the individual repository reports.
  * Both repositories have a large number of contributors.
* Highlight 2-3 key insights or major takeaways from your overall project analysis.
  * Both repositories are very active.
  * Both show a period of rapid growth in cumulative commits.
  * Both have a very high total commit count, and the totals for the two repositories differ only slightly.

## **Project Structure and Script Execution**

* **Individual Repository Analysis Scripts (`report-sa-token.py`, `report-milvus.py`)**: For each chosen repository, create a Python script named `report-{reponame}.py`. This script contains the logic to analyze one repository and generate its report.
* **Individual Repository Reports (`report-sa-token.md`, `report-milvus.md`)**: The `report-sa-token.py` and `report-milvus.py` scripts have generated the corresponding Markdown reports, `report-sa-token.md` and `report-milvus.md`.
* **Cross-Repository Comparison Script (`compare.py`)**: Your team will create one Python script named `compare.py`. This script contains the logic to perform cross-repository comparisons.
* **Cross-Repository Comparison Report (`comparison_report.md`)**: The `compare.py` script has generated a comparison report in Markdown format.
* **Visualizations**: All generated charts are stored in the project's `visuals/` subdirectory.

## **Team Implementation and Task Division (General Guidance)**

**Team of 2**: Each member is mainly responsible for developing one of the two scripts (`report-sa-token.py` or `report-milvus.py`) together with the analysis logic for its report. The `compare.py` script and the comparison report are completed collaboratively.

* The `.gitignore` file and overall project structure are shared responsibilities.
* The content of the main project `README.md` ("Chosen Repositories" and "Comparative Analysis & Key Findings") has been written collaboratively.
* All code contributions (Python scripts, `.gitignore` updates, etc.), generated report files (`.md`), and image files (`visuals/*.png`) have been submitted to the team's forked repository via Pull Requests (PRs).
* Develop new work on feature branches.
* Ensure clear commit messages and PR descriptions.
## **Visualization Requirements**

- All generated charts must be saved as PNG files in the `visuals/` subdirectory.
- Chart titles, axis labels, legends (where appropriate), and overall presentation should be clear, professional, and informative.

#### **A. For Each Individual Repository Report (`report-sa-token.md`, `report-milvus.md`)**:

1. **Top 10 Contributors:**
   * **Description:** A horizontal bar chart displaying the top 10 contributors to the repository.
   * **Y-axis:** Contributor names.
   * **X-axis:** Number of commits made by each contributor.
   * **Ordering:** Contributors should be ordered from most commits to fewest commits (descending).
2. **Commit Activity Over Last 12 Recorded Months:**
   * **Description:** A line chart illustrating the trend of commit activity per month.
   * **X-axis:** Months, covering the last 12 months for which commit data exists in the repository. Format as "Month YYYY" (e.g., "Jun 2025").
   * **Y-axis:** Total number of commits made in each respective month.
   * **Markers:** Data points on the line should be clearly marked.
3. **Commit Activity by Day of the Week:**
   * **Description:** A bar chart showing the distribution of total commits across the days of the week.
   * **X-axis:** Days of the week (e.g., Monday, Tuesday, ..., Sunday).
   * **Y-axis:** Total number of commits made on each respective day of the week over the project's history.
4.
   **Cumulative Commit Growth:**
   * **Description:** A line chart depicting the cumulative growth of commits over the entire history of the project.
   * **X-axis:** Time (dates of commits, chronologically ordered).
   * **Y-axis:** The cumulative (total) number of commits up to each point in time. This line should always be non-decreasing.
5. **Distribution of Lines Changed Per Commit:**
   * **Description:** A box plot summarizing the distribution of the total number of lines (added + deleted) changed per commit.
   * **Y-axis:** Number of lines changed.
   * **Details:** The box should represent the interquartile range (IQR), with a line for the median. Whiskers should extend to show the range of the data, but consider excluding extreme outliers from the main plot scale for better readability of the core distribution (e.g., by using Seaborn's `showfliers=False` parameter or similar techniques if outliers heavily skew the plot).
6. **Distribution of Files Changed Per Commit:**
   * **Description:** A box plot summarizing the distribution of the number of files changed per commit.
   * **Y-axis:** Number of files changed.
   * **Details:** Similar to the lines changed box plot, this should clearly show the median, IQR, and range, potentially managing extreme outliers for visual clarity.

#### **B. For `comparison_report.md`**:

1. **Cumulative Commit Growth Comparison:**
   * **Description:** A single line chart comparing the cumulative commit growth of all analyzed repositories.
   * **X-axis:** Time (dates, normalized or aligned appropriately if start dates differ significantly, or plotted against a common time axis).
   * **Y-axis:** The cumulative (total) number of commits.
   * **Legend:** A clear legend is required to distinguish the lines for each repository.
2. **Total Commits Comparison:**
   * **Description:** A grouped bar chart comparing the total number of commits for each analyzed repository.
   * **X-axis:** Repository names.
   * **Y-axis:** Total number of commits.

## **Technical Requirements**

### **Core Libraries**

* You **must** use the `git` (GitPython) Python library to interact with Git repositories.
* You **must** use `seaborn` and `matplotlib` for generating all data visualizations.
* You **must** use `numpy` for statistical calculations if needed.
* You will construct the Markdown content directly using Python strings.

### **Repository Cloning**

* Your Python scripts will need to clone the target Gitee repositories.
* These cloned repositories should reside as subdirectories within your main project folder.

### **Data Aggregation for Comparison**

* The `compare.py` script must programmatically access key summary statistics from each of the repositories analyzed by the team. Your individual `report-{reponame}.py` scripts must save small, structured temporary data files with key metrics that `compare.py` can then use. These temporary files **must be ignored by `.gitignore`**.
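
One possible way to meet the data-aggregation requirement is for each report script to write a small JSON file that `compare.py` later reads. The sketch below is only an illustration under assumptions: the `tmp_metrics/` directory, the `metrics-{reponame}.json` naming, the function names, and the metric keys are all hypothetical and are not prescribed by the assignment.

```python
import json
from pathlib import Path


def save_metrics(reponame: str, metrics: dict, directory: str = "tmp_metrics") -> Path:
    """Write one repository's summary metrics to <directory>/metrics-<reponame>.json."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the temp directory on first use
    path = out_dir / f"metrics-{reponame}.json"
    path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return path


def load_all_metrics(directory: str = "tmp_metrics") -> dict:
    """Read every metrics-*.json file and return a {reponame: metrics} mapping."""
    results = {}
    for path in sorted(Path(directory).glob("metrics-*.json")):
        # "metrics-sa-token.json" has the stem "metrics-sa-token"; strip the prefix.
        reponame = path.stem[len("metrics-"):]
        results[reponame] = json.loads(path.read_text(encoding="utf-8"))
    return results
```

Under these assumptions, `report-sa-token.py` would end with a call such as `save_metrics("sa-token", {"total_commits": total})`, where `total` is whatever the script computed, and `compare.py` would start from `load_all_metrics()`. If the `tmp_metrics/` directory name is used, it should also be listed in `.gitignore`.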
### **Visualization Requirements**

* All charts must be saved as PNG files in the `visuals/` subdirectory.
* Chart titles, axis labels, legends (where appropriate), and overall presentation should be clear, professional, and informative.

### **`.gitignore` Configuration**

* Your project must include a `.gitignore` file.
* This file must be configured to ignore:
  * The subdirectories containing the cloned repositories (e.g., `jeepay/`).
  * The `.idea/` directory (used by PyCharm).
  * Any temporary intermediate data files.
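
As a rough self-check for the `.gitignore` rules above, a team could run a short script like the following sketch. The entries `sa-token/` and `milvus/` assume the clones are named after the chosen repositories, and `tmp_metrics/` is a hypothetical name for the temporary data directory. The helper only compares whole lines; it does not implement Git's full pattern-matching rules.

```python
# Entries derived from the list above. "sa-token/" and "milvus/" assume the clone
# directories are named after the repositories; "tmp_metrics/" is hypothetical.
REQUIRED_ENTRIES = ["sa-token/", "milvus/", ".idea/", "tmp_metrics/"]


def missing_ignore_entries(gitignore_text: str, required=REQUIRED_ENTRIES) -> list:
    """Return the required entries that do not appear as a line in the .gitignore text."""
    present = set()
    for line in gitignore_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blank lines and comments
            present.add(line)
    return [entry for entry in required if entry not in present]
```

Calling `missing_ignore_entries(open(".gitignore", encoding="utf-8").read())` from the project root returns an empty list when every expected entry is present as its own line.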