# CLAF

**Repository Path**: wangxgprivate/claf

## Basic Information

- **Project Name**: CLAF
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-04
- **Last Updated**: 2025-09-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# CLAF: Contrastive Learning with Concept-Aligned Bottlenecks for Label-Free Explainable Clustering

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch 1.9+](https://img.shields.io/badge/pytorch-1.9+-red.svg)](https://pytorch.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

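## Project Overview

CLAF (Contrastive Learning with Concept-Aligned Bottlenecks for Label-Free Explainable Clustering) is a label-free explainable clustering framework built on contrastive learning and concept-aligned bottlenecks.

## 📖 Abstract

CLAF combines contrastive learning with a concept-aligned bottleneck to produce explainable clusters without any label information. It learns meaningful representations and aligns them with human-understandable concepts, so each cluster can be interpreted in terms of those concepts.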

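## 🚀 Key Features

- **Unsupervised learning**: clusters data without any label information
- **Explainability**: explains clustering results through the concept-aligned bottleneck
- **Contrastive learning**: learns meaningful representations with contrastive learning
- **Concept alignment**: aligns the learned representations with human-understandable concepts
- **Modular design**: individual components are easy to extend and modify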
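
To make the contrastive learning idea concrete, here is a minimal sketch of an NT-Xent loss in the style of SimCLR. It is an illustration only: this README does not state which contrastive loss CLAF uses, and the function names `cosine` and `nt_xent` and the temperature value are assumptions. It is written in plain Python so it runs without PyTorch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """Average NT-Xent loss over all 2N anchors.

    view_a[i] and view_b[i] are embeddings of two augmentations of sample i.
    The temperature default is an arbitrary illustrative value.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same sample
        # Scaled similarities to every other embedding (the anchor itself is excluded).
        logits = {j: cosine(z[i], z[j]) / temperature for j in range(2 * n) if j != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Two views that agree per sample versus two views whose pairing is swapped.
aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(f"aligned={aligned:.3f}, shuffled={shuffled:.3f}")
```

The loss is lower when the two views of each sample point in the same direction, which is the signal that pulls augmentations of one image together and pushes different images apart.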

## 📊 Performance

| Method | CIFAR-10 | STL-10 |
|--------|----------|--------|
| Traditional clustering | 60-70% | - |
| **CLAF** | **85-90%** | **-** |

## 🛠 Installation

1. Clone the repository:
```bash
git clone https://github.com/your-username/CLAF.git
cd CLAF
```

2. Install the dependencies:
```bash
pip install -r requirements.txt
```

### ⚠️ Resolving OpenMP Runtime Conflicts

If you encounter the following error:
```
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
```

try one of the following fixes:

**Option 1: Run the fix script (recommended)**
```bash
python fix_openmp.py
```

**Option 2: Run the batch file (Windows)**
```bash
run_with_fix.bat
```

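**Option 3: Set the environment variable manually**
```bash
# Windows Command Prompt
set KMP_DUPLICATE_LIB_OK=TRUE
# Linux/macOS shells: export KMP_DUPLICATE_LIB_OK=TRUE
python main.py --dataset cifar10 --stage all
```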

**Option 4: Conda workaround**
```bash
conda install nomkl  # remove the MKL libraries to avoid the conflict
```

**Option 5: Create a fresh virtual environment**
```bash
conda create -n claf_env python=3.8
conda activate claf_env
pip install -r requirements.txt
```
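
The environment variable can also be set from Python, as long as that happens before any OpenMP-linked library is imported. The sketch below shows the idea; it is not the content of `fix_openmp.py`, which this README does not show, and the helper name `allow_duplicate_openmp` is hypothetical. Note that the OpenMP runtime's own error message describes this variable as an unsafe, unsupported workaround, so a clean environment is the more robust fix.

```python
import os


def allow_duplicate_openmp():
    """Set KMP_DUPLICATE_LIB_OK unless the user already chose a value."""
    os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
    return os.environ["KMP_DUPLICATE_LIB_OK"]


# Call this before importing torch, numpy, or any other OpenMP-linked library.
value = allow_duplicate_openmp()
print(f"KMP_DUPLICATE_LIB_OK={value}")
```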

## 🎯 Usage

### Quick Start

Run the full clustering pipeline:
```bash
python main.py --dataset cifar10 --stage all
```

### Running Individual Stages

1. **Contrastive pretraining**:
```bash
python main.py --stage contrastive_pretrain --epochs 1000
```

2. **Concept alignment training**:
```bash
python main.py --stage concept_align --num_concepts 50
```

3. **Clustering**:
```bash
python main.py --stage cluster --num_clusters 10
```

4. **Explainability analysis**:
```bash
python main.py --stage explain
```
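
The stage commands above suggest an entry point that maps `--stage` to one or more pipeline steps. A minimal sketch of such a dispatcher follows, assuming an `argparse`-based CLI. Only the flag names and the four stage names come from the commands above; the defaults, `build_parser`, and `stages_to_run` are hypothetical, and the real `main.py` may be organized differently.

```python
import argparse

# Stage names taken from the commands above; "all" runs them in order.
STAGES = ["contrastive_pretrain", "concept_align", "cluster", "explain"]


def build_parser():
    """Build a parser with the flags shown above (default values are assumed)."""
    parser = argparse.ArgumentParser(description="Hypothetical CLAF-style entry point")
    parser.add_argument("--dataset", default="cifar10")
    parser.add_argument("--stage", choices=STAGES + ["all"], default="all")
    parser.add_argument("--epochs", type=int, default=1000)
    parser.add_argument("--num_concepts", type=int, default=50)
    parser.add_argument("--num_clusters", type=int, default=10)
    return parser


def stages_to_run(stage):
    """Expand "all" into the full ordered pipeline."""
    return list(STAGES) if stage == "all" else [stage]


args = build_parser().parse_args(["--stage", "cluster", "--num_clusters", "10"])
print(stages_to_run(args.stage), args.num_clusters)
```

Restricting `--stage` with `choices` makes a mistyped stage name fail at parse time instead of partway through a long run.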

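### Early Stopping and Best-Checkpoint Saving

CLAF supports early stopping and automatically saves the best-performing parameters.

**Early-stopping options**:
- `--cpt_patience`: early-stopping patience for the CPT stage (default: 20)
- `--pbsft_patience`: early-stopping patience for the PB-SFT stage (default: 10)

**Examples**:
```bash
# Run the full pipeline with early stopping
python main.py --dataset cifar10 --stage all --cpt_patience 20 --pbsft_patience 10

# Run only the CPT stage with early stopping
python main.py --stage cpt --cpt_patience 15

# Run only the PB-SFT stage with early stopping
python main.py --stage pbsft --pbsft_patience 8
```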

**Best-model and checkpoint files**:
- Best CPT model: `checkpoints/best_cpt_model.pth`
- Best PB-SFT model: `checkpoints/best_pbsft_model.pth`
- Periodic checkpoints: `checkpoints/checkpoint_cpt_epoch_*.pth` and `checkpoints/checkpoint_pbsft_epoch_*.pth`
- Final model: `checkpoints/final_model.pth`
- Prototype samples: `checkpoints/prototypes.pt`
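
For readers unfamiliar with the patience parameter, here is a minimal sketch of patience-based early stopping. It assumes the monitored quantity is a loss where lower is better; this README does not say which metric CLAF tracks, and the `EarlyStopping` class is a hypothetical illustration rather than the project's implementation.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, loss):
        """Record one epoch; return True when training should stop."""
        if loss < self.best:
            self.best = loss
            self.best_epoch = epoch  # a real loop would save the best checkpoint here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.69]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(stopper.best_epoch, stopped_at)
```

A larger patience tolerates longer plateaus at the cost of extra epochs, which may be why the CPT default above is higher than the PB-SFT default.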

### Configuration

Edit `config.py` to adjust hyperparameters:
```python
# Example configuration changes
config.DATASET = 'cifar10'
config.BATCH_SIZE = 256
config.PRETRAIN_EPOCHS = 800
```
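
If you prefer not to edit settings in place, one alternative is to keep them in an immutable object and derive per-run variants from it. The sketch below assumes a `Config` dataclass, which is hypothetical: the field names mirror the example above, but the actual layout of `config.py` is not shown in this README.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # Field names mirror the README example; the defaults are placeholders.
    DATASET: str = "cifar10"
    BATCH_SIZE: int = 256
    PRETRAIN_EPOCHS: int = 800


base = Config()
# Derive a variant for a quick smoke test without mutating the base settings.
smoke_test = replace(base, BATCH_SIZE=32, PRETRAIN_EPOCHS=1)
print(base)
print(smoke_test)
```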

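## 📁 Project Structure

```
CLAF/
├── main.py                 # Main entry script
├── config.py               # Configuration
├── models.py               # Model definitions
├── data_loader.py          # Data loading and augmentation
├── train_contrastive.py    # Contrastive training
├── concept_alignment.py    # Concept alignment module
├── clustering.py           # Clustering algorithms
├── explainability.py       # Explainability analysis
├── requirements.txt        # Dependencies
└── README.md               # Documentation
```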

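## 🔧 Customization

### Adding a New Dataset

1. Add a new data loading function to `data_loader.py`
2. Register the new dataset in the `get_dataloader()` function in `main.py`
3. Update the `DATASET` option in the configuration file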
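
The registration step above can be sketched with a small name-to-loader registry. This is one possible design, not the project's code: the signature of `get_dataloader()` is not shown in this README, so the `batch_size` parameter, the `register_dataset` decorator, and the placeholder return values are all assumptions.

```python
DATASET_LOADERS = {}


def register_dataset(name):
    """Decorator that records a loader function under a dataset name."""
    def decorator(fn):
        DATASET_LOADERS[name] = fn
        return fn
    return decorator


@register_dataset("cifar10")
def load_cifar10(batch_size):
    # Placeholder: a real loader would return a torch DataLoader.
    return {"dataset": "cifar10", "batch_size": batch_size}


@register_dataset("my_dataset")
def load_my_dataset(batch_size):
    # Placeholder for the newly added dataset.
    return {"dataset": "my_dataset", "batch_size": batch_size}


def get_dataloader(name, batch_size=256):
    """Look up the loader registered for `name` and call it."""
    if name not in DATASET_LOADERS:
        raise ValueError(f"Unknown dataset {name!r}; known: {sorted(DATASET_LOADERS)}")
    return DATASET_LOADERS[name](batch_size)


print(get_dataloader("my_dataset", batch_size=64))
```

With a registry, adding a dataset means writing one decorated function; the lookup code does not need to change.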

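### Modifying the Model Architecture

1. Edit the model definitions in `models.py`
2. Adjust the structure of the encoder, the concept bottleneck, or the classification head
3. Update the corresponding hyperparameters in the configuration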

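## 📈 Visualizing Results

The pipeline automatically generates:
- **Cluster visualization**: UMAP projections and plots of the clustering results
- **Training curves**: loss and accuracy curves
- **Concept analysis**: visualizations of the concept alignment
- **Explainability report**: concept-based explanations of the clustering results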

## Research Origins

This project builds on the following three papers:

1. **Auxiliary Losses for Learning Generalizable Concept-based Models**
2. **Contrastive Clustering** - [GitHub](https://github.com/Yunfan-Li/Contrastive-Clustering)
3. **Label-Free Concept Bottleneck Models**

## 🤝 Citation

If you use this project, please cite the related paper:

```bibtex
@article{claf2023,
  title={CLAF: Contrastive Learning with Concept-Aligned Bottlenecks for Label-Free Explainable Clustering},
  author={Author1, Author2, Author3},
  journal={arXiv preprint},
  year={2023}
}
```

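## 📝 License

This project is released under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- The authors of the related papers
- SimCLR and related research on contrastive learning
- Research on concept bottleneck models
- The PyTorch deep learning framework team

## 🔗 Related Work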

1. **SimCLR** - [GitHub](https://github.com/google-research/simclr)
2. **Contrastive Clustering** - [GitHub](https://github.com/Yunfan-Li/Contrastive-Clustering)
3. **SCAN** - [GitHub](https://github.com/wvangansbeke/Unsupervised-Classification)
4. **Concept Bottleneck Models** - related research papers

---

For questions and suggestions, please open an issue on GitHub or contact the maintainers.