# CTRL_Project

**Repository Path**: poetic_Coding/ctrl_-project

## Basic Information

- **Project Name**: CTRL_Project
- **Description**: LTV prediction modeling and marketing-action decision optimization for bank credit customers
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-05-07
- **Last Updated**: 2026-05-07

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# CTRL: A Causal Temporal Representation Learning Framework for Bank Credit Scenarios

**Causal Temporal Representation Learning for Bank Credit LTV Optimization**

> Target venues: KDD / NeurIPS / ICML  
> Core contribution: **CS-CQL** — Causally-Shaped Conservative Q-Learning

---

## Project Structure

```
ctrl_project/
├── configs/              # Per-module hyperparameter configs (YAML)
├── data/
│   ├── raw/              # Raw business tables (CSV, read-only)
│   └── processed/        # Feature-engineering outputs (Parquet / HDF5)
├── src/
│   ├── utils/            # Shared utilities (metrics / IO / random seeds / constants)
│   ├── module1_data_quality/   # Data quality checks
│   ├── module2_feature_eng/    # Feature-engineering pipeline
│   ├── module3_supervised/     # LightGBM supervised baseline
│   ├── module4_ctrl_encoder/   # CTRL Causal Transformer encoder
│   ├── module5_cate/           # CATE estimation (R-Learner + temporal propensity score)
│   ├── module6_cscql/          # CS-CQL core algorithm
│   ├── module7_ope/            # Doubly robust (DR) off-policy evaluation
│   ├── module8_ablation/       # Ablation experiments E0–E9
│   ├── module9_application/    # Monthly business report generation
│   ├── module10_monitoring/    # Model monitoring and rolling updates
│   └── module11_public_data/   # Generalization validation on public datasets
├── scripts/
│   ├── run_pipeline.py         # End-to-end pipeline entry point
│   └── run_experiments.py      # Paper experiments / Figures 1-5 / Tables 1-4
├── tests/                # Unit tests (pytest)
├── outputs/              # Model weights / figures / metric JSON
├── reports/              # Monthly business reports
└── requirements.txt
```

---

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

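### 2. Prepare Raw Data

Place the following CSV files in `data/raw/`:

| Filename | Business table |
|---|---|
| `t1_client_info.csv`    | Customer basic information table |
| `t3_credit.csv`         | Credit grant table |
| `t4_loan.csv`           | Loan table |
| `t5_repay.csv`          | Repayment table |
| `t6_telemarketing.csv`  | Telemarketing table |
| `t7_coupon.csv`         | Coupon table |
| `t11_macro.csv`         | Macroeconomic indicators table |
| `t12_churn_label.csv`   | Churn label table (optional) |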
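
Before starting a run, it can help to confirm that the expected files are present. The following is a minimal sketch, not part of the repository: the helper name `missing_raw_files` is hypothetical, and the filenames are copied from the table above.

```python
from pathlib import Path

# Filenames copied from the table above; the churn-label table is optional.
REQUIRED_FILES = [
    "t1_client_info.csv",
    "t3_credit.csv",
    "t4_loan.csv",
    "t5_repay.csv",
    "t6_telemarketing.csv",
    "t7_coupon.csv",
    "t11_macro.csv",
]
OPTIONAL_FILES = ["t12_churn_label.csv"]


def missing_raw_files(raw_dir="data/raw"):
    """Return (missing_required, missing_optional) filenames under raw_dir.

    Hypothetical helper for illustration; the project may validate inputs differently.
    """
    root = Path(raw_dir)
    missing_required = [f for f in REQUIRED_FILES if not (root / f).is_file()]
    missing_optional = [f for f in OPTIONAL_FILES if not (root / f).is_file()]
    return missing_required, missing_optional


missing_required, missing_optional = missing_raw_files("data/raw")
if missing_required:
    print("Missing required files:", ", ".join(missing_required))
if missing_optional:
    print("Missing optional files:", ", ".join(missing_optional))
```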

Public datasets (optional, used by Module 11 for generalization validation):

```
data/raw/ihdp.npz
data/raw/criteo_uplift_v2.csv
data/raw/lending_club_loans.csv
```

### 3. Run the Full Pipeline

```bash
# Run the whole pipeline in one command
python scripts/run_pipeline.py --phase all

# Run phase by phase
python scripts/run_pipeline.py --phase data    # Module 1+2
python scripts/run_pipeline.py --phase train   # Module 3+4
python scripts/run_pipeline.py --phase cate    # Module 5
python scripts/run_pipeline.py --phase rl      # Module 6
python scripts/run_pipeline.py --phase eval    # Module 7+8
python scripts/run_pipeline.py --phase report  # Module 9+10

# Dry run (nothing is executed)
python scripts/run_pipeline.py --phase all --dry-run
```
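
The phase-to-module mapping in the commands above can be sketched as a small `argparse` entry point. This is an illustrative sketch only and may not match the actual `run_pipeline.py`. The names `PHASES`, `resolve_modules`, and `parse_args` are hypothetical, and it assumes that `all` covers Modules 1 through 10, since no phase above lists Module 11.

```python
import argparse

# Phase -> module numbers, as listed in the commands above.
PHASES = {
    "data": [1, 2],
    "train": [3, 4],
    "cate": [5],
    "rl": [6],
    "eval": [7, 8],
    "report": [9, 10],
}


def resolve_modules(phase):
    """Expand a phase name into the ordered list of module numbers to run."""
    if phase == "all":
        # Assumption: "all" is the concatenation of the individual phases.
        return [m for modules in PHASES.values() for m in modules]
    return PHASES[phase]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Pipeline entry point (sketch)")
    parser.add_argument("--phase", choices=["all", *PHASES], default="all")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the plan without running anything")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    for module in resolve_modules(args.phase):
        if args.dry_run:
            print(f"[dry-run] would run module {module}")
        else:
            print(f"running module {module}")  # placeholder for the real module call


# Usage example: show what the eval phase would run, without running it.
main(["--phase", "eval", "--dry-run"])
```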

### 4. Reproduce the Paper's Experiments and Figures

```bash
# Reproduce all figures and tables
python scripts/run_experiments.py --exp all

# Reproduce individually
python scripts/run_experiments.py --exp figure1   # Reward density comparison plot
python scripts/run_experiments.py --exp figure2   # Theorem 2 ε_τ experiment
python scripts/run_experiments.py --exp figure3   # Curriculum schedule ablation
python scripts/run_experiments.py --exp figure4   # t-SNE representation visualization
python scripts/run_experiments.py --exp figure5   # CATE quality vs. policy performance
python scripts/run_experiments.py --exp table1    # LTV prediction comparison
python scripts/run_experiments.py --exp table2    # CATE estimation comparison
python scripts/run_experiments.py --exp table3    # Offline policy evaluation
python scripts/run_experiments.py --exp ablation  # E0–E9 ablation experiments (Table 4)
```

### 5. Run Unit Tests

```bash
pytest tests/ -v
```

---

## Module Overview

### Module 4: CTRL Causal Transformer Encoder (Contribution 1)

Joint training loss:

```
L = L_pred + α·L_churn + β·L_CF + γ·L_IPM(MMD²)
```

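- `L_pred`: LTV regression loss
- `L_churn`: binary classification loss for churn prediction
- `L_CF`: supervised loss for the counterfactual head (on observed actions)
- `L_IPM`: MMD² causal balancing regularizer (aligns treated-group and control-group representations)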
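
To make the `L_IPM` term concrete, the sketch below computes a squared MMD between treated and control representations and combines the four terms as in the formula above. It is an illustration under assumptions the README does not state: a Gaussian (RBF) kernel with a fixed bandwidth, the biased V-statistic estimator, and NumPy arrays standing in for encoder outputs. The function names are hypothetical, and `gamma` here is the loss weight from the formula.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2(treated, control, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    k_tt = rbf_kernel(treated, treated, bandwidth).mean()
    k_cc = rbf_kernel(control, control, bandwidth).mean()
    k_tc = rbf_kernel(treated, control, bandwidth).mean()
    return k_tt + k_cc - 2.0 * k_tc


def joint_loss(l_pred, l_churn, l_cf, l_ipm, alpha, beta, gamma):
    """L = L_pred + alpha * L_churn + beta * L_CF + gamma * L_IPM."""
    return l_pred + alpha * l_churn + beta * l_cf + gamma * l_ipm


# Toy check: representations from the same distribution give a smaller MMD²
# than representations whose mean is shifted.
rng = np.random.default_rng(0)
treated = rng.normal(0.0, 1.0, size=(200, 8))
control_same = rng.normal(0.0, 1.0, size=(200, 8))
control_shifted = rng.normal(1.0, 1.0, size=(200, 8))
balanced = mmd2(treated, control_same, bandwidth=2.0)
imbalanced = mmd2(treated, control_shifted, bandwidth=2.0)
print(f"balanced={balanced:.4f}  imbalanced={imbalanced:.4f}")
```

In a training loop the same computation would typically be written with differentiable tensor operations so that the regularizer can push the two groups' representations together.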

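### Module 5: CATE Estimation (Contribution 2)

Uses a temporal propensity score $\hat{e}(A_t | h_{i,t})$ in place of a cross-sectional propensity score,
and refines the CATE estimate with R-Learner pseudo-outcomes to reduce time-varying confounding bias.

**Theorem 1**: Reduction in CATE bias when using the temporal propensity score instead of the cross-sectional one: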

$$B(\hat{e}_{\text{cs}}) - B(\hat{e}_{\text{seq}}) \geq \frac{c \cdot \mathbb{E}[\delta_t]}{\eta^2} - O(\epsilon_e)$$
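
The R-Learner refinement step can be illustrated as follows. This is a simplified sketch, assuming a binary treatment and nuisance estimates that are already available. In the project the propensity would come from a model conditioned on the history $h_{i,t}$, and the nuisance models would normally be cross-fitted. The function name `r_learner_targets` and the toy data are hypothetical.

```python
import numpy as np


def r_learner_targets(y, a, m_hat, e_hat, clip=0.05):
    """Pseudo-outcomes and weights for the R-Learner second stage.

    y: observed outcomes, a: binary treatment indicator (0/1),
    m_hat: estimate of E[Y | h], e_hat: estimate of P(A = 1 | h).
    Regressing `pseudo` on features with sample weights `weights`
    minimizes the R-loss sum(((y - m_hat) - (a - e_hat) * tau(h))**2).
    """
    e_hat = np.clip(e_hat, clip, 1.0 - clip)  # keep residuals away from zero
    y_res = y - m_hat
    a_res = a - e_hat
    pseudo = y_res / a_res
    weights = a_res**2
    return pseudo, weights


# Toy data with a constant treatment effect of 2.0 and known nuisances.
rng = np.random.default_rng(0)
n = 5000
h = rng.normal(size=n)            # stand-in for a history summary
e_true = np.full(n, 0.5)          # propensity, known in this toy setup
a = rng.binomial(1, e_true)
tau_true = 2.0
y = h + tau_true * a + rng.normal(size=n)
m_true = h + tau_true * e_true    # E[Y | h]

pseudo, weights = r_learner_targets(y, a, m_true, e_true)
# Constant-effect fit: the weighted mean of the pseudo-outcomes.
tau_hat = np.sum(weights * pseudo) / np.sum(weights)
print(f"estimated effect: {tau_hat:.3f}")
```

For heterogeneous effects, the weighted mean would be replaced by any regressor that accepts sample weights.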

### Module 6: CS-CQL (Contribution 3, core)

Potential-based reward shaping:

```
Φ(h_t) = max_a τ̂*(h_t, a)
r'_t = r_t + λ(k) · [γ · Φ(h_{t+1}) − Φ(h_t)]
λ(k) = λ₀ · exp(−κ · k)   # curriculum shaping schedule
```
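
The three formulas above translate directly into code. The sketch below is illustrative only: it assumes `k` is a training-step or epoch index (the README does not define it), represents the CATE estimates at a state as a plain dict from action to value, and uses hypothetical action names.

```python
import math


def curriculum_weight(k, lambda0, kappa):
    """lambda(k) = lambda0 * exp(-kappa * k): the shaping weight decays as k grows."""
    return lambda0 * math.exp(-kappa * k)


def potential(cate_by_action):
    """Phi(h_t) = max_a tau_hat(h_t, a), given a dict of action -> estimated CATE."""
    return max(cate_by_action.values())


def shaped_reward(r, cate_t, cate_next, k, lambda0, kappa, gamma):
    """r'_t = r_t + lambda(k) * (gamma * Phi(h_{t+1}) - Phi(h_t))."""
    lam = curriculum_weight(k, lambda0, kappa)
    return r + lam * (gamma * potential(cate_next) - potential(cate_t))


# Hypothetical CATE estimates for two actions at consecutive steps.
cate_t = {"call": 0.2, "coupon": 0.5}
cate_next = {"call": 0.4, "coupon": 0.1}

early = shaped_reward(1.0, cate_t, cate_next, k=0, lambda0=1.0, kappa=0.1, gamma=0.9)
late = shaped_reward(1.0, cate_t, cate_next, k=1000, lambda0=1.0, kappa=0.1, gamma=0.9)
print(f"early={early:.4f}  late={late:.4f}")
```

At `k = 0` the shaping term is applied at full strength. For large `k` the weight decays toward zero and the shaped reward approaches the raw reward `r_t`.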

**Theorem 2**: Suboptimality upper bound for CS-CQL:

$$V^*(s) - V^{\hat{\pi}}(s) \leq \frac{2\gamma\epsilon_\tau}{(1-\gamma)^2} + \delta_{\text{CQL}}$$
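
To get a feel for how the first term of the bound scales, the snippet below evaluates $2\gamma\epsilon_\tau/(1-\gamma)^2$ for a few values. The numbers are arbitrary illustrations, not results from the paper, and the function name is hypothetical. $\epsilon_\tau$ is presumably the CATE estimation error, which the README does not define. $\delta_{\text{CQL}}$ is left out because its form is not given here.

```python
def cate_error_term(gamma, eps_tau):
    """First term of the Theorem 2 bound: 2 * gamma * eps_tau / (1 - gamma)^2."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 2.0 * gamma * eps_tau / (1.0 - gamma) ** 2


# The term grows quadratically in the effective horizon 1 / (1 - gamma).
for gamma in (0.5, 0.9, 0.99):
    print(f"gamma={gamma}: term={cate_error_term(gamma, eps_tau=0.01):.3f}")
```

With `eps_tau = 0.01`, the term is about 1.8 at `gamma = 0.9` and about 198 at `gamma = 0.99`, so the same CATE error costs far more under long horizons.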

---

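## Key Configuration

All hyperparameters live in the `configs/` directory. After changing one, re-run the corresponding phase.

| File | Scope |
|---|---|
| `base_config.yaml` | Data paths, temporal splits, feature groups, action space |
| `module3_config.yaml` | LightGBM hyperparameter search space |
| `module4_config.yaml` | CTRL encoder architecture + loss weights α/β/γ |
| `module5_config.yaml` | Propensity model + R-Learner configuration |
| `module6_config.yaml` | CS-CQL + curriculum-shaping λ₀/κ search space |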

---

## Experiment Outputs

```
outputs/
├── module3/   LGB baseline metrics + SHAP plots
├── module4/   CTRL encoder weights + training curves
├── module5/   CATE models + Qini curves + placebo tests
├── module6/   Q-Network weights + Figure 1/2/3
├── module7/   Table 3 policy comparison CSV
├── module8/   Table 4 ablation CSV
└── figures/   Figure 1–5 (PNG)
reports/
├── report_a_value_map_YYYY-MM.csv
├── report_b_execution_list_YYYY-MM.csv
└── report_c_model_health_YYYY-MM.json
```

---

## Citation

If you use this framework, please cite:

```bibtex
@inproceedings{ctrl2025,
  title  = {CTRL: Causal Temporal Representation Learning for
            Bank Credit Customer LTV Optimization},
  author = {...},
  booktitle = {Proceedings of KDD 2025},
  year   = {2025},
}
```

---

## Dependency Versions

```
Python >= 3.10
PyTorch >= 2.0.0
LightGBM >= 4.0.0
d3rlpy == 2.8.0
econml >= 0.15.0
scikit-learn >= 1.3.0
```

See `requirements.txt` for details.