# DSTC

**Repository Path**: xue-ce/DSTC

## Basic Information

- **Project Name**: DSTC
- **Description**: A Deep Spatiotemporal Trajectory Representation Learning Framework for Clustering
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Created**: 2024-07-22
- **Last Updated**: 2024-09-23

## README

# A Deep Spatiotemporal Trajectory Representation Learning Framework for Clustering

## Install

Run `pip install -r requirements.txt`.

## Preprocess

In the preprocessing step, we divide the spatiotemporal space into tokens. Two division methods are provided: **grid-based division** and **density-based division**.

1. Put your data in the `preprocessing/originData` folder. This folder already contains a sample dataset, `cross.h5`, from [CVRR](http://cvrr-nas.ucsd.edu/bmorris/datasets/dataset_trajectory_clustering.html).
2. Edit the preprocessing configuration file `preprocessing/conf/preprocess_conf.json`. The configuration parameters are:
   ```
   "dataName": name of the data file (h5 format)
   "method": "density" or "grid"
   # parameters for grid-based division
   "gridCellSize": grid size in the spatial dimension
   "timeCellSize": grid size in the temporal dimension; set to -1 to ignore the time dimension
   "minLon": minimum longitude
   "minLat": minimum latitude
   "maxLon": maximum longitude
   "maxLat": maximum latitude
   "minTime": minimum time
   "maxTime": maximum time
   # parameters for density-based division
   "stme_k": parameter k of the density-based division algorithm STME
   "stme_kt": parameter kt of STME
   "stme_min_pts": parameter min_pts of STME
   "hasLabel": whether the data has ground-truth labels
   ```
3. Run the following commands to generate the tokenized trajectory data:
   ```
   cd preprocessing
   python preprocess.py
   ```

Preprocessing writes the token sequences to the `train/val/test.src` and `train/val/test.tar` files, which are used for training, validation, and testing. The region information is saved in `region.pkl`, and the vocabulary representing the division result is saved in `*-knearestvocabs.h5`.

## Training

Training has two modes: `pretraining` and `joint-training`.

1.
   Edit the training configuration in `train/config.yaml`. The parameters are:
   ```
   "expId": experiment ID
   "dataName": dataset name
   "mode": "pretraining" or "joint-training"
   "vocabSize": vocabulary size, output by the preprocessing step
   "clusterNum": number of clusters to produce
   "sourceData": dataset path
   "epochs": number of training epochs
   "embeddingSize": length of the embedding vector
   "hiddenSize": size of the hidden layers
   "batch": batch size
   "t2vecBatch": batch size for pretraining (t2vec)
   "learningRate": learning rate
   "dropout": dropout rate
   "m2LearningRate": learning rate for joint training
   "distDecaySpeed": distance decay speed, the penalty parameter for far-away cells
   "alpha": reconstruction loss
   "beta": soft cluster assignment loss
   "gamma": inter-cluster distance loss
   "delta": neighbor loss
   "kmeans": k-means loss
   "hasLabel": whether the data has ground-truth labels
   "needSave": whether to save the results
   "saveFreq": save frequency
   ```
2. Run the following commands to start training:
   ```
   cd training
   python dstc.py
   ```
3. Check the results by running:
   ```
   cd showResult
   python predict.py
   ```
   The t-SNE visualizations are saved in the `training/showResult/cluster_png/` directory.

# Acknowledgements

We thank the authors of t2vec. We used their code and were inspired by their work (https://github.com/boathit/t2vec).

# BibTeX

```
@ARTICLE{10403544,
  author={Wang, Chao and Huang, Jiahui and Wang, Yongheng and Lin, Zhengxuan and Jin, Xiongnan and Jin, Xing and Weng, Di and Wu, Yingcai},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  title={A Deep Spatiotemporal Trajectory Representation Learning Framework for Clustering},
  year={2024},
  volume={25},
  number={7},
  pages={7687-7700},
  doi={10.1109/TITS.2024.3350339}}
```
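The grid-based division in the Preprocess section is driven by `gridCellSize`, `timeCellSize`, and the bounding-box parameters. The following is a minimal sketch of how one point could be mapped to a grid-cell token using those parameters. It is **not** the repository's `preprocess.py`. The function names `grid_token` and `tokenize`, the example bounds, and the row-major numbering (time slice, then row, then column) are all assumptions made for illustration. Only the key names and the "`timeCellSize` = -1 ignores time" rule come from this README.

```python
import math

# Hypothetical sketch: map one (lon, lat, t) point to an integer grid token.
# Key names mirror preprocess_conf.json; the row-major numbering
# (time slice, then row, then column) is an assumption, not DSTC's code.
def grid_token(lon, lat, t, cfg):
    # Points outside the spatial bounding box get no token.
    if not (cfg["minLon"] <= lon <= cfg["maxLon"] and cfg["minLat"] <= lat <= cfg["maxLat"]):
        return None
    size = cfg["gridCellSize"]
    # Number of cells along each spatial axis (at least one).
    n_x = max(1, math.ceil((cfg["maxLon"] - cfg["minLon"]) / size))
    n_y = max(1, math.ceil((cfg["maxLat"] - cfg["minLat"]) / size))
    # Clamp so points lying exactly on the max boundary fall into the last cell.
    x = min(int((lon - cfg["minLon"]) // size), n_x - 1)
    y = min(int((lat - cfg["minLat"]) // size), n_y - 1)
    spatial = y * n_x + x
    # timeCellSize == -1 means the time dimension is ignored.
    if cfg["timeCellSize"] == -1:
        return spatial
    if not (cfg["minTime"] <= t <= cfg["maxTime"]):
        return None
    tsize = cfg["timeCellSize"]
    n_t = max(1, math.ceil((cfg["maxTime"] - cfg["minTime"]) / tsize))
    z = min(int((t - cfg["minTime"]) // tsize), n_t - 1)
    # Offset the spatial id by one full spatial layer per time slice.
    return z * (n_x * n_y) + spatial


def tokenize(trajectory, cfg):
    """Convert a list of (lon, lat, t) points into grid tokens, skipping out-of-range points."""
    tokens = (grid_token(lon, lat, t, cfg) for lon, lat, t in trajectory)
    return [tok for tok in tokens if tok is not None]


# Example configuration with made-up bounds (not taken from the repository).
example_cfg = {
    "gridCellSize": 10.0, "timeCellSize": -1,
    "minLon": 0.0, "maxLon": 100.0, "minLat": 0.0, "maxLat": 50.0,
    "minTime": 0, "maxTime": 0,
}
print(tokenize([(25.0, 12.0, 0), (-1.0, 0.0, 0), (5.0, 5.0, 0)], example_cfg))  # [12, 0]
```

In this sketch, ignoring time yields at most `n_x * n_y` distinct tokens. Enabling `timeCellSize` multiplies that count by the number of time slices, so a coarser temporal grid keeps the vocabulary smaller.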