@pengxiaopeng1
pengxiaopeng (no bio provided)
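End-to-end command-line and visual debugging and tuning tools for training and large-model scenarios, helping users quickly improve model development efficiency.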
TensorProbe (code name: kj600) is an LLM pretraining debugger that collects and aggregates tensors from the model's torch modules, optimizer state, and collective communication. It also supports rule-based alerts.
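
To make the idea of aggregation plus rule-based alerts concrete, here is a minimal sketch in plain Python. It is not TensorProbe's API: the names `Rule`, `aggregate`, `check`, and `RULES`, the chosen statistics, and the threshold of 100.0 are all assumptions made for illustration, and tensors are stood in for by flat lists of floats.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical alert rule: a name plus a predicate over one scalar."""
    name: str
    triggered: Callable[[float], bool]


def aggregate(values: List[float]) -> Dict[str, float]:
    """Reduce a flat list of tensor values to a few summary statistics."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "norm": math.sqrt(sum(v * v for v in values)),  # L2 norm
    }


def check(stats: Dict[str, float], rules: Dict[str, List[Rule]]) -> List[str]:
    """Return 'stat:rule' for every rule that fires on the given statistics."""
    alerts = []
    for stat_name, stat_rules in rules.items():
        for rule in stat_rules:
            if rule.triggered(stats[stat_name]):
                alerts.append(f"{stat_name}:{rule.name}")
    return alerts


# Assumed example rules: flag a non-finite or unusually large L2 norm.
RULES = {
    "norm": [
        Rule("not_finite", lambda x: not math.isfinite(x)),
        Rule("too_large", lambda x: x > 100.0),
    ],
}
```

In a real setup the values would presumably come from module outputs, optimizer state, or communication buffers at each training step; here `check(aggregate(values), RULES)` is simply called on whatever list of floats is supplied.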