Approximate attention computation: 1) CoRe: An Efficient Coarse-refined Training Framework for BERT 2) Efficient content-based sparse attention with routing transformers 3) Fast transformers with clustered attention
Approximate FFN computation: CoRe: An Efficient Coarse-refined Training Framework for BERT
【Mentor & Email】
qianjiahong@huawei.com
【Task Points】 50 points
【Background】
MoE (Mixture of Experts) architectures are the main means of scaling the parameter count of large models. A key component of MoE is the routing mechanism (the gate), which is responsible for dispatching tokens to the individual experts. Different routing mechanisms incur different communication overheads and yield different model convergence behavior, so exploring efficient routing mechanisms is of significant importance. A common baseline is sketched below.
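To make the gate concrete, here is a minimal sketch of the classic top-k softmax gate (in the style of Shazeer et al.'s sparsely-gated MoE), assuming PyTorch; the class name TopKGate and all parameter names are illustrative, not taken from this task description.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKGate(nn.Module):
    """Routes each token to its k highest-scoring experts."""

    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        # Learned projection from token representation to per-expert logits.
        self.w_gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        logits = self.w_gate(x)                 # (num_tokens, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        # Normalize only over the selected experts; unselected experts get weight 0.
        weights = F.softmax(topk_vals, dim=-1)  # (num_tokens, k)
        return topk_idx, weights                # expert ids and mixing weights

# Usage: dispatch 8 tokens among 4 experts, 2 experts per token.
gate = TopKGate(d_model=16, num_experts=4, k=2)
idx, w = gate(torch.randn(8, 16))
print(idx.shape, w.shape)  # torch.Size([8, 2]) torch.Size([8, 2])

In a distributed setting, topk_idx determines which device each token is sent to (all-to-all communication), which is why the choice of gate directly affects the communication cost mentioned above.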