[Bug-Report]: Operator fails to be compiled into the graph

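### Describe the current behavior (Mandatory)

A custom operator (`npu_dequant_fp8_weight`) is built with cann-ops and then called through `torch.compile`. Graph compilation fails with `RuntimeError: The dim size of Ascend GE graph NetOutput: [] is not equal to FX graph NetOutput: [1536, 7168]`.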

### Environment (Mandatory)

8.2.RC1

### Steps to reproduce the issue (Mandatory)

Build the operator with cann-ops, then call it through `torch.compile`.
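The report does not include the calling script. The sketch below is a reconstruction from the FX graph printed in the log (multiply by 2, `aten.empty([1], dtype=int64)`, `npu_dequant_fp8_weight`, multiply by 2), not the reporter's actual code. It assumes that the cann-ops operator package is installed so that `torch.ops.npu.npu_dequant_fp8_weight` is registered, and that the default `torchair` backend is used. The names `run_repro`, `fn`, and `aux` are hypothetical. The scale shape (12, 56) is 1/128 of the weight shape in each dimension, which would be consistent with one scale per 128x128 block, although the report does not say so.

```python
# Shapes copied from the "compiler inputs" lines of the log.
WEIGHT_SHAPE = (1536, 7168)  # dtype torch.uint8 in the log
SCALE_SHAPE = (12, 56)       # dtype torch.float32 in the log


def run_repro():
    """Rebuild the FX graph shown in the log and compile it with torchair."""
    # Imported lazily so this file can be loaded on a machine without an NPU.
    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device and torch.ops.npu)
    import torchair

    def fn(weight, scale):
        weight = weight * 2
        # Mirrors the aten.empty([1], dtype=int64) node in the log; the meaning
        # of this third argument is not documented in the report.
        aux = torch.empty([1], dtype=torch.int64, device=weight.device)
        out = torch.ops.npu.npu_dequant_fp8_weight(weight, scale, aux)
        return out * 2

    backend = torchair.get_npu_backend(compiler_config=torchair.CompilerConfig())
    compiled = torch.compile(fn, backend=backend)

    weight = torch.randint(0, 256, WEIGHT_SHAPE, dtype=torch.uint8).npu()
    scale = torch.rand(SCALE_SHAPE, dtype=torch.float32).npu()
    return compiled(weight, scale)
```

Calling `run_repro()` on an Ascend machine with the operator installed should reach the same `start compile graph` step at which the reported error is raised, if the reconstruction matches the original script.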

### Describe the expected behavior (Mandatory)

The graph compiles and the operator returns its output normally.

### Related log / screenshot (Mandatory)

[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.954.090 [npu_fx_compiler.py:330]52242 compiler inputs
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.954.248 [npu_fx_compiler.py:332]52242   input 0: FakeTensor(..., device='npu:0', size=(1536, 7168), dtype=torch.uint8)
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.954.430 [npu_fx_compiler.py:332]52242   input 1: FakeTensor(..., device='npu:0', size=(12, 56))
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.954.533 [npu_fx_compiler.py:333]52242   graph: graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%arg0_1, 2), kwargs = {})
    %empty : [num_users=1] = call_function[target=torch.ops.aten.empty.memory_format](args = ([1],), kwargs = {dtype: torch.int64, device: npu:0, pin_memory: False})
    %npu_dequant_fp8_weight : [num_users=1] = call_function[target=torch.ops.npu.npu_dequant_fp8_weight.default](args = (%mul, %arg1_1, %empty), kwargs = {})
    %mul_1 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%npu_dequant_fp8_weight, 2), kwargs = {})
    return (mul_1,)
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.955.340 [npu_fx_compiler.py:206]52242 before sym input optimization, graph is graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%arg0_1, 2), kwargs = {})
    %empty : [num_users=1] = call_function[target=torch.ops.aten.empty.memory_format](args = ([1],), kwargs = {dtype: torch.int64, device: npu:0, pin_memory: False})
    %npu_dequant_fp8_weight : [num_users=1] = call_function[target=torch.ops.npu.npu_dequant_fp8_weight.default](args = (%mul, %arg1_1, %empty), kwargs = {})
    %mul_1 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%npu_dequant_fp8_weight, 2), kwargs = {})
    return (mul_1,)
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.955.499 [npu_fx_compiler.py:201]52242 after sym input optimization, graph is graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%arg0_1, 2), kwargs = {})
    %empty : [num_users=1] = call_function[target=torch.ops.aten.empty.memory_format](args = ([1],), kwargs = {dtype: torch.int64, device: npu:0, pin_memory: False})
    %npu_dequant_fp8_weight : [num_users=1] = call_function[target=torch.ops.npu.npu_dequant_fp8_weight.default](args = (%mul, %arg1_1, %empty), kwargs = {})
    %mul_1 : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%npu_dequant_fp8_weight, 2), kwargs = {})
    return (mul_1,)
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.955.699 [npu_fx_compiler.py:85]52242 -------------------
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.955.738 [npu_fx_compiler.py:86]52242 target: arg0_1
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.956.113 [npu_fx_compiler.py:92]52242 output Pack(meta:FakeTensor(dtype=torch.uint8, size=[1536, 7168] npu:Tensor(arg0_1:0, dtype=DT_UINT8, size=[1536, 7168]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.956.260 [npu_fx_compiler.py:85]52242 -------------------
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.956.288 [npu_fx_compiler.py:86]52242 target: arg1_1
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.956.456 [npu_fx_compiler.py:92]52242 output Pack(meta:FakeTensor(dtype=torch.float32, size=[12, 56] npu:Tensor(arg1_1:0, dtype=DT_FLOAT, size=[12, 56]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.956.540 [npu_fx_compiler.py:85]52242 -------------------
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.956.566 [npu_fx_compiler.py:86]52242 target: aten.mul.Tensor
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.956.671 [npu_fx_compiler.py:88]52242 input 0: Pack(meta:FakeTensor(dtype=torch.uint8, size=[1536, 7168] npu:Tensor(arg0_1:0, dtype=DT_UINT8, size=[1536, 7168]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.956.695 [npu_fx_compiler.py:88]52242 input 1: 2
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.957.421 [npu_fx_compiler.py:92]52242 output Pack(meta:FakeTensor(dtype=torch.uint8, size=[1536, 7168] npu:Tensor(Cast:0, dtype=DT_UINT8, size=[1536, 7168]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.957.525 [npu_fx_compiler.py:85]52242 -------------------
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.957.554 [npu_fx_compiler.py:86]52242 target: aten.empty.memory_format
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.957.583 [npu_fx_compiler.py:88]52242 input 0: [1]
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.957.605 [npu_fx_compiler.py:90]52242 input dtype: torch.int64
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.957.629 [npu_fx_compiler.py:90]52242 input device: npu:0
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.957.648 [npu_fx_compiler.py:90]52242 input pin_memory: False
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.958.129 [_ge_graph.py:1160]52242 ge.Cast promote input 0 value 0.0 to dtype DT_FLOAT
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.958.415 [npu_fx_compiler.py:92]52242 output Pack(meta:FakeTensor(dtype=torch.int64, size=[1] npu:Tensor(Fill:0, dtype=DT_INT64, size=[1]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.958.518 [npu_fx_compiler.py:85]52242 -------------------
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.958.545 [npu_fx_compiler.py:86]52242 target: npu.npu_dequant_fp8_weight.default
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.958.588 [npu_fx_compiler.py:88]52242 input 0: Pack(meta:FakeTensor(dtype=torch.uint8, size=[1536, 7168] npu:Tensor(Cast:0, dtype=DT_UINT8, size=[1536, 7168]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.958.619 [npu_fx_compiler.py:88]52242 input 1: Pack(meta:FakeTensor(dtype=torch.float32, size=[12, 56] npu:Tensor(arg1_1:0, dtype=DT_FLOAT, size=[12, 56]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.958.663 [npu_fx_compiler.py:88]52242 input 2: Pack(meta:FakeTensor(dtype=torch.int64, size=[1] npu:Tensor(Fill:0, dtype=DT_INT64, size=[1]))
[WARNING] TORCHAIR(52242,python):2025-08-26 15:15:04.969.930 [utils.py:53]52242 The usage of torchair.ge_concrete_graph .* will not be supported in the future, please complete the API switch as soon as possible.
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.980.472 [npu_fx_compiler.py:92]52242 output Pack(meta:FakeTensor(dtype=torch.bfloat16, size=[1536, 7168] npu:Tensor(DequantFP8Weight:0, dtype=DT_BF16, size=[1536, 7168]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.980.618 [npu_fx_compiler.py:85]52242 -------------------
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.980.652 [npu_fx_compiler.py:86]52242 target: aten.mul.Tensor
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.980.705 [npu_fx_compiler.py:88]52242 input 0: Pack(meta:FakeTensor(dtype=torch.bfloat16, size=[1536, 7168] npu:Tensor(DequantFP8Weight:0, dtype=DT_BF16, size=[1536, 7168]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.980.731 [npu_fx_compiler.py:88]52242 input 1: 2
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.981.302 [npu_fx_compiler.py:92]52242 output Pack(meta:FakeTensor(dtype=torch.bfloat16, size=[1536, 7168] npu:Tensor(Cast_2:0, dtype=DT_BF16, size=[1536, 7168]))
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.981.406 [npu_fx_compiler.py:85]52242 -------------------
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.981.434 [npu_fx_compiler.py:86]52242 target: output
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.981.478 [npu_fx_compiler.py:88]52242 input 0: (Pack(meta:FakeTensor(dtype=torch.bfloat16, size=[1536, 7168] npu:Tensor(Cast_2:0, dtype=DT_BF16, size=[1536, 7168])),)
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.981.638 [npu_fx_compiler.py:92]52242 output [Tensor(Cast_2:0, dtype=DT_BF16, size=[1536, 7168])]
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.981.959 [graph_pass.py:26]52242 find all host data ops dict_keys([]), and sym pack ops [].
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.982.049 [graph_pass.py:326]52242 before removing dead data, graph all inputs size=2.
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.982.079 [graph_pass.py:338]52242 update ge graph data index from 0 to 0.
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.982.105 [graph_pass.py:338]52242 update ge graph data index from 1 to 1.
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.982.129 [graph_pass.py:348]52242 after update index, graph all inputs size=2.
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.982.181 [utils.py:365]52242 Get graph graph_1 input placements [1, 1].
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.982.253 [graph_pass.py:113]52242 Find all TensorMove ops: dict_keys([]), reference ops: dict_keys([]) and Assign ops: dict_keys([]).
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.982.362 [graph_pass.py:382]52242 No side effect op found in graph, skip strict order optimization.
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.982.403 [graph_pass.py:452]52242 No Cmo found in graph, skip Cmo strict order optimization.
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.990.683 [npu_fx_compiler.py:254]52242 runtime inputs
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.990.760 [npu_fx_compiler.py:256]52242   input 0: <class 'torch.Tensor'>(torch.Size([1536, 7168]), torch.uint8, contiguous=True)
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.990.793 [npu_fx_compiler.py:256]52242   input 1: <class 'torch.Tensor'>(torch.Size([12, 56]), torch.float32, contiguous=True)
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.990.926 [fx2ge_converter.py:1078]52242 input process func is:
import torch
import numpy
def kernel(*args):
    ge_inputs = list(args)
    return ge_inputs

[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.991.189 [fx2ge_converter.py:431]52242 update the Format of output TensorDesc for input_0 to Format FORMAT_ND.
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.991.251 [fx2ge_converter.py:431]52242 update the Format of output TensorDesc for input_1 to Format FORMAT_ND.
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.377 [fx2ge_converter.py:993]52242 global compile options:
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.404 [fx2ge_converter.py:995]52242   ge.exec.enableEngineParallel: 0
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.423 [fx2ge_converter.py:995]52242   ge.tiling_schedule_optimize: 0
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.440 [fx2ge_converter.py:995]52242   ge.enableSingleStream: false
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.457 [fx2ge_converter.py:995]52242   ge.oo.level: O3
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.475 [fx2ge_converter.py:995]52242   ge.exportCompileStat: 2
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.492 [fx2ge_converter.py:995]52242   ge.exec.staticMemoryPolicy: 2
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.544 [utils.py:329]52242 Append ge.inputHintShape: 0:[1536, 7168];1:[12, 56] to local compile options.
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.570 [fx2ge_converter.py:1023]52242 local compile options:
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.591 [fx2ge_converter.py:1025]52242   ge.featureBaseRefreshable: 0
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.612 [fx2ge_converter.py:1025]52242   ge.topoSortingMode: 1
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.629 [fx2ge_converter.py:1025]52242   ge.jit_compile: 2
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.647 [fx2ge_converter.py:1025]52242   mode: max-autotune
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.664 [fx2ge_converter.py:1025]52242   ge.exec.outputReuseMemIndexes: 0
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.681 [fx2ge_converter.py:1025]52242   ge.deterministic: 0
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.699 [fx2ge_converter.py:1025]52242   ge.exec.atomicCleanPolicy: 1
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.716 [fx2ge_converter.py:1025]52242   frozenInput: 0,0
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.733 [fx2ge_converter.py:1025]52242   ge.exec.allTensorNotEmpty: 1
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.991.750 [fx2ge_converter.py:1025]52242   ge.inputHintShape: 0:[1536, 7168];1:[12, 56]
[DEBUG] TORCHAIR(52242,python):2025-08-26 15:15:04.997.358 [_backend.py:120]52242 Load graph set_hint_shape input shape: [[1536, 7168], [12, 56]] , output shape: [[1536, 7168]]
[INFO] TORCHAIR(52242,python):2025-08-26 15:15:04.997.446 [fx2ge_converter.py:660]52242 start compile graph: graph_1.
[ERROR] TORCHAIR(52242,python):2025-08-26-15:15:10.772.013 [concrete_graph/concrete_graph.cpp:106]52242 The dim size of Ascend GE graph NetOutput: [] is not equal to FX graph NetOutput: [1536 7168]
Traceback (most recent call last):
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/_utils/error_code.py", line 43, in wapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/dynamo/torchair/core/_backend.py", line 125, in compile
    return super(TorchNpuGraph, self).compile()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The dim size of Ascend GE graph NetOutput: [] is not equal to FX graph NetOutput: [1536, 7168]. FX graph NetOutput shapes is : [[1536, 7168]], Ascend GE graph NetOutput shapes is : [[]]

During handling of the above exception, another exception occurred:
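
The `RuntimeError` in the traceback above lists the output shapes seen on both sides. The following is a minimal sketch, using only the Python standard library, that pulls the two shape lists out of the message and reports which outputs disagree. The helper name `netoutput_mismatches` is hypothetical and the regular expression is written against the exact message wording in this log.

```python
import ast
import re

# Matches the two shape lists at the end of the torchair error message.
_SHAPES = re.compile(
    r"FX graph NetOutput shapes is : (\[.*\]), "
    r"Ascend GE graph NetOutput shapes is : (\[.*\])"
)


def netoutput_mismatches(message):
    """Return (index, fx_shape, ge_shape) for every output whose shapes differ."""
    match = _SHAPES.search(message)
    if match is None:
        raise ValueError("message does not contain NetOutput shape lists")
    fx_shapes = ast.literal_eval(match.group(1))
    ge_shapes = ast.literal_eval(match.group(2))
    # zip() stops at the shorter list, so extra outputs on one side are ignored.
    return [
        (index, fx, ge)
        for index, (fx, ge) in enumerate(zip(fx_shapes, ge_shapes))
        if fx != ge
    ]


MESSAGE = (
    "The dim size of Ascend GE graph NetOutput: [] is not equal to FX graph "
    "NetOutput: [1536, 7168]. FX graph NetOutput shapes is : [[1536, 7168]], "
    "Ascend GE graph NetOutput shapes is : [[]]"
)

mismatches = netoutput_mismatches(MESSAGE)
for index, fx, ge in mismatches:
    print(f"output {index}: FX {fx} vs GE {ge}")
```

For this log the only output is reported with an empty dimension list on the GE side, while the FX side carries the expected `[1536, 7168]`. One possible reading, not confirmed by the report, is that GE could not infer the output shape of the custom operator.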

### Special notes for this issue (Optional)

[WARNING] GE(59283,python3):2025-08-28-09:15:48.994.580 [op_impl_space_registry.cc:259]59283 ConvertSoToRegistry:Failed to dlopen /usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/op_proto/lib/linux/x86_64//libcust_opsproto_rt2.0.so! errmsg:/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/customize/op_proto/lib/linux/x86_64//libcust_opsproto_rt2.0.so: undefined symbol: _ZNK16platform_ascendc15PlatformAscendC13GetCoreNumAivEv
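
The `undefined symbol` in the warning above is an Itanium C++ ABI mangled name. As a rough aid for reading it, the sketch below decodes only the simple `_ZN[K]<length><identifier>...E` form that this symbol uses. It is a hypothetical helper, not a general demangler (templates, substitutions, and parameter types other than `void` are not handled); `c++filt` is the usual tool where it is available.

```python
import re


def decode_simple_symbol(mangled):
    """Decode a nested, non-template Itanium-mangled function name."""
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    is_const = False
    if mangled[pos:pos + 1] == "K":  # 'K' marks a const member function
        is_const = True
        pos += 1
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(mangled) and mangled[pos] != "E":
        length_match = re.match(r"\d+", mangled[pos:])
        if length_match is None:
            raise ValueError(f"unsupported construct at offset {pos}")
        length = int(length_match.group())
        pos += len(length_match.group())
        parts.append(mangled[pos:pos + length])
        pos += length
    if pos >= len(mangled):
        raise ValueError("missing 'E' terminator")
    # After 'E' come the parameter types; a lone 'v' means no parameters.
    params = "()" if mangled[pos + 1:] == "v" else "(...)"
    return "::".join(parts) + params + (" const" if is_const else "")


SYMBOL = "_ZNK16platform_ascendc15PlatformAscendC13GetCoreNumAivEv"
print(decode_simple_symbol(SYMBOL))
```

The symbol decodes to the const member function `GetCoreNumAiv` of `platform_ascendc::PlatformAscendC`, so the custom `op_proto` library expects that function from a library that does not provide it at load time. A mismatch between the CANN version used to build the operator and the installed toolkit is one possible explanation, but the report does not confirm it.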
