[Harbin Institute of Technology] tf.image.extract_image_patches operator fails

DONE
Bug-Report
Created on 2020-12-10 15:17

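The code is as follows, where mask_pad has shape (16, 1, 319, 319, 16) and g_sz=127. (This is the NC1HWC0 shape reported in the log below; the original NHWC shape there is (16, 319, 319, 1).)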

    mask_uf = tf.image.extract_image_patches(mask_pad, ksizes=[1, g_sz, g_sz, 1],
                                             strides=[1, 8, 8, 1],
                                             rates=[1, 1, 1, 1],
                                             padding='VALID')
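
As a quick sanity check on the shapes involved, the output shape for 'VALID' padding can be worked out by hand. The helper below is a minimal sketch: its name, `valid_patch_output_shape`, is made up for illustration and is not part of TensorFlow. It reproduces the (16, 25, 25, 16129) output shape that appears in the error log for an NHWC input of (16, 319, 319, 1).

```python
def valid_patch_output_shape(in_shape, ksize, stride, rate=1):
    """Output shape of square patch extraction with 'VALID' padding (NHWC input)."""
    b, h, w, c = in_shape
    # Effective window size once dilation (rate) is taken into account
    eff = ksize + (ksize - 1) * (rate - 1)
    out_h = (h - eff) // stride + 1
    out_w = (w - eff) // stride + 1
    # Each output position holds one flattened ksize x ksize x c patch
    return (b, out_h, out_w, ksize * ksize * c)


print(valid_patch_output_shape((16, 319, 319, 1), ksize=127, stride=8))
# (16, 25, 25, 16129)
```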

The error log is as follows:

[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.220 [tensor_engine/te_fusion/fusion_op.cc:5713]GetFinishedCompilationTask FinishedTask[0]: taskID[281446916809200:1637], status[1], kernel[None] res: prebuild failed. module[impl.extract_image_patches] func[extract_image_patches], compile_info: 0
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.334 [tensor_engine/te_fusion/fusion_op.cc:5716]GetFinishedCompilationTask Op error args: args:(), input:({'shape': (16, 1, 319, 319, 16), 'ori_shape': (16, 319, 319, 1), 'format': 'NC1HWC0', 'ori_format': 'NHWC', 'dtype': 'float16', 'addr_type': 0, 'valid_shape': (), 'slice_offset': (), 'L1_workspace_size': -1, 'L1_fusion_type': -1, 'L1_addr_offset': 0, 'total_shape': (), 'split_index': 0},), outputs:({'shape': (16, 25, 25, 16129), 'ori_shape': (16, 25, 25, 16129), 'format': 'NHWC', 'ori_format': 'NHWC', 'dtype': 'float16', 'addr_type': 0, 'valid_shape': (), 'slice_offset': (), 'L1_workspace_size': -1, 'L1_fusion_type': -1, 'L1_addr_offset': 0, 'total_shape': (), 'split_index': 0},), attrs:((1, 127, 127, 1), (1, 8, 8, 1), (1, 1, 1, 1), 'VALID')
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.369 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask Op python exception: Traceback (most recent call last):
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.390 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask   File "/usr/local/lib/python3.7/site-packages/te/platform/parallel_compilation.py", line 1184, in run
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.408 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask     int64_mode=self._int64_mode)
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.426 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask   File "/usr/local/lib/python3.7/site-packages/te/platform/fusion_manager.py", line 866, in build_single_op
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.443 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask     compile_info = call_op()
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.460 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask   File "/usr/local/lib/python3.7/site-packages/te/platform/fusion_manager.py", line 851, in call_op
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.478 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask     opfunc(*inputs, *outputs, *attrs, **kwargs)
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.496 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask   File "/usr/local/lib/python3.7/site-packages/te/utils/op_utils.py", line 597, in _in_wrapper
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.513 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask     return func(*args, **kwargs)
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.530 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask   File "/home/HwHiAiUser/Ascend/ascend-toolkit/latest/arm64-linux/opp/op_impl/built-in/ai_core/tbe/impl/extract_image_patches.py", line 876, in extract_image_patches
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.548 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask     (cut_h_col * fmap_w * fmap_c0 * type_size * DOUBLE_BUFFER))
[ERROR] TEFUSION(194884,python3.7):2020-12-10-15:11:56.811.565 [tensor_engine/te_fusion/fusion_op.cc:5720]GetFinishedCompilationTask RuntimeError: Input size is too large load to L1, while cut h, need size: 3756544
[ERROR] FE(194884,python3.7):2020-12-10-15:11:56.821.805 [fusion_engine/adapter/tbe_adapter/tbe_op_store_adapter.cpp:135]196108 ProcessFailPreCompTask:"tid[281446916809200], taskId[1637], node[model/ExtractImagePatches], precompile failed"
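
The "need size: 3756544" figure can be related to the expression shown in the traceback, `cut_h_col * fmap_w * fmap_c0 * type_size * DOUBLE_BUFFER`. The arithmetic below is only a sketch under assumptions: `fmap_w = 319` and `fmap_c0 = 16` are read from the input shape in the log, `type_size = 2` follows from float16, and `DOUBLE_BUFFER = 2` is a guess based on the constant's name. The value of `cut_h_col` is not printed in the log, so it is inferred here by division rather than taken from the operator source.

```python
FMAP_W = 319        # input width, from the NC1HWC0 shape in the log
FMAP_C0 = 16        # C0 dimension, from the NC1HWC0 shape in the log
TYPE_SIZE = 2       # bytes per float16 element
DOUBLE_BUFFER = 2   # assumption: the constant's value is not shown in the log
NEED_SIZE = 3756544 # "need size" reported by the RuntimeError


def l1_bytes(cut_h_col):
    """Bytes needed for cut_h_col input rows, per the expression in the traceback."""
    return cut_h_col * FMAP_W * FMAP_C0 * TYPE_SIZE * DOUBLE_BUFFER


bytes_per_row = l1_bytes(1)
implied_rows = NEED_SIZE // bytes_per_row
print(bytes_per_row, implied_rows, NEED_SIZE % bytes_per_row)
# 20416 184 0
```

Under these assumptions the reported size corresponds to exactly 184 input rows, which suggests that the large 127-pixel window is what drives the buffer requirement.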

Comments (15)

zx created the Bug-Report
zx set the linked repository to Ascend/modelzoo

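tf.not_equal also raised an error earlier; I am not sure whether it simply has not been implemented.

Judging from the log you provided, extract_image_patches fails during operator compilation. This is most likely an operator generalization issue (the operator does not yet cover what your model needs). Please provide the details of the operators in the full network (input/output types, formats, and shapes). Thanks!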

zhengtao set the assignee to 刘俊
zhengtao changed the assignee from 刘俊 to wangyulong

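Preliminary diagnosis: this input specification exceeds the constraints of the operator's current implementation. Which network scenario does this input specification come from? Thanks @zx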

zhengtao changed the task status from TODO to Analysing

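@wangyulong It is an object tracking task using the SiamMask model. The open-source PyTorch code uses nn.unfold. This operation can be seen as one part of a convolution: it extracts patches with a sliding window, but there is no filter and no convolution between the patches and a filter. The corresponding operator in TF is tf.image.extract_image_patches.

The images are currently single-channel. The following code can serve as a reference: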

import numpy as np

def extract_image_patches_v2(image, ksizes=127, strides=8):
    # image: NHWC NumPy array; returns (b, out_h, out_w, ksizes * ksizes * c) as float32
    b, h, w, c = image.shape
    # Number of window positions along each axis ('VALID' padding)
    extract_image_h = (h - ksizes) // strides + 1
    extract_image_w = (w - ksizes) // strides + 1
    extract_image = np.zeros([b, extract_image_h, extract_image_w, ksizes * ksizes * c], dtype=np.float32)
    for i in range(extract_image_h):
        for j in range(extract_image_w):
            # Slice one window for the whole batch, then flatten it per sample
            patch = image[:, i * strides:i * strides + ksizes, j * strides:j * strides + ksizes, :]
            extract_image[:, i, j, :] = np.reshape(patch, newshape=(b, -1,)).copy()
    return extract_image

Call it as follows:
mask_uf = tf.py_func(extract_image_patches_v2, [mask_pad], tf.float32)
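
The Python double loop above can also be expressed without explicit loops. The following is a minimal sketch, not code from this thread: it assumes NumPy 1.20 or later (for `sliding_window_view`), and the function name `extract_image_patches_np` is made up for illustration. It is meant to produce the same (b, out_h, out_w, k * k * c) layout as the loop version.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_image_patches_np(image, ksize=127, stride=8):
    """Vectorized 'VALID' patch extraction for an NHWC NumPy array."""
    image = np.asarray(image)
    # All stride-1 windows over H and W: shape (b, h-k+1, w-k+1, c, k, k)
    windows = sliding_window_view(image, (ksize, ksize), axis=(1, 2))
    # Keep every `stride`-th window position along both spatial axes
    windows = windows[:, ::stride, ::stride]
    b, out_h, out_w, c = windows.shape[:4]
    # Reorder to (b, out_h, out_w, k, k, c) so flattening matches the loop version
    patches = windows.transpose(0, 1, 2, 4, 5, 3)
    return patches.reshape(b, out_h, out_w, ksize * ksize * c).astype(np.float32)
```

The window view itself does not copy data; the copy happens in the final reshape, so peak memory is roughly the size of the output array. This still runs on the CPU, so it does not remove the host/device transfer that comes with tf.py_func.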

It can also be done with a convolution, but this runs out of memory when the image is too large:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

def tf_extract_image_patches_v2(image=None, ksizes=127, strides=8):
    # Demo only: the arguments are ignored and a random 255x255 single-channel batch is used
    image = np.random.normal(size=(8, 255, 255, 1)) * 255
    feature_map = tf.pad(image, paddings=[[0, 0], [32, 32], [32, 32], [0, 0]], mode="CONSTANT")
    g_sz = 127

    # One-hot filter: output channel i * g_sz + j copies the pixel at window offset (i, j)
    filter_data = np.zeros((g_sz, g_sz, 1, g_sz * g_sz), dtype=np.float64)
    for i in range(g_sz):
        for j in range(g_sz):
            filter_data[i][j][0][i * g_sz + j] = 1.0
    filters = tf.Variable(filter_data, dtype=np.float64)
    result = tf.nn.conv2d(feature_map, filters, strides=(1, 8, 8, 1), padding="VALID")
    print(result.shape)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        print(sess.run(result))
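
To see why the convolution approach runs into memory trouble, it helps to look at the size of the one-hot filter on its own. The sketch below uses NumPy only, and the helper names `one_hot_patch_filter` and `filter_nbytes` are made up for illustration. It builds the same single-channel filter with `np.eye` instead of the double loop and computes its size for a 127x127 window.

```python
import numpy as np


def one_hot_patch_filter(ksize, dtype=np.float32):
    """Filter of shape (k, k, 1, k*k): channel i*k + j selects window offset (i, j)."""
    n = ksize * ksize
    # Row i*k + j of the identity matrix becomes entry [i, j, 0, :] after the reshape
    return np.eye(n, dtype=dtype).reshape(ksize, ksize, 1, n)


def filter_nbytes(ksize, itemsize):
    """Bytes for the single-channel one-hot filter: (k*k)^2 elements."""
    return (ksize * ksize) ** 2 * itemsize


print(one_hot_patch_filter(3).shape)  # (3, 3, 1, 9)
print(filter_nbytes(127, 8))          # float64: 2081157128 bytes
print(filter_nbytes(127, 2))          # float16: 520289282 bytes
```

With float64, as in the code above, the filter alone is about 2 GB before any activations are counted, and nearly all of its entries are zero.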

It can also be done with a loop:


import tensorflow as tf

def tf_extract_image_patches_v3(image=None, ksizes=127, strides=8):
    print("tf_extract_image_patches_v3")
    # image: NHWC tensor with a fully defined static shape
    b, h, w, c = image.shape
    # b, h, w, c = 8, 319, 319, 1
    print(b, h, w, c)
    # Number of window positions along each axis ('VALID' padding)
    extract_image_h = (h - ksizes) // strides + 1
    extract_image_w = (w - ksizes) // strides + 1
    print(b, extract_image_h, extract_image_w, ksizes * ksizes * c)
    # extract_image = np.zeros([b, extract_image_h, extract_image_w, ksizes * ksizes * c], dtype=np.float32)
    output_list = []
    for batch_ in range(b):
        for i in range(extract_image_h):
            for j in range(extract_image_w):
                patch = image[batch_, i * strides:i * strides + ksizes, j * strides:j * strides + ksizes, :]
                output_list.append(tf.reshape(patch, (-1,)))
    # Note: the stacked result has shape (b * out_h * out_w, ksizes * ksizes * c)
    return tf.stack(output_list)
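
Note that the loop version returns a 2-D stack of shape (b * out_h * out_w, k * k * c), not the 4-D layout that tf.image.extract_image_patches produces. The NumPy sketch below mirrors the same batch/row/column loop order to illustrate this and shows that a reshape with fully explicit dimensions recovers the (b, out_h, out_w, k * k * c) layout. The function name `stacked_patches` and the toy input are made up for illustration.

```python
import numpy as np


def stacked_patches(image, ksize, stride):
    """NumPy mirror of the batch/row/column loop; returns the 2-D stack and the 4-D target shape."""
    b, h, w, c = image.shape
    out_h = (h - ksize) // stride + 1
    out_w = (w - ksize) // stride + 1
    rows = [
        image[n, i * stride:i * stride + ksize, j * stride:j * stride + ksize, :].reshape(-1)
        for n in range(b)
        for i in range(out_h)
        for j in range(out_w)
    ]
    return np.stack(rows), (b, out_h, out_w, ksize * ksize * c)


image = np.arange(2 * 7 * 7 * 1, dtype=np.float32).reshape(2, 7, 7, 1)
flat, full_shape = stacked_patches(image, ksize=3, stride=2)
# Reshape with every dimension spelled out (no -1 placeholder)
patches = flat.reshape(full_shape)
print(flat.shape, patches.shape)
# (18, 9) (2, 3, 3, 9)
```

Because the loops run over batch first, then rows, then columns, the explicit reshape places each patch at its [batch, row, column] position.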

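@zx Hi, thanks for sharing the code. We converted the first function into a TensorFlow version, but in practice it runs rather slowly. Do you have a version written specifically for TensorFlow? :question: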

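@forechoni It does affect speed. As I understand it, tf.py_func copies the data to the CPU and then back to the NPU, although I am not sure this understanding is correct. Compared with the GPU on our side, it is roughly 3x slower. The TensorFlow version is the tf_extract_image_patches_v2 code above, but we cannot use it because the images are too large.

@zx We cannot use the TensorFlow version either.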

@zx We found that -1 cannot be used; the rest of the network crashes.

@forechoni Please modify the code a little.

@zx The latest master now includes a version that supports the case in this issue. Have you tried using the ExtractImagePatches operator directly?

@wangyulong Can we simply create a new docker container and run it directly?

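@zx Hello, extract_image_patches now supports the specification you described above; you can verify this with the latest software version. To verify inside docker, you need to replace the extract_image_patches.py file. You can locate its path by running find from the / directory: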
find / -name extract_image_patches.py

This is now resolved. Thank you all for the support!

zx changed the task status from WIP to DONE
王位 added collaborator AtlasAccount
王位 removed collaborator AtlasAccount
吴定远 changed the linked repository from Ascend/modelzoo-his to Ascend/modelzoo
