mindspore.dataset.GeneratorDataset does not accept string-typed indices from a sampler

DONE
Bug-Report
Created on 2022-11-22 11:29

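Sometimes I want a custom sampler to produce string-typed indices that are passed to the __getitem__ function, but currently it seems that only numeric values can be passed as indices; a string index raises an error. The input and the error message are shown below.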

import mindspore.dataset as ds
import numpy as np
class MyAccessible:
    def __init__(self):
        self._data = np.arange(0,40).reshape(20,2)
        self._label = np.arange(40,60).reshape(20,1)
    def __getitem__(self, index):
        return self._data[index], self._label[index]
    def __len__(self):
        return len(self._data)
    
class MySampler(ds.Sampler):
    def __iter__(self):
        for i in range(0, 10, 2):
            yield str(i)
            
sampler = ds.IterSampler(sampler=MySampler())
dataset = ds.GeneratorDataset(source=MyAccessible(), column_names=["data", "label"],sampler=sampler)
for i in dataset:
    print(i)
RuntimeError                              Traceback (most recent call last)
Cell In [167], line 19
     17 sampler = ds.IterSampler(sampler=MySampler())
     18 dataset = ds.GeneratorDataset(source=MyAccessible(), column_names=["data", "label"],sampler=sampler)
---> 19 for i in dataset:
     20     print(i)

File D:\Anaconda\envs\ms\lib\site-packages\mindspore\dataset\engine\iterators.py:141, in Iterator.__next__(self)
    138     raise RuntimeError("Iterator does not have a running C++ pipeline.")
    140 # Note offload is applied inside _get_next() if applicable since get_next converts to output format
--> 141 data = self._get_next()
    142 if not data:
    143     if self.__index == 0:

File D:\Anaconda\envs\ms\lib\site-packages\mindspore\dataset\engine\iterators.py:253, in TupleIterator._get_next(self)
    245 """
    246 Returns the next record in the dataset as a list
    247 
    248 Returns:
    249     List, the next record in the dataset.
    250 """
    252 if self.offload_model is None:
--> 253     return [self._transform_md_to_output(t) for t in self._iterator.GetNextAsList()]
    254 data = [self._transform_md_to_tensor(t) for t in self._iterator.GetNextAsList()]
    255 if data:

RuntimeError: Unexpected error. TypeCast: TypeCast does not support cast from string to int64
Line of code : 369
File         : mindspore\ccsrc\minddata\dataset\kernels\data\data_utils.cc

Changing str(i) to i in the sampler makes the code run normally. I hope support for string indices can be added.
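Until string indices are supported, one possible workaround is to keep the sampler numeric and translate integers to string keys inside the source object. The following is only a minimal sketch under that assumption: `KeyedSource`, `index_of`, and the `records` dictionary are hypothetical names made up for illustration and are not part of MindSpore.

```python
class KeyedSource:
    """Hypothetical wrapper: exposes a string-keyed store through integer indices."""

    def __init__(self, records):
        # records maps string keys to (data, label) pairs.
        self._records = records
        # Fix the key order once so a given integer always maps to the same key.
        self._keys = sorted(records)

    def __getitem__(self, index):
        # The sampler supplies an integer; look up the string key it stands for.
        key = self._keys[int(index)]
        return self._records[key]

    def __len__(self):
        return len(self._keys)

    def index_of(self, key):
        # Translate a string key into the integer a sampler should yield.
        return self._keys.index(key)


# Toy data shaped like the example above: 20 rows of (data, label).
records = {str(i): ([2 * i, 2 * i + 1], [40 + i]) for i in range(20)}
source = KeyedSource(records)

# The string keys the sampler was meant to produce, converted to integers.
wanted_keys = [str(i) for i in range(0, 10, 2)]
int_indices = [source.index_of(k) for k in wanted_keys]
```

A custom sampler could then yield the values in `int_indices` instead of `str(i)`, with `source` passed as the `source` argument of `GeneratorDataset`, so that only integers cross the boundary where the TypeCast error is raised.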

Comments (4)

singing4you created the Question

Please assign maintainer to check this issue.
@fangwenyi @chengxiaoli

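Please add labels (comp or sig). You can also visit https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md to find more.
To get your code reviewed sooner, please add a component (comp) or special interest group (sig) label to your Pull Request; labeled PRs are pushed directly to the responsible reviewers.
For example, if you are submitting code for the data component, you can comment:
//comp/data
You can also invite the data SIG to review the code by writing:
//sig/data
You can additionally mark the type of the PR, for example as a bug fix or a feature request:
//kind/bug or //kind/feature
Congratulations, you now know how to add labels with commands. Go ahead and add a label in the comments below!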

fangwenyi changed the task status from TODO to ACCEPTED

Hello, we have received the issue and have assigned someone to analyze it.

fangwenyi changed the task type from Question to Bug-Report
fangwenyi set the associated project to MindSpore Issue Assistant
fangwenyi added the mindspore-assistant label
fangwenyi set the assignee to luoyang

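Hello, because this issue has been open for a long time and there may be a version gap, it is being closed for now. If you try a newer version and the problem persists, please provide the specific details and change the ISSUE status to WIP; we will follow up further. Thank you.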

Shawny changed the task status from ACCEPTED to DONE
