Ascend / ModelZoo-PyTorch
CosyVoice2 concurrent inference consistently fails in testing
TODO
#ICT6KM
Requirement
summer2018
Created on 2025-08-15 15:53
CosyVoice2 concurrent inference testing fails, and I can't tell whether it is a model problem or a package-version problem. Could someone please take a look? It has been troubling me for days. I also noticed that the CosyVoice2 adaptation was updated recently, so I redeployed it.

## Environment

1. Machine: Atlas 900 RCK A2 Compute Node, with eight 910B2 NPUs
2. Huawei image: swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:2.1.RC1-800I-A2-py311-openeuler24.03-lts
3. ascend-toolkit: 8.2.RC1
4. The container mounts the host driver: npu-smi 24.1.rc2, Version: 24.1.rc2. Note that this has not been updated to the officially required 25.x — I don't dare update the host driver casually, and single-threaded inference does work on the old driver.

## Installation

I followed the official documentation step by step. The only tedious part was that the CosyVoice2 requirements had to be installed one by one inside the container, resolving each conflict as it came up; the torch version issue in particular took a long time. With the officially listed torch versions, startup fails; posts online say the torch_npu version needs to be replaced:

    #torch==2.3.1
    #torch_npu==2.3.1.post6
    #torchaudio==2.4.0

Replacing them with the following versions solved the startup problem:

    torch==2.3.1
    torch-npu==2.3.1.post4
    torchaudio==2.3.1

Log before replacing the torchaudio version:

```log
Traceback (most recent call last):
  File "/data/unicom/CosyVoice2/CosyVoice/infer3.py", line 5, in <module>
    import torchaudio
  File "/usr/local/lib64/python3.11/site-packages/torchaudio/__init__.py", line 2, in <module>
    from . import _extension  # noqa  # usort: skip
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib64/python3.11/site-packages/torchaudio/_extension/__init__.py", line 38, in <module>
    _load_lib("libtorchaudio")
  File "/usr/local/lib64/python3.11/site-packages/torchaudio/_extension/utils.py", line 60, in _load_lib
    torch.ops.load_library(path)
  File "/usr/local/lib64/python3.11/site-packages/torch/_ops.py", line 1032, in load_library
    ctypes.CDLL(path)
  File "/usr/lib64/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /usr/local/lib64/python3.11/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK3c105Error4whatEv
```

Error log before replacing torch_npu:

```log
RuntimeError: currentStreamCaptureStatusMayInitCtx:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:77 NPU function error: c10_npu::acl::AclmdlRICaptureGetInfo(s.stream(false), &is_capturing, &model_ri), error code is 107003
[ERROR] 2025-08-15-13:40:45 (PID:268298, Device:0, RankID:-1) ERR00100 PTA call acl api failed
[Error]: The stream is not in the current context.
        Check whether the context where the stream is located is the same as the current context.
EE9999: Inner Error!
EE9999: [PID: 268298] 2025-08-15-13:40:45.008.497 stream is not in current ctx, stream_id=207.[FUNC:StreamGetCaptureInfo][FILE:api_impl.cc][LINE:7889]
        TraceBack (most recent call last):
        rtStreamGetCaptureInfo execute failed, reason=[stream not in current context][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
```

## Testing

Single-threaded tests pass without problems; the trouble is with multithreading. Starting from the official test script infer.py, I wrote some threading code to simulate two threads running inference concurrently (a hypothetical reconstruction follows the log below) and hit the error shown. I don't know what the Executor error means, and I can't install that package locally, so I can't read its source. From the log it also looks like the first thread errors out while the second thread still produces a result. One more question: I set export ASCEND_RT_VISIBLE_DEVICES=3 to select card 3, so why does the log always print `open device 0 success`, i.e. the first card?

Multithreaded test error log:

```log
/usr/local/lib64/python3.11/site-packages/torch_npu/contrib/transfer_to_npu.py:292: ImportWarning: ************************************************************************************************************* The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now.. The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now.. The backend in torch.distributed.init_process_group set to hccl now.. The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
The device parameters have been replaced with npu in the function below: torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Generator, torch.set_default_device, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty ************************************************************************************************************* warnings.warn(msg, ImportWarning) /usr/local/lib64/python3.11/site-packages/torch_npu/contrib/transfer_to_npu.py:247: RuntimeWarning: torch.jit.script and torch.jit.script_method will be disabled by transfer_to_npu, which currently does not support them, if you need to enable them, please do not use transfer_to_npu. warnings.warn(msg, RuntimeWarning) /usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/__init__.py:3: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html import pkg_resources /usr/lib/python3.11/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('zope')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages declare_namespace(pkg) /usr/lib/python3.11/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('mpl_toolkits')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages declare_namespace(pkg) 2025-08-15 16:41:44,820 - modelscope - INFO - PyTorch version 2.3.1 Found. 2025-08-15 16:41:44,821 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer 2025-08-15 16:41:44,884 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 fa7696cf74b7b64919d15de95d1d70cc and a total number of 980 components indexed /usr/local/lib/python3.11/site-packages/lightning/fabric/__init__.py:41: Deprecated call to `pkg_resources.declare_namespace('lightning.fabric')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages /usr/lib/python3.11/site-packages/pkg_resources/__init__.py:2350: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('lightning')`. 
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages declare_namespace(parent) /usr/local/lib/python3.11/site-packages/lightning/pytorch/__init__.py:37: Deprecated call to `pkg_resources.declare_namespace('lightning.pytorch')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages /usr/lib/python3.11/site-packages/pkg_resources/__init__.py:2350: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('lightning')`. Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages declare_namespace(parent) /usr/local/lib/python3.11/site-packages/diffusers/models/lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`. deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message) failed to import ttsfrd, use WeTextProcessing instead Initializing CosyVoice2 model... [INFO] input frame rate=25 /usr/local/lib/python3.11/site-packages/librosa/core/intervals.py:15: DeprecationWarning: path is deprecated. Use files() instead. Refer to https://importlib-resources.readthedocs.io/en/latest/using.html#migrating-from-legacy for migration advice. with resources.path("librosa.core", "intervals.msgpack") as imsgpack: Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. /usr/local/lib64/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:65: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider' warnings.warn( 2025-08-15 16:42:10,870 WETEXT INFO building fst for zh_normalizer ... [INFO] building fst for zh_normalizer ... 2025-08-15 16:43:05,344 WETEXT INFO done [INFO] done 2025-08-15 16:43:05,345 WETEXT INFO fst path: /usr/local/lib/python3.11/site-packages/tn/zh_tn_tagger.fst [INFO] fst path: /usr/local/lib/python3.11/site-packages/tn/zh_tn_tagger.fst 2025-08-15 16:43:05,345 WETEXT INFO /usr/local/lib/python3.11/site-packages/tn/zh_tn_verbalizer.fst [INFO] /usr/local/lib/python3.11/site-packages/tn/zh_tn_verbalizer.fst 2025-08-15 16:43:05,355 WETEXT INFO found existing fst: /usr/local/lib/python3.11/site-packages/tn/en_tn_tagger.fst [INFO] found existing fst: /usr/local/lib/python3.11/site-packages/tn/en_tn_tagger.fst 2025-08-15 16:43:05,356 WETEXT INFO /usr/local/lib/python3.11/site-packages/tn/en_tn_verbalizer.fst [INFO] /usr/local/lib/python3.11/site-packages/tn/en_tn_verbalizer.fst 2025-08-15 16:43:05,356 WETEXT INFO skip building fst for en_normalizer ... [INFO] skip building fst for en_normalizer ... 
[WARN] acl repeat initialize [INFO] acl init success [INFO] open device 0 success [INFO] get current context [INFO] load model ./CosyVoice2-0.5B/flow_linux_aarch64.om success [INFO] create model description success [INFO] create new context [INFO] load model ./CosyVoice2-0.5B/flow_static.om success [INFO] create model description success [INFO] create new context [INFO] load model ./CosyVoice2-0.5B/speech_linux_aarch64.om success [INFO] create model description success Removing weight norm... warm up start warm up end Starting concurrent inference... 0%| | 0/1 [00:00<?, ?it/s] 0%| | 0/1 [00:00<?, ?it/s][A[INFO] synthesis text 春天的阳光温柔地洒在花园里,各种花朵竞相绽放,蜜蜂和蝴蝶在花丛中翩翩起舞,一片生机勃勃的景象。 [INFO] synthesis text 收到好友从远方寄来的生日礼物,那份意外的惊喜和深深的祝福,让我心中充满了甜蜜的快乐,笑容如花儿般绽放。 0%| | 0/1 [00:22<?, ?it/s] Exception in thread Thread-3 (no_stream_input_inference_thread): Traceback (most recent call last): File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/_utils/error_code.py", line 43, in wapper return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/core/_backend.py", line 133, in run return super(TorchNpuGraph, self).run((inputs, assigned_outputs, stream)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ RuntimeError: Executor is not initialized. During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib64/python3.11/threading.py", line 1045, in _bootstrap_inner self.run() File "/usr/lib64/python3.11/threading.py", line 982, in run self._target(*self._args, **self._kwargs) File "/data/unicom/CosyVoice2/CosyVoice/infer3.py", line 36, in no_stream_input_inference_thread for _, j in enumerate( File "/data/unicom/CosyVoice2/CosyVoice/cosyvoice/cli/cosyvoice.py", line 74, in inference_sft for model_output in self.model.tts(**model_input, stream=stream, speed=speed): File "/data/unicom/CosyVoice2/CosyVoice/cosyvoice/cli/model.py", line 380, in tts for i in self.llm.inference(text=text.to(self.device), File "/usr/local/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 56, in generator_context response = gen.send(request) ^^^^^^^^^^^^^^^^^ File "/data/unicom/CosyVoice2/CosyVoice/cosyvoice/llm/llm.py", line 338, in inference y_pred, cache = self.llm.forward_one_step(lm_input, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/unicom/CosyVoice2/CosyVoice/cosyvoice/llm/llm.py", line 234, in forward_one_step outs = self.model( ^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/unicom/CosyVoice2/CosyVoice/transformers/src/transformers/models/qwen2/modeling_qwen2.py", line 846, in forward outputs, logits = self.model( ^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/unicom/CosyVoice2/CosyVoice/transformers/src/transformers/models/qwen2/modeling_qwen2.py", line 512, in forward return self.cached_decode( 
^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/inference/_cache_compiler.py", line 500, in __call__ return self._compiled_model(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/inference/_cache_compiler.py", line 247, in compiled_method return compiled_fn(model, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/data/unicom/CosyVoice2/CosyVoice/transformers/src/transformers/models/qwen2/modeling_qwen2.py", line 528, in decode def decode( File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/inference/_cache_compiler.py", line 229, in compiled_fn return ge_kernel(*full_args) ^^^^^^^^^^^^^^^^^^^^^ File "<string>", line 389, in kernel File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/ge/_ge_graph.py", line 659, in run return self._executor.run(inputs, assigned_outputs, stream) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/_utils/error_code.py", line 46, in wapper raise type(e)("\n".join(msg)) RuntimeError: Executor is not initialized. Traceback (most recent call last): File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/_utils/error_code.py", line 43, in wapper return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/core/_backend.py", line 133, in run return super(TorchNpuGraph, self).run((inputs, assigned_outputs, stream)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ RuntimeError: Executor is not initialized. [ERROR] 2025-08-15-16:44:33 (PID:312504, Device:0, RankID:-1) ERR03005 GRAPH internal error .huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) /usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/_ge_concrete_graph/fx2ge_converter.py:567: UserWarning: When enable frozen_parameter, Parameters will be considered frozen.Please make sure that the Parameters data address remain the same throughout the program runtime. warnings.warn(f'When enable frozen_parameter, Parameters will be considered frozen.' ..[W compiler_depend.ts:51] Warning: CAUTION: The operator 'aten::unfold_backward' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback) [INFO] yield speech len 1.84, rtf 44.46953016778697 /usr/local/lib64/python3.11/site-packages/torch_npu/dynamo/torchair/_ge_concrete_graph/fx2ge_converter.py:567: UserWarning: When enable frozen_parameter, Parameters will be considered frozen.Please make sure that the Parameters data address remain the same throughout the program runtime. warnings.warn(f'When enable frozen_parameter, Parameters will be considered frozen.' ..[INFO] yield speech len 2.0, rtf 19.574081778526306 [INFO] yield speech len 2.0, rtf 0.3767409324645996 [INFO] yield speech len 2.0, rtf 0.35327744483947754 [INFO] yield speech len 2.0, rtf 0.3983529806137085 [INFO] yield speech len 1.72, rtf 0.5841967671416527 100%|██████████| 1/1 [02:04<00:00, 124.27s/it][A 100%|██████████| 1/1 [02:04<00:00, 124.27s/it] Thread 0: step 1 RTF: 10.75277373864989 Thread 0: save out wav file to thread_0_sft_out_1.wav Thread 0: avg RTF: 10.75277373864989 All inference threads completed! [INFO] unload model success, model Id is 2147483648 [INFO] unload model success, model Id is 2147483649 [INFO] unload model success, model Id is 2 [WARN] acl repeat destroy ```
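Since infer3.py itself isn't included above, here is a minimal, hypothetical sketch of the kind of two-thread test being described. The thread-function name, texts, and output messages are taken from the log; the `CosyVoice2` / `inference_sft` calls follow the upstream CosyVoice API as I understand it. Treat it as an illustration of the setup, not the actual script.

```python
# Hypothetical reconstruction of the two-thread test (the real infer3.py is
# not shown in this issue); API details follow the upstream CosyVoice repo
# and may differ in the Ascend-adapted version.
import threading

import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2

TEXTS = [
    "春天的阳光温柔地洒在花园里,各种花朵竞相绽放,蜜蜂和蝴蝶在花丛中翩翩起舞,一片生机勃勃的景象。",
    "收到好友从远方寄来的生日礼物,那份意外的惊喜和深深的祝福,让我心中充满了甜蜜的快乐,笑容如花儿般绽放。",
]

cosyvoice = CosyVoice2("./CosyVoice2-0.5B")  # one model instance shared by both threads

# lock = threading.Lock()  # diagnostic idea: wrap the inference call in a lock;
# if the error disappears, the compiled torchair graph is probably not safe to
# execute from two threads at once.

def no_stream_input_inference_thread(thread_id: int, text: str) -> None:
    # Both threads drive the same model object; this is the call path in which
    # the first thread raises "Executor is not initialized".
    for step, out in enumerate(cosyvoice.inference_sft(text, "中文女", stream=False), start=1):
        path = f"thread_{thread_id}_sft_out_{step}.wav"
        torchaudio.save(path, out["tts_speech"], cosyvoice.sample_rate)
        print(f"Thread {thread_id}: save out wav file to {path}")

threads = [
    threading.Thread(target=no_stream_input_inference_thread, args=(i, t))
    for i, t in enumerate(TEXTS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("All inference threads completed!")
```

If serializing the calls with the commented-out lock makes the error go away, that would point at per-thread executor/context state in torchair rather than at the model itself.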
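On the `open device 0 success` question: if ASCEND_RT_VISIBLE_DEVICES behaves like CUDA_VISIBLE_DEVICES (an assumption worth confirming against the CANN docs), the visible cards are renumbered from zero inside the process, so physical card 3 becomes logical device 0 and that is the ID the ACL log prints. A quick way to check, assuming torch_npu's `torch.npu` namespace mirrors `torch.cuda`:

```python
# Check logical-vs-physical device numbering under ASCEND_RT_VISIBLE_DEVICES
# (assumption: torch_npu mirrors torch.cuda's renumbering behavior).
import os

os.environ["ASCEND_RT_VISIBLE_DEVICES"] = "3"  # must be set before the NPU runtime initializes

import torch
import torch_npu  # noqa: F401  -- registers the torch.npu namespace

print(torch.npu.device_count())    # expected 1: only one card is visible
print(torch.npu.current_device())  # expected 0: the logical ID that maps to card 3
```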
Comments (2)