GptManager
ModelRunnerCpp
that wraps C++ gptSession
trtllm-build
command (already applied to blip2 and OPT)
StoppingCriteria
and LogitsProcessor
in Python generate API (thanks to the contribution from @zhang-ge-hao)
Models
Features
sequence_length
tensor to support proper lengths in beam-search
(when beam-width > 1 - see
tensorrt_llm/batch_manager/GptManager.h)
excludeInputInOutput
in
GptManager)
pybind)
GptSession::Config::ctxMicroBatchSize
and
GptSession::Config::genMicroBatchSize
in
tensorrt_llm/runtime/gptSession.h)
mComputeContextLogits
and
mComputeGenerationLogits
in
tensorrt_llm/runtime/gptModelConfig.h)
logProbs
and cumLogProbs
(see "output_log_probs"
and
"cum_log_probs"
in GptManager)
Bug fixes
host_max_kv_cache_length
) in engine are not the same as expected in
the main branch" #369
world_size = 2
("array split
does not result in an equal division") #374
end_id
for various
models [C++ and Python]
max_batch_size
in the engine
builder and max_num_sequences
in TrtGptModelOptionalParams? #65