library_name | tags |
---|---|
transformers | |
Model Page: Gemma
Other Links/Technical Documentation:
Disclaimer: Information on Google's overall model safety strategy is highly confidential, commercially sensitive, and proprietary information of Google. Any public availability of this information could expose people who use Google's products and the greater public to security and safety risks. For those reasons, this model card excludes specific details for some of the evaluation processes and metrics used.
License: Terms
Authors: Google
Summary description and brief definition of input(s) / output(s).
This is a family of decoder-only, text-based large language models that are an open-checkpoint variation of the Gemini models from Google DeepMind. These models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. The key benefit of these models is that they offer the capability of advanced text generation in an open ecosystem, making them accessible to a wide developer and researcher community.
Below we share some code snippets to help you get started quickly with running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your use case.
Running the model on a CPU:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gg-hf/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("gg-hf/gemma-7b")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
Running the model on a single / multi GPU:

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gg-hf/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("gg-hf/gemma-7b", device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
Using `torch.float16`:
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gg-hf/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("gg-hf/gemma-7b", device_map="auto", torch_dtype=torch.float16)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
Using `torch.bfloat16`:
```python
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gg-hf/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("gg-hf/gemma-7b", device_map="auto", torch_dtype=torch.bfloat16)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
Quantized versions through `bitsandbytes`

Using 8-bit precision:
```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("gg-hf/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("gg-hf/gemma-7b", quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
Using 4-bit precision:

```python
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("gg-hf/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("gg-hf/gemma-7b", quantization_config=quantization_config)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
First make sure to install `flash-attn` in your environment: `pip install flash-attn`. Then enable Flash Attention 2 when loading the model:

```diff
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
+   attn_implementation="flash_attention_2"
).to(0)
```
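For reference, the same change in a complete, runnable snippet. This is a sketch assuming a CUDA device, a transformers version recent enough to accept the `attn_implementation` argument, and the checkpoint name and prompt used throughout this card:

```python
# pip install flash-attn accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "gg-hf/gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to(0)  # move the model to the first CUDA device

input_ids = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to(0)
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```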
Input: Text string (e.g., a question, a prompt, or a document to be summarized).

Output: Generated text in response to the input (e.g., an answer to the question, a summary of the document).
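To illustrate this text-in, text-out interface, here is a minimal sketch using the transformers `pipeline` helper (the checkpoint name follows the snippets above; `max_new_tokens` is an illustrative choice, not a recommended setting):

```python
from transformers import pipeline

# The pipeline wraps tokenization, generation, and decoding:
# a text string goes in, generated text comes out.
generator = pipeline("text-generation", model="gg-hf/gemma-7b")
result = generator("Summarize: Machine learning lets computers learn from data.", max_new_tokens=50)
print(result[0]["generated_text"])
```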
Data used for model training and how the data was processed.
These models were trained on a massive dataset of text drawn from a wide variety of sources, containing 8 trillion tokens.
The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats.
Several data cleaning and filtering methods were applied to the training data.
Details about the model internals.
For the training of these models, the latest generation of Tensor Processing Unit (TPU) hardware was used (TPUv5e).
Training large language models requires significant computational power. TPUs, designed specifically for the matrix operations common in machine learning, offer several advantages in this domain.
Training was done using JAX and ML Pathways.
JAX allows researchers to take advantage of the latest generation of hardware, including TPUs, for faster and more efficient training of large models.
ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This makes it especially suitable for foundation models, including large language models.
Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models: "the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow."
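As an illustration of that single-controller idea (not Google's actual training code), here is a minimal JAX sketch in which one Python process compiles and dispatches a computation to whatever accelerators are attached:

```python
import jax
import jax.numpy as jnp

# One Python process sees every attached accelerator (TPU cores, GPUs, or CPU).
print("Devices visible to this single controller:", jax.devices())

@jax.jit  # XLA compiles this function once; later calls reuse the compiled binary
def dense_layer(params, x):
    w, b = params
    return jnp.tanh(x @ w + b)

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (512, 512))
b = jnp.zeros((512,))
x = jnp.ones((8, 512))

# The call is issued from this one process; JAX handles device placement.
y = dense_layer((w, b), x)
print(y.shape)
```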
Model evaluation metrics and results.
These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation:
Benchmark | Metric | 2.5B Params | 7B Params |
---|---|---|---|
MMLU | 5-shot, top-1 | 37.3 | 64.3 |
HellaSwag | 0-shot | 70.3 | 81.2 |
PIQA | 0-shot | 78.2 | 81.2 |
SocialIQA | 0-shot | 50.7 | 21.8 |
BoolQ | 0-shot | 71.5 | 83.2 |
WinoGrande | partial score | 64.2 | 72.3 |
CommonsenseQA | 7-shot | 64.7 | 71.3 |
OpenBookQA | | 46.6 | 52.8 |
ARC-e | | 69.9 | 81.5 |
ARC-c | | 39.9 | 53.2 |
[TriviaQA][triviaqa] | 5-shot | 49.5 | 63.4 |
[Natural Questions][naturalq] | 5-shot | 10.3 | 23.0 |
[HumanEval][humaneval] | pass@1 | 23.2 | 32.3 |
[MBPP][mbpp] | 3-shot | 30.6 | 44.4 |
[GSM8K][gsm8k] | maj@1 | 15.4 | 46.4 |
[MATH][math] | 4-shot | 12.0 | 24.3 |
[AGIEval][agieval] | | 24.7 | 41.7 |
[BIG-Bench][big-bench] | | 35.6 | 55.1 |
Average | | 44.5 | 56.4 |
Ethics and safety evaluation approach and results.
These models were evaluated against a number of different categories relevant to ethics and safety.
The results of the safety evaluation metrics are within our internal tolerance thresholds.
Like any large language model, the Gemma models have certain limitations that users should be aware of.
Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The potential uses described here are non-comprehensive; they are intended to provide contextual information about the use cases that the model creators considered as part of model training and development.
The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the identified risks and their mitigations.
At the time of release, this family of models provides one of the best-performing and safest open large language model implementations available compared to similarly sized models.
Using the benchmark evaluation metrics described in this document, these models have been shown to provide superior performance to other, comparably sized open model alternatives.
[triviaqa]: https://arxiv.org/abs/1705.03551
[naturalq]: https://github.com/google-research-datasets/natural-questions
[humaneval]: https://arxiv.org/abs/2107.03374
[mbpp]: https://arxiv.org/abs/2108.07732
[gsm8k]: https://arxiv.org/abs/2110.14168
[bbq]: https://arxiv.org/abs/2110.08193v2
[winobias]: https://arxiv.org/abs/1804.06876
[math]: https://arxiv.org/abs/2103.03874
[agieval]: https://arxiv.org/abs/2304.06364
[big-bench]: https://arxiv.org/abs/2206.04615
[wh-commitment]: https://www.whitehouse.gov/wp-content/uploads/2023/09/Voluntary-AI-Commitments-September-2023.pdf