# Generation

Each framework implements a `generate` method for text generation in its own `GenerationMixin` class:

- PyTorch [generate()](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationMixin.generate) is implemented in [GenerationMixin](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationMixin).
- TensorFlow [generate()](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.TFGenerationMixin.generate) is implemented in [TFGenerationMixin](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.TFGenerationMixin).
- Flax/JAX [generate()](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.FlaxGenerationMixin.generate) is implemented in [FlaxGenerationMixin](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.FlaxGenerationMixin).

Regardless of the framework you choose, you can parameterize the generate method with an instance of the [GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig) class. Refer to this class for the complete list of parameters that control the generation method.

To learn how to inspect a model's generation configuration, what the defaults are, how to change parameters ad hoc, and how to create and save a custom generation configuration, refer to the [text generation strategies guide](../generation_strategies). The guide also explains how to use related features, such as token streaming.
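
As a rough picture of what changing parameters ad hoc means, the sketch below resolves the effective settings for a single call in plain Python. The `resolve_generation_params` helper and the dictionaries are hypothetical and are not part of the library. The sketch assumes that keyword arguments passed at call time win over a stored generation configuration, which in turn wins over the class defaults.

```python
# Hypothetical helper: a plain-Python picture of how per-call overrides layer
# on top of a stored generation configuration. This is not the library's code.
CLASS_DEFAULTS = {"max_length": 20, "do_sample": False, "num_beams": 1, "top_k": 50}


def resolve_generation_params(stored_config, call_kwargs):
    """Merge settings so that later sources win: defaults < stored < call."""
    resolved = dict(CLASS_DEFAULTS)
    resolved.update(stored_config)  # values saved alongside the model
    resolved.update(call_kwargs)    # ad hoc overrides for this call only
    return resolved


stored = {"do_sample": True, "top_k": 40}
effective = resolve_generation_params(stored, {"top_k": 5})
print(effective)
```

The stored dictionary is left untouched, which mirrors the idea that an ad hoc override affects only the call it is passed to.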

## GenerationConfig[[transformers.GenerationConfig]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.GenerationConfig</name><anchor>transformers.GenerationConfig</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/configuration_utils.py#L82</source><parameters>[{"name": "**kwargs", "val": ""}]</parameters><paramsdesc></paramsdesc><paramsdesc1title>Parameters that control the length of the output</paramsdesc1title><paramsdesc1>

- **max_length** (`int`, *optional*, defaults to 20) --
  The maximum length the generated tokens can have. Corresponds to the length of the input prompt +
  `max_new_tokens`. Its effect is overridden by `max_new_tokens`, if also set.
- **max_new_tokens** (`int`, *optional*) --
  The maximum number of tokens to generate, ignoring the number of tokens in the prompt.
- **min_length** (`int`, *optional*, defaults to 0) --
  The minimum length of the sequence to be generated. Corresponds to the length of the input prompt +
  `min_new_tokens`. Its effect is overridden by `min_new_tokens`, if also set.
- **min_new_tokens** (`int`, *optional*) --
  The minimum number of tokens to generate, ignoring the number of tokens in the prompt.
- **early_stopping** (`bool` or `str`, *optional*, defaults to `False`) --
  Controls the stopping condition for beam-based methods, like beam-search. It accepts the following values:
  `True`, where the generation stops as soon as there are `num_beams` complete candidates; `False`, where a
  heuristic is applied and the generation stops when it is very unlikely to find better candidates;
  `"never"`, where the beam search procedure only stops when there cannot be better candidates (canonical
  beam search algorithm).
- **max_time** (`float`, *optional*) --
  The maximum amount of time, in seconds, that you allow the computation to run. Generation will still finish
  the current pass after the allocated time has passed.
- **stop_strings** (`str` or `list[str]`, *optional*) --
  A string or a list of strings that should terminate generation if the model outputs them.
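
The interplay between the total-length and new-token limits described above can be sketched in a few lines. `effective_length_bounds` is a hypothetical helper written for this illustration, not a library function. It assumes only what the parameter descriptions state: the `*_new_tokens` values are added to the prompt length and take precedence when set.

```python
def effective_length_bounds(prompt_length, max_length=20, max_new_tokens=None,
                            min_length=0, min_new_tokens=None):
    """Return (min_total, max_total) sequence lengths, prompt included."""
    # `max_new_tokens` overrides `max_length` when both are set.
    if max_new_tokens is not None:
        max_total = prompt_length + max_new_tokens
    else:
        max_total = max_length
    # `min_new_tokens` overrides `min_length` in the same way.
    if min_new_tokens is not None:
        min_total = prompt_length + min_new_tokens
    else:
        min_total = min_length
    return min_total, max_total


# With the defaults, a 12-token prompt leaves room for only 8 new tokens.
print(effective_length_bounds(prompt_length=12))  # (0, 20)
print(effective_length_bounds(prompt_length=12, max_new_tokens=30, min_new_tokens=5))  # (17, 42)
```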

</paramsdesc1><paramsdesc2title>Parameters that control the generation strategy used</paramsdesc2title><paramsdesc2>

- **do_sample** (`bool`, *optional*, defaults to `False`) --
  Whether or not to use sampling; use greedy decoding otherwise.
- **num_beams** (`int`, *optional*, defaults to 1) --
  Number of beams for beam search. 1 means no beam search.

</paramsdesc2><paramsdesc3title>Parameters that control the cache</paramsdesc3title><paramsdesc3>

- **use_cache** (`bool`, *optional*, defaults to `True`) --
  Whether or not the model should use the cached key/value attention states from previous steps (if
  applicable to the model) to speed up decoding.
- **cache_implementation** (`str`, *optional*, defaults to `None`) --
  Name of the cache class that will be instantiated in `generate`, for faster decoding. Possible values are:

  - `"dynamic"`: `DynamicCache`
  - `"static"`: `StaticCache`
  - `"offloaded"`: `DynamicCache(offloaded=True)`
  - `"offloaded_static"`: `StaticCache(offloaded=True)`
  - `"quantized"`: `QuantizedCache`

  If none is specified, we will use the default cache for the model (which is often `DynamicCache`). See
  our [cache documentation](https://huggingface.co/docs/transformers/en/kv_cache) for further information.
- **cache_config** (`dict`, *optional*, defaults to `None`) --
  Arguments used in the key-value cache class can be passed in `cache_config`.
- **return_legacy_cache** (`bool`, *optional*, defaults to `True`) --
  Whether to return the legacy or new format of the cache when `DynamicCache` is used by default.

</paramsdesc3><paramsdesc4title>Parameters for manipulation of the model output logits</paramsdesc4title><paramsdesc4>

- **temperature** (`float`, *optional*, defaults to 1.0) --
  The value used to modulate the next token probabilities. This value is set in a model's `generation_config.json` file. If it isn't set, the default value is 1.0.
- **top_k** (`int`, *optional*, defaults to 50) --
  The number of highest probability vocabulary tokens to keep for top-k-filtering. This value is set in a model's `generation_config.json` file. If it isn't set, the default value is 50.
- **top_p** (`float`, *optional*, defaults to 1.0) --
  If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to
  `top_p` or higher are kept for generation. This value is set in a model's `generation_config.json` file. If it isn't set, the default value is 1.0.
- **min_p** (`float`, *optional*) --
  Minimum token probability, which will be scaled by the probability of the most likely token. It must be a
  value between 0 and 1. Typical values are in the 0.01-0.2 range, comparably selective as setting `top_p` in
  the 0.99-0.8 range (use the opposite of normal `top_p` values).
- **typical_p** (`float`, *optional*, defaults to 1.0) --
  Local typicality measures how similar the conditional probability of predicting a target token next is to
  the expected conditional probability of predicting a random token next, given the partial text already
  generated. If set to float < 1, the smallest set of the most locally typical tokens with probabilities that
  add up to `typical_p` or higher are kept for generation. See [this
  paper](https://huggingface.co/papers/2202.00666) for more details.
- **epsilon_cutoff** (`float`, *optional*, defaults to 0.0) --
  If set to float strictly between 0 and 1, only tokens with a conditional probability greater than
  `epsilon_cutoff` will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the
  size of the model. See [Truncation Sampling as Language Model
  Desmoothing](https://huggingface.co/papers/2210.15191) for more details.
- **eta_cutoff** (`float`, *optional*, defaults to 0.0) --
  Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly between
  0 and 1, a token is only considered if its probability is greater than either `eta_cutoff` or `sqrt(eta_cutoff) *
  exp(-entropy(softmax(next_token_logits)))`. The latter term is intuitively the expected next token
  probability, scaled by `sqrt(eta_cutoff)`. In the paper, suggested values range from 3e-4 to 2e-3,
  depending on the size of the model. See [Truncation Sampling as Language Model
  Desmoothing](https://huggingface.co/papers/2210.15191) for more details.
- **repetition_penalty** (`float`, *optional*, defaults to 1.0) --
  The parameter for repetition penalty. 1.0 means no penalty. See [this
  paper](https://huggingface.co/papers/1909.05858) for more details.
- **encoder_repetition_penalty** (`float`, *optional*, defaults to 1.0) --
  The parameter for encoder_repetition_penalty. An exponential penalty on sequences that are not in the
  original input. 1.0 means no penalty.
- **length_penalty** (`float`, *optional*, defaults to 1.0) --
  Exponential penalty to the length that is used with beam-based generation. It is applied as an exponent to
  the sequence length, which in turn is used to divide the score of the sequence. Since the score is the log
  likelihood of the sequence (i.e. negative), `length_penalty` > 0.0 promotes longer sequences, while
  `length_penalty` < 0.0 encourages shorter sequences.
- **no_repeat_ngram_size** (`int`, *optional*, defaults to 0) --
  If set to int > 0, all ngrams of that size can only occur once.
- **bad_words_ids** (`list[list[int]]`, *optional*) --
  List of lists of token ids that are not allowed to be generated. Check
  [NoBadWordsLogitsProcessor](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.NoBadWordsLogitsProcessor) for further documentation and examples.
- **renormalize_logits** (`bool`, *optional*, defaults to `False`) --
  Whether to renormalize the logits after applying all the logits processors (including the custom
  ones). It's highly recommended to set this flag to `True`, as the search algorithms assume the score logits
  are normalized but some logit processors break the normalization.
- **forced_bos_token_id** (`int`, *optional*, defaults to `model.config.forced_bos_token_id`) --
  The id of the token to force as the first generated token after the `decoder_start_token_id`. Useful for
  multilingual models like [mBART](../model_doc/mbart) where the first generated token needs to be the target
  language token.
- **forced_eos_token_id** (`int` or `list[int]`, *optional*, defaults to `model.config.forced_eos_token_id`) --
  The id of the token to force as the last generated token when `max_length` is reached. Optionally, use a
  list to set multiple *end-of-sequence* tokens.
- **remove_invalid_values** (`bool`, *optional*, defaults to `model.config.remove_invalid_values`) --
  Whether to remove possible *nan* and *inf* outputs of the model to prevent the generation method from crashing.
  Note that using `remove_invalid_values` can slow down generation.
- **exponential_decay_length_penalty** (`tuple(int, float)`, *optional*) --
  This tuple adds an exponentially increasing length penalty after a certain number of tokens have been
  generated. The tuple shall consist of `(start_index, decay_factor)`, where `start_index` indicates where
  the penalty starts and `decay_factor` represents the factor of exponential decay.
- **suppress_tokens** (`list[int]`, *optional*) --
  A list of tokens that will be suppressed at generation. The `SuppressTokens` logit processor will set their
  log probs to `-inf` so that they are not sampled.
- **begin_suppress_tokens** (`list[int]`, *optional*) --
  A list of tokens that will be suppressed at the beginning of the generation. The `SuppressBeginTokens` logit
  processor will set their log probs to `-inf` so that they are not sampled.
- **sequence_bias** (`dict[tuple[int], float]`, *optional*) --
  Dictionary that maps a sequence of tokens to its bias term. Positive biases increase the odds of the
  sequence being selected, while negative biases do the opposite. Check
  [SequenceBiasLogitsProcessor](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.SequenceBiasLogitsProcessor) for further documentation and examples.
- **token_healing** (`bool`, *optional*, defaults to `False`) --
  Heal tail tokens of prompts by replacing them with their appropriate extensions.
  This enhances the quality of completions for prompts affected by greedy tokenization bias.
- **guidance_scale** (`float`, *optional*) --
  The guidance scale for classifier free guidance (CFG). CFG is enabled by setting `guidance_scale > 1`.
  Higher guidance scale encourages the model to generate samples that are more closely linked to the input
  prompt, usually at the expense of poorer quality.
- **watermarking_config** (`BaseWatermarkingConfig` or `dict`, *optional*) --
  Arguments used to watermark the model outputs by adding a small bias to randomly selected set of "green"
  tokens. See the docs of `SynthIDTextWatermarkingConfig` and `WatermarkingConfig` for more
  details. If passed as `dict`, it will be converted to a `WatermarkingConfig` internally.
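
To make the effect of `length_penalty` concrete, the sketch below scores two finished candidates the way the description above states: the summed log-likelihood is divided by the sequence length raised to `length_penalty`. `beam_score` and the two candidates are hypothetical; the library's beam scorer may measure the length differently.

```python
def beam_score(sum_logprobs, length, length_penalty=1.0):
    """Length-normalized score: summed log-probabilities / length ** penalty."""
    return sum_logprobs / (length ** length_penalty)


# Two made-up candidates: (sum of log-probabilities, number of tokens).
short_candidate = (-4.0, 4)
long_candidate = (-6.0, 8)

for penalty in (0.0, 1.0, -1.0):
    short_score = beam_score(*short_candidate, length_penalty=penalty)
    long_score = beam_score(*long_candidate, length_penalty=penalty)
    winner = "long" if long_score > short_score else "short"
    print(f"length_penalty={penalty}: short={short_score}, long={long_score}, winner={winner}")
```

Because the summed log-likelihood is negative, dividing by a larger power of the length moves the score of the longer candidate closer to zero, which is why positive values favor longer sequences.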

</paramsdesc4><paramsdesc5title>Parameters that define the output variables of generate</paramsdesc5title><paramsdesc5>

- **num_return_sequences** (`int`, *optional*, defaults to 1) --
  The number of independently computed returned sequences for each element in the batch.
- **output_attentions** (`bool`, *optional*, defaults to `False`) --
  Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
  tensors for more details.
- **output_hidden_states** (`bool`, *optional*, defaults to `False`) --
  Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
  more details.
- **output_scores** (`bool`, *optional*, defaults to `False`) --
  Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
- **output_logits** (`bool`, *optional*) --
  Whether or not to return the unprocessed prediction logit scores. See `logits` under returned tensors for
  more details.
- **return_dict_in_generate** (`bool`, *optional*, defaults to `False`) --
  Whether or not to return a [ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput), as opposed to returning exclusively the generated
  sequence. This flag must be set to `True` to return the generation cache (when `use_cache` is `True`)
  or optional outputs (see flags starting with `output_`).

</paramsdesc5><paramsdesc6title>Special tokens that can be used at generation time</paramsdesc6title><paramsdesc6>

- **pad_token_id** (`int`, *optional*) --
  The id of the *padding* token.
- **bos_token_id** (`int`, *optional*) --
  The id of the *beginning-of-sequence* token.
- **eos_token_id** (`Union[int, list[int]]`, *optional*) --
  The id of the *end-of-sequence* token. Optionally, use a list to set multiple *end-of-sequence* tokens.

</paramsdesc6><paramsdesc7title>Generation parameters exclusive to encoder-decoder models</paramsdesc7title><paramsdesc7>

- **encoder_no_repeat_ngram_size** (`int`, *optional*, defaults to 0) --
  If set to int > 0, all ngrams of that size that occur in the `encoder_input_ids` cannot occur in the
  `decoder_input_ids`.
- **decoder_start_token_id** (`int` or `list[int]`, *optional*) --
  If an encoder-decoder model starts decoding with a different token than *bos*, the id of that token or a list of length
  `batch_size`. Passing a list enables different start ids for each element in the batch
  (e.g. multilingual models with different target languages in one batch).

</paramsdesc7><paramsdesc8title>Generation parameters exclusive to assistant generation</paramsdesc8title><paramsdesc8>
- **is_assistant** (`bool`, *optional*, defaults to `False`) --
  Whether the model is an assistant (draft) model.
- **num_assistant_tokens** (`int`, *optional*, defaults to 20) --
  Defines the number of _speculative tokens_ that shall be generated by the assistant model before being
  checked by the target model at each iteration. Higher values for `num_assistant_tokens` make the generation
  more _speculative_: if the assistant model performs well, larger speed-ups can be reached; if the assistant
  model requires many corrections, lower speed-ups are reached.
- **num_assistant_tokens_schedule** (`str`, *optional*, defaults to `"constant"`) --
  Defines the schedule at which max assistant tokens shall be changed during inference.
  - `"heuristic"`: When all speculative tokens are correct, increase `num_assistant_tokens` by 2; otherwise,
    reduce it by 1. The `num_assistant_tokens` value is persistent over multiple generation calls with the same assistant model.
  - `"heuristic_transient"`: Same as `"heuristic"`, but `num_assistant_tokens` is reset to its initial value after each generation call.
  - `"constant"`: `num_assistant_tokens` stays unchanged during generation.
- **assistant_confidence_threshold** (`float`, *optional*, defaults to 0.4) --
  The confidence threshold for the assistant model. If the assistant model's confidence in its prediction for the current token is lower
  than this threshold, the assistant model stops the current token generation iteration, even if the number of _speculative tokens_
  (defined by `num_assistant_tokens`) is not yet reached. The assistant's confidence threshold is adjusted throughout the speculative iterations to reduce the number of unnecessary draft and target forward passes, biased towards avoiding false negatives.
  `assistant_confidence_threshold` value is persistent over multiple generation calls with the same assistant model.
  It is an unsupervised version of the dynamic speculation lookahead
  from [Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models](https://huggingface.co/papers/2405.04304).
- **prompt_lookup_num_tokens** (`int`, *optional*) --
  The number of tokens to be output as candidate tokens.
- **max_matching_ngram_size** (`int`, *optional*) --
  The maximum ngram size to be considered for matching in the prompt. Defaults to 2 if not provided.
- **assistant_early_exit** (`int`, *optional*) --
  If set to a positive integer, early exit of the model will be used as an assistant. Can only be used with
  models that support early exit (i.e. models where logits from intermediate layers can be interpreted by the LM head).
- **assistant_lookbehind** (`int`, *optional*, defaults to 10) --
  If set to a positive integer, the re-encoding process will additionally consider the last `assistant_lookbehind` assistant tokens
  to correctly align tokens. Can only be used with different tokenizers in speculative decoding.
  See this [blog](https://huggingface.co/blog/universal_assisted_generation) for more details.
- **target_lookbehind** (`int`, *optional*, defaults to 10) --
  If set to a positive integer, the re-encoding process will additionally consider the last `target_lookbehind` target tokens
  to correctly align tokens. Can only be used with different tokenizers in speculative decoding.
  See this [blog](https://huggingface.co/blog/universal_assisted_generation) for more details.
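
The `num_assistant_tokens_schedule` options can be read as a small update rule applied after each speculative iteration. The sketch below is a hypothetical rendering of that rule, not the library's implementation. The floor of 1 on the decreased value is an assumption of this sketch, and the reset that distinguishes `"heuristic_transient"` happens between generation calls, so it is not modelled here.

```python
def update_num_assistant_tokens(num_assistant_tokens, num_candidates, num_accepted,
                                schedule="constant"):
    """Return the speculative-token budget for the next iteration."""
    if schedule in ("heuristic", "heuristic_transient"):
        if num_accepted == num_candidates:
            # Every drafted token was accepted: speculate more next time.
            return num_assistant_tokens + 2
        # At least one correction was needed: speculate less next time.
        # Assumption of this sketch: never go below one token.
        return max(1, num_assistant_tokens - 1)
    # "constant": the budget never changes.
    return num_assistant_tokens


budget = 20  # documented default for `num_assistant_tokens`
history = []
for candidates, accepted in [(20, 20), (22, 15), (21, 3)]:
    budget = update_num_assistant_tokens(budget, candidates, accepted, schedule="heuristic")
    history.append(budget)
print(history)  # [22, 21, 20]
```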

</paramsdesc8><paramsdesc9title>Parameters related to performance and compilation</paramsdesc9title><paramsdesc9>

- **compile_config** (`CompileConfig`, *optional*) --
  If using a compilable cache, this controls how `generate` will `compile` the forward pass for faster
  inference.
- **disable_compile** (`bool`, *optional*) --
  Whether to disable the automatic compilation of the forward pass. Automatic compilation happens when
  specific criteria are met, including using a compilable cache. Please open an issue if you find the
  need to use this flag.</paramsdesc9><paramgroups>9</paramgroups></docstring>

Class that holds a configuration for a generation task. A `generate` call supports the following generation methods
for text-decoder, text-to-text, speech-to-text, and vision-to-text models:

- *greedy decoding* if `num_beams=1` and `do_sample=False`
- *multinomial sampling* if `num_beams=1` and `do_sample=True`
- *beam-search decoding* if `num_beams>1` and `do_sample=False`
- *beam-search multinomial sampling* if `num_beams>1` and `do_sample=True`
- *assisted decoding* if `assistant_model` or `prompt_lookup_num_tokens` is passed to `.generate()`

To learn more about decoding strategies refer to the [text generation strategies guide](../generation_strategies).
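
The mapping from flags to decoding methods listed above can be written as a small decision function. `select_decoding_strategy` is a hypothetical helper for illustration only. Checking the assisted-decoding arguments first is an assumption of this sketch, as the list does not state which rule takes precedence.

```python
def select_decoding_strategy(num_beams=1, do_sample=False,
                             assistant_model=None, prompt_lookup_num_tokens=None):
    """Name the decoding method implied by the given generation flags."""
    # Assumption: assisted decoding is checked before the other modes.
    if assistant_model is not None or prompt_lookup_num_tokens is not None:
        return "assisted decoding"
    if num_beams == 1:
        return "multinomial sampling" if do_sample else "greedy decoding"
    return "beam-search multinomial sampling" if do_sample else "beam-search decoding"


print(select_decoding_strategy())                             # greedy decoding
print(select_decoding_strategy(do_sample=True))               # multinomial sampling
print(select_decoding_strategy(num_beams=4))                  # beam-search decoding
print(select_decoding_strategy(num_beams=4, do_sample=True))  # beam-search multinomial sampling
print(select_decoding_strategy(prompt_lookup_num_tokens=10))  # assisted decoding
```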

<Tip>

A large number of these flags control the logits or the stopping criteria of the generation. Make sure you check
the [generate-related classes](https://huggingface.co/docs/transformers/internal/generation_utils) for a full
description of the possible manipulations, as well as examples of their usage.

</Tip>
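
As one illustration of the logits manipulation these flags control, the sketch below applies `temperature`, `top_k`, and `top_p` to a toy list of logits using only the standard library. The function names and the example logits are hypothetical, and applying the three steps in this order is an assumption of the sketch rather than a statement about the library's processors.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def filter_next_token_probs(logits, temperature=1.0, top_k=50, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k, then top-p."""
    probs = softmax([value / temperature for value in logits])
    # Rank token ids from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # top-k: keep only the k most probable tokens.
    kept = ranked[:top_k] if top_k > 0 else ranked
    # top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    if top_p < 1.0:
        kept_mass = sum(probs[i] for i in kept)
        cumulative, nucleus = 0.0, []
        for i in kept:
            nucleus.append(i)
            cumulative += probs[i] / kept_mass
            if cumulative >= top_p:
                break
        kept = nucleus
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]
default_result = filter_next_token_probs(toy_logits, top_k=3, top_p=0.8)
sharper_result = filter_next_token_probs(toy_logits, temperature=0.5, top_k=3, top_p=0.8)
print(sorted(default_result))  # token ids that survive at temperature 1.0
print(sorted(sharper_result))  # token ids that survive at temperature 0.5
```

Lowering the temperature concentrates probability on the top token, so fewer tokens are needed to reach the same `top_p` mass.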





<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>from_pretrained</name><anchor>transformers.GenerationConfig.from_pretrained</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/configuration_utils.py#L768</source><parameters>[{"name": "pretrained_model_name", "val": ": typing.Union[str, os.PathLike]"}, {"name": "config_file_name", "val": ": typing.Union[str, os.PathLike, NoneType] = None"}, {"name": "cache_dir", "val": ": typing.Union[str, os.PathLike, NoneType] = None"}, {"name": "force_download", "val": ": bool = False"}, {"name": "local_files_only", "val": ": bool = False"}, {"name": "token", "val": ": typing.Union[str, bool, NoneType] = None"}, {"name": "revision", "val": ": str = 'main'"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **pretrained_model_name** (`str` or `os.PathLike`) --
  This can be either:

  - a string, the *model id* of a pretrained model configuration hosted inside a model repo on
    huggingface.co.
  - a path to a *directory* containing a configuration file saved using the
    [save_pretrained()](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig.save_pretrained) method, e.g., `./my_model_directory/`.
- **config_file_name** (`str` or `os.PathLike`, *optional*, defaults to `"generation_config.json"`) --
  Name of the generation configuration JSON file to be loaded from `pretrained_model_name`.
- **cache_dir** (`str` or `os.PathLike`, *optional*) --
  Path to a directory in which a downloaded pretrained model configuration should be cached if the
  standard cache should not be used.
- **force_download** (`bool`, *optional*, defaults to `False`) --
  Whether or not to force to (re-)download the configuration files and override the cached versions if
  they exist.
- **resume_download** --
  Deprecated and ignored. All downloads are now resumed by default when possible.
  Will be removed in v5 of Transformers.
- **proxies** (`dict[str, str]`, *optional*) --
  A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
  'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
- **token** (`str` or `bool`, *optional*) --
  The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use
  the token generated when running `hf auth login` (stored in `~/.huggingface`).
- **revision** (`str`, *optional*, defaults to `"main"`) --
  The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
  git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
  identifier allowed by git.

  <Tip>

  To test a pull request you made on the Hub, you can pass `revision="refs/pr/<pr_number>"`.

  </Tip>

- **return_unused_kwargs** (`bool`, *optional*, defaults to `False`) --
  If `False`, then this function returns just the final configuration object.

  If `True`, then this function returns a `Tuple(config, unused_kwargs)` where *unused_kwargs* is a
  dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e., the
  part of `kwargs` which has not been used to update `config` and is otherwise ignored.
- **subfolder** (`str`, *optional*, defaults to `""`) --
  In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can
  specify the folder name here.
- **kwargs** (`dict[str, Any]`, *optional*) --
  The values in kwargs of any keys which are configuration attributes will be used to override the loaded
  values. Behavior concerning key/value pairs whose keys are *not* configuration attributes is controlled
  by the `return_unused_kwargs` keyword parameter.</paramsdesc><paramgroups>0</paramgroups><rettype>[GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig)</rettype><retdesc>The configuration object instantiated from this pretrained model.</retdesc></docstring>

Instantiate a [GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig) from a generation configuration file.







<ExampleCodeBlock anchor="transformers.GenerationConfig.from_pretrained.example">

Examples:

```python
>>> from transformers import GenerationConfig

>>> # Download configuration from huggingface.co and cache.
>>> generation_config = GenerationConfig.from_pretrained("openai-community/gpt2")

>>> # E.g. config was saved using *save_pretrained('./test/saved_model/')*
>>> generation_config.save_pretrained("./test/saved_model/")
>>> generation_config = GenerationConfig.from_pretrained("./test/saved_model/")

>>> # You can also specify configuration names to your generation configuration file
>>> generation_config.save_pretrained("./test/saved_model/", config_file_name="my_configuration.json")
>>> generation_config = GenerationConfig.from_pretrained("./test/saved_model/", "my_configuration.json")

>>> # If you'd like to try a minor variation to an existing configuration, you can also pass generation
>>> # arguments to `.from_pretrained()`. Be mindful that typos and unused arguments will be ignored
>>> generation_config, unused_kwargs = GenerationConfig.from_pretrained(
...     "openai-community/gpt2", top_k=1, foo=False, do_sample=True, return_unused_kwargs=True
... )
>>> generation_config.top_k
1

>>> unused_kwargs
{'foo': False}
```

</ExampleCodeBlock>

</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>from_model_config</name><anchor>transformers.GenerationConfig.from_model_config</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/configuration_utils.py#L1108</source><parameters>[{"name": "model_config", "val": ": PretrainedConfig"}]</parameters><paramsdesc>- **model_config** (`PretrainedConfig`) --
  The model config that will be used to instantiate the generation config.</paramsdesc><paramgroups>0</paramgroups><rettype>[GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig)</rettype><retdesc>The configuration object instantiated from those parameters.</retdesc></docstring>

Instantiates a [GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig) from a [PretrainedConfig](/docs/transformers/v4.57.0/zh/main_classes/configuration#transformers.PretrainedConfig). This function is useful to convert legacy
[PretrainedConfig](/docs/transformers/v4.57.0/zh/main_classes/configuration#transformers.PretrainedConfig) objects, which may contain generation parameters, into a stand-alone [GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig).








</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>save_pretrained</name><anchor>transformers.GenerationConfig.save_pretrained</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/configuration_utils.py#L695</source><parameters>[{"name": "save_directory", "val": ": typing.Union[str, os.PathLike]"}, {"name": "config_file_name", "val": ": typing.Union[str, os.PathLike, NoneType] = None"}, {"name": "push_to_hub", "val": ": bool = False"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **save_directory** (`str` or `os.PathLike`) --
  Directory where the configuration JSON file will be saved (will be created if it does not exist).
- **config_file_name** (`str` or `os.PathLike`, *optional*, defaults to `"generation_config.json"`) --
  Name of the generation configuration JSON file to be saved in `save_directory`.
- **push_to_hub** (`bool`, *optional*, defaults to `False`) --
  Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the
  repository you want to push to with `repo_id` (will default to the name of `save_directory` in your
  namespace).
- **kwargs** (`dict[str, Any]`, *optional*) --
  Additional key word arguments passed along to the [push_to_hub()](/docs/transformers/v4.57.0/zh/main_classes/model#transformers.utils.PushToHubMixin.push_to_hub) method.</paramsdesc><paramgroups>0</paramgroups></docstring>

Save a generation configuration object to the directory `save_directory`, so that it can be re-loaded using the
[from_pretrained()](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig.from_pretrained) class method.
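
The save and reload round trip can be pictured with a standard-library stand-in. `save_config` and `load_config` below are hypothetical helpers that mimic only what is documented above: the target directory is created if needed, and the file name defaults to `generation_config.json`. The file written by the library itself may contain additional fields.

```python
import json
import os
import tempfile


def save_config(settings, save_directory, config_file_name="generation_config.json"):
    """Write settings as JSON, creating the directory if it does not exist."""
    os.makedirs(save_directory, exist_ok=True)
    path = os.path.join(save_directory, config_file_name)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2, sort_keys=True)
    return path


def load_config(save_directory, config_file_name="generation_config.json"):
    """Read back settings written by `save_config`."""
    with open(os.path.join(save_directory, config_file_name), encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "saved_model")  # does not exist yet
    written = save_config({"do_sample": True, "top_k": 40}, target)
    restored = load_config(target)

print(os.path.basename(written))
print(restored)
```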




</div></div>

## GenerationMixin[[transformers.GenerationMixin]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.GenerationMixin</name><anchor>transformers.GenerationMixin</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/utils.py#L359</source><parameters>[]</parameters></docstring>

A class containing all functions for auto-regressive text generation, to be used as a mixin in model classes.
Inheriting from this class causes the model to have special generation-related behavior, such as loading a
`GenerationConfig` at initialization time or ensuring `generate`-related tests are run in `transformers` CI.

A model class should inherit from `GenerationMixin` to enable calling methods like `generate`, or when it
has defined a custom `generate` method that relies on `GenerationMixin`, directly or indirectly, which
approximately shares the same interface to public methods like `generate`. Three examples:
- `LlamaForCausalLM` should inherit from `GenerationMixin` to enable calling `generate` and other public
  methods in the mixin;
- `BlipForQuestionAnswering` has a custom `generate` method that approximately shares the same interface as
  `GenerationMixin.generate` (it has a few extra arguments, and the same output). That function also calls
  `GenerationMixin.generate` indirectly, through an inner model. As such, `BlipForQuestionAnswering` should
  inherit from `GenerationMixin` to benefit from all generation-related automation in our codebase;
- `BarkModel` has a custom `generate` method and one of its inner models calls `GenerationMixin.generate`.
  However, its `generate` does not share the same interface as `GenerationMixin.generate`. In this case,
  `BarkModel` should NOT inherit from `GenerationMixin`, as it breaks the `generate` interface.

The class exposes [generate()](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationMixin.generate), which can be used for:
- *greedy decoding* if `num_beams=1` and `do_sample=False`
- *multinomial sampling* if `num_beams=1` and `do_sample=True`
- *beam-search decoding* if `num_beams>1` and `do_sample=False`
- *beam-search multinomial sampling* if `num_beams>1` and `do_sample=True`
- *assisted decoding* if `assistant_model` or `prompt_lookup_num_tokens` is passed to `.generate()`

To learn more about decoding strategies refer to the [text generation strategies guide](../generation_strategies).
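
For intuition about what an auto-regressive loop does in the simplest case, the sketch below runs greedy decoding against a toy scoring function. Everything in it is hypothetical: `toy_next_token_scores` stands in for a model forward pass, and `greedy_generate` is a simplified loop, not the library's `generate`.

```python
def toy_next_token_scores(token_ids, vocab_size=5):
    """Stand-in for a model: favors the token (last token + 1) % vocab_size."""
    favored = (token_ids[-1] + 1) % vocab_size
    return [1.0 if token == favored else 0.0 for token in range(vocab_size)]


def greedy_generate(prompt_ids, max_new_tokens=10, eos_token_id=None):
    """Append the highest-scoring token until a stopping condition is met."""
    sequence = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = toy_next_token_scores(sequence)
        # Greedy decoding: always take the argmax, never sample.
        next_token = max(range(len(scores)), key=scores.__getitem__)
        sequence.append(next_token)
        if next_token == eos_token_id:
            break  # stop as soon as the end-of-sequence token is produced
    return sequence


print(greedy_generate([0], max_new_tokens=6))                  # [0, 1, 2, 3, 4, 0, 1]
print(greedy_generate([0], max_new_tokens=6, eos_token_id=3))  # [0, 1, 2, 3]
```

Sampling and beam search replace the argmax step with a random draw or with several parallel hypotheses, while the surrounding loop keeps the same shape.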



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>generate</name><anchor>transformers.GenerationMixin.generate</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/utils.py#L2233</source><parameters>[{"name": "inputs", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "generation_config", "val": ": typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None"}, {"name": "logits_processor", "val": ": typing.Optional[transformers.generation.logits_process.LogitsProcessorList] = None"}, {"name": "stopping_criteria", "val": ": typing.Optional[transformers.generation.stopping_criteria.StoppingCriteriaList] = None"}, {"name": "prefix_allowed_tokens_fn", "val": ": typing.Optional[typing.Callable[[int, torch.Tensor], list[int]]] = None"}, {"name": "synced_gpus", "val": ": typing.Optional[bool] = None"}, {"name": "assistant_model", "val": ": typing.Optional[ForwardRef('PreTrainedModel')] = None"}, {"name": "streamer", "val": ": typing.Optional[ForwardRef('BaseStreamer')] = None"}, {"name": "negative_prompt_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "use_model_defaults", "val": ": typing.Optional[bool] = None"}, {"name": "custom_generate", "val": ": typing.Union[str, typing.Callable, NoneType] = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **inputs** (`torch.Tensor` of varying shape depending on the modality, *optional*) --
  The sequence used as a prompt for the generation or as model inputs to the encoder. If `None` the
  method initializes it with `bos_token_id` and a batch size of 1. For decoder-only models `inputs`
  should be in the format of `input_ids`. For encoder-decoder models *inputs* can represent any of
  `input_ids`, `input_values`, `input_features`, or `pixel_values`.
- **generation_config** ([GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig), *optional*) --
  The generation configuration to be used as base parametrization for the generation call. `**kwargs`
  passed to generate matching the attributes of `generation_config` will override them. If
  `generation_config` is not provided, the default will be used, which has the following loading
  priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
  configuration. Please note that unspecified parameters will inherit [GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig)'s
  default values, whose documentation should be checked to parameterize generation.
- **logits_processor** (`LogitsProcessorList`, *optional*) --
  Custom logits processors that complement the default logits processors built from arguments and
  generation config. If a logits processor is passed that is already created from the arguments or a
  generation config, an error is thrown. This feature is intended for advanced users.
- **stopping_criteria** (`StoppingCriteriaList`, *optional*) --
  Custom stopping criteria that complement the default stopping criteria built from arguments and a
  generation config. If a stopping criterion is passed that is already created from the arguments or a
  generation config, an error is thrown. If your stopping criteria depend on the `scores` input, make
  sure you pass `return_dict_in_generate=True, output_scores=True` to `generate`. This feature is
  intended for advanced users.
- **prefix_allowed_tokens_fn** (`Callable[[int, torch.Tensor], list[int]]`, *optional*) --
  If provided, this function constrains the beam search to allowed tokens only at each step. If not
  provided, no constraint is applied. This function takes 2 arguments: the batch ID `batch_id` and
  `input_ids`. It has to return a list with the allowed tokens for the next generation step conditioned
  on the batch ID `batch_id` and the previously generated tokens `input_ids`. This argument is useful
  for constrained generation conditioned on the prefix, as described in [Autoregressive Entity
  Retrieval](https://huggingface.co/papers/2010.00904).
- **synced_gpus** (`bool`, *optional*) --
  Whether to continue running the while loop until max_length. Unless overridden, this flag will be set
  to `True` if using `FullyShardedDataParallel` or DeepSpeed ZeRO Stage 3 with multiple GPUs to avoid
  deadlocking if one GPU finishes generating before other GPUs. Otherwise, defaults to `False`.
- **assistant_model** (`PreTrainedModel`, *optional*) --
  An assistant model that can be used to accelerate generation. The assistant model must have the exact
  same tokenizer. The acceleration is achieved when forecasting candidate tokens with the assistant model
  is much faster than running generation with the model you're calling generate from. As such, the
  assistant model should be much smaller.
- **streamer** (`BaseStreamer`, *optional*) --
  Streamer object that will be used to stream the generated sequences. Generated tokens are passed
  through `streamer.put(token_ids)` and the streamer is responsible for any further processing.
- **negative_prompt_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  The negative prompt needed for some processors such as CFG. The batch size must match the input batch
  size. This is an experimental feature, subject to breaking API changes in future versions.
- **negative_prompt_attention_mask** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Attention_mask for `negative_prompt_ids`.
- **use_model_defaults** (`bool`, *optional*) --
  When it is `True`, unset parameters in `generation_config` will be set to the model-specific default
  generation configuration (`model.generation_config`), as opposed to the global defaults
  (`GenerationConfig()`). If unset, models saved starting from `v4.50` will consider this flag to be
  `True`.
- **custom_generate** (`str` or `Callable`, *optional*) --
  One of the following:
  - `str` (Hugging Face Hub repository name): runs the custom `generate` function defined at
    `custom_generate/generate.py` in that repository instead of the standard `generate` method. The
    repository fully replaces the generation logic, and the return type may differ.
  - `str` (local repository path): same as above, but loaded from a local path; `trust_remote_code` is not required.
  - `Callable`: `generate` will perform the usual input preparation steps, then call the provided callable to
    run the decoding loop.
  For more information, see [the docs](../../generation_strategies#custom-generation-methods).
- **kwargs** (`dict[str, Any]`, *optional*) --
  Ad hoc parametrization of `generation_config` and/or additional model-specific kwargs that will be
  forwarded to the `forward` function of the model. If the model is an encoder-decoder model, encoder
  specific kwargs should not be prefixed and decoder specific kwargs should be prefixed with *decoder_*.</paramsdesc><paramgroups>0</paramgroups><rettype>[ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput) or `torch.LongTensor`</rettype><retdesc>A [ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput) (if `return_dict_in_generate=True`
or when `config.return_dict_in_generate=True`) or a `torch.LongTensor`.

If the model is *not* an encoder-decoder model (`model.config.is_encoder_decoder=False`), the possible
[ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput) types are:

- [GenerateDecoderOnlyOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.GenerateDecoderOnlyOutput),
- [GenerateBeamDecoderOnlyOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.GenerateBeamDecoderOnlyOutput)

If the model is an encoder-decoder model (`model.config.is_encoder_decoder=True`), the possible
[ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput) types are:

- [GenerateEncoderDecoderOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.GenerateEncoderDecoderOutput),
- [GenerateBeamEncoderDecoderOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.GenerateBeamEncoderDecoderOutput)</retdesc></docstring>


Generates sequences of token ids for models with a language modeling head.
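To make the `prefix_allowed_tokens_fn` contract more concrete, here is a minimal sketch of a callable with the `(batch_id, input_ids) -> list[int]` shape described in the parameter list. The token ids and the name `allow_yes_or_no` are hypothetical, chosen only for illustration; real ids would come from your tokenizer.

```python
# Hypothetical token ids, not taken from any real vocabulary.
YES_ID, NO_ID, EOS_ID = 101, 102, 0
ANSWER_IDS = [YES_ID, NO_ID]


def allow_yes_or_no(batch_id, input_ids):
    """Restrict the next token based on the tokens seen so far.

    `input_ids` holds the tokens of one sequence so far. It can be a 1-D
    tensor or, as in this sketch, a plain list of ints.
    """
    last_token = int(input_ids[-1])
    if last_token in ANSWER_IDS:
        # An answer was already produced, so only end-of-sequence is allowed.
        return [EOS_ID]
    # Otherwise the next token must be one of the two answers.
    return ANSWER_IDS


print(allow_yes_or_no(0, [5, 6, 7]))
print(allow_yes_or_no(0, [5, 6, YES_ID]))
```

Such a function would then be passed as `prefix_allowed_tokens_fn=allow_yes_or_no` in the call to `generate`. Here `batch_id` is unused, but it lets you apply a different constraint to each item of the batch.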

<Tip warning={true}>

Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the
model's default generation configuration. You can override any `generation_config` by passing the corresponding
parameters to generate(), e.g. `.generate(inputs, num_beams=4, do_sample=True)`.

For an overview of generation strategies and code examples, check out the [following
guide](../generation_strategies).

</Tip>
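The override rule in the tip above can be pictured as follows. This is a simplified model, not the library code: `ToyGenerationConfig` is a stand-in with three fields, `resolve_call_arguments` is a hypothetical helper, and the sketch assumes that overrides apply to a per-call copy while unmatched keyword arguments are forwarded to the model.

```python
import copy
from dataclasses import dataclass, fields


@dataclass
class ToyGenerationConfig:
    """Stand-in with a few fields; NOT the real `GenerationConfig`."""

    max_new_tokens: int = 20
    num_beams: int = 1
    do_sample: bool = False


def resolve_call_arguments(base_config, **kwargs):
    """Split kwargs into config overrides and leftover model kwargs."""
    config = copy.deepcopy(base_config)  # leave the base config untouched
    known = {f.name for f in fields(config)}
    model_kwargs = {}
    for name, value in kwargs.items():
        if name in known:
            setattr(config, name, value)  # matching attribute: override it
        else:
            model_kwargs[name] = value  # anything else goes to the model
    return config, model_kwargs


base = ToyGenerationConfig()
config, model_kwargs = resolve_call_arguments(
    base, num_beams=4, do_sample=True, attention_mask=[[1, 1]]
)
print(config)
print(model_kwargs)
print(base.num_beams)
```

In this model the base configuration keeps `num_beams=1` after the call, so an override lasts for a single `generate` call only.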








</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>compute_transition_scores</name><anchor>transformers.GenerationMixin.compute_transition_scores</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/utils.py#L1386</source><parameters>[{"name": "sequences", "val": ": Tensor"}, {"name": "scores", "val": ": tuple"}, {"name": "beam_indices", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "normalize_logits", "val": ": bool = False"}]</parameters><paramsdesc>- **sequences** (`torch.LongTensor`) --
  The generated sequences. The second dimension (sequence_length) is either equal to `max_length` or
  shorter if all batches finished early due to the `eos_token_id`.
- **scores** (`tuple(torch.FloatTensor)`) --
  Transition scores for each vocabulary token at each generation step. Beam transition scores consisting
  of log probabilities of tokens conditioned on log softmax of previously generated tokens in this beam.
  Tuple of `torch.FloatTensor` with up to `max_new_tokens` elements (one element for each generated token),
  with each tensor of shape `(batch_size*num_beams, config.vocab_size)`.
- **beam_indices** (`torch.LongTensor`, *optional*) --
  Beam indices of generated token id at each generation step. `torch.LongTensor` of shape
  `(batch_size*num_return_sequences, sequence_length)`. Only required if `num_beams>1` at
  generate-time.
- **normalize_logits** (`bool`, *optional*, defaults to `False`) --
  Whether to normalize the logits (which, for legacy reasons, may be unnormalized).</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>A `torch.Tensor` of shape `(batch_size*num_return_sequences, sequence_length)` containing
the transition scores (logits)</retdesc></docstring>

Computes the transition scores of sequences given the generation scores (and beam indices, if beam search was
used). This is a convenient method to quickly obtain the scores of the selected tokens at generation time.
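As a rough mental model of what this method returns, the sketch below handles the simplest case: a single sequence generated without beam search. It is written in plain Python under that assumption; the helper names are hypothetical, and the library method additionally handles batches and beam indices.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def transition_scores(generated_tokens, scores, normalize_logits=False):
    """Pick, for each step, the score of the token that was generated.

    generated_tokens: list of token ids, one per generation step.
    scores: list of per-step score lists, each of length vocab_size.
    """
    picked = []
    for token_id, step_scores in zip(generated_tokens, scores):
        if normalize_logits:
            step_scores = log_softmax(step_scores)
        picked.append(step_scores[token_id])
    return picked


# Toy data: 2 generation steps over a 3-token vocabulary.
scores = [[2.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
tokens = [0, 2]
raw = transition_scores(tokens, scores)
normalized = transition_scores(tokens, scores, normalize_logits=True)
print(raw)
print([round(math.exp(s), 3) for s in normalized])
```

With `normalize_logits=True` each selected score is a log probability, so exponentiating it gives the probability of the chosen token at that step.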







<ExampleCodeBlock anchor="transformers.GenerationMixin.compute_transition_scores.example">

Examples:

```python
>>> from transformers import GPT2Tokenizer, AutoModelForCausalLM
>>> import numpy as np

>>> tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
>>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> tokenizer.pad_token_id = tokenizer.eos_token_id
>>> inputs = tokenizer(["Today is"], return_tensors="pt")

>>> # Example 1: Print the scores for each token generated with Greedy Search
>>> outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
>>> transition_scores = model.compute_transition_scores(
...     outputs.sequences, outputs.scores, normalize_logits=True
... )
>>> # input_length is the length of the input prompt for decoder-only models, like the GPT family, and 1 for
>>> # encoder-decoder models, like BART or T5.
>>> input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
>>> generated_tokens = outputs.sequences[:, input_length:]
>>> for tok, score in zip(generated_tokens[0], transition_scores[0]):
...     # | token | token string | log probability | probability
...     print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")
|   262 |  the     | -1.414 | 24.33%
|  1110 |  day     | -2.609 | 7.36%
|   618 |  when    | -2.010 | 13.40%
|   356 |  we      | -1.859 | 15.58%
|   460 |  can     | -2.508 | 8.14%

>>> # Example 2: Reconstruct the sequence scores from Beam Search
>>> outputs = model.generate(
...     **inputs,
...     max_new_tokens=5,
...     num_beams=4,
...     num_return_sequences=4,
...     return_dict_in_generate=True,
...     output_scores=True,
... )
>>> transition_scores = model.compute_transition_scores(
...     outputs.sequences, outputs.scores, outputs.beam_indices, normalize_logits=False
... )
>>> # If you sum the generated tokens' scores and apply the length penalty, you'll get the sequence scores.
>>> # Tip 1: recomputing the scores is only guaranteed to match with `normalize_logits=False`. Depending on the
>>> # use case, you might want to recompute it with `normalize_logits=True`.
>>> # Tip 2: the output length does NOT include the input length
>>> output_length = np.sum(transition_scores.numpy() < 0, axis=1)
>>> length_penalty = model.generation_config.length_penalty
>>> reconstructed_scores = transition_scores.sum(axis=1) / (output_length**length_penalty)
>>> print(np.allclose(outputs.sequences_scores, reconstructed_scores))
True
```

</ExampleCodeBlock>
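The reconstruction in Example 2 comes down to one line of arithmetic: sum the per-token scores and divide by the output length raised to `length_penalty`. The plain-Python sketch below works through it with made-up numbers; `sequence_score` is a hypothetical helper, not a library function.

```python
def sequence_score(token_scores, length_penalty=1.0):
    """Sum of the token scores divided by output_length ** length_penalty."""
    # Positions with a score of 0.0 are treated as padding, mirroring the
    # `transition_scores < 0` count used in the example above.
    output_length = sum(1 for s in token_scores if s < 0)
    return sum(token_scores) / (output_length**length_penalty)


# Three generated tokens followed by one padding position (made-up values).
token_scores = [-0.5, -1.5, -1.0, 0.0]
print(sequence_score(token_scores, length_penalty=1.0))  # -3.0 / 3 = -1.0
print(sequence_score(token_scores, length_penalty=0.0))  # -3.0 / 1 = -3.0
```

Because the scores are negative, a larger `length_penalty` divides by a larger number and moves the score toward zero, which favors longer sequences.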

</div></div>

## TFGenerationMixin[[transformers.TFGenerationMixin]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.TFGenerationMixin</name><anchor>transformers.TFGenerationMixin</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/tf_utils.py#L444</source><parameters>[]</parameters></docstring>

A class containing all of the functions supporting generation, to be used as a mixin in [TFPreTrainedModel](/docs/transformers/v4.57.0/zh/main_classes/model#transformers.TFPreTrainedModel).

The class exposes [generate()](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.TFGenerationMixin.generate), which can be used for:
- *greedy decoding* by calling `greedy_search()` if `num_beams=1` and
  `do_sample=False`
- *contrastive search* by calling `contrastive_search()` if `penalty_alpha>0` and
  `top_k>1`
- *multinomial sampling* by calling `sample()` if `num_beams=1` and
  `do_sample=True`
- *beam-search decoding* by calling `beam_search()` if `num_beams>1`

You do not need to call any of the above methods directly. Pass custom parameter values to `generate` instead. To
learn more about decoding strategies refer to the [text generation strategies guide](../generation_strategies).



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>generate</name><anchor>transformers.TFGenerationMixin.generate</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/tf_utils.py#L645</source><parameters>[{"name": "inputs", "val": ": typing.Optional[tensorflow.python.framework.tensor.Tensor] = None"}, {"name": "generation_config", "val": ": typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None"}, {"name": "logits_processor", "val": ": typing.Optional[transformers.generation.tf_logits_process.TFLogitsProcessorList] = None"}, {"name": "seed", "val": " = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **inputs** (`tf.Tensor` of varying shape depending on the modality, *optional*) --
  The sequence used as a prompt for the generation or as model inputs to the encoder. If `None` the
  method initializes it with `bos_token_id` and a batch size of 1. For decoder-only models `inputs`
  should be in the format of `input_ids`. For encoder-decoder models *inputs* can represent any of
  `input_ids`, `input_values`, `input_features`, or `pixel_values`.
- **generation_config** (`~generation.GenerationConfig`, *optional*) --
  The generation configuration to be used as base parametrization for the generation call. `**kwargs`
  passed to generate matching the attributes of `generation_config` will override them. If
  `generation_config` is not provided, the default will be used, which has the following loading
  priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
  configuration. Please note that unspecified parameters will inherit [GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig)'s
  default values, whose documentation should be checked to parameterize generation.
- **logits_processor** (`TFLogitsProcessorList`, *optional*) --
  Custom logits processors that complement the default logits processors built from arguments and
  generation config. If a logits processor is passed that is already created from the arguments or a
  generation config, an error is thrown. This feature is intended for advanced users.
- **seed** (`list[int]`, *optional*) --
  Random seed to control sampling, containing two integers, used when `do_sample` is `True`. See the
  `seed` argument from stateless functions in `tf.random`.
- **kwargs** (`dict[str, Any]`, *optional*) --
  Ad hoc parametrization of `generation_config` and/or additional model-specific kwargs that will be
  forwarded to the `forward` function of the model. If the model is an encoder-decoder model, encoder
  specific kwargs should not be prefixed and decoder specific kwargs should be prefixed with *decoder_*.</paramsdesc><paramgroups>0</paramgroups><rettype>[ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput) or `tf.Tensor`</rettype><retdesc>A [ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput) (if `return_dict_in_generate=True` or when
`config.return_dict_in_generate=True`) or a `tf.Tensor`.

If the model is *not* an encoder-decoder model (`model.config.is_encoder_decoder=False`), the possible
[ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput) types are:

- [TFGreedySearchDecoderOnlyOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.TFGreedySearchDecoderOnlyOutput),
- [TFSampleDecoderOnlyOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.TFSampleDecoderOnlyOutput),
- [TFBeamSearchDecoderOnlyOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.TFBeamSearchDecoderOnlyOutput),
- [TFBeamSampleDecoderOnlyOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.TFBeamSampleDecoderOnlyOutput)

If the model is an encoder-decoder model (`model.config.is_encoder_decoder=True`), the possible
[ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput) types are:

- [TFGreedySearchEncoderDecoderOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.TFGreedySearchEncoderDecoderOutput),
- [TFSampleEncoderDecoderOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.TFSampleEncoderDecoderOutput),
- [TFBeamSearchEncoderDecoderOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.TFBeamSearchEncoderDecoderOutput),
- [TFBeamSampleEncoderDecoderOutput](/docs/transformers/v4.57.0/zh/internal/generation_utils#transformers.generation.TFBeamSampleEncoderDecoderOutput)</retdesc></docstring>

Generates sequences of token ids for models with a language modeling head.

<Tip warning={true}>

Most generation-controlling parameters are set in `generation_config` which, if not passed, will be set to the
model's default generation configuration. You can override any `generation_config` by passing the corresponding
parameters to generate, e.g. `.generate(inputs, num_beams=4, do_sample=True)`.

For an overview of generation strategies and code examples, check out the [following
guide](../generation_strategies).

</Tip>








</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>compute_transition_scores</name><anchor>transformers.TFGenerationMixin.compute_transition_scores</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/tf_utils.py#L477</source><parameters>[{"name": "sequences", "val": ": Tensor"}, {"name": "scores", "val": ": tuple"}, {"name": "beam_indices", "val": ": typing.Optional[tensorflow.python.framework.tensor.Tensor] = None"}, {"name": "normalize_logits", "val": ": bool = False"}]</parameters><paramsdesc>- **sequences** (`tf.Tensor`) --
  The generated sequences. The second dimension (sequence_length) is either equal to `max_length` or
  shorter if all batches finished early due to the `eos_token_id`.
- **scores** (`tuple(tf.Tensor)`) --
  Transition scores for each vocabulary token at each generation step. Beam transition scores consisting
  of log probabilities of tokens conditioned on log softmax of previously generated tokens in this beam. Tuple of
  `tf.Tensor` with up to `max_new_tokens` elements (one element for each generated token), with each
  tensor of shape `(batch_size*num_beams, config.vocab_size)`.
- **beam_indices** (`tf.Tensor`, *optional*) --
  Beam indices of generated token id at each generation step. `tf.Tensor` of shape
  `(batch_size*num_return_sequences, sequence_length)`. Only required if `num_beams>1` at
  generate-time.
- **normalize_logits** (`bool`, *optional*, defaults to `False`) --
  Whether to normalize the logits (which, for legacy reasons, may be unnormalized).</paramsdesc><paramgroups>0</paramgroups><rettype>`tf.Tensor`</rettype><retdesc>A `tf.Tensor` of shape `(batch_size*num_return_sequences, sequence_length)` containing
the transition scores (logits)</retdesc></docstring>

Computes the transition scores of sequences given the generation scores (and beam indices, if beam search was
used). This is a convenient method to quickly obtain the scores of the selected tokens at generation time.







<ExampleCodeBlock anchor="transformers.TFGenerationMixin.compute_transition_scores.example">

Examples:

```python
>>> from transformers import GPT2Tokenizer, TFAutoModelForCausalLM
>>> import numpy as np

>>> tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
>>> model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")
>>> tokenizer.pad_token_id = tokenizer.eos_token_id
>>> inputs = tokenizer(["Today is"], return_tensors="tf")

>>> # Example 1: Print the scores for each token generated with Greedy Search
>>> outputs = model.generate(**inputs, max_new_tokens=5, return_dict_in_generate=True, output_scores=True)
>>> transition_scores = model.compute_transition_scores(
...     outputs.sequences, outputs.scores, normalize_logits=True
... )
>>> # input_length is the length of the input prompt for decoder-only models, like the GPT family, and 1 for
>>> # encoder-decoder models, like BART or T5.
>>> input_length = 1 if model.config.is_encoder_decoder else inputs.input_ids.shape[1]
>>> generated_tokens = outputs.sequences[:, input_length:]
>>> for tok, score in zip(generated_tokens[0], transition_scores[0]):
...     # | token | token string | log probability | probability
...     print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score.numpy():.3f} | {np.exp(score.numpy()):.2%}")
|   262 |  the     | -1.414 | 24.33%
|  1110 |  day     | -2.609 | 7.36%
|   618 |  when    | -2.010 | 13.40%
|   356 |  we      | -1.859 | 15.58%
|   460 |  can     | -2.508 | 8.14%

>>> # Example 2: Reconstruct the sequence scores from Beam Search
>>> outputs = model.generate(
...     **inputs,
...     max_new_tokens=5,
...     num_beams=4,
...     num_return_sequences=4,
...     return_dict_in_generate=True,
...     output_scores=True,
... )
>>> transition_scores = model.compute_transition_scores(
...     outputs.sequences, outputs.scores, outputs.beam_indices, normalize_logits=False
... )
>>> # If you sum the generated tokens' scores and apply the length penalty, you'll get the sequence scores.
>>> # Tip: recomputing the scores is only guaranteed to match with `normalize_logits=False`. Depending on the
>>> # use case, you might want to recompute it with `normalize_logits=True`.
>>> output_length = np.sum(transition_scores.numpy() < 0, axis=1)
>>> length_penalty = model.generation_config.length_penalty
>>> reconstructed_scores = np.sum(transition_scores, axis=1) / (output_length**length_penalty)
>>> print(np.allclose(outputs.sequences_scores, reconstructed_scores))
True
```

</ExampleCodeBlock>

</div></div>

## FlaxGenerationMixin[[transformers.FlaxGenerationMixin]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class transformers.FlaxGenerationMixin</name><anchor>transformers.FlaxGenerationMixin</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/flax_utils.py#L130</source><parameters>[]</parameters></docstring>

A class containing all functions for auto-regressive text generation, to be used as a mixin in
[FlaxPreTrainedModel](/docs/transformers/v4.57.0/zh/main_classes/model#transformers.FlaxPreTrainedModel).

The class exposes [generate()](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.FlaxGenerationMixin.generate), which can be used for:
- *greedy decoding* by calling `_greedy_search()` if `num_beams=1` and
  `do_sample=False`
- *multinomial sampling* by calling `_sample()` if `num_beams=1` and
  `do_sample=True`
- *beam-search decoding* by calling `_beam_search()` if `num_beams>1` and
  `do_sample=False`

You do not need to call any of the above methods directly. Pass custom parameter values to `generate` instead. To
learn more about decoding strategies refer to the [text generation strategies guide](../generation_strategies).



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>generate</name><anchor>transformers.FlaxGenerationMixin.generate</anchor><source>https://github.com/huggingface/transformers/blob/v4.57.0/src/transformers/generation/flax_utils.py#L270</source><parameters>[{"name": "input_ids", "val": ": Array"}, {"name": "generation_config", "val": ": typing.Optional[transformers.generation.configuration_utils.GenerationConfig] = None"}, {"name": "prng_key", "val": ": typing.Optional[jax.Array] = None"}, {"name": "trace", "val": ": bool = True"}, {"name": "params", "val": ": typing.Optional[dict[str, jax.Array]] = None"}, {"name": "logits_processor", "val": ": typing.Optional[transformers.generation.flax_logits_process.FlaxLogitsProcessorList] = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **input_ids** (`jnp.ndarray` of shape `(batch_size, sequence_length)`) --
  The sequence used as a prompt for the generation.
- **generation_config** (`~generation.GenerationConfig`, *optional*) --
  The generation configuration to be used as base parametrization for the generation call. `**kwargs`
  passed to generate matching the attributes of `generation_config` will override them. If
  `generation_config` is not provided, the default will be used, which has the following loading
  priority: 1) from the `generation_config.json` model file, if it exists; 2) from the model
  configuration. Please note that unspecified parameters will inherit [GenerationConfig](/docs/transformers/v4.57.0/zh/main_classes/text_generation#transformers.GenerationConfig)'s
  default values, whose documentation should be checked to parameterize generation.
- **trace** (`bool`, *optional*, defaults to `True`) --
  Whether to trace generation. Setting `trace=False` should only be used for debugging and will lead to a
  considerably slower runtime.
- **params** (`dict[str, jnp.ndarray]`, *optional*) --
  Optionally the model parameters can be passed. Can be useful for parallelized generation.
- **logits_processor** (`FlaxLogitsProcessorList`, *optional*) --
  Custom logits processors that complement the default logits processors built from arguments and
  generation config. If a logits processor is passed that is already created from the arguments or a
  generation config, an error is thrown. This feature is intended for advanced users.
- **kwargs** (`dict[str, Any]`, *optional*) --
  Ad hoc parametrization of `generation_config` and/or additional model-specific kwargs that will be
  forwarded to the `forward` function of the model. If the model is an encoder-decoder model, encoder
  specific kwargs should not be prefixed and decoder specific kwargs should be prefixed with *decoder_*.</paramsdesc><paramgroups>0</paramgroups><retdesc>[ModelOutput](/docs/transformers/v4.57.0/zh/main_classes/output#transformers.utils.ModelOutput).</retdesc></docstring>

Generates sequences of token ids for models with a language modeling head.






</div></div>
