Inference API#

This document describes the API used for MK1 Flywheel. All fields described here behave the same across cloud providers unless specifically noted, although their format may differ. For example, some providers may expose a REST API while others may use function calls.

Inputs#

This section describes the input parameters for the MK1 Flywheel API.

text : string (REQUIRED)

The initial input text provided to the model. This text serves as the context or prompt based on which the model generates additional content.

max_tokens : integer (REQUIRED)

The maximum number of tokens that the model will generate in response to the input text.

eos_token_ids : List[integer] (default: [])

A list of token IDs that signify the end of a sequence. When the model generates one of these tokens, it considers the output complete and stops generating further tokens.

Danger

By default, if no eos_token_ids are provided, the model will generate tokens until it reaches the max_tokens limit. In production, it is recommended to always provide eos_token_ids to ensure the model stops generating text at a reasonable point. Check the model’s vocabulary to find the token IDs for the desired end-of-sequence tokens.
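
To illustrate how these two limits interact, here is a minimal sketch of the stopping rule in plain Python. The token stream is hypothetical; in a real deployment the tokens come from the model server.

```python
def collect_tokens(token_stream, eos_token_ids, max_tokens):
    """Accumulate tokens until an EOS token appears or max_tokens is reached."""
    out = []
    for token_id in token_stream:
        if token_id in eos_token_ids:
            break  # model signalled end of sequence; stop immediately
        out.append(token_id)
        if len(out) >= max_tokens:
            break  # hard cap from max_tokens
    return out

print(collect_tokens([5, 9, 13, 2, 7], eos_token_ids=[1, 2], max_tokens=50))
# [5, 9, 13]
```

Note that with `eos_token_ids=[]`, the loop above only stops at the `max_tokens` cap, which is exactly the behavior the warning describes.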

max_input_tokens : integer (default: 0)

This specifies the maximum number of tokens allowed in the input text. If the input text exceeds this number, it will be truncated.

num_samples : integer (default: 1)

The number of independent completions to generate for the given input text. Each sample is generated separately and may result in different outputs.

stop : List[string] (default: [])

A list of strings that, when generated by the model, cause text generation to stop. The matched stop string is not included in the returned output.
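
As a sketch of the truncation semantics (not MK1's internal implementation), cutting a completion at the earliest stop string could look like this:

```python
def truncate_at_stop(text, stop):
    """Cut `text` at the earliest occurrence of any stop string.
    The stop string itself is not included in the result."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Llamas are larger.\nQ: Next question", stop=["\nQ:"]))
# Llamas are larger.
```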

temperature : float (default: 1.0)

Controls the randomness of the output. A higher temperature leads to more varied output, while a lower temperature concentrates probability on the most likely tokens. At a temperature of 0, the model performs greedy sampling, always selecting the single most probable token.

top_k : integer (default: 50)

This parameter restricts the choice of the next token to the ‘k’ most likely candidates, based on the model’s predictions. It helps focus generation on the most probable tokens.

top_p : float (range: 0 to 1, default: 1.0)

This parameter restricts sampling to the smallest set of tokens whose cumulative probability reaches ‘p’ (nucleus sampling). Unlike top_k, the number of candidate tokens adapts to the shape of the probability distribution, which can produce more diverse and less predictable text.

presence_penalty : float (range: -2 to 2, default: 0)

This parameter adjusts the likelihood of the model introducing new topics or entities during text generation. A positive value penalizes tokens that have already appeared, reducing repetition and encouraging the introduction of new concepts.

frequency_penalty : float (range: -2 to 2, default: 0)

This parameter alters the likelihood of the model repeating specific words or lines of thought. Positive values penalize tokens in proportion to how often they have already been generated, discouraging repetition and encouraging more varied language and ideas.
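
The sampling parameters above compose in a standard way: temperature rescales the logits, then top_k and top_p filter the candidate set before a token is drawn. The following is a minimal sketch of those filters in plain Python; it illustrates the standard technique, not MK1's internal implementation.

```python
import math

def filter_logits(logits, temperature=1.0, top_k=50, top_p=1.0):
    """Return the token -> probability distribution left after temperature
    scaling, top-k filtering, and top-p (nucleus) filtering."""
    if temperature == 0:  # greedy sampling: keep only the single best token
        best = max(logits, key=logits.get)
        return {best: 1.0}
    # Temperature scaling followed by softmax.
    scaled = {t: v / temperature for t, v in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {t: math.exp(v) / z for t, v in scaled.items()}
    # top-k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # top-p: keep the smallest prefix whose cumulative probability reaches p.
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize the surviving candidates.
    z = sum(p for _, p in kept)
    return {tok: p / z for tok, p in kept}
```

For example, `filter_logits({"cat": 2.0, "dog": 1.0, "fish": 0.1}, top_k=2)` keeps only `cat` and `dog`, while `temperature=0` collapses the distribution onto `cat` alone.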

JSON Input Example#

{
  "text": "What is the difference between a Llama and an Alpaca?",
  "max_tokens": 50,
  "max_input_tokens": 0,
  "num_samples": 1,
  "eos_token_ids": [1, 2],
  "stop": [],
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0
}
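
For a REST-style deployment, the request body above could be built and posted roughly as follows. The endpoint URL is a placeholder, not a real MK1 address; consult your provider for the actual transport details.

```python
import json
import urllib.request

# Placeholder endpoint; substitute your provider's actual URL.
ENDPOINT = "https://example.com/v1/generate"

payload = {
    "text": "What is the difference between a Llama and an Alpaca?",
    "max_tokens": 50,
    "eos_token_ids": [1, 2],
    "temperature": 1.0,
    "top_p": 1.0,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment against a live endpoint
print(json.loads(req.data))
```

Fields left at their defaults (such as `stop` or the penalties) can simply be omitted from the request body.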

Outputs#

This section describes the output parameters returned by the MK1 Flywheel API.

created : float

The Unix timestamp (in seconds) indicating when the API request was created.

finished : float

The Unix timestamp (in seconds) at which the API response was fully generated.

num_samples : integer

The number of independent completions that were requested for the prompt.

prompt : string

The input text that was provided to the API to seed the generation.

prompt_tokens : integer

The number of tokens in the input prompt.

responses : List[object]

A list of all generated responses.

responses[].finish_reason : string

The reason that text generation ended for this particular response (for example, "eos" when an end-of-sequence token was generated).

responses[].finished : float

The timestamp when this specific text response was fully generated and marked as completed.

responses[].generated_tokens : integer

The number of tokens generated for this response.

responses[].text : string

The generated text returned for this response.

JSON Output Example#

{
 "created": 1700510342.0931604,
 "finished": 1700510348.9619334,
 "num_samples": 3,
 "prompt": "What is the difference between a Llama and an Alpaca?",
 "prompt_tokens": 16,
 "responses":
    [
        {
            "finish_reason": "eos",
            "finished": 1700510348.9619334,
            "generated_tokens": 16,
            "text": "One drives a car, the other drives a truck."
        },
        {
            "finish_reason": "eos",
            "finished": 1700510348.9619334,
            "generated_tokens": 30,
            "text": "One spits with attitude, the other cuddles with gratitude."
        },
        {
            "finish_reason": "eos",
            "finished": 1700510348.9619334,
            "generated_tokens": 23,
            "text": "Llamas use smartphones, alpacas stick to landlines."
        }
    ]
}
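
Putting the output fields together, a client might compute request latency and collect the generated texts as sketched below, using a trimmed copy of the example response:

```python
# A parsed response; the shape matches the JSON output example above.
resp = {
    "created": 1700510342.0931604,
    "finished": 1700510348.9619334,
    "num_samples": 3,
    "prompt": "What is the difference between a Llama and an Alpaca?",
    "prompt_tokens": 16,
    "responses": [
        {
            "finish_reason": "eos",
            "finished": 1700510348.9619334,
            "generated_tokens": 16,
            "text": "One drives a car, the other drives a truck.",
        },
    ],
}

latency = resp["finished"] - resp["created"]        # wall-clock seconds
texts = [r["text"] for r in resp["responses"]]      # one entry per sample
total = sum(r["generated_tokens"] for r in resp["responses"])
print(f"{latency:.2f}s, {len(texts)} completion(s), {total} tokens generated")
```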