API Server Overview
This document describes the API server's endpoints and what each one does.
Endpoints
Health Check
- Endpoint: /health
- Method: GET
- Description: Checks if the server is running.
- Response: 200 OK
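A minimal sketch of probing the health endpoint from Python, using only the standard library. The base URL `http://localhost:8000` and the helper name `is_healthy` are assumptions made for illustration; this document does not state the server's host or port.

```python
import urllib.error
import urllib.request

# Assumption: the server listens here. Adjust to your deployment.
BASE_URL = "http://localhost:8000"


def is_healthy(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False
```

Calling `is_healthy()` before sending generation requests is one way to avoid long timeouts against a server that has not finished starting.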
Generate Text
- Endpoint: /generate
- Method: POST
- Description: Generates text based on the provided prompt.
- Request Body:
{
"text": "string", // The input text to generate from. (Required)
"max_tokens": 512, // Maximum number of tokens to generate. (Default: 512)
"max_input_tokens": 0, // Maximum number of tokens in the input text. (Default: 0, no limit)
"num_samples": 1, // Number of independent completions to generate. (Default: 1)
"eos_token_ids": [1, 2], // List of token IDs that signify the end of a sequence. (Default: [])
"stop": ["\n\n\n"], // List of strings where generation will stop if encountered. (Default: [])
"temperature": 0.8, // Controls the randomness of the output. (Default: 1.0)
"top_k": 40, // Limits the next token selection to the top K tokens. (Default: 50)
"top_p": 0.95, // Limits the next token selection to the smallest set of tokens whose cumulative probability reaches this value. (Default: 1.0)
"min_p": 0.0, // Minimum probability a token must have to be selected; 0.0 disables the filter. (Default: 0.0)
"presence_penalty": 0.0, // Penalizes new tokens based on their presence in the text so far. (Default: 0.0)
"frequency_penalty": 0.0, // Penalizes new tokens based on their frequency in the text so far. (Default: 0.0)
"logprobs": false, // Whether to return log probabilities of tokens. (Default: false)
"return_tokens": false // Whether to return the generated tokens. (Default: false)
}
- Response: GenerationResponse
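The request body above can be sent with a small client like the following sketch. It assumes the server accepts a JSON body with a `Content-Type: application/json` header and returns JSON; the base URL and the helper names `build_generate_request` and `generate` are hypothetical. The fields of `GenerationResponse` are not described in this document, so the response is returned as a plain dictionary without interpreting it.

```python
import json
import urllib.request

# Assumption: the server listens here. Adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_generate_request(prompt: str, base_url: str = BASE_URL, **options):
    """Build a POST /generate request.

    `prompt` fills the required "text" field. Any keyword arguments
    (max_tokens, temperature, top_k, stop, ...) are passed through as-is;
    omitted fields fall back to the server-side defaults listed above.
    """
    payload = {"text": prompt, **options}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be accepted
        method="POST",
    )


def generate(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0, **options) -> dict:
    """Send the request and decode the body, assuming the server replies with JSON."""
    request = build_generate_request(prompt, base_url, **options)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `generate("Once upon a time", max_tokens=64, temperature=0.8, stop=["\n\n\n"])` would post only those four fields and leave everything else at its default. Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.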