API Server Overview

This document provides an overview of the API server, including its available endpoints and their functionalities.

Endpoints

Health Check

- Endpoint: /health

- Method: GET

- Description: Checks if the server is running.

- Response: 200 OK
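
As a minimal sketch of how a client might probe this endpoint, the helper below issues a GET to /health and reports whether the server answered with 200. The base URL (`http://localhost:8000`), the function name `is_healthy`, and the timeout are assumptions made for illustration; none of them come from the server's documentation.

```python
import urllib.request

# Assumption: the server listens on localhost:8000; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def is_healthy(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, error status codes, and socket timeouts.
        return False
```

A caller could poll `is_healthy()` in a loop before sending generation requests, for example while waiting for the server to finish starting up.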

Generate Text

- Endpoint: /generate

- Method: POST

- Description: Generates text based on the provided prompt.

- Request Body:

{
  "text": "string",         // The input text to generate from. (Required)
  "max_tokens": 512,        // Maximum number of tokens to generate. (Default: 512)
  "max_input_tokens": 0,    // Maximum number of tokens in the input text. (Default: 0, meaning no limit)
  "num_samples": 1,         // Number of independent completions to generate. (Default: 1)
  "eos_token_ids": [1, 2],  // List of token IDs that signify the end of a sequence. (Default: [])
  "stop": ["\\n\\n\\n"],    // List of strings at which generation stops if encountered. (Default: [])
  "temperature": 0.8,       // Controls the randomness of the output. (Default: 1.0)
  "top_k": 40,              // Limits next-token selection to the K most likely tokens. (Default: 50)
  "top_p": 0.95,            // Limits next-token selection to the smallest set of tokens whose cumulative probability reaches this value. (Default: 1.0)
  "min_p": 0.0,             // Minimum cumulative probability for token selection. (Default: 0.0)
  "presence_penalty": 0.0,  // Penalizes new tokens based on their presence in the text so far. (Default: 0.0)
  "frequency_penalty": 0.0, // Penalizes new tokens based on their frequency in the text so far. (Default: 0.0)
  "logprobs": false,        // Whether to return log probabilities of tokens. (Default: false)
  "return_tokens": false    // Whether to return the generated tokens. (Default: false)
}
- Response: GenerationResponse
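
The request body above can be assembled and sent from a client roughly as follows. This is a sketch under stated assumptions, not the server's official client: the base URL, the helper names `build_generate_request` and `generate`, the `Content-Type: application/json` header, and the expectation that the response body is JSON are all hypothetical. The fields of GenerationResponse are not described here, so the parsed response is returned as-is without assuming any keys.

```python
import json
import urllib.request

# Assumption: the server listens on localhost:8000; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_generate_request(
    text: str, base_url: str = BASE_URL, **options
) -> urllib.request.Request:
    """Build a POST /generate request; omitted options use the server defaults."""
    payload = {"text": text, **options}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the server expects a JSON-encoded body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(text: str, base_url: str = BASE_URL, timeout: float = 60.0, **options):
    """Send the request and return the decoded response body.

    Assumption: the response is JSON; its schema is not interpreted here.
    """
    request = build_generate_request(text, base_url, **options)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, a call such as `generate("Once upon a time", max_tokens=64, temperature=0.8)` would send only the listed fields and leave every other parameter at its default. Keeping request construction separate from sending makes the payload easy to inspect or log before any network traffic occurs.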