Highlights API#

MK1 Highlights is our API service that retrieves the most relevant source text for any user query. Built on top of our custom large language model (LLM), Highlights is designed to scan large volumes of text quickly while maintaining near-perfect recall.

API Endpoint#

POST https://api.highlights.mk1.ai/search

Authentication#

Authentication is required for all API calls. You’ll need to include your API key in the request headers:

X-API-Key: YOUR-API-KEY

To obtain an API key, please sign up here.

Request Format#

The API accepts POST requests with a JSON body. Chunks can be provided in two formats:

  1. Simple string

{
  "query": "Your Query here",
  "chunk_txts": ["Chunk 1 text here", "Chunk 2 text here", "..."],
  "top_n": 3
}

  2. Dictionary with a required "text" field and optional metadata

{
  "query": "Your query here",
  "chunk_txts": [
    {
      "text": "First chunk with metadata",
      "metadata": {
        "source": "document1",
        "page": 1
      }
    },
    "Simple text chunk without metadata",
    {
      "text": "Third chunk with metadata",
      "metadata": {
        "source": "document2",
        "category": "introduction"
      }
    }
  ],
  "top_n": 10,
  "true_order": true
}
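Both formats can be mixed within a single `chunk_txts` array, as the example above shows. Below is an illustrative sketch (the query and chunk texts are placeholders) of assembling such a body in Python, including a check that no chunk is empty, since empty chunks cause errors:

```python
# Illustrative request body mixing both chunk formats (placeholder text).
payload = {
    "query": "What is machine learning?",
    "chunk_txts": [
        "A plain string chunk.",
        {
            "text": "A chunk with metadata.",
            "metadata": {"source": "document1", "page": 1},
        },
    ],
    "top_n": 3,
    "true_order": True,
}

# Every chunk must be a non-empty string, or a dict with a non-empty
# "text" field -- empty chunks cause the request to fail.
for chunk in payload["chunk_txts"]:
    text = chunk["text"] if isinstance(chunk, dict) else chunk
    assert isinstance(text, str) and text.strip(), "empty chunk"
```

This `payload` dict can be passed directly as the JSON body of the POST request.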

Parameters#

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The natural language query to search with |
| chunk_txts | array | Yes | Array of text chunks with optional metadata |
| top_n | integer | No | Number of top results to return (default: 10) |
| true_order | boolean | No | Maintain original chunk order in results (default: true) |

Chunk Format Options#

Each chunk in the chunk_txts array can be one of:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The content of the text chunk |
| metadata | object | No | Additional metadata associated with the chunk |
| original_index | integer | No | Original position of the chunk in the input array (set automatically) |

Response Format#

The API returns a JSON response with relevant text chunks and their metadata:

{
  "results": [
    {
      "chunk_id": 0,
      "chunk_txt": "The relevant text chunk...",
      "chunk_score": 136.13050842285156,
      "metadata": {
        "source": "document1",
        "page": 1
      },
      "original_index": 0
    }
  ],
  "metadata": {
    "num_query_tokens": 5,
    "num_context_tokens": 44
  }
}
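Once parsed, the response is a plain dictionary. As a brief sketch, here is how the ranked texts and token counts can be pulled out of the sample payload above:

```python
# Illustrative sketch: consume a parsed response. `response` mirrors the
# sample JSON above; in practice it would come from response.json().
response = {
    "results": [
        {
            "chunk_id": 0,
            "chunk_txt": "The relevant text chunk...",
            "chunk_score": 136.13050842285156,
            "metadata": {"source": "document1", "page": 1},
            "original_index": 0,
        }
    ],
    "metadata": {"num_query_tokens": 5, "num_context_tokens": 44},
}

# Texts in the order the API returned them, plus the request-level stats.
top_texts = [r["chunk_txt"] for r in response["results"]]
total_tokens = response["metadata"]["num_context_tokens"]
```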

Response Fields#

| Field | Type | Description |
| --- | --- | --- |
| results | array | Array of relevant text chunks, ordered by relevance |
| results[].chunk_id | integer | Index of the chunk in the results array |
| results[].chunk_txt | string | The content of the relevant text chunk |
| results[].chunk_score | float | Relevance score of the chunk to the query |
| results[].metadata | object | Additional metadata associated with the chunk |
| results[].original_index | integer | Original position of the chunk in the input array |
| metadata | object | Additional information about the request |
| metadata.num_query_tokens | integer | Number of tokens in the query |
| metadata.num_context_tokens | integer | Total number of tokens in the provided text chunks |

Best Practices#

General guidance to optimize your experience and get the most out of the Highlights API.

Input Preparation#

Text Chunking#

  • Keep chunk sizes between 512 and 10,000 characters for optimal performance

  • Ensure no empty chunks are included as these will cause errors

  • Maintain consistent chunk sizes across your dataset

Semantic Chunking#

  • Split text at natural semantic boundaries like paragraphs and sections

  • Keep related concepts together within chunks

  • Include enough context in each chunk for it to be meaningful on its own
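The guidelines above can be combined into a simple paragraph-based chunker. The sketch below is an illustrative helper (not part of the API) that splits on blank lines and merges small paragraphs until chunks fall within the recommended size range:

```python
def chunk_paragraphs(text, min_chars=512, max_chars=10_000):
    """Split text at blank-line paragraph boundaries, then merge small
    paragraphs so chunks land in the recommended size range.

    Paragraphs longer than max_chars are emitted as-is; splitting those
    further (e.g. at sentence boundaries) is left to the caller.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this paragraph would overflow: flush and start fresh.
            if current:
                chunks.append(current)
            current = para
        if len(current) >= min_chars:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)  # trailing chunk may be under min_chars
    return chunks
```

Empty paragraphs are dropped up front, which also satisfies the no-empty-chunks rule.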

Query Construction#

Writing Effective Queries#

  • Frame queries as clear, specific questions

  • Use natural language rather than just keywords

  • Include key terms that match your target content

Query Structure#

  • Balance brevity with necessary detail

  • Avoid overly long queries that could dilute relevance

  • Ensure queries contain enough information to be meaningful

Python Client#

For easier integration, you can use our Python client:

from typing import List, Optional, Union
import requests

class HighlightsClient:
    def __init__(self, api_key: str, base_url: str = "https://api.highlights.mk1.ai"):
        self.api_key = api_key
        self.base_url = base_url
        self.headers = {
            "X-API-Key": api_key,
            "Content-Type": "application/json"
        }

    def search(
        self,
        query: str,
        text_chunks: List[Union[str, dict]],
        top_n: Optional[int] = 3,
        true_order: bool = True
    ) -> dict:
        """
        Search through text chunks to find relevant passages.

        Args:
            query: The search query
            text_chunks: List of text passages to search through; each item
                is a plain string or a dict with a required "text" field
            top_n: Number of top results to return
            true_order: Keep results in their original chunk order

        Returns:
            Dictionary containing search results and metadata
        """
        endpoint = f"{self.base_url}/search"

        # The API expects the chunks under the "chunk_txts" key
        payload = {
            "query": query,
            "chunk_txts": text_chunks,
            "top_n": top_n,
            "true_order": true_order
        }

        response = requests.post(endpoint, headers=self.headers, json=payload)
        response.raise_for_status()
        return response.json()

Example Usage#

Python Client Example#

import json

from highlights_client import HighlightsClient

# Initialize the client
client = HighlightsClient(api_key="your-api-key")

# Sample text chunks
text_chunks = [
    "Machine learning models can process vast amounts of data quickly.",
    "Natural language processing helps computers understand human language.",
    "Deep learning is a subset of machine learning based on neural networks.",
    "Data science combines statistics, programming, and domain expertise."
]

# Perform the search
results = client.search(
    query="What is machine learning?",
    text_chunks=text_chunks,
    top_n=2
)

# Display results
print("Search Results:")
print(json.dumps(results, indent=2))

# Output:
# Search Results:
# {
#   "results": [
#     {
#       "chunk_id": 0,
#       "chunk_txt": "Machine learning models can process vast amounts of data quickly.",
#       "chunk_score": 136.13050842285156,
#       "metadata": {},
#       "original_index": 0
#     },
#     {
#       "chunk_id": 2,
#       "chunk_txt": "Deep learning is a subset of machine learning based on neural networks.",
#       "chunk_score": 119.29242706298828,
#       "metadata": {},
#       "original_index": 2
#     }
#   ],
#   "metadata": {
#     "num_query_tokens": 5,
#     "num_context_tokens": 44
#   }
# }

cURL Example#

curl -X POST "https://api.highlights.mk1.ai/search" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR-API-KEY" \
  -d '{
    "query": "What is machine learning?",
    "chunk_txts": [
      "Machine learning models can process vast amounts of data quickly.",
      "Natural language processing helps computers understand human language.",
      "Deep learning is a subset of machine learning based on neural networks.",
      "Data science combines statistics, programming, and domain expertise."
    ],
    "top_n": 2,
    "true_order": true
  }'

Error Codes#

| Status Code | Description |
| --- | --- |
| 200 | Success |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Authentication failed |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error |
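A 429 response means the rate limit was hit, and clients typically back off and retry. Below is a sketch of a generic retry helper (illustrative, not part of the API; pass the `requests` module or a `requests.Session` as `session`):

```python
import time

def post_with_retry(url, session, *, headers=None, json=None,
                    max_attempts=5, base_delay=1.0):
    """POST with exponential backoff on 429 and 5xx responses.

    `session` is anything with a requests-style .post() method, e.g.
    the `requests` module itself or a requests.Session.
    """
    for attempt in range(max_attempts):
        response = session.post(url, headers=headers, json=json)
        if response.status_code == 429 or response.status_code >= 500:
            if attempt == max_attempts - 1:
                response.raise_for_status()  # out of attempts: surface the error
            # Honor Retry-After if the server sends it, else back off exponentially.
            delay = float(response.headers.get("Retry-After",
                                               base_delay * 2 ** attempt))
            time.sleep(delay)
            continue
        response.raise_for_status()  # surface non-retryable 4xx errors
        return response.json()
```

Retrying on 5xx as well as 429 is a design choice; drop the `>= 500` check if only rate-limit retries are wanted.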

Further Resources#