Highlights API#

MK1 Highlights is our API service that retrieves the most relevant source text for any user query. Built on top of our custom large language model (LLM), Highlights is designed to scan large volumes of text quickly while maintaining near-perfect recall.

API Endpoint#

POST https://api.highlights.mk1.ai/search

Authentication#

Authentication is required for all API calls. You’ll need to include your API key in the request headers:

X-API-Key: YOUR-API-KEY

To obtain an API key, please sign up here.

Request Format#

The API accepts POST requests with a JSON body. Chunks can be provided in two formats:

  1. Simple string

{
  "query": "Your Query here",
  "chunk_txts": ["Chunk 1 text here", "Chunk 2 text here", "..."],
  "top_n": 3
}

  2. Dictionary with a required "text" field and optional metadata

{
  "query": "Your query here",
  "chunk_txts": [
    {
      "text": "First chunk with metadata",
      "metadata": {
        "source": "document1",
        "page": 1
      }
    },
    "Simple text chunk without metadata",
    {
      "text": "Third chunk with metadata",
      "metadata": {
        "source": "document2",
        "category": "introduction"
      }
    }
  ],
  "top_n": 10,
  "true_order": true
}
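Both formats can be mixed within a single `chunk_txts` array, as the example above shows. Below is an illustrative sketch (the query and chunk texts are placeholders) of assembling such a body in Python, including a check that no chunk is empty, since empty chunks cause errors:

```python
# Illustrative request body mixing both chunk formats (placeholder text).
payload = {
    "query": "What is machine learning?",
    "chunk_txts": [
        "A plain string chunk.",
        {
            "text": "A chunk with metadata.",
            "metadata": {"source": "document1", "page": 1},
        },
    ],
    "top_n": 3,
    "true_order": True,
}

# Every chunk must be a non-empty string, or a dict with a non-empty
# "text" field -- empty chunks cause the request to fail.
for chunk in payload["chunk_txts"]:
    text = chunk["text"] if isinstance(chunk, dict) else chunk
    assert isinstance(text, str) and text.strip(), "empty chunk"
```

This `payload` dict can be passed directly as the JSON body of the POST request.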

Parameters#

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | The natural language query to search with |
| chunk_txts | array | Yes | Array of text chunks with optional metadata |
| top_n | integer | No | Number of top results to return (default: 10) |
| true_order | boolean | No | Maintain original chunk order in results (default: true) |

Chunk Format Options#

Each chunk in the chunk_txts array can be one of:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | The content of the text chunk |
| metadata | object | No | Additional metadata associated with the chunk |
| original_index | integer | No | Original position of the chunk in the input array (set automatically) |

Response Format#

The API returns a JSON response with relevant text chunks and their metadata:

{
  "results": [
    {
      "chunk_id": 0,
      "chunk_txt": "The relevant text chunk...",
      "chunk_score": 136.13050842285156,
      "metadata": {
        "source": "document1",
        "page": 1
      },
      "original_index": 0
    }
  ],
  "metadata": {
    "num_query_tokens": 5,
    "num_context_tokens": 44
  }
}
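Once parsed, the response is a plain dictionary. As a brief sketch, here is how the ranked texts and token counts can be pulled out of the sample payload above:

```python
# Illustrative sketch: consume a parsed response. `response` mirrors the
# sample JSON above; in practice it would come from response.json().
response = {
    "results": [
        {
            "chunk_id": 0,
            "chunk_txt": "The relevant text chunk...",
            "chunk_score": 136.13050842285156,
            "metadata": {"source": "document1", "page": 1},
            "original_index": 0,
        }
    ],
    "metadata": {"num_query_tokens": 5, "num_context_tokens": 44},
}

# Texts in the order the API returned them, plus the request-level stats.
top_texts = [r["chunk_txt"] for r in response["results"]]
total_tokens = response["metadata"]["num_context_tokens"]
```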

Response Fields#

| Field | Type | Description |
| --- | --- | --- |
| results | array | Array of relevant text chunks, ordered by relevance |
| results[].chunk_id | integer | Index of the chunk in the results array |
| results[].chunk_txt | string | The content of the relevant text chunk |
| results[].chunk_score | float | Relevance score of the chunk to the query |
| results[].metadata | object | Additional metadata associated with the chunk |
| results[].original_index | integer | Original position of the chunk in the input array |
| metadata | object | Additional information about the request |
| metadata.num_query_tokens | integer | Number of tokens in the query |
| metadata.num_context_tokens | integer | Total number of tokens in the provided text chunks |

Best Practices#

General guidance to optimize your experience and get the most out of the Highlights API.

Input Preparation#

Text Chunking#

  • Keep chunk sizes between 512 and 10,000 characters for optimal performance

  • Ensure no empty chunks are included as these will cause errors

  • Maintain consistent chunk sizes across your dataset

Semantic Chunking#

  • Split text at natural semantic boundaries like paragraphs and sections

  • Keep related concepts together within chunks

  • Include enough context in each chunk for it to be meaningful on its own
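The guidelines above can be combined into a simple paragraph-based chunker. The sketch below is an illustrative helper (not part of the API) that splits on blank lines and merges small paragraphs until chunks fall within the recommended size range:

```python
def chunk_paragraphs(text, min_chars=512, max_chars=10_000):
    """Split text at blank-line paragraph boundaries, then merge small
    paragraphs so chunks land in the recommended size range.

    Paragraphs longer than max_chars are emitted as-is; splitting those
    further (e.g. at sentence boundaries) is left to the caller.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this paragraph would overflow: flush and start fresh.
            if current:
                chunks.append(current)
            current = para
        if len(current) >= min_chars:
            chunks.append(current)
            current = ""
    if current:
        chunks.append(current)  # trailing chunk may be under min_chars
    return chunks
```

Empty paragraphs are dropped up front, which also satisfies the no-empty-chunks rule.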

Query Construction#

Writing Effective Queries#

  • Frame queries as clear, specific questions

  • Use natural language rather than just keywords

  • Include key terms that match your target content

Query Structure#

  • Balance brevity with necessary detail

  • Avoid overly long queries that could dilute relevance

  • Ensure queries contain enough information to be meaningful

Python Client#

For easier integration, you can use our Python client:

from typing import List, Optional, Union
import requests

class HighlightsClient:
    def __init__(self, api_key: str, base_url: str = "https://api.highlights.mk1.ai"):
        self.api_key = api_key
        self.base_url = base_url
        self.headers = {
            "X-API-Key": api_key,
            "Content-Type": "application/json"
        }

    def search(
        self,
        query: str,
        text_chunks: List[Union[str, dict]],
        top_n: Optional[int] = 3,
        true_order: bool = True
    ) -> dict:
        """
        Search through text chunks to find relevant passages.

        Args:
            query: The search query
            text_chunks: List of text passages to search through; each item
                is a plain string or a dict with a required "text" field
            top_n: Number of top results to return
            true_order: Keep results in their original chunk order

        Returns:
            Dictionary containing search results and metadata
        """
        endpoint = f"{self.base_url}/search"

        # The API expects the chunks under the "chunk_txts" key
        payload = {
            "query": query,
            "chunk_txts": text_chunks,
            "top_n": top_n,
            "true_order": true_order
        }

        response = requests.post(endpoint, headers=self.headers, json=payload)
        response.raise_for_status()
        return response.json()

Example Usage#

Python Client Example#

import json

from highlights_client import HighlightsClient

# Initialize the client
client = HighlightsClient(api_key="your-api-key")

# Sample text chunks
text_chunks = [
    "Machine learning models can process vast amounts of data quickly.",
    "Natural language processing helps computers understand human language.",
    "Deep learning is a subset of machine learning based on neural networks.",
    "Data science combines statistics, programming, and domain expertise."
]

# Perform the search
results = client.search(
    query="What is machine learning?",
    text_chunks=text_chunks,
    top_n=2
)

# Display results
print("Search Results:")
print(json.dumps(results, indent=2))

# Output:
# Search Results:
# {
#   "results": [
#     {
#       "chunk_id": 0,
#       "chunk_txt": "Machine learning models can process vast amounts of data quickly.",
#       "chunk_score": 136.13050842285156,
#       "metadata": {},
#       "original_index": 0
#     },
#     {
#       "chunk_id": 2,
#       "chunk_txt": "Deep learning is a subset of machine learning based on neural networks.",
#       "chunk_score": 119.29242706298828,
#       "metadata": {},
#       "original_index": 2
#     }
#   ],
#   "metadata": {
#     "num_query_tokens": 5,
#     "num_context_tokens": 44
#   }
# }

cURL Example#

curl -X POST "https://api.highlights.mk1.ai/search" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR-API-KEY" \
  -d '{
    "query": "What is machine learning?",
    "chunk_txts": [
      "Machine learning models can process vast amounts of data quickly.",
      "Natural language processing helps computers understand human language.",
      "Deep learning is a subset of machine learning based on neural networks.",
      "Data science combines statistics, programming, and domain expertise."
    ],
    "top_n": 2,
    "true_order": true
  }'

Error Codes#

| Status Code | Description |
| --- | --- |
| 200 | Success |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Authentication failed |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error |
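A 429 response means the rate limit was hit, and clients typically back off and retry. Below is a sketch of a generic retry helper (illustrative, not part of the API; pass the `requests` module or a `requests.Session` as `session`):

```python
import time

def post_with_retry(url, session, *, headers=None, json=None,
                    max_attempts=5, base_delay=1.0):
    """POST with exponential backoff on 429 and 5xx responses.

    `session` is anything with a requests-style .post() method, e.g.
    the `requests` module itself or a requests.Session.
    """
    for attempt in range(max_attempts):
        response = session.post(url, headers=headers, json=json)
        if response.status_code == 429 or response.status_code >= 500:
            if attempt == max_attempts - 1:
                response.raise_for_status()  # out of attempts: surface the error
            # Honor Retry-After if the server sends it, else back off exponentially.
            delay = float(response.headers.get("Retry-After",
                                               base_delay * 2 ** attempt))
            time.sleep(delay)
            continue
        response.raise_for_status()  # surface non-retryable 4xx errors
        return response.json()
```

Retrying on 5xx as well as 429 is a design choice; drop the `>= 500` check if only rate-limit retries are wanted.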

Further Resources#