Getting Started#

We’ve broken down the setup process into two steps:

  1. Installing Flywheel

  2. Running a quick test

Installation#

Install Flywheel in a Python virtual environment (recommended):

python3 -m venv mk1
source mk1/bin/activate
pip install dist/mk1_flywheel-${VERSION}-cp310-cp310-linux_x86_64.whl

Available wheels:

  • dist/mk1_flywheel-*-cp38-cp38-linux_x86_64.whl : ubuntu-18.04+, python-3.8

  • dist/mk1_flywheel-*-cp310-cp310-linux_x86_64.whl : ubuntu-18.04+, python-3.10
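To confirm the install succeeded, you can run a quick import check from the active virtual environment (assuming the package imports as mk1.flywheel, as it does in the example below):

```shell
# Prints the confirmation line only if the wheel installed correctly
python3 -c "import mk1.flywheel" && echo "Flywheel import OK"
```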

Running a quick test#

Offline inference#

Here is a simple example of using the flywheel Python package to generate text with a model.

import os
from transformers import AutoTokenizer
import mk1.flywheel as flywheel

# Model path
model_path = "meta-llama/Meta-Llama-3-8B-Instruct"  # Feel free to use any model!

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)

# Load model
model = flywheel.ModelForInferenceBase.from_pretrained(model_path, tokenizer)

# Sampling configuration
sampling_config = flywheel.SamplingConfiguration(
    max_tokens=512,
    eos_token_ids=[1, 2],
)
prompts = [
    "What are 2 differences between a llama and an alpaca?",
    "What are some differences between Hazy IPA and Double IPA?",
]

# Generate responses
responses = model.generate(prompts, sampling_config)

# Print out responses for all prompts
for resp in responses:
    for sample in resp.responses:
        print(sample.text)

API client#

Use the command below, making sure to replace {Model Path} with the path to your model. This command will start an HTTP server, enabling you to interact with the model through a REST API.

Initialize the API server:#

python3 -m mk1.flywheel.entrypoints.endpoint --model_path {Model Path}

You can now use either of the following to generate text with your model. You can find more information about our API here.

CURL:#

Here is a simple curl request you can use:

curl -X "POST" "http://127.0.0.1:8000/generate" -H 'Content-Type: application/json' -d '{
    "text": "What is the difference between a llama and an alpaca?",
    "max_tokens": 512,
    "eos_token_ids": [1, 2],
    "stop": ["\\n\\n\\n"],
    "temperature": 0.8,
    "top_k": 40,
    "top_p": 0.95
}' --output -
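The stop field asks the server to cut generation at the first occurrence of any listed string (here, a run of three newlines). As a rough illustration of those semantics (a sketch, not Flywheel's actual implementation), equivalent client-side post-processing might look like:

```python
def apply_stop_strings(text: str, stops: list[str]) -> str:
    # Truncate text at the earliest occurrence of any stop string.
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(apply_stop_strings("First paragraph.\n\n\nSecond paragraph.", ["\n\n\n"]))
# → First paragraph.
```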

Python:#

Here is the Python equivalent of the curl request above.

import requests

url = "http://127.0.0.1:8000/generate"
headers = {
    "Content-Type": "application/json"
}
data = {
    "text": "What is the difference between a llama and an alpaca?",
    "max_tokens": 512,
    "eos_token_ids": [1, 2],
    "stop": ["\\n\\n\\n"],
    "temperature": 0.8,
    "top_k": 40,
    "top_p": 0.95
}

response = requests.post(url, headers=headers, json=data)
print(response.json())
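The temperature, top_k, and top_p fields control sampling: temperature scales the logits, while top_k and top_p (nucleus sampling) restrict the candidate token set before a token is drawn. As a rough illustration of the filtering step only (a sketch, not Flywheel's implementation):

```python
def top_k_top_p_filter(probs: list[float], top_k: int, top_p: float) -> list[int]:
    # Keep the top_k most probable token indices, then keep tokens in
    # descending order until their cumulative probability reaches top_p.
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    ranked = ranked[:top_k]
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append(idx)
        cum += p
        if cum >= top_p:
            break
    return kept

# Token 2 dominates; with top_k=2 and top_p=0.95, tokens 2 and 0 remain.
print(top_k_top_p_filter([0.3, 0.05, 0.6, 0.05], top_k=2, top_p=0.95))
# → [2, 0]
```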