Getting Started#
We’ve broken down the setup process into two steps:

1. Installing Flywheel
2. Running a quick test
Installation#
Install Flywheel in a Python virtual environment (recommended):
python3 -m venv mk1
source mk1/bin/activate
pip install dist/mk1_flywheel-${VERSION}-cp310-cp310-linux_x86_64.whl
Available wheels:

dist/mk1_flywheel-*-cp38-cp38-linux_x86_64.whl : ubuntu-18.04+, python-3.8
dist/mk1_flywheel-*-cp310-cp310-linux_x86_64.whl : ubuntu-18.04+, python-3.10
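The CPython tag in the wheel filename (cp38 / cp310) must match the interpreter in your virtual environment. If you are unsure which wheel applies, a minimal sketch for deriving the expected tag (the wheel naming scheme is assumed from the list above):

```python
import sys

# Derive the CPython tag for the running interpreter, e.g. "cp310" for Python 3.10
tag = f"cp{sys.version_info.major}{sys.version_info.minor}"

# Expected wheel filename pattern, assuming the naming scheme shown above
wheel = f"dist/mk1_flywheel-*-{tag}-{tag}-linux_x86_64.whl"
print(wheel)
```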
Running a quick test#
Offline inference#
Here is a simple example of how to use the Flywheel Python package to generate text with a model.
from transformers import AutoTokenizer

import mk1.flywheel as flywheel

# Model path
model_path = "meta-llama/Meta-Llama-3-8B-Instruct"  # Feel free to use any model!

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)

# Load model
model = flywheel.ModelForInferenceBase.from_pretrained(model_path, tokenizer)

# Sampling configuration
sampling_config = flywheel.SamplingConfiguration(
    max_tokens=512,
    eos_token_ids=[1, 2],
)

prompts = [
    "What are 2 differences between a llama and an alpaca?",
    "What are some differences between Hazy IPA and Double IPA?",
]

# Generate responses
responses = model.generate(prompts, sampling_config)

# Print every sampled response for each prompt
for resp in responses:
    for sample in resp.responses:
        print(sample.text)
API client#
Use the command below, replacing {Model Path} with the path to your model. This command starts an HTTP server that lets you interact with the model through a REST API.
Initialize the API server:#
python3 -m mk1.flywheel.entrypoints.endpoint --model_path {Model Path}
You can now use either of the following to generate text with your model; you can find more information about our API here.
CURL:#
Here is a simple curl request you can use:
curl -X "POST" "http://127.0.0.1:8000/generate" \
     -H 'Content-Type: application/json' \
     -d '{
  "text": "What is the difference between a llama and an alpaca?",
  "max_tokens": 512,
  "eos_token_ids": [1, 2],
  "stop": ["\\n\\n\\n"],
  "temperature": 0.8,
  "top_k": 40,
  "top_p": 0.95
}' --output -
Python:#
Here is the Python equivalent of the curl request above.
import requests

url = "http://127.0.0.1:8000/generate"
headers = {"Content-Type": "application/json"}
data = {
    "text": "What is the difference between a llama and an alpaca?",
    "max_tokens": 512,
    "eos_token_ids": [1, 2],
    # Real newlines here; the doubled backslashes in the curl body are JSON/shell escaping
    "stop": ["\n\n\n"],
    "temperature": 0.8,
    "top_k": 40,
    "top_p": 0.95,
}

response = requests.post(url, headers=headers, json=data)
print(response.text)
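One pitfall when porting the curl request to Python: the `\\n` escapes in the shell-quoted JSON become real newlines once the JSON is parsed, so the Python dict should use literal `\n` characters and let the JSON serializer re-escape them. A quick sanity check of this round trip (standard library only, no server needed):

```python
import json

# Stop sequence written with real newline characters, as a Python dict would hold it
payload = {"stop": ["\n\n\n"], "max_tokens": 512}

# Serializing re-introduces the backslash escapes seen in the curl body
body = json.dumps(payload)
print(body)
```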