Getting Started#

What is Modal?#

Modal lets you run code in the cloud without having to think about infrastructure. It takes your code, puts it in a container, and executes it in Modal’s cloud. You don’t need to mess with Kubernetes, Docker, or even an AWS account.

  • Run any code remotely within seconds.

  • Scale up horizontally to thousands of containers.

  • Deploy and monitor persistent cron jobs.

  • Attach GPUs with a single line of code (see the sketch after this list).

  • Serve your functions as web endpoints.
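
To make these points concrete, here is a minimal sketch of an ordinary Modal function (not Flywheel-specific). The app and function names are arbitrary, and it assumes a reasonably recent modal client (`modal.App` is called `modal.Stub` on older releases):

import modal

# Arbitrary app name for this sketch.
app = modal.App("getting-started-sketch")

@app.function(gpu=modal.gpu.A10G())  # attaching a GPU is a single argument
def square(x: int) -> int:
    # This body runs inside a container in Modal's cloud.
    return x * x

@app.local_entrypoint()
def main():
    # .remote() ships the call to Modal and returns the result locally.
    print(square.remote(7))

Running the file with modal run builds a container, executes square remotely, and prints 49.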

Account Setup#

The key thing about using Modal is that you don’t have to set up any infrastructure. Simply…

  • Create an account at modal.com

  • Set up your environment

    pip install modal
    python3 -m modal setup
    

…and you’re ready to go.

Minimal Example#

The following code shows a minimal example that invokes Llama 2 7B Chat on an NVIDIA A10G GPU. Just copy it into a .py file and run it to get your Flywheel endpoint up and running.

Note

The first time you attempt to run an MK1 model, you will be prompted on the command line to accept our terms and conditions. This is a one-time process.

import modal

# Look up MK1's deployed Flywheel class and attach an A10G GPU to it.
Model = modal.Cls.lookup(
    "mk1-flywheel-latest-llama2-7b-chat", "Model", workspace="mk1"
).with_options(
    gpu=modal.gpu.A10G(),
)

model = Model()
prompt = "[INST] What is the difference between a llama and an alpaca? [/INST] "

print(f"Prompt:\n{prompt}\n")

# Run generation remotely on Modal and pull the first completion out of the result.
responses = model.generate.remote(text=prompt, max_tokens=512, eos_token_ids=[1, 2])
response = responses["responses"][0]["text"]

print(f"Response:\n{response}")

In the following sections you will find more information about other images with pre-populated models and how to use them.

Next Steps#

Bring-Your-Own-Model

Serve your own models (perhaps fine-tuned) with Flywheel on Modal.

Example: Batch Document Summarization on Modal

Summarize a large batch of news articles with Flywheel in half the time compared to vLLM.

Example: Endpoint

Set up your own endpoint with Flywheel and bootstrap any inference application. Experience up to 2x throughput at the same latency compared to other leading inference solutions.