
Efficient Deployment of Custom YOLOv8 Models Using Salad's Affordable Shared GPUs: A Cost-Saving Guide


Are you looking to deploy production-class predictions using a custom YOLOv8 model? YOLOv8, developed by Ultralytics, is a sophisticated version in the YOLO series of object detection algorithms, designed to provide fast, accurate, and efficient object detection in images and videos. Salad.com, meanwhile, offers an economical solution with access to cost-effective and scalable GPUs. In this article, we'll guide you through building a Docker container with FastAPI, including an endpoint to interact with your custom YOLOv8 model. We will conclude by pushing the container image to Azure Container Registry and deploying it to Salad.com.









To effectively follow this guide, you must first install the Docker engine. For Windows users, we advise using Windows Subsystem for Linux (WSL) to prevent compatibility issues. Please refer to the official Docker installation guide and select your platform for the installation process. After installing Docker, you will need to set up an IPv6 network; consult this Docker guide for that purpose. Only complete up to the 4th step of 'Creating an IPv6 network', and in step 4 use docker create instead of docker compose.


We have all the code available on Github, if you want to skip the code creation part. And remember that if you want to create a custom YOLOv8 model using our Data Augmentation Studio tool, you can do so by following our comprehensive tutorial!


Preparing the Docker container

To set up the container, we will need to list the requirements for our FastAPI application, create a Dockerfile with the instructions to build the container, and create a folder named app/ to store the code in, with a file called __init__.py inside. The folder structure will have to match this one:


YOLOFastAPI/
- Dockerfile
- requirements.txt
- app/
     - __init__.py
     - ...

Let's display the contents of requirements.txt and give a small explanation of each dependency's purpose:

fastapi[all]>=0.105.0
pydantic>=1.8.0
uvicorn[standard]>=0.15.0
torch>=2.1.1
opencv-python>=4.8.0
ultralytics

FastAPI and pydantic are necessary for the web framework to work. Uvicorn lets us run a production web server. Ultralytics, torch and opencv are necessary to process the input image and to run the model, in this case, our custom YOLOv8 weights. Now, let's do the same with the Dockerfile.


FROM python:3.9.18

WORKDIR /code

# Download YOLOv8 custom Model (saved in drive)

RUN mkdir /code/sam_images
RUN pip install gdown

RUN gdown 1pB_7eVrncxwXc84uA8YyEVQjijrFsho8 -O /code/sam_images/custom_yolo_model.pt

# Install the requirements and libraries

COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
RUN apt update -y
RUN apt install libgl1-mesa-glx -y

# Copy the app code

COPY ./app /code/app

# Startup command

CMD ["uvicorn", "app.main:app", "--host", "::", "--port", "80"]

We start from the official Python 3.9 image. We first create a folder to store the custom YOLOv8 weights, and download them there with gdown. Then, we install the previously stated requirements, along with libgl1-mesa-glx, the OpenGL library that opencv needs at runtime. Next, we copy the code into the container, and we finally start the uvicorn server. This concludes the setup of the Docker container. It is very important that you replace the gdown ID with the one pointing to your own custom YOLOv8 model, or else you won't get the correct results. We will now go through the code.


FastAPI startup code


First start by creating a file in the app/ folder called main.py, with the following code:

from fastapi import FastAPI

app = FastAPI()

@app.get("/healthcheck")
def check_working():
    return {"online": True}

This is the startup code for the FastAPI application. You can check that it works by building and running the Docker container. The first build may take a while since it downloads the requirements and the custom YOLOv8 model weights, but you will only have to download them once (Docker caches the already completed steps). To test the container, you need to build the Docker image (this step may take a while):

docker build -t yolo_service .

And then, to run the image on the container:

docker run --rm --network ip6net -p 80:80 yolo_service

Remember to have configured the IPv6 network as mentioned earlier. With that, you will have your Docker container running with the FastAPI healthcheck! You can verify it's working by sending a GET request to http://[::1]:80/healthcheck with any HTTP tool such as Postman. You should get the following response if all is working correctly:

{
	"online": true
}

Adding the model querying endpoint


The next step is to load the model into the FastAPI state. For that purpose, we will add a lifespan to the application, so that the model is loaded when the application starts and unloaded when it shuts down. Change the code in main.py to include these commands before the creation of the app:

from contextlib import asynccontextmanager
from ultralytics import YOLO
import torch

@asynccontextmanager
async def lifespan(app: FastAPI):

    app.state.ml_models = {}

    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"

    # Load the yolo model
    yolo = YOLO("./sam_images/custom_yolo_model.pt").to(device)

    app.state.ml_models["yolo"] = yolo
    yield

    # Clean up the ML models and release the resources
    app.state.ml_models.clear()

And change the app creation code to include the lifespan:

app = FastAPI(lifespan=lifespan)

Now, let's create the pydantic schema for the endpoint. For that matter, create a file in the app/ folder called schemas.py with these contents:


from pydantic import BaseModel

class YoloBody(BaseModel):
    image: str

The idea behind this class is to type the incoming JSON and deserialize it into the class, validating the data types along the way. We only ask for the image, encoded as a base64 string.
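As a quick illustration of the payload format, here is a minimal sketch of building and decoding the request body (the image bytes below are a stand-in; in practice you would read them from a file):

```python
import base64
import json

# Stand-in for real image bytes, e.g. open("image.jpg", "rb").read()
image_bytes = b"\x89PNG fake image data"

# Build the JSON body matching the YoloBody schema
body = {"image": base64.b64encode(image_bytes).decode()}
payload = json.dumps(body)

# The server can recover the exact original bytes
recovered = base64.b64decode(json.loads(payload)["image"])
print(recovered == image_bytes)  # True
```

Base64 encoding is what lets us ship binary image data inside a plain JSON string field.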


Then, let's create the inference endpoint. Create a new folder inside the app folder called routers, and inside it create a file called __init__.py and another called inference.py with these contents:


import base64
from fastapi import APIRouter, Request
import numpy as np
import cv2
from ..schemas import YoloBody

router = APIRouter()

@router.post("/yolo")
def yolo_prediction(request: Request, body: YoloBody):
    try:
        image_bytes = base64.b64decode(body.image)


        file_bytes = np.frombuffer(image_bytes, np.uint8)
        image = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)

        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        results = request.app.state.ml_models["yolo"](image)[0]
        boxes = results.boxes.xywh.tolist()
        
        # Detection models expose the class id of each box on results.boxes.cls
        if results.boxes is not None:
            labels = results.boxes.cls.tolist()
        else:
            labels = [0 for _ in boxes]
        
        res = list(zip(labels, boxes))

        return res
    
    except Exception as ex:
        return {"detail":  str(ex)}

The previous code decodes the base64-encoded image into an array, feeds it into the model and returns the predicted labels and boxes. The endpoint is complete! The last step is to add the router in main.py. At the beginning of that file, import the router:

from app.routers import inference

And after the definition of the app, load the router:

app = FastAPI(lifespan=lifespan)

app.include_router(inference.router)

This is the final main.py file:

from fastapi import FastAPI
from contextlib import asynccontextmanager
from ultralytics import YOLO
import torch
from app.routers import inference

@asynccontextmanager
async def lifespan(app: FastAPI):

    app.state.ml_models = {}

    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"

    # Load the yolo model
    yolo = YOLO("./sam_images/custom_yolo_model.pt").to(device)

    app.state.ml_models["yolo"] = yolo
    yield

    # Clean up the ML models and release the resources
    app.state.ml_models.clear()

app = FastAPI(lifespan=lifespan)

app.include_router(inference.router)

@app.get("/healthcheck")
def check_working():
    return {"online": True}

And the project structure must be as follows:

YOLOFastAPI/
- Dockerfile
- requirements.txt
- app/
     - __init__.py
     - main.py
     - schemas.py
     - routers/
          - __init__.py
          - inference.py

This finalizes the first part of the tutorial: creating the Docker image. To test whether the image is working properly, repeat the build and run steps from before. Now you can test the healthcheck endpoint on http://[::1]:80/healthcheck, and also the inference endpoint on http://[::1]:80/yolo. To test the inference endpoint, we will try to detect the objects in this image:





Create and run the following program in a new file called apitester.py (install the missing requirements):

from base64 import b64encode
import requests
import gdown

# Download the sample image
gdown.download("https://drive.google.com/uc?id=1VIiMg7_AEBIW8gJmG5kOLz8eOdEGLfhP")

with open("can.jpg", 'rb') as file:
    image = file.read()

body = {
    "image": b64encode(image).decode()
}

URL = "http://[::1]:80"


r = requests.post(f"{URL}/yolo", json=body, headers={"Content-Type": "application/json"})


res = r.json()

print(res)

The response should look like this:

[[0, [2466.7509765625, 1885.647705078125, 862.8900146484375, 546.5208740234375]], [0, [51.385719299316406, 1837.412353515625, 102.63978576660156, 1124.1590576171875]], [0, [414.3603515625, 1839.579833984375, 828.720703125, 1150.2125244140625]], [0, [1481.547119140625, 1944.6197509765625, 1172.13671875, 2312.378173828125]], [0, [687.561767578125, 2289.55908203125, 1375.12353515625, 2051.265869140625]], [0, [69.6109619140625, 2294.936767578125, 135.67645263671875, 2032.6851806640625]]]
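Each entry pairs a class id with a box in xywh format (center x, center y, width, height), which is what `results.boxes.xywh` returns. If you need corner coordinates instead, here is a small sketch of the conversion:

```python
def xywh_to_xyxy(box):
    """Convert a [cx, cy, w, h] box to [x1, y1, x2, y2] corner coordinates."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]

# Example using one (label, box) pair shaped like the response above
label, box = [0, [100.0, 50.0, 40.0, 20.0]]
print(xywh_to_xyxy(box))  # [80.0, 40.0, 120.0, 60.0]
```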

Deploying the docker image to Salad

The second part of the tutorial focuses on deploying the image to Salad step by step. We will first upload the image to a container registry, and then deploy it to Salad from that registry.


Firstly, upload the image to your preferred container registry. Although this tutorial uses Azure Container Registry, numerous other options are available. To upload the image to the container registry, you need to build the image, tag it, and then push it as follows:

docker build -t yolo_service .
docker tag yolo_service <your-container-registry-url>/api/yolo
docker push <your-container-registry-url>/api/yolo

Now, navigate to salad.com and click on Deploy on Salad.


If you have not created an account yet, create it. Afterwards, create a container group in Salad. Name it custom-yolo.


On Image Source, click Edit and configure the endpoint to point to the registry you are using.



In Replica Count, select 1, and in vCPUs, select 1 too.



In Memory, select 4 GB.



In GPU, select GTX 1060 (6 GB).



Finally, in Optional Settings, find Networking and press Edit. Enable Networking, enter port 80, and select Yes on Use Authentication.

The container is configured! Now click Deploy. The image will take a while to deploy, and then you will be presented with the following screen.





And you're ready to use your custom YOLOv8 model in the cloud! Just use the available URL (displayed in the Access Domain Name field) to call the endpoints we configured earlier. Remember to use the Salad-Api-Key header to authenticate your requests, as mentioned in the Salad docs.
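As a sketch of what an authenticated call could look like, here is how the request can be constructed with the standard library (the domain and key below are placeholders; substitute your own Access Domain Name and Salad API key):

```python
import json
import urllib.request

# Placeholder values - replace with your Access Domain Name and Salad API key
BASE_URL = "https://your-access-domain.salad.cloud"
API_KEY = "your-salad-api-key"

payload = json.dumps({"image": "<base64-encoded image>"}).encode()
req = urllib.request.Request(
    f"{BASE_URL}/yolo",
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Salad-Api-Key": API_KEY,  # authenticates the request on Salad
    },
    method="POST",
)
# urllib.request.urlopen(req) sends the request once the container is live
```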


Conclusion


Today, we developed a Docker image from scratch to serve a custom YOLOv8 model and deployed it on Salad.com, a cost-effective platform for hosting GPU-intensive models. If you have any questions about the process, don't hesitate to refer to our Github repository, which contains the Docker image code.


Thanks for reading!


