# Cog: Containers for machine learning Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to [Replicate](https://replicate.com/). ## Highlights - πŸ“¦ **Docker containers without the pain.** Writing your own `Dockerfile` can be a bewildering process. With Cog, you define your environment with a [simple configuration file](#how-it-works) and it generates a Docker image with all the best practices: Nvidia base images, efficient caching of dependencies, installing specific Python versions, sensible environment variable defaults, and so on. - 🀬️ **No more CUDA hell.** Cog knows which CUDA/cuDNN/PyTorch/Tensorflow/Python combos are compatible and will set it all up correctly for you. - βœ… **Define the inputs and outputs for your model with standard Python.** Then, Cog generates an OpenAPI schema and validates the inputs and outputs. - 🎁 **Automatic HTTP prediction server**: Your model's types are used to dynamically generate a RESTful HTTP API using a high-performance Rust/Axum server. - πŸš€ **Ready for production.** Deploy your model anywhere that Docker images run. Your own infrastructure, or [Replicate](https://replicate.com). 
## How it works Define the Docker environment your model runs in with `cog.yaml`: ```yaml build: gpu: true system_packages: - "libgl1-mesa-glx" - "libglib2.0-0" python_version: "3.13" python_requirements: requirements.txt predict: "predict.py:Predictor" ``` Define how predictions are run on your model with `predict.py`: ```python from cog import BasePredictor, Input, Path import torch class Predictor(BasePredictor): def setup(self): """Load the model into memory to make running multiple predictions efficient""" self.model = torch.load("./weights.pth") # The arguments and types the model takes as input def predict(self, image: Path = Input(description="Grayscale input image") ) -> Path: """Run a single prediction on the model""" processed_image = preprocess(image) output = self.model(processed_image) return postprocess(output) ``` In the above we accept a path to the image as an input, and return a path to our transformed image after running it through our model. Now, you can run predictions on this model: ```console $ cog predict -i image=@input.jpg --> Building Docker image... --> Running Prediction... --> Output written to output.jpg ``` Or, build a Docker image for deployment: ```console $ cog build -t my-classification-model --> Building Docker image... --> Built my-classification-model:latest $ docker run -d -p 5000:5000 --gpus all my-classification-model $ curl http://localhost:5000/predictions -X POST \ -H 'Content-Type: application/json' \ -d '{"input": {"image": "https://.../input.jpg"}}' ``` Or, combine build and run via the `serve` command: ```console $ cog serve -p 8080 $ curl http://localhost:8080/predictions -X POST \ -H 'Content-Type: application/json' \ -d '{"input": {"image": "https://.../input.jpg"}}' ``` ## Why are we building this? It's really hard for researchers to ship machine learning models to production. Part of the solution is Docker, but it is so complex to get it to work: Dockerfiles, pre-/post-processing, Flask servers, CUDA versions. 
More often than not the researcher has to sit down with an engineer to get the damn thing deployed. [Andreas](https://github.com/andreasjansson) and [Ben](https://github.com/bfirsh) created Cog. Andreas used to work at Spotify, where he built tools for building and deploying ML models with Docker. Ben worked at Docker, where he created [Docker Compose](https://github.com/docker/compose). We realized that, in addition to Spotify, other companies were also using Docker to build and deploy machine learning models. [Uber](https://eng.uber.com/michelangelo-pyml/) and others have built similar systems. So, we're making an open source version so other people can do this too. Hit us up if you're interested in using it or want to collaborate with us. [We're on Discord](https://discord.gg/replicate) or email us at [team@replicate.com](mailto:team@replicate.com). ## Prerequisites - **macOS, Linux or Windows 11**. Cog works on macOS, Linux and Windows 11 with [WSL 2](docs/wsl2/wsl2.md) - **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog. If you install Docker Engine instead of Docker Desktop, you will need to [install Buildx](https://docs.docker.com/build/architecture/#buildx) as well. 
## Install

If you're using macOS, you can install Cog using Homebrew:

```console
brew install replicate/tap/cog
```

You can also download and install the latest release using our [install script](https://cog.run/install):

```sh
# bash, zsh, and other shells
sh <(curl -fsSL https://cog.run/install.sh)

# fish shell
sh (curl -fsSL https://cog.run/install.sh | psub)

# download with wget and run in a separate command
wget -qO install.sh https://cog.run/install.sh
sh ./install.sh
```

You can manually install the latest release of Cog directly from GitHub by running the following commands in a terminal:

```console
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
```

Or, to install Cog inside a Docker image (for example, in a `Dockerfile`):

```dockerfile
RUN sh -c "INSTALL_DIR=\"/usr/local/bin\" SUDO=\"\" $(curl -fsSL https://cog.run/install.sh)"
```

## Upgrade

If you're using macOS and you previously installed Cog with Homebrew, run the following:

```console
brew upgrade replicate/tap/cog
```

Otherwise, you can upgrade to the latest version by running the same commands you used to install it.

## Development

See [CONTRIBUTING.md](CONTRIBUTING.md) for how to set up a development environment and build from source.
## Next steps - [Get started with an example model](docs/getting-started.md) - [Get started with your own model](docs/getting-started-own-model.md) - [Using Cog with notebooks](docs/notebooks.md) - [Using Cog with Windows 11](docs/wsl2/wsl2.md) - [Take a look at some examples of using Cog](https://github.com/replicate/cog-examples) - [Deploy models with Cog](docs/deploy.md) - [`cog.yaml` reference](docs/yaml.md) to learn how to define your model's environment - [Prediction interface reference](docs/python.md) to learn how the `Predictor` interface works - [Training interface reference](docs/training.md) to learn how to add a fine-tuning API to your model - [HTTP API reference](docs/http.md) to learn how to use the HTTP API that models serve ## Need help? [Join us in #cog on Discord.](https://discord.gg/replicate) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/replicate/cog) --- # CLI reference ## `cog` Containers for machine learning. To get started, take a look at the documentation: https://github.com/replicate/cog **Examples** ``` To run a command inside a Docker environment defined with Cog: $ cog run echo hello world ``` **Options** ``` --debug Show debugging output -h, --help help for cog --no-color Disable colored output --version Show version of Cog ``` ## `cog build` Build a Docker image from the cog.yaml in the current directory. The generated image contains your model code, dependencies, and the Cog runtime. It can be run locally with 'cog predict' or pushed to a registry with 'cog push'. ``` cog build [flags] ``` **Examples** ``` # Build with default settings cog build # Build and tag the image cog build -t my-model:latest # Build without using the cache cog build --no-cache # Build with model weights in a separate layer cog build --separate-weights -t my-model:v1 ``` **Options** ``` -f, --file string The name of the config file. 
(default "cog.yaml") -h, --help help for build --no-cache Do not use cache when building the image --openapi-schema string Load OpenAPI schema from a file --progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto") --secret stringArray Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file' --separate-weights Separate model weights from code in image layers -t, --tag string A name for the built image in the form 'repository:tag' --use-cog-base-image Use pre-built Cog base image for faster cold boots (default true) --use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto") ``` ## `cog init` Create a cog.yaml and predict.py in the current directory. These files provide a starting template for defining your model's environment and prediction interface. Edit them to match your model's requirements. ``` cog init [flags] ``` **Examples** ``` # Set up a new Cog project in the current directory cog init ``` **Options** ``` -h, --help help for init ``` ## `cog login` Log in to a container registry. For Replicate's registry (r8.im), this command handles authentication through Replicate's token-based flow. For other registries, this command prompts for username and password, then stores credentials using Docker's credential system. ``` cog login [flags] ``` **Options** ``` -h, --help help for login --token-stdin Pass login token on stdin instead of opening a browser. You can find your Replicate login token at https://replicate.com/auth/token ``` ## `cog predict` Run a prediction. If 'image' is passed, it will run the prediction on that Docker image. It must be an image that has been built by Cog. Otherwise, it will build the model in the current directory and run the prediction on that. 
``` cog predict [image] [flags] ``` **Examples** ``` # Run a prediction with named inputs cog predict -i prompt="a photo of a cat" # Pass a file as input cog predict -i image=@photo.jpg # Save output to a file cog predict -i image=@input.jpg -o output.png # Pass multiple inputs cog predict -i prompt="sunset" -i width=1024 -i height=768 # Run against a pre-built image cog predict r8.im/your-username/my-model -i prompt="hello" # Pass inputs as JSON echo '{"prompt": "a cat"}' | cog predict --json @- ``` **Options** ``` -e, --env stringArray Environment variables, in the form name=value -f, --file string The name of the config file. (default "cog.yaml") --gpus docker run --gpus GPU devices to add to the container, in the same format as docker run --gpus. -h, --help help for predict -i, --input stringArray Inputs, in the form name=value. if value is prefixed with @, then it is read from a file on disk. E.g. -i path=@image.jpg --json string Pass inputs as JSON object, read from file (@inputs.json) or via stdin (@-) -o, --output string Output path --progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto") --setup-timeout uint32 The timeout for a container to setup (in seconds). (default 300) --use-cog-base-image Use pre-built Cog base image for faster cold boots (default true) --use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto") --use-replicate-token Pass REPLICATE_API_TOKEN from local environment into the model context ``` ## `cog push` Build a Docker image from cog.yaml and push it to a container registry. Cog can push to any OCI-compliant registry. When pushing to Replicate's registry (r8.im), run 'cog login' first to authenticate. 
``` cog push [IMAGE] [flags] ``` **Examples** ``` # Push to Replicate cog push r8.im/your-username/my-model # Push to any OCI registry cog push registry.example.com/your-username/model-name # Push with model weights in a separate layer (Replicate only) cog push r8.im/your-username/my-model --separate-weights ``` **Options** ``` -f, --file string The name of the config file. (default "cog.yaml") -h, --help help for push --no-cache Do not use cache when building the image --openapi-schema string Load OpenAPI schema from a file --progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto") --secret stringArray Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file' --separate-weights Separate model weights from code in image layers --use-cog-base-image Use pre-built Cog base image for faster cold boots (default true) --use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto") ``` ## `cog run` Run a command inside a Docker environment defined by cog.yaml. Cog builds a temporary image from your cog.yaml configuration and runs the given command inside it. This is useful for debugging, running scripts, or exploring the environment your model will run in. ``` cog run [arg...] [flags] ``` **Examples** ``` # Open a Python interpreter inside the model environment cog run python # Run a script cog run python train.py # Run with environment variables cog run -e HUGGING_FACE_HUB_TOKEN=abc123 python download.py # Expose a port (e.g. for Jupyter) cog run -p 8888 jupyter notebook ``` **Options** ``` -e, --env stringArray Environment variables, in the form name=value -f, --file string The name of the config file. (default "cog.yaml") --gpus docker run --gpus GPU devices to add to the container, in the same format as docker run --gpus. 
-h, --help help for run --progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto") -p, --publish stringArray Publish a container's port to the host, e.g. -p 8000 --use-cog-base-image Use pre-built Cog base image for faster cold boots (default true) --use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto") ``` ## `cog serve` Run a prediction HTTP server. Builds the model and starts an HTTP server that exposes the model's inputs and outputs as a REST API. Compatible with the Cog HTTP protocol. ``` cog serve [flags] ``` **Examples** ``` # Start the server on the default port (8393) cog serve # Start on a custom port cog serve -p 5000 # Test the server curl http://localhost:8393/predictions \ -X POST \ -H 'Content-Type: application/json' \ -d '{"input": {"prompt": "a cat"}}' ``` **Options** ``` -f, --file string The name of the config file. (default "cog.yaml") --gpus docker run --gpus GPU devices to add to the container, in the same format as docker run --gpus. -h, --help help for serve -p, --port int Port on which to listen (default 8393) --progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto") --upload-url string Upload URL for file outputs (e.g. https://example.com/upload/) --use-cog-base-image Use pre-built Cog base image for faster cold boots (default true) --use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto") ``` --- # Deploy models with Cog Cog containers are Docker containers that serve an HTTP server for running predictions on your model. You can deploy them anywhere that Docker containers run. 
The server inside Cog containers is **coglet**, a Rust-based prediction server that handles HTTP requests, worker process management, and prediction execution.

This guide assumes you have a model packaged with Cog. If you don't, [follow our getting started guide](getting-started-own-model.md), or use [an example model](https://github.com/replicate/cog-examples).

## Getting started

First, build your model:

```console
cog build -t my-model
```

You can serve predictions locally with `cog serve`:

```console
cog serve
# or, from a built image:
cog serve my-model
```

Alternatively, start the Docker container directly, naming it so it's easy to stop later:

```shell
# If your model uses a CPU:
docker run -d --name my-model -p 5001:5000 my-model

# If your model uses a GPU:
docker run -d --name my-model -p 5001:5000 --gpus all my-model
```

The server listens on port 5000 inside the container (mapped to 5001 above). To view the OpenAPI schema, open [localhost:5001/openapi.json](http://localhost:5001/openapi.json) in your browser or use cURL to make a request:

```console
curl http://localhost:5001/openapi.json
```

To stop the server, run:

```console
docker kill my-model
```

To run a prediction on the model, call the `/predictions` endpoint, passing input in the format expected by your model:

```console
curl http://localhost:5001/predictions -X POST \
  --header "Content-Type: application/json" \
  --data '{"input": {"image": "https://.../input.jpg"}}'
```

For more details about the HTTP API, see the [HTTP API reference documentation](http.md).

## Health checks

The server exposes a `GET /health-check` endpoint that returns the current status of the model container. Use this for readiness probes in orchestration systems like Kubernetes.

```console
curl http://localhost:5001/health-check
```

The response includes a `status` field with values like `STARTING`, `READY`, `BUSY`, `SETUP_FAILED`, or `DEFUNCT`. See the [HTTP API reference](http.md#get-health-check) for full details.

## Concurrency

By default, the server processes one prediction at a time.
To enable concurrent predictions, set the `concurrency.max` option in `cog.yaml`: ```yaml concurrency: max: 4 ``` See the [`cog.yaml` reference](yaml.md#concurrency) for more details. ## Environment variables You can configure runtime behavior with environment variables: - `COG_SETUP_TIMEOUT`: Maximum time in seconds for the `setup()` method (default: no timeout). See the [environment variables reference](environment.md) for the full list. --- # Environment variables This guide lists the environment variables that change how Cog functions. ## Build-time variables ### `COG_SDK_WHEEL` Controls which cog Python SDK wheel is installed in the Docker image during `cog build`. Takes precedence over `build.sdk_version` in `cog.yaml`. **Supported values:** | Value | Description | | -------------------- | ---------------------------------------------------- | | `pypi` | Install latest version from PyPI | | `pypi:0.12.0` | Install specific version from PyPI | | `dist` | Use wheel from `dist/` directory (requires git repo) | | `https://...` | Install from URL | | `/path/to/wheel.whl` | Install from local file path | **Default behavior:** - **Release builds**: Installs latest cog from PyPI - **Development builds**: Auto-detects wheel in `dist/` directory, falls back to latest PyPI **Examples:** ```console # Use specific PyPI version $ COG_SDK_WHEEL=pypi:0.11.0 cog build # Use local development wheel $ COG_SDK_WHEEL=dist cog build # Use wheel from URL $ COG_SDK_WHEEL=https://example.com/cog-0.12.0-py3-none-any.whl cog build ``` The `dist` option searches for wheels in: 1. `./dist/` (current directory) 2. `$REPO_ROOT/dist/` (if REPO_ROOT is set) 3. `/dist/` (via `git rev-parse`, useful when running from subdirectories) ### `COGLET_WHEEL` Controls which coglet wheel is installed in the Docker image. Coglet is the Rust-based prediction server. **Supported values:** Same as `COG_SDK_WHEEL` **Default behavior:** For development builds, auto-detects a wheel in `dist/`. 
For release builds, installs the latest version from PyPI. Can be overridden with an explicit value. **Examples:** ```console # Use local development wheel $ COGLET_WHEEL=dist cog build # Use specific version from PyPI $ COGLET_WHEEL=pypi:0.1.0 cog build ``` ## Runtime variables ### `COG_NO_UPDATE_CHECK` By default, Cog automatically checks for updates and notifies you if there is a new version available. To disable this behavior, set the `COG_NO_UPDATE_CHECK` environment variable to any value. ```console $ COG_NO_UPDATE_CHECK=1 cog build # runs without automatic update check ``` ### `COG_SETUP_TIMEOUT` Controls the maximum time (in seconds) allowed for the model's `setup()` method to complete. If setup exceeds this timeout, the server will report a setup failure. By default, there is no timeout β€” setup runs indefinitely. Set to `0` to disable the timeout (same as default). Invalid values are ignored with a warning. ```console $ COG_SETUP_TIMEOUT=300 docker run -p 5000:5000 my-model # 5-minute setup timeout ``` ### `COG_CA_CERT` Injects a custom CA certificate into the Docker image during `cog build`. This is useful when building behind a corporate proxy or VPN that uses custom certificate authorities (e.g. Cloudflare WARP). **Supported values:** | Value | Description | | -------------------------------- | ----------------------------------------------------------- | | `/path/to/cert.crt` | Path to a single PEM certificate file | | `/path/to/certs/` | Directory of `.crt` and `.pem` files (all are concatenated) | | `-----BEGIN CERTIFICATE-----...` | Inline PEM certificate | | `LS0tLS1CRUdJTi...` | Base64-encoded PEM certificate | The certificate is installed into the system CA store and the `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` environment variables are set automatically in the built image. 
**Examples:** ```console # From a file $ COG_CA_CERT=/usr/local/share/ca-certificates/corporate-ca.crt cog build # From a directory of certs $ COG_CA_CERT=/etc/custom-certs/ cog build # Inline (e.g. from a CI secret) $ COG_CA_CERT="$(cat /path/to/cert.pem)" cog build ``` --- # Getting started with your own model This guide will show you how to put your own machine learning model in a Docker image using Cog. If you haven't got a model to try out, you'll want to follow the [main getting started guide](getting-started.md). ## Prerequisites - **macOS or Linux**. Cog works on macOS and Linux, but does not currently support Windows. - **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog. ## Initialization First, install Cog if you haven't already: **macOS (recommended):** ```sh brew install replicate/tap/cog ``` **Linux or macOS (manual):** ```sh sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m` sudo chmod +x /usr/local/bin/cog ``` To configure your project for use with Cog, you'll need to add two files: - [`cog.yaml`](yaml.md) defines system requirements, Python package dependencies, etc - [`predict.py`](python.md) describes the prediction interface for your model Use the `cog init` command to generate these files in your project: ```sh $ cd path/to/your/model $ cog init ``` ## Define the Docker environment The `cog.yaml` file defines all the different things that need to be installed for your model to run. You can think of it as a simple way of defining a Docker image. 
For example:

```yaml
build:
  python_version: "3.13"
  python_requirements: requirements.txt
```

With a `requirements.txt` containing your dependencies:

```
torch==2.6.0
```

This will generate a Docker image with Python 3.13 and PyTorch 2 installed, for both CPU and GPU, with the correct version of CUDA, and various other sensible best practices.

To run a command inside this environment, prefix it with `cog run`:

```
$ cog run python
βœ“ Building Docker image from cog.yaml... Successfully built 8f54020c8981
Running 'python' in Docker with the current directory mounted as a volume...
────────────────────────────────────────────────────────────────────────────────────────

Python 3.13.x (main, ...)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
```

This is handy for ensuring a consistent environment for development or training.

With `cog.yaml`, you can also install system packages and other things. [Take a look at the full reference to see what else you can do.](yaml.md)

## Define how to run predictions

The next step is to update `predict.py` to define the interface for running predictions on your model. The `predict.py` generated by `cog init` looks something like this:

```python
from cog import BasePredictor, Path, Input
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.net = torch.load("weights.pth")

    def predict(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5)
    ) -> Path:
        """Run a single prediction on the model"""
        processed_input = preprocess(image)  # ... your pre-processing ...
        output = self.net(processed_input)
        return postprocess(output)  # ... your post-processing ...
```

Edit your `predict.py` file and fill in the functions with your own model's setup and prediction code. You might need to import parts of your model from another file.
You also need to define the inputs to your model as arguments to the `predict()` function, as demonstrated above. For each argument, you need to annotate with a type. The supported types are: - `str`: a string - `int`: an integer - `float`: a floating point number - `bool`: a boolean - `cog.File`: a file-like object representing a file (deprecated β€” use `cog.Path` instead) - `cog.Path`: a path to a file on disk You can provide more information about the input with the `Input()` function, as shown above. It takes these basic arguments: - `description`: A description of what to pass to this input for users of the model - `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional. - `ge`: For `int` or `float` types, the value should be greater than or equal to this number. - `le`: For `int` or `float` types, the value should be less than or equal to this number. - `min_length`: For `str` types, the minimum length of the string. - `max_length`: For `str` types, the maximum length of the string. - `regex`: For `str` types, the string must match this regular expression. - `choices`: For `str` or `int` types, a list of possible values for this input. - `deprecated`: Mark this input as deprecated with a message explaining what to use instead. There are some more advanced options you can pass, too. For more details, [take a look at the prediction interface documentation](python.md). Next, add the line `predict: "predict.py:Predictor"` to your `cog.yaml`, so it looks something like this: ```yaml build: python_version: "3.13" python_requirements: requirements.txt predict: "predict.py:Predictor" ``` That's it! To test this works, try running a prediction on the model: ``` $ cog predict -i image=@input.jpg βœ“ Building Docker image from cog.yaml... 
Successfully built 664ef88bc1f4 βœ“ Model running in Docker image 664ef88bc1f4 Written output to output.png ``` To pass more inputs to the model, you can add more `-i` options: ``` $ cog predict -i image=@image.jpg -i scale=2.0 ``` In this case it is just a number, not a file, so you don't need the `@` prefix. ## Using GPUs To use GPUs with Cog, add the `gpu: true` option to the `build` section of your `cog.yaml`: ```yaml build: gpu: true ... ``` Cog will use the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) base image and automatically figure out what versions of CUDA and cuDNN to use based on the version of Python, PyTorch, and Tensorflow that you are using. For more details, [see the `gpu` section of the `cog.yaml` reference](yaml.md#gpu). ## Next steps Next, you might want to take a look at: - [A guide explaining how to deploy a model.](deploy.md) - [The reference for `cog.yaml`](yaml.md) - [The reference for the Python library](python.md) --- # Getting started This guide will walk you through what you can do with Cog by using an example model. > [!TIP] > Using a language model to help you write the code for your new Cog model? > > Feed it [https://cog.run/llms.txt](https://cog.run/llms.txt), which has all of Cog's documentation bundled into a single file. To learn more about this format, check out [llmstxt.org](https://llmstxt.org). ## Prerequisites - **macOS or Linux**. Cog works on macOS and Linux, but does not currently support Windows. - **Docker**. Cog uses Docker to create a container for your model. You'll need to [install Docker](https://docs.docker.com/get-docker/) before you can run Cog. 
## Install Cog **macOS (recommended):** ```bash brew install replicate/tap/cog ``` **Linux or macOS (manual):** ```bash sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m` sudo chmod +x /usr/local/bin/cog sudo xattr -d com.apple.quarantine /usr/local/bin/cog 2>/dev/null || true ``` > [!NOTE] > **macOS: "cannot be opened because the developer cannot be verified"** > > If you downloaded the binary manually (via `curl` or a browser) and see this Gatekeeper warning, run: > > ```bash > sudo xattr -d com.apple.quarantine /usr/local/bin/cog > ``` > > Installing via `brew install replicate/tap/cog` handles this automatically. ## Create a project Let's make a directory to work in: ```bash mkdir cog-quickstart cd cog-quickstart ``` ## Run commands The simplest thing you can do with Cog is run a command inside a Docker environment. The first thing you need to do is create a file called `cog.yaml`: ```yaml build: python_version: "3.13" ``` Then, you can run any command inside this environment. For example, enter ```bash cog run python ``` and you'll get an interactive Python shell: ```none βœ“ Building Docker image from cog.yaml... Successfully built 8f54020c8981 Running 'python' in Docker with the current directory mounted as a volume... ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Python 3.13.x (main, ...) [GCC 12.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> ``` (Hit Ctrl-D to exit the Python shell.) Inside this Docker environment you can do anything – run a Jupyter notebook, your training script, your evaluation script, and so on. ## Run predictions on a model Let's pretend we've trained a model. With Cog, we can define how to run predictions on it in a standard way, so other people can easily run predictions on it without having to hunt around for a prediction script. 
We need to write some code to describe how predictions are run on the model. Save this to `predict.py`: ```python import os os.environ["TORCH_HOME"] = "." import torch from cog import BasePredictor, Input, Path from PIL import Image from torchvision import models WEIGHTS = models.ResNet50_Weights.IMAGENET1K_V1 class Predictor(BasePredictor): def setup(self): """Load the model into memory to make running multiple predictions efficient""" self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") self.model = models.resnet50(weights=WEIGHTS).to(self.device) self.model.eval() def predict(self, image: Path = Input(description="Image to classify")) -> dict: """Run a single prediction on the model""" img = Image.open(image).convert("RGB") preds = self.model(WEIGHTS.transforms()(img).unsqueeze(0).to(self.device)) top3 = preds[0].softmax(0).topk(3) categories = WEIGHTS.meta["categories"] return {categories[i]: p.detach().item() for p, i in zip(*top3)} ``` We also need to point Cog at this, and tell it what Python dependencies to install. Save this to `requirements.txt`: ``` pillow==11.1.0 torch==2.6.0 torchvision==0.21.0 ``` Then update `cog.yaml` to look like this: ```yaml build: python_version: "3.13" python_requirements: requirements.txt predict: "predict.py:Predictor" ``` > [!TIP] > If you have a machine with an NVIDIA GPU attached, add `gpu: true` to the `build` section of your `cog.yaml` to enable GPU acceleration. Let's grab an image to test the model with: ```bash IMAGE_URL=https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg curl $IMAGE_URL > input.jpg ``` Now, let's run the model using Cog: ```bash cog predict -i image=@input.jpg ``` If you see the following output ```json { "tiger_cat": 0.4874822497367859, "tabby": 0.23169134557247162, "Egyptian_cat": 0.09728282690048218 } ``` then it worked! 
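Because the output is a plain JSON object mapping labels to probabilities, downstream code can consume it with the standard library alone. As a small sketch (using the sample scores shown above; `top_label` is an illustrative helper, not part of Cog):

```python
import json

# Sample response body from the ResNet classifier above
response_body = """
{
    "tiger_cat": 0.4874822497367859,
    "tabby": 0.23169134557247162,
    "Egyptian_cat": 0.09728282690048218
}
"""

def top_label(scores: dict) -> str:
    """Return the label with the highest probability."""
    return max(scores, key=scores.get)

scores = json.loads(response_body)
print(top_label(scores))  # tiger_cat
```

The same parsing works whether the JSON came from `cog predict` output or from the HTTP server's response.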
> [!NOTE]
> The first time you run `cog predict`, a Docker image that can run your model is built. The next time you run `cog predict`, the pre-built image is reused.

## Build an image

We can bake your model's code, the trained weights, and the Docker environment into a Docker image. This image serves predictions with an HTTP server, and can be deployed anywhere that Docker runs to serve real-time predictions.

```bash
cog build -t resnet
# Building Docker image...
# Built resnet:latest
```

You can run this image with `cog predict` by passing the image name as an argument:

```bash
cog predict resnet -i image=@input.jpg
```

Or, you can run it with Docker directly, and it'll start an HTTP server:

```bash
docker run -d --rm -p 5000:5000 resnet
```

We can send inputs directly with `curl`:

```bash
curl http://localhost:5000/predictions -X POST \
  -H 'Content-Type: application/json' \
  -d '{"input": {"image": "https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg"}}'
```

As a shorthand, you can add the Docker image's name as an extra line in `cog.yaml`:

```yaml
image: "r8.im/replicate/resnet"
```

Once you've done this, you can use `cog push` to build and push the image to a Docker registry:

```bash
cog push
# Building r8.im/replicate/resnet...
# Pushing r8.im/replicate/resnet...
# Pushed!
```

The Docker image is now accessible to anyone or any system that has access to this Docker registry.

## Next steps

Those are the basics! Next, you might want to take a look at:

- [A guide to help you set up your own model on Cog.](getting-started-own-model.md)
- [A guide explaining how to deploy a model.](deploy.md)
- [Reference for `cog.yaml`](yaml.md)
- [Reference for the Python library](python.md)

---

# HTTP API

> [!TIP]
> For information about how to run the HTTP server,
> see [our documentation on deploying models](deploy.md).
When you run a Docker image built by Cog, it serves an HTTP API for making predictions. The server supports both synchronous and asynchronous prediction creation:

- **Synchronous**: The server waits until the prediction is completed and responds with the result.
- **Asynchronous**: The server immediately returns a response and processes the prediction in the background.

The client can create a prediction asynchronously by setting the `Prefer: respond-async` header in their request. When provided, the server responds immediately after starting the prediction with `202 Accepted` status and a prediction object in status `processing`.

> [!NOTE]
> The only supported way to receive updates on the status of predictions
> started asynchronously is using [webhooks](#webhooks).
> Polling for prediction status is not currently supported.

You can also use certain server endpoints to create predictions idempotently, such that if a client calls this endpoint more than once with the same ID (for example, due to a network interruption) while the prediction is still running, no new prediction is created. Instead, the client receives a `202 Accepted` response with the initial state of the prediction.

Here's a summary of the prediction creation endpoints:

| Endpoint                           | Header                  | Behavior                     |
| ---------------------------------- | ----------------------- | ---------------------------- |
| `POST /predictions`                | -                       | Synchronous, non-idempotent  |
| `POST /predictions`                | `Prefer: respond-async` | Asynchronous, non-idempotent |
| `PUT /predictions/<prediction_id>` | -                       | Synchronous, idempotent      |
| `PUT /predictions/<prediction_id>` | `Prefer: respond-async` | Asynchronous, idempotent     |

Choose the endpoint that best fits your needs:

- Use synchronous endpoints when you want to wait for the prediction result.
- Use asynchronous endpoints when you want to start a prediction and receive updates via webhooks.
- Use idempotent endpoints when you need to safely retry requests without creating duplicate predictions.
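As a sketch of how a client might target these variants, the helper below builds the appropriate method, path, and headers for each row of the table. It is illustrative only and not part of Cog; the helper name, server URL, and prediction ID are assumptions:

```python
import json
import urllib.request

def build_prediction_request(base_url, input_payload, prediction_id=None, wait=True):
    """Build a urllib Request for one of the four prediction-creation variants.

    Illustrative helper, not part of Cog. base_url and prediction_id are
    caller-supplied assumptions.
    """
    if prediction_id is not None:
        # Idempotent variants put the client-chosen ID in the path and use PUT.
        url = f"{base_url}/predictions/{prediction_id}"
        method = "PUT"
    else:
        url = f"{base_url}/predictions"
        method = "POST"
    headers = {"Content-Type": "application/json"}
    if not wait:
        # Asynchronous variants: the server replies 202 Accepted immediately
        # and delivers results via webhooks.
        headers["Prefer"] = "respond-async"
    body = json.dumps({"input": input_payload}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method=method)

# Asynchronous, idempotent request (would be sent with urllib.request.urlopen):
req = build_prediction_request(
    "http://localhost:5000",
    {"prompt": "A picture of an onion with sunglasses"},
    prediction_id="wjx3whax6rf4vphkegkhcvpv6a",
    wait=False,
)
```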
## Webhooks You can provide a `webhook` parameter in the client request body when creating a prediction. ```http POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 Prefer: respond-async { "input": {"prompt": "A picture of an onion with sunglasses"}, "webhook": "https://example.com/webhook/prediction" } ``` The server makes requests to the provided URL with the current state of the prediction object in the request body at the following times. - `start`: Once, when the prediction starts (`status` is `starting`). - `output`: Each time a predict function generates an output (either once using `return` or multiple times using `yield`) - `logs`: Each time the predict function writes to `stdout` - `completed`: Once, when the prediction reaches a terminal state (`status` is `succeeded`, `canceled`, or `failed`) Webhook requests for `start` and `completed` event types are sent immediately. Webhook requests for `output` and `logs` event types are sent at most once every 500ms. This interval is not configurable. By default, the server sends requests for all event types. Clients can specify which events trigger webhook requests with the `webhook_events_filter` parameter in the prediction request body. For example, the following request specifies that webhooks are sent by the server only at the start and end of the prediction: ```http POST /predictions HTTP/1.1 Content-Type: application/json; charset=utf-8 Prefer: respond-async { "input": {"prompt": "A picture of an onion with sunglasses"}, "webhook": "https://example.com/webhook/prediction", "webhook_events_filter": ["start", "completed"] } ``` ## Generating unique prediction IDs Endpoints for creating and canceling a prediction idempotently accept a `prediction_id` parameter in their path. By default, the server runs one prediction at a time, but this can be increased with the [`concurrency.max`](yaml.md#concurrency) setting. When all prediction slots are in use, the server returns `409 Conflict`. 
The client should ensure prediction slots are available before creating a new prediction with a different ID.

Clients are responsible for providing unique prediction IDs. We recommend generating a UUIDv4 or [UUIDv7](https://uuid7.com), base32-encoding that value, and removing the `=` padding characters. This produces a random identifier that is 26 ASCII characters long.

```python
>>> from uuid import uuid4
>>> from base64 import b32encode
>>> b32encode(uuid4().bytes).decode('utf-8').lower().rstrip('=')
'wjx3whax6rf4vphkegkhcvpv6a'
```

## File uploads

A model's `predict` function can produce file output by yielding or returning a `cog.Path` or `cog.File` value.

By default, files are returned as a base64-encoded [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs).

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,..."
}
```

When creating a prediction synchronously, the client can configure a base URL to upload output files to instead by setting the `output_file_prefix` parameter in the request body:

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"},
    "output_file_prefix": "https://example.com/upload"
}
```

When the model produces a file output, the server sends the following request to upload the file to the configured URL:

```http
PUT /upload HTTP/1.1
Host: example.com
Content-Type: multipart/form-data

--boundary
Content-Disposition: form-data; name="file"; filename="image.png"
Content-Type: image/png

--boundary--
```

If the upload succeeds, the server responds with the uploaded file's URL as the output:

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "https://example.com/upload/image.png"
}
```

If the upload fails, the server responds with an error.

> [!IMPORTANT]
> File uploads for predictions created asynchronously
> require `--upload-url` to be specified when starting the HTTP server.

## Endpoints

### `GET /`

Returns a discovery document listing available API endpoints, the OpenAPI schema URL, and version information.

```http
GET / HTTP/1.1
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "cog_version": "0.17.0",
    "docs_url": "/docs",
    "openapi_url": "/openapi.json",
    "shutdown_url": "/shutdown",
    "healthcheck_url": "/health-check",
    "predictions_url": "/predictions",
    "predictions_idempotent_url": "/predictions/{prediction_id}",
    "predictions_cancel_url": "/predictions/{prediction_id}/cancel"
}
```

If training is configured, the response also includes `trainings_url`, `trainings_idempotent_url`, and `trainings_cancel_url` fields.

### `GET /health-check`

Returns the current health status of the model container. This endpoint always responds with `200 OK` — check the `status` field in the response body to determine readiness.
The response body is a JSON object with the following fields: - `status`: One of the following values: - `STARTING`: The model's `setup()` method is still running. - `READY`: The model is ready to accept predictions. - `BUSY`: The model is ready but all prediction slots are in use. - `SETUP_FAILED`: The model's `setup()` method raised an exception. - `DEFUNCT`: The model encountered an unrecoverable error. - `UNHEALTHY`: The model is ready but a user-defined `healthcheck()` method returned `False`. - `setup`: Setup phase details (included once setup has started): - `started_at`: ISO 8601 timestamp of when setup began. - `completed_at`: ISO 8601 timestamp of when setup finished (if complete). - `status`: One of `starting`, `succeeded`, or `failed`. - `logs`: Output captured during setup. - `version`: Runtime version information: - `coglet`: Coglet version. - `cog`: Cog Python SDK version (if available). - `python`: Python version (if available). - `user_healthcheck_error`: Error message from a user-defined `healthcheck()` method (if applicable). ```http GET /health-check HTTP/1.1 ``` ```http HTTP/1.1 200 OK Content-Type: application/json { "status": "READY", "setup": { "started_at": "2025-01-01T00:00:00.000000+00:00", "completed_at": "2025-01-01T00:00:05.000000+00:00", "status": "succeeded", "logs": "" }, "version": { "coglet": "0.17.0", "cog": "0.14.0", "python": "3.13.0" } } ``` ### `GET /openapi.json` The [OpenAPI](https://swagger.io/specification/) specification of the API, which is derived from the input and output types specified in your model's [Predictor](python.md) and [Training](training.md) objects. ### `POST /predictions` Makes a single prediction. The request body is a JSON object with the following fields: - `input`: A JSON object with the same keys as the [arguments to the `predict()` function](python.md). Any `File` or `Path` inputs are passed as URLs. 
The response body is a JSON object with the following fields:

- `status`: Either `succeeded` or `failed`.
- `output`: The return value of the `predict()` function.
- `error`: If `status` is `failed`, the error message.
- `metrics`: An object containing prediction metrics. Always includes `predict_time` (elapsed seconds). May also include custom metrics recorded by the model using [`self.record_metric()`](python.md#metrics).

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {
        "image": "https://example.com/image.jpg",
        "text": "Hello world!"
    }
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,...",
    "metrics": {
        "predict_time": 4.52
    }
}
```

If the client sets the `Prefer: respond-async` header in their request, the server responds immediately after starting the prediction with `202 Accepted` status and a prediction object in status `processing`.

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

```http
HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "status": "starting"
}
```

### `PUT /predictions/<prediction_id>`

Make a single prediction. This is the idempotent version of the `POST /predictions` endpoint.

```http
PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

```http
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,..."
}
```

If the client sets the `Prefer: respond-async` header in their request, the server responds immediately after starting the prediction with `202 Accepted` status and a prediction object in status `processing`.
```http
PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

```http
HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "id": "wjx3whax6rf4vphkegkhcvpv6a",
    "status": "starting"
}
```

### `POST /predictions/<prediction_id>/cancel`

A client can cancel an asynchronous prediction by making a `POST /predictions/<prediction_id>/cancel` request using the prediction `id` provided when the prediction was created.

For example, if the client creates a prediction by sending the request:

```http
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "id": "abcd1234",
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
```

The client can cancel the prediction by sending the request:

```http
POST /predictions/abcd1234/cancel HTTP/1.1
```

A prediction cannot be canceled if it was created synchronously (without the `Prefer: respond-async` header) or without a provided `id`.

If a prediction exists with the provided `id`, the server responds with status `200 OK`. Otherwise, the server responds with status `404 Not Found`.

When a prediction is canceled, Cog raises [`CancelationException`](python.md#cancelationexception) in sync predictors (or `asyncio.CancelledError` in async predictors). This exception may be caught by the model to perform necessary cleanup. The cleanup should be brief, ideally completing within a few seconds. After cleanup, the exception must be re-raised using a bare `raise` statement. Failure to re-raise the exception may result in the termination of the container.
```python from cog import BasePredictor, CancelationException, Input, Path class Predictor(BasePredictor): def predict(self, image: Path = Input(description="Image to process")) -> Path: try: return self.process(image) except CancelationException: self.cleanup() raise # always re-raise ``` --- # Notebooks Cog plays nicely with Jupyter notebooks. ## Install the jupyterlab Python package First, add `jupyterlab` to your `requirements.txt` file and reference it in [`cog.yaml`](yaml.md): `requirements.txt`: ``` jupyterlab ``` `cog.yaml`: ```yaml build: python_requirements: requirements.txt ``` ## Run a notebook Cog can run notebooks in the environment you've defined in `cog.yaml` with the following command: ```sh cog run -p 8888 jupyter lab --allow-root --ip=0.0.0.0 ``` ## Use notebook code in your predictor You can also import a notebook into your Cog [Predictor](python.md) file. First, export your notebook to a Python file: ```sh jupyter nbconvert --to script my_notebook.ipynb # creates my_notebook.py ``` Then import the exported Python script into your `predict.py` file. Any functions or variables defined in your notebook will be available to your predictor: ```python from cog import BasePredictor, Input import my_notebook class Predictor(BasePredictor): def predict(self, prompt: str = Input(description="string prompt")) -> str: output = my_notebook.do_stuff(prompt) return output ``` --- # Private package registry This guide describes how to build a Docker image with Cog that fetches Python packages from a private registry during setup. ## `pip.conf` In a directory outside your Cog project, create a `pip.conf` file with an `index-url` set to the registry's URL with embedded credentials. ```conf [global] index-url = https://username:password@my-private-registry.com ``` > **Warning** > Be careful not to commit secrets in Git or include them in Docker images. If your Cog project contains any sensitive files, make sure they're listed in `.gitignore` and `.dockerignore`. 
## `cog.yaml` In your project's [`cog.yaml`](yaml.md) file, add a setup command to run `pip install` with a secret configuration file mounted to `/etc/pip.conf`. ```yaml build: run: - command: pip install mounts: - type: secret id: pip target: /etc/pip.conf ``` ## Build When building or pushing your model with Cog, pass the `--secret` option with an `id` matching the one specified in `cog.yaml`, along with a path to your local `pip.conf` file. ```console $ cog build --secret id=pip,source=/path/to/pip.conf ``` Using a secret mount allows the private registry credentials to be securely passed to the `pip install` setup command, without baking them into the Docker image. > **Warning** > If you run `cog build` or `cog push` and then change the contents of a secret source file, the cached version of the file will be used on subsequent builds, ignoring any changes you made. To update the contents of the target secret file, either change the `id` value in `cog.yaml` and the `--secret` option, or pass the `--no-cache` option to bypass the cache entirely. --- # Prediction interface reference This document defines the API of the `cog` Python module, which is used to define the interface for running predictions on your model. > [!TIP] > Run [`cog init`](getting-started-own-model.md#initialization) to generate an annotated `predict.py` file that can be used as a starting point for setting up your model. > [!TIP] > Using a language model to help you write the code for your new Cog model? > > Feed it [https://cog.run/llms.txt](https://cog.run/llms.txt), which has all of Cog's documentation bundled into a single file. To learn more about this format, check out [llmstxt.org](https://llmstxt.org). 
## Contents - [Contents](#contents) - [`BasePredictor`](#basepredictor) - [`Predictor.setup()`](#predictorsetup) - [`Predictor.predict(**kwargs)`](#predictorpredictkwargs) - [`async` predictors and concurrency](#async-predictors-and-concurrency) - [`Input(**kwargs)`](#inputkwargs) - [Deprecating inputs](#deprecating-inputs) - [Output](#output) - [Returning an object](#returning-an-object) - [Returning a list](#returning-a-list) - [Optional properties](#optional-properties) - [Streaming output](#streaming-output) - [Metrics](#metrics) - [Recording metrics](#recording-metrics) - [Accumulation modes](#accumulation-modes) - [Dot-path keys](#dot-path-keys) - [Type safety](#type-safety) - [Cancellation](#cancellation) - [`CancelationException`](#cancelationexception) - [Input and output types](#input-and-output-types) - [Primitive types](#primitive-types) - [`cog.Path`](#cogpath) - [`cog.File` (deprecated)](#cogfile-deprecated) - [`cog.Secret`](#cogsecret) - [Wrapper types](#wrapper-types) - [`Optional`](#optional) - [`list`](#list) - [`dict`](#dict) - [Structured output with `BaseModel`](#structured-output-with-basemodel) - [Using `cog.BaseModel`](#using-cogbasemodel) - [Using Pydantic `BaseModel`](#using-pydantic-basemodel) - [`BaseModel` field types](#basemodel-field-types) - [Type limitations](#type-limitations) ## `BasePredictor` You define how Cog runs predictions on your model by defining a class that inherits from `BasePredictor`. It looks something like this: ```python from cog import BasePredictor, Path, Input import torch class Predictor(BasePredictor): def setup(self): """Load the model into memory to make running multiple predictions efficient""" self.model = torch.load("weights.pth") def predict(self, image: Path = Input(description="Image to enlarge"), scale: float = Input(description="Factor to scale image by", default=1.5) ) -> Path: """Run a single prediction on the model""" # ... pre-processing ... output = self.model(image) # ... post-processing ... 
return output ``` Your Predictor class should define two methods: `setup()` and `predict()`. ### `Predictor.setup()` Prepare the model so multiple predictions run efficiently. Use this _optional_ method to include expensive one-off operations like loading trained models, instantiating data transformations, etc. Many models use this method to download their weights (e.g. using [`pget`](https://github.com/replicate/pget)). This has some advantages: - Smaller image sizes - Faster build times - Faster pushes and inference on [Replicate](https://replicate.com) However, this may also significantly increase your `setup()` time. As an alternative, some choose to store their weights directly in the image. You can simply leave your weights in the directory alongside your `cog.yaml` and ensure they are not excluded in your `.dockerignore` file. While this will increase your image size and build time, it offers other advantages: - Faster `setup()` time - Ensures idempotency and reduces your model's reliance on external systems - Preserves reproducibility as your model will be self-contained in the image > When using this method, you should use the `--separate-weights` flag on `cog build` to store weights in a [separate layer](https://github.com/replicate/cog/blob/12ac02091d93beebebed037f38a0c99cd8749806/docs/getting-started.md?plain=1#L219). ### `Predictor.predict(**kwargs)` Run a single prediction. This _required_ method is where you call the model that was loaded during `setup()`, but you may also want to add pre- and post-processing code here. The `predict()` method takes an arbitrary list of named arguments, where each argument name must correspond to an [`Input()`](#inputkwargs) annotation. `predict()` can return strings, numbers, [`cog.Path`](#cogpath) objects representing files on disk, or lists or dicts of those types. You can also define a custom [`BaseModel`](#structured-output-with-basemodel) for structured return types. 
See [Input and output types](#input-and-output-types) for the full list of supported types.

## `async` predictors and concurrency

> Added in cog 0.14.0.

You may specify your `predict()` method as `async def predict(...)`. In addition, if you have an async `predict()` function you may also have an async `setup()` function:

```py
class Predictor(BasePredictor):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def predict(self) -> str:
        print("async predict")
        return "hello world"
```

Models that have an async `predict()` function can run predictions concurrently, up to the limit specified by [`concurrency.max`](yaml.md#max) in cog.yaml. Attempting to exceed this limit will return a `409 Conflict` response.

## `Input(**kwargs)`

Use cog's `Input()` function to define each of the parameters in your `predict()` method:

```py
class Predictor(BasePredictor):
    def predict(self,
        image: Path = Input(description="Image to enlarge"),
        scale: float = Input(description="Factor to scale image by", default=1.5, ge=1.0, le=10.0)
    ) -> Path:
```

The `Input()` function takes these keyword arguments:

- `description`: A description of what to pass to this input for users of the model.
- `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
- `ge`: For `int` or `float` types, the value must be greater than or equal to this number.
- `le`: For `int` or `float` types, the value must be less than or equal to this number.
- `min_length`: For `str` types, the minimum length of the string.
- `max_length`: For `str` types, the maximum length of the string.
- `regex`: For `str` types, the string must match this regular expression.
- `choices`: For `str` or `int` types, a list of possible values for this input.
- `deprecated`: (optional) If set to `True`, marks this input as deprecated.
  Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future. See [Deprecating inputs](#deprecating-inputs).

Each parameter of the `predict()` method must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](#input-and-output-types) for the full list of supported types.

Using the `Input` function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit the default entirely for a required input. Note that parameters without defaults must come before those with defaults:

```py
class Predictor(BasePredictor):
    def predict(self,
        iterations: int,                 # a required input, no Input() needed
        prompt: str = "default prompt",  # a plain Python default is also valid
    ) -> str:
        # ...
```

## Deprecating inputs

You can mark an input as deprecated by passing `deprecated=True` to the `Input()` function. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future. This is useful when you want to phase out an input without breaking existing clients immediately:

```py
from cog import BasePredictor, Input

class Predictor(BasePredictor):
    def predict(self,
        text: str = Input(description="Some deprecated text", deprecated=True),
        prompt: str = Input(description="Prompt for the model")
    ) -> str:
        # ...
        return prompt
```

## Output

Cog predictors can return a simple data type like a string, integer, float, or boolean. Use Python's `-> <type>` syntax to annotate the return type.
Here's an example of a predictor that returns a string:

```py
from cog import BasePredictor

class Predictor(BasePredictor):
    def predict(self) -> str:
        return "hello"
```

### Returning an object

To return a complex object with multiple values, define an `Output` object with multiple fields to return from your `predict()` method:

```py
import io

from cog import BasePredictor, BaseModel, File

class Output(BaseModel):
    file: File
    text: str

class Predictor(BasePredictor):
    def predict(self) -> Output:
        return Output(text="hello", file=io.StringIO("hello"))
```

Each of the output object's properties must be one of the supported output types. For the full list, see [Input and output types](#input-and-output-types).

### Returning a list

The `predict()` method can return a list of any of the supported output types. Here's an example that outputs multiple files:

```py
from cog import BasePredictor, Path

class Predictor(BasePredictor):
    def predict(self) -> list[Path]:
        predictions = ["foo", "bar", "baz"]
        output = []
        for i, prediction in enumerate(predictions):
            out_path = Path(f"/tmp/out-{i}.txt")
            with out_path.open("w") as f:
                f.write(prediction)
            output.append(out_path)
        return output
```

Files are named in the format `output.<index>.<extension>`, e.g. `output.0.txt`, `output.1.txt`, and `output.2.txt` from the example above.

### Optional properties

To conditionally omit properties from the Output object, define them using `typing.Optional`:

```py
import tempfile
from typing import Optional

from cog import BaseModel, BasePredictor, Path

class Output(BaseModel):
    score: Optional[float] = None
    file: Optional[Path] = None

class Predictor(BasePredictor):
    def predict(self) -> Output:
        if condition:
            return Output(score=1.5)
        out_path = Path(tempfile.mkdtemp()) / "hello.txt"
        out_path.write_text("hello")
        return Output(file=out_path)
```

### Streaming output

Cog models can stream output as the `predict()` method is running. For example, a language model can output tokens as they're being generated and an image generation model can output images as they are being generated.
To support streaming output in your Cog model, add `from typing import Iterator` to your predict.py file. The `typing` package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the `predict()` method in the form `-> Iterator[<type>]` where `<type>` can be one of `str`, `int`, `float`, `bool`, or `cog.Path`.

```py
from cog import BasePredictor, Path
from typing import Iterator

class Predictor(BasePredictor):
    def predict(self) -> Iterator[Path]:
        done = False
        while not done:
            output_path, done = do_stuff()
            yield Path(output_path)
```

If you have an [async `predict()` method](#async-predictors-and-concurrency), use `AsyncIterator` from the `typing` module:

```py
from typing import AsyncIterator
from cog import BasePredictor, Path

class Predictor(BasePredictor):
    async def predict(self) -> AsyncIterator[Path]:
        done = False
        while not done:
            output_path, done = do_stuff()
            yield Path(output_path)
```

If you're streaming text output, you can use `ConcatenateIterator` to hint that the output should be concatenated together into a single string. This is useful on Replicate to display the output as a string instead of a list of strings.

```py
from cog import BasePredictor, ConcatenateIterator

class Predictor(BasePredictor):
    def predict(self) -> ConcatenateIterator[str]:
        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
        for token in tokens:
            yield token + " "
```

Or for async `predict()` methods, use `AsyncConcatenateIterator`:

```py
from cog import BasePredictor, AsyncConcatenateIterator

class Predictor(BasePredictor):
    async def predict(self) -> AsyncConcatenateIterator[str]:
        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
        for token in tokens:
            yield token + " "
```

## Metrics

You can record custom metrics from your `predict()` function to track model-specific data like token counts, timing breakdowns, or confidence scores.
Metrics are included in the prediction response alongside the output. ### Recording metrics Use `self.record_metric()` inside your `predict()` method: ```python from cog import BasePredictor class Predictor(BasePredictor): def predict(self, prompt: str) -> str: self.record_metric("temperature", 0.7) self.record_metric("token_count", 42) result = self.model.generate(prompt) return result ``` For advanced use (dict-style access, deleting metrics), use `self.scope`: ```python self.scope.metrics["token_count"] = 42 del self.scope.metrics["token_count"] ``` Metrics appear in the prediction response `metrics` field: ```json { "status": "succeeded", "output": "...", "metrics": { "temperature": 0.7, "token_count": 42, "predict_time": 1.23 } } ``` The `predict_time` metric is always added automatically by the runtime. If you set `predict_time` yourself, the runtime value takes precedence. Supported value types are `bool`, `int`, `float`, `str`, `list`, and `dict`. Setting a metric to `None` deletes it. ### Accumulation modes By default, recording a metric replaces any previous value for that key. You can use accumulation modes to build up values across multiple calls: ```python # Increment a counter (adds to the existing numeric value) self.record_metric("token_count", 1, mode="incr") self.record_metric("token_count", 1, mode="incr") # Result: {"token_count": 2} # Append to an array self.record_metric("steps", "preprocessing", mode="append") self.record_metric("steps", "inference", mode="append") # Result: {"steps": ["preprocessing", "inference"]} # Replace (default behavior) self.record_metric("status", "running", mode="replace") self.record_metric("status", "done", mode="replace") # Result: {"status": "done"} ``` The `mode` parameter accepts `"replace"` (default), `"incr"`, or `"append"`. 
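To make the three modes concrete, here is a toy stand-in for the accumulation logic written with a plain dict. This is only an illustration of the behavior described above, not Cog's actual implementation:

```python
# Toy model of the accumulation modes; illustration only, not Cog's implementation.
def record_metric(metrics: dict, key: str, value, mode: str = "replace") -> None:
    if mode == "replace":
        metrics[key] = value
    elif mode == "incr":
        # Adds to the existing numeric value, starting from 0.
        metrics[key] = metrics.get(key, 0) + value
    elif mode == "append":
        # Collects values into a list.
        metrics.setdefault(key, []).append(value)
    else:
        raise ValueError(f"unknown mode: {mode}")

metrics = {}
record_metric(metrics, "token_count", 1, mode="incr")
record_metric(metrics, "token_count", 1, mode="incr")
record_metric(metrics, "steps", "preprocessing", mode="append")
record_metric(metrics, "steps", "inference", mode="append")
record_metric(metrics, "status", "running")
record_metric(metrics, "status", "done")
# metrics == {"token_count": 2, "steps": ["preprocessing", "inference"], "status": "done"}
```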
### Dot-path keys

Use dot-separated keys to create nested objects in the metrics output:

```python
self.record_metric("timing.preprocess", 0.12)
self.record_metric("timing.inference", 0.85)
```

This produces nested JSON:

```json
{
    "metrics": {
        "timing": {
            "preprocess": 0.12,
            "inference": 0.85
        },
        "predict_time": 1.23
    }
}
```

### Type safety

Once a metric key has been assigned a value of a certain type, it cannot be changed to a different type without deleting it first. This prevents accidental type mismatches when using accumulation modes:

```python
self.record_metric("count", 1)

# This would raise an error — "count" is an int, not a string:
# self.record_metric("count", "oops")

# Delete first, then set with new type:
del self.scope.metrics["count"]
self.record_metric("count", "now a string")
```

Outside an active prediction, `self.record_metric()` and `self.scope` are silent no-ops — no need for `None` checks.

## Cancellation

When a prediction is canceled (via the [cancel HTTP endpoint](http.md#post-predictionsprediction_idcancel) or a dropped connection), the Cog runtime interrupts the running `predict()` function. The exception raised depends on whether the predictor is sync or async:

| Predictor type              | Exception raised         |
| --------------------------- | ------------------------ |
| Sync (`def predict`)        | `CancelationException`   |
| Async (`async def predict`) | `asyncio.CancelledError` |

### `CancelationException`

```python
from cog import CancelationException
```

`CancelationException` is raised in **sync** predictors when a prediction is canceled. It is a `BaseException` subclass — **not** an `Exception` subclass. This means bare `except Exception` blocks in your predict code will not accidentally catch it, matching the behavior of `KeyboardInterrupt` and `asyncio.CancelledError`.

You do **not** need to handle this exception in normal predictor code — the runtime manages cancellation automatically.
However, if you need to run cleanup logic when a prediction is canceled, you can catch it explicitly:

```python
from cog import BasePredictor, CancelationException, Path


class Predictor(BasePredictor):
    def predict(self, image: Path) -> Path:
        try:
            return self.process(image)
        except CancelationException:
            self.cleanup()
            raise  # always re-raise
```

> [!WARNING]
> You **must** re-raise `CancelationException` after cleanup. Swallowing it will prevent the runtime from marking the prediction as canceled, and may result in the termination of the container.

`CancelationException` is available as:

- `cog.CancelationException` (recommended)
- `cog.exceptions.CancelationException`

For **async** predictors, cancellation follows standard Python async conventions and raises `asyncio.CancelledError` instead.

## Input and output types

Each parameter of the `predict()` method must be annotated with a type. The method's return type must also be annotated.

### Primitive types

These types can be used directly as input parameter types and output return types:

| Type | Description | JSON Schema |
|------|-------------|-------------|
| `str` | A string | `string` |
| `int` | An integer | `integer` |
| `float` | A floating-point number | `number` |
| `bool` | A boolean | `boolean` |
| [`cog.Path`](#cogpath) | A path to a file on disk | `string` (format: `uri`) |
| [`cog.File`](#cogfile-deprecated) | A file-like object (deprecated) | `string` (format: `uri`) |
| [`cog.Secret`](#cogsecret) | A string containing sensitive information | `string` (format: `password`) |

### `cog.Path`

`cog.Path` is used to get files in and out of models. It represents a _path to a file on disk_.

`cog.Path` is a subclass of Python's [`pathlib.Path`](https://docs.python.org/3/library/pathlib.html#basic-use) and can be used as a drop-in replacement. Any `os.PathLike` subclass is also accepted as an input type and treated as `cog.Path`.
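Because `cog.Path` subclasses `pathlib.Path`, anything you would do with a `pathlib.Path` works on a Cog input unchanged. The sketch below uses plain `pathlib.Path` so it runs without Cog installed; a `cog.Path` input behaves identically:

```python
import tempfile
from pathlib import Path  # cog.Path is a drop-in subclass of this

# Simulate an input file, as if Cog had already downloaded it for us.
input_path = Path(tempfile.mkdtemp()) / "input.txt"
input_path.write_text("hello")

# Standard pathlib methods work on the input directly.
assert input_path.suffix == ".txt"
assert input_path.stem == "input"
assert input_path.read_text() == "hello"
assert input_path.exists()
```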
For models that return a `cog.Path` object, the prediction output returned by Cog's built-in HTTP server will be a URL. This example takes an input file, resizes it, and returns the resized image: ```python import tempfile from cog import BasePredictor, Input, Path class Predictor(BasePredictor): def predict(self, image: Path = Input(description="Image to enlarge")) -> Path: upscaled_image = do_some_processing(image) # To output cog.Path objects the file needs to exist, so create a temporary file first. # This file will automatically be deleted by Cog after it has been returned. output_path = Path(tempfile.mkdtemp()) / "upscaled.png" upscaled_image.save(output_path) return Path(output_path) ``` ### `cog.File` (deprecated) > [!WARNING] > `cog.File` is deprecated and will be removed in a future version of Cog. Use [`cog.Path`](#cogpath) instead. `cog.File` represents a _file handle_. For models that return a `cog.File` object, the prediction output returned by Cog's built-in HTTP server will be a URL. ```python from cog import BasePredictor, File, Input from PIL import Image class Predictor(BasePredictor): def predict(self, source_image: File = Input(description="Image to enlarge")) -> File: pillow_img = Image.open(source_image) upscaled_image = do_some_processing(pillow_img) return File(upscaled_image) ``` ### `cog.Secret` `cog.Secret` signifies that an input holds sensitive information like a password or API token. `cog.Secret` redacts its contents in string representations to prevent accidental disclosure. Access the underlying value with `get_secret_value()`. ```python from cog import BasePredictor, Secret class Predictor(BasePredictor): def predict(self, api_token: Secret) -> None: # Prints '**********' print(api_token) # Use get_secret_value method to see the secret's content. 
        print(api_token.get_secret_value())
```

A predictor's `Secret` inputs are represented in OpenAPI with the following schema:

```json
{
  "type": "string",
  "format": "password",
  "x-cog-secret": true
}
```

Replicate treats secret inputs specially throughout its system. When you create a prediction on Replicate, any value passed to a `Secret` input is redacted after being sent to the model.

> [!WARNING]
> Passing secret values to untrusted models can result in
> unintended disclosure, exfiltration, or misuse of sensitive data.

### Wrapper types

Cog supports wrapper types that modify how a primitive type is treated.

#### `Optional`

Use `Optional[T]` or `T | None` (Python 3.10+) to mark an input as optional. Optional inputs default to `None` if not provided.

```python
from typing import Optional

from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def predict(self,
        prompt: Optional[str] = Input(description="Input prompt"),
        seed: int | None = Input(description="Random seed", default=None),
    ) -> str:
        if prompt is None:
            return "hello"
        return "hello " + prompt
```

Prefer `Optional[T]` or `T | None` over `str = Input(default=None)` for inputs that can be `None`. This lets type checkers warn about error-prone `None` values:

```python
# Bad: type annotation says str but value can be None
def predict(self, prompt: str = Input(default=None)) -> str:
    return "hello" + prompt  # TypeError at runtime if prompt is None

# Good: type annotation matches actual behavior
def predict(self, prompt: Optional[str] = Input(description="prompt")) -> str:
    if prompt is None:
        return "hello"
    return "hello " + prompt
```

> [!NOTE]
> `Optional[T]` is supported in `BaseModel` output fields but **not** as a top-level return type. Use a `BaseModel` with optional fields instead.

#### `list`

Use `list[T]` or `List[T]` to accept or return a list of values. `T` can be a supported Cog type, but nested container types are not supported.
**As an input type:**

```py
from cog import BasePredictor, Path


class Predictor(BasePredictor):
    def predict(self, paths: list[Path]) -> str:
        output_parts = []
        for path in paths:
            with open(path) as f:
                output_parts.append(f.read())
        return "".join(output_parts)
```

With `cog predict`, repeat the input name to pass multiple values:

```bash
$ echo test1 > 1.txt
$ echo test2 > 2.txt
$ cog predict -i paths=@1.txt -i paths=@2.txt
```

**As an output type:**

```py
from cog import BasePredictor, Path


class Predictor(BasePredictor):
    def predict(self) -> list[Path]:
        predictions = ["foo", "bar", "baz"]
        output = []
        for i, prediction in enumerate(predictions):
            out_path = Path(f"/tmp/out-{i}.txt")
            with out_path.open("w") as f:
                f.write(prediction)
            output.append(out_path)
        return output
```

Files are named in the format `output.<index>.<extension>`, e.g. `output.0.txt`, `output.1.txt`, `output.2.txt`.

#### `dict`

Use `dict` to accept or return an opaque JSON object. The value is passed through as-is without type validation.

```python
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def predict(self,
        params: dict = Input(description="Arbitrary JSON parameters"),
    ) -> dict:
        return {"greeting": "hello", "params": params}
```

> [!NOTE]
> `dict` inputs and outputs are represented as `{"type": "object"}` in the OpenAPI schema with no additional structure. For structured data with validated fields, use a [`BaseModel`](#structured-output-with-basemodel) instead.

### Structured output with `BaseModel`

To return a complex object with multiple typed fields, define a class that inherits from `cog.BaseModel` or Pydantic's `BaseModel` and use it as your return type.

#### Using `cog.BaseModel`

`cog.BaseModel` subclasses are automatically converted to Python dataclasses.
Define fields using standard type annotations:

```python
from typing import Optional

from cog import BasePredictor, BaseModel, Path


class Output(BaseModel):
    text: str
    confidence: float
    image: Optional[Path]


class Predictor(BasePredictor):
    def predict(self, prompt: str) -> Output:
        result = self.model.generate(prompt)
        return Output(
            text=result.text,
            confidence=result.score,
            image=None,
        )
```

The output class can have any name; it does not need to be called `Output`:

```python
from typing import Optional

from cog import BaseModel, Path


class SegmentationResult(BaseModel):
    success: bool
    error: Optional[str]
    segmented_image: Optional[Path]
```

#### Using Pydantic `BaseModel`

If you already use Pydantic v2 in your model, you can use a Pydantic `BaseModel` subclass directly as the output type:

```python
from pydantic import BaseModel as PydanticBaseModel

from cog import BasePredictor


class Result(PydanticBaseModel):
    name: str
    score: float
    tags: list[str]


class Predictor(BasePredictor):
    def predict(self, prompt: str) -> Result:
        return Result(name="example", score=0.95, tags=["fast", "accurate"])
```

#### `BaseModel` field types

Fields in a `BaseModel` output support these types:

| Type | Example |
|------|---------|
| `str`, `int`, `float`, `bool` | `score: float` |
| `cog.Path` | `image: Path` |
| `cog.File` | `data: File` (deprecated) |
| `cog.Secret` | `token: Secret` |
| `Optional[T]` | `error: Optional[str]` |
| `list[T]` | `tags: list[str]` |

### Type limitations

The following type patterns are **not** supported:

- **Nested generics**: `list[list[str]]`, `list[Optional[str]]`, and `Optional[list[str]]` are not supported.
- **Union types beyond Optional**: `str | int` and `Union[str, int, None]` are not supported; only `Optional[T]` (i.e. `T | None`) is allowed.
- **`Optional` as a top-level return type**: `-> Optional[str]` is not allowed. Use a `BaseModel` with optional fields instead.
- **Nested `BaseModel` fields**: A `BaseModel` field typed as another `BaseModel` is not supported in Cog's type system for schema generation.
- **Tuple, Set, or other collection types**: Only `list` and `dict` are supported as collection types.

---

# Training interface reference

> [!WARNING]
> The `cog train` command is deprecated and will be removed in the next version of Cog. The training API described below may still be used with the HTTP API's `/trainings` endpoint, but the CLI command is no longer recommended for new projects.

Cog's training API allows you to define a fine-tuning interface for an existing Cog model, so users of the model can bring their own training data to create derivative fine-tuned models. Real-world examples of this API in use include [fine-tuning SDXL with images](https://replicate.com/blog/fine-tune-sdxl) or [fine-tuning Llama 2 with structured text](https://replicate.com/blog/fine-tune-llama-2).

## How it works

If you've used Cog before, you've probably seen the [Predictor](./python.md) class, which defines the interface for creating predictions against your model. Cog's training API works similarly: you define a Python function that describes the inputs and outputs of the training process. The inputs are things like training data, epochs, batch size, seed, etc. The output is typically a file with the fine-tuned weights.

`cog.yaml`:

```yaml
build:
  python_version: "3.13"
train: "train.py:train"
```

`train.py`:

```python
import io

from cog import File


def train(param: str) -> File:
    return io.StringIO("hello " + param)
```

Then you can run it like this:

```console
$ cog train -i param=train
...
$ cat weights
hello train
```

You can also use classes if you want to run many model trainings and save on setup time. This works the same way as the [Predictor](./python.md) class, with the only difference being the `train` method.
`cog.yaml`:

```yaml
build:
  python_version: "3.13"
train: "train.py:Trainer"
```

`train.py`:

```python
from cog import File


class Trainer:
    def setup(self) -> None:
        self.base_model = ...  # Load a big base model

    def train(self, param: str) -> File:
        return self.base_model.train(param)  # Train on top of a base model
```

## `Input(**kwargs)`

Use Cog's `Input()` function to define each of the parameters in your `train()` function:

```py
from cog import Input, Path


def train(
    train_data: Path = Input(description="HTTPS URL of a file containing training data"),
    learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
    seed: int = Input(description="random seed to use for training", default=None),
) -> str:
    return "hello, weights"
```

The `Input()` function takes these keyword arguments:

- `description`: A description of what to pass to this input for users of the model.
- `default`: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to `None`, the input is optional.
- `ge`: For `int` or `float` types, the value must be greater than or equal to this number.
- `le`: For `int` or `float` types, the value must be less than or equal to this number.
- `min_length`: For `str` types, the minimum length of the string.
- `max_length`: For `str` types, the maximum length of the string.
- `regex`: For `str` types, the string must match this regular expression.
- `choices`: For `str` or `int` types, a list of possible values for this input.

Each parameter of the `train()` function must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](./python.md#input-and-output-types) for the full list of supported types.

Using the `Input` function provides better documentation and validation constraints to the users of your model, but it is not strictly required.
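These validation keywords correspond to standard JSON Schema keywords in the schema Cog generates: `ge` maps to `minimum`, `le` to `maximum`, `min_length`/`max_length` to `minLength`/`maxLength`, `regex` to `pattern`, and `choices` to `enum`. As a rough illustration (the exact schema Cog emits includes additional fields and may differ), the `learning_rate` input above would be described along these lines:

```json
{
  "learning_rate": {
    "type": "number",
    "description": "learning rate, for learning!",
    "default": 0.0001,
    "minimum": 0
  }
}
```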
You can also specify default values for your parameters using plain Python, or omit the default entirely to make a parameter required. Note that parameters without defaults must come before those with defaults:

```py
def train(self,
    iterations: int,                 # no default: required
    training_data: str = "foo bar",  # plain Python default: also valid
) -> str:
    # ...
```

## Training Output

Training output is typically a binary weights file. To return a custom output object or a complex object with multiple values, define a `TrainingOutput` object with multiple fields to return from your `train()` function, and specify it as the return type using Python's `->` return type annotation:

```python
from cog import BaseModel, Input, Path


class TrainingOutput(BaseModel):
    weights: Path


def train(
    train_data: Path = Input(description="HTTPS URL of a file containing training data"),
    learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
    seed: int = Input(description="random seed to use for training", default=42),
) -> TrainingOutput:
    weights_file = generate_weights("...")
    return TrainingOutput(weights=Path(weights_file))
```

## Testing

If you are developing a Cog model like Llama or SDXL, you can test that the fine-tuned code path works before pushing by specifying a `COG_WEIGHTS` environment variable when running `predict`:

```console
cog predict -e COG_WEIGHTS=https://replicate.delivery/pbxt/xyz/weights.tar -i prompt="a photo of TOK"
```

---

# Using `cog` on Windows 11 with WSL 2

- [0. Prerequisites](#0-prerequisites)
- [1. Install the GPU driver](#1-install-the-gpu-driver)
- [2. Unlocking features](#2-unlocking-features)
  - [2.1. Unlock WSL2](#21-unlock-wsl2)
  - [2.2. Unlock virtualization](#22-unlock-virtualization)
  - [2.3. Reboot](#23-reboot)
- [3. Update MS Linux kernel](#3-update-ms-linux-kernel)
- [4. Configure WSL 2](#4-configure-wsl-2)
- [5. Configure CUDA WSL-Ubuntu Toolkit](#5-configure-cuda-wsl-ubuntu-toolkit)
- [6. Install Docker](#6-install-docker)
- [7. Install `cog` and pull an image](#7-install-cog-and-pull-an-image)
- [8.
Run a model in WSL 2](#8-run-a-model-in-wsl-2)
- [9. References](#9-references)

Running cog on Windows is now possible thanks to WSL 2. Follow this guide to enable WSL 2 and GPU passthrough on Windows 11.

**Windows 10 is not officially supported, as you need to be on an insider build in order to use GPU passthrough.**

## 0. Prerequisites

Before beginning installation, make sure you have:

- Windows 11.
- NVIDIA GPU.
  - RTX 2000/3000 series
  - Kepler/Tesla/Volta/Ampere series
  - Other configurations are not guaranteed to work.

## 1. Install the GPU driver

Per NVIDIA, the first order of business is to install the latest Game Ready driver for your NVIDIA GPU. I have an NVIDIA RTX 2070 Super, so I filled out the form as such:

![a form showing the correct model number selected for an RTX 2070 Super](images/nvidia_driver_select.png)

Click "Search", and follow the dialog to download and install the driver. Restart your computer once the driver has finished installing.

## 2. Unlocking features

Open Windows Terminal as an administrator:

- Use Start to search for "Terminal"
- Right click -> Run as administrator...

Run the following PowerShell commands to enable the Windows Subsystem for Linux and Virtual Machine Platform capabilities.

### 2.1. Unlock WSL2

```powershell
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
```

If you see an error about permissions, make sure the terminal you are using is running as an administrator and that your account has administrator-level privileges.

### 2.2. Unlock virtualization

```powershell
dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
```

If this command fails, make sure to [enable virtualization capabilities](https://docs.microsoft.com/en-us/windows/wsl/troubleshooting#error-0x80370102-the-virtual-machine-could-not-be-started-because-a-required-feature-is-not-installed) in your computer's BIOS/UEFI.
A successful run will print `The operation completed successfully.`

![Output from running the above commands successfully. Should read "The operation completed successfully".](images/enable_feature_success.png)

### 2.3. Reboot

Before moving forward, reboot your computer so that Windows 11 has WSL2 and virtualization available to it.

## 3. Update MS Linux kernel

Download and run the [WSL2 Linux kernel update package for x64 machines](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi) MSI installer. When prompted for elevated permissions, click "Yes" to approve the installation.

To ensure you are using the correct WSL kernel, open Windows Terminal as an administrator and enter:

```powershell
wsl cat /proc/version
```

This will return a string such as:

```sh
Linux version 5.10.102.1-microsoft-standard-WSL2 (oe-user@oe-host) (x86_64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220)
```

The version we are interested in is `Linux version 5.10.102.1`. At this point, your kernel should be at least `Linux version 5.10.43.3`.

If you can't get the correct kernel version to show, open `Settings` → `Windows Update` → `Advanced options` and ensure `Receive updates for other Microsoft products` is enabled. Then go to `Windows Update` again and click `Check for updates`.

## 4. Configure WSL 2

First, configure Windows to use the virtualization-based version of WSL (version 2) by default. In a Windows Terminal with administrator privileges, type the following:

```powershell
wsl --set-default-version 2
```

Now, go to the Microsoft Store and [download Ubuntu 18.04](https://www.microsoft.com/store/apps/9N9TNGVNDL3Q).

![Screenshot showing the "Ubuntu" store page](https://docs.microsoft.com/en-us/windows/wsl/media/ubuntustore.png)

Launch the "Ubuntu" app available in your Start Menu.
Linux will require its own user account and password, which you will need to enter now:

![a terminal showing input for user account info on WSL 2](https://docs.microsoft.com/en-us/windows/wsl/media/ubuntuinstall.png)

## 5. Configure CUDA WSL-Ubuntu Toolkit

By default, a shimmed version of the CUDA tooling is provided by your Windows GPU drivers. Important: you should _never_ follow instructions for installing the CUDA toolkit in the generic Linux fashion. In WSL 2, you _always_ want to use the provided `CUDA Toolkit using WSL-Ubuntu Package`.

First, open PowerShell or Windows Command Prompt in administrator mode by right-clicking and selecting "Run as administrator". Then enter the following command:

```powershell
wsl.exe
```

This should drop you into your running Linux VM. Now you can run the following bash commands to install the correct version of the CUDA toolkit for WSL-Ubuntu. Note that the version of CUDA used below may not be the version your GPU supports.

```sh
sudo apt-key del 7fa2af80 # if this line fails, you may remove it.
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-wsl-ubuntu-11-7-local_11.7.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-11-7-local_11.7.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-11-7-local/cuda-B81839D3-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-11-7
```

## 6. Install Docker

Download and install [Docker Desktop for Windows](https://desktop.docker.com/win/main/amd64/Docker%20Desktop%20Installer.exe). It has WSL 2 support built in by default.

Once installed, run `Docker Desktop`; you can ignore the first-run tutorial. Go to **Settings → General** and ensure **Use the WSL 2 based engine** has a checkmark next to it. Click **Apply & Restart**.
!["Use the WSL 2 based engine" is checked in this interface](images/wsl2-enable.png)

Reboot your computer one more time.

## 7. Install `cog` and pull an image

Open Windows Terminal and enter your WSL 2 VM:

```powershell
wsl.exe
```

Download and install `cog` inside the VM:

```bash
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
```

Make sure it's available by typing:

```bash
which cog # should output /usr/local/bin/cog
cog --version # should output the cog version number.
```

## 8. Run a model in WSL 2

Finally, make sure it works. Let's try running `afiaka87/glid-3-xl` locally:

```bash
cog predict 'r8.im/afiaka87/glid-3-xl' -i prompt="a fresh avocado floating in the water" -o prediction.json
```

![Output from a running cog prediction in Windows Terminal](images/cog_model_output.png)

While your prediction is running, you can use `Task Manager` to keep an eye on GPU memory consumption:

![Windows task manager will show the shared host/guest GPU memory](images/memory-usage.png)

This model just barely manages to fit under 8 GB of VRAM.

Notice that output is returned as JSON for this model, as it has a complex return type. You will want to convert the base64 string in the JSON array to an image. `jq` can help with this:

```sh
sudo apt install jq
```

The following command uses `jq` to grab the first element in our prediction array and converts it from a base64 string to a `png` file:

```bash
jq -cs '.[0][0][0]' prediction.json | cut --delimiter "," --field 2 | base64 --ignore-garbage --decode > prediction.png
```

When using WSL 2, you can access Windows binaries with the `.exe` extension. This lets you open photos easily from within Linux:

```bash
explorer.exe prediction.png
```

![a square image of an avocado, generated by the model](images/glide_out.png)

## 9.
References

---

# `cog.yaml` reference

`cog.yaml` defines how to build a Docker image and how to run predictions on your model inside that image.

It has three keys: [`build`](#build), [`image`](#image), and [`predict`](#predict). It looks a bit like this:

```yaml
build:
  python_version: "3.13"
  python_requirements: requirements.txt
  system_packages:
    - "ffmpeg"
    - "git"
predict: "predict.py:Predictor"
```

Tip: Run [`cog init`](getting-started-own-model.md#initialization) to generate an annotated `cog.yaml` file that can be used as a starting point for setting up your model.

## `build`

This stanza describes how to build the Docker image your model runs in. It contains various options within it:

### `cuda`

Cog automatically picks the correct version of CUDA to install, but this lets you override it if needed by specifying the minor (`11.8`) or patch (`11.8.0`) version of CUDA to use.

For example:

```yaml
build:
  cuda: "11.8"
```

### `gpu`

Enable GPUs for this model. When enabled, the [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) base image will be used, and Cog will automatically figure out what versions of CUDA and cuDNN to use based on the version of Python, PyTorch, and Tensorflow that you are using.

For example:

```yaml
build:
  gpu: true
```

When you use `cog run` or `cog predict`, Cog will automatically pass the `--gpus=all` flag to Docker. When you run a Docker image built with Cog, you'll need to pass this option to `docker run`.

### `python_requirements`

A pip requirements file specifying the Python packages to install. For example:

```yaml
build:
  python_requirements: requirements.txt
```

Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both. Use `python_requirements` when you need to configure options like `--extra-index-url` or `--trusted-host` to fetch Python package dependencies.
This follows the standard [requirements.txt](https://pip.pypa.io/en/stable/reference/requirements-file-format/) format.

To install Git-hosted Python packages, add `git` to the `system_packages` list, then use the `git+https://` syntax to specify the package name. For example:

`cog.yaml`:

```yaml
build:
  system_packages:
    - "git"
  python_requirements: requirements.txt
```

`requirements.txt`:

```
git+https://github.com/huggingface/transformers
```

You can also pin Python package installations to a specific git commit:

`cog.yaml`:

```yaml
build:
  system_packages:
    - "git"
  python_requirements: requirements.txt
```

`requirements.txt`:

```
git+https://github.com/huggingface/transformers@2d1602a
```

Note that you can use a shortened prefix of the 40-character git commit SHA, but you must use at least six characters, like `2d1602a` above.

### `python_packages`

**DEPRECATED**: This will be removed in future versions; please use [python_requirements](#python_requirements) instead.

A list of Python packages to install from the PyPI package index, in the format `package==version`. For example:

```yaml
build:
  python_packages:
    - pillow==8.3.1
    - tensorflow==2.5.0
```

Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both.

### `python_version`

The minor (`3.13`) or patch (`3.13.1`) version of Python to use. For example:

```yaml
build:
  python_version: "3.13.1"
```

Cog supports Python 3.10, 3.11, 3.12, and 3.13. If you don't define a version, Cog will use the latest version of Python 3.13 or a version of Python that is compatible with the versions of PyTorch or TensorFlow you specify.

Note that these are the versions supported **in the Docker container**, not your host machine. You can run any version(s) of Python you wish on your host machine.

### `run`

A list of setup commands to run in the environment after your system packages and Python packages have been installed.
If you're familiar with Docker, it's like a `RUN` instruction in your `Dockerfile`. For example: ```yaml build: run: - curl -L https://github.com/cowsay-org/cowsay/archive/refs/tags/v3.7.0.tar.gz | tar -xzf - - cd cowsay-3.7.0 && make install ``` Your code is _not_ available to commands in `run`. This is so we can build your image efficiently when running locally. Each command in `run` can be either a string or a dictionary in the following format: ```yaml build: run: - command: pip install mounts: - type: secret id: pip target: /etc/pip.conf ``` You can use secret mounts to securely pass credentials to setup commands, without baking them into the image. For more information, see [Dockerfile reference](https://docs.docker.com/engine/reference/builder/#run---mounttypesecret). ### `sdk_version` Pin the version of the cog Python SDK installed in the container. Accepts a [PEP 440](https://peps.python.org/pep-0440/) version string. When omitted, the latest release is installed. ```yaml build: python_version: "3.13" sdk_version: "0.18.0" ``` Pre-release versions are also supported: ```yaml build: sdk_version: "0.18.0a1" ``` When a pre-release `sdk_version` is set, `--pre` is automatically passed to the pip install commands for both `cog` and `coglet`, so pip will resolve matching pre-release packages. The minimum supported version is `0.16.0`. Specifying an older version will cause `cog build` to fail with an error. The `COG_SDK_WHEEL` environment variable takes precedence over `sdk_version`. See [Environment variables](./environment.md) for details. ### `system_packages` A list of Ubuntu APT packages to install. For example: ```yaml build: system_packages: - "ffmpeg" - "libavcodec-dev" ``` ## `concurrency` > Added in cog 0.14.0. This stanza describes the concurrency capabilities of the model. It has one option: ### `max` The maximum number of concurrent predictions the model can process. 
If this is set, the model must specify an [async `predict()` method](python.md#async-predictors-and-concurrency). For example: ```yaml concurrency: max: 10 ``` ## `image` The name given to built Docker images. If you want to push to a registry, this should also include the registry name. For example: ```yaml image: "r8.im/your-username/your-model" ``` r8.im is Replicate's registry, but this can be any Docker registry. If you don't set this, then a name will be generated from the directory name. If you set this, then you can run `cog push` without specifying the model name. If you specify an image name argument when pushing (like `cog push your-username/custom-model-name`), the argument will be used and the value of `image` in cog.yaml will be ignored. ## `predict` The pointer to the `Predictor` object in your code, which defines how predictions are run on your model. For example: ```yaml predict: "predict.py:Predictor" ``` See [the Python API documentation for more information](python.md).