Inference offers 5 models through Lava’s AI Gateway, all served through the Chat Completions endpoint. Requests authenticate with an Authorization: Bearer header. See the Inference API docs for provider-specific parameters.
Both managed mode (Lava’s API keys) and unmanaged mode (bring your own credentials) are supported.

Quick Start

// forwardToken must hold your Lava forward token (see Forward Proxy under Next Steps).
// The `u` query parameter is the URL-encoded Inference endpoint.
const response = await fetch('https://api.lava.so/v1/forward?u=https%3A%2F%2Fapi.inference.net%2Fv1%2Fchat%2Fcompletions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${forwardToken}`,
  },
  body: JSON.stringify({
    model: 'google/gemma-3-27b-instruct/bf-16',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
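
The forward URL in the request above is the Lava forward endpoint with the provider URL percent-encoded into the `u` query parameter. A minimal sketch of building it programmatically follows; `buildForwardUrl` and `LAVA_FORWARD_BASE` are hypothetical names used for illustration, not part of any Lava SDK.

```javascript
// Base endpoint taken from the Quick Start request.
const LAVA_FORWARD_BASE = 'https://api.lava.so/v1/forward';

// Hypothetical helper: wraps a provider URL in the forward endpoint's
// `u` query parameter.
function buildForwardUrl(targetUrl) {
  // encodeURIComponent escapes ':' and '/' so the whole target URL
  // travels as a single query parameter value.
  return `${LAVA_FORWARD_BASE}?u=${encodeURIComponent(targetUrl)}`;
}

const url = buildForwardUrl('https://api.inference.net/v1/chat/completions');
console.log(url);
```

Building the URL this way avoids hand-encoding mistakes if the target path changes.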

Chat Completions

Target URL: https://api.inference.net/v1/chat/completions
Content Type: application/json
Streaming: Yes (set stream: true in request body)
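
A rough sketch of consuming a streamed response is shown here. It assumes the stream arrives as OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`, with text under `choices[0].delta.content`); this page does not confirm that format, so check the Inference API docs before relying on it. `extractSseData`, `deltaText`, and `streamChat` are hypothetical helper names.

```javascript
// Splits buffered text into complete `data:` payloads plus the
// unfinished trailing line, which is kept for the next chunk.
function extractSseData(buffer) {
  const lines = buffer.split('\n');
  const rest = lines.pop(); // may be an incomplete line
  const payloads = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed.startsWith('data:')) payloads.push(trimmed.slice(5).trim());
  }
  return { payloads, rest };
}

// Assumption: each payload is JSON shaped like an OpenAI chat chunk.
function deltaText(payload) {
  if (payload === '[DONE]') return '';
  const chunk = JSON.parse(payload);
  return chunk.choices?.[0]?.delta?.content ?? '';
}

// Sends a streaming request and calls onText with each text fragment.
async function streamChat(forwardToken, messages, onText) {
  const target = 'https://api.inference.net/v1/chat/completions';
  const response = await fetch(
    `https://api.lava.so/v1/forward?u=${encodeURIComponent(target)}`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${forwardToken}`,
      },
      body: JSON.stringify({
        model: 'google/gemma-3-27b-instruct/bf-16',
        messages,
        stream: true, // enables streaming, per the Streaming row
      }),
    },
  );
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { payloads, rest } = extractSseData(buffer);
    buffer = rest;
    for (const payload of payloads) {
      const text = deltaText(payload);
      if (text) onText(text);
    }
  }
}
```

A call such as `streamChat(forwardToken, [{ role: 'user', content: 'Hello!' }], (t) => process.stdout.write(t))` would print fragments as they arrive, provided the assumed event format holds.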
Model                                      Input / 1M tokens   Output / 1M tokens
google/gemma-3-27b-instruct/bf-16          $0.30               $0.40
meta-llama/llama-3.2-11b-instruct/fp-16    $0.055              $0.055
meta-llama/llama-3.1-8b-instruct/fp-8      $0.03               $0.03
meta-llama/llama-3.2-3b-instruct/fp-16     $0.02               $0.02
meta-llama/llama-3.2-1b-instruct/fp-16     $0.01               $0.01
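
Because prices are quoted per 1M tokens, the cost of a single request is the token count times the rate, divided by 1,000,000. The sketch below applies that arithmetic to the rates listed above; `PRICING` and `estimateCostUsd` are hypothetical names, and the result covers only these listed rates, not any other fees that might apply.

```javascript
// USD per 1M tokens, copied from the pricing table.
const PRICING = {
  'google/gemma-3-27b-instruct/bf-16': { input: 0.30, output: 0.40 },
  'meta-llama/llama-3.2-11b-instruct/fp-16': { input: 0.055, output: 0.055 },
  'meta-llama/llama-3.1-8b-instruct/fp-8': { input: 0.03, output: 0.03 },
  'meta-llama/llama-3.2-3b-instruct/fp-16': { input: 0.02, output: 0.02 },
  'meta-llama/llama-3.2-1b-instruct/fp-16': { input: 0.01, output: 0.01 },
};

// Hypothetical helper: estimates the cost of one request in USD.
function estimateCostUsd(model, inputTokens, outputTokens) {
  const rates = PRICING[model];
  if (!rates) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * rates.input + outputTokens * rates.output) / 1_000_000;
}

console.log(estimateCostUsd('meta-llama/llama-3.1-8b-instruct/fp-8', 2000, 500));
```

For instance, 2,000 input tokens and 500 output tokens on meta-llama/llama-3.1-8b-instruct/fp-8 work out to (2,000 × 0.03 + 500 × 0.03) / 1,000,000, or about $0.000075.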

Next Steps

All Providers

Browse all supported AI providers

Forward Proxy

Learn how to construct proxy URLs and authenticate requests