Shreyansh Cloud API

Powerful AI models accessible through a simple API. Each user gets a personal API key with rate limiting.

Free Tier: 5 Requests/Minute

Documentation

Your API Key

sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Each user gets a unique API key


Authentication

All API requests must include your personal API key as a Bearer token in the Authorization header. You receive a unique key when you sign up.

curl https://api.shreyansh.cloud/v3/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Security Notice: Never expose your API key in client-side code or public repositories. For production applications, use environment variables or secure key management systems.
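For example, you can keep the key in an environment variable and reference it from your shell. This is a minimal sketch; SHREYANSH_API_KEY is just an illustrative variable name, not something the API requires.

# Store the key outside your code (e.g., in your shell profile or a .env file)
export SHREYANSH_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Reference the variable in requests instead of pasting the key inline
curl https://api.shreyansh.cloud/v3/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SHREYANSH_API_KEY" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'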

Rate Limits

To ensure fair usage and service stability, we implement rate limits on API requests.

Plan        Rate Limit                Notes
Free Tier   5 requests per minute     Per API key limit
Pro Tier    100 requests per minute   Coming soon

Rate Limit Response

When you exceed your rate limit, you'll receive a 429 status code with the following response:

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_exceeded",
    "code": 429
  }
}
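
A simple way to handle this is to check the status code and retry after a pause. The sketch below is illustrative rather than an official client: it assumes the key is exported as SHREYANSH_API_KEY (see the security note above) and uses an arbitrary backoff.

# Retry up to 3 times, backing off when the API returns 429 (limits reset each minute)
for attempt in 1 2 3; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    "https://api.shreyansh.cloud/v3/chat/completions" \
    -H "Authorization: Bearer $SHREYANSH_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "qwen/qwen3-4b-fp8",
      "messages": [{"role": "user", "content": "Hello!"}]
    }')
  if [ "$status" != "429" ]; then
    cat response.json        # success (or a non-rate-limit error) -- stop retrying
    break
  fi
  sleep $((attempt * 20))    # wait before the next attempt
done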

Available Models

Qwen 3 4B FP8

Model ID: qwen/qwen3-4b-fp8

A powerful 4-billion-parameter model from the Qwen series with FP8 precision for efficient inference.

General Purpose · 4B Parameters

Llama 3.2 1B Instruct

Model ID: llama-3.2-1b-instruct

A compact 1-billion-parameter instruction-tuned model from Meta's Llama series, optimized for dialogue.

Instruction Tuned · 1B Parameters

List All Models

You can retrieve a list of all available models using the following endpoint:

curl "https://api.shreyansh.cloud/v3/models" \
  -H "Authorization: Bearer YOUR_API_KEY"

Chat Completions

The chat completions endpoint allows you to have conversations with the AI models. Send a series of messages and receive a model-generated response.

Endpoint

POST https://api.shreyansh.cloud/v3/chat/completions

Request Format

{
  "model": "model-id",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 500
}

Parameters

  • model: The ID of the model to use (e.g., "qwen/qwen3-4b-fp8")
  • messages: An array of message objects with "role" and "content"
  • temperature: Controls randomness (0.0 to 1.0, default 0.7)
  • max_tokens: Maximum number of tokens to generate

Curl Examples

Basic Example with Qwen

curl "https://api.shreyansh.cloud/v3/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [
      {"role": "user", "content": "How are you?"}
    ]
  }'

Example Response

{
  "id": "a208bad9fd4bda74b4e4815067a2818d",
  "object": "chat.completion",
  "created": 1757162710,
  "model": "qwen/qwen3-4b-fp8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help! How can I assist you today? 😊"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 191,
    "total_tokens": 203
  }
}

Conversation Example with Llama

curl "https://api.shreyansh.cloud/v3/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant that translates English to French."},
      {"role": "user", "content": "Translate the following English text to French: Hello, how are you?"}
    ],
    "temperature": 0.3,
    "max_tokens": 100
  }'

Advanced Example with Parameters

curl "https://api.shreyansh.cloud/v3/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [
      {"role": "system", "content": "You are a knowledgeable science tutor."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.5,
    "max_tokens": 500,
    "top_p": 0.9,
    "frequency_penalty": 0.2,
    "presence_penalty": 0.3
  }'

Error Handling

The API uses standard HTTP status codes to indicate the success or failure of a request.

Status Code   Error Type              Description
400           Bad Request             Invalid request parameters
401           Unauthorized            Invalid or missing API key
404           Not Found               Requested resource not found
429           Too Many Requests       Rate limit exceeded
500           Internal Server Error   Something went wrong on our end
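
In practice you can capture the status code alongside the body and branch on it. The following is an illustrative sketch, not an official client; it reuses the SHREYANSH_API_KEY variable from the security note above.

# Append the status code on its own line after the body, then split the two apart
response=$(curl -s -w "\n%{http_code}" "https://api.shreyansh.cloud/v3/chat/completions" \
  -H "Authorization: Bearer $SHREYANSH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [{"role": "user", "content": "Hello!"}]
  }')
status=$(printf '%s\n' "$response" | tail -n 1)
body=$(printf '%s\n' "$response" | sed '$d')

case "$status" in
  200) printf '%s\n' "$body" ;;                       # success
  401) echo "Check your API key" >&2 ;;               # invalid or missing key
  429) echo "Rate limited; wait and retry" >&2 ;;     # over the per-minute limit
  *)   echo "Request failed ($status): $body" >&2 ;;  # other errors, e.g. 400/404/500
esac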