Shreyansh Cloud API

Powerful AI models accessible through a simple API. Each user gets a personal API key with rate limiting.

Free Tier: 5 Requests/Minute

Documentation

Your API Key

sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Each user gets a unique API key


Authentication

All API requests must include your personal API key as a Bearer token in the Authorization header. You receive a unique key when you sign up.

curl https://api.shreyansh.cloud/v3/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Security Notice: Never expose your API key in client-side code or public repositories. For production applications, use environment variables or secure key management systems.
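For example, you can keep the key in an environment variable and reference it from your shell. This is a minimal sketch; SHREYANSH_API_KEY is just an illustrative variable name, not something the API requires.

# Store the key outside your code (e.g., in your shell profile or a .env file)
export SHREYANSH_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Reference the variable in requests instead of pasting the key inline
curl https://api.shreyansh.cloud/v3/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SHREYANSH_API_KEY" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'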

Rate Limits

To ensure fair usage and service stability, we implement rate limits on API requests.

Plan        Rate Limit                Notes
Free Tier   5 requests per minute     Per API key limit
Pro Tier    100 requests per minute   Coming soon

Rate Limit Response

When you exceed your rate limit, you'll receive a 429 status code with the following response:

{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_exceeded",
    "code": 429
  }
}
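
A simple way to handle this is to check the status code and retry after a pause. The sketch below is illustrative rather than an official client: it assumes the key is exported as SHREYANSH_API_KEY (see the security note above) and uses an arbitrary backoff.

# Retry up to 3 times, backing off when the API returns 429 (limits reset each minute)
for attempt in 1 2 3; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    "https://api.shreyansh.cloud/v3/chat/completions" \
    -H "Authorization: Bearer $SHREYANSH_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "qwen/qwen3-4b-fp8",
      "messages": [{"role": "user", "content": "Hello!"}]
    }')
  if [ "$status" != "429" ]; then
    cat response.json        # success (or a non-rate-limit error) -- stop retrying
    break
  fi
  sleep $((attempt * 20))    # wait before the next attempt
done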

Available Models

Qwen 3 4B FP8

Model ID: qwen/qwen3-4b-fp8

A powerful 4-billion-parameter model from the Qwen series with FP8 precision for efficient inference.

General Purpose · 4B Parameters

Llama 3.2 1B Instruct

Model ID: llama-3.2-1b-instruct

A compact 1-billion-parameter instruction-tuned model from Meta's Llama series, optimized for dialogue.

Instruction Tuned · 1B Parameters

List All Models

You can retrieve a list of all available models using the following endpoint:

curl "https://api.shreyansh.cloud/v3/models" \
  -H "Authorization: Bearer YOUR_API_KEY"

Chat Completions

The chat completions endpoint allows you to have conversations with the AI models. Send a series of messages and receive a model-generated response.

Endpoint

POST https://api.shreyansh.cloud/v3/chat/completions

Request Format

{
  "model": "model-id",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "temperature": 0.7,
  "max_tokens": 500
}

Parameters

  • model: The ID of the model to use (e.g., "qwen/qwen3-4b-fp8")
  • messages: An array of message objects with "role" and "content"
  • temperature: Controls randomness (0.0 to 1.0, default 0.7)
  • max_tokens: Maximum number of tokens to generate

Curl Examples

Basic Example with Qwen

curl "https://api.shreyansh.cloud/v3/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [
      {"role": "user", "content": "How are you?"}
    ]
  }'

Example Response

{
  "id": "a208bad9fd4bda74b4e4815067a2818d",
  "object": "chat.completion",
  "created": 1757162710,
  "model": "qwen/qwen3-4b-fp8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help! How can I assist you today? 😊"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 191,
    "total_tokens": 203
  }
}

Conversation Example with Llama

curl "https://api.shreyansh.cloud/v3/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b-instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant that translates English to French."},
      {"role": "user", "content": "Translate the following English text to French: Hello, how are you?"}
    ],
    "temperature": 0.3,
    "max_tokens": 100
  }'

Advanced Example with Parameters

curl "https://api.shreyansh.cloud/v3/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [
      {"role": "system", "content": "You are a knowledgeable science tutor."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.5,
    "max_tokens": 500,
    "top_p": 0.9,
    "frequency_penalty": 0.2,
    "presence_penalty": 0.3
  }'

Error Handling

The API uses standard HTTP status codes to indicate the success or failure of a request.

Status Code   Error Type              Description
400           Bad Request             Invalid request parameters
401           Unauthorized            Invalid or missing API key
404           Not Found               Requested resource not found
429           Too Many Requests       Rate limit exceeded
500           Internal Server Error   Something went wrong on our end
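
In practice you can capture the status code alongside the body and branch on it. The following is an illustrative sketch, not an official client; it reuses the SHREYANSH_API_KEY variable from the security note above.

# Append the status code on its own line after the body, then split the two apart
response=$(curl -s -w "\n%{http_code}" "https://api.shreyansh.cloud/v3/chat/completions" \
  -H "Authorization: Bearer $SHREYANSH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-4b-fp8",
    "messages": [{"role": "user", "content": "Hello!"}]
  }')
status=$(printf '%s\n' "$response" | tail -n 1)
body=$(printf '%s\n' "$response" | sed '$d')

case "$status" in
  200) printf '%s\n' "$body" ;;                       # success
  401) echo "Check your API key" >&2 ;;               # invalid or missing key
  429) echo "Rate limited; wait and retry" >&2 ;;     # over the per-minute limit
  *)   echo "Request failed ($status): $body" >&2 ;;  # other errors, e.g. 400/404/500
esac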