Requesty.ai Integration - Testing Strategy

Testing Overview

This document outlines a comprehensive testing strategy for the Requesty.ai integration, covering unit tests, integration tests, end-to-end scenarios, and validation criteria.

Test Categories

| Category | Scope | Duration | Automation |
|---|---|---|---|
| Unit Tests | Proxy functions | 30 min | Jest/Vitest |
| Integration Tests | CLI → Proxy → API | 60 min | Manual + Script |
| E2E Tests | Full user workflows | 45 min | Manual |
| Performance Tests | Latency, throughput | 30 min | Script |
| Security Tests | API key handling | 15 min | Manual |
| Total | | 3 hours | |

1. Unit Tests

Test File: src/proxy/anthropic-to-requesty.test.ts

1.1 Proxy Initialization

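// Import path assumed from the test file name above; adjust to the actual module.
import { AnthropicToRequestyProxy } from './anthropic-to-requesty';

describe('AnthropicToRequestyProxy - Initialization', () => {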
  it('should initialize with default configuration', () => {
    const proxy = new AnthropicToRequestyProxy({
      requestyApiKey: 'test-key'
    });

    expect(proxy.requestyBaseUrl).toBe('https://router.requesty.ai/v1');
    expect(proxy.defaultModel).toBe('openai/gpt-4o-mini');
  });

  it('should accept custom base URL', () => {
    const proxy = new AnthropicToRequestyProxy({
      requestyApiKey: 'test-key',
      requestyBaseUrl: 'https://custom.requesty.ai/v1'
    });

    expect(proxy.requestyBaseUrl).toBe('https://custom.requesty.ai/v1');
  });

  it('should accept custom default model', () => {
    const proxy = new AnthropicToRequestyProxy({
      requestyApiKey: 'test-key',
      defaultModel: 'anthropic/claude-3.5-sonnet'
    });

    expect(proxy.defaultModel).toBe('anthropic/claude-3.5-sonnet');
  });
});
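
The defaults asserted in these tests can be summarized as a small resolver. This is a minimal sketch, not the proxy's actual constructor: `ProxyConfig` and `resolveConfig` are hypothetical names, and only the two default values come from the tests above.

```typescript
// Hypothetical config shape; field names mirror the constructor options used in the tests.
interface ProxyConfig {
  requestyApiKey: string;
  requestyBaseUrl?: string;
  defaultModel?: string;
}

// Default values taken from the assertions in section 1.1.
const DEFAULT_BASE_URL = 'https://router.requesty.ai/v1';
const DEFAULT_MODEL = 'openai/gpt-4o-mini';

// Fill in any option the caller left out.
function resolveConfig(config: ProxyConfig): Required<ProxyConfig> {
  return {
    requestyApiKey: config.requestyApiKey,
    requestyBaseUrl: config.requestyBaseUrl ?? DEFAULT_BASE_URL,
    defaultModel: config.defaultModel ?? DEFAULT_MODEL,
  };
}
```

Keeping the defaults in one place means each initialization test only has to check a single override against them.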

1.2 Format Conversion (Anthropic → OpenAI)

describe('AnthropicToRequestyProxy - Request Conversion', () => {
  let proxy: AnthropicToRequestyProxy;

  beforeEach(() => {
    proxy = new AnthropicToRequestyProxy({
      requestyApiKey: 'test-key'
    });
  });

  it('should convert simple message', () => {
    const anthropicReq = {
      model: 'openai/gpt-4o-mini',
      messages: [
        { role: 'user', content: 'Hello' }
      ]
    };

    const openaiReq = proxy.convertAnthropicToOpenAI(anthropicReq);

    expect(openaiReq.model).toBe('openai/gpt-4o-mini');
    expect(openaiReq.messages).toHaveLength(2); // system + user
    expect(openaiReq.messages[1].content).toBe('Hello');
  });

  it('should convert system prompt (string)', () => {
    const anthropicReq = {
      model: 'openai/gpt-4o-mini',
      system: 'You are a helpful assistant',
      messages: [{ role: 'user', content: 'Hi' }]
    };

    const openaiReq = proxy.convertAnthropicToOpenAI(anthropicReq);

    expect(openaiReq.messages[0].role).toBe('system');
    expect(openaiReq.messages[0].content).toContain('You are a helpful assistant');
  });

  it('should convert system prompt (array)', () => {
    const anthropicReq = {
      model: 'openai/gpt-4o-mini',
      system: [
        { type: 'text', text: 'You are helpful' },
        { type: 'text', text: 'Be concise' }
      ],
      messages: [{ role: 'user', content: 'Hi' }]
    };

    const openaiReq = proxy.convertAnthropicToOpenAI(anthropicReq);

    expect(openaiReq.messages[0].content).toContain('You are helpful');
    expect(openaiReq.messages[0].content).toContain('Be concise');
  });

  it('should convert tools to OpenAI format', () => {
    const anthropicReq = {
      model: 'openai/gpt-4o-mini',
      messages: [{ role: 'user', content: 'Read file' }],
      tools: [{
        name: 'Read',
        description: 'Read a file',
        input_schema: {
          type: 'object',
          properties: {
            file_path: { type: 'string' }
          },
          required: ['file_path']
        }
      }]
    };

    const openaiReq = proxy.convertAnthropicToOpenAI(anthropicReq);

    expect(openaiReq.tools).toHaveLength(1);
    expect(openaiReq.tools[0].type).toBe('function');
    expect(openaiReq.tools[0].function.name).toBe('Read');
    expect(openaiReq.tools[0].function.parameters).toEqual(anthropicReq.tools[0].input_schema);
  });
});
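
The mapping these tests expect can be sketched as two pure helpers. This is an illustration under assumptions, not the proxy's real code: the type names and the functions `flattenSystem` and `toOpenAITool` are hypothetical, and joining system blocks with a blank line is one possible choice.

```typescript
// Illustrative types only; the real proxy's types may differ.
type AnthropicTextBlock = { type: 'text'; text: string };

interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
}

interface OpenAITool {
  type: 'function';
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

// Anthropic accepts `system` as a string or an array of text blocks;
// OpenAI expects a single system message, so array blocks are joined.
function flattenSystem(system: string | AnthropicTextBlock[] | undefined): string {
  if (system === undefined) return '';
  if (typeof system === 'string') return system;
  return system.map((block) => block.text).join('\n\n');
}

// Anthropic's `input_schema` becomes OpenAI's `function.parameters` unchanged.
function toOpenAITool(tool: AnthropicTool): OpenAITool {
  return {
    type: 'function',
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.input_schema,
    },
  };
}
```

Because both helpers are pure, they can be unit-tested without mocking `fetch` or starting the proxy.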

1.3 Format Conversion (OpenAI → Anthropic)

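describe('AnthropicToRequestyProxy - Response Conversion', () => {
  let proxy: AnthropicToRequestyProxy;

  beforeEach(() => {
    proxy = new AnthropicToRequestyProxy({
      requestyApiKey: 'test-key'
    });
  });
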
  it('should convert text response', () => {
    const openaiRes = {
      id: 'chatcmpl-123',
      model: 'openai/gpt-4o-mini',
      choices: [{
        index: 0,
        message: {
          role: 'assistant',
          content: 'Hello, how can I help?'
        },
        finish_reason: 'stop'
      }],
      usage: {
        prompt_tokens: 10,
        completion_tokens: 20,
        total_tokens: 30
      }
    };

    const anthropicRes = proxy.convertOpenAIToAnthropic(openaiRes);

    expect(anthropicRes.id).toBe('chatcmpl-123');
    expect(anthropicRes.role).toBe('assistant');
    expect(anthropicRes.content).toHaveLength(1);
    expect(anthropicRes.content[0].type).toBe('text');
    expect(anthropicRes.content[0].text).toBe('Hello, how can I help?');
    expect(anthropicRes.stop_reason).toBe('end_turn');
    expect(anthropicRes.usage.input_tokens).toBe(10);
    expect(anthropicRes.usage.output_tokens).toBe(20);
  });

  it('should convert tool_calls response', () => {
    const openaiRes = {
      id: 'chatcmpl-123',
      model: 'openai/gpt-4o-mini',
      choices: [{
        message: {
          role: 'assistant',
          content: null,
          tool_calls: [{
            id: 'call_abc123',
            type: 'function',
            function: {
              name: 'Read',
              arguments: '{"file_path": "/test.txt"}'
            }
          }]
        },
        finish_reason: 'tool_calls'
      }],
      usage: { prompt_tokens: 10, completion_tokens: 20, total_tokens: 30 }
    };

    const anthropicRes = proxy.convertOpenAIToAnthropic(openaiRes);

    expect(anthropicRes.content).toHaveLength(1);
    expect(anthropicRes.content[0].type).toBe('tool_use');
    expect(anthropicRes.content[0].id).toBe('call_abc123');
    expect(anthropicRes.content[0].name).toBe('Read');
    expect(anthropicRes.content[0].input).toEqual({ file_path: '/test.txt' });
    expect(anthropicRes.stop_reason).toBe('tool_use');
  });

  it('should convert mixed content response', () => {
    const openaiRes = {
      id: 'chatcmpl-123',
      model: 'openai/gpt-4o-mini',
      choices: [{
        message: {
          role: 'assistant',
          content: 'Let me read that file',
          tool_calls: [{
            id: 'call_abc123',
            type: 'function',
            function: {
              name: 'Read',
              arguments: '{"file_path": "/test.txt"}'
            }
          }]
        },
        finish_reason: 'tool_calls'
      }],
      usage: { prompt_tokens: 10, completion_tokens: 20, total_tokens: 30 }
    };

    const anthropicRes = proxy.convertOpenAIToAnthropic(openaiRes);

    expect(anthropicRes.content).toHaveLength(2); // tool_use + text
    expect(anthropicRes.content[0].type).toBe('tool_use');
    expect(anthropicRes.content[1].type).toBe('text');
  });
});
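
The response-side mapping checked above can be sketched in the same way. The function and type names are hypothetical; `stop` → `end_turn` and `tool_calls` → `tool_use` come from the tests, while `length` → `max_tokens` and the `end_turn` fallback are assumptions.

```typescript
interface OpenAIToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

type AnthropicContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

// Map OpenAI's finish_reason to Anthropic's stop_reason.
function mapStopReason(finishReason: string | null): string {
  switch (finishReason) {
    case 'tool_calls':
      return 'tool_use';
    case 'length':
      return 'max_tokens'; // assumption, not covered by the tests above
    default:
      return 'end_turn'; // covers 'stop' and, as an assumption, anything unknown
  }
}

// OpenAI sends tool arguments as a JSON string; Anthropic's `input` is a parsed object.
function toToolUseBlock(call: OpenAIToolCall): AnthropicContentBlock {
  return {
    type: 'tool_use',
    id: call.id,
    name: call.function.name,
    input: JSON.parse(call.function.arguments),
  };
}
```

Note that `JSON.parse` throws on malformed arguments, so a real implementation would need to decide how to surface that case.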

1.4 Error Handling

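describe('AnthropicToRequestyProxy - Error Handling', () => {
  let proxy: AnthropicToRequestyProxy;
  const originalFetch = global.fetch;

  // Shared fixtures. The mockRes shape is an assumption (Express-style
  // response object); adapt it to whatever handleRequest actually expects.
  const validRequest = {
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Test' }]
  };
  const mockRes = { status: jest.fn().mockReturnThis(), json: jest.fn() };

  // Build a failed fetch response so no test touches the network.
  const mockFetchError = (status: number, body: string, headers: Record<string, string> = {}) => {
    global.fetch = jest.fn().mockResolvedValue({
      ok: false,
      status,
      headers: new Headers(headers), // Headers lookups are case-insensitive
      text: async () => body
    });
  };

  beforeEach(() => {
    proxy = new AnthropicToRequestyProxy({ requestyApiKey: 'test-key' });
  });

  afterEach(() => {
    global.fetch = originalFetch; // restore the real fetch
  });

  it('should handle invalid API key', async () => {
    proxy = new AnthropicToRequestyProxy({ requestyApiKey: 'invalid-key' });
    mockFetchError(401, 'Invalid API key');

    await expect(proxy.handleRequest(validRequest, mockRes))
      .rejects.toThrow('Invalid API key');
  });

  it('should handle rate limit errors', async () => {
    mockFetchError(429, 'Rate limit exceeded', { 'Retry-After': '60' });

    await expect(proxy.handleRequest(validRequest, mockRes))
      .rejects.toThrow('Rate limit exceeded');
  });

  it('should handle model not found', async () => {
    const req = {
      model: 'invalid/model',
      messages: [{ role: 'user', content: 'Test' }]
    };
    mockFetchError(404, 'Model not found');

    await expect(proxy.handleRequest(req, mockRes))
      .rejects.toThrow('Model not found');
  });
});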

Unit Test Coverage Goals

  • Proxy initialization: 100%
  • Request conversion: 95%
  • Response conversion: 95%
  • Tool calling: 100%
  • Error handling: 90%
  • Overall: >90%
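
If the suite runs under Jest, the overall goal can be enforced automatically. The following is a minimal sketch of a `jest.config.ts`, assuming the `ts-jest` preset is installed and the proxy sources live under `src/proxy/` (both are assumptions). The per-area goals above would need per-path thresholds on top of this.

```typescript
// jest.config.ts (sketch; assumes Jest with ts-jest)
const config = {
  preset: 'ts-jest',
  testEnvironment: 'node',
  collectCoverage: true,
  // Measure only proxy sources, excluding the test files themselves.
  collectCoverageFrom: ['src/proxy/**/*.ts', '!src/proxy/**/*.test.ts'],
  // Jest fails the run when any global metric falls below its threshold.
  coverageThreshold: {
    global: { statements: 90, branches: 90, functions: 90, lines: 90 },
  },
};

export default config;
```

Thresholds are minimums, so 90 here means "at least 90%"; raise the values if the goal is strictly above 90%.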

2. Integration Tests

2.1 CLI to Proxy Integration

#!/bin/bash
# tests/integration/requesty-cli.sh

echo "=== Requesty CLI Integration Tests ==="

# Test 1: Basic provider detection
echo "Test 1: Provider detection via flag"
npx agentic-flow --agent coder \
  --task "Say hello" \
  --provider requesty \
  --model "openai/gpt-4o-mini" | grep "Requesty"

# Test 2: Provider detection via env var
echo "Test 2: Provider detection via USE_REQUESTY"
USE_REQUESTY=true npx agentic-flow --agent coder \
  --task "Say hello" | grep "Requesty"

# Test 3: API key validation (missing key)
echo "Test 3: Missing API key error"
REQUESTY_API_KEY= npx agentic-flow --agent coder \
  --task "Test" \
  --provider requesty 2>&1 | grep "REQUESTY_API_KEY required"

# Test 4: Proxy startup
echo "Test 4: Proxy starts on port 3000"
npx agentic-flow --agent coder \
  --task "Test" \
  --provider requesty 2>&1 | grep "http://localhost:3000"

# Test 5: Model override
echo "Test 5: Model override works"
npx agentic-flow --agent coder \
  --task "Test" \
  --provider requesty \
  --model "anthropic/claude-3.5-sonnet" 2>&1 | grep "claude-3.5-sonnet"

2.2 Proxy to Requesty API Integration

#!/bin/bash
# tests/integration/requesty-api.sh

echo "=== Requesty API Integration Tests ==="

# Test 1: Basic chat request via the proxy's /v1/messages endpoint
curl -X POST http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: test-key" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Test 2: Tool calling
curl -X POST http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: test-key" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Read file.txt"}],
    "tools": [{
      "name": "Read",
      "input_schema": {
        "type": "object",
        "properties": {"file_path": {"type": "string"}},
        "required": ["file_path"]
      }
    }]
  }'

# Test 3: Health check
curl http://localhost:3000/health

2.3 End-to-End User Workflows

Workflow 1: Simple Code Generation

npx agentic-flow --agent coder \
  --task "Create a Python function that adds two numbers" \
  --provider requesty \
  --model "openai/gpt-4o-mini"

Expected:

  • Proxy starts
  • Request sent to Requesty
  • Python function generated
  • Exit code 0

Workflow 2: File Operations with Tools

npx agentic-flow --agent coder \
  --task "Create a file hello.py with a hello world function" \
  --provider requesty

Expected:

  • Tool calling works
  • File created: hello.py
  • Function is valid Python
  • Exit code 0

Workflow 3: Research with Streaming

npx agentic-flow --agent researcher \
  --task "Explain machine learning in simple terms" \
  --provider requesty \
  --model "anthropic/claude-3.5-sonnet" \
  --stream

Expected:

  • Streaming output (real-time)
  • Coherent explanation
  • Exit code 0

Workflow 4: Multi-Step Task

npx agentic-flow --agent coder \
  --task "Create a REST API with Express.js - include routes for GET /users and POST /users" \
  --provider requesty \
  --model "openai/gpt-4o"

Expected:

  • Multiple files created
  • Valid Express.js code
  • Includes route handlers
  • Exit code 0

Integration Test Matrix

| Test ID | Component | Input | Expected Output | Status |
|---|---|---|---|---|
| INT-01 | CLI Detection | --provider requesty | Proxy starts | [ ] |
| INT-02 | CLI Detection | USE_REQUESTY=true | Proxy starts | [ ] |
| INT-03 | CLI Validation | Missing API key | Error message | [ ] |
| INT-04 | Proxy Startup | Start proxy | Port 3000 listening | [ ] |
| INT-05 | Proxy Health | GET /health | 200 OK | [ ] |
| INT-06 | API Request | Simple chat | Valid response | [ ] |
| INT-07 | Tool Calling | Request with tools | Tool use response | [ ] |
| INT-08 | Streaming | Stream flag | SSE events | [ ] |
| INT-09 | Model Override | Different model | Model changed | [ ] |
| INT-10 | Error Handling | Invalid key | 401 error | [ ] |

3. Model-Specific Tests

Test Different Model Families

3.1 OpenAI Models

# GPT-4o Mini (fast, cheap)
npx agentic-flow --agent coder \
  --task "Create a hello function" \
  --provider requesty \
  --model "openai/gpt-4o-mini"

# GPT-4o (premium quality)
npx agentic-flow --agent coder \
  --task "Create a complex sorting algorithm" \
  --provider requesty \
  --model "openai/gpt-4o"

# GPT-4 Turbo
npx agentic-flow --agent researcher \
  --task "Research quantum computing" \
  --provider requesty \
  --model "openai/gpt-4-turbo"

3.2 Anthropic Models

# Claude 3.5 Sonnet
npx agentic-flow --agent coder \
  --task "Write a file parser" \
  --provider requesty \
  --model "anthropic/claude-3.5-sonnet"

# Claude 3 Haiku (fast)
npx agentic-flow --agent coder \
  --task "Simple function" \
  --provider requesty \
  --model "anthropic/claude-3-haiku"

3.3 Google Models

# Gemini 2.5 Flash (FREE)
npx agentic-flow --agent coder \
  --task "Create a calculator" \
  --provider requesty \
  --model "google/gemini-2.5-flash"

# Gemini 2.5 Pro
npx agentic-flow --agent researcher \
  --task "Analyze AI trends" \
  --provider requesty \
  --model "google/gemini-2.5-pro"

3.4 DeepSeek Models

# DeepSeek Chat V3 (cheap)
npx agentic-flow --agent coder \
  --task "Create API endpoint" \
  --provider requesty \
  --model "deepseek/deepseek-chat-v3"

# DeepSeek Coder
npx agentic-flow --agent coder \
  --task "Write Python script" \
  --provider requesty \
  --model "deepseek/deepseek-coder"

3.5 Meta/Llama Models

# Llama 3.3 70B
npx agentic-flow --agent coder \
  --task "Create function" \
  --provider requesty \
  --model "meta-llama/llama-3.3-70b-instruct"

# Llama 3.3 8B (fast)
npx agentic-flow --agent coder \
  --task "Simple task" \
  --provider requesty \
  --model "meta-llama/llama-3.3-8b-instruct"

Model Test Matrix

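| Model | Provider | Tools | Stream | Expected Time | Status |
|---|---|---|---|---|---|
| gpt-4o-mini | OpenAI | | | <5s | [ ] |
| gpt-4o | OpenAI | | | <10s | [ ] |
| claude-3.5-sonnet | Anthropic | | | <15s | [ ] |
| claude-3-haiku | Anthropic | | | <5s | [ ] |
| gemini-2.5-flash | Google | | | <3s | [ ] |
| gemini-2.5-pro | Google | | | <10s | [ ] |
| deepseek-chat-v3 | DeepSeek | | | <8s | [ ] |
| llama-3.3-70b | Meta | | | <12s | [ ] |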
4. Performance Tests

4.1 Latency Testing

#!/bin/bash
# tests/performance/latency.sh

echo "=== Requesty Performance Tests ==="

# Note: %N (nanoseconds) needs GNU date; on macOS install coreutils and use gdate.
for i in {1..10}; do
  start=$(date +%s%N)

  npx agentic-flow --agent coder \
    --task "Say hello" \
    --provider requesty \
    --model "openai/gpt-4o-mini" > /dev/null

  end=$(date +%s%N)
  duration=$(( (end - start) / 1000000 ))

  echo "Request $i: ${duration}ms"
done

Success Criteria:

  • Average latency < 3000ms
  • P95 latency < 5000ms
  • No timeouts
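
To check the latency criteria above against the durations the script prints, the samples need an average and a 95th percentile. A minimal sketch, assuming the durations have already been collected into an array of milliseconds; `percentile` and `summarize` are hypothetical helper names, and the nearest-rank percentile method is one of several common definitions.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize latency samples (milliseconds) against the success criteria:
// average < 3000ms and P95 < 5000ms.
function summarize(samples: number[]): { average: number; p95: number; pass: boolean } {
  const average = samples.reduce((sum, ms) => sum + ms, 0) / samples.length;
  const p95 = percentile(samples, 95);
  return { average, p95, pass: average < 3000 && p95 < 5000 };
}
```

With only 10 samples, the nearest-rank P95 is simply the slowest request, so a larger run gives a more meaningful percentile.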

4.2 Concurrent Requests

#!/bin/bash
# tests/performance/concurrent.sh

echo "=== Concurrent Request Test ==="

for i in {1..5}; do
  npx agentic-flow --agent coder \
    --task "Task $i" \
    --provider requesty &
done

wait
echo "All concurrent requests completed"

Success Criteria:

  • All requests complete successfully
  • No proxy crashes
  • No rate limit errors (or proper retry)

4.3 Large Context Test

# Test with a large task prompt (contents of large-context.txt)
npx agentic-flow --agent coder \
  --task "$(cat large-context.txt)" \
  --provider requesty \
  --model "google/gemini-2.5-pro"  # Large context window

Success Criteria:

  • Request succeeds
  • No context truncation errors
  • Response is relevant

5. Security Tests

5.1 API Key Handling

Test: Verify API key is not logged

# Run with verbose logging and count output lines containing the full key
VERBOSE=true npx agentic-flow --agent coder \
  --task "Test" \
  --provider requesty 2>&1 | grep -cF "$REQUESTY_API_KEY"

Expected: 0 matches (the full key must never appear; at most a short prefix should be logged)
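
Logging only a prefix of the key is easiest to guarantee if every log call goes through one masking helper. A minimal sketch: `maskApiKey` is a hypothetical function, and the visible length of 9 characters is an assumption chosen to cover a `requesty-` style prefix.

```typescript
// Keep only a short prefix of the key for log output; the rest is dropped.
// Keys no longer than the visible length are masked entirely.
function maskApiKey(key: string, visible = 9): string {
  if (key.length <= visible) return '*'.repeat(key.length);
  return key.slice(0, visible) + '...';
}
```

For example, `maskApiKey('requesty-abcdef123456')` yields `requesty-...`, which is safe to print in verbose mode.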

5.2 API Key Validation

# Test invalid key format
REQUESTY_API_KEY="invalid" npx agentic-flow --agent coder \
  --task "Test" \
  --provider requesty

Expected: Error message about invalid key

5.3 Environment Isolation

# Verify proxy doesn't leak API key to client
curl http://localhost:3000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}' \
  | grep -c "requesty-"

Expected: 0 matches (API key should not be in response)


6. Error Handling Tests

6.1 Invalid API Key

REQUESTY_API_KEY="requesty-invalid" npx agentic-flow --agent coder \
  --task "Test" \
  --provider requesty

Expected:

  • HTTP 401 Unauthorized
  • Clear error message
  • Exit code 1

6.2 Rate Limiting

# Send many requests quickly
for i in {1..50}; do
  npx agentic-flow --agent coder \
    --task "Test $i" \
    --provider requesty &
done
wait

Expected:

  • Some requests succeed
  • Some requests retry (429 errors)
  • All eventually complete or fail gracefully
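
"Retry" on a 429 needs a rule for how long to wait. A minimal sketch of such a rule: `retryDelayMs` is a hypothetical helper, and the base delay, cap, and exponential growth are assumptions rather than documented Requesty or agentic-flow behavior.

```typescript
// Decide how long to wait (in ms) before retrying after an HTTP 429.
// `attempt` is 0 for the first retry; `retryAfter` is the raw Retry-After header, if any.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs = 1000,
  maxMs = 60000,
): number {
  // A Retry-After value in seconds takes precedence when the server sends one.
  // An HTTP-date value parses to NaN here and falls through to the backoff.
  const hasValue = retryAfter !== null && retryAfter.trim() !== '';
  const seconds = hasValue ? Number(retryAfter) : NaN;
  if (Number.isFinite(seconds) && seconds >= 0) {
    return Math.min(seconds * 1000, maxMs);
  }
  // Otherwise back off exponentially: base, 2x base, 4x base, ... up to the cap.
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Capping the delay keeps a burst of 50 requests from stalling indefinitely, which matches the "complete or fail gracefully" expectation.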

6.3 Model Not Found

npx agentic-flow --agent coder \
  --task "Test" \
  --provider requesty \
  --model "invalid/model-123"

Expected:

  • HTTP 404 or 400
  • Clear error message
  • Exit code 1

6.4 Network Timeout

# Mock network timeout (requires proxy modification for testing)
REQUESTY_TIMEOUT=1 npx agentic-flow --agent coder \
  --task "Long task" \
  --provider requesty

Expected:

  • Timeout error
  • Retry attempt
  • Clear error message

7. Compatibility Tests

7.1 Cross-Platform

Test on multiple OS:

  • Linux (Ubuntu 22.04)
  • macOS (Ventura+)
  • Windows (WSL2)

7.2 Node.js Versions

Test with different Node versions:

  • Node.js 18.x
  • Node.js 20.x
  • Node.js 22.x

7.3 Package Managers

Test with different package managers:

  • npm
  • yarn
  • pnpm

8. Regression Tests

8.1 Existing Functionality

Verify Requesty doesn't break existing providers:

# Test Anthropic (should still work)
npx agentic-flow --agent coder \
  --task "Test" \
  --provider anthropic

# Test OpenRouter (should still work)
npx agentic-flow --agent coder \
  --task "Test" \
  --provider openrouter

# Test Gemini (should still work)
npx agentic-flow --agent coder \
  --task "Test" \
  --provider gemini

# Test ONNX (should still work)
npx agentic-flow --agent coder \
  --task "Test" \
  --provider onnx

Success Criteria:

  • All providers still work
  • No interference between providers
  • Proxy ports don't conflict

9. Acceptance Criteria

Must Pass (MVP)

  • Unit tests: >90% coverage
  • At least 5 models tested successfully
  • Tool calling works (1+ model)
  • Streaming works (1+ model)
  • Error handling for invalid API key
  • No regressions in existing providers
  • Documentation complete

Should Pass (V1)

  • 10+ models tested
  • Performance tests pass
  • Security tests pass
  • Cross-platform tested
  • Concurrent requests work
  • Rate limiting handled

Nice to Have (Future)

  • 20+ models tested
  • Load testing (100+ concurrent)
  • Automated test suite
  • CI/CD integration

10. Test Execution Plan

Day 1: Unit Tests

  • Write test files
  • Run unit tests
  • Fix any failures
  • Achieve >90% coverage

Day 2: Integration Tests

  • Test CLI integration
  • Test proxy integration
  • Test API integration
  • Document results

Day 3: Model Testing

  • Test 5+ models
  • Test tool calling
  • Test streaming
  • Document results

Day 4: Final Validation

  • Run all tests
  • Fix any regressions
  • Update documentation
  • Sign off

11. Bug Reporting Template

When filing bugs, use this template:

## Bug Report: Requesty Integration

**Test Category:** [Unit/Integration/E2E/Performance/Security]
**Test ID:** [e.g., INT-07]

**Description:**
[Clear description of the issue]

**Steps to Reproduce:**
1. Step 1
2. Step 2
3. Step 3

**Expected Behavior:**
[What should happen]

**Actual Behavior:**
[What actually happened]

**Environment:**
- OS: [e.g., Ubuntu 22.04]
- Node.js: [e.g., 20.10.0]
- agentic-flow: [e.g., 1.3.0]
- Requesty Model: [e.g., openai/gpt-4o-mini]

**Logs:**
[Paste relevant logs]

**Screenshots:**
[If applicable]

12. Test Results Dashboard

Summary

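| Category | Total Tests | Passed | Failed | Skipped | Coverage |
|---|---|---|---|---|---|
| Unit | 0 | 0 | 0 | 0 | 0% |
| Integration | 0 | 0 | 0 | 0 | N/A |
| E2E | 0 | 0 | 0 | 0 | N/A |
| Performance | 0 | 0 | 0 | 0 | N/A |
| Security | 0 | 0 | 0 | 0 | N/A |
| Total | 0 | 0 | 0 | 0 | 0% |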

(To be filled during testing)


Conclusion

This testing strategy ensures comprehensive validation of the Requesty integration across all critical dimensions: functionality, performance, security, and compatibility. By following this plan, we can confidently release the Requesty provider to users.

  • Total Testing Time: ~3 hours
  • Test Coverage Goal: >90%
  • Model Coverage Goal: 5+ models (MVP), 10+ models (V1)