On this page

Python AI SDK

Statsig's Python SDK for AI Application Configuration & Telemetry

The Statsig AI Python SDK is currently in beta. Statsig is no longer accepting new beta customers at this time.

How the Python AI SDK works

The Statsig Python AI SDK lets you manage prompts, online and offline evals, and debug LLM applications in production. It depends on the Statsig Python Server SDK and provides hooks for AI-specific functionality.
  1. Install the SDK

    pip install statsig-ai
    
  2. Initialize the SDK

    For initialization requirements in forking and WSGI servers, refer to the Statsig Python Server SDK docs.

    If you already have a Statsig instance, you can pass it into the SDK. Otherwise, the SDK creates an instance internally.

    Initialize the AI SDK with a Server Secret Key from the Statsig console.

    Server Secret Keys should always be kept private. If you expose one, you can disable and recreate it in the Statsig console.

    python
    from statsig_ai import StatsigAI, StatsigCreateConfig
    
    statsig_ai = StatsigAI(statsig_source=StatsigCreateConfig(server_secret_key='YOUR_SERVER_SECRET_KEY'))
    statsig_ai.initialize().
    

Using the SDK

Getting a prompt

Statsig can act as the control plane for your LLM prompts, allowing you to version and change them without deploying code. For more information, refer to the Prompts documentation.
python
from statsig_ai import StatsigUser

# Create a user object
user = StatsigUser(user_id='a-user')

# Get the prompt
my_prompt = statsig_ai.get_prompt(user, 'my_prompt')

# Use the live version of the prompt
live_version = my_prompt.get_live()

# Get the candidate versions of the prompt
candidate_versions = my_prompt.get_candidates()

# Use the live version of the prompt in a completion
response = openai.chat.completions.create(
    model=live_version.get_model(fallback='gpt-4'),  # optional fallback
    temperature=live_version.get_temperature(),
    max_tokens=live_version.get_max_tokens(),
    messages=[{'role': 'user', 'content': 'Your prompt here'}],
)

Logging eval results

When running an online eval, you can log results back to Statsig for analysis. Provide a score between 0 and 1, along with the grader name and any useful metadata (such as session IDs). Currently, you must provide the grader manually. Future releases will support automated grading options.
python
from statsig_ai import StatsigUser

live_prompt_version = statsig_ai.get_prompt(user, 'my_prompt').get_live()
# Create a user object
user = StatsigUser(user_id='a-user')

# Log the results of the eval
statsig_ai.log_eval_grade(user, live_prompt_version, 0.5, 'my_grader', {
    'session_id': '1234567890',
})

# flush eval grade events to statsig
statsig_ai.flush().wait()

Programmatic evaluation

Programmatic evaluation allows you to run evaluations on datasets programmatically, automatically scoring outputs and sending results to Statsig for analysis.

With programmatic evaluation, you can:

  • Run evaluations on datasets: Process arrays, iterators, or async generators of input/expected pairs
  • Define custom tasks: Create functions that generate outputs from inputs (supports both sync and async)
  • Score outputs: Use single or multiple named scorer functions to evaluate outputs (supports boolean, numeric, or metadata-rich scores)
  • Use parameters: Pass dynamic parameters to tasks using Zod schemas (Node) or dictionaries (Python)
  • Categorize data: Group evaluation records by categories for better analysis
  • Compute summary scores: Aggregate results across all records with custom summary functions
  • Handle errors gracefully: Task and scorer errors are caught and reported without stopping the evaluation

The evaluation automatically sends results to Statsig, where you can view them in the console alongside your other eval data.

Tasks and scorers can be async functions. Data can also be provided as async functions, promises, or async iterators. The expected field in data records is optional; scorers can evaluate outputs without expected values. Task and scorer errors are automatically caught and reported in the results.

python
from statsig_ai import Eval, EvalScorerArgs, EvalDataRecord, EvalHook

# Basic evaluation with a single scorer
result = Eval(
    name='greeting_task',
    data=[
        {'input': 'world', 'expected': 'Hello world'},
        {'input': 'test', 'expected': 'Hello test'},
    ],
    task=lambda input: f'Hello {input}',
    scorer=lambda args: args.output == args.expected,
    eval_run_name='run-123',
)

# Multiple named scorers
result2 = Eval(
    name='multi_scorer_task',
    data=[
        {'input': 'world', 'expected': 'Hello world'},
        {'input': 'test', 'expected': 'Hello test'},
    ],
    task=lambda input: f'Hello {input}',
    scorer={
        'correctness': lambda args: args.output == args.expected,
        'starts_with_hello': lambda args: args.output.startswith('Hello'),
        'length_check': lambda args: len(args.output) > 5,
    },
)

# Using parameters
def task_with_params(input: str, hook: EvalHook) -> str:
    prefix = hook.parameters.get('prefix', 'Hello')
    return f'{prefix} {input}'

result3 = Eval(
    name='parameterized_task',
    data=[
        {'input': 'world', 'expected': 'Hi world'},
    ],
    task=task_with_params,
    scorer=lambda args: args.output == args.expected,
    parameters={'prefix': 'Hi', 'suffix': '!', 'number': 123},
)

# Extras: Categories and summary scores
def summary_scorer(results):
    correct = sum(1 for r in results if r.scores.get('correctness', 0.0) == 1.0)
    return {
        'accuracy': correct / len(results) if results else 0.0,
        'total': len(results),
    }

result4 = Eval(
    name='categorized_with_summary',
    data=[
        {'input': 'world', 'expected': 'Hello world', 'category': 'greeting'},
        {'input': 'test', 'expected': 'Hello test', 'category': ['greeting', 'test']},
        {'input': 'foo', 'expected': 'Goodbye foo', 'category': 'farewell'},
    ],
    task=lambda input: f'Hello {input}',
    scorer={
        'correctness': lambda args: args.output == args.expected,
    },
    summary_score_fn=summary_scorer,
)

# Using EvalDataRecord dataclass
result5 = Eval(
    name='dataclass_records',
    data=[
        EvalDataRecord(input='world', expected='Hello world'),
        EvalDataRecord(input='test', expected='Hello test'),
    ],
    task=lambda input: f'Hello {input}',
    scorer=lambda args: args.output == args.expected,
)

OpenTelemetry (OTEL)

OTel tracing isn't yet supported in the Python AI SDK. Support is coming soon.

Wrapping OpenAI

The OpenAI wrapper isn't yet supported in the Python AI SDK. Support is coming soon.

Using other SDK methods

You can access the Statsig instance from the statsig_ai instance regardless of how you initialized it, and use its methods:

python
# Check a gate value
gate = statsig_ai.get_statsig().check_gate(statsig_user, 'my_gate')

# Log an event
statsig_ai.get_statsig().log_event(statsig_user, 'my_event', value=1)
Refer to the Statsig Python SDK docs for more information on Core Statsig SDK methods, advanced setup, and singleton usage.

Was this helpful?