Python AI SDK

Statsig's Python SDK for AI Application Configuration & Telemetry

Early Access

This feature is in Early Access. During this period, the functionality may still change, and this documentation may not always be up to date. If you have any questions, contact Statsig Support.

The Statsig Python AI SDK manages LLM prompts, runs online and offline evals, and debugs LLM applications in production from a Python server. Reach for it when you want to version prompts without shipping code, log eval grades back to Statsig for analysis, and run programmatic evaluations over your datasets. It builds on the Statsig Python Server SDK and adds hooks for AI-specific functionality.

Statsig isn't accepting new customers for the AI SDKs.

How the Python AI SDK works

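The Python AI SDK is a layer on top of the Statsig Python Server SDK. It wraps a Statsig instance (either one you pass in or one it creates internally) and adds AI-specific methods for fetching prompts, logging eval grades, and running programmatic evaluations. The underlying Statsig instance stays available for core methods such as gate checks and event logging.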

Install the SDK
pip install statsig-ai
Initialize the SDK
For initialization requirements in forking and WSGI servers, refer to the Statsig Python Server SDK docs.
If you already have a Statsig instance, you can pass it into the SDK. Otherwise, the SDK creates an instance internally.
Initialize the AI SDK with a Server Secret Key from the Statsig console.
Always keep Server Secret Keys private. If you expose one, you can disable and recreate it in the Statsig console.
python

from statsig_ai import StatsigAI, StatsigCreateConfig

# Initialize the AI SDK with a Server Secret Key from the Statsig console
statsig_ai = StatsigAI(
    statsig_source=StatsigCreateConfig(server_secret_key='YOUR_SERVER_SECRET_KEY')
)
statsig_ai.initialize()
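Since Server Secret Keys must stay private, a common pattern is to read the key from the environment instead of hard-coding it in source. The sketch below assumes a hypothetical environment variable named STATSIG_SERVER_SECRET_KEY and a hypothetical helper, load_server_secret_key; neither is part of the SDK. The SDK calls in the trailing comments are the same ones used in the initialization example.

```python
import os


def load_server_secret_key(var_name: str = 'STATSIG_SERVER_SECRET_KEY') -> str:
    """Read the Server Secret Key from the environment (hypothetical variable name)."""
    key = os.environ.get(var_name, '').strip()
    if not key:
        # Fail fast instead of initializing the SDK with an empty key
        raise RuntimeError(f'{var_name} is not set')
    return key


# Usage with the AI SDK (same constructor as in the initialization example):
# from statsig_ai import StatsigAI, StatsigCreateConfig
# statsig_ai = StatsigAI(
#     statsig_source=StatsigCreateConfig(server_secret_key=load_server_secret_key())
# )
# statsig_ai.initialize()
```

Failing at startup when the variable is missing surfaces a misconfigured deployment immediately, rather than later as failed SDK requests.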

Using the SDK

Getting a prompt

Statsig can act as the control plane for your LLM prompts, allowing you to version and change them without deploying code. For more information, refer to the Prompts documentation.

python

import openai
from statsig_ai import StatsigUser

# statsig_ai is the StatsigAI instance initialized in the previous section

# Create a user object
user = StatsigUser(user_id='a-user')

# Get the prompt
my_prompt = statsig_ai.get_prompt(user, 'my_prompt')

# Use the live version of the prompt
live_version = my_prompt.get_live()

# Get the candidate versions of the prompt
candidate_versions = my_prompt.get_candidates()

# Use the live version of the prompt in a completion
response = openai.chat.completions.create(
    model=live_version.get_model(fallback='gpt-4'),  # optional fallback
    temperature=live_version.get_temperature(),
    max_tokens=live_version.get_max_tokens(),
    messages=[{'role': 'user', 'content': 'Your prompt here'}],
)

Logging eval results

When running an online eval, you can log results back to Statsig for analysis. Provide a score between 0 and 1, the grader name, and any useful metadata (such as session IDs). For now, you must compute the grade yourself; automated grading options are planned for future releases.

python

from statsig_ai import StatsigUser

# Create a user object
user = StatsigUser(user_id='a-user')

# Get the live version of the prompt being evaluated
live_prompt_version = statsig_ai.get_prompt(user, 'my_prompt').get_live()

# Log the results of the eval
statsig_ai.log_eval_grade(user, live_prompt_version, 0.5, 'my_grader', {
    'session_id': '1234567890',
})

# Flush eval grade events to Statsig
statsig_ai.flush().wait()
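Because you supply the grade yourself, you need a function that turns a model output into a score between 0 and 1 before calling log_eval_grade. The sketch below is a hypothetical keyword-coverage grader: keyword_coverage_grade, its keyword list, and the completion_text variable in the comments are illustrative and not part of the SDK. Any scoring logic works as long as the result stays in that range.

```python
def keyword_coverage_grade(output: str, required_keywords: list[str]) -> float:
    """Hypothetical grader: fraction of required keywords present in the output.

    Returns a float in [0, 1], the range log_eval_grade expects.
    """
    if not required_keywords:
        return 1.0  # nothing is required, so treat the output as fully passing
    text = output.lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    return hits / len(required_keywords)


# Usage with the SDK call from the example above (user and live_prompt_version
# as defined there; completion_text is a hypothetical model output):
# score = keyword_coverage_grade(completion_text, ['refund', 'order number'])
# statsig_ai.log_eval_grade(user, live_prompt_version, score, 'keyword_coverage', {
#     'session_id': '1234567890',
# })
```

Using the grader's function name as the grader name keeps grades from different scoring methods separable when you analyze them in Statsig.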

Programmatic evaluation

Programmatic evaluation lets you run evaluations over datasets in code, automatically scoring outputs and sending the results to Statsig for analysis.

With programmatic evaluation, you can:

Run evaluations on datasets: Process lists, iterators, or async generators of input/expected pairs
Define custom tasks: Create functions that generate outputs from inputs (supports both sync and async)
Score outputs: Use single or multiple named scorer functions to evaluate outputs (supports boolean, numeric, or metadata-rich scores)
Use parameters: Pass dynamic parameters to tasks as a dictionary (the Node SDK uses Zod schemas instead)
Categorize data: Group evaluation records by categories for better analysis
Compute summary scores: Aggregate results across all records with custom summary functions
Handle errors gracefully: The SDK catches and reports task and scorer errors without stopping the evaluation
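Scorers don't have to return booleans. As a sketch of a numeric scorer, the function below returns a string-similarity ratio between 0 and 1 using the standard library's difflib.SequenceMatcher. It assumes the scorer argument exposes output and expected attributes, as the scorers in the examples do; similarity_scorer is a hypothetical name, and SimpleNamespace stands in for the SDK's own argument object in the local check.

```python
from difflib import SequenceMatcher
from types import SimpleNamespace


def similarity_scorer(args) -> float:
    """Numeric scorer: similarity between output and expected, in [0, 1].

    Assumes args exposes .output and .expected, like the scorers in the examples.
    """
    if args.expected is None:
        return 0.0  # expected is optional, but this scorer needs it to compare
    return SequenceMatcher(None, args.output, args.expected).ratio()


# Local check with a stand-in args object (the SDK passes its own scorer args)
example = SimpleNamespace(output='Hello world', expected='Hello world!')
partial_score = similarity_scorer(example)  # close to, but below, 1.0
```

A numeric scorer like this can sit alongside boolean ones in a named-scorer dictionary, for example {'correctness': ..., 'similarity': similarity_scorer}, so near-misses are distinguishable from complete failures.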

The evaluation automatically sends results to Statsig, where you can view them in the console alongside your other eval data.

Tasks and scorers can be async functions, and data can also be supplied as an async function or an async iterator. The expected field in data records is optional; scorers can evaluate outputs without expected values.

python

from statsig_ai import Eval, EvalScorerArgs, EvalDataRecord, EvalHook

# Basic evaluation with a single scorer
result = Eval(
    name='greeting_task',
    data=[
        {'input': 'world', 'expected': 'Hello world'},
        {'input': 'test', 'expected': 'Hello test'},
    ],
    task=lambda input: f'Hello {input}',
    scorer=lambda args: args.output == args.expected,
    eval_run_name='run-123',
)

# Multiple named scorers
result2 = Eval(
    name='multi_scorer_task',
    data=[
        {'input': 'world', 'expected': 'Hello world'},
        {'input': 'test', 'expected': 'Hello test'},
    ],
    task=lambda input: f'Hello {input}',
    scorer={
        'correctness': lambda args: args.output == args.expected,
        'starts_with_hello': lambda args: args.output.startswith('Hello'),
        'length_check': lambda args: len(args.output) > 5,
    },
)

# Using parameters
def task_with_params(input: str, hook: EvalHook) -> str:
    prefix = hook.parameters.get('prefix', 'Hello')
    return f'{prefix} {input}'

result3 = Eval(
    name='parameterized_task',
    data=[
        {'input': 'world', 'expected': 'Hi world'},
    ],
    task=task_with_params,
    scorer=lambda args: args.output == args.expected,
    parameters={'prefix': 'Hi', 'suffix': '!', 'number': 123},
)

# Extras: Categories and summary scores
def summary_scorer(results):
    correct = sum(1 for r in results if r.scores.get('correctness', 0.0) == 1.0)
    return {
        'accuracy': correct / len(results) if results else 0.0,
        'total': len(results),
    }

result4 = Eval(
    name='categorized_with_summary',
    data=[
        {'input': 'world', 'expected': 'Hello world', 'category': 'greeting'},
        {'input': 'test', 'expected': 'Hello test', 'category': ['greeting', 'test']},
        {'input': 'foo', 'expected': 'Goodbye foo', 'category': 'farewell'},
    ],
    task=lambda input: f'Hello {input}',
    scorer={
        'correctness': lambda args: args.output == args.expected,
    },
    summary_score_fn=summary_scorer,
)

# Using EvalDataRecord dataclass
result5 = Eval(
    name='dataclass_records',
    data=[
        EvalDataRecord(input='world', expected='Hello world'),
        EvalDataRecord(input='test', expected='Hello test'),
    ],
    task=lambda input: f'Hello {input}',
    scorer=lambda args: args.output == args.expected,
)
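Records can carry a single category or a list of categories, as in the categorized_with_summary example. To illustrate how a per-category breakdown could be computed, here is a standalone sketch that groups (category, score) pairs and averages each group. RecordScore and accuracy_by_category are hypothetical; the SDK's result objects may expose categories differently, so treat this as a model of the grouping logic rather than something to pass to summary_score_fn as-is.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Union


@dataclass
class RecordScore:
    """Hypothetical stand-in for one evaluated record."""
    category: Union[str, list[str], None]
    score: float  # e.g. 1.0 for a correct output, 0.0 otherwise


def accuracy_by_category(records: list[RecordScore]) -> dict[str, float]:
    """Average score per category; a record with several categories counts toward each."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for rec in records:
        if rec.category is None:
            continue  # uncategorized records are skipped
        # Normalize a single category string into a one-element list
        categories = [rec.category] if isinstance(rec.category, str) else rec.category
        for cat in categories:
            totals[cat] += rec.score
            counts[cat] += 1
    return {cat: totals[cat] / counts[cat] for cat in counts}
```

Applied to the three records in categorized_with_summary, greeting and test would average 1.0 and farewell 0.0, because the task always answers with "Hello" while the farewell record expects "Goodbye foo".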

OpenTelemetry (OTEL)

The Python AI SDK doesn't yet support OTel tracing. Support is coming soon.

Wrapping OpenAI

The Python AI SDK doesn't yet support the OpenAI wrapper. Support is coming soon.

Using other SDK methods

You can access the underlying Statsig instance with statsig_ai.get_statsig(), regardless of how you initialized the AI SDK, and call its methods directly:

python

from statsig_ai import StatsigUser

# Create a user object
statsig_user = StatsigUser(user_id='a-user')

# Check a gate value
gate = statsig_ai.get_statsig().check_gate(statsig_user, 'my_gate')

# Log an event
statsig_ai.get_statsig().log_event(statsig_user, 'my_event', value=1)

Refer to the Statsig Python SDK docs for more information on core Statsig SDK methods, advanced setup, and singleton usage.
