OpenAI

Integrate Statsig with OpenAI to log AI requests, capture metrics, and run experiments on prompts, models, and parameters across your applications.

How this integration works

When using a pre-trained large language model, several inputs influence user experience: the prompts used, the model selected, and inference parameters such as temperature, length penalty, and repetition penalty. Statsig can assign users to experiments that modify these inputs and can identify when a change has a statistically significant impact on user experience metrics. This lets you iterate efficiently on model choices, prompts, and inference parameters. This guide shows how to log both implicit indicators of user feedback (like response time) and explicit ones (like self-reported satisfaction).

The example Python code shows how to use OpenAI's GPT models with Statsig to experiment with model inputs and log user events. It uses OpenAI's ChatCompletion API to answer questions and the Statsig SDK to select the model version and log user feedback.

This example assumes you have a funded OpenAI account and a Statsig experiment that varies the model selected between "gpt-3.5-turbo" and "gpt-4". For more information on setting up a Statsig experiment, refer to the experiments page.

Code breakdown

Initial configuration

Install both the Statsig and OpenAI Python packages before starting:

bash
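# The examples use openai.ChatCompletion, which is only available in openai package versions before 1.0
pip3 install "openai<1.0" statsig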

Then add the following to a Python file:

python
import openai
from statsig import statsig, StatsigEvent, StatsigUser
import time

openai.api_key = "your_openai_key"  # Replace with your own key
statsig.initialize("your_statsig_secret")  # Replace with your Statsig secret
user = StatsigUser("user-id") #This is a placeholder ID - in a normal experiment Statsig recommends using a user's actual unique ID for consistency in targeting. See /concepts/user
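
The comment above recommends a real unique ID instead of the placeholder. One possible way to pick that ID is sketched below, assuming your application has some account identifier available. The `resolve_user_id` helper and the `anon-` prefix are hypothetical names for illustration and are not part of the Statsig SDK.

```python
import uuid
from typing import Optional

# Hypothetical fallback: generated once per process and reused for every
# anonymous lookup, so one session always maps to the same ID.
_session_id = str(uuid.uuid4())


def resolve_user_id(account_id: Optional[str]) -> str:
    """Return the account ID when one is known, else a per-session fallback."""
    if account_id and account_id.strip():
        return account_id.strip()
    return f"anon-{_session_id}"
```

The result could then be passed where the guide passes the placeholder, for example `StatsigUser(resolve_user_id(account_id))`. A session-scoped fallback keeps assignment stable within one run only; the same person may receive a different ID in a later session.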

The ask_question function

The following code all occurs in one function titled ask_question (refer to the final code).
  1. Get User Input
python
#ask the user for a question to query GPT with
question = input("\nWhat is your question? ")

First, prompt the user for a question to send to the GPT model.

  2. Query OpenAI's GPT
python

#track the start time so we can check response time later
start_time = time.time()

completion = openai.ChatCompletion.create(
    # the experiment name matches the one used in the final code
    model=statsig.get_experiment(user, "statsig_openai_collab").get("model", 'gpt-4'),
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": question}
    ]
)

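Request a completion from the GPT model specified by the Statsig experiment (either gpt-3.5-turbo or gpt-4), falling back to gpt-4 if the experiment returns no value. The start time recorded just before the request is used in the next step to compute the response time that gets logged.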
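
The timing step can also be factored into a small wrapper. The sketch below is one possible approach, not the guide's method: it uses `time.perf_counter()`, a monotonic clock intended for measuring short durations, instead of `time.time()`, which follows the system clock and can shift if that clock is adjusted. The `timed_call` name is hypothetical.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn with the given arguments and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Any callable works; sum() stands in for the OpenAI request here.
result, seconds = timed_call(sum, [1, 2, 3])
```

In the guide's code, the wrapped callable would be the chat completion request, and `seconds` would take the place of `time.time() - start_time`.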

  3. Display Response & Log Implicit Feedback
python
#log "implicit" indicators to Statsig
c = completion.choices[0]
stats = completion.usage #the completion object reports the number of tokens used, which is what OpenAI bills on
statsig.log_event(StatsigEvent(user, "chat_completion", value=c.finish_reason, metadata={"response_time": time.time() - start_time, "completion_tokens": stats["completion_tokens"], "prompt_tokens": stats["prompt_tokens"], "total_tokens": stats["total_tokens"]}))

#print the message back to the user
print(f"\nAnswer: {c.message['content']}")

The OpenAI response contains the first set of useful information to log to Statsig: the finish reason, the response time, and the number of tokens used. Log it with the Statsig SDK as a single event, passing the finish reason as the event value and the remaining fields as event metadata.
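
Building the metadata in a separate function keeps the logging line short and makes the payload easy to check in isolation. This is a minimal sketch, assuming the usage object can be read like a mapping, as the guide's own code does. The `build_completion_metadata` helper and the rounding to three decimals are illustrative choices, not Statsig requirements.

```python
from typing import Any, Dict, Mapping

TOKEN_KEYS = ("completion_tokens", "prompt_tokens", "total_tokens")


def build_completion_metadata(usage: Mapping[str, int], response_time: float) -> Dict[str, Any]:
    """Collect token counts and response time into one metadata dictionary."""
    # Copy only the token fields that are actually present in the usage data.
    metadata: Dict[str, Any] = {key: usage[key] for key in TOKEN_KEYS if key in usage}
    # Round to milliseconds so logged values stay compact.
    metadata["response_time"] = round(response_time, 3)
    return metadata
```

The returned dictionary could be passed as the `metadata` argument of the `StatsigEvent` shown above.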

  4. Collect User Feedback & Log to Statsig
python
#track explicit feedback: if the user is satisfied with the answer
satisfaction = input("\nDid this answer your question? (y/n): ")

# Log "explicit" indicators to Statsig
if satisfaction == 'y':
    statsig.log_event(StatsigEvent(user, "satisfaction"))
elif satisfaction == 'n':
    statsig.log_event(StatsigEvent(user, "dissatisfaction"))

Log a more explicit indicator of feedback: the user's self-reported satisfaction or dissatisfaction. The satisfaction metric provides a strong indicator of the model's overall quality.
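
The comparison above only matches the exact strings 'y' and 'n', so answers such as "Y" or " yes" are silently dropped. A possible normalization step is sketched below. The `feedback_event_name` helper is hypothetical, and accepting "yes" and "no" is an assumption about what counts as a valid answer.

```python
from typing import Optional


def feedback_event_name(answer: str) -> Optional[str]:
    """Map a free-form y/n answer to an event name, or None if unrecognized."""
    normalized = answer.strip().lower()
    if normalized in ("y", "yes"):
        return "satisfaction"
    if normalized in ("n", "no"):
        return "dissatisfaction"
    return None  # unrecognized input: log nothing rather than guess
```

When the helper returns a name, it can be logged the same way as in the guide's code, with `statsig.log_event(StatsigEvent(user, name))`.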

To run the program, add the following entry point outside of the ask_question function.

python
if __name__ == "__main__":
    while input("Would you like to ask a question? (y/n): ").lower() == 'y':
        ask_question()
    statsig.shutdown()  # flush any queued events before the process exits

Tips for using Statsig with AI

  1. Experimentation: Test other model parameters like temperature, top_p, or initial prompts.
  2. Log useful data: Log other user interactions or feedback that may be informative.
  3. Analyze and iterate: After collecting enough data, analyze the results on the Statsig dashboard.
  4. User identification: Integrate a mechanism to uniquely identify each user or session.

Always ensure you're in compliance with user privacy regulations and that you have user consent where necessary.
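
The first tip can be sketched as a helper that reads several request parameters from one experiment. This is an illustration under stated assumptions: the parameter names `temperature`, `top_p`, and `system_prompt` are hypothetical experiment parameters you would need to define yourself, and `FakeExperiment` is a stand-in that only mimics the `get(key, default)` call the guide uses on the object returned by `statsig.get_experiment`.

```python
from typing import Any, Dict, Mapping


class FakeExperiment:
    """Stand-in exposing the get(key, default) call used in this guide."""

    def __init__(self, values: Mapping[str, Any]) -> None:
        self._values = dict(values)

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)


def build_request_params(experiment: Any, question: str) -> Dict[str, Any]:
    """Assemble chat completion arguments, using defaults for unset parameters."""
    system_prompt = experiment.get("system_prompt", "You are a helpful assistant.")
    return {
        "model": experiment.get("model", "gpt-4"),
        "temperature": experiment.get("temperature", 1.0),
        "top_p": experiment.get("top_p", 1.0),
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    }


params = build_request_params(FakeExperiment({"temperature": 0.2}), "What is an A/B test?")
```

With a real experiment object in place of `FakeExperiment`, the dictionary could be unpacked into the request with `openai.ChatCompletion.create(**params)`. Keeping every varied input in one place makes it clear which parameters each experiment group actually received.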

Final code

python
import openai
from statsig import statsig, StatsigEvent, StatsigUser
import time

openai.api_key = "your_openai_key"
statsig.initialize("your_statsig_secret")
user = StatsigUser("user-id") #This is a placeholder ID - in a normal experiment Statsig recommends using a user's actual unique ID for consistency in targeting. See /concepts/user

def ask_question():

    #ask the user for a question to query GPT with
    question = input("\nWhat is your question? ")

    #track the start time so we can check response time later
    start_time = time.time()

    #query GPT with the OpenAI Python library for chat completions
    completion = openai.ChatCompletion.create(
        model=statsig.get_experiment(user, "statsig_openai_collab").get("model", 'gpt-4'), #the experiment is set up to return either "gpt-3.5-turbo" or "gpt-4"
        #besides the model, other attributes could be varied, such as "temperature", "top_p", and "presence_penalty"
        #See the "Create chat completions" section of the OpenAI documentation for more: https://platform.openai.com/docs/api-reference/chat/create
        messages=[
            {"role": "system", "content": "You are a helpful assistant."}, #Initial prompts are another candidate for experimentation
            {"role": "user", "content": question}
        ]
    )

    #log "implicit" indicators to Statsig
    c = completion.choices[0] #we've only requested one choice, so selecting the first
    stats = completion.usage #the completion object reports the number of tokens used, which is what OpenAI bills on
    statsig.log_event(StatsigEvent(user, "chat_completion", value=c.finish_reason, metadata={"response_time": time.time() - start_time, "completion_tokens": stats["completion_tokens"], "prompt_tokens": stats["prompt_tokens"], "total_tokens": stats["total_tokens"]}))

    #print the message back to the user
    print(f"\nAnswer: {c.message['content']}")

    #track explicit feedback: if the user is satisfied with the answer
    satisfaction = input("\nDid this answer your question? (y/n): ")

    #log "explicit" indicators to Statsig
    if satisfaction == 'y':
        statsig.log_event(StatsigEvent(user, "satisfaction"))
    elif satisfaction == 'n':
        statsig.log_event(StatsigEvent(user, "dissatisfaction"))

if __name__ == "__main__":
    #Let the user abandon the question-asking process when they are done
    while input("Would you like to ask a question? (y/n): ").lower() == 'y':
        ask_question()
    #flush any queued events before the process exits
    statsig.shutdown()
