I was doing research on determinism and realized that it is the only way I can truly ‘time travel’, both backwards and forwards.
Determinism in computing means that for any given input, I will always get the same output, regardless of when or how many times I run the operation.
In a deterministic server:
- If I send the same request twice, I’ll get the same response twice
- Running the same function with the same parameters will always produce the same result
- There are no random elements that change between runs (or the randomness is controlled with fixed seeds)
- Time-dependent operations use a controlled clock rather than the actual system time
The point? If something worked yesterday, it’ll work exactly the same way today. No surprises, no ‘it worked on my machine’ excuses.
In other words, determinism makes my system behavior predictable.
This is like having a reliable friend. They may not remember everything you ever told them, but they’ll respond the same way to the same question every time.
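To make this concrete, here's a tiny Python sketch (the function names, messages, and seed value are purely illustrative) contrasting a function that reads the system clock and global randomness with one that takes a controlled clock and a fixed seed:

```python
import random
from datetime import datetime, timezone

# Non-deterministic: output depends on the real clock and global randomness,
# so two calls can return different results.
def greet_nondeterministic(name: str) -> str:
    hour = datetime.now().hour
    suffix = random.choice(["!", "!!", "."])
    return f"Good {'morning' if hour < 12 else 'evening'}, {name}{suffix}"

# Deterministic: the clock and the seed are inputs, so the same inputs
# always produce the same output, today or a year from now.
def greet_deterministic(name: str, now: datetime, seed: int) -> str:
    rng = random.Random(seed)  # fixed seed -> controlled "randomness"
    suffix = rng.choice(["!", "!!", "."])
    return f"Good {'morning' if now.hour < 12 else 'evening'}, {name}{suffix}"

fixed_now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
assert greet_deterministic("Ada", fixed_now, seed=7) == \
       greet_deterministic("Ada", fixed_now, seed=7)
```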
How does this differ from non-determinism?
This is different from non-deterministic systems where you might get different results each time due to:
- True randomness
- System time differences
- Race conditions in concurrent operations
- External services that might respond differently (e.g., OpenAI API calls)
OpenAI introduced reproducible outputs in early November 2023:
https://platform.openai.com/docs/advanced-usage#reproducible-outputs
On Nov 6, 2023, the OpenAI team published ‘How to make your completions outputs consistent with the new seed parameter’:
https://cookbook.openai.com/examples/reproducible_outputs_with_the_seed_parameter
According to the OpenAI Chat Completions API docs for the seed parameter:
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {
        "role": "developer",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "seed": 4944116822809979520
  }'
```

Note that system_fingerprint is not a request parameter: it comes back in the response body, and you compare it across calls to detect backend changes.
What about system fingerprint?
This fingerprint represents the backend configuration that the model runs with.
Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
What does this consist of?
The system fingerprint is an identifier for the current combination of model weights, infrastructure, and other configuration options used by OpenAI servers to generate the completion. It changes whenever you change request parameters, or OpenAI updates numerical configuration of the infrastructure serving our models (which may happen a few times a year).
According to their cookbook:
If the seed, request parameters, and system_fingerprint all match across your requests, then model outputs will mostly be identical. There is a small chance that responses differ even when request parameters and system_fingerprint match, due to the inherent non-determinism of our models.
According to OpenAI:
To receive (mostly) deterministic outputs across API calls, you can:
- Set the seed parameter to any integer of your choice and use the same value across requests you’d like deterministic outputs for.
- Ensure all other parameters (like prompt or temperature) are the exact same across requests.
That means I need to set:
- seed
- prompt
- temperature
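Here's a minimal sketch of those three knobs with the official openai Python client (the model name and seed value are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str):
    # Same seed, same prompt, same temperature across calls
    # -> best-effort deterministic output.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        seed=42,
        temperature=0,
    )
    # system_fingerprint identifies the backend configuration; if it changes
    # between calls, outputs may differ even with identical parameters.
    return response.choices[0].message.content, response.system_fingerprint

text_1, fp_1 = ask("Hello!")
text_2, fp_2 = ask("Hello!")
print(fp_1 == fp_2, text_1 == text_2)
```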
Let’s see if vLLM can handle this determinism!
vLLM does include a seed parameter within its SamplingParams class. This parameter is explicitly intended to ‘control the randomness of the sampling’ and allows users to set a ‘Random seed to use for the generation’, similar in purpose to OpenAI’s seed parameter.
To maximize determinism in vLLM for a given prompt, similar to OpenAI, I need to control the sampling process tightly (see the sketch after this list):
- Set a specific integer seed in SamplingParams.
- Set temperature to 0. This forces greedy sampling (always picking the most likely token).
- Optionally, set top_k to 1, which explicitly reinforces greedy selection.
- Ensure all other relevant SamplingParams (like top_p, penalties, max_tokens, etc.) are identical across requests.
- The prompt itself must be identical.
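Putting those settings together, here's a minimal offline-inference sketch with vLLM's Python API (the model name is a placeholder, and this assumes vllm is installed locally):

```python
from vllm import LLM, SamplingParams

# Tightly controlled sampling: fixed seed, greedy decoding.
params = SamplingParams(
    temperature=0.0,   # greedy: always pick the most likely token
    top_k=1,           # explicitly reinforces greedy selection
    seed=42,           # fixed seed for any remaining randomness
    max_tokens=64,
)

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # placeholder model
prompt = "Explain determinism in one sentence."

first = llm.generate([prompt], params)[0].outputs[0].text
second = llm.generate([prompt], params)[0].outputs[0].text
# Expected to match with identical params and prompt
# (barring low-level numeric non-determinism).
print(first == second)
```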
What about top_k, top_p, and temperature?
top_k limits sampling to the k most probable tokens, so top_k = 1 means the single most likely token is always chosen (greedy selection).
OpenAI describes top_p (nucleus sampling) like this:
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
And vLLM describes temperature as:
Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling.
It is important to note that OpenAI’s models can still generate different results from run to run, even when the temperature is set to 0.
OpenAI states that using the seed parameter (which implies you’d also set temperature=0 for determinism) provides a ‘best effort’ at deterministic sampling, but ‘Determinism is not guaranteed.’ They explicitly mention referring to the system_fingerprint and acknowledge that even with matching seeds, parameters, and fingerprints, outputs will ‘mostly be identical,’ implying occasional differences due to ‘inherent non-determinism.’
What does this mean?
While setting temperature to 0 and using a seed significantly increases consistency, complete determinism isn’t always guaranteed. Factors like model architecture, floating-point computations, or backend updates might introduce slight variations.
Wait but is event sourcing aligned with determinism?
Yes and no!
By itself, it is not deterministic unless I make it so!
Determinism therefore unlocks a new mental model for building software and achieving backward and forward time travel.
My new mental model involves a combination of:
- Backward Time Travel (Deterministic Event Sourcing)
  - Records every state change as an immutable event
  - Uses deterministic processes (fixed seeds, temperature=0) to ensure reproducibility
  - Allows you to reconstruct any past state exactly as it was
  - Answers: ‘What actually happened and why?’
- Forward Time Travel (Deterministic Simulation Testing)
  - Simulates possible future scenarios in a controlled environment
  - Uses deterministic execution to ensure reproducible simulations
  - Tests how your system would handle various inputs, failures, and edge cases
  - Answers: ‘What would happen if…?’
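To make the backward-time-travel half concrete, here's a minimal sketch of deterministic event sourcing (the event shapes, the apply function, and the account example are hypothetical, not taken from any particular library):

```python
import random
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    kind: str          # e.g. "deposit" or "withdraw"
    amount: int
    seed: int          # seed captured at write time so replay is reproducible

@dataclass
class Account:
    balance: int = 0
    audit: list = field(default_factory=list)

def apply(state: Account, event: Event) -> Account:
    # Deterministic transition: same state + same event -> same resulting state.
    rng = random.Random(event.seed)          # randomness is pinned by the event
    delta = event.amount if event.kind == "deposit" else -event.amount
    state.balance += delta
    state.audit.append((event.kind, delta, rng.randint(0, 999)))  # reproducible tag
    return state

def replay(events: list[Event]) -> Account:
    state = Account()
    for e in events:
        state = apply(state, e)
    return state

log = [Event("deposit", 100, seed=1), Event("withdraw", 30, seed=2)]
# Replaying the immutable log reconstructs the exact same past state every time.
assert replay(log) == replay(log)
```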
Hmm..
It can still be non-deterministic even when I drop the temperature to zero. To get (mostly) deterministic responses, I also need to lock the seed argument to a fixed value.
So what creates determinism for LLMs?
The seed parameter is indeed a critical part of creating determinism with LLMs, but it’s just one piece of the puzzle.
For LLM determinism with OpenAI:
- Seed parameter - This is the primary control for making randomness reproducible
- Temperature = 0 - Forces the model to always pick the most likely token
- Fixed parameters - All other parameters (top_k, top_p, etc.) must be the same
- Identical prompt - The exact same input text
- Model version - The same model weights (what OpenAI tracks with ‘system_fingerprint’)
Even with all these factors controlled, LLMs still have a small chance of producing slightly different outputs, as noted by OpenAI: ‘There is a small chance that responses differ even when request parameters and system_fingerprint match, due to the inherent non-determinism of our models.’
So what is my new mental model?
- Event Sourcing: Records the history of what happened
- Deterministic Execution: Makes that history reproducible
- Seed control: Is how we make LLMs (mostly) deterministic
- DST: Uses deterministic execution to explore future scenarios
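And for the forward-time-travel half, here's an equally toy deterministic-simulation sketch (the scenario, failure rate, and function names are all made up): the only input is a seed, so any interesting future found during testing can be replayed exactly.

```python
import random

def simulate(seed: int, steps: int = 5) -> list[str]:
    """Run one simulated future under a fixed seed: same seed -> same scenario."""
    rng = random.Random(seed)
    log = []
    balance = 100
    for step in range(steps):
        # The simulation injects failures and inputs from the seeded RNG,
        # so every 'what would happen if...' run is exactly reproducible.
        if rng.random() < 0.2:
            log.append(f"step {step}: network failure injected")
            continue
        amount = rng.randint(-30, 30)
        balance = max(0, balance + amount)
        log.append(f"step {step}: applied {amount}, balance={balance}")
    return log

# Explore many futures, each replayable from its seed alone.
for seed in range(3):
    assert simulate(seed) == simulate(seed)   # deterministic replay
    print(seed, simulate(seed)[-1])
```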
My new mental model is to use Deterministic Event Sourcing (DES), which is a stricter version of Event Sourcing (ES).
Here’s how DES/DST compares to alternatives:
- Traditional State Management (CRUD)
- State Machine Patterns
- Actor Model
- Functional Programming Principles
- Event-Driven Architectures
Traditional State Management (CRUD):
Stores only the current state, typically in a database, modifying it directly (Create, Read, Update, Delete). Lacks inherent history tracking, replay capabilities, or controlled deterministic testing. Simple for basic needs but weak on auditability and debugging complex state issues.
State Machine Patterns:
Models system behavior using explicit, predefined states and transitions triggered by events. Provides clear structure for well-defined workflows but can become complex (‘state explosion’) and lacks the comprehensive, built-in deterministic history replay (DES) or controlled simulation testing (DST) unless specifically added.
Actor Model:
Manages concurrency using independent ‘actors’ that communicate via asynchronous messages, encapsulating their own state. Focuses on scalability and fault tolerance but inherently embraces non-determinism (message ordering), making reproducible debugging and testing (core goals of DES/DST) challenging.
Functional Programming Principles:
A programming paradigm emphasizing pure functions and immutable data. Enhances predictability and testability, naturally supporting DES’s deterministic state application logic. While it aids deterministic approaches, it’s not a complete architectural alternative on its own and doesn’t inherently provide the event log (DES) or simulation framework (DST).
Event-Driven Architectures (EDA):
Architectural style where components interact via asynchronous events, promoting loose coupling and scalability. Focuses on inter-component communication, not necessarily state history (unless combined with ES). Often involves non-deterministic event timing and processing, contrasting with the controlled determinism prioritized by DES/DST.