I was doing research on determinism and realized that it is the only way I can truly ‘time travel’, both backwards and forwards.
Determinism in computing means that for any given input, I will always get the same output, regardless of when or how many times I run the operation.
In a deterministic server:
- If I send the same request twice, I’ll get the same response twice
- Running the same function with the same parameters will always produce the same result
- There are no random elements that change between runs (or the randomness is controlled with fixed seeds)
- Time-dependent operations use a controlled clock rather than the actual system time
The point? If something worked yesterday, it’ll work exactly the same way today. No surprises, no ‘it worked on my machine’ excuses.
In other words, determinism makes my system behavior predictable.
This is like having a reliable friend. They may not remember everything you ever told them, but they’ll respond the same way to the same question every time.
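To make this concrete, here's a tiny Python sketch (the function names, messages, and seed value are purely illustrative) contrasting a function that reads the system clock and global randomness with one that takes a controlled clock and a fixed seed:

```python
import random
from datetime import datetime, timezone

# Non-deterministic: output depends on the real clock and global randomness,
# so two calls can return different results.
def greet_nondeterministic(name: str) -> str:
    hour = datetime.now().hour
    suffix = random.choice(["!", "!!", "."])
    return f"Good {'morning' if hour < 12 else 'evening'}, {name}{suffix}"

# Deterministic: the clock and the seed are inputs, so the same inputs
# always produce the same output, today or a year from now.
def greet_deterministic(name: str, now: datetime, seed: int) -> str:
    rng = random.Random(seed)  # fixed seed -> controlled "randomness"
    suffix = rng.choice(["!", "!!", "."])
    return f"Good {'morning' if now.hour < 12 else 'evening'}, {name}{suffix}"

fixed_now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
assert greet_deterministic("Ada", fixed_now, seed=7) == \
       greet_deterministic("Ada", fixed_now, seed=7)
```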
How does this differ from non-determinism?
This is different from non-deterministic systems where you might get different results each time due to:
- True randomness
- System time differences
- Race conditions in concurrent operations
- External services that might respond differently (e.g., OpenAI API calls)
OpenAI introduced reproducible outputs in early November 2023:
https://platform.openai.com/docs/advanced-usage#reproducible-outputs
On Nov 6, 2023, the OpenAI team published ‘How to make your completions outputs consistent with the new seed parameter’:
https://cookbook.openai.com/examples/reproducible_outputs_with_the_seed_parameter
According to the OpenAI Chat Completions API docs for the seed parameter:
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {
        "role": "developer",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "seed": 4944116822809979520
  }'
```

Note that system_fingerprint is not a request parameter: it comes back in the response body, and you compare it across calls to detect backend changes.
What about system fingerprint?
This fingerprint represents the backend configuration that the model runs with.
Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.
What does this consist of?
The system fingerprint is an identifier for the current combination of model weights, infrastructure, and other configuration options used by OpenAI servers to generate the completion. It changes whenever you change request parameters, or OpenAI updates numerical configuration of the infrastructure serving our models (which may happen a few times a year).
According to their cookbook:
If the seed, request parameters, and system_fingerprint all match across your requests, then model outputs will mostly be identical. There is a small chance that responses differ even when request parameters and system_fingerprint match, due to the inherent non-determinism of our models.
According to OpenAI:
To receive (mostly) deterministic outputs across API calls, you can:
- Set the seed parameter to any integer of your choice and use the same value across requests you’d like deterministic outputs for.
- Ensure all other parameters (like prompt or temperature) are the exact same across requests.
That means I need to set:
- seed
- prompt
- temperature
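Here's a minimal sketch of those three knobs with the official openai Python client (the model name and seed value are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str):
    # Same seed, same prompt, same temperature across calls
    # -> best-effort deterministic output.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        seed=42,
        temperature=0,
    )
    # system_fingerprint identifies the backend configuration; if it changes
    # between calls, outputs may differ even with identical parameters.
    return response.choices[0].message.content, response.system_fingerprint

text_1, fp_1 = ask("Hello!")
text_2, fp_2 = ask("Hello!")
print(fp_1 == fp_2, text_1 == text_2)
```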
Let’s see if vLLM can handle this determinism!
vLLM does include a seed parameter within its SamplingParams class. This parameter is explicitly intended to ‘control the randomness of the sampling’ and allows users to set a ‘Random seed to use for the generation’, similar in purpose to OpenAI’s seed parameter.
To maximize determinism in vLLM for a given prompt, similar to OpenAI, I need to control the sampling process tightly (see the sketch after this list):
- Set a specific integer seed in SamplingParams.
- Set temperature to 0. This forces greedy sampling (always picking the most likely token).
- Optionally, set top_k to 1, which explicitly reinforces greedy selection.
- Ensure all other relevant SamplingParams (like top_p, penalties, max_tokens, etc.) are identical across requests.
- The prompt itself must be identical.
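Putting those settings together, here's a minimal offline-inference sketch with vLLM's Python API (the model name is a placeholder, and this assumes vllm is installed locally):

```python
from vllm import LLM, SamplingParams

# Tightly controlled sampling: fixed seed, greedy decoding.
params = SamplingParams(
    temperature=0.0,   # greedy: always pick the most likely token
    top_k=1,           # explicitly reinforces greedy selection
    seed=42,           # fixed seed for any remaining randomness
    max_tokens=64,
)

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # placeholder model
prompt = "Explain determinism in one sentence."

first = llm.generate([prompt], params)[0].outputs[0].text
second = llm.generate([prompt], params)[0].outputs[0].text
# Expected to match with identical params and prompt
# (barring low-level numeric non-determinism).
print(first == second)
```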
What about top_k, top_p, and temperature?
top_k limits sampling to the k most probable tokens, so top_k = 1 means the single most likely token is always chosen (greedy selection).
OpenAI describes top_p (nucleus sampling) like this:
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
And vLLM describes temperature as:
Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero means greedy sampling.
It is important to note that OpenAI’s models can still generate different results from run to run, even when the temperature is set to 0.
OpenAI states that using the seed parameter (which implies you’d also set temperature=0 for determinism) provides a ‘best effort’ at deterministic sampling, but ‘Determinism is not guaranteed.’ They explicitly mention referring to the system_fingerprint and acknowledge that even with matching seeds, parameters, and fingerprints, outputs will ‘mostly be identical,’ implying occasional differences due to ‘inherent non-determinism.’
What does this mean?
While setting temperature to 0 and using a seed significantly increases consistency, complete determinism isn’t always guaranteed. Factors like model architecture, floating-point computations, or backend updates might introduce slight variations.
Wait but is event sourcing aligned with determinism?
Yes and no!
By itself, it is not deterministic unless I make it so!
Determinism therefore unlocks a new mental model for building software and achieving backward and forward time travel.
My new mental model involves a combination of:
- Backward Time Travel (Deterministic Event Sourcing)
  - Records every state change as an immutable event
  - Uses deterministic processes (fixed seeds, temperature=0) to ensure reproducibility
  - Allows you to reconstruct any past state exactly as it was
  - Answers: ‘What actually happened and why?’
- Forward Time Travel (Deterministic Simulation Testing)
  - Simulates possible future scenarios in a controlled environment
  - Uses deterministic execution to ensure reproducible simulations
  - Tests how your system would handle various inputs, failures, and edge cases
  - Answers: ‘What would happen if…?’
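To make the backward-time-travel half concrete, here's a minimal sketch of deterministic event sourcing (the event shapes, the apply function, and the account example are hypothetical, not taken from any particular library):

```python
import random
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Event:
    kind: str          # e.g. "deposit" or "withdraw"
    amount: int
    seed: int          # seed captured at write time so replay is reproducible

@dataclass
class Account:
    balance: int = 0
    audit: list = field(default_factory=list)

def apply(state: Account, event: Event) -> Account:
    # Deterministic transition: same state + same event -> same resulting state.
    rng = random.Random(event.seed)          # randomness is pinned by the event
    delta = event.amount if event.kind == "deposit" else -event.amount
    state.balance += delta
    state.audit.append((event.kind, delta, rng.randint(0, 999)))  # reproducible tag
    return state

def replay(events: list[Event]) -> Account:
    state = Account()
    for e in events:
        state = apply(state, e)
    return state

log = [Event("deposit", 100, seed=1), Event("withdraw", 30, seed=2)]
# Replaying the immutable log reconstructs the exact same past state every time.
assert replay(log) == replay(log)
```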
Hmm..
It can still be non-deterministic even when I drop the temperature to zero. To get (mostly) deterministic responses, I also need to lock the seed argument to a fixed value.
So what creates determinism for LLMs?
The seed parameter is indeed a critical part of creating determinism with LLMs, but it’s just one piece of the puzzle.
For LLM determinism with OpenAI:
- Seed parameter - This is the primary control for making randomness reproducible
- Temperature = 0 - Forces the model to always pick the most likely token
- Fixed parameters - All other parameters (top_k, top_p, etc.) must be the same
- Identical prompt - The exact same input text
- Model version - The same model weights (what OpenAI tracks with ‘system_fingerprint’)
Even with all these factors controlled, LLMs still have a small chance of producing slightly different outputs, as noted by OpenAI: ‘There is a small chance that responses differ even when request parameters and system_fingerprint match, due to the inherent non-determinism of our models.’
So what is my new mental model?
- Event Sourcing: Records the history of what happened
- Deterministic Execution: Makes that history reproducible
- Seed control: Is how we make LLMs (mostly) deterministic
- DST: Uses deterministic execution to explore future scenarios
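And for the forward-time-travel half, here's an equally toy deterministic-simulation sketch (the scenario, failure rate, and function names are all made up): the only input is a seed, so any interesting future found during testing can be replayed exactly.

```python
import random

def simulate(seed: int, steps: int = 5) -> list[str]:
    """Run one simulated future under a fixed seed: same seed -> same scenario."""
    rng = random.Random(seed)
    log = []
    balance = 100
    for step in range(steps):
        # The simulation injects failures and inputs from the seeded RNG,
        # so every 'what would happen if...' run is exactly reproducible.
        if rng.random() < 0.2:
            log.append(f"step {step}: network failure injected")
            continue
        amount = rng.randint(-30, 30)
        balance = max(0, balance + amount)
        log.append(f"step {step}: applied {amount}, balance={balance}")
    return log

# Explore many futures, each replayable from its seed alone.
for seed in range(3):
    assert simulate(seed) == simulate(seed)   # deterministic replay
    print(seed, simulate(seed)[-1])
```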
My new mental model is to use Deterministic Event Sourcing (DES), which is a stricter version of Event Sourcing (ES).
Here’s how DES/DST compares to alternatives:
- Traditional State Management (CRUD)
- State Machine Patterns
- Actor Model
- Functional Programming Principles
- Event-Driven Architectures
Traditional State Management (CRUD):
Stores only the current state, typically in a database, modifying it directly (Create, Read, Update, Delete). Lacks inherent history tracking, replay capabilities, or controlled deterministic testing. Simple for basic needs but weak on auditability and debugging complex state issues.
State Machine Patterns:
Models system behavior using explicit, predefined states and transitions triggered by events. Provides clear structure for well-defined workflows but can become complex (‘state explosion’) and lacks the comprehensive, built-in deterministic history replay (DES) or controlled simulation testing (DST) unless specifically added.
Actor Model:
Manages concurrency using independent ‘actors’ that communicate via asynchronous messages, encapsulating their own state. Focuses on scalability and fault tolerance but inherently embraces non-determinism (message ordering), making reproducible debugging and testing (core goals of DES/DST) challenging.
Functional Programming Principles:
A programming paradigm emphasizing pure functions and immutable data. Enhances predictability and testability, naturally supporting DES’s deterministic state application logic. While it aids deterministic approaches, it’s not a complete architectural alternative on its own and doesn’t inherently provide the event log (DES) or simulation framework (DST).
Event-Driven Architectures (EDA):
Architectural style where components interact via asynchronous events, promoting loose coupling and scalability. Focuses on inter-component communication, not necessarily state history (unless combined with ES). Often involves non-deterministic event timing and processing, contrasting with the controlled determinism prioritized by DES/DST.