OMEGA Core has had one API since Phase 3: the Evidence API. It queries Docker, PostgreSQL, Ollama and the host system, and returns live platform state over HTTP. Read-only. Stateless. An observer.
The obvious next step was to be able to send a prompt to one of the installed models and get a response back. One endpoint. The dependency on Ollama is already there. The implementation would be straightforward.
But the endpoint would have been wrong in the Evidence API.
What the Evidence API is for
The Evidence API has one job: tell you what is happening on the platform right now. Docker container states, service health, model inventory, system metrics. Every endpoint is a read. Nothing it does changes the state of anything else.
That constraint is deliberate. When you are building a security evaluation platform, having an authoritative read-only evidence layer is foundational. It is the thing you trust when you are debugging, auditing or evaluating the platform. Its value comes from what it does not do as much as what it does.
Adding a /query endpoint — something that sends a prompt to a language model and returns a response — would dissolve that constraint. The service would no longer be purely observing the platform. It would be actively using it. The conceptual boundary breaks, and with it the clarity about what the Evidence API is for.
More concretely: inference has different characteristics to evidence collection. It is slow rather than fast. It is stateful within a session. It has security considerations — prompt injection, model abuse, rate limiting — that have no equivalent in read-only state queries. The security assessment of these two services looks completely different. Keeping them separate means you can assess them separately.
Naming the first thing properly
Before building the second service, there was a naming problem to fix.
The Evidence API was named omegacore-api. That name made sense with one API. With two, it is ambiguous. Which one is “the API”? Ambiguous names are maintenance debt — they become traps in documentation, scripts, monitoring rules, and the heads of everyone working on the platform.
So the rename happened first. Container name, Python package, compose stack, directory — everything changed from omegacore-api to omegacore-evidence. The directory structure was reorganised so both services live under api/ at the repository root: api/evidence/ and api/inference/.
The rename was one commit. Done early, the cost was low. Done after a year of accumulated tooling and documentation, it becomes a project.
The rule is simple: name things for what they do, not what they are. “API” is a type. “Evidence” is a purpose.
The Inference API
The Inference API is a FastAPI application listening on port 8001. It exposes four endpoints:
- POST /query: accepts a model name and a prompt, returns the response and duration in milliseconds
- GET /models: lists available Ollama models
- GET /health
- GET /version
It communicates with omegacore-ollama over the omega_internal network using container name resolution. On the LAN it is bound to 10.10.1.94:8001.
Model selection is explicit. Every query names the model it wants. You cannot send a prompt without specifying a model. This is a deliberate design decision: on a platform with multiple specialist role models, choosing the right model for the task is not a preference — it is a consequential choice that should be visible and intentional.
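As a rough illustration of what explicit model selection looks like from the client side, here is a minimal Python sketch using only the standard library. The address comes from this post; the JSON field names (`model`, `prompt`) and the helper names `build_query_request` and `query` are assumptions made for the example, not the documented request schema.

```python
import json
import urllib.request

# Address taken from the post; the JSON field names below are assumptions.
INFERENCE_URL = "http://10.10.1.94:8001"


def build_query_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST /query request, refusing to proceed without a model name."""
    if not model or not model.strip():
        raise ValueError("a model name is required for every query")
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{INFERENCE_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query(model: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the reply, assumed here to be a JSON object."""
    request = build_query_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The client rejects an empty model name before anything goes over the wire, which mirrors the rule that the choice of model should be visible and intentional. The generous timeout reflects that inference is slow compared with a state query.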
The omega-query.sh script wraps the API for use from the command line:
./scripts/omega-query.sh ask omega-threat-analyst 'What is prompt injection?'
./scripts/omega-query.sh ask 'What is the capital of France?'
./scripts/omega-query.sh models
The first form names the model explicitly. The second form omits it, so the script falls back to llama3.2:1b as its default. The default lives in the script, not in the API.
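The argument handling behind the two `ask` forms can be sketched in a few lines. This is a Python illustration of the logic, not the shell script itself; the function name `resolve_ask` and the usage message are hypothetical.

```python
DEFAULT_MODEL = "llama3.2:1b"  # the default named in this post


def resolve_ask(args: list[str]) -> tuple[str, str]:
    """Map the arguments after `ask` to an explicit (model, prompt) pair."""
    if len(args) == 2:
        # ask MODEL PROMPT: the caller chose the model.
        return args[0], args[1]
    if len(args) == 1:
        # ask PROMPT: fill in the default so a model is still named.
        return DEFAULT_MODEL, args[0]
    raise ValueError("usage: ask [MODEL] PROMPT")
```

Either way, the result always carries a model name, so a request built from it never reaches the API without one.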
What this changes
The platform now has two distinct API surfaces:
| Service | Port | Purpose |
|---|---|---|
| omegacore-evidence | 8000 | Read-only platform observation |
| omegacore-inference | 8001 | Governed AI query layer |
The Evidence API tells you what is running. The Inference API lets you use it.
That separation also means Claude Code — the AI assistant used to build and maintain OMEGA Core — can query the platform’s own models directly as part of engineering work. Model inventory visible via /models. Prompts sendable to any installed model via /query. This is directly useful for the GAIPS lab work beginning 2026-07-01: probing models for hallucination, comparing responses across models for the same adversarial prompt, and collecting the output as documented evidence rather than manual notes.
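One way the cross-model comparison could be scripted is sketched below, assuming the /query reply is a JSON object with `response` and `duration_ms` fields. Those field names, and the helpers `post_query` and `compare_models`, are assumptions for illustration rather than the documented schema.

```python
import json
import urllib.request
from typing import Callable

INFERENCE_URL = "http://10.10.1.94:8001"  # address from this post


def post_query(model: str, prompt: str) -> dict:
    """POST one prompt to /query; request and reply field names are assumed."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        f"{INFERENCE_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))


def compare_models(
    models: list[str],
    prompt: str,
    send: Callable[[str, str], dict] = post_query,
) -> list[dict]:
    """Send the same prompt to each model and keep one record per model."""
    records = []
    for model in models:
        reply = send(model, prompt)
        records.append(
            {
                "model": model,
                "prompt": prompt,
                "response": reply.get("response"),
                "duration_ms": reply.get("duration_ms"),
            }
        )
    return records
```

Each record keeps the model name next to the prompt and the output, so the results can be written to a file as evidence. Passing `send` as a parameter lets the loop be exercised without a live platform.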
The compose structure
Each service has its own compose stack under compose/:
compose/
  evidence/
    docker-compose.yml
  inference/
    docker-compose.yml
Independent lifecycle. Independent deployment. The inference stack can be stopped, rebuilt or replaced without touching the evidence stack. Both connect to omega_internal for cross-stack communication with Ollama and the data layer.
This is the same pattern used throughout OMEGA Core: one concern per stack, shared networking for what needs to communicate, nothing coupled that does not have to be.