How it works

How we build and run your private AI.

A proven open stack, assembled and hardened on your hardware, then managed for you. No vendor lock-in, nothing leaving the building.

The stack

Open infrastructure, fully managed.

Inference
vLLM or SGLang, serving the model on your local GPUs.
Gateway
LiteLLM, one OpenAI-compatible endpoint with keys and routing.
Retrieval
Postgres with pgvector, or Qdrant, for RAG over your private documents.
Interface
OpenWebUI, the chat and workspace your team uses.
Monitoring
Langfuse, traces, token usage, and latency on a local dashboard.
Compact tier
Ollama, for the single-GPU Core build.
The process

From spec to running system.

01

Spec

Size the hardware and model to the use case.

02

Deploy

Install the stack on-premise, air-gapped if needed.

03

Train

Fine-tune on your own documents and data.

04

Secure

Lock it down. Least privilege, nothing phoning home.

05

Run

Monitor, patch, and update under contract.

Models

The right model per tier.

A best-in-class default for each tier, and a Western model for clients who will not run foreign weights. The gap is small today, and on most work it is invisible.

Core
Default: Qwen 3.6-27B, compact with strong coding and reasoning. Western option: gpt-oss-120b, OpenAI, US, Apache 2.0.
Enterprise
Default: Qwen 3 235B-A22B, top open generalist at FP8. Western option: Cohere Command A+, Canada, built for enterprise RAG.
Sovereign
Default: GLM-5.2, 744B, leads open weights with a 1M context. Western option: Mistral Large 3, France, Apache 2.0, 675B.