
Kriton

Local AI services for embeddings and semantic operations.

Overview

Kriton is the centralized embeddings service for Rhea infrastructure. Named from Greek Κρίτων (one who judges/distinguishes), it provides vector embeddings using locally-hosted models via Ollama.

Key responsibilities:

  • Generate text embeddings for semantic search
  • Provide a unified API for all embedding operations
  • Eliminate external API dependencies (no OpenAI, no costs, no rate limits)

URLs

| Endpoint | URL |
| --- | --- |
| API | https://kriton.meetrhea.com |
| Health | https://kriton.meetrhea.com/health |
| Models | https://kriton.meetrhea.com/api/v1/models |

Architecture

                      Consumers
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│  Janus  │ │  Argus  │ │ Agents  │ │  Other  │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
     │           │           │           │
     └───────────┴─────┬─────┴───────────┘
                       │
                 ┌─────▼─────┐
                 │  Kriton   │  ← FastAPI
                 │    API    │
                 └─────┬─────┘
                       │
                 ┌─────▼─────┐
                 │  Ollama   │  ← Local LLM Runtime
                 │  (local)  │
                 └─────┬─────┘
                       │
                 ┌─────▼─────┐
                 │  nomic-   │  ← Embedding Model
                 │  embed-   │    768 dimensions
                 │   text    │
                 └───────────┘

Embedding Model

Kriton uses nomic-embed-text via Ollama:

| Property | Value |
| --- | --- |
| Model | nomic-embed-text |
| Dimensions | 768 |
| Runtime | Ollama (local) |
| Cost | Free (self-hosted) |
| Rate Limits | None |

Task Prefixes

nomic-embed-text uses task prefixes for optimal performance:

| Task | Use Case | Example |
| --- | --- | --- |
| search_document | Content being indexed | Devlogs, tickets, learnings |
| search_query | User queries | "How do I deploy to Coolify?" |
| classification | Classification tasks | Categorizing content |
| clustering | Clustering tasks | Grouping similar items |
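One way to keep callers from mixing up the prefixes is a small lookup helper. This is a hypothetical sketch (`OPERATION_TASKS` and `task_for` are not part of Kriton's API); the task names themselves come from the table above:

```python
# Hypothetical helper mapping application operations to nomic-embed-text
# task prefixes. The operation names on the left are illustrative.
OPERATION_TASKS = {
    "index": "search_document",   # content being stored for later retrieval
    "query": "search_query",      # user-issued search queries
    "label": "classification",    # categorizing content
    "group": "clustering",        # grouping similar items
}

def task_for(operation: str) -> str:
    """Return the task prefix for an operation, or raise for unknown ones."""
    try:
        return OPERATION_TASKS[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation!r}")
```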

API Reference

Generate Embedding

POST /api/v1/embed
Content-Type: application/json

{
  "text": "Your text to embed",
  "task": "search_document"
}

Response:

{
  "embedding": [0.123, -0.456, ...],
  "model": "nomic-embed-text",
  "dimensions": 768
}
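A minimal stdlib-only client for this endpoint might look like the sketch below. The base URL comes from the URLs table, and the request/response field names come from the examples above; the function names are illustrative, not part of any shipped client:

```python
import json
import urllib.request

KRITON_URL = "https://kriton.meetrhea.com"  # from the URLs table

def embed_payload(text: str, task: str = "search_document") -> dict:
    """Build the request body for POST /api/v1/embed."""
    return {"text": text, "task": task}

def embed(text: str, task: str = "search_document") -> list:
    """Call Kriton and return the 768-dimension embedding vector."""
    req = urllib.request.Request(
        f"{KRITON_URL}/api/v1/embed",
        data=json.dumps(embed_payload(text, task)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["embedding"]
```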

Batch Embeddings

POST /api/v1/embed/batch
Content-Type: application/json

{
  "texts": ["First text", "Second text", "Third text"],
  "task": "search_document"
}

Response:

{
  "embeddings": [[0.123, ...], [0.456, ...], [0.789, ...]],
  "model": "nomic-embed-text",
  "dimensions": 768,
  "count": 3
}
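The response shape implies that `embeddings` is ordered like the input `texts` and that `count` matches its length; assuming that, pairing inputs back up with their vectors is straightforward. The helper names here are hypothetical:

```python
def batch_payload(texts: list, task: str = "search_document") -> dict:
    """Build the request body for POST /api/v1/embed/batch."""
    return {"texts": list(texts), "task": task}

def pair_embeddings(texts: list, response: dict) -> dict:
    """Zip input texts with the vectors in a batch response.

    Assumes "embeddings" is in the same order as the submitted "texts"
    and that "count" equals the number of inputs.
    """
    if response["count"] != len(texts):
        raise ValueError("response count does not match number of inputs")
    return dict(zip(texts, response["embeddings"]))
```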

List Models

GET /api/v1/models

Response:

{
  "current_model": "nomic-embed-text",
  "embedding_models": ["nomic-embed-text:latest"],
  "all_models": ["nomic-embed-text:latest", "llama3.2:latest", ...]
}

Health Check

GET /health

Response:

{
  "status": "healthy",
  "service": "kriton",
  "version": "0.2.0",
  "ollama": "connected",
  "ollama_url": "http://host.docker.internal:11434",
  "model": "nomic-embed-text"
}

Integration with Janus

Janus uses Kriton for semantic search and hybrid context discovery:

# Janus calls Kriton for embeddings
import httpx
from typing import List

async def embed(self, text: str, task: str = "search_document") -> List[float]:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{self.kriton_url}/api/v1/embed",
            json={"text": text, "task": task},
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["embedding"]

Hybrid Context Discovery

Kriton embeddings power the hybrid context system:

  1. Semantic Search: Find content by meaning, not just keywords
  2. Graph Traversal: Expand with connected relationships
  3. Combined Ranking: Items found by both methods get boosted
Example flow:

Query: "authentication issues"

Kriton generates query embedding (task: search_query)

Vector similarity search finds:
- Devlog about SSO debugging
- Learning about OAuth flows
- Ticket about login failures

Graph traversal expands:
- Related services (Authentik, Argus)
- Connected concepts (security, sessions)

Merged and ranked context returned
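The merge-and-boost step can be sketched as below. The boost factor and the base score for graph-only hits are illustrative assumptions, not Kriton's actual tuning:

```python
def merge_and_rank(semantic_hits: dict, graph_hits: set, boost: float = 1.5) -> list:
    """Combine semantic-search scores with graph-traversal hits.

    semantic_hits: {item_id: similarity score in [0, 1]}
    graph_hits:    item_ids reached by graph traversal
    Items found by both methods get their score multiplied by `boost`;
    graph-only items receive a small base score so they still appear.
    """
    scores = {}
    for item, score in semantic_hits.items():
        scores[item] = score * (boost if item in graph_hits else 1.0)
    for item in graph_hits:
        scores.setdefault(item, 0.1)  # graph-only: low base score
    return sorted(scores, key=scores.get, reverse=True)
```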

Agent Registration

Kriton is registered as an AI agent for higher-level semantic operations:

Agent ID: kriton
Category: ai
Capabilities:
- Embeddings generation
- Similarity search
- Context discovery
- Semantic ranking

Invoke via:

ask_agent(agent_id="kriton", request="Find relevant context for deploying a new service")

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| OLLAMA_URL | http://host.docker.internal:11434 | Ollama API endpoint |
| EMBEDDING_MODEL | nomic-embed-text | Model to use for embeddings |
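A service reading these variables might fall back to the documented defaults like so; this is a sketch, not Kriton's actual configuration code:

```python
import os

def config(env=os.environ) -> dict:
    """Read Kriton settings, falling back to the documented defaults."""
    return {
        "ollama_url": env.get("OLLAMA_URL", "http://host.docker.internal:11434"),
        "embedding_model": env.get("EMBEDDING_MODEL", "nomic-embed-text"),
    }
```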

Database Integration

Embeddings are stored in PostgreSQL using pgvector:

-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add embedding column
ALTER TABLE argus.learnings
ADD COLUMN embedding vector(768);

-- Similarity search
SELECT id, title,
       1 - (embedding <=> query_vector) AS similarity
FROM argus.learnings
WHERE embedding IS NOT NULL
ORDER BY embedding <=> query_vector
LIMIT 10;
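pgvector's `<=>` operator computes cosine distance, so `1 - (embedding <=> query_vector)` in the query above is cosine similarity. A pure-Python equivalent, useful for sanity-checking results outside the database:

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity: 1 - cosine distance, mirroring
    the SQL expression 1 - (embedding <=> query_vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```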

Tables with embeddings:

  • argus.learnings - Captured knowledge and recommendations
  • argus.devlogs - Development logs and decisions
  • argus.tickets - Work items and tasks

Deployment

Kriton runs on Coolify with:

  • Build: Dockerfile
  • Port: 8000
  • Domain: kriton.meetrhea.com

Requirements

  • Ollama running on the host machine
  • nomic-embed-text model pulled: ollama pull nomic-embed-text

Related Services

  • Janus: Calls Kriton for embeddings, provides MCP tools
  • Argus: Stores embeddings in PostgreSQL with pgvector
  • Ollama: Local LLM runtime providing the embedding model