# Kriton
Local AI services for embeddings and semantic operations.
## Overview
Kriton is the centralized embeddings service for Rhea infrastructure. Named from Greek Κρίτων (one who judges/distinguishes), it provides vector embeddings using locally-hosted models via Ollama.
Key responsibilities:
- Generate text embeddings for semantic search
- Provide a unified API for all embedding operations
- Eliminate external API dependencies (no OpenAI, no costs, no rate limits)
## URLs
| Endpoint | URL |
|---|---|
| API | https://kriton.meetrhea.com |
| Health | https://kriton.meetrhea.com/health |
| Models | https://kriton.meetrhea.com/api/v1/models |
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                        Consumers                        │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│   │  Janus  │  │  Argus  │  │ Agents  │  │  Other  │    │
│   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘    │
│        │            │            │            │         │
│        └────────────┴─────┬──────┴────────────┘         │
│                           │                             │
│                     ┌─────▼─────┐                       │
│                     │  Kriton   │  ← FastAPI            │
│                     │    API    │                       │
│                     └─────┬─────┘                       │
│                           │                             │
│                     ┌─────▼─────┐                       │
│                     │  Ollama   │  ← Local LLM runtime  │
│                     │  (local)  │                       │
│                     └─────┬─────┘                       │
│                           │                             │
│                     ┌─────▼─────┐                       │
│                     │  nomic-   │  ← Embedding model    │
│                     │  embed-   │    (768 dimensions)   │
│                     │  text     │                       │
│                     └───────────┘                       │
└─────────────────────────────────────────────────────────┘
```
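To make the shape of this stack concrete, here is a minimal sketch of what the API layer does: accept a text-plus-task request and delegate to Ollama's /api/embeddings endpoint. This is an illustration of the architecture above, not Kriton's actual source; the server-side prefixing step and the defaults are assumptions.

```python
import os

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

# Defaults mirror the Environment Variables section below.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://host.docker.internal:11434")
MODEL = os.environ.get("EMBEDDING_MODEL", "nomic-embed-text")

app = FastAPI()

class EmbedRequest(BaseModel):
    text: str
    task: str = "search_document"

@app.post("/api/v1/embed")
async def embed(req: EmbedRequest) -> dict:
    # Prefix the text for nomic-embed-text, then delegate to Ollama.
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{OLLAMA_URL}/api/embeddings",
            json={"model": MODEL, "prompt": f"{req.task}: {req.text}"},
            timeout=30.0,
        )
        resp.raise_for_status()
        embedding = resp.json()["embedding"]
    return {"embedding": embedding, "model": MODEL, "dimensions": len(embedding)}
```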
## Embedding Model
Kriton uses nomic-embed-text via Ollama:
| Property | Value |
|---|---|
| Model | nomic-embed-text |
| Dimensions | 768 |
| Runtime | Ollama (local) |
| Cost | Free (self-hosted) |
| Rate Limits | None |
### Task Prefixes

nomic-embed-text expects a task prefix on each input; passing the right task value improves embedding quality for that use case:

| Task | Use Case | Example |
|---|---|---|
| search_document | Content being indexed | Devlogs, tickets, learnings |
| search_query | User queries | "How do I deploy to Coolify?" |
| classification | Classification tasks | Categorizing content |
| clustering | Clustering tasks | Grouping similar items |
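The prefix is simply prepended to the raw text before it reaches the model. A minimal sketch of that step (Kriton takes the task via the API's task field and presumably applies the prefix server-side; apply_task_prefix is illustrative, not Kriton's actual code):

```python
VALID_TASKS = {"search_document", "search_query", "classification", "clustering"}

def apply_task_prefix(text: str, task: str = "search_document") -> str:
    # nomic-embed-text's documented prefix format is "<task>: <text>".
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"{task}: {text}"

assert apply_task_prefix("OAuth flows", "search_query") == "search_query: OAuth flows"
```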
## API Reference

### Generate Embedding

```http
POST /api/v1/embed
Content-Type: application/json

{
  "text": "Your text to embed",
  "task": "search_document"
}
```
Response:

```json
{
  "embedding": [0.123, -0.456, ...],
  "model": "nomic-embed-text",
  "dimensions": 768
}
```
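A direct call with httpx, for example (synchronous here for brevity; the URL and payload fields are exactly those documented above):

```python
import httpx

resp = httpx.post(
    "https://kriton.meetrhea.com/api/v1/embed",
    json={"text": "How do I deploy to Coolify?", "task": "search_query"},
    timeout=30.0,
)
resp.raise_for_status()
vector = resp.json()["embedding"]
assert len(vector) == 768  # nomic-embed-text dimensionality
```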
### Batch Embeddings

```http
POST /api/v1/embed/batch
Content-Type: application/json

{
  "texts": ["First text", "Second text", "Third text"],
  "task": "search_document"
}
```
Response:

```json
{
  "embeddings": [[0.123, ...], [0.456, ...], [0.789, ...]],
  "model": "nomic-embed-text",
  "dimensions": 768,
  "count": 3
}
```
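When indexing a large collection, it can help to chunk requests rather than send everything in one call. A sketch (embed_batch is a hypothetical helper; the chunk size is an arbitrary choice, not a documented Kriton limit):

```python
import httpx

def embed_batch(texts: list[str], task: str = "search_document",
                chunk_size: int = 64) -> list[list[float]]:
    """Embed texts in chunks to keep request bodies small."""
    embeddings: list[list[float]] = []
    with httpx.Client(base_url="https://kriton.meetrhea.com", timeout=60.0) as client:
        for i in range(0, len(texts), chunk_size):
            resp = client.post(
                "/api/v1/embed/batch",
                json={"texts": texts[i : i + chunk_size], "task": task},
            )
            resp.raise_for_status()
            embeddings.extend(resp.json()["embeddings"])
    return embeddings
```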
### List Models

```http
GET /api/v1/models
```

Response:

```json
{
  "current_model": "nomic-embed-text",
  "embedding_models": ["nomic-embed-text:latest"],
  "all_models": ["nomic-embed-text:latest", "llama3.2:latest", ...]
}
```
### Health Check

```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "service": "kriton",
  "version": "0.2.0",
  "ollama": "connected",
  "ollama_url": "http://host.docker.internal:11434",
  "model": "nomic-embed-text"
}
```
## Integration with Janus
Janus uses Kriton for semantic search and hybrid context discovery:
```python
import httpx
from typing import List

# Janus calls Kriton for embeddings
async def embed(self, text: str, task: str = "search_document") -> List[float]:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{self.kriton_url}/api/v1/embed",
            json={"text": text, "task": task},
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["embedding"]
```
## Hybrid Context Discovery
Kriton embeddings power the hybrid context system:
- Semantic Search: Find content by meaning, not just keywords
- Graph Traversal: Expand with connected relationships
- Combined Ranking: Items found by both methods get boosted
```
Query: "authentication issues"
        ↓
Kriton generates query embedding (task: search_query)
        ↓
Vector similarity search finds:
  - Devlog about SSO debugging
  - Learning about OAuth flows
  - Ticket about login failures
        ↓
Graph traversal expands:
  - Related services (Authentik, Argus)
  - Connected concepts (security, sessions)
        ↓
Merged and ranked context returned
```
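The combined-ranking step can be as simple as summing per-source scores and boosting items that both retrieval paths surfaced. A sketch of that idea (merge_ranked, the score shapes, and the boost factor are illustrative assumptions, not Kriton's actual implementation):

```python
def merge_ranked(vector_hits: dict[str, float], graph_hits: dict[str, float],
                 boost: float = 1.5) -> list[tuple[str, float]]:
    """Merge vector-similarity and graph-traversal scores per item.

    Items found by both methods get their combined score boosted.
    """
    scores: dict[str, float] = dict(vector_hits)
    for item_id, score in graph_hits.items():
        scores[item_id] = scores.get(item_id, 0.0) + score
    # Boost the intersection: found by both semantic search and the graph.
    for item_id in vector_hits.keys() & graph_hits.keys():
        scores[item_id] *= boost
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```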
## Agent Registration

Kriton is registered as an AI agent for higher-level semantic operations:

```
Agent ID: kriton
Category: ai
Capabilities:
  - Embeddings generation
  - Similarity search
  - Context discovery
  - Semantic ranking
```

Invoke via:

```python
ask_agent(agent_id="kriton", request="Find relevant context for deploying a new service")
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://host.docker.internal:11434 | Ollama API endpoint |
| EMBEDDING_MODEL | nomic-embed-text | Model to use for embeddings |
## Database Integration
Embeddings are stored in PostgreSQL using pgvector:
```sql
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add embedding column
ALTER TABLE argus.learnings
  ADD COLUMN embedding vector(768);

-- Similarity search
SELECT id, title,
       1 - (embedding <=> query_vector) AS similarity
FROM argus.learnings
WHERE embedding IS NOT NULL
ORDER BY embedding <=> query_vector
LIMIT 10;
```
Tables with embeddings:

- `argus.learnings` - Captured knowledge and recommendations
- `argus.devlogs` - Development logs and decisions
- `argus.tickets` - Work items and tasks
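Putting the pieces together, a sketch of embedding a query via Kriton and then searching argus.learnings with psycopg (the DSN and the to_pgvector helper are illustrative; the SQL mirrors the similarity query above):

```python
import httpx
import psycopg

def to_pgvector(embedding: list[float]) -> str:
    # pgvector accepts a '[x,y,...]' literal cast to the vector type.
    return "[" + ",".join(str(x) for x in embedding) + "]"

# Embed the query via Kriton (task: search_query).
resp = httpx.post(
    "https://kriton.meetrhea.com/api/v1/embed",
    json={"text": "authentication issues", "task": "search_query"},
    timeout=30.0,
)
resp.raise_for_status()
query_vec = to_pgvector(resp.json()["embedding"])

# Search by cosine distance; DSN is a placeholder, not the real database.
with psycopg.connect("postgresql://localhost/argus") as conn:
    rows = conn.execute(
        """
        SELECT id, title, 1 - (embedding <=> %s::vector) AS similarity
        FROM argus.learnings
        WHERE embedding IS NOT NULL
        ORDER BY embedding <=> %s::vector
        LIMIT 10
        """,
        (query_vec, query_vec),
    ).fetchall()
```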
## Deployment
Kriton runs on Coolify with:
- Build: Dockerfile
- Port: 8000
- Domain: kriton.meetrhea.com
### Requirements

- Ollama running on the host machine
- nomic-embed-text model pulled:

```bash
ollama pull nomic-embed-text
```
## Repository
- GitHub: meetrhea/kriton
- Tech Stack: Python, FastAPI, httpx
## Related Services
- Janus: Calls Kriton for embeddings, provides MCP tools
- Argus: Stores embeddings in PostgreSQL with pgvector
- Ollama: Local LLM runtime providing the embedding model