# Kriton
Local AI services for embeddings and semantic operations.
## Overview
Kriton is the centralized embeddings service for Rhea infrastructure. Named from Greek Κρίτων (one who judges/distinguishes), it provides vector embeddings using locally-hosted models via Ollama.
Key responsibilities:
- Generate text embeddings for semantic search
- Provide a unified API for all embedding operations
- Eliminate external API dependencies (no OpenAI, no costs, no rate limits)
## URLs
| Endpoint | URL |
|---|---|
| API | https://kriton.meetrhea.com |
| Health | https://kriton.meetrhea.com/health |
| Models | https://kriton.meetrhea.com/api/v1/models |
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                        Consumers                        │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐    │
│   │  Janus  │  │  Argus  │  │ Agents  │  │  Other  │    │
│   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘    │
│        │            │            │            │         │
│        └────────────┴─────┬──────┴────────────┘         │
│                           │                             │
│                     ┌─────▼─────┐                       │
│                     │  Kriton   │  ← FastAPI            │
│                     │    API    │                       │
│                     └─────┬─────┘                       │
│                           │                             │
│                     ┌─────▼─────┐                       │
│                     │  Ollama   │  ← Local LLM runtime  │
│                     │  (local)  │                       │
│                     └─────┬─────┘                       │
│                           │                             │
│                     ┌─────▼─────┐                       │
│                     │  nomic-   │  ← Embedding model    │
│                     │  embed-   │    (768 dimensions)   │
│                     │  text     │                       │
│                     └───────────┘                       │
└─────────────────────────────────────────────────────────┘
```
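To make the shape of this stack concrete, here is a minimal sketch of what the API layer does: accept a text-plus-task request and delegate to Ollama's /api/embeddings endpoint. This is an illustration of the architecture above, not Kriton's actual source; the server-side prefixing step and the defaults are assumptions.

```python
import os

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

# Defaults mirror the Environment Variables section below.
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://host.docker.internal:11434")
MODEL = os.environ.get("EMBEDDING_MODEL", "nomic-embed-text")

app = FastAPI()

class EmbedRequest(BaseModel):
    text: str
    task: str = "search_document"

@app.post("/api/v1/embed")
async def embed(req: EmbedRequest) -> dict:
    # Prefix the text for nomic-embed-text, then delegate to Ollama.
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            f"{OLLAMA_URL}/api/embeddings",
            json={"model": MODEL, "prompt": f"{req.task}: {req.text}"},
            timeout=30.0,
        )
        resp.raise_for_status()
        embedding = resp.json()["embedding"]
    return {"embedding": embedding, "model": MODEL, "dimensions": len(embedding)}
```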
## Embedding Model
Kriton uses nomic-embed-text via Ollama:
| Property | Value |
|---|---|
| Model | nomic-embed-text |
| Dimensions | 768 |
| Runtime | Ollama (local) |
| Cost | Free (self-hosted) |
| Rate Limits | None |
### Task Prefixes

nomic-embed-text expects a task prefix on each input; passing the right task value improves embedding quality for that use case:

| Task | Use Case | Example |
|---|---|---|
| search_document | Content being indexed | Devlogs, tickets, learnings |
| search_query | User queries | "How do I deploy to Coolify?" |
| classification | Classification tasks | Categorizing content |
| clustering | Clustering tasks | Grouping similar items |
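The prefix is simply prepended to the raw text before it reaches the model. A minimal sketch of that step (Kriton takes the task via the API's task field and presumably applies the prefix server-side; apply_task_prefix is illustrative, not Kriton's actual code):

```python
VALID_TASKS = {"search_document", "search_query", "classification", "clustering"}

def apply_task_prefix(text: str, task: str = "search_document") -> str:
    # nomic-embed-text's documented prefix format is "<task>: <text>".
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"{task}: {text}"

assert apply_task_prefix("OAuth flows", "search_query") == "search_query: OAuth flows"
```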
## API Reference

### Generate Embedding

```http
POST /api/v1/embed
Content-Type: application/json

{
  "text": "Your text to embed",
  "task": "search_document"
}
```
Response:

```json
{
  "embedding": [0.123, -0.456, ...],
  "model": "nomic-embed-text",
  "dimensions": 768
}
```
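A direct call with httpx, for example (synchronous here for brevity; the URL and payload fields are exactly those documented above):

```python
import httpx

resp = httpx.post(
    "https://kriton.meetrhea.com/api/v1/embed",
    json={"text": "How do I deploy to Coolify?", "task": "search_query"},
    timeout=30.0,
)
resp.raise_for_status()
vector = resp.json()["embedding"]
assert len(vector) == 768  # nomic-embed-text dimensionality
```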
### Batch Embeddings

```http
POST /api/v1/embed/batch
Content-Type: application/json

{
  "texts": ["First text", "Second text", "Third text"],
  "task": "search_document"
}
```
Response:

```json
{
  "embeddings": [[0.123, ...], [0.456, ...], [0.789, ...]],
  "model": "nomic-embed-text",
  "dimensions": 768,
  "count": 3
}
```
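When indexing a large collection, it can help to chunk requests rather than send everything in one call. A sketch (embed_batch is a hypothetical helper; the chunk size is an arbitrary choice, not a documented Kriton limit):

```python
import httpx

def embed_batch(texts: list[str], task: str = "search_document",
                chunk_size: int = 64) -> list[list[float]]:
    """Embed texts in chunks to keep request bodies small."""
    embeddings: list[list[float]] = []
    with httpx.Client(base_url="https://kriton.meetrhea.com", timeout=60.0) as client:
        for i in range(0, len(texts), chunk_size):
            resp = client.post(
                "/api/v1/embed/batch",
                json={"texts": texts[i : i + chunk_size], "task": task},
            )
            resp.raise_for_status()
            embeddings.extend(resp.json()["embeddings"])
    return embeddings
```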
### List Models

```http
GET /api/v1/models
```

Response:

```json
{
  "current_model": "nomic-embed-text",
  "embedding_models": ["nomic-embed-text:latest"],
  "all_models": ["nomic-embed-text:latest", "llama3.2:latest", ...]
}
```
### Health Check

```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "service": "kriton",
  "version": "0.2.0",
  "ollama": "connected",
  "ollama_url": "http://host.docker.internal:11434",
  "model": "nomic-embed-text"
}
```
## Integration with Janus
Janus uses Kriton for semantic search and hybrid context discovery:
```python
import httpx
from typing import List

# Janus calls Kriton for embeddings
async def embed(self, text: str, task: str = "search_document") -> List[float]:
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{self.kriton_url}/api/v1/embed",
            json={"text": text, "task": task},
            timeout=30.0,
        )
        response.raise_for_status()
        return response.json()["embedding"]
```
## Hybrid Context Discovery
Kriton embeddings power the hybrid context system:
- Semantic Search: Find content by meaning, not just keywords
- Graph Traversal: Expand with connected relationships
- Combined Ranking: Items found by both methods get boosted
```
Query: "authentication issues"
        ↓
Kriton generates query embedding (task: search_query)
        ↓
Vector similarity search finds:
  - Devlog about SSO debugging
  - Learning about OAuth flows
  - Ticket about login failures
        ↓
Graph traversal expands:
  - Related services (Authentik, Argus)
  - Connected concepts (security, sessions)
        ↓
Merged and ranked context returned
```
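The combined-ranking step can be as simple as summing per-source scores and boosting items that both retrieval paths surfaced. A sketch of that idea (merge_ranked, the score shapes, and the boost factor are illustrative assumptions, not Kriton's actual implementation):

```python
def merge_ranked(vector_hits: dict[str, float], graph_hits: dict[str, float],
                 boost: float = 1.5) -> list[tuple[str, float]]:
    """Merge vector-similarity and graph-traversal scores per item.

    Items found by both methods get their combined score boosted.
    """
    scores: dict[str, float] = dict(vector_hits)
    for item_id, score in graph_hits.items():
        scores[item_id] = scores.get(item_id, 0.0) + score
    # Boost the intersection: found by both semantic search and the graph.
    for item_id in vector_hits.keys() & graph_hits.keys():
        scores[item_id] *= boost
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```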
## Agent Registration

Kriton is registered as an AI agent for higher-level semantic operations:

```
Agent ID: kriton
Category: ai
Capabilities:
  - Embeddings generation
  - Similarity search
  - Context discovery
  - Semantic ranking
```

Invoke via:

```python
ask_agent(agent_id="kriton", request="Find relevant context for deploying a new service")
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://host.docker.internal:11434 | Ollama API endpoint |
| EMBEDDING_MODEL | nomic-embed-text | Model to use for embeddings |
## Database Integration
Embeddings are stored in PostgreSQL using pgvector:
```sql
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add embedding column
ALTER TABLE argus.learnings
  ADD COLUMN embedding vector(768);

-- Similarity search
SELECT id, title,
       1 - (embedding <=> query_vector) AS similarity
FROM argus.learnings
WHERE embedding IS NOT NULL
ORDER BY embedding <=> query_vector
LIMIT 10;
```
Tables with embeddings:

- `argus.learnings` - Captured knowledge and recommendations
- `argus.devlogs` - Development logs and decisions
- `argus.tickets` - Work items and tasks
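Putting the pieces together, a sketch of embedding a query via Kriton and then searching argus.learnings with psycopg (the DSN and the to_pgvector helper are illustrative; the SQL mirrors the similarity query above):

```python
import httpx
import psycopg

def to_pgvector(embedding: list[float]) -> str:
    # pgvector accepts a '[x,y,...]' literal cast to the vector type.
    return "[" + ",".join(str(x) for x in embedding) + "]"

# Embed the query via Kriton (task: search_query).
resp = httpx.post(
    "https://kriton.meetrhea.com/api/v1/embed",
    json={"text": "authentication issues", "task": "search_query"},
    timeout=30.0,
)
resp.raise_for_status()
query_vec = to_pgvector(resp.json()["embedding"])

# Search by cosine distance; DSN is a placeholder, not the real database.
with psycopg.connect("postgresql://localhost/argus") as conn:
    rows = conn.execute(
        """
        SELECT id, title, 1 - (embedding <=> %s::vector) AS similarity
        FROM argus.learnings
        WHERE embedding IS NOT NULL
        ORDER BY embedding <=> %s::vector
        LIMIT 10
        """,
        (query_vec, query_vec),
    ).fetchall()
```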
## Deployment
Kriton runs on Coolify with:
- Build: Dockerfile
- Port: 8000
- Domain: kriton.meetrhea.com
### Requirements

- Ollama running on the host machine
- nomic-embed-text model pulled:

```bash
ollama pull nomic-embed-text
```
## Repository
- GitHub: meetrhea/kriton
- Tech Stack: Python, FastAPI, httpx
## Related Services
- Janus: Calls Kriton for embeddings, provides MCP tools
- Argus: Stores embeddings in PostgreSQL with pgvector
- Ollama: Local LLM runtime providing the embedding model