feat(ezra): Implement SEED Architecture for Ezra Archon #6
archons/ezra/IMPLEMENTATION.md
# Ezra Archon: SEED Architecture Implementation Plan

## Overview

This document outlines the implementation of the SEED Architecture for Ezra Archon as part of the Allegro House migration to the Gemma 4 intelligence layer.

## Architecture Reference

- **Epic:** ARCHITECTURE-SEED-EPIC.md
- **Stack:** Hermes Agent (Fit Layer) → Claw Code Harness → Gemma 4
- **Philosophy:** Hermes Agent does not think; it routes everything to the harness.

## Ezra-Specific Configuration

### 1. Gemma 4 Server Deployment
```yaml
# /opt/gemma4/ezra-config.yaml
gemma4:
  default_model: 26b-moe
  models:
    26b-moe:
      path: /opt/models/gemma-4-26b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8645
    31b:
      path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8646
    4b:
      path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8647
```
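
Downstream components need to resolve a model variant to its local port. A minimal sketch, assuming the `gemma4.models` section above has been loaded into a plain dict; `endpoint_for` and `EZRA_MODELS` are illustrative names, not part of any shipped tool:

```python
# Mirrors the gemma4.models section of /opt/gemma4/ezra-config.yaml above.
EZRA_MODELS = {
    "26b-moe": {"port": 8645},
    "31b": {"port": 8646},
    "4b": {"port": 8647},
}

def endpoint_for(model: str, models: dict = EZRA_MODELS, host: str = "localhost") -> str:
    """Resolve the local HTTP endpoint for a configured model variant."""
    if model not in models:
        raise KeyError(f"model '{model}' is not configured")
    return f"http://{host}:{models[model]['port']}"
```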
### 2. Hermes Agent Fit Layer
```yaml
# ~/.hermes/profiles/ezra/config.yaml
profile:
  name: ezra-fit
  type: dispatcher
  house: allegro

intelligence:
  enabled: false  # NO local reasoning - pure dispatch

dispatch:
  target: claw-code-harness
  endpoint: http://localhost:9000/harness/ezra
  timeout: 30s
  retry: 3

tools:
  allowed: all  # Pass through to harness
  local: []     # No local tool execution

logging:
  level: info
  destination: /var/log/ezra/dispatch.log
```
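
The fit layer's `retry: 3` and `timeout: 30s` behavior can be sketched as a thin dispatch wrapper. This is illustrative only: `send` stands in for whatever HTTP client the agent actually uses, and the exponential backoff between attempts is an assumption, not something the config above specifies:

```python
import time

def dispatch_with_retry(send, payload, retries=3, timeout=30.0, backoff=1.0):
    """Forward `payload` to the harness, retrying on transport failure.

    The fit layer does no local reasoning, so this is its whole job: hand
    the request to `send` (an injected HTTP-POST callable) and retry up to
    `retries` times, sleeping with exponential backoff between attempts.
    """
    last_err = None
    for attempt in range(retries):
        try:
            return send(payload, timeout=timeout)
        except Exception as err:  # any transport error triggers a retry
            last_err = err
            if attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"dispatch failed after {retries} attempts") from last_err
```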
### 3. Claw Code Harness Configuration
```yaml
# /opt/claw/harness/ezra.yaml
harness:
  version: 2.0
  archon: ezra

backend:
  primary: gemma4-26b-moe
  variants:
    - gemma4-2b       # Ultra-edge tasks
    - gemma4-4b       # Standard tasks
    - gemma4-31b      # Complex reasoning
    - gemma4-26b-moe  # Default (efficiency)

routing:
  logic: |
    if task.complexity > 0.8: use gemma4-31b
    elif task.speed_critical: use gemma4-4b
    elif task.audio_required: use gemma4-2b
    else: use gemma4-26b-moe

tools:
  registry: /etc/claw/tools/ezra/
  auto_load: true
  allowed_tools:
    - git
    - file_system
    - telegram
    - nostr

context:
  window: 8192
  summarization: auto

session:
  persistence: true
  storage: /var/lib/claw/ezra/sessions/
```
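
The routing block above is pseudocode; a direct Python rendering (same branch order, same 0.8 threshold) makes the selection behavior concrete. The function name and keyword flags are illustrative, not harness API:

```python
def select_model(complexity: float, speed_critical: bool = False,
                 audio_required: bool = False) -> str:
    """Python rendering of the harness routing logic above."""
    if complexity > 0.8:
        return "gemma4-31b"      # Complex reasoning
    if speed_critical:
        return "gemma4-4b"       # Speed-critical tasks
    if audio_required:
        return "gemma4-2b"       # Ultra-edge / audio tasks
    return "gemma4-26b-moe"      # Default (efficiency)
```

Note that, as in the config, the complexity check wins even when other flags are set, since it is evaluated first.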
## Implementation Phases

### Phase 1: Foundation (Week 1-2)
- [x] Deploy Gemma 4 server locally (26B MoE primary)
- [ ] Configure llama.cpp or vLLM backend
- [ ] Download and verify GGUF models (26B MoE, 31B, 4B)
- [ ] Benchmark inference speed (>20 tok/s target)

### Phase 2: Hermes Agent Fit Layer (Week 2-3)
- [ ] Strip local intelligence from Ezra Hermes Agent
- [ ] Configure dispatch-only mode
- [ ] Set up Claw Code harness connection
- [ ] Implement error handling and retry logic
- [ ] Audit: verify NO local reasoning remains

### Phase 3: Claw Code Harness Integration (Week 3-4)
- [ ] Configure harness routing for Ezra
- [ ] Set up tool registry access
- [ ] Implement Gemma 4 function calling
- [ ] Configure context window management (8192)
- [ ] Add automatic summarization for long contexts
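
The summarization trigger from Phase 3 amounts to a headroom check against the 8192-token window. A minimal sketch; the `reserve` margin is an assumed knob, not something the harness config specifies:

```python
def needs_summarization(token_count: int, window: int = 8192, reserve: int = 1024) -> bool:
    """Trigger for `summarization: auto`: summarize once the running context
    would leave fewer than `reserve` tokens of headroom in the window."""
    return token_count > window - reserve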

### Phase 4: Testing & Hardening (Week 4-5)
- [ ] Tool use test suite passes
- [ ] Multi-turn conversation handling
- [ ] Fallback chain implementation
- [ ] Network audit: no cloud AI calls
- [ ] End-to-end integration test

### Phase 5: Deployment (Week 5-6)
- [ ] Gitea webhook automation
- [ ] Telegram bot integration
- [ ] Nostr bridge configuration
- [ ] Backblaze B2 backup setup
- [ ] Monitoring and alerting
## Acceptance Criteria

| ID | Criteria | Status | Notes |
|----|----------|--------|-------|
| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | ⬜ | Target: 25 tok/s |
| A2 | Hermes Agent has NO local intelligence | ⬜ | Config audit required |
| A3 | All queries route through Claw Code harness | ⬜ | Packet capture verify |
| A4 | Tool use works via Gemma 4 function calling | ⬜ | Test suite TBD |
| A5 | Ezra has independent fit layer config | ⬜ | File: ~/.hermes/profiles/ezra/ |
| A6 | Gitea issues auto-route to Ezra | ⬜ | Webhook config |
| A7 | Telegram bot responds via Gemma 4 | ⬜ | E2E test |
| A8 | No cloud AI calls in packet log | ⬜ | tcpdump verify |
## Risk Analysis

| Risk | Probability | Impact | Mitigation |
|------|------------|--------|------------|
| Gemma 4 inference too slow | Medium | High | Use 4B variant for speed-critical tasks |
| GPU memory constraints | Medium | High | Q4_K_M quantization, dynamic GPU offloading |
| Tool use failures | Low | Medium | Extensive prompt engineering, fallback chains |
| Context window overflow | Medium | Medium | Auto-summarization, RAG integration |
| Hardware failure | Low | High | Hot standby config, B2 backup restore |
## Resources

### Hardware Requirements
- GPU: NVIDIA with 24GB+ VRAM (RTX 4090 or A10G recommended)
- RAM: 32GB system memory
- Storage: 100GB NVMe for models and logs
- Network: local-only (no external AI dependencies)

### Software Dependencies
- llama.cpp (latest) or vLLM 0.4.0+
- Python 3.10+
- Hermes Agent v0.5.0+
- Claw Code Harness v2.0+

### Model Files
```
/opt/models/
├── gemma-4-2b-it-Q4_K_M.gguf (~1.5GB)
├── gemma-4-4b-it-Q4_K_M.gguf (~3GB)
├── gemma-4-26b-it-Q4_K_M.gguf (~18GB)
└── gemma-4-31b-it-Q4_K_M.gguf (~20GB)
```
## Monitoring & Observability

### Metrics to Track
- Inference tokens/second
- GPU utilization
- Memory usage
- Request latency (p50, p95, p99)
- Error rates
- Tool use success rate
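
The latency percentiles listed above can be computed with a simple nearest-rank estimator over recorded request latencies. A minimal sketch, not tied to any particular metrics library:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (p in 0-100) over recorded latency samples."""
    if not samples:
        raise ValueError("no latency samples recorded")
    ordered = sorted(samples)
    rank = max(math.ceil(p / 100 * len(ordered)), 1)  # nearest-rank, 1-based
    return ordered[rank - 1]
```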
### Logging
- Dispatch events: `/var/log/ezra/dispatch.log`
- Harness events: `/var/log/claw/harness.log`
- Model inference: `/var/log/gemma4/inference.log`

## Security Considerations

- All models run locally - no data leaves the system
- No API keys for external AI services
- Network isolation for the Gemma 4 server
- File permissions: 600 for configs, 700 for the model directory
- Regular security audits via `lynis` and `rkhunter`
## References

1. ARCHITECTURE-SEED-EPIC.md - Master architecture specification
2. `~/.hermes/profiles/gemma4/` - Gemma 4 Hermes profile (reference implementation)
3. `/root/wizards/claw-code/docs/gemma4-integration.md` - Claw Code integration guide
4. https://huggingface.co/blog/gemma4 - Gemma 4 announcement and benchmarks
## Issue Tracking

- **Gitea Issue:** #[TBD]
- **Assigned to:** @ezra
- **House:** Allegro
- **Priority:** P0 - Foundation
- **Estimated Duration:** 6 weeks

---

**Created by:** Allegro House
**Last Updated:** 2026-04-02
**Status:** Planning Phase