feat(ezra): Implement SEED Architecture for Ezra Archon #6

Open
allegro wants to merge 1 commit from feature/ezra-archon-seed into main


@@ -0,0 +1,226 @@
# Ezra Archon: SEED Architecture Implementation Plan
## Overview
This document outlines the implementation of the SEED Architecture for Ezra Archon as part of the Allegro House migration to the Gemma 4 intelligence layer.
## Architecture Reference
- **Epic:** ARCHITECTURE-SEED-EPIC.md
- **Stack:** Hermes Agent (Fit Layer) → Claw Code Harness → Gemma 4
- **Philosophy:** Hermes Agent does not think; it routes everything to the harness.
## Ezra-Specific Configuration
### 1. Gemma 4 Server Deployment
```yaml
# /opt/gemma4/ezra-config.yaml
gemma4:
  default_model: 26b-moe
  models:
    26b-moe:
      path: /opt/models/gemma-4-26b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8645
    31b:
      path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8646
    4b:
      path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8647
```
### 2. Hermes Agent Fit Layer
```yaml
# ~/.hermes/profiles/ezra/config.yaml
profile:
  name: ezra-fit
  type: dispatcher
  house: allegro
intelligence:
  enabled: false # NO local reasoning - pure dispatch
dispatch:
  target: claw-code-harness
  endpoint: http://localhost:9000/harness/ezra
  timeout: 30s
  retry: 3
tools:
  allowed: all # Pass through to harness
  local: []    # No local tool execution
logging:
  level: info
  destination: /var/log/ezra/dispatch.log
```
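The dispatch-only behavior the config describes can be sketched in a few lines of Python. This is a minimal illustration, not the Hermes Agent implementation: the endpoint, timeout, and retry values mirror the config above, while the `dispatch` function name, backoff schedule, and injectable `post` hook are assumptions for the sketch.

```python
import json
import time
import urllib.error
import urllib.request

HARNESS_ENDPOINT = "http://localhost:9000/harness/ezra"  # dispatch.endpoint
TIMEOUT_S = 30    # dispatch.timeout
MAX_RETRIES = 3   # dispatch.retry

def dispatch(payload: dict, post=None) -> dict:
    """Forward a request to the Claw Code harness: no local reasoning,
    just serialize, send, and retry on transport errors.

    `post` is injectable for testing; by default it performs a real HTTP POST.
    """
    if post is None:
        def post(body: bytes) -> bytes:
            req = urllib.request.Request(
                HARNESS_ENDPOINT, data=body,
                headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req, timeout=TIMEOUT_S) as resp:
                return resp.read()

    body = json.dumps(payload).encode()
    last_err = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return json.loads(post(body))
        except (urllib.error.URLError, OSError) as err:
            last_err = err
            time.sleep(0.1 * 2 ** attempt)  # simple exponential backoff
    raise RuntimeError(f"harness unreachable after {MAX_RETRIES} attempts") from last_err
```

Keeping the function this small is the point: anything beyond serialization and retries would be local intelligence, which Phase 2 explicitly audits against.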
### 3. Claw Code Harness Configuration
```yaml
# /opt/claw/harness/ezra.yaml
harness:
  version: 2.0
  archon: ezra
backend:
  primary: gemma4-26b-moe
  variants:
    - gemma4-2b       # Ultra-edge tasks
    - gemma4-4b       # Standard tasks
    - gemma4-31b      # Complex reasoning
    - gemma4-26b-moe  # Default (efficiency)
routing:
  logic: |
    if task.complexity > 0.8: use gemma4-31b
    elif task.speed_critical: use gemma4-4b
    elif task.audio_required: use gemma4-2b
    else: use gemma4-26b-moe
tools:
  registry: /etc/claw/tools/ezra/
  auto_load: true
  allowed_tools:
    - git
    - file_system
    - telegram
    - nostr
context:
  window: 8192
  summarization: auto
session:
  persistence: true
  storage: /var/lib/claw/ezra/sessions/
```
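The `routing.logic` block above translates directly into code. A minimal sketch, assuming a `Task` record whose fields (`complexity`, `speed_critical`, `audio_required`) match the names used in the config; the dataclass itself is illustrative, not part of the harness API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    complexity: float          # 0.0-1.0 estimate supplied by the harness
    speed_critical: bool = False
    audio_required: bool = False

def select_backend(task: Task) -> str:
    """Mirror the harness routing block: most specific rule first."""
    if task.complexity > 0.8:
        return "gemma4-31b"       # complex reasoning
    if task.speed_critical:
        return "gemma4-4b"        # latency-sensitive
    if task.audio_required:
        return "gemma4-2b"        # ultra-edge tasks
    return "gemma4-26b-moe"       # default (efficiency)
```

Rule order matters: a complex, speed-critical task goes to the 31B model, since correctness outranks latency once complexity crosses the threshold.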
## Implementation Phases
### Phase 1: Foundation (Week 1-2)
- [x] Deploy Gemma 4 server locally (26B MoE primary)
- [ ] Configure llama.cpp or vLLM backend
- [ ] Download and verify GGUF models (26B MoE, 31B, 4B)
- [ ] Benchmark inference speed (>20 tok/s target)
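The >20 tok/s benchmark can be measured with a small timing harness. A sketch under stated assumptions: the `generate` callable stands in for one inference call against the local Gemma 4 server and returns the number of tokens produced; it is injected rather than hard-coded because the server's request format depends on the backend chosen (llama.cpp vs vLLM).

```python
import time

TARGET_TOK_S = 20.0  # acceptance threshold from criterion A1

def measure_tok_s(generate, n_runs: int = 3) -> float:
    """Average decode throughput over n_runs calls.

    `generate` performs one inference and returns the token count it
    produced; in practice it would POST to the local Gemma 4 server.
    """
    total_tokens = 0
    total_time = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        total_tokens += generate()
        total_time += time.perf_counter() - start
    return total_tokens / total_time
```

Averaging over several runs smooths out cold-start effects such as the first prompt paying for model warm-up.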
### Phase 2: Hermes Agent Fit Layer (Week 2-3)
- [ ] Strip local intelligence from Ezra Hermes Agent
- [ ] Configure dispatch-only mode
- [ ] Set up Claw Code harness connection
- [ ] Implement error handling and retry logic
- [ ] Audit: Verify NO local reasoning remains
### Phase 3: Claw Code Harness Integration (Week 3-4)
- [ ] Configure harness routing for Ezra
- [ ] Set up tool registry access
- [ ] Implement Gemma 4 function calling
- [ ] Configure context window management (8192)
- [ ] Add automatic summarization for long contexts
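The context-management items above imply two primitives: detecting when a conversation approaches the 8192-token window, and trimming history to a budget. A minimal sketch; the 0.75 trigger ratio and both function names are assumptions for illustration, not values from the harness config.

```python
CONTEXT_WINDOW = 8192     # harness context.window
SUMMARIZE_AT = 0.75       # assumed trigger ratio, not in the config

def needs_summarization(token_counts: list[int]) -> bool:
    """True once the running conversation crosses the trigger threshold."""
    return sum(token_counts) >= CONTEXT_WINDOW * SUMMARIZE_AT

def trim_history(messages: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Keep the most recent (message, token_count) pairs that fit in budget,
    preserving chronological order. Older turns would be replaced by a
    summary produced by the model itself."""
    kept, used = [], 0
    for msg, n in reversed(messages):
        if used + n > budget:
            break
        kept.append((msg, n))
        used += n
    return list(reversed(kept))
```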
### Phase 4: Testing & Hardening (Week 4-5)
- [ ] Tool use test suite passes
- [ ] Multi-turn conversation handling
- [ ] Fallback chain implementation
- [ ] Network audit: No cloud AI calls
- [ ] End-to-end integration test
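The fallback chain called for above can be sketched as an ordered list of backends tried until one succeeds. The chain order and the `infer` callable are illustrative assumptions; the real order would come from the harness `variants` list and its health checks.

```python
FALLBACK_CHAIN = ["gemma4-26b-moe", "gemma4-31b", "gemma4-4b"]  # illustrative order

def run_with_fallback(prompt: str, infer, backends=FALLBACK_CHAIN) -> tuple[str, str]:
    """Try each backend in order; return (backend, answer) from the first
    success. `infer(backend, prompt)` performs one inference call."""
    errors = []
    for backend in backends:
        try:
            return backend, infer(backend, prompt)
        except Exception as err:  # e.g. OOM, timeout, connection refused
            errors.append((backend, repr(err)))
    raise RuntimeError(f"all backends failed: {errors}")
```

Collecting per-backend errors before raising makes the "no cloud AI calls" audit easier: a total failure surfaces loudly instead of silently degrading.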
### Phase 5: Deployment (Week 5-6)
- [ ] Gitea webhook automation
- [ ] Telegram bot integration
- [ ] Nostr bridge configuration
- [ ] Backblaze B2 backup setup
- [ ] Monitoring and alerting
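For the Gitea webhook item, the receiving endpoint should verify the payload signature before routing anything to Ezra. Gitea sends an HMAC-SHA256 of the raw request body (hex-encoded) when a webhook secret is configured; the header name is `X-Gitea-Signature`. A minimal verification sketch:

```python
import hashlib
import hmac

def verify_gitea_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Check the X-Gitea-Signature header: hex HMAC-SHA256 of the raw body.

    compare_digest avoids timing side channels on the comparison.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

Verify against the raw bytes, not re-serialized JSON: any re-encoding of the body changes the digest and rejects valid deliveries.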
## Acceptance Criteria
| ID | Criteria | Status | Notes |
|----|----------|--------|-------|
| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | ⬜ | Target: 25 tok/s |
| A2 | Hermes Agent has NO local intelligence | ⬜ | Config audit required |
| A3 | All queries route through Claw Code harness | ⬜ | Packet capture verify |
| A4 | Tool use works via Gemma 4 function calling | ⬜ | Test suite TBD |
| A5 | Ezra has independent fit layer config | ⬜ | File: ~/.hermes/profiles/ezra/ |
| A6 | Gitea issues auto-route to Ezra | ⬜ | Webhook config |
| A7 | Telegram bot responds via Gemma 4 | ⬜ | E2E test |
| A8 | No cloud AI calls in packet log | ⬜ | tcpdump verify |
## Risk Analysis
| Risk | Probability | Impact | Mitigation |
|------|------------|--------|------------|
| Gemma 4 inference too slow | Medium | High | Use 4B variant for speed-critical tasks |
| GPU memory constraints | Medium | High | Q4_K_M quantization, dynamic GPU offloading |
| Tool use failures | Low | Medium | Extensive prompt engineering, fallback chains |
| Context window overflow | Medium | Medium | Auto-summarization, RAG integration |
| Hardware failure | Low | High | Hot standby config, B2 backup restore |
## Resources
### Hardware Requirements
- GPU: NVIDIA with 24GB+ VRAM (RTX 4090 or A10G recommended)
- RAM: 32GB system memory
- Storage: 100GB NVMe for models and logs
- Network: Local-only (no external AI dependencies)
### Software Dependencies
- llama.cpp (latest) or vLLM 0.4.0+
- Python 3.10+
- Hermes Agent v0.5.0+
- Claw Code Harness v2.0+
### Model Files
```
/opt/models/
├── gemma-4-2b-it-Q4_K_M.gguf (~1.5GB)
├── gemma-4-4b-it-Q4_K_M.gguf (~3GB)
├── gemma-4-26b-it-Q4_K_M.gguf (~18GB)
└── gemma-4-31b-it-Q4_K_M.gguf (~20GB)
```
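The "download and verify" step in Phase 1 implies checksumming these files. Since the GGUF files run to tens of gigabytes, hash them in chunks rather than loading them whole; a sketch (the reference checksums themselves would come from the model distribution, which this plan does not pin):

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a multi-GB GGUF file through SHA-256 in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```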
## Monitoring & Observability
### Metrics to Track
- Inference tokens/second
- GPU utilization
- Memory usage
- Request latency (p50, p95, p99)
- Error rates
- Tool use success rate
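The p50/p95/p99 latency metrics above can be computed from raw samples with a nearest-rank percentile, which needs no external dependencies. A minimal sketch (in production a metrics library or histogram would do this incrementally):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100], over raw latency samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * p // 100))  # ceil(n * p / 100), min 1
    return ordered[int(rank) - 1]
```

Usage: `percentile(latencies_ms, 95)` gives the p95 cut. Nearest-rank always returns an observed value, so tail percentiles are never interpolated below the worst real sample.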
### Logging
- Dispatch events: `/var/log/ezra/dispatch.log`
- Harness events: `/var/log/claw/harness.log`
- Model inference: `/var/log/gemma4/inference.log`
## Security Considerations
- All models run locally - no data leaves the system
- No API keys for external AI services
- Network isolation for Gemma 4 server
- File permissions: 600 for configs, 700 for model directory
- Regular security audits via `lynis` and `rkhunter`
## References
1. ARCHITECTURE-SEED-EPIC.md - Master architecture specification
2. `~/.hermes/profiles/gemma4/` - Gemma 4 Hermes profile (reference implementation)
3. `/root/wizards/claw-code/docs/gemma4-integration.md` - Claw Code integration guide
4. https://huggingface.co/blog/gemma4 - Gemma 4 announcement and benchmarks
## Issue Tracking
- **Gitea Issue:** #[TBD]
- **Assigned to:** @ezra
- **House:** Allegro
- **Priority:** P0 - Foundation
- **Estimated Duration:** 6 weeks
---
**Created by:** Allegro House
**Last Updated:** 2026-04-02
**Status:** Planning Phase