diff --git a/archons/ezra/IMPLEMENTATION.md b/archons/ezra/IMPLEMENTATION.md
new file mode 100644
index 0000000..2ab72d1
--- /dev/null
+++ b/archons/ezra/IMPLEMENTATION.md
@@ -0,0 +1,226 @@
+# Ezra Archon: SEED Architecture Implementation Plan
+
+## Overview
+This document outlines the implementation of the SEED Architecture for Ezra Archon as part of the Allegro House migration to the Gemma 4 intelligence layer.
+
+## Architecture Reference
+- **Epic:** ARCHITECTURE-SEED-EPIC.md
+- **Stack:** Hermes Agent (Fit Layer) → Claw Code Harness → Gemma 4
+- **Philosophy:** Hermes Agent does not think; it routes everything to the harness.
+
+## Ezra-Specific Configuration
+
+### 1. Gemma 4 Server Deployment
+```yaml
+# /opt/gemma4/ezra-config.yaml
+gemma4:
+  default_model: 26b-moe
+  models:
+    26b-moe:
+      path: /opt/models/gemma-4-26b-it-Q4_K_M.gguf
+      gpu_layers: -1
+      context: 8192
+      port: 8645
+    31b:
+      path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
+      gpu_layers: -1
+      context: 8192
+      port: 8646
+    4b:
+      path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
+      gpu_layers: -1
+      context: 8192
+      port: 8647
+```
+
+### 2. Hermes Agent Fit Layer
+```yaml
+# ~/.hermes/profiles/ezra/config.yaml
+profile:
+  name: ezra-fit
+  type: dispatcher
+  house: allegro
+
+intelligence:
+  enabled: false  # NO local reasoning - pure dispatch
+
+dispatch:
+  target: claw-code-harness
+  endpoint: http://localhost:9000/harness/ezra
+  timeout: 30s
+  retry: 3
+
+tools:
+  allowed: all  # Pass through to harness
+  local: []  # No local tool execution
+
+logging:
+  level: info
+  destination: /var/log/ezra/dispatch.log
+```
+
+### 3. Claw Code Harness Configuration
+```yaml
+# /opt/claw/harness/ezra.yaml
+harness:
+  version: 2.0
+  archon: ezra
+
+backend:
+  primary: gemma4-26b-moe
+  variants:
+    - gemma4-2b  # Ultra-edge tasks
+    - gemma4-4b  # Standard tasks
+    - gemma4-31b  # Complex reasoning
+    - gemma4-26b-moe  # Default (efficiency)
+
+routing:
+  logic: |
+    if task.complexity > 0.8: use gemma4-31b
+    elif task.speed_critical: use gemma4-4b
+    elif task.audio_required: use gemma4-2b
+    else: use gemma4-26b-moe
+
+tools:
+  registry: /etc/claw/tools/ezra/
+  auto_load: true
+  allowed_tools:
+    - git
+    - file_system
+    - telegram
+    - nostr
+
+context:
+  window: 8192
+  summarization: auto
+
+session:
+  persistence: true
+  storage: /var/lib/claw/ezra/sessions/
+```
+
+## Implementation Phases
+
+### Phase 1: Foundation (Week 1-2)
+- [x] Deploy Gemma 4 server locally (26B MoE primary)
+- [ ] Configure llama.cpp or vLLM backend
+- [ ] Download and verify GGUF models (26B MoE, 31B, 4B)
+- [ ] Benchmark inference speed (>20 tok/s target)
+
+### Phase 2: Hermes Agent Fit Layer (Week 2-3)
+- [ ] Strip local intelligence from Ezra Hermes Agent
+- [ ] Configure dispatch-only mode
+- [ ] Set up Claw Code harness connection
+- [ ] Implement error handling and retry logic
+- [ ] Audit: Verify NO local reasoning remains
+
+### Phase 3: Claw Code Harness Integration (Week 3-4)
+- [ ] Configure harness routing for Ezra
+- [ ] Set up tool registry access
+- [ ] Implement Gemma 4 function calling
+- [ ] Configure context window management (8192)
+- [ ] Add automatic summarization for long contexts
+
+### Phase 4: Testing & Hardening (Week 4-5)
+- [ ] Tool use test suite passes
+- [ ] Multi-turn conversation handling
+- [ ] Fallback chain implementation
+- [ ] Network audit: No cloud AI calls
+- [ ] End-to-end integration test
+
+### Phase 5: Deployment (Week 5-6)
+- [ ] Gitea webhook automation
+- [ ] Telegram bot integration
+- [ ] Nostr bridge configuration
+- [ ] Backblaze B2 backup setup
+- [ ] Monitoring and alerting
+
+## Acceptance Criteria
+
+| ID | Criteria | Status | Notes |
+|----|----------|--------|-------|
+| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | ⬜ | Target: 25 tok/s |
+| A2 | Hermes Agent has NO local intelligence | ⬜ | Config audit required |
+| A3 | All queries route through Claw Code harness | ⬜ | Packet capture verify |
+| A4 | Tool use works via Gemma 4 function calling | ⬜ | Test suite TBD |
+| A5 | Ezra has independent fit layer config | ⬜ | File: ~/.hermes/profiles/ezra/ |
+| A6 | Gitea issues auto-route to Ezra | ⬜ | Webhook config |
+| A7 | Telegram bot responds via Gemma 4 | ⬜ | E2E test |
+| A8 | No cloud AI calls in packet log | ⬜ | tcpdump verify |
+
+## Risk Analysis
+
+| Risk | Probability | Impact | Mitigation |
+|------|------------|--------|------------|
+| Gemma 4 inference too slow | Medium | High | Use 4B variant for speed-critical tasks |
+| GPU memory constraints | Medium | High | Q4_K_M quantization, dynamic GPU offloading |
+| Tool use failures | Low | Medium | Extensive prompt engineering, fallback chains |
+| Context window overflow | Medium | Medium | Auto-summarization, RAG integration |
+| Hardware failure | Low | High | Hot standby config, B2 backup restore |
+
+## Resources
+
+### Hardware Requirements
+- GPU: NVIDIA with 24GB+ VRAM (RTX 4090 or A10G recommended)
+- RAM: 32GB system memory
+- Storage: 100GB NVMe for models and logs
+- Network: Local-only (no external AI dependencies)
+
+### Software Dependencies
+- llama.cpp (latest) or vLLM 0.4.0+
+- Python 3.10+
+- Hermes Agent v0.5.0+
+- Claw Code Harness v2.0+
+
+### Model Files
+```
+/opt/models/
+├── gemma-4-2b-it-Q4_K_M.gguf (~1.5GB)
+├── gemma-4-4b-it-Q4_K_M.gguf (~3GB)
+├── gemma-4-26b-it-Q4_K_M.gguf (~18GB)
+└── gemma-4-31b-it-Q4_K_M.gguf (~20GB)
+```
+
+## Monitoring & Observability
+
+### Metrics to Track
+- Inference tokens/second
+- GPU utilization
+- Memory usage
+- Request latency (p50, p95, p99)
+- Error rates
+- Tool use success rate
+
+### Logging
+- Dispatch events: `/var/log/ezra/dispatch.log`
+- Harness events: `/var/log/claw/harness.log`
+- Model inference: `/var/log/gemma4/inference.log`
+
+## Security Considerations
+
+- All models run locally - no data leaves the system
+- No API keys for external AI services
+- Network isolation for Gemma 4 server
+- File permissions: 600 for configs, 700 for model directory
+- Regular security audits via `lynis` and `rkhunter`
+
+## References
+
+1. ARCHITECTURE-SEED-EPIC.md - Master architecture specification
+2. `~/.hermes/profiles/gemma4/` - Gemma 4 Hermes profile (reference implementation)
+3. `/root/wizards/claw-code/docs/gemma4-integration.md` - Claw Code integration guide
+4. https://huggingface.co/blog/gemma4 - Gemma 4 announcement and benchmarks
+
+## Issue Tracking
+
+- **Gitea Issue:** #[TBD]
+- **Assigned to:** @ezra
+- **House:** Allegro
+- **Priority:** P0 - Foundation
+- **Estimated Duration:** 6 weeks
+
+---
+
+**Created by:** Allegro House
+**Last Updated:** 2026-04-02
+**Status:** Planning Phase
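
The `routing.logic` rule in `/opt/claw/harness/ezra.yaml` is written as pseudocode, and the plan does not say how the harness evaluates it. The following is a minimal Python sketch of the same decision order, not the Claw Code Harness implementation. The `Task` dataclass and the `select_model` function are hypothetical names introduced for illustration; only the thresholds and model names come from the config in this diff.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task descriptor; fields mirror the names used in routing.logic."""
    complexity: float = 0.0
    speed_critical: bool = False
    audio_required: bool = False


def select_model(task: Task) -> str:
    """Pick a backend, checking conditions in the same order as routing.logic."""
    if task.complexity > 0.8:
        return "gemma4-31b"       # complex reasoning
    if task.speed_critical:
        return "gemma4-4b"        # speed-critical tasks
    if task.audio_required:
        return "gemma4-2b"        # audio branch from the config
    return "gemma4-26b-moe"       # default


if __name__ == "__main__":
    print(select_model(Task(complexity=0.9, speed_critical=True)))
    print(select_model(Task()))
```

Because the branches are evaluated top to bottom, a task that is both complex and speed-critical routes to `gemma4-31b`, and a complexity of exactly 0.8 falls through to the later branches. Note also that `gemma4-2b` appears in the routing rule and in the model file list, while the server config in section 1 defines ports only for `26b-moe`, `31b`, and `4b`.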