feat(ezra): Implement SEED Architecture for Ezra Archon #6

Open
allegro wants to merge 1 commit from feature/ezra-archon-seed into main


@@ -0,0 +1,226 @@
# Ezra Archon: SEED Architecture Implementation Plan
## Overview
This document outlines the implementation of the SEED Architecture for Ezra Archon as part of the Allegro House migration to the Gemma 4 intelligence layer.
## Architecture Reference
- **Epic:** ARCHITECTURE-SEED-EPIC.md
- **Stack:** Hermes Agent (Fit Layer) → Claw Code Harness → Gemma 4
- **Philosophy:** Hermes Agent does not think; it routes everything to the harness.
## Ezra-Specific Configuration
### 1. Gemma 4 Server Deployment
```yaml
# /opt/gemma4/ezra-config.yaml
gemma4:
  default_model: 26b-moe
  models:
    26b-moe:
      path: /opt/models/gemma-4-26b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8645
    31b:
      path: /opt/models/gemma-4-31b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8646
    4b:
      path: /opt/models/gemma-4-4b-it-Q4_K_M.gguf
      gpu_layers: -1
      context: 8192
      port: 8647
```
### 2. Hermes Agent Fit Layer
```yaml
# ~/.hermes/profiles/ezra/config.yaml
profile:
  name: ezra-fit
  type: dispatcher
  house: allegro
intelligence:
  enabled: false # NO local reasoning - pure dispatch
dispatch:
  target: claw-code-harness
  endpoint: http://localhost:9000/harness/ezra
  timeout: 30s
  retry: 3
tools:
  allowed: all # Pass through to harness
  local: []    # No local tool execution
logging:
  level: info
  destination: /var/log/ezra/dispatch.log
```
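The dispatch-only behavior the config describes can be sketched in a few lines of Python. This is a minimal illustration, not the Hermes Agent implementation: the endpoint, timeout, and retry values mirror the config above, while the `dispatch` function name, backoff schedule, and injectable `post` hook are assumptions for the sketch.

```python
import json
import time
import urllib.error
import urllib.request

HARNESS_ENDPOINT = "http://localhost:9000/harness/ezra"  # dispatch.endpoint
TIMEOUT_S = 30    # dispatch.timeout
MAX_RETRIES = 3   # dispatch.retry

def dispatch(payload: dict, post=None) -> dict:
    """Forward a request to the Claw Code harness: no local reasoning,
    just serialize, send, and retry on transport errors.

    `post` is injectable for testing; by default it performs a real HTTP POST.
    """
    if post is None:
        def post(body: bytes) -> bytes:
            req = urllib.request.Request(
                HARNESS_ENDPOINT, data=body,
                headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req, timeout=TIMEOUT_S) as resp:
                return resp.read()

    body = json.dumps(payload).encode()
    last_err = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return json.loads(post(body))
        except (urllib.error.URLError, OSError) as err:
            last_err = err
            time.sleep(0.1 * 2 ** attempt)  # simple exponential backoff
    raise RuntimeError(f"harness unreachable after {MAX_RETRIES} attempts") from last_err
```

Keeping the function this small is the point: anything beyond serialization and retries would be local intelligence, which Phase 2 explicitly audits against.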
### 3. Claw Code Harness Configuration
```yaml
# /opt/claw/harness/ezra.yaml
harness:
  version: 2.0
  archon: ezra
backend:
  primary: gemma4-26b-moe
  variants:
    - gemma4-2b       # Ultra-edge tasks
    - gemma4-4b       # Standard tasks
    - gemma4-31b      # Complex reasoning
    - gemma4-26b-moe  # Default (efficiency)
routing:
  logic: |
    if task.complexity > 0.8: use gemma4-31b
    elif task.speed_critical: use gemma4-4b
    elif task.audio_required: use gemma4-2b
    else: use gemma4-26b-moe
tools:
  registry: /etc/claw/tools/ezra/
  auto_load: true
  allowed_tools:
    - git
    - file_system
    - telegram
    - nostr
context:
  window: 8192
  summarization: auto
session:
  persistence: true
  storage: /var/lib/claw/ezra/sessions/
```
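The `routing.logic` block above translates directly into code. A minimal sketch, assuming a `Task` record whose fields (`complexity`, `speed_critical`, `audio_required`) match the names used in the config; the dataclass itself is illustrative, not part of the harness API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    complexity: float          # 0.0-1.0 estimate supplied by the harness
    speed_critical: bool = False
    audio_required: bool = False

def select_backend(task: Task) -> str:
    """Mirror the harness routing block: most specific rule first."""
    if task.complexity > 0.8:
        return "gemma4-31b"       # complex reasoning
    if task.speed_critical:
        return "gemma4-4b"        # latency-sensitive
    if task.audio_required:
        return "gemma4-2b"        # ultra-edge tasks
    return "gemma4-26b-moe"       # default (efficiency)
```

Rule order matters: a complex, speed-critical task goes to the 31B model, since correctness outranks latency once complexity crosses the threshold.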
## Implementation Phases
### Phase 1: Foundation (Week 1-2)
- [x] Deploy Gemma 4 server locally (26B MoE primary)
- [ ] Configure llama.cpp or vLLM backend
- [ ] Download and verify GGUF models (26B MoE, 31B, 4B)
- [ ] Benchmark inference speed (>20 tok/s target)
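The >20 tok/s benchmark can be measured with a small timing harness. A sketch under stated assumptions: the `generate` callable stands in for one inference call against the local Gemma 4 server and returns the number of tokens produced; it is injected rather than hard-coded because the server's request format depends on the backend chosen (llama.cpp vs vLLM).

```python
import time

TARGET_TOK_S = 20.0  # acceptance threshold from criterion A1

def measure_tok_s(generate, n_runs: int = 3) -> float:
    """Average decode throughput over n_runs calls.

    `generate` performs one inference and returns the token count it
    produced; in practice it would POST to the local Gemma 4 server.
    """
    total_tokens = 0
    total_time = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        total_tokens += generate()
        total_time += time.perf_counter() - start
    return total_tokens / total_time
```

Averaging over several runs smooths out cold-start effects such as the first prompt paying for model warm-up.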
### Phase 2: Hermes Agent Fit Layer (Week 2-3)
- [ ] Strip local intelligence from Ezra Hermes Agent
- [ ] Configure dispatch-only mode
- [ ] Set up Claw Code harness connection
- [ ] Implement error handling and retry logic
- [ ] Audit: Verify NO local reasoning remains
### Phase 3: Claw Code Harness Integration (Week 3-4)
- [ ] Configure harness routing for Ezra
- [ ] Set up tool registry access
- [ ] Implement Gemma 4 function calling
- [ ] Configure context window management (8192)
- [ ] Add automatic summarization for long contexts
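The context-management items above imply two primitives: detecting when a conversation approaches the 8192-token window, and trimming history to a budget. A minimal sketch; the 0.75 trigger ratio and both function names are assumptions for illustration, not values from the harness config.

```python
CONTEXT_WINDOW = 8192     # harness context.window
SUMMARIZE_AT = 0.75       # assumed trigger ratio, not in the config

def needs_summarization(token_counts: list[int]) -> bool:
    """True once the running conversation crosses the trigger threshold."""
    return sum(token_counts) >= CONTEXT_WINDOW * SUMMARIZE_AT

def trim_history(messages: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Keep the most recent (message, token_count) pairs that fit in budget,
    preserving chronological order. Older turns would be replaced by a
    summary produced by the model itself."""
    kept, used = [], 0
    for msg, n in reversed(messages):
        if used + n > budget:
            break
        kept.append((msg, n))
        used += n
    return list(reversed(kept))
```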
### Phase 4: Testing & Hardening (Week 4-5)
- [ ] Tool use test suite passes
- [ ] Multi-turn conversation handling
- [ ] Fallback chain implementation
- [ ] Network audit: No cloud AI calls
- [ ] End-to-end integration test
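The fallback chain called for above can be sketched as an ordered list of backends tried until one succeeds. The chain order and the `infer` callable are illustrative assumptions; the real order would come from the harness `variants` list and its health checks.

```python
FALLBACK_CHAIN = ["gemma4-26b-moe", "gemma4-31b", "gemma4-4b"]  # illustrative order

def run_with_fallback(prompt: str, infer, backends=FALLBACK_CHAIN) -> tuple[str, str]:
    """Try each backend in order; return (backend, answer) from the first
    success. `infer(backend, prompt)` performs one inference call."""
    errors = []
    for backend in backends:
        try:
            return backend, infer(backend, prompt)
        except Exception as err:  # e.g. OOM, timeout, connection refused
            errors.append((backend, repr(err)))
    raise RuntimeError(f"all backends failed: {errors}")
```

Collecting per-backend errors before raising makes the "no cloud AI calls" audit easier: a total failure surfaces loudly instead of silently degrading.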
### Phase 5: Deployment (Week 5-6)
- [ ] Gitea webhook automation
- [ ] Telegram bot integration
- [ ] Nostr bridge configuration
- [ ] Backblaze B2 backup setup
- [ ] Monitoring and alerting
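For the Gitea webhook item, the receiving endpoint should verify the payload signature before routing anything to Ezra. Gitea sends an HMAC-SHA256 of the raw request body (hex-encoded) when a webhook secret is configured; the header name is `X-Gitea-Signature`. A minimal verification sketch:

```python
import hashlib
import hmac

def verify_gitea_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Check the X-Gitea-Signature header: hex HMAC-SHA256 of the raw body.

    compare_digest avoids timing side channels on the comparison.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

Verify against the raw bytes, not re-serialized JSON: any re-encoding of the body changes the digest and rejects valid deliveries.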
## Acceptance Criteria
| ID | Criteria | Status | Notes |
|----|----------|--------|-------|
| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | ⬜ | Target: 25 tok/s |
| A2 | Hermes Agent has NO local intelligence | ⬜ | Config audit required |
| A3 | All queries route through Claw Code harness | ⬜ | Packet capture verify |
| A4 | Tool use works via Gemma 4 function calling | ⬜ | Test suite TBD |
| A5 | Ezra has independent fit layer config | ⬜ | File: ~/.hermes/profiles/ezra/ |
| A6 | Gitea issues auto-route to Ezra | ⬜ | Webhook config |
| A7 | Telegram bot responds via Gemma 4 | ⬜ | E2E test |
| A8 | No cloud AI calls in packet log | ⬜ | tcpdump verify |
## Risk Analysis
| Risk | Probability | Impact | Mitigation |
|------|------------|--------|------------|
| Gemma 4 inference too slow | Medium | High | Use 4B variant for speed-critical tasks |
| GPU memory constraints | Medium | High | Q4_K_M quantization, dynamic GPU offloading |
| Tool use failures | Low | Medium | Extensive prompt engineering, fallback chains |
| Context window overflow | Medium | Medium | Auto-summarization, RAG integration |
| Hardware failure | Low | High | Hot standby config, B2 backup restore |
## Resources
### Hardware Requirements
- GPU: NVIDIA with 24GB+ VRAM (RTX 4090 or A10G recommended)
- RAM: 32GB system memory
- Storage: 100GB NVMe for models and logs
- Network: Local-only (no external AI dependencies)
### Software Dependencies
- llama.cpp (latest) or vLLM 0.4.0+
- Python 3.10+
- Hermes Agent v0.5.0+
- Claw Code Harness v2.0+
### Model Files
```
/opt/models/
├── gemma-4-2b-it-Q4_K_M.gguf (~1.5GB)
├── gemma-4-4b-it-Q4_K_M.gguf (~3GB)
├── gemma-4-26b-it-Q4_K_M.gguf (~18GB)
└── gemma-4-31b-it-Q4_K_M.gguf (~20GB)
```
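The "download and verify" step in Phase 1 implies checksumming these files. Since the GGUF files run to tens of gigabytes, hash them in chunks rather than loading them whole; a sketch (the reference checksums themselves would come from the model distribution, which this plan does not pin):

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a multi-GB GGUF file through SHA-256 in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```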
## Monitoring & Observability
### Metrics to Track
- Inference tokens/second
- GPU utilization
- Memory usage
- Request latency (p50, p95, p99)
- Error rates
- Tool use success rate
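The p50/p95/p99 latency metrics above can be computed from raw samples with a nearest-rank percentile, which needs no external dependencies. A minimal sketch (in production a metrics library or histogram would do this incrementally):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100], over raw latency samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * p // 100))  # ceil(n * p / 100), min 1
    return ordered[int(rank) - 1]
```

Usage: `percentile(latencies_ms, 95)` gives the p95 cut. Nearest-rank always returns an observed value, so tail percentiles are never interpolated below the worst real sample.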
### Logging
- Dispatch events: `/var/log/ezra/dispatch.log`
- Harness events: `/var/log/claw/harness.log`
- Model inference: `/var/log/gemma4/inference.log`
## Security Considerations
- All models run locally - no data leaves the system
- No API keys for external AI services
- Network isolation for Gemma 4 server
- File permissions: 600 for configs, 700 for model directory
- Regular security audits via `lynis` and `rkhunter`
## References
1. ARCHITECTURE-SEED-EPIC.md - Master architecture specification
2. `~/.hermes/profiles/gemma4/` - Gemma 4 Hermes profile (reference implementation)
3. `/root/wizards/claw-code/docs/gemma4-integration.md` - Claw Code integration guide
4. https://huggingface.co/blog/gemma4 - Gemma 4 announcement and benchmarks
## Issue Tracking
- **Gitea Issue:** #[TBD]
- **Assigned to:** @ezra
- **House:** Allegro
- **Priority:** P0 - Foundation
- **Estimated Duration:** 6 weeks
---
**Created by:** Allegro House
**Last Updated:** 2026-04-02
**Status:** Planning Phase