Ezra Archon: Implement SEED Architecture #5

Open
opened 2026-04-02 19:46:35 +00:00 by allegro · 0 comments
Collaborator

Overview

Implement the SEED Architecture for Ezra Archon as specified in ARCHITECTURE-SEED-EPIC.md.

Ezra will become a pure dispatch layer (Fit Layer) with all intelligence flowing through the Claw Code harness to Gemma 4.

References

  • Epic: ARCHITECTURE-SEED-EPIC.md
  • Gemma 4 Profile: ~/.hermes/profiles/gemma4/
  • Architecture Stack: Hermes Agent → Claw Code Harness → Gemma 4

Scope for Ezra Archon

Phase 1: Foundation (Week 1-2)

  • Deploy Gemma 4 server locally (26B MoE primary)
  • Configure llama.cpp or vLLM backend
  • Download and verify GGUF models (26B MoE, 31B, 4B)
  • Benchmark inference speed (>20 tok/s target)
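The benchmark step can be reduced to a simple throughput check against the >20 tok/s target. A minimal sketch, assuming the token count and elapsed time come from timing a real llama.cpp or vLLM completion (the fixed numbers here are illustrative only):

```python
# Minimal throughput check against the >20 tok/s target.
# `generated_tokens` and `elapsed_s` would come from timing a real
# llama.cpp or vLLM completion; fixed numbers are used for illustration.

TARGET_TOK_S = 20.0

def tokens_per_second(generated_tokens: int, elapsed_s: float) -> float:
    """Raw decode throughput, excluding prompt processing."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return generated_tokens / elapsed_s

def meets_target(rate: float, target: float = TARGET_TOK_S) -> bool:
    return rate > target

rate = tokens_per_second(512, 21.4)  # e.g. 512 tokens generated in 21.4 s
print(f"{rate:.1f} tok/s, target met: {meets_target(rate)}")
```

Measuring decode throughput separately from prompt processing keeps the number comparable across backends.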

Phase 2: Hermes Agent Fit Layer (Week 2-3)

  • Strip local intelligence from Ezra Hermes Agent
  • Configure dispatch-only mode
  • Set up Claw Code harness connection
  • Implement error handling and retry logic
  • Audit: Verify NO local reasoning remains
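The dispatch-only mode plus retry logic above can be sketched as a thin forwarding function. This is a minimal illustration, not the actual Hermes Agent code; `transport` is a hypothetical callable standing in for the real Claw Code harness connection:

```python
import time

# Sketch of the dispatch-only Fit Layer: no local reasoning, just
# forwarding to the harness with bounded exponential-backoff retries.

class DispatchError(Exception):
    pass

def dispatch(prompt: str, transport, retries: int = 3, base_delay: float = 0.5):
    """Forward a prompt through the harness; retry transient failures."""
    last_exc = None
    for attempt in range(retries):
        try:
            return transport(prompt)
        except ConnectionError as exc:  # transient harness failure
            last_exc = exc
            time.sleep(base_delay * (2 ** attempt))
    raise DispatchError(f"harness unreachable after {retries} attempts") from last_exc
```

Keeping all logic inside `dispatch` to transport-and-retry makes the "NO local reasoning" audit a matter of inspecting one small function.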

Phase 3: Claw Code Harness Integration (Week 3-4)

  • Configure harness routing for Ezra
  • Set up tool registry access
  • Implement Gemma 4 function calling
  • Configure context window management (8192 tokens)
  • Add automatic summarization for long contexts
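Context window management with summarization for long histories could look like the sketch below: keep the newest turns that fit under the 8192-token budget and replace dropped turns with a summary placeholder. Token counting is approximated at ~4 characters per token; a real tokenizer would replace `estimate_tokens`, and the summary line would come from an actual summarization call:

```python
# Sketch of 8192-token context management: drop the oldest turns until
# the estimated token count fits, then mark what was elided so a
# summarization pass can fill it in.

CONTEXT_LIMIT = 8192

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def fit_context(system: str, turns: list[str], limit: int = CONTEXT_LIMIT):
    budget = limit - estimate_tokens(system)
    kept, used = [], 0
    for turn in reversed(turns):          # newest turns first
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    dropped = len(turns) - len(kept)
    if dropped:
        kept.insert(0, f"[summary of {dropped} earlier turns elided]")
    return kept
```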

Phase 4: Testing & Hardening (Week 4-5)

  • Tool use test suite passes
  • Multi-turn conversation handling
  • Fallback chain implementation
  • Network audit: No cloud AI calls
  • End-to-end integration test
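The fallback chain item above can be sketched as an ordered list of model endpoints tried in sequence (e.g. 26B MoE, then 31B, then 4B). The `(name, callable)` pairs are hypothetical stand-ins for real model clients:

```python
# Sketch of a fallback chain: return the first model's answer that
# succeeds, collecting errors so a total failure is diagnosable.

def answer_with_fallback(prompt: str, chain):
    errors = []
    for name, model in chain:
        try:
            return name, model(prompt)
        except Exception as exc:
            errors.append((name, str(exc)))
    raise RuntimeError(f"all models failed: {errors}")
```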

Phase 5: Deployment (Week 5-6)

  • Gitea webhook automation
  • Telegram bot integration
  • Nostr bridge configuration
  • Backblaze B2 backup setup
  • Monitoring and alerting
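For the Gitea webhook automation, the receiver should verify the payload before routing an issue to Ezra. A minimal sketch, assuming the webhook is configured with a shared secret and Gitea's HMAC-SHA256 signature header (`X-Gitea-Signature`); verify against Gitea's own webhook documentation before relying on the header name:

```python
import hmac
import hashlib

# Sketch of webhook payload verification: recompute the HMAC-SHA256
# hex digest of the raw request body with the shared secret and compare
# it to the signature header in constant time.

def verify_gitea_signature(secret: str, body: bytes, signature: str) -> bool:
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Constant-time comparison (`hmac.compare_digest`) avoids leaking digest prefixes through timing.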

Acceptance Criteria

| ID | Criteria | Status |
|----|----------|--------|
| A1 | Gemma 4 26B MoE serves locally at >20 tok/s | ⬜ |
| A2 | Hermes Agent has NO local intelligence | ⬜ |
| A3 | All queries route through Claw Code harness | ⬜ |
| A4 | Tool use works via Gemma 4 function calling | ⬜ |
| A5 | Ezra has independent fit layer config | ⬜ |
| A6 | Gitea issues auto-route to Ezra | ⬜ |
| A7 | Telegram bot responds via Gemma 4 | ⬜ |
| A8 | No cloud AI calls in packet log | ⬜ |

Risk Mitigation

| Risk | Mitigation | Owner |
|------|------------|-------|
| Gemma 4 too slow | Use 4B variant for speed-critical tasks | @ezra |
| Memory constraints | Q4_K_M quantization, GPU offloading | @ezra |
| Tool use failures | Extensive prompt engineering | @ezra |
| Context limits | Auto-summarization, RAG | @ezra |

Resources Required

  • GPU with 24GB+ VRAM (for 26B MoE)
  • Local storage: ~60GB for all model variants
  • Network: Local-only (no cloud dependency)
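The VRAM and storage figures can be sanity-checked with a rough quantized-size estimate. Q4_K_M averages roughly 4.85 bits per weight; the numbers below are ballpark illustrations, not measured sizes of the actual GGUF files:

```python
# Rough GGUF size estimate for Q4_K_M quantization (~4.85 bits/weight,
# excluding KV cache and runtime overhead). Illustrative only.

def quantized_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (26, 31, 4):
    print(f"{params}B @ Q4_K_M ~ {quantized_size_gb(params):.1f} GB")
```

Summing the three variants lands near 37 GB of weights, which is consistent with the ~60GB storage budget once overhead and alternates are included.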

Definition of Done

  • All acceptance criteria pass
  • Documentation complete in archons/ezra/
  • PR merged to main
  • Ezra operational as Fit Layer

House: Allegro
Priority: P0 — Foundation
Estimated Duration: 6 weeks
Dependencies: Gemma 4 server infrastructure


Reference: timmy/harness#5