System Architecture
mBedLM-core is a contract-driven platform built around layered runtime boundaries: model serving, memory and policy, orchestration, and domain adapters.
Architecture Overview
The platform separates concerns so each layer can evolve independently while preserving deterministic request behavior. Core requests flow through validated ingress, intent/routing, policy-governed inference, normalized response contracts, and observability hooks.
Runtime Layers
- Ingress and API Surfaces: entrypoints in web/API services accept requests and normalize runtime context.
- Orchestration and Routing: intent detection, route policy, fallback, and escalation decisions.
- Memory and Inference Policy: retrieval context, generation controls, caching/governors, and contract shaping.
- Tooling and Skills: structured action calls and specialist behavior extensions.
- Domain Adapters: forecasting, RL, OCR, and business-specific runtime modules.
High-Level Topology
Client/Web App
-> API ingress and auth
-> Intent and route policy
-> Tool/skill mediation (optional)
-> Memory + inference policy
-> Model serving backends
-> Response normalization (content_json.v1)
-> Telemetry and persistence
Request Lifecycle (Simplified)
- Validate request envelope, tenant/session context, and safety constraints.
- Classify intent and select route strategy (direct, tool-assisted, or orchestrated).
- Hydrate contextual memory and optional retrieval/tool outputs.
- Execute inference with runtime policy controls (timeouts, fallbacks, gating).
- Normalize to response contract and emit observability metadata.
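The five lifecycle steps above can be sketched as a small pipeline. This is an illustrative sketch only, not mBedLM-core's actual code: the Request fields, the function names, and the shape of the returned dictionary are assumptions, and the field layout of content_json.v1 is not specified on this page. Of the policy controls named in step four, only fallback is modeled; timeouts and gating are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical envelope; the real request schema is not documented here.
    tenant_id: str
    session_id: str
    text: str
    tools: list = field(default_factory=list)


def validate(req):
    # Step 1: reject envelopes missing tenant/session context or content.
    if not req.tenant_id or not req.session_id:
        raise ValueError("missing tenant or session context")
    if not req.text.strip():
        raise ValueError("empty request text")


def select_route(req):
    # Step 2: toy stand-in for intent classification.
    if len(req.tools) > 1:
        return "orchestrated"
    if req.tools:
        return "tool-assisted"
    return "direct"


def hydrate(req, memory):
    # Step 3: fetch prior context for this session (empty if none).
    return list(memory.get(req.session_id, []))


def infer(req, context, backends):
    # Step 4: try backends in order and fall back when one fails.
    last_error = None
    for index, backend in enumerate(backends):
        try:
            return backend(req.text, context), index > 0
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all backends failed") from last_error


def handle(req, memory, backends):
    validate(req)
    route = select_route(req)
    context = hydrate(req, memory)
    content, used_fallback = infer(req, context, backends)
    # Step 5: wrap the result in a placeholder contract plus metadata.
    return {
        "contract": "content_json.v1",
        "content": content,
        "meta": {
            "route": route,
            "used_fallback": used_fallback,
            "context_items": len(context),
        },
    }
```

Each backend here is any callable taking the request text and context list; passing a failing backend first and a working one second exercises the fallback path and sets used_fallback in the metadata.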
Reference Startup Sequence
Validation-first startup reduces drift between environments and avoids the instability that arises when only some routes are ready to serve.
1. Validate config and secrets
2. Confirm model artifacts and endpoint reachability
3. Start serving backends (general and specialist)
4. Start memory and tool substrate
5. Start orchestration services
6. Start web/API product surfaces
7. Enable domain modules and optional enhancers
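A minimal sketch of how this ordering could be enforced: stages run strictly in sequence, and the first failed check halts startup so later stages never begin. The stage identifiers and the run_startup function are hypothetical names chosen to mirror the list above; the check callables stand in for whatever validation each stage really performs.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stage names mirror the seven-step sequence; they are illustrative labels.
STAGES = [
    "validate_config_and_secrets",
    "confirm_model_artifacts_and_endpoints",
    "start_serving_backends",
    "start_memory_and_tool_substrate",
    "start_orchestration_services",
    "start_product_surfaces",
    "enable_domain_modules",
]


def run_startup(
    checks: Dict[str, Callable[[], bool]]
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failure.

    Returns the completed stages and the name of the failed stage
    (None if every stage passed). A missing check counts as a failure.
    """
    completed: List[str] = []
    for stage in STAGES:
        check = checks.get(stage)
        if check is None or not check():
            return completed, stage
        completed.append(stage)
    return completed, None
```

Treating a missing check as a failure keeps the sequence validation-first: a stage that cannot prove readiness blocks everything after it, so orchestration is never admitted ahead of the serving and memory layers.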
Governance and Safety Boundaries
- Canonical request paths prevent accidental traffic to experimental entrypoints.
- Trust tiers (production, canary, experimental) constrain auto-routing behavior.
- Response contracts keep downstream integrations stable across model changes.
- Guardrails enforce rewrite/salvage behavior when output quality degrades.
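One way to picture how canonical paths and trust tiers constrain auto-routing is a filter over a route table. The sketch below assumes a policy in which only canonical production routes are auto-selected, canary routes require an explicit opt-in, and experimental routes are never auto-selected. That policy, the Route type, and the example paths are all assumptions for illustration; the platform's real rules are not described on this page.

```python
from dataclasses import dataclass
from enum import Enum


class TrustTier(Enum):
    PRODUCTION = "production"
    CANARY = "canary"
    EXPERIMENTAL = "experimental"


@dataclass(frozen=True)
class Route:
    path: str        # hypothetical path, for illustration only
    tier: TrustTier
    canonical: bool  # True if this is a canonical request path


def auto_route_candidates(routes, allow_canary=False):
    """Return the routes an auto-router may pick under the assumed policy."""
    allowed = {TrustTier.PRODUCTION}
    if allow_canary:
        allowed.add(TrustTier.CANARY)
    # Experimental tiers and non-canonical paths are never auto-selected.
    return [r for r in routes if r.canonical and r.tier in allowed]


ROUTES = [
    Route("/v1/chat", TrustTier.PRODUCTION, canonical=True),
    Route("/v1/chat-next", TrustTier.CANARY, canonical=True),
    Route("/lab/chat", TrustTier.EXPERIMENTAL, canonical=False),
]
```

With this table, the default call yields only the production route, and passing allow_canary=True adds the canary route while still excluding the experimental one.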
Extension Points
- Add new domain adapters without changing core contract semantics.
- Attach skill packs to orchestration for specialist workflows.
- Enable optional runtime systems (OCR, RL, prediction) after core health gates pass.
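The extension points above suggest a registry pattern: adapters plug in behind a fixed interface, and optional modules are enabled only once core health gates pass. The following is a sketch under those assumptions. DomainAdapter, AdapterRegistry, and the core_healthy callback are hypothetical names, and the wrapped response is a placeholder rather than the real content_json.v1 layout.

```python
from typing import Callable, Dict, Protocol


class DomainAdapter(Protocol):
    # Assumed minimal interface every adapter implements.
    name: str

    def handle(self, payload: dict) -> dict: ...


class AdapterRegistry:
    def __init__(self, core_healthy: Callable[[], bool]):
        self._core_healthy = core_healthy
        self._adapters: Dict[str, DomainAdapter] = {}

    def enable(self, adapter: DomainAdapter) -> None:
        # Optional modules are admitted only after core health gates pass.
        if not self._core_healthy():
            raise RuntimeError("core health gates have not passed")
        if adapter.name in self._adapters:
            raise ValueError(f"adapter {adapter.name!r} is already enabled")
        self._adapters[adapter.name] = adapter

    def dispatch(self, name: str, payload: dict) -> dict:
        # Every adapter's output is wrapped in the same envelope, so adding
        # an adapter does not change what downstream consumers see.
        result = self._adapters[name].handle(payload)
        return {"contract": "content_json.v1", "adapter": name, "content": result}
```

Because dispatch wraps every adapter's result the same way, a new adapter such as an OCR module can be registered without touching the envelope that downstream integrations depend on.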
Curated Source Synthesis
This page is built from repository architecture documents and condensed into operator-safe guidance. Instead of raw document links, the key architecture outcomes are synthesized here.
- Runtime criticality is tiered so core availability remains isolated from optional modules.
- Startup is dependency-ordered: serving and memory substrate are ready before orchestration admission.
- Canonical request paths and trust tiers reduce accidental promotion of experimental routes.
- Response contract normalization protects downstream systems from provider/model churn.
- Ownership boundaries are mapped to platform, memory, tooling, and domain teams for clear accountability.
What Is Intentionally Not Included
- Environment-specific secrets, credentials, and private endpoint values.
- Detailed cutover/rollout playbooks and incident runbooks.
- Internal migration task sequencing and unreleased feature timelines.