~/blog — 42 entries

Notes on systems & AI

Embedded · edge AI · robotics · networking · multi-agent systems

Agent Systems Need Evidence Handoffs, Not Just Task Handoffs

Multi-step agent workflows often fail because intermediate stages pass conclusions forward without passing enough evidence. Stronger agent systems make evidence handoff explicit so downstream steps can verify, challenge, or safely constrain what came before.

Jul 7, 2026 · 3 min read

Archive signals · 42 posts

Agent Systems Need Evidence Handoffs, Not Just Task Handoffs

AI · Agents · Production Systems · Reliability

Jul 6, 2026 · 3 min read

Edge Reliability Comes From Fewer Implicit Dependencies

A lot of edge incidents come from dependencies nobody modeled clearly enough: boot order assumptions, sensor timing expectations, config fallthrough, and service relationships that only exist in tribal memory. Reliability improves when those dependencies become explicit or disappear.

Edge Computing · Reliability · Systems · Architecture

Jul 5, 2026 · 2 min read

Robotics Debugging Improves When State Changes Are Auditable

A lot of robotics debugging pain comes from not knowing exactly when the system changed modes, what triggered it, and what it believed at the time. Auditable state transitions make incident review much more mechanical and much less interpretive.

Robotics · Debugging · Observability · Reliability

Jul 2, 2026 · 3 min read

Agent Routing Needs Stop Conditions, Not Just Better Escalation Logic

Many agent systems spend too much effort deciding when to escalate and too little effort deciding when to stop. A routing system becomes much more reliable when it has clear termination rules for low-confidence or low-yield paths.

AI · Agents · Routing · Production Systems

Jul 2, 2026 · 3 min read

Edge Observability Should Start With Questions, Not Dashboards

A lot of observability stacks get built by collecting whatever is easy to visualize. Edge systems need a stricter approach: start from the questions incidents will force you to answer, then capture only the signals that make those answers possible.

Edge Computing · Observability · Reliability · Operations

Jul 2, 2026 · 3 min read

Robotics Incident Reviews Should Produce Better State Models, Not Just Action Items

A robotics incident review should do more than generate a list of fixes. The best reviews improve the system’s state model so future failures become easier to detect, explain, and recover from.

Robotics · Reliability · Incident Response · Operations

Jul 1, 2026 · 4 min read

Edge Deployments Need Clear Rollback Authority, Not Just Rollback Code

A lot of teams build rollback mechanics without deciding who or what is actually allowed to trigger them. For edge systems, rollback authority needs to be explicit across device logic, operators, and rollout policy, or the recovery path stays politically fragile.

Edge Computing · Reliability · Release Engineering · Embedded Linux

Jun 30, 2026 · 3 min read

Agent Evals Should Track Escalation, Not Just Accuracy

An agent can look accurate in offline evaluation while still being operationally weak because it escalates too often, too late, or for the wrong reasons. Good evals should measure escalation behavior as a first-class signal.

AI · Agents · Evaluation · Production Systems

Jun 30, 2026 · 3 min read

Robotics Safety Modes Should Be Explicit, Observable, and Boring

When a robotics system enters a degraded or safety state, everyone involved should be able to tell immediately. Safety modes should not be implicit side effects buried in logs. They should be explicit operational states with predictable behavior.

Robotics · Safety · Reliability · Systems

Jun 30, 2026 · 3 min read

Robotics Test Rigs Should Mirror Recovery Paths, Not Just Happy-Path Behavior

A robotics test rig that only proves the nominal path is weaker than it looks. The stronger standard is a rig that also exercises restart behavior, degraded states, operator holds, and recovery transitions before those conditions show up in the field.

Robotics · Testing · Reliability · Systems

Jun 29, 2026 · 3 min read

Eval-Driven Agent Rollouts: Ship New Agent Behaviors Like You Ship Infrastructure

Agent systems change behavior easily, which is exactly why rollout discipline matters. Prompt updates, tool policies, and routing changes should be treated like infrastructure changes with eval gates, canaries, and rollback paths.

AI · Agents · Evaluation · Production Systems

Jun 28, 2026 · 3 min read

Edge Systems Need Budgeted Complexity, Not Just Budgeted Latency

Teams often budget CPU, memory, and latency while ignoring a different constraint: complexity. On edge systems, too much moving logic, too many special cases, and too many implicit assumptions can break reliability long before raw resource usage does.

Edge Computing · Systems · Architecture · Reliability

Jun 27, 2026 · 4 min read

AI Agents Need Operational Boundaries, Not Just Better Prompts

A lot of agent systems fail for the same reason distributed systems fail: they have unclear boundaries, weak supervision, and no disciplined fallback path. Better prompts help, but operational boundaries matter more.

AI · Agents · Production Systems · Reliability

Jun 25, 2026 · 4 min read

Edge Inference Bottlenecks Are Usually Around the Model, Not Inside It

Teams often chase model-level speedups while the real latency damage lives in memory movement, queueing, preprocessing, and synchronization. On edge systems, the bottleneck is often the pipeline around inference, not the kernel itself.

Edge AI · Performance · CUDA · TensorRT

Jun 24, 2026 · 3 min read

Field Debugging Needs Portable Evidence, Not Just Better Dashboards

Dashboards are useful when connectivity is good and incidents are reproducible. Field systems rarely get both. For edge and robotics work, the stronger pattern is portable evidence bundles that preserve enough context to debug failures away from the device.

Systems · Reliability · Observability · Edge Computing

Jun 22, 2026 · 5 min read

Edge AI Release Candidate Discipline: What to Prove Before a Field Rollout

A model that looks ready in the lab is not automatically ready for the field. For edge AI, a release candidate needs proof across rollback behavior, hardware variance, latency budgets, and evidence capture before it deserves deployment.

Edge AI · Release Engineering · Embedded Linux · Reliability

Jun 20, 2026 · 5 min read

Robotics Integration Checklists: The Boring Discipline That Prevents Expensive Failures

Many robotics failures are not novel research problems. They are integration mistakes repeated under deadline pressure. A good checklist does not replace engineering judgment, but it stops teams from rediscovering the same avoidable failures during deployment.

Robotics · Systems · Integration · Reliability

Jun 18, 2026 · 5 min read

RAG Citation Quality Loops: Measure Whether the Evidence Actually Supports the Answer

A lot of RAG systems technically cite sources while still producing weakly supported answers. The next level of quality is not just attaching references, but measuring whether the cited evidence actually justifies each important claim.

RAG · AI · Evaluation · Gen AI

Jun 15, 2026 · 5 min read

Failure-First Edge AI Ops: Design the Recovery Path Before the Model Path

Edge AI systems usually get designed around model accuracy first and failure handling second. In production, that order is backwards. The system needs to prove how it degrades, recovers, and explains itself before anyone should trust the intelligence layer.

Edge AI · Reliability · Operations · Embedded Linux

Jun 14, 2026 · 5 min read

Robotics Observability Without the Cloud: What to Capture on the Device

Robotics systems often operate in places where cloud-native observability assumptions break down. If you want useful debugging in the field, the device itself has to preserve the traces, summaries, and replayable context that matter before anyone asks for them.

Robotics · Observability · ROS2 · Edge AI

Jun 12, 2026 · 5 min read

Multi-Model Routing for AI Systems: Use the Cheapest Model That Can Defend the Answer

A lot of AI products overspend because every task gets sent to the strongest model by default. In production, the better approach is routing: use the cheapest model that can complete the subtask correctly, and escalate only when the evidence or difficulty requires it.

AI · Gen AI · Routing · Production Systems

Jun 9, 2026 · 6 min read

Agentic RAG in Production: The Eval Loop Matters More Than the Demo

Most agentic RAG demos fail the moment they hit production because the system can retrieve, plan, and write, but nobody can prove when it was right, wrong, or expensive. The missing piece is an eval loop wired into the architecture from day one.

RAG · AI · Multi-Agent AI · Evaluation

Jun 7, 2026 · 5 min read

DPU Control Plane Offload: Where Smart NICs Actually Start Paying Off

DPUs are often pitched like magic accelerators, but the real value shows up when you move noisy control-plane work and security enforcement off the host without complicating the data path. The payoff is isolation, determinism, and better operational boundaries.

DPU · Networking · DOCA · Systems

Jun 5, 2026 · 5 min read

Latency Budgets for Robotics Pipelines: Stop Optimizing Kernels Before You Budget the System

Robotics teams often jump straight to model or kernel optimization, but most missed deadlines come from unbudgeted end-to-end latency across sensing, transport, inference, control, and actuation. The first job is building a budget the whole pipeline can obey.

Robotics · Edge AI · CUDA · Systems

May 27, 2026 · 6 min read

Safe OTA Updates for Offline Edge Linux: Signing, Staging, Rollback

Field-deployed edge devices often have no cloud connectivity — but they still need updates that can't brick them. Here's how to build OTA with signed bundles, atomic staging, health-check promotion, and automatic rollback, fully offline.

Embedded Linux · OTA · Cryptography · Edge Computing

May 26, 2026 · 6 min read

Why Does This Line Exist? Building a Temporal Context Graph for Code

git blame tells you who. It doesn't tell you why. Building CodebaseOS — a VS Code extension that reconstructs the full origin story of any line across commits, PRs, issues, and decisions in 200ms.

Developer Tools · Knowledge Graph · VS Code · HydraDB

May 25, 2026 · 5 min read

Perspective-Routed RAG: When One Corpus Isn't Enough

Standard RAG collapses all evidence into one corpus, averaging away disagreement. Building Embodipedia — an AI-maintained encyclopedia that keeps optimistic, skeptical, and neutral claims in separate lanes so agents can debate.

RAG · AI · Multi-Agent AI · Gen AI

May 17, 2026 · 7 min read

Forward Deployed Engineering: What the Role Actually Is

Forward Deployed Engineering isn't sales engineering with extra steps. It's the discipline of solving customer problems in the field by writing production code under time pressure. Here's what makes it different — and what makes someone good at it.

Forward Deployed · Engineering · Career · Customer-Facing

May 16, 2026 · 7 min read

Production RAG Patterns: Beyond the Tutorial

Most RAG tutorials show toy examples that fall apart in production. Here's what actually works at scale — chunk strategy, hybrid retrieval, reranking, and the operational realities nobody mentions.

RAG · AI · Vector DB · Gen AI

May 15, 2026 · 8 min read

Real-Time Linux for Robotics: PREEMPT_RT in Practice

Standard Linux can have 10 ms latency spikes, while real-time robotics needs sub-millisecond latency. PREEMPT_RT bridges that gap. Here's how to actually deploy it, what changes in your code, and the gotchas nobody warns you about.

Linux · PREEMPT_RT · Real-Time · Robotics

May 14, 2026 · 8 min read

Why FieldFix Has Zero Cloud Dependencies: Designing AI for the Edge

Building an AI repair assistant that works in agricultural fields with no internet. The architectural choices for offline LLMs, deterministic safety, and why we picked Gemma 3 4B.

AI · Edge Computing · Ollama · Gemma

May 13, 2026 · 8 min read

Inside Watchpoint: Architecture of a Robotics Incident Intelligence Platform

How Watchpoint captures robotic failures end-to-end — Go edge agent, replay bundles, rules-based RCA, and a correlation timeline. The architecture decisions that made it work in production.

Robotics · Observability · Go · ROS2

May 10, 2026 · 7 min read

Programming NVIDIA BlueField DPUs with DOCA

How to build data-plane applications on NVIDIA BlueField DPUs using the DOCA SDK — packet processing, flow steering, and running AI inference inline with network traffic.

DOCA · DPU · BlueField · Networking

May 8, 2026 · 7 min read

MuJoCo Sim-to-Real: Closing the Gap for Humanoid Robots

How we used MuJoCo simulation, teleoperation data, and NVIDIA GR00T to build locomotion policies for the Unitree G1 humanoid — and what the sim-to-real gap actually looks like in practice.

Robotics · MuJoCo · Sim-to-Real · Physical AI

May 5, 2026 · 6 min read

TensorRT in Production: The Complete Optimization Workflow

End-to-end TensorRT optimization — from PyTorch model to INT8 engine running at 60fps on Jetson Orin. Covers ONNX export, calibration, engine building, profiling, and common pitfalls.

TensorRT · NVIDIA · Inference · Embedded AI

May 1, 2026 · 6 min read

Building Self-Improving Multi-Agent AI Systems

How we built HydraSwarm — a 7-agent system that gets measurably better at software engineering tasks with each run, using persistent vector memory and structured agent roles.

Multi-Agent AI · LLM · System Design · HydraSwarm

Apr 28, 2026 · 6 min read

ROS2 for Physical AI: Building Real-Time Robot Pipelines

How ROS2's DDS middleware, lifecycle nodes, and executor model enable production robotics — lessons from building humanoid teleoperation and multi-sensor fusion pipelines.

ROS2 · Robotics · Physical AI · Sensor Fusion

Apr 10, 2026 · 4 min read

CUDA Kernel Optimization: What Actually Moves the Needle on Jetson

The profiling-driven workflow I use to squeeze real inference throughput out of Jetson Orin — memory coalescing, occupancy tuning, and why INT8 isn't always the answer.

CUDA · NVIDIA Jetson · Edge AI · TensorRT

Mar 28, 2026 · 6 min read

Embedded Linux from Scratch: BSP, Kernel Config, and Device Drivers

What nobody tells you about bringing up embedded Linux on custom hardware: from BSP bring-up and kernel config to writing your first character driver and surviving the device tree.

Embedded Linux · Linux Kernel · Device Drivers · BSP

Mar 5, 2026 · 3 min read

Winning Two Awards at the Intelligence at the Frontier Hackathon

How our team won Best Overall Use of DeepLake with HydraSwarm and the Physical AI & Robotics track with a Unitree G1 humanoid pipeline — all in 36 hours.

Hackathon · Physical AI · Robotics · Multi-Agent AI

Feb 15, 2026 · 3 min read

Deploying Edge AI Inference on Jetson Orin for Industrial Logistics

How we deployed CUDA and TensorRT inference pipelines on Jetson Orin at Ciena to detect conveyor anomalies in real time, cutting unplanned downtime by 22%.

Edge AI · TensorRT · DeepStream · NVIDIA Jetson

Jan 20, 2026 · 3 min read

Lessons from Packet Processing and Data-Plane Engineering

What I've learned about high-performance packet processing across two roles — from DMA optimization at Ciena to P4-programmable data planes at Cisco.

Networking · Embedded Linux · P4 · DPDK