Latency Budgets for Robotics Pipelines: Stop Optimizing Kernels Before You Budget the System

When a robotics system misses its timing, teams often react to the most visible hotspot: the model is too slow, the CUDA kernel needs work, the inference engine needs tuning. Sometimes that is true. Often it is not the first-order problem.

The bigger issue is usually that the pipeline never had a real latency budget in the first place.

If you do not know how much time each stage is allowed to spend, “optimize the model” is just a guess with good branding.

End-to-End Latency Is a Chain

A perception-to-action loop is not one number. It is a chain:

1. sensor capture
2. frame transport
3. preprocessing
4. inference
5. postprocessing
6. planner / controller
7. command dispatch
8. actuator response

If your deadline is 50 ms and one stage casually eats 22 ms because “it was still under the profiler radar,” the other seven stages are left sharing 28 ms and the whole system is already in trouble.

Start With a Budget Table

Before tuning anything, write the system budget down:

Target control-loop budget: 50 ms

- sensor capture:       6 ms
- transport / copy:     4 ms
- preprocess:           5 ms
- inference:           12 ms
- postprocess:          5 ms
- planner / control:    8 ms
- dispatch / actuation: 6 ms
- slack / jitter:       4 ms

This table is not just documentation. It is an engineering contract.

If a stage cannot stay inside its budget, you now know whether to optimize it, parallelize it, decimate it, or redesign the data path entirely.
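
One way to make that contract checkable is to keep the table as data and test it. The sketch below is a minimal illustration, not a prescribed tool: the numbers are the ones from the table above, while the names check_budget and over_budget and the 9.5 ms measurement are made up for the example.

```python
# Budget table from the article, in milliseconds.
DEADLINE_MS = 50.0
SLACK_MS = 4.0
BUDGET_MS = {
    "sensor capture": 6.0,
    "transport / copy": 4.0,
    "preprocess": 5.0,
    "inference": 12.0,
    "postprocess": 5.0,
    "planner / control": 8.0,
    "dispatch / actuation": 6.0,
}


def check_budget(budget_ms, slack_ms, deadline_ms):
    """Return True if the stage allocations plus slack fit inside the deadline."""
    return sum(budget_ms.values()) + slack_ms <= deadline_ms


def over_budget(measured_ms, budget_ms):
    """Return {stage: overrun_ms} for every stage measured above its budget."""
    return {
        stage: measured_ms[stage] - limit
        for stage, limit in budget_ms.items()
        if measured_ms.get(stage, 0.0) > limit
    }


# Hypothetical measurement: transport takes 9.5 ms instead of 4 ms.
measured = {**BUDGET_MS, "transport / copy": 9.5}
print(check_budget(BUDGET_MS, SLACK_MS, DEADLINE_MS))  # True
print(over_budget(measured, BUDGET_MS))                # {'transport / copy': 5.5}
```

Running a check like this in CI or at startup keeps the table from drifting away from the code it is supposed to govern.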

Measure Reality, Not Intent

Budgets are useful only if every stage emits timing data against the same clock and tags it with the same frame or control-tick ID.

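#include <cstdint>

// One timing record per stage per frame. Both timestamps must come from the
// same monotonic clock (for example std::chrono::steady_clock) so that
// records from different stages can be compared.
struct StageTiming {
  const char* name;   // stage label; must outlive the record
  uint64_t start_ns;  // when the stage began work
  uint64_t end_ns;    // when the stage finished
};

// Fills in a timing record for one stage.
void record_stage(StageTiming& t, const char* name, uint64_t start_ns, uint64_t end_ns) {
  t.name = name;
  t.start_ns = start_ns;
  t.end_ns = end_ns;
}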

For robotics pipelines, I want:

- per-stage latency
- queue wait time
- dropped-frame count
- p50 / p95 / p99 timing
- cross-stage correlation for the same frame or control tick

Without queue wait and tail latency, the system can look fine in averages while still missing real deadlines.
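
To show what that looks like on the analysis side, here is a rough sketch that separates run time from queue wait and reports p50 / p95 / p99 per stage. Everything in it is an assumption for illustration: the StageSample record, its enqueue_ns field, and the nearest-rank percentile are one reasonable choice, not the only one.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StageSample:
    frame_id: int    # correlates samples across stages for the same frame
    stage: str
    enqueue_ns: int  # when the input became available to this stage
    start_ns: int    # when the stage began work
    end_ns: int      # when the stage finished


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return {stage: {"run_ms": {...}, "wait_ms": {...}}} keyed by 50/95/99."""
    run, wait = defaultdict(list), defaultdict(list)
    for s in samples:
        run[s.stage].append((s.end_ns - s.start_ns) / 1e6)
        wait[s.stage].append((s.start_ns - s.enqueue_ns) / 1e6)
    return {
        stage: {
            "run_ms": {p: percentile(run[stage], p) for p in (50, 95, 99)},
            "wait_ms": {p: percentile(wait[stage], p) for p in (50, 95, 99)},
        }
        for stage in run
    }
```

Keeping wait_ms separate from run_ms is the point: a stage can sit inside its run-time budget while frames spend longer than that waiting in front of it.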

The Common Hidden Costs

Teams underestimate the same categories over and over:

Memory Movement

A “fast” inference stage can still live inside a slow pipeline if the frame bounces between CPU and GPU memory three times before the controller sees it.

Queueing

A model that takes 10 ms on paper can feel like 25 ms in practice if upstream frames pile up and downstream consumers cannot keep up.
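
A toy simulation makes the effect concrete. This is a sketch under simplifying assumptions (one FIFO consumer, perfectly periodic arrivals, made-up service times), and simulate_fifo is a hypothetical helper, not part of any framework.

```python
def simulate_fifo(arrival_period_ms, service_ms):
    """Per-frame latency (queue wait + service) for a single FIFO consumer."""
    latencies = []
    free_at = 0.0  # time at which the consumer next becomes idle
    for i, service in enumerate(service_ms):
        arrival = i * arrival_period_ms
        start = max(arrival, free_at)  # wait if the consumer is still busy
        free_at = start + service
        latencies.append(free_at - arrival)
    return latencies


# A 10 ms stage with one 25 ms hiccup, fed every 10 ms: the backlog never drains.
print(simulate_fifo(10.0, [10.0, 25.0, 10.0, 10.0, 10.0]))
# [10.0, 25.0, 25.0, 25.0, 25.0]

# Same hiccup, fed every 12 ms: 2 ms of headroom per frame slowly recovers.
print(simulate_fifo(12.0, [10.0, 25.0, 10.0, 10.0, 10.0]))
# [10.0, 25.0, 23.0, 21.0, 19.0]
```

In the first run the stage is "10 ms" by every per-call measurement after the hiccup, yet each frame reaches the consumer's output 25 ms after it arrived.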

Serialization Boundaries

Cross-process hops, message copies, and oversized payloads quietly destroy budgets, especially in ROS-heavy stacks that were assembled for flexibility first.

Synchronization

One blocking call in the wrong thread can erase the gains from every micro-optimization below it.

Budget Violations Need a Policy

Real-time-ish robotics systems need to decide what happens when a stage exceeds its budget.

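from typing import Any, Callable

# Stage budgets in ms, copied from the budget table.
BUDGET_MS = {"inference": 12.0, "transport": 4.0, "controller": 8.0}

def on_budget_violation(frame: Any, stage: str, latency_ms: float,
                        rolling_p99: Callable[[str], float]) -> str:
    # rolling_p99(stage) is assumed to come from the metrics layer: recent p99 in ms.
    if stage == "inference" and latency_ms > BUDGET_MS["inference"]:
        return "drop_frame"  # stale result: skip it rather than act late
    if stage == "transport" and latency_ms > BUDGET_MS["transport"]:
        return "switch_to_degraded_mode"
    # Checked on every call: a slow controller tail means the input rate is too high.
    if rolling_p99("controller") > BUDGET_MS["controller"]:
        return "reduce_input_rate"
    return "continue"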

Not every frame is worth saving. In many control systems, a stale result is worse than a dropped one.

That is why budget policy matters as much as the budget itself.
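
One common way to implement "drop rather than go stale" is a small bounded queue that evicts the oldest frame, paired with an age check before acting. The sketch below assumes that approach; LatestOnlyQueue and is_stale are hypothetical names, and the 50 ms default is the control-loop deadline used throughout this article.

```python
from collections import deque


class LatestOnlyQueue:
    """Bounded queue that evicts the oldest frame instead of blocking the producer."""

    def __init__(self, depth=1):
        self._frames = deque(maxlen=depth)
        self.dropped = 0  # feeds the dropped-frame count metric

    def push(self, frame):
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1  # the deque evicts the oldest entry on append
        self._frames.append(frame)

    def pop(self):
        return self._frames.popleft() if self._frames else None


def is_stale(capture_ns, now_ns, deadline_ms=50.0):
    """True if the frame is already older than the control-loop deadline."""
    return (now_ns - capture_ns) / 1e6 > deadline_ms
```

With depth=1 the consumer always sees the freshest frame, and the dropped counter makes the cost of that choice visible instead of silent.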

Optimize in the Right Order

Once the measurements are real, the optimization order becomes much clearer:

1. remove needless copies
2. reduce queue depth and backpressure
3. simplify synchronization boundaries
4. fuse cheap preprocessing stages
5. tune inference or kernels only after the system path is disciplined

This order is less glamorous than “we accelerated the model by 18%,” but it is usually more effective.

An end-to-end loop that drops 8 ms of copy and queue overhead beats a kernel-only win that ignores the rest of the pipeline.
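
The arithmetic behind that claim is short. The snippet below assumes the 12 ms inference budget and 50 ms deadline from the budget table, and reads "accelerated by 18%" as an 18% reduction in inference time; other readings give a slightly smaller saving.

```python
INFERENCE_BUDGET_MS = 12.0  # from the budget table
LOOP_MS = 50.0              # control-loop deadline

kernel_saving_ms = INFERENCE_BUDGET_MS * 0.18  # 18% less inference time (assumed reading)
system_saving_ms = 8.0                         # copy + queue overhead removed

print(f"kernel-only: {kernel_saving_ms:.2f} ms ({kernel_saving_ms / LOOP_MS:.1%} of the loop)")
print(f"system path: {system_saving_ms:.2f} ms ({system_saving_ms / LOOP_MS:.1%} of the loop)")
# kernel-only: 2.16 ms (4.3% of the loop)
# system path: 8.00 ms (16.0% of the loop)
```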

Slack Is Not Waste

Engineers sometimes build budgets in which the stage allocations alone sum exactly to the deadline, with no slack line. That is not a budget. That is wishful thinking.

You need slack for:

- scheduler jitter
- occasional I/O variance
- thermal effects
- interrupt noise
- transient synchronization delays

If the deadline is 50 ms, a design that allocates all 50 ms to stage work is already over-committed. The budget table earlier reserves 4 ms of slack for exactly this reason.

What Good Looks Like

A healthy robotics latency discipline looks like this:

- every stage has an owner
- every stage has a numeric budget
- the pipeline emits per-frame timing traces
- p95 and p99 matter more than average
- degraded behavior is defined before the first field test

At that point, performance work stops being folklore and starts being systems engineering.

The Practical Rule

Kernel optimization, TensorRT tuning, and model compression all matter. But they matter in the context of a pipeline that already knows what its deadline is and how that deadline is allocated.

If the system does not have a budget, you are not optimizing a pipeline. You are chasing hotspots one at a time and hoping the stopwatch eventually agrees.

For robotics, hope is too expensive. Budget the full loop first. Then optimize the stage that is actually breaking the contract.
