Message brokers and where Caspian could fit GitHub issue

vibecode
{"vibecode": {
    "doc": "message_brokers",
    "role": "research report and speculation. Stuart raised the idea of Caspian implementing message brokers for microservices. This document surveys the existing Linux message-broker ecosystem (the major brokers, the common patterns they share, the use cases they cover) and then speculates on where Caspian could realistically fit — as a client, as an embeddable broker, as a distributed extension of its in-process event system, or as a persistence-backed messaging layer via Mikobase.",
    "status": "speculative idea report — not a commitment to any of these paths",
    "raised_by": "Stuart",
    "audience": "Miko, Stuart, anyone thinking about Caspian's networking and concurrency story",
    "key_concepts": ["message_brokers_in_linux", "microservices_messaging_patterns",
        "caspian_as_client_vs_caspian_as_broker", "extending_the_in_process_event_system",
        "mikobase_as_message_store", "puck_protocol_as_messaging_primitive"]
}}

Stuart's proposal: Caspian should support message brokers for microservices. This report surveys the existing Linux message-broker ecosystem, identifies the patterns common across the major systems, and speculates on the angles from which Caspian could plausibly participate — what's already partly there, what would be additive, and what would be a fundamentally different scope of project.


Part 1: the major brokers GitHub issue

Six systems dominate Linux production deployments. Each occupies a different point in the design space.

RabbitMQ GitHub issue

The classic enterprise broker. Implements AMQP 0.9.1, with optional support for AMQP 1.0, MQTT, and STOMP. Publishers send to exchanges; exchanges route to queues via bindings with routing keys. Routing keys support wildcards (* for one segment, # for multiple), and exchanges come in flavors (direct, topic, fanout, headers) that decide how keys are matched.

Strengths: rich routing, mature management UI, broad ecosystem, well-understood. Queues can be persistent or transient, acknowledged or auto-acked, mirrored across nodes for HA.

Typical use: traditional task queues, request/reply patterns, decoupling synchronous APIs from background processing.

Apache Kafka GitHub issue

A distributed commit log, not a queue. Topics are split into partitions, each an append-only ordered log. Consumers track their own offsets — meaning the broker doesn't remember who has read what; the consumer does. Messages are retained for a configurable time/size regardless of consumption.

This inversion (log instead of queue) makes Kafka a great fit for event sourcing, stream processing, audit trails, and cross-system replication. Less so for simple "do a task" patterns where Kafka's consumer-group complexity is overkill.

Strengths: extreme throughput, durable replay, total ordering within a partition, ecosystem (Kafka Streams, Kafka Connect, Schema Registry, etc.). Costs: operational complexity, JVM-based runtime, requires ZooKeeper or KRaft for coordination.

Redis Pub/Sub and Streams GitHub issue

Redis ships two messaging primitives. Pub/Sub is fire-and-forget — subscribers receive messages only while they're connected, no persistence, no replay. Streams (added in Redis 5) are a Kafka-style append-only log with consumer groups, persistence, and replay.

The case for Redis-as-broker: Redis is already running in most stacks (as a cache), so adding messaging avoids deploying another service. Low operational overhead. Weaknesses: in-memory primary storage limits durability stories at high volumes; clustering is less mature than RabbitMQ or Kafka for messaging workloads.

NATS GitHub issue

A lightweight, cloud-native messaging system. Core NATS is at-most-once pub/sub: subjects with wildcard subscriptions, no persistence, very low latency. JetStream layers on persistence, streams, key-value, object storage — turning NATS into a full messaging+storage system without the operational weight of Kafka.

Strengths: tiny binary (under 20 MB), single-binary deploy, designed from the start for microservice meshes. Native support for request/reply, queue groups (load balancing), subject hierarchies.

Often the right choice for new microservice systems that want messaging without the Kafka tax.

MQTT brokers (Mosquitto, EMQX, HiveMQ) GitHub issue

MQTT is a lightweight pub/sub protocol designed for constrained devices and unreliable networks. Topics form a hierarchy (sensors/floor1/temp), wildcard subscriptions cover subtrees (sensors/+/temp), and three QoS levels (0/1/2) trade durability for latency.

MQTT dominates IoT and telemetry. The brokers (Mosquitto is the canonical OSS option; EMQX and HiveMQ for enterprise scale) are typically smaller and simpler than RabbitMQ.

ZeroMQ GitHub issue

Not a broker — a library. ZeroMQ provides socket-like primitives with pub/sub, request/reply, push/pull, and other patterns built in. There's no central broker; each program is its own message router.

Useful when you want messaging patterns without running a broker service. Distributed semantics are still up to the application (no built-in persistence, no routing service).

Honorable mentions GitHub issue


Part 2: the patterns GitHub issue

Across all of the above, the recurring patterns:

Topic-based pub/sub GitHub issue

A publisher sends to a named topic (or subject or exchange-key depending on the system); zero or more subscribers register interest in that topic and receive a copy. Wildcard subscriptions are common: orders.* matches orders.created, orders.shipped, etc.

Work queue (point-to-point with load balancing) GitHub issue

A queue with multiple consumers; each message is delivered to exactly one consumer. The broker load-balances. Used for background-job processing where N workers share a backlog.

Request / Reply GitHub issue

RPC over messaging. RPC — Remote Procedure Call — is the pattern of invoking a function or method on a remote machine using the same shape as a local call: pass arguments, wait for a return value, treat the network round-trip as an implementation detail. gRPC, JSON-RPC, XML-RPC, Java RMI, and ordinary HTTP request/response APIs are all RPC by this definition; Caspian's own Puck protocol is too. The defining property is "remote call that reads like a local call."

When RPC runs over a message broker instead of directly over HTTP, the mechanism is: the requester publishes a message that includes a reply-to address; the responder reads the request, does its work, and publishes the response to that address; the requester is subscribed to that address and consumes the reply. The caller and callee never share a direct connection — the broker carries the round-trip.

Correlation IDs. With more than one request in flight, the requester needs to match responses to the requests that asked for them. Every request carries a unique correlation ID; the responder echoes it in the reply. The requester maintains a small map of correlation_id → pending_callback (or → promise, or → thread) and dispatches incoming replies through it.

Reply-to address shapes. Three common patterns:

Timeouts. A request that gets no reply within some window has to expire — either because the responder is down, or the message was lost, or the work is taking longer than the requester can wait. The timeout is a client-side concern; the broker just forwards messages.

Load balancing across many responders. If multiple consumers subscribe to the request topic via a queue group (NATS), competing consumers (RabbitMQ), or consumer group (Kafka), the broker delivers each request to exactly one of them. The reply mechanism doesn't change — whichever responder handled the request publishes back to the same reply-to address.

Broker-specific support.

Why use it instead of HTTP? Three reasons stand out:

The cost: latency (extra hop through the broker), failure modes that are harder to reason about than HTTP's, and complexity if the system is small enough that direct HTTP would have just worked.

When it's the right pattern. Service-to-service calls where the responder is dynamic (autoscaled, may have N instances, may go down briefly), where the requester shouldn't be tightly coupled to the responder's address, and where the broker is already part of the system for other reasons. When none of those apply, plain HTTP is usually simpler.

Fan-out GitHub issue

One publisher, many independent subscribers — each subscriber sees every message. Pub/sub's degenerate case when there's no routing.

Fan-in / aggregator GitHub issue

Many publishers, one consumer that combines results.

Streams / event log GitHub issue

A categorically different model from queues. Where a queue is "broker holds messages until consumed, then deletes them," a log is "broker keeps every message forever (or until retention expires), and each consumer reads at its own pace from its own position." The shift looks like a small detail but reshapes what's possible.

Core properties:

Specific implementations.

Patterns the log model enables.

Concepts that don't appear in queue systems.

Cost trade-offs vs queues.

Concern Queue Log
Storage Low (deleted after consumed) High (retain everything for window)
Replay Not possible Native
Independent multi-consumer Hard (fan-out exchanges, etc.) Easy (each at own offset)
"Process exactly once and forget" Native Application-side dedup needed
Operational complexity Lower Higher (especially Kafka)
Best fit Transient task processing Event sourcing, streams, audit
Worst fit Audit trail (no history) Simple "do this work" tasks (overkill)

The choice between queue-shaped and log-shaped messaging is one of the larger architectural decisions in a microservice system. Reach for a log when history matters or when multiple downstream views of the same event stream are needed; reach for a queue when the work is transient and the broker as a system of record adds no value.

Dead letter queue GitHub issue

Messages that fail repeatedly get routed to a parking queue for inspection. Standard reliability pattern.

Delayed / scheduled delivery GitHub issue

A message is published with a "don't deliver until time T" instruction. Used for scheduled retries, delayed-effect workflows.

Persistent vs ephemeral GitHub issue

A message either survives a broker restart (persistent) or doesn't (ephemeral). The choice is per-topic, sometimes per-message; brokers usually support both modes.

Ephemeral. Messages live in broker memory. Cheapest to send, lowest latency, no disk I/O. If the broker restarts or a subscriber is offline, messages in flight are lost. Right choice when:

Persistent. Messages are written to disk before the broker acknowledges the producer — sends back a small confirmation message (an "ack," in shorthand) saying "I received it and stored it durably." Until the producer has this ack, it doesn't know whether the message landed. Persistence costs throughput (disk I/O is slower than memory), disk space (retention has to be managed), and added complexity in the broker's storage layer. Survives broker restart; can be re-delivered to consumers that were offline. Right choice when:

Acknowledgment timing within persistence. Even within "persistent," there's a spectrum of how durable an ack actually is:

Persistent doesn't mean "saved forever" — it means "survives the broker restart." Messages still age out per retention policy.

Handling client crashes GitHub issue

A persistent broker can survive its own restart, but most production failures are actually client failures: a producer dies mid-publish, a consumer dies mid-processing. Persistence is most of the answer, but the rest depends on protocol semantics.

Producer crashes between sending the message and receiving the ack. The producer doesn't know whether the message landed. Two strategies:

Consumer crashes after receiving a message but before processing it. This is the common case and is handled by explicit acknowledgment:

This guarantees at-least-once delivery to the consumer. The consumer is responsible for either being idempotent (processing the same message twice yields the same result) or for using transactional semantics that tie the side-effect to the ack (process-and-ack as one atomic operation).

Consumer crashes after processing but before acking. Indistinguishable from "crashed mid-processing" from the broker's perspective. Message gets re-delivered. Consumer processes again. Same answer: idempotency or transactional process-and-ack.

Visibility timeout. A reservation that bounds "how long can this consumer hold a message before I assume it's dead?" Too short and a slow consumer gets duplicate deliveries; too long and a crashed consumer's messages take forever to retry. Tunable; some brokers let the consumer extend the lease while it's still working.

Dead letter queues. When a message has been re-delivered N times and still keeps failing, the broker stops retrying and forwards the message to a parking queue for human inspection. Prevents infinite retry loops on permanently-bad messages.

Persistent subscriptions for absent clients. For pub/sub (broadcast) shapes, a "durable subscription" means: while a subscriber is disconnected, the broker buffers messages addressed to it; when the subscriber reconnects, the queued messages are delivered. Requires a stable client identity (MQTT's Client ID, AMQP's persistent queue name, Kafka's consumer group + member ID) so the broker knows which subscription belongs to a reconnecting client.

Stream / log consumers handle crashes differently. Log-based brokers (Kafka, NATS JetStream, Redis Streams) don't track in-flight state per consumer; they track committed offsets. When a consumer crashes, it resumes from its last committed offset on restart and re-reads any messages that were processed but not yet committed. Re-processing is still possible, so idempotency still matters — but there's no broker-side "re-delivery" mechanism, just consumer-side re-reading.

Producer-side message ordering across crashes. If a producer crashes and restarts, in-flight messages it had sent before the crash might land out of order relative to new messages it sends after restart. Brokers that care about producer-side ordering (Kafka with max.in.flight.requests.per.connection=1 plus idempotent producer, for example) require additional configuration to preserve order across producer restarts.

The summary: persistence on the broker side is necessary but not sufficient. The full crash-handling story requires coordinated discipline across both ends — durable storage, ack protocols, retry semantics, dedup or idempotency, dead-letter handling, stable client identity. Each broker offers a set of knobs; the right combination depends on what failure modes the application tolerates.

Delivery guarantees GitHub issue

Three levels, with sharply different cost. The choice between them is fundamentally about acknowledgments — the short confirmation messages introduced in Persistent vs ephemeral. Two kinds of acks matter in a broker workflow, on either end of the broker:

With those defined, the three delivery guarantees are about which acks are required and how the system recovers when they fail to arrive:

Backpressure / flow control GitHub issue

What happens when a publisher is faster than its consumers? Drop messages? Block the publisher? Buffer to disk? Each broker chooses a default and offers knobs.


Part 3: typical microservice use cases GitHub issue

The places message brokers actually earn their keep in microservice systems:


Part 4: where Caspian could fit GitHub issue

Several distinct angles, in roughly increasing scope of commitment.

Angle A: Caspian as a client of existing brokers GitHub issue

The most realistic V1.x move. Caspian programs participate in larger messaging systems by speaking the wire protocols of existing brokers. Each broker gets a Caspian package:

A higher-level abstraction could provide a uniform interface where the underlying broker is configurable:

caspian
$client = %puck['https://puck.uno/broker/client'].new(
    backend: 'rabbitmq',
    host:    'localhost'
)

$client.subscribe('orders.*') do($msg)
    %stdout.puts 'received: ' + $msg.body
end

$client.publish('orders.created', {order_id: 42})

Strengths. Realistic. Useful immediately. Doesn't require Caspian to invent anything new — just protocol implementations. Lets Caspian programs participate in established microservice meshes.

Costs. Each protocol is a real implementation lift (AMQP especially is complex). The "uniform abstraction" tax is real — each broker has subtly different semantics that don't fit a one-size interface.

Angle B: an embeddable Caspian-native broker GitHub issue

Build a small, in-process or sidecar broker for Caspian programs that don't want the operational weight of Kafka/RabbitMQ/NATS.

The model would parallel SQLite: small, embeddable, no separate service required, fine for many use cases that don't need cluster-scale infrastructure. Examples of fit:

The implementation could lean on existing Caspian pieces: TCP/UDS for transport, Mikobase for persistence, the role system for tenant separation, the existing event-system primitives for in-process delivery.

Strengths. A real gap in the ecosystem. The choice between "deploy a broker (operational weight)" and "do everything synchronously (no flexibility)" leaves room for an embedded option. Aligns with Caspian's design philosophy (small, self-contained, embeddable).

Costs. "Build a broker" is a substantial project. The persistence story is hard (Mikobase isn't optimized for append-only-log throughput). Cluster modes, replication, HA are not free.

Angle C: extending the in-process event system across processes GitHub issue

Caspian already has an event system%self.object.broadcast, listen_to, on_broadcast. Currently it's strictly in-process. The same API could be extended to cross process and host boundaries.

Specifically: a Caspian object's broadcasts could be made reachable to subscribers in other processes (other Caspian instances, maybe even other languages via a Puck-protocol surface). Local subscribers continue to work as today; remote subscribers participate through some transport (probably the broker from Angle B, or via existing brokers from Angle A).

caspian
# Local subscription — unchanged
$logger.object.listen_to $server, 'request_received', 'log'

# Remote subscription — same API, the other side just happens to be in another process
$remote_server = %puck['https://orders.example.com/']
$logger.object.listen_to $remote_server, 'request_received', 'log'

This is the most Caspian-native of the options. The developer-facing API doesn't change; the engine handles whether the subscription is local, cross-process on the same host, or cross-network.

Strengths. Maximum integration. The event system is already there; making it network-reachable just expands its territory. Developers learn one set of primitives.

Costs. Big design questions. How does cross-host broadcast handle ordering, delivery guarantees, authentication, failures? How does discovery work — how does $logger find $remote_server to register on? Some of these are answered by Puck (which is itself a remote-object protocol), but the synthesis isn't trivial.

Angle D: Mikobase-backed message store GitHub issue

Mikobase records are JSON objects with structured queries. Messages are JSON objects with structured filters. The fit is natural: a Mikobase database can serve as a message store, with queries replacing consumer subscriptions.

This wouldn't be a separate broker so much as a recognized pattern — "use Mikobase as your event log." Records have timestamps; queries can be temporal (WHERE created_at > $cursor). Consumer groups become "remember the cursor where you left off."

Strengths. Reuses Mikobase without adding new infrastructure. Persistence is automatic. Replay is just "query from earlier cursor."

Costs. Mikobase isn't optimized for append-only-log throughput; a high-volume messaging workload would stress patterns Mikobase wasn't designed for. Better suited for low-to-medium volume event sourcing than high-throughput streaming.

Angle E: Puck protocol as a messaging primitive GitHub issue

Puck is a remote-object protocol — instantiate a remote class, call methods on it. With explicit support for event-style payloads, it could approximate a messaging system without being one:

This is "build a broker on top of Puck" rather than "Caspian has a broker." Light architectural commitment; lets Puck be the foundation for application-level messaging when the developer wants it.

Strengths. Doesn't add new protocols. Stays in the Puck design space. Lets applications choose how much messaging structure they want.

Costs. Puts the persistence and reliability story on the application, not the platform. Doesn't replace a real broker for serious workloads.


Recommendations / open questions GitHub issue

If Caspian pursues messaging, the realistic path is probably:

  1. Angle A first — protocol clients for the dominant brokers (NATS and Redis are probably the cheapest first wins; Kafka next for the throughput-streaming case; RabbitMQ for enterprise reach; MQTT for IoT). This makes Caspian a useful participant in existing infrastructure without committing to building a broker.
  2. Angle C as a longer-term ambition — extending the in-process event system across processes feels like the most-Caspian answer. It's also the riskiest to design well. Worth specifying carefully before implementing.
  3. Angle B (embedded broker) is a possible adjunct — if the cross-process event-system work needs a default transport that doesn't require deploying RabbitMQ or NATS, an embedded broker could fill that gap. But it's hard to motivate Angle B on its own — it's an answer to a problem that mostly exists if Angle C exists.
  4. Angles D and E are reuse patterns, not platform commitments — worth documenting as ways to use Mikobase or Puck for messaging-shaped problems, but they don't justify a dedicated messaging-system project.

Open questions for Stuart:

The answers shape which angles are worth pursuing first. The framing of "Caspian should implement message brokers" is broad enough to mean anywhere from "add a NATS client library" to "build a distributed Kafka alternative" — narrowing that down is the first design conversation.

© 2026 Puck.uno