MQTT is widely used in IoT and messaging systems because of its lightweight protocol and features such as persistent sessions (so that clients can disconnect and later receive messages published during their offline period), QoS (Quality of Service), etc. When you have many devices or clients, persisting session state (subscriptions, unacknowledged messages, last message, etc.) can become a performance bottleneck.
TBMQ is an MQTT broker developed by ThingsBoard. In version 1.x it used PostgreSQL to store persistent session data for “DEVICE” clients (clients configured to use persistence). As scale increased, throughput requirements, latency concerns, and the limits of a relational database under high write/read concurrency turned this into a bottleneck. ThingsBoard therefore migrated persistence of DEVICE client sessions from PostgreSQL to Redis, combined it with Lua scripting for atomic operations, and switched from Jedis (a synchronous Redis client) to Lettuce (an asynchronous, reactive one). These changes allow TBMQ to scale to very high message-per-second rates while maintaining low latency.
In this article we will cover:
- Why and when to replace PostgreSQL for session persistence with Redis
- Key design constraints (Redis Cluster, slot constraints, hashing, atomicity)
- How Lua scripts are used to guarantee atomic operations and improve efficiency
- How Lettuce async client is used for higher parallelism and throughput
- Example code snippets
- Trade-offs and when this is not appropriate
- Conclusion
Architectural Background: TBMQ Persistent Sessions, PostgreSQL vs Redis
What TBMQ does for persistent sessions
- TBMQ classifies MQTT clients as DEVICE or APPLICATION clients. DEVICE clients typically publish frequently and subscribe to few topics; APPLICATION clients may have many subscriptions and typically expect higher message rates. For DEVICE clients, persistent sessions may be used: when they reconnect, they should receive messages published while offline.
- Prior to version 2.0, these persistent sessions for DEVICE clients were stored in PostgreSQL. This included storing which messages were delivered, which were pending, etc.
Why PostgreSQL became a bottleneck
- PostgreSQL is great for relational data, consistency, and complex queries. But as the number of persistent clients grows and message rates increase, the write throughput (insert/update operations) and the reads needed to fetch pending messages when a client reconnects place a heavy load on the database.
- Relational databases often scale vertically (bigger machines, SSDs, more RAM), which has limits. Also, disk‐based storage introduces latency; frequent writes/read‐writes can cause I/O contention.
- In TBMQ’s testing, PostgreSQL-backed DEVICE persistence started failing to scale above ~30,000 messages per second in P2P message patterns.
Why Redis
- Redis is an in‐memory store with optional durability (via snapshots or AOF) and supports horizontal scaling via Redis Cluster. Low read/write latency (microseconds to low milliseconds).
- Redis also supports data structures (lists, sorted sets, hashes) that simplify storing queues of pending messages, ordered unacknowledged messages, etc.
- Lua scripting allows atomic multi-step operations on multiple keys, so one can avoid race conditions.
- Redis Cluster allows sharding; but care must be taken to ensure keys used in atomic operations (or multiple‐key Lua scripts) fall on the same shard (“hash slot”). TBMQ uses hash tags for this.
Design Constraints and Patterns in TBMQ’s Redis-based Session Persistence
Before we jump into the code, what are the main design constraints/patterns TBMQ had to follow when implementing this migration?
Redis Cluster & Key Slotting
- Redis Cluster splits the key space into 16,384 hash slots. Every key is mapped to one of the slots. Keys in a multi-key operation (or multi-key script) must map to the same slot or else the operation will be refused or be inefficient.
- To ensure that the multiple keys for a single client (e.g. pending messages, message queue, metadata) land in the same slot, TBMQ uses hash tags: the client identifier is placed inside {} in the key name, e.g. persist:{clientId}:pending. This guarantees that all keys for that client map to the same slot, allowing multi-key atomic operations in Lua.
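As a quick illustration of the slot constraint (not TBMQ code), the snippet below builds two hash-tagged keys for a hypothetical clientId and uses Lettuce’s SlotHash utility to verify that they map to the same hash slot; the key names are the illustrative schema used throughout this article.
import java.nio.charset.StandardCharsets;
import io.lettuce.core.cluster.SlotHash;

public class SlotCheck {
    public static void main(String[] args) {
        String clientId = "sensor-42"; // hypothetical client id

        // All keys for one client share the same {clientId} hash tag, so Redis Cluster
        // hashes only the tag and places them in the same slot.
        String pendingKey = "persist:{" + clientId + "}:pending";
        String seqKey = "persist:{" + clientId + "}:seq";

        int pendingSlot = SlotHash.getSlot(pendingKey.getBytes(StandardCharsets.UTF_8));
        int seqSlot = SlotHash.getSlot(seqKey.getBytes(StandardCharsets.UTF_8));

        // Same slot means multi-key Lua scripts over both keys are allowed.
        System.out.println("pending slot = " + pendingSlot + ", seq slot = " + seqSlot);
    }
}
If the two slots ever differed, a multi-key EVALSHA over those keys would fail with a CROSSSLOT error, which is exactly what the hash tag convention prevents.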
Atomicity via Lua scripts
- A sequence of operations is needed per message: e.g. append the pending message, increment a sequence number, possibly delete expired entries, trim the queue, mark messages delivered, etc. These operations must appear atomic so you don’t lose messages or deliver them twice. Lua scripts are executed on the Redis server in isolation: no other commands interleave.
- TBMQ uses Lua scripts per client session. Because of slot constraints, each script that works with multiple keys for the same client must use keys that share the same hash tag.
Async Client and Batching
- Using a synchronous Redis client (e.g. Jedis) means each Redis call waits for its response; with high concurrency that becomes a bottleneck. TBMQ first migrated to Redis + Jedis + Lua, but throughput was only modestly improved (from ~30k msg/s to ~40k msg/s) because Jedis is synchronous.
- By migrating to Lettuce, which supports asynchronous (non-blocking) and reactive paradigms, TBMQ could send multiple commands in parallel, have better pipelining, command batching, and more overlap, increasing throughput further (to ~60k msg/s in TBMQ’s evaluation).
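To make the batching idea concrete, here is a minimal, illustrative sketch (not TBMQ’s actual pipeline) that disables Lettuce’s auto-flush, queues a burst of commands, and flushes them in one go; the connection setup and key names are assumptions.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import io.lettuce.core.LettuceFutures;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisFuture;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.async.RedisAsyncCommands;

public class BatchingSketch {
    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");
        StatefulRedisConnection<String, String> connection = client.connect();
        RedisAsyncCommands<String, String> async = connection.async();

        // Queue commands locally instead of flushing each one to the socket immediately.
        connection.setAutoFlushCommands(false);

        List<RedisFuture<?>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            // Hypothetical per-client sequence counters, just to have something to write.
            futures.add(async.incr("persist:{client-" + i + "}:seq"));
        }

        // One flush writes the whole batch to the network (pipelining).
        connection.flushCommands();

        // Wait for all replies, then restore auto-flush for normal traffic.
        LettuceFutures.awaitAll(5, TimeUnit.SECONDS, futures.toArray(new RedisFuture[0]));
        connection.setAutoFlushCommands(true);

        connection.close();
        client.shutdown();
    }
}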
Expiration, cleanup, ordering
- Messages pending while a client is offline must be stored in a data structure that preserves order, often sorted sets or lists.
- There must also be expiration logic: if messages expire (e.g. via TTL) or the client loses its persistent session, the broker needs to clean up these structures to free memory. TBMQ relies on Lua scripts and Redis TTL features for this cleanup.
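The cleanup side can also be driven from the application; the sketch below (illustrative, not TBMQ’s code) removes pending entries whose score (publish timestamp) is older than a TTL and refreshes the key’s expiry, assuming the key schema used in this article.
import java.util.concurrent.CompletionStage;
import io.lettuce.core.Range;
import io.lettuce.core.api.async.RedisAsyncCommands;

public class PendingCleanup {
    private final RedisAsyncCommands<String, String> async;

    public PendingCleanup(RedisAsyncCommands<String, String> async) {
        this.async = async;
    }

    // Removes pending messages older than ttlSeconds (scores are publish timestamps in
    // milliseconds) and refreshes the key TTL so an abandoned session eventually disappears.
    public CompletionStage<Long> purgeExpired(String clientId, long nowMillis, long ttlSeconds) {
        String pendingKey = "persist:{" + clientId + "}:pending"; // hypothetical key schema
        long cutoff = nowMillis - ttlSeconds * 1000;

        // ZREMRANGEBYSCORE drops everything with a score at or below the cutoff.
        return async.zremrangebyscore(pendingKey, Range.create(0L, cutoff))
                .thenCompose(removed ->
                        async.expire(pendingKey, ttlSeconds).thenApply(ok -> removed));
    }
}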
Example Code & Patterns
Below are simplified examples that capture the core patterns of TBMQ’s implementation. These are illustrative rather than exact TBMQ code, but reflect how one might implement session persistence, atomic operations, and async client usage.
Example: Lua script for adding a pending message for a client
Suppose we want to do this in one atomic operation:
- Add a message to the pending queue of client
- Increment a message sequence or counter
- Possibly expire old messages or limit queue length
Here’s a Lua script (add_pending_message.lua):
-- KEYS:
-- KEYS[1] = "persist:{clientId}:pending" -- list or sorted set of pending messages
-- KEYS[2] = "persist:{clientId}:seq" -- sequence counter for this client
-- ARGV:
-- ARGV[1] = messagePayload (string or serialized)
-- ARGV[2] = messageTimestamp or score (for sorted set), or 0 if list
-- ARGV[3] = maxQueueSize (integer)
-- ARGV[4] = messageTTLSeconds
local pendingKey = KEYS[1]
local seqKey = KEYS[2]
local payload = ARGV[1]
local score = tonumber(ARGV[2])
local maxSize = tonumber(ARGV[3])
local ttl = tonumber(ARGV[4])
-- increment sequence
local seq = redis.call("INCR", seqKey)
-- add to pending
-- you can choose the data structure; a sorted set scored by timestamp gives ordering
redis.call("ZADD", pendingKey, score, payload)
-- trim older entries if size exceeds the limit
local currentSize = redis.call("ZCARD", pendingKey)
if currentSize > maxSize then
  -- remove the earliest entries (lowest ranks)
  local numToRemove = currentSize - maxSize
  redis.call("ZREMRANGEBYRANK", pendingKey, 0, numToRemove - 1)
end
-- set TTL on both keys (so the structures expire if the client stays inactive)
redis.call("EXPIRE", pendingKey, ttl)
redis.call("EXPIRE", seqKey, ttl)
return { seq, redis.call("ZCARD", pendingKey) }
This script ensures that adding the pending message, trimming the queue, and setting TTLs all happen atomically in a single server-side call, which minimizes network round-trips, avoids race conditions, and keeps the multiple keys for that client consistent (both keys use the hash tag {clientId} in their names, so they map to the same slot).
Example: Java usage with Lettuce async client
Below is a simplified Java snippet using Lettuce to load and use the Lua script, asynchronously.
import java.util.concurrent.CompletableFuture;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.async.RedisAsyncCommands;
public class PendingMessageHandler {
private final RedisAsyncCommands<String, String> asyncCommands;
private final String addPendingScriptSha;
public PendingMessageHandler(String redisUri) throws Exception {
RedisClient client = RedisClient.create(RedisURI.create(redisUri));
StatefulRedisConnection<String, String> connection = client.connect();
this.asyncCommands = connection.async();
// Load Lua script into script cache
String script = loadLuaScript(); // you load the string from resource
this.addPendingScriptSha = asyncCommands.scriptLoad(script).get();
}
public CompletableFuture<long[]> addPending(String clientId, String payload, long timestamp,
int maxQueueSize, int ttlSeconds) {
// Build keys. Use hash tag {clientId} so both map to same slot.
String pendingKey = "persist:{" + clientId + "}:pending";
String seqKey = "persist:{" + clientId + "}:seq";
return asyncCommands.evalsha(
addPendingScriptSha,
io.lettuce.core.ScriptOutputType.MULTI,
new String[] { pendingKey, seqKey },
payload, Long.toString(timestamp), Integer.toString(maxQueueSize), Integer.toString(ttlSeconds)
)
.thenApply(obj -> {
    // The script returns two integers: the new sequence number and the current queue size.
    // With ScriptOutputType.MULTI, Lettuce maps integer replies to Longs inside a List.
    @SuppressWarnings("unchecked")
    java.util.List<Long> list = (java.util.List<Long>) obj;
    long seq = list.get(0);
    long size = list.get(1);
    return new long[] { seq, size };
})
.toCompletableFuture();
}
private String loadLuaScript() {
// Read your add_pending_message.lua into a single string
// Could use Files.readAllBytes(…) etc.
// For brevity returning inline
return "local pendingKey = KEYS[1] \n"
    + "local seqKey = KEYS[2] \n"
    + "local payload = ARGV[1] \n"
    + "local score = tonumber(ARGV[2]) \n"
    + "local maxSize = tonumber(ARGV[3]) \n"
    + "local ttl = tonumber(ARGV[4]) \n"
    + "local seq = redis.call('INCR', seqKey)\n"
    + "redis.call('ZADD', pendingKey, score, payload)\n"
    + "local currentSize = redis.call('ZCARD', pendingKey)\n"
    + "if currentSize > maxSize then\n"
    + "  local numToRemove = currentSize - maxSize\n"
    + "  redis.call('ZREMRANGEBYRANK', pendingKey, 0, numToRemove - 1)\n"
    + "end\n"
    + "redis.call('EXPIRE', pendingKey, ttl)\n"
    + "redis.call('EXPIRE', seqKey, ttl)\n"
    + "return { seq, redis.call('ZCARD', pendingKey) }\n";
}
}
This example shows:
- Loading the Lua script into the Redis script cache (via SCRIPT LOAD) so that subsequent EVALSHA calls are cheaper (a NOSCRIPT fallback sketch follows this list).
- Use of hash tags in key names to guarantee slot consistency.
- Use asynchronous API in Lettuce to avoid blocking; multiple threads/tasks can issue evalsha calls concurrently.
- The returned value is parsed and used (sequence number, queue length) so TBMQ can maintain logic (e.g. acknowledgments, sending pending messages when client reconnects).
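One operational detail worth handling: after a Redis restart or failover the script cache can be empty, so EVALSHA fails with a NOSCRIPT error. Here is a minimal fallback sketch, written as a method you could add to the PendingMessageHandler class above (it reuses the asyncCommands field and script string assumed there); it retries with EVAL, which also re-populates the cache.
// Belongs inside PendingMessageHandler; reuses the asyncCommands field from above.
private CompletableFuture<Object> evalWithFallback(String sha, String script,
                                                   String[] keys, String... args) {
    return asyncCommands.<Object>evalsha(sha, io.lettuce.core.ScriptOutputType.MULTI, keys, args)
            .toCompletableFuture()
            .handle((result, error) -> {
                if (error == null) {
                    return CompletableFuture.completedFuture(result);
                }
                String message = String.valueOf(error.getMessage())
                        + (error.getCause() != null ? " " + error.getCause().getMessage() : "");
                if (message.contains("NOSCRIPT")) {
                    // Script cache was flushed (e.g. after a restart/failover): fall back to EVAL,
                    // which also re-populates the cache for subsequent EVALSHA calls.
                    return asyncCommands.<Object>eval(script, io.lettuce.core.ScriptOutputType.MULTI, keys, args)
                            .toCompletableFuture();
                }
                CompletableFuture<Object> failed = new CompletableFuture<>();
                failed.completeExceptionally(error);
                return failed;
            })
            .thenCompose(future -> future);
}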
Example: Fetching and delivering pending messages when client reconnects
Another common pattern: when a persistent client reconnects, TBMQ needs to fetch its pending messages from Redis, deliver them, and then remove them (or mark them delivered), possibly dropping expired entries along the way. A Lua script can do all of that atomically.
-- KEYS:
-- KEYS[1] = "persist:{clientId}:pending"
-- ARGV:
-- ARGV[1] = maxBatchSize -- maximum number of messages to deliver per reconnection
-- ARGV[2] = currentTimestamp (for filtering expired entries)
-- ARGV[3] = messageExpirySeconds
local pendingKey = KEYS[1]
local maxBatch = tonumber(ARGV[1])
local nowTs = tonumber(ARGV[2])
local expiry = tonumber(ARGV[3])
-- get a batch of messages, from lowest score upwards
local entries = redis.call("ZRANGE", pendingKey, 0, maxBatch - 1)
if #entries == 0 then
  return {}
end
-- optionally filter by expiry: the score is assumed to be the publish timestamp
local toDeliver = {}
for i, payload in ipairs(entries) do
  local score = redis.call("ZSCORE", pendingKey, payload)
  if score and (nowTs - tonumber(score) <= expiry) then
    table.insert(toDeliver, payload)
  end
end
-- remove the delivered entries from the pending set
if #toDeliver > 0 then
  for _, payload in ipairs(toDeliver) do
    redis.call("ZREM", pendingKey, payload)
  end
end
return toDeliver
Java Lettuce code to invoke:
public CompletableFuture<List<String>> fetchPending(String clientId, int maxBatchSize, long currentTs, int expirySeconds) {
String pendingKey = "persist:{" + clientId + "}:pending";
// Suppose fetchScriptSha has been loaded similarly
return asyncCommands.evalsha(
fetchScriptSha,
io.lettuce.core.ScriptOutputType.MULTI,
new String[] { pendingKey },
Integer.toString(maxBatchSize),
Long.toString(currentTs),
Integer.toString(expirySeconds)
).thenApply(obj -> {
    @SuppressWarnings("unchecked")
    java.util.List<String> list = (java.util.List<String>) obj;
    return list;
}).toCompletableFuture();
}
These patterns allow TBMQ to:
- Deliver pending messages in batches
- Remove delivered messages
- Avoid overloading clients on reconnect
- Clean expired ones
All in atomic server‐side scripts that reduce network latency and complexity.
Performance Results from TBMQ
Based on the ThingsBoard / TBMQ research papers and blog posts, here are some observed improvements after the migration from PostgreSQL → Redis + Lua + Lettuce:
Setup | Throughput (messages/second) | Key observations
---|---|---
PostgreSQL persistence (DEVICE clients) | ~30,000 msg/s | Disk-based, vertically scaled; scaling beyond this was difficult.
Redis + Jedis (synchronous) + Lua | ~40,000 msg/s | Better than Postgres, but Jedis’s synchronous nature limited concurrency.
Redis + Lettuce (asynchronous) + Lua scripting + batching | ~60,000 msg/s | Significant improvement due to non-blocking, pipelined, and parallel Redis commands.

In peak tests (P2P messaging), throughput reached up to 1 million messages per second when TBMQ brokers, the Redis cluster, and Kafka were scaled appropriately, with latency remaining in the two-digit-millisecond range.
These results show that the architectural changes are not merely theoretical: they deliver real, measurable benefits in throughput and latency.
Trade-offs, Caveats, and Best Practices
While the Redis + Lua + Lettuce architecture offers significant advantages, there are trade-offs and situations where this design may not be ideal.
Durability vs Memory
- Redis is primarily an in-memory store. To prevent data loss you must configure persistence (RDB snapshots and/or AOF) and replication. Depending on the configuration, recovering from a crash may lose recent writes if AOF is disabled or snapshots are infrequent. TBMQ must honor QoS and message durability guarantees, and this shapes how Redis durability is set up.
- Memory usage: pending messages, large payloads, and many persistent clients mean that memory usage on the Redis cluster can be large. Effective TTLs, trimming of queues, and limits are essential.
Lua Script Execution Time
- Lua scripts block the Redis server while they execute. If they are long or do heavy work (e.g. scanning or removing many items), they can impact latency for all other clients. Redis has a lua-time-limit configuration (default 5 seconds) after which a long-running script starts producing busy errors for other commands.
- Keep Lua scripts simple and efficient; avoid operations over large datasets inside a script. If heavy work is unavoidable, batch it or spread it over multiple invocations (see the sketch after this list).
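If a trim or cleanup would touch many entries, one option is to drive it from the application in bounded chunks rather than in one long script. A minimal, illustrative sketch (chunk size and key name are assumptions):
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import io.lettuce.core.api.async.RedisAsyncCommands;

public class ChunkedTrimmer {
    private static final long CHUNK = 1_000; // assumption: remove at most 1000 entries per command

    private final RedisAsyncCommands<String, String> async;

    public ChunkedTrimmer(RedisAsyncCommands<String, String> async) {
        this.async = async;
    }

    // Trims a pending sorted set down to maxSize, removing at most CHUNK entries per command
    // so that no single Redis call (or script) runs for long.
    public CompletionStage<Long> trimToSize(String pendingKey, long maxSize) {
        return async.zcard(pendingKey).thenCompose(size -> {
            long excess = size - maxSize;
            if (excess <= 0) {
                return CompletableFuture.completedFuture(0L);
            }
            long toRemove = Math.min(excess, CHUNK);
            // Remove the oldest entries (lowest ranks) in a bounded step, then repeat.
            return async.zremrangebyrank(pendingKey, 0, toRemove - 1)
                    .thenCompose(removed -> trimToSize(pendingKey, maxSize)
                            .thenApply(rest -> removed + rest));
        });
    }
}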
Hash Tagging and Slot Constraints
- All keys that are involved in multi-key operations or used together in Lua scripts must be in the same Redis slot. Mis-naming or not including hash tags properly can lead to cross-slot errors (Redis Cluster refusing the command) or performance penalties.
- Key naming conventions must be consistent and designed from the start.
Asynchronous Client Complexity
- Using asynchronous clients like Lettuce adds complexity: error handling, backpressure, concurrency, ordering of commands, etc.
- Batching helps but one must tune batch size vs latency. If batches are too large or flush frequency is too low, latency may suffer. If too small, overhead increases.
When PostgreSQL Might Still Be Preferable
- For metadata heavy queries, relational integrity, complex joins, reporting, etc., PostgreSQL or another RDBMS might still be better.
- If the number of persistent clients is small and message rates modest, the complexity of Redis cluster + Lua + async may not pay off. Simpler architecture could be sufficient.
Putting It All Together: Sample High-Level Flow
Here is a high-level outline of how TBMQ (or any MQTT broker adopting similar design) might use this architecture in practice.
Publish by a DEVICE client that is online and uses persistence
- Broker receives message (QoS 1 or QoS 2 as needed)
- Broker uses a Lua script to add the message to that client’s pending store in Redis (sorted set/list), assign a sequence number, set TTLs, and trim the oldest entries if necessary.
- Broker also sends message onward (to subscribers) immediately if needed.
Client goes offline
- Broker marks client as offline, but retains its pending messages in Redis.
Client reconnects
- Broker fetches a batch of pending messages from Redis via a Lua script and removes them (or marks them delivered) atomically.
- Broker delivers those messages to the client in order.
- Optionally, if the client does not reconnect for a long time, TTLs or cleanup scripts purge its pending store.
Periodic cleanup or expiry
- Parts of logic may happen via TTLs on keys.
- Possibly also background tasks to prune expired or too large pending queues, but care must be taken to avoid cross-slot issues or long Lua script execution times.
Monitoring, scaling, metrics
- Monitor CPU / memory / network utilization on broker and Redis nodes.
- Use metrics such as messages/sec per CPU core, latencies, and commands/sec per Redis node (a small sketch of pulling Redis-side stats follows this list).
- Scale Redis cluster horizontally by adding shards.
- Tune Lettuce client batch sizes, flush frequencies, thread pools.
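For the Redis side of those metrics, here is a small illustrative sketch (not a full monitoring setup) that pulls the stats section of INFO and the keyspace size via Lettuce.
import java.util.concurrent.CompletionStage;
import io.lettuce.core.api.async.RedisAsyncCommands;

public class RedisStatsProbe {
    private final RedisAsyncCommands<String, String> async;

    public RedisStatsProbe(RedisAsyncCommands<String, String> async) {
        this.async = async;
    }

    // Returns the raw "stats" section of INFO (contains instantaneous_ops_per_sec, etc.).
    public CompletionStage<String> commandStats() {
        return async.info("stats");
    }

    // Returns the number of keys in the current database; useful as a rough growth indicator.
    public CompletionStage<Long> keyCount() {
        return async.dbsize();
    }
}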
Additional Code Example: Putting it in a TBMQ-like Broker Context
Suppose that in a broker codebase you have a handler for PUBLISH with QoS 1 for persistent DEVICE clients. Rough pseudocode (Java) integrating these parts:
public void handlePublish(DeviceClient client, String topic, byte[] payload) {
if (client.isPersistent()) {
long nowTs = System.currentTimeMillis();
// serialize payload to string or bytes
String serialized = Base64.getEncoder().encodeToString(payload);
// schedule via async
pendingHandler.addPending(
client.getClientId(),
serialized,
nowTs,
clientConfig.getMaxPendingQueueSize(),
clientConfig.getPersistenceTTLSeconds()
).thenAccept(seqAndSize -> {
long seq = seqAndSize[0];
long queueSize = seqAndSize[1];
// maybe emit a metric: pending queue length, etc.
}).exceptionally(ex -> {
log.error("Failed to persist message for client {}", client.getClientId(), ex);
// decide fallback: maybe drop, maybe block, maybe retry
return null;
});
}
// deliver message to subscribers, etc.
}
Then, when client reconnects:
public void handleClientReconnect(DeviceClient client) {
long nowTs = System.currentTimeMillis();
fetchPendingHandler.fetchPending(
client.getClientId(),
clientConfig.getMaxBatchOnReconnect(),
nowTs,
clientConfig.getPersistenceExpirySeconds()
).thenAccept(pendingMessages -> {
for (String serialized : pendingMessages) {
byte[] msgBytes = Base64.getDecoder().decode(serialized);
// deliver message to client
client.sendMqttPublish(/* reconstruct message from topic, payload, QoS, seq etc */);
}
}).exceptionally(ex -> {
log.error("Failed to fetch pending for client {}", client.getClientId(), ex);
return null;
});
}
Summary of Gains in TBMQ After Replacing PostgreSQL
Putting together what we have seen:
- Postgres → Redis migration for persistent DEVICE client session persistence significantly reduces latency and improves throughput.
- The use of Lua scripting ensures atomic updates across keys (sequence counters, pending queues, trimming, TTLs) which avoids race conditions.
- The switch from a synchronous Redis client (Jedis) to an asynchronous client (Lettuce) enables buffering, pipelining, and overlapping of I/O, improving utilization of Redis and the network and unlocking higher throughput.
- Under TBMQ’s tests, with these architectural changes, the system can sustain message rates of up to 1 million messages per second in P2P messaging scenarios while maintaining stable latency and high CPU utilization (~90%), without being bottlenecked by the persistence layer.
Practical Steps for Someone Wanting To Do This Migration in Their Own MQTT Broker
If you are developing or operating an MQTT broker (or similar messaging system) and considering moving session persistence from PostgreSQL (or another relational database) to Redis + Lua + Lettuce, here are steps and considerations:
Audit your existing persistence usage
- What data do you store in Postgres for each session client: subscriptions, pending messages, unacknowledged messages, sequence numbers, QoS 2 states, etc.?
- How many clients, what average pending queue size, what message rates, what payload sizes?
- What latency and throughput constraints do you need?
Design key naming and slot strategy
- Choose a schema for Redis keys, for example:
persist:{clientId}:pending
persist:{clientId}:seq
persist:{clientId}:subscriptions
- Ensure hash tags ({clientId}) so that all keys for a client map to the same hash slot.
Define data structures
- For pending messages: list vs sorted set (if ordering by timestamp or other metric)
- For sequence numbers: string/integer keys
- Subscriptions: a hash (or similar structure) keyed by topic filter (see the sketch after this list)
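For the subscriptions entry specifically, one option (a sketch under the key schema assumed above, not TBMQ’s actual model) is a Redis hash keyed by topic filter with the granted QoS as the value.
import java.util.Map;
import java.util.concurrent.CompletionStage;
import io.lettuce.core.api.async.RedisAsyncCommands;

public class SubscriptionStore {
    private final RedisAsyncCommands<String, String> async;

    public SubscriptionStore(RedisAsyncCommands<String, String> async) {
        this.async = async;
    }

    // Stores one subscription as a hash field: topic filter -> granted QoS.
    // The key shares the {clientId} hash tag with the other per-client keys.
    public CompletionStage<Boolean> saveSubscription(String clientId, String topicFilter, int qos) {
        String key = "persist:{" + clientId + "}:subscriptions";
        return async.hset(key, topicFilter, Integer.toString(qos));
    }

    // Loads all subscriptions for a client when the session is restored.
    public CompletionStage<Map<String, String>> loadSubscriptions(String clientId) {
        String key = "persist:{" + clientId + "}:subscriptions";
        return async.hgetall(key);
    }
}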
Write Lua scripts for core operations: add a pending message, fetch a pending batch on reconnect, remove delivered messages, trim/expire, etc. Make them efficient, load them into the Redis script cache, and invoke them with EVALSHA.
Set up Lettuce (or similar async client)
- Integrate script loading, invocation.
- Use async or reactive API to avoid blocking the broker thread when waiting for Redis I/O.
- Possibly batch or pipeline commands if multiple clients or multiple operations.
Configure Redis cluster for required scale
- Number of shards / nodes.
- Memory & persistence (AOF or RDB).
- TTL and eviction strategies.
- Monitor command rates, memory usage, latency.
Migration strategy
- Probably incremental. For clients with persistent sessions, gradually switch from Postgres to Redis.
- Ensure fallback or replication during migration.
- Possibly bulk migrate pending messages etc.
Testing & Benchmarking
- Run load tests (P2P messaging, reconnect scenarios).
- Measure messages/sec, per-CPU core efficiency, latency, failure recovery.
- Tune batch sizes, TTLs, script sizes.
Conclusion
Switching from PostgreSQL to Redis for session persistence in TBMQ (or any similar MQTT broker) can yield very large performance and scalability gains, particularly under high message throughput, high concurrency, or point-to-point (P2P) messaging patterns. The keys to achieving those gains are:
- Data modeling aligned to Redis strengths: Using in-memory data structures, TTLs, sorted sets or lists, keys per client, hash tagging, etc.
- Atomicity via Lua scripts: Essential when multiple keys or multiple operations are needed per operation (adding pending messages, trimming, removal, fetching pending). Without Lua (or transactions), you risk inconsistency, race conditions, or partial state.
- Asynchronous, non-blocking client: A synchronous client limits how many in flight requests or overlapping operations you can have; latency and throughput suffer. Lettuce (or another async/reactive client) lets you push concurrency, use pipelining, batching, etc., thereby leveraging Redis more fully.
- Cluster constraints & design discipline: You must ensure hash-slot alignment, avoid cross-slot operations, limit script execution time, keep huge payloads out of scripts, and monitor memory.
- Durability & operational concerns: Redis persistence has to be configured properly. Memory must be monitored. Systems for recovery, backup, and high-availability must be put in place.
In TBMQ’s case, the migration to Redis & Lua & Lettuce allowed moving from ~30k msg/s to ~60k msg/s in normal cases, and even up to 1 million msg/s in large, distributed peer-to-peer messaging scenarios — while maintaining QoS guarantees, acceptable latency, and without PostgreSQL becoming the bottleneck.
If you are building a broker or messaging system and anticipate scale (lots of persistent clients, high publish rates), this architecture is very compelling. However, for smaller or simpler systems, or when message persistence requirements are modest, sticking with a relational DB might still make sense for simplicity.