VRF balancing, backfilling mechanism

Current block chain (as of 12/17/2024) has many slots without a single block producer (BP). Then again, many slots have more than a BP allocated. This slows down the chain’s throughput as blocks are not produced deterministically.

Backfilling mechanism should be designed. A simple example mechanism could be allowing previous block VRF losers to produce a block for the next slot (perhaps with a small compensation delay to identify empty slot).

For instance, if slot N+1 is empty, the producer for slot N can create a block for N+1 as well, if it has not produced a block for slot N.

3 Likes

In practice, this approach could implement a waitlist mechanism, where the public keys of VRF losers are added to a VRF-output-ordered list. This list would be maintained by a daemon for a sliding window duration (e.g. an epoch time, 7,200 slots, or less).

If no block is produced within a 1/3 of the slot time, a daemon can approve a block submitted by a block producer (BP) from the waitlist, based on the order of their VRF output (with lower outputs prioritized).

BPs on the waitlist are removed under two conditions:

  1. When they successfully produce a block, as a result of being selected from the waitlist, or they have an assigned slot.
  2. When the sliding time window elapses.

BP behaviour is changed quite a bit with the above approach. Whenever a BP is on the top of the waitlist, it shall produce a block to all following slots and monitor for an empty slot. This creates additional load for the BP.

Even if the waitlist is ordered, BPs on the waitlist might send their block to an empty slot all at once. This may cause load to the network if the waitlist is long.

Next step would be to do the analysis how this would change the chain behaviour and what slot time-window would achieve good enough fill rate for empty slots.

1 Like

A brief security analysis suggests the following enhancements:
- The highest-priority BP on the waitlist is the designated BP for the next empty slot.
- The designated BP is removed from the waitlist after either producing a block or missing the empty slot.
- Other waitlist BPs are prohibited from producing a block for the empty slot.

With the above settings, the incentive for all BPs to send a block to an empty slot is removed. This also eliminates the incentive to interfere with other BPs on the waitlist to produce a block for an empty slot.

1 Like

The designated BP must be included in the chain to verify its validity.

One approach is to maintain the waitlist in the nodes using a Sparse Merkle Tree (SMT), and include the designated BP’s public key as an optional parameter in the consensus state, along with a zk-SNARK proof of its eligibility. The zk-SNARK proof would ensure the following:

  • The designated BP added to the chain is the highest priority in the node local waitlist.
  • The VRF output of the designated BP is below the slot’s VRF threshold.

The use of an SMT allows for slight synchronization differences between nodes, as long as the designated BP is present in the waitlist.

When an empty slot is reached, the block from the designated BP can be accepted and forwarded by other BPs. The designated BP information will be updated in the chain by the creator of the empty block, according to their waitlist. If the designated BP misses the slot, the information will be updated by the next BP, based on their waitlist.

In addition to zk-SNARK proof for the VRF output, the chain includes the slot number N for the proof. The proof is thus a copy of the block proof sent by the BP for the slot N and when included in the SMT new proofs are not needed.

During bootstrap, the SMT waitlist can be requested from several other peers (preferably 3) and combined into one SMT waitlist entity. As a security measure, the bootstrap BP does not include waitlist entries from the peer BP public key as part of its SMT, these are ignored by the requester during the SMT fill procedure.

The designated BP + slot + proof are an optional part of chain and thus backwards compatible. If the next BP node does not support them, they are ignored which shall leave the slot empty as before.

BP shall only forward or accept designated BP blocks for “empty” slots that are received after the 1/3 fraction of the slot time and have verified proof info for their slot. The slot must not be outdated either.

Next I’ll calculate statistics for epoch 13 what is the current fill rate and how much of it would be achieved with above backfilling mechanism. This also gives approximate estimate what slot duration would be a good sliding window timeout for the waitlist members.

Epoch 13 slots are now available and got enough slots that analysis could be made (via Minascan.io Blockberry API, thanks!). The result summary below. In brief, it works amazingly well, empty slots after the backfilling mechanism is introduced were less than 1.5 % of all slots, compared to original 45 %!

For the results, a minor change was introduced to earlier mechanism details: the requirement to remove a BP from the waitlist if it produced a block during its real slot was removed. Also, we can simplify the waitlist algorithm to be simple first-in, first-out (FIFO) mechanims with a limit of a unique BP public key in the waitlist, so SMT is not needed. SMT was originally thought to be used if Merkle paths of designated BP need to be added to consensus state.

Summary of Epoch 13: Slots 1528-7139 (5611 slots in total)
==========================================================
Total non-canonical blocks:		6761
Total empty slots:			    84
Original empty slots:			2541
Waitlist time-window:			47
Waitlist length after last block:	10
Empty slots % without backfilling:	45.29 %
Empty slots % with backfilling:		1.50 %
Empty slots:
[1532, 1533, 1537, 1692, 3642, 3946, 3947, 3953, 3954, 3955, 3974, 4144, 4145, 4149, 4150, 4155, 4312, 4327, 4328, 4335, 4338, 4339, 4346, 4347, 4348, 4349, 4350, 4351, 4380, 4416, 4740, 5456, 5459, 5478, 5479, 5525, 5671, 5672, 5673, 5730, 5732, 5733, 5734, 5736, 5738, 5739, 5743, 5779, 5798, 5829, 5838, 5841, 5842, 5981, 5982, 5984, 5985, 5986, 5987, 5988, 6037, 6038, 6039, 6325, 6445, 6514, 6517, 6538, 6541, 6546, 6551, 6552, 6553, 6555, 6559, 6560, 6561, 6637, 6638, 6639, 6838, 6839, 6844, 6845]

AFAICT, the consensus state should have an extension of

  • designated slot
  • designated BP
  • and designated slot proof

for the mechanism to work as analyzed (and described earlier).

All the data is available, so if you want to verify any parts of the analysis, I am happy to provide it! Just ping me in Discord for the data. Happy to also answer to questions here as well, of course!

As tizoc pointed out, the remaining challenge is to incorporate the designated BP into the empty block proof, where the designated BP’s VRF output exceeds the threshold. The proof could follow the logic of ‘my VRF output is below the threshold (i.e., valid), or I am the designated BP.’ However, achieving this appears to be non-trivial.

And here below are the results for full epoch 14 (data provided by MinaExplorer, thanks!):

Summary of Epoch 14: Slots 0-7139 (7140 slots in total)
==========================================================
Total non-canonical blocks:		7936
Total empty slots:			    127
Original empty slots:			3249
Waitlist time-window:			53
Waitlist length after last block:	8
Empty slots % without backfilling:	45.51 %
Empty slots % with backfilling:		1.78 %

The results are very similar to epoch 13 - now with all epoch slots, though.