VRF balancing, backfilling mechanism

Current block chain (as of 12/17/2024) has many slots without a single block producer (BP). Then again, many slots have more than a BP allocated. This slows down the chain’s throughput as blocks are not produced deterministically.

Backfilling mechanism should be designed. A simple example mechanism could be allowing previous block VRF losers to produce a block for the next slot (perhaps with a small compensation delay to identify empty slot).

For instance, if slot N+1 is empty, the producer for slot N can create a block for N+1 as well, if it has not produced a block for slot N.

3 Likes

In practice, this approach could implement a waitlist mechanism, where the public keys of VRF losers are added to a VRF-output-ordered list. This list would be maintained by a daemon for a sliding window duration (e.g. an epoch time, 7,200 slots, or less).

If no block is produced within a 1/3 of the slot time, a daemon can approve a block submitted by a block producer (BP) from the waitlist, based on the order of their VRF output (with lower outputs prioritized).

BPs on the waitlist are removed under two conditions:

  1. When they successfully produce a block, as a result of being selected from the waitlist, or they have an assigned slot.
  2. When the sliding time window elapses.

BP behaviour is changed quite a bit with the above approach. Whenever a BP is on the top of the waitlist, it shall produce a block to all following slots and monitor for an empty slot. This creates additional load for the BP.

Even if the waitlist is ordered, BPs on the waitlist might send their block to an empty slot all at once. This may cause load to the network if the waitlist is long.

Next step would be to do the analysis how this would change the chain behaviour and what slot time-window would achieve good enough fill rate for empty slots.

1 Like

A brief security analysis suggests the following enhancements:
- The highest-priority BP on the waitlist is the designated BP for the next empty slot.
- The designated BP is removed from the waitlist after either producing a block or missing the empty slot.
- Other waitlist BPs are prohibited from producing a block for the empty slot.

With the above settings, the incentive for all BPs to send a block to an empty slot is removed. This also eliminates the incentive to interfere with other BPs on the waitlist to produce a block for an empty slot.

1 Like

The designated BP must be included in the chain to verify its validity.

One approach is to maintain the waitlist in the nodes using a Sparse Merkle Tree (SMT), and include the designated BP’s public key as an optional parameter in the consensus state, along with a zk-SNARK proof of its eligibility. The zk-SNARK proof would ensure the following:

  • The designated BP added to the chain is the highest priority in the node local waitlist.
  • The VRF output of the designated BP is below the slot’s VRF threshold.

The use of an SMT allows for slight synchronization differences between nodes, as long as the designated BP is present in the waitlist.

When an empty slot is reached, the block from the designated BP can be accepted and forwarded by other BPs. The designated BP information will be updated in the chain by the creator of the empty block, according to their waitlist. If the designated BP misses the slot, the information will be updated by the next BP, based on their waitlist.

In addition to zk-SNARK proof for the VRF output, the chain includes the slot number N for the proof. The proof is thus a copy of the block proof sent by the BP for the slot N and when included in the SMT new proofs are not needed.

During bootstrap, the SMT waitlist can be requested from several other peers (preferably 3) and combined into one SMT waitlist entity. As a security measure, the bootstrap BP does not include waitlist entries from the peer BP public key as part of its SMT, these are ignored by the requester during the SMT fill procedure.

The designated BP + slot + proof are an optional part of chain and thus backwards compatible. If the next BP node does not support them, they are ignored which shall leave the slot empty as before.

BP shall only forward or accept designated BP blocks for “empty” slots that are received after the 1/3 fraction of the slot time and have verified proof info for their slot. The slot must not be outdated either.

Next I’ll calculate statistics for epoch 13 what is the current fill rate and how much of it would be achieved with above backfilling mechanism. This also gives approximate estimate what slot duration would be a good sliding window timeout for the waitlist members.

Epoch 13 slots are now available and got enough slots that analysis could be made (via Minascan.io Blockberry API, thanks!). The result summary below. In brief, it works amazingly well, empty slots after the backfilling mechanism is introduced were less than 1.5 % of all slots, compared to original 45 %!

For the results, a minor change was introduced to earlier mechanism details: the requirement to remove a BP from the waitlist if it produced a block during its real slot was removed. Also, we can simplify the waitlist algorithm to be simple first-in, first-out (FIFO) mechanims with a limit of a unique BP public key in the waitlist, so SMT is not needed. SMT was originally thought to be used if Merkle paths of designated BP need to be added to consensus state.

Summary of Epoch 13: Slots 1528-7139 (5611 slots in total)
==========================================================
Total non-canonical blocks:		6761
Total empty slots:			    84
Original empty slots:			2541
Waitlist time-window:			47
Waitlist length after last block:	10
Empty slots % without backfilling:	45.29 %
Empty slots % with backfilling:		1.50 %
Empty slots:
[1532, 1533, 1537, 1692, 3642, 3946, 3947, 3953, 3954, 3955, 3974, 4144, 4145, 4149, 4150, 4155, 4312, 4327, 4328, 4335, 4338, 4339, 4346, 4347, 4348, 4349, 4350, 4351, 4380, 4416, 4740, 5456, 5459, 5478, 5479, 5525, 5671, 5672, 5673, 5730, 5732, 5733, 5734, 5736, 5738, 5739, 5743, 5779, 5798, 5829, 5838, 5841, 5842, 5981, 5982, 5984, 5985, 5986, 5987, 5988, 6037, 6038, 6039, 6325, 6445, 6514, 6517, 6538, 6541, 6546, 6551, 6552, 6553, 6555, 6559, 6560, 6561, 6637, 6638, 6639, 6838, 6839, 6844, 6845]

AFAICT, the consensus state should have an extension of

  • designated slot
  • designated BP
  • and designated slot proof

for the mechanism to work as analyzed (and described earlier).

All the data is available, so if you want to verify any parts of the analysis, I am happy to provide it! Just ping me in Discord for the data. Happy to also answer to questions here as well, of course!

As tizoc pointed out, the remaining challenge is to incorporate the designated BP into the empty block proof, where the designated BP’s VRF output exceeds the threshold. The proof could follow the logic of ‘my VRF output is below the threshold (i.e., valid), or I am the designated BP.’ However, achieving this appears to be non-trivial.

And here below are the results for full epoch 14 (data provided by MinaExplorer, thanks!):

Summary of Epoch 14: Slots 0-7139 (7140 slots in total)
==========================================================
Total non-canonical blocks:		7936
Total empty slots:			    127
Original empty slots:			3249
Waitlist time-window:			53
Waitlist length after last block:	8
Empty slots % without backfilling:	45.51 %
Empty slots % with backfilling:		1.78 %

The results are very similar to epoch 13 - now with all epoch slots, though.

Perhaps a simpler solution compared to the designated BP approach could be a tuning parameter for VRF threshold fraction per previous epoch results. It would compare block height to slot amount. Target there could be zero i.e. with empty slots, the threshold would have a related correction addition for the next epoch. The only downside is the increased amount of duplicate BPs per slot, thus increased traffic. This is avoided with the designated BP approach.

And the most simplest solution would be to just change the fill rate target to be 1.0 in the next hard fork. As the current target is 0.75 (according to Discord sources, gareth, sheriff), the 0.55 fill rate is probably as expected. With 1.0, we would probably get around 0.8 or a bit more, which is much better.

Based on insight from georgee (thanks!), we cannot really change or correct the fill rate target easily as the k (290 now) and possibly other security parameters would need to be revisited. Perhaps the original suggestion is the best so far and achieves close 100 % fill rate.

For the epoch 15, the stats are as follows:

Summary of Epoch 15: Slots 0-7138 (7139 slots in total)
==========================================================
Total non-canonical blocks:		7528
Total empty slots:			    200
Original empty slots:			3302
Waitlist time-window:			43
Waitlist length after last block:	7
Empty slots % without backfilling:	46.26 %
Empty slots % with backfilling:		2.80 %

Stats for Epoch 16 below:

Summary of Epoch 16: Slots 2-7138 (7137 slots in total)
==========================================================
Total non-canonical blocks:		5561
Total empty slots:			    525
Original empty slots:			3296
Waitlist time-window:			58
Waitlist length after last block:	4
Empty slots % without backfilling:	46.19 %
Empty slots % with backfilling:		7.36 %

As a recap, the mechanism

  • fills empty slots with a block that is produced by previous orphaned blocks BP i.e. by a designated BP
  • the designated BP filling the empty slot is selected by FIFO principle and is added to the consensus state in advance
  • a block is considered empty when in 1/3 slot time blocks have not been received
  • BPs keep FIFO waitlist up-to-date locally and verify validity of the chain updates against it. It is synced from other BPs during bootstrap. A waitlist time-window is used to limit the length of the waitlist.
  • Block approval is changed to allow to accept blocks that either are under VRF threshold fraction or are produced by the designated BP in the chain. Under VRF threshold fraction block has always the priority (as the slot is not empty with such a block).

And results for epoch 17:

Summary of Epoch 17: Slots 0-7138 (7139 slots in total)
==========================================================
Total non-canonical blocks:		4793
Total empty slots:			    709
Original empty slots:			3291
Waitlist time-window:			36
Waitlist length after last block:	0
Empty slots % without backfilling:	46.11 %
Empty slots % with backfilling:		9.93 %

Results for epoch 18. They seem to get worse every epoch, pretty interesting.

Summary of Epoch 18: Slots 2-7139 (7138 slots in total)
==========================================================
Total non-canonical blocks:		3442
Total empty slots:     			1119
Original empty slots:			3209
Waitlist time-window:			39
Waitlist length after last block:	0
Empty slots % without backfilling:	44.96 %
Empty slots % with backfilling:		15.68 %

Just saw this. For some context, stake participation has been on a consistent downward trend since genesis, which creates more empty slots. That shouldn’t be too interesting. With 100% of the network actively staking, the fill rate is 75%, and I believe the network started at around 66% at genesis. I’m not sure what the exact meaning of these numbers you are outputting is, though.

I’m not in favor of an orphan BP waiting and creating that empty block, as that would favor the bigger block producers more.

The current consensus has a 3-minute slot per block. If a block producer doesn’t produce a block, you can’t place another block there in any form in the current structure. The new block would mess up the upcoming block and could cause valid blocks to become invalid.

With the current network, I don’t see any simple way to fix this, other than having 100% of stake equaling 100% block production, which would increase block production. I’m not sure that would be viable with the current consensus, but the only simple solution we have is to do that and reduce block rewards to balance it, if possible.

Thanks for the comment! The stats above show the current empty slots rate (44.96%) compared to the theoretical empty slots rate with the proposed designated BP solution (15.68 %) for epoch 18. So far the empty slots % with the proposed backfilling mechanism has been between 1.5 - 15.68 % for epochs 13-18, so quite a lot of variation but still much better than 45 % of empty slots. These results are generated by a script that simulates the designated BP implementation in action for an epoch. Let me know if you need more info.