Automated Protocol Upgrade Mechanism

Since the last hard fork, there have been lots of thoughts about how to improve the process and achieve automated, backwards-incompatible upgrades of the protocol. I’ve started this thread to consolidate those thoughts in one place and work towards an implementation.

A rough idea for the process would be to:

  1. Align on requirements for an automated upgrade mechanism. What does a successful automated upgrade mechanism look like for core protocol developers, node operators, zkApp developers, and other stakeholders?
  2. Ideate and agree on a technical design and implementation plan to meet those requirements. What changes need to be made to which components? What teams have the knowledge and time to implement those changes and when? What does that mean for node operators, node developers, and zkApp developers downstream from the changes?
  3. Discuss whether we should go through an MIP process for this feature.
  4. Kick off implementation and operationalization of the plan.

Let’s start with gathering the requirements in this thread. To seed the conversation, below are a few thoughts on what an automated hard fork mechanism should achieve:

Automated upgrades for node operators: node operators can upgrade their node software like any other node upgrade package, and the actual switch from one protocol version to the next is performed automatically, without manual intervention, at the appropriate time.

No need for centralized action at the time of the protocol upgrade: at the last upgrade, there was a script that created the new ledger state. The script was used to package a new ledger, as well as for validation of its integrity by community members. In the future, it is preferable that each node constructs the new ledger independently.

Automated hard fork signaling to the network: node operators signal their readiness for the hard fork to the network, and the nodes automatically switch over to the hard fork version only when a sufficient amount of stake has signaled its readiness to switch over (as measured, for example, by some predetermined number of blocks in an epoch produced by node operators who have signaled their readiness to switch). This might be included in the initial solution, or added as a later upgrade.
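To make the signaling idea a bit more concrete, here is a minimal sketch of the block-counting measure mentioned above. It is only an illustration for discussion, not a proposed design: the `Block` type, the `signals_ready` flag, and the `required_ready_blocks` threshold are all hypothetical names, and how a block would actually carry a readiness signal is an open question.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Block:
    producer: str
    # Hypothetical flag: True if the producer signaled readiness for the upgrade.
    signals_ready: bool


def upgrade_is_approved(epoch_blocks: Iterable[Block], required_ready_blocks: int) -> bool:
    """Return True if enough blocks in the epoch carry a readiness signal.

    Counting blocks is used here as a proxy for stake, since block
    production is proportional to stake. The threshold is an assumed,
    predetermined parameter.
    """
    ready = sum(1 for block in epoch_blocks if block.signals_ready)
    return ready >= required_ready_blocks


# Example: two of three blocks in the epoch signal readiness.
blocks = [Block("a", True), Block("b", True), Block("c", False)]
approved = upgrade_is_approved(blocks, required_ready_blocks=2)
```

With a threshold of 2 the upgrade would be approved in this example; with a threshold of 3 it would not.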

Please share your thoughts in the comments below. What’s important to you? What are your concerns? In particular, if you are running a node (either for producing blocks, or for running a service such as a wallet), what kind of features would make your life easier around a hard fork? If you are working on an implementation team on the node, are there features of an upgrade process that you would like to see that would make your work on new features easier?

For now, let’s try to limit the discussion to the requirements; we’ll move to solutions once we have converged on them.

Your input will directly shape how we approach this important evolution of Mina’s infrastructure.

Another feature that will be very valuable:

The update mechanism should allow automatic testing of the transition from one protocol version to the next.

Not only will that increase our confidence when we approach a protocol upgrade, it will also reduce the manual effort required each time we do one.

Here’s another requirement I’d like to propose.

There Is No Fork: when there is a protocol upgrade and a stake minority of nodes has not yet been upgraded to support the new protocol version, those nodes will stop producing blocks.

This would ensure that a protocol upgrade does not result in an actual hard fork, in the sense of two incompatible networks.
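As a rough illustration of this requirement, the check a node would perform before producing a block could look like the sketch below. The function name and the idea of representing protocol versions as integers are assumptions made for the example only.

```python
from typing import AbstractSet


def may_produce_blocks(supported_versions: AbstractSet[int], active_version: int) -> bool:
    """Return True only if this node implements the network's active protocol version.

    A node that has not been upgraded stops producing blocks instead of
    continuing on the old protocol, so no second network can form.
    """
    return active_version in supported_versions


# Hypothetical versions: the network has switched from version 3 to version 4.
old_node_ok = may_produce_blocks({3}, active_version=4)
upgraded_node_ok = may_produce_blocks({3, 4}, active_version=4)
```

In this example the node that only supports version 3 stops producing, while the upgraded node continues.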

I would be very interested in hearing about requirements from people running services like an explorer or a wallet that interface with a node. For example, it could be desirable to trigger an update of your own component after a successful protocol upgrade.

In order not to bury information in one long forum thread, I have created a GitHub repository for this. I’ll be capturing requirements that we come up with in the form of issues, to make it easier to have all the requirements at a glance.

Are we talking about a rust-node or not?

As of today, we have uptime. All nodes are updated in turn. Who will be responsible if someone loses uptime during an automatic version update?

Good question, thanks for asking! The mechanism should work for all nodes in the network, so both the Rust and the OCaml node. I’ll add that as a requirement.

No one should lose uptime with the delegation program because of a protocol upgrade. I think we should have one of two situations: either the upgrade is so smooth that it works without any downtime at all, or we should have a planned downtime, during which we do not expect nodes to be up.

If we go with the second option, we should most likely have two or more time windows.

Philipp,
We’ve been discussing this internally at Encapsulate and wanted to share a few thoughts:

  • Cosmos does a great job of tying upgrades to block heights; for Mina, epoch boundaries could play the same role. That keeps upgrades predictable and easy to manage.

  • Introducing an SLA for validators could ensure timely upgrades, minimizing delays and keeping the network running smoothly.
    SLAs could also be extended to other scenarios, such as delegation policies or ensuring prompt validator responses during updates.

  • On-chain signaling can always come later. For now, a simple off-chain solution (like Discord or forums) could handle readiness tracking without complicating the protocol.

  • Letting nodes build the new ledger independently is a big one—it removes reliance on centralized scripts and keeps everything decentralized and trustless.
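
To illustrate the epoch-boundary idea from the first point, here is a minimal sketch of how a node could pick the protocol version for a given slot. Everything in it is an assumption for discussion: the function names, the integer version numbers, and the `SLOTS_PER_EPOCH` value, which is only an illustrative parameter.

```python
# Illustrative value only; the real number of slots per epoch is a network parameter.
SLOTS_PER_EPOCH = 7140


def epoch_of_slot(global_slot: int, slots_per_epoch: int = SLOTS_PER_EPOCH) -> int:
    """Return the epoch that contains the given global slot."""
    return global_slot // slots_per_epoch


def active_protocol_version(
    global_slot: int,
    activation_epoch: int,
    old_version: int,
    new_version: int,
    slots_per_epoch: int = SLOTS_PER_EPOCH,
) -> int:
    """Return the protocol version in force at the given slot.

    The new version applies from the first slot of `activation_epoch`
    onward, so every node switches at the same, predictable point.
    """
    if epoch_of_slot(global_slot, slots_per_epoch) >= activation_epoch:
        return new_version
    return old_version


# The last slot of epoch 0 still uses the old version; the first slot of epoch 1 uses the new one.
before = active_protocol_version(7139, activation_epoch=1, old_version=3, new_version=4)
after = active_protocol_version(7140, activation_epoch=1, old_version=3, new_version=4)
```

Because the switch depends only on the slot number and a pre-agreed activation epoch, no coordination is needed at the moment of the upgrade itself.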

Thank you!
We will add to this comment if we have any additional input or concerns!