Automated Protocol Upgrade Mechanism

Since the last hard fork, there have been many thoughts about how to improve the process and achieve automated, backwards-incompatible upgrades of the protocol. I’ve started this thread as a place to consolidate those thoughts and work towards an implementation.

A rough idea for the process would be to:

  1. Align on requirements for an automated upgrade mechanism. What does a successful automated upgrade mechanism look like for core protocol developers, node operators, zkApp developers, and other stakeholders?
  2. Ideate and agree on a technical design and implementation plan to meet those requirements. What changes need to be made to which components? What teams have the knowledge and time to implement those changes and when? What does that mean for node operators, node developers, and zkApp developers downstream from the changes?
  3. Discuss whether we should go through an MIP process for this feature.
  4. Kick off implementation and operationalization of the plan.

Let’s start with gathering the requirements in this thread. To seed the conversation, below are a few thoughts on what an automated hard fork mechanism should achieve:

Automated upgrades for node operators: operators can install the new node software like any other upgrade package, and the actual switch from one protocol version to the next is performed automatically, without manual intervention, at the appropriate time.
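To make this a bit more concrete, here is a minimal OCaml sketch of the idea: the package ships both protocol rule sets and switches on its own once the chain reaches a predetermined activation slot. All names and numbers here are hypothetical, not the actual node code.

```ocaml
(* Hypothetical sketch: a binary that contains both protocol versions and
   picks the right one from the current global slot, so the operator only
   installs the package and the switch happens automatically. *)
type protocol_version = V2 | V3

(* Assumed activation slot, fixed in the upgrade release. *)
let activation_slot = 500_000

let protocol_for_slot ~global_slot =
  if global_slot >= activation_slot then V3 else V2

let () =
  [ 499_999; 500_000; 500_001 ]
  |> List.iter (fun slot ->
         match protocol_for_slot ~global_slot:slot with
         | V2 -> Printf.printf "slot %d: still on the current protocol\n" slot
         | V3 -> Printf.printf "slot %d: switched to the new protocol\n" slot)
```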

No need for centralized action at the time of the protocol upgrade: at the last upgrade, there was a script that created the new ledger state. The script was used to package a new ledger, as well as for validation of its integrity by community members. In the future, it is preferable that each node constructs the new ledger independently.
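A minimal sketch of what "constructing the new ledger independently" could look like, assuming a purely illustrative account format and migration rule: every node applies the same deterministic migration to its local ledger and publishes a digest, so community members can cross-check the result without a central script.

```ocaml
(* Illustrative old and new account formats; the new format adds a field
   with a well-defined default. This is not the real Mina account type. *)
module Old = struct
  type account = { pk : string; balance : int }
end

module New = struct
  type account = { pk : string; balance : int; new_field : int }
end

(* Deterministic migration: every node computes exactly the same result. *)
let migrate (a : Old.account) : New.account =
  { New.pk = a.Old.pk; balance = a.Old.balance; new_field = 0 }

(* A digest over the migrated accounts lets independent nodes (and community
   members) check that they all arrived at the same post-fork ledger. *)
let ledger_digest (accounts : New.account list) =
  accounts
  |> List.map (fun (a : New.account) ->
         Printf.sprintf "%s:%d:%d" a.New.pk a.New.balance a.New.new_field)
  |> String.concat ";"
  |> Digest.string
  |> Digest.to_hex

let () =
  let old_ledger =
    [ { Old.pk = "B62q_alice"; balance = 1_000 }
    ; { Old.pk = "B62q_bob"; balance = 250 } ]
  in
  print_endline (ledger_digest (List.map migrate old_ledger))
```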

Automated hard fork signaling to the network: node operators signal their readiness for the hard fork to the network, and the nodes automatically switch over to the hard fork version only when a sufficient amount of stake has signaled their readiness to switch over (as measured, for example, by some predetermined number of blocks in an epoch produced by node operators who have signaled their readiness to switch). This might be included in the original solution, or as a later upgrade.
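For illustration only, a rough sketch of how such a readiness check could look, assuming the signal is a flag in each produced block and a made-up activation threshold:

```ocaml
(* Hypothetical check: the upgrade activates only once enough of the blocks
   produced in an epoch carry the readiness flag. Counting blocks is used
   here as a crude proxy for stake, since block production is stake-weighted. *)
type block = { producer : string; signals_upgrade : bool }

(* Assumed threshold: 90% of the blocks in the epoch. *)
let readiness_threshold = 0.9

let upgrade_activated (epoch_blocks : block list) =
  let total = List.length epoch_blocks in
  let ready =
    List.length (List.filter (fun b -> b.signals_upgrade) epoch_blocks)
  in
  total > 0 && float_of_int ready /. float_of_int total >= readiness_threshold

let () =
  let epoch_blocks =
    [ { producer = "a"; signals_upgrade = true }
    ; { producer = "b"; signals_upgrade = true }
    ; { producer = "c"; signals_upgrade = false } ]
  in
  Printf.printf "upgrade activated: %b\n" (upgrade_activated epoch_blocks)
```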

Please share your thoughts in the comments below. What’s important to you? What are your concerns? In particular, if you are running a node (either for producing blocks, or for running a service such as a wallet), what kind of features would make your life easier around a hard fork? If you are working on an implementation team on the node, are there features of an upgrade process that you would like to see that would make your work on new features easier?

For now, let’s try to limit the discussion to the requirements; we’ll move on to solutions later, once we have converged on the requirements.

Your input will directly shape how we approach this important evolution of Mina’s infrastructure.

Another feature that will be very valuable:

The update mechanism should allow automatic testing of the transition from one protocol version to the next.

Not only will that increase our confidence when we approach a protocol upgrade, but it will also reduce the manual effort every time we do an upgrade.
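As a sketch of what such automated testing could look like (with completely made-up slot numbers and version checks), a CI test could drive a simulated chain across the activation point and assert that the transition behaves as expected:

```ocaml
(* Hypothetical harness: blocks are accepted only if they carry the protocol
   version expected at their slot; the test walks across the activation slot. *)
let activation_slot = 100

let block_accepted ~slot ~version =
  let expected = if slot >= activation_slot then 3 else 2 in
  version = expected

let test_transition () =
  (* Before the activation slot, only the old version is valid. *)
  assert (block_accepted ~slot:99 ~version:2);
  assert (not (block_accepted ~slot:99 ~version:3));
  (* From the activation slot on, only the new version is valid. *)
  assert (block_accepted ~slot:100 ~version:3);
  assert (not (block_accepted ~slot:100 ~version:2));
  print_endline "protocol transition test passed"

let () = test_transition ()
```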

Here’s another requirement I’d like to propose.

There Is No Fork: when a protocol upgrade happens and a stake minority of nodes has not yet been upgraded to support the new protocol version, those nodes stop producing blocks.

This would ensure that a protocol upgrade does not actually result in a hard fork, in the sense of two incompatible networks running in parallel.
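A minimal sketch of that rule, assuming (hypothetically) that the activated protocol version is visible on-chain and each binary knows the highest version it supports:

```ocaml
(* Hypothetical producer-side check: rather than extending an incompatible
   chain, a node that does not support the activated version stops producing. *)
let supported_version = 2 (* the highest version this binary implements *)

let should_produce ~activated_version = activated_version <= supported_version

let () =
  if should_produce ~activated_version:3 then print_endline "producing block"
  else
    print_endline
      "protocol version 3 is active but unsupported: pausing block production"
```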

I would be very interested in hearing about requirements from people running services like an explorer or a wallet that interface with a node. For example, it could be desirable to trigger an update of your own component after a successful protocol upgrade.

In order not to bury information in one long forum thread, I have created a GitHub repository for this. I’ll be capturing requirements that we come up with in the form of issues, to make it easier to have all the requirements at a glance.

Are we talking about a rust-node or not?

As of today, we have uptime tracking. All nodes are updated in turn. Who will be responsible if someone loses uptime during an automatic version update?

Good question, thanks for asking! The mechanism should work for all nodes in the network, so both the rust and the ocaml node. I’ll add that as a requirement.

No one should lose uptime with the delegation program because of a protocol upgrade. I think we should have one of two situations: either the upgrade is so smooth that it works without any downtime at all, or we should have a planned downtime, during which we do not expect nodes to be up.

If we go with the second option, then most likely we should have two or more time windows.
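To illustrate the second option, here is a small sketch of how uptime scoring could exclude announced maintenance windows, so a planned switch-over cannot hurt anyone's delegation-program score (the window ranges and scoring rule are made up):

```ocaml
(* Hypothetical scoring rule: uptime checks that fall inside announced
   maintenance windows around the upgrade are simply not counted. *)
let maintenance_windows = [ (1_000, 1_060); (2_000, 2_060) ] (* assumed slot ranges *)

let in_maintenance slot =
  List.exists (fun (lo, hi) -> slot >= lo && slot <= hi) maintenance_windows

(* [checks] is a list of (slot, node_was_up) samples. *)
let uptime_score checks =
  let counted = List.filter (fun (slot, _up) -> not (in_maintenance slot)) checks in
  let up = List.length (List.filter snd counted) in
  float_of_int up /. float_of_int (max 1 (List.length counted))

let () =
  (* The failed check at slot 1_030 falls inside a window, so the score stays 1.0. *)
  Printf.printf "score: %.2f\n"
    (uptime_score [ (900, true); (1_030, false); (1_100, true) ])
```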

Philipp,
We’ve been discussing this internally at Encapsulate and wanted to share a few thoughts:

  • Cosmos does a great job tying upgrades to block heights; in our case we could consider epoch boundaries. It keeps things predictable and easy to manage, and might be worth considering for Mina.

  • Introducing an SLA for validators could ensure timely upgrades, minimizing delays and keeping the network running smoothly.
    SLAs could also be extended to other scenarios, such as delegation policies or ensuring prompt validator responses during updates.

  • On-chain signaling can always come later. For now, a simple off-chain solution (like Discord or forums) could handle readiness tracking without complicating the protocol.

  • Letting nodes build the new ledger independently is a big one—it removes reliance on centralized scripts and keeps everything decentralized and trustless.

Thank you!
We will add to our comments if we have any additional input or concerns!

Thanks for adding to the discussion!

Good point about the epoch boundaries. I expect that we will only have protocol upgrades at epoch boundaries. In general, it is very hard to do it at another time. For example, think of a change in the protocol that changes how the stake distribution is calculated: the calculation only happens once per epoch anyway, so changing the protocol within an epoch doesn’t make much sense.
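For reference, a tiny sketch of the epoch arithmetic involved (I'm using 7140 slots per epoch, which I believe is the current value, so treat it as an assumption): an activation slot chosen at an epoch boundary guarantees the new rules only ever see a stake distribution computed entirely under a single protocol version.

```ocaml
(* Assumed constant: number of slots per epoch. *)
let slots_per_epoch = 7_140

let epoch_of_slot slot = slot / slots_per_epoch
let is_epoch_boundary slot = slot mod slots_per_epoch = 0

let () =
  (* Pick the activation slot as the first slot of some future epoch. *)
  let activation_slot = 3 * slots_per_epoch in
  Printf.printf "activation at slot %d = start of epoch %d (boundary: %b)\n"
    activation_slot (epoch_of_slot activation_slot)
    (is_epoch_boundary activation_slot)
```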

I’m curious about the suggestion of introducing SLAs. The complicating factor is that the network is decentralised, so there is no legal entity that owns the network and could enforce SLAs (except, of course, for block producers participating in a delegation program). One could introduce a kind of protocol-level SLA by tying rewards to the timeliness of updates. Would love to hear more thoughts on this.
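Purely as a thought experiment on what a "protocol-level SLA" could mean, here is a sketch in which a producer's reward is scaled by whether they signalled readiness before an agreed deadline slot; every number here is invented for illustration:

```ocaml
(* Hypothetical reward rule: no off-chain enforcement, just an in-protocol
   incentive to signal readiness for the upgrade on time. *)
let deadline_slot = 10_000
let base_reward = 720 (* assumed block reward, for illustration only *)

let adjusted_reward ~signalled_at =
  if signalled_at <= deadline_slot then base_reward
  else base_reward * 9 / 10 (* assumed 10% penalty for signalling late *)

let () =
  Printf.printf "on time: %d, late: %d\n"
    (adjusted_reward ~signalled_at:9_500)
    (adjusted_reward ~signalled_at:11_000)
```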

And yes, building the ledgers independently is the main thing that removes the need for coordination.

For app smart contracts, it would be preferable to have a decentralized upgrade mechanism. For an app like Lumina, where users deploy the pools, we’re reliant on that user performing the upgrade. If Lumina holds this key as a service, it puts Lumina in control. A multisig would help, for example if the Mina Foundation were also a signer. Could the Mina Foundation also be put in charge of this hard fork upgrade key service?

If protocols give users the freedom to deploy smart contracts, this makes users the owners of the smart contract signature keys. In the event of a hard fork, the user could then deploy malicious code, or simply never update these contracts, thereby blocking user funds.

Ideally we need something like a delegated signature: this property would be added to the account properties, and we’d also need to add an update permission based on this delegated signature.
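To sketch the idea (this is a hypothetical account model, not the existing account structure or o1js API): an account could carry an optional delegated upgrade key, and the permission to update the verification key could accept a signature from either the owner or that delegate.

```ocaml
(* Hypothetical model of a delegated upgrade permission. *)
type account =
  { owner_pk : string
  ; delegated_upgrade_pk : string option (* e.g. a committee or multisig key *) }

let may_update_verification_key account ~signer_pk =
  signer_pk = account.owner_pk
  || (match account.delegated_upgrade_pk with
     | Some pk -> signer_pk = pk
     | None -> false)

let () =
  let acct =
    { owner_pk = "B62q_user"; delegated_upgrade_pk = Some "B62q_committee" }
  in
  Printf.printf "owner may update: %b\ndelegate may update: %b\n"
    (may_update_verification_key acct ~signer_pk:"B62q_user")
    (may_update_verification_key acct ~signer_pk:"B62q_committee")
```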

@zekoxyz @youtpout, the topic you are addressing is really important! How do we enable zkApp maintainers to update zkApps without having to trust them not to do anything malicious? Something like a committee that is appointed at first deployment and commits to reviewing the code before an update is probably a workable solution.

The updates that I wanted to talk about in this thread are updates of the protocol itself: things like the Berkeley hard fork, where o1js was introduced to mainnet. When we did that, there was quite some manual work and coordination involved, and we’d like to reduce that in the future.

So we’ll have to discuss that in a separate topic, but your proposed solution is interesting and sounds like a multisig.