Draft: Mina Data Availability Layer

Hi everyone,

I’m Chiro, founder and CTO of Orochi Network, and a grantee of zkIgnite Cohort #2. My zkDatabase project focuses on solving data availability and data correctness. I’m starting this topic to discuss an improvement to off-chain storage at the protocol level. This proposal is still being drafted, so feel free to discuss and contribute your opinions.

Abstract

This proposal introduces the ability to store data on an off-chain layer. The data itself CANNOT be accessed from a zkApp, but its commitment can, which means a much larger amount of data can be served while the data layer acts as a source of trust for the zkApp’s UI.
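To make this concrete, here is a minimal o1js sketch of the idea: only a Merkle root (the commitment) lives in zkApp state, and the zkApp verifies membership proofs against it. This is just an illustration under assumptions (the tree height, class and method names are made up, and exact o1js state APIs differ between versions), not part of the specification.

```ts
import { Field, MerkleWitness, SmartContract, State, method, state } from 'o1js';

// The off-chain layer arranges the stored data as a Merkle tree
// (height 16 is an arbitrary choice for this sketch).
class DataWitness extends MerkleWitness(16) {}

class OffchainDataApp extends SmartContract {
  // Only the commitment (the Merkle root) is kept in on-chain zkApp state.
  @state(Field) dataRoot = State<Field>();

  // Check that a value served by the off-chain layer is part of the
  // committed dataset, without ever putting the data itself on-chain.
  @method assertIncluded(leaf: Field, witness: DataWitness) {
    const root = this.dataRoot.getAndRequireEquals();
    witness.calculateRoot(leaf).assertEquals(root);
  }
}
```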

Motivation

Implement a Data Availability Layer for the Mina Protocol through which all ZK applications and L2 solutions can access off-chain data securely. Improving data availability lets people develop more featureful applications.

Prevent data fragmentation and the overhead of building temporary, short-term solutions.

Objectives

  • Build a consistent solution for all ZK applications
  • Provide data commitments compatible with the Kimchi proof system and o1js
  • Allow zkApps and L2s to rent data storage with MINA tokens (blobs should be freed/disposed of once the rental tokens run out; a rough sketch of the accounting is below)
  • Implement data sharding to reduce the average cost per byte
  • Free developers from implementing their own short-term solutions
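As an illustration of the rental objective only: all names, units, and the pricing rule below are made up, and this is not part of the specification.

```ts
// Hypothetical bookkeeping for storage rentals paid in MINA (nanomina here).
interface StorageRental {
  blobId: string;        // identifier of the stored blob
  sizeBytes: number;     // size of the blob
  balanceNano: bigint;   // remaining rent, in nanomina
  lastChargedAt: number; // unix timestamp of the last charge
}

const PRICE_PER_BYTE_PER_DAY = 10n; // hypothetical rate in nanomina

// Charge rent for the elapsed whole days (fractional days are forgiven in this
// sketch); returns false when the blob should be freed.
function chargeRent(rental: StorageRental, now: number): boolean {
  const days = BigInt(Math.floor((now - rental.lastChargedAt) / 86_400));
  const due = days * PRICE_PER_BYTE_PER_DAY * BigInt(rental.sizeBytes);
  rental.lastChargedAt = now;
  if (due >= rental.balanceNano) {
    rental.balanceNano = 0n;
    return false; // rent exhausted: blob can be disposed of
  }
  rental.balanceNano -= due;
  return true;
}
```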

Specification

Parameters

Data type

Data structure

Data validation

Commitment scheme

Network design

API

Full-node Integration

Proof-system Integration

Security Consideration

Trade-off

3 Likes

Could someone please move this back to MIP? It isn’t a zkApp, since some modifications at the protocol and consensus level are required to make it live. @moderators

1 Like

Hey @chiro-hiro, I think this is an interesting idea! May I suggest the next steps would be to host a community call on the topic and set up a working group to focus on detailing this proposal out. Some things that working group could do would be to clearly define the desired developer experience and what specific use-cases this benefits and unlocks, and more!

Happy to be a part of that and help out :slight_smile:

2 Likes

@chiro-hiro zkDatabase does not provide any data availability guarantees. In fact, if used for data availability, the database provider can completely hide block data, in which case it is inherently worse than archival nodes.

I do believe that it is a great data storage solution, especially since we can have an off-chain tamper-proof DB, but there’s a large difference between data storage and data availability.

Zeko’s litepaper had an overview of the DA options they investigated a few months ago; see Section 5. Now that Celestia and EigenLayer have launched, and the cryptography and o1js have advanced considerably, I’m hoping we can get a renewed look at the issue from experts.

I had a Twitter post here where Teddy and maht0rz chipped in on the discussion.

2 Likes

Let’s start by outlining what a DA layer must do and what scenarios it must satisfy. From my understanding, a data availability layer’s core feature is that the application state tree is always available; a security feature that gives users confidence that, if any service were ever to stop functioning, they can always prove custody of their funds.

1 Like

I completely agree with @teddyjfpender, we have to decide on what we actually want.
So there is this industry-wide term “data availability” floating around, but most people have different understandings of what that actually means. How I think most prominent projects define it is as something like “guaranteed data observability”. That means that if the DA layer publishes a block, firstly every full node can check whether all data was submitted (which is kinda trivial in our context), but additionally, every light client has the ability to download only the block header plus the subset of the data it wants plus some additional material, and verify that all data committed to in the block header is actually available. That additional material enables the light client to trustlessly verify that all the data that should be in that block is actually there, and nothing was changed or omitted by the producer. This is mostly done via sampling over some erasure-coded extension of the data. Remember, the light client shouldn’t have to download all the data in a block to ensure its availability.
So what is a light client in this context? Most would think of ordinary users that want to participate in the network somehow, but in our case, light clients are actually the systems that submit data to the DA layer. For example, if a rollup wants to settle on an L1, it has to prove data availability. It does that by executing the verification steps of a light client and attaching a proof of that to the settlement. Basically, we want some sort of DA proof to come along with L1 settlement, which convinces the L1 that the data corresponding to the settled computation is actually available in some external system.
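To make the sampling idea a bit more tangible, here is a rough TypeScript sketch of the light-client side of data availability sampling: ask the network for a handful of random chunks of the erasure-coded block data and check each against the commitment in the block header via a Merkle proof. This is only an illustration of the general technique with made-up function names, not how any specific DA layer implements it.

```ts
import { createHash } from 'crypto';

const sha256 = (data: Buffer): Buffer => createHash('sha256').update(data).digest();

// Verify a Merkle proof for the chunk at `index` against the root in the block header.
function verifyChunk(root: Buffer, chunk: Buffer, index: number, proof: Buffer[]): boolean {
  let node = sha256(chunk);
  let idx = index;
  for (const sibling of proof) {
    node = idx % 2 === 0
      ? sha256(Buffer.concat([node, sibling]))
      : sha256(Buffer.concat([sibling, node]));
    idx = Math.floor(idx / 2);
  }
  return node.equals(root);
}

// Hypothetical network call: fetch one erasure-coded chunk plus its Merkle proof.
type FetchSample = (index: number) => Promise<{ chunk: Buffer; proof: Buffer[] } | null>;

// Sample `numSamples` random chunks out of `totalChunks`. If all samples verify,
// the light client is (probabilistically) convinced the full data is available:
// because of the erasure coding, withholding the data requires hiding a large
// fraction of chunks, which random sampling catches with high probability.
async function sampleAvailability(
  headerRoot: Buffer, totalChunks: number, numSamples: number, fetchSample: FetchSample,
): Promise<boolean> {
  for (let i = 0; i < numSamples; i++) {
    const index = Math.floor(Math.random() * totalChunks);
    const sample = await fetchSample(index);
    if (sample === null) return false; // chunk withheld: treat data as unavailable
    if (!verifyChunk(headerRoot, sample.chunk, index, sample.proof)) return false;
  }
  return true;
}
```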

This leads us to the second thing we might want: data storage and retrievability. Storage basically gives some assurance that, for a certain time period, all data that was available at some point will also be stored. This makes sure the data is kept somewhere even if you weren’t online during the time it was available, without relying on archive nodes.
Retrievability, again, is a different thing, and it is pretty difficult to create guarantees around it without economic or social assumptions. It says that anyone has to be able to retrieve the data that was submitted at some point in the past.
It seems that the industry has settled on data availability providing enough guarantees for the time being, and for retrievability and storage we can safely rely on archive nodes and such. Although I might add that data storage alone doesn’t help much without retrievability. And since retrievability hasn’t been solved on a technological level, storage doesn’t really add any benefits.

5 Likes

I think there are three discussions going on in parallel here:

  1. How can we get a data storage protocol that is Mina-aligned (that’s what the first post in this thread is about)
  2. How can we integrate existing DA layers with Mina (Celestia, Avail, EigenDA). Notably, I believe this requires writing a groth16 verifier in Mina
  3. How can we have a Mina-aligned DA layer instead of using one of these other DA layers (Cardano is going through the same discussion at the moment and they also use Ouroboros so anybody interested in this may want to check out their new paper on this topic https://twitter.com/rom1_pellerin/status/1719318640498241980 )

3 Likes

Thanks to @teddyjfpender, @rpanic, and @SebastienGllmt for your replies. As there is a long thread of discussion on Twitter, mainly from @rpanic, I’ll try to categorise it here using the sub-discussions outlined by @SebastienGllmt. (The initial post of this thread was about data storage, not DA, so I’m not continuing that below.) Also note that @rpanic’s comments on Twitter may not have covered the how, but rather the why and the pros and cons of the approaches.

Integrate existing DA layers with Mina (Celestia, Avail, EigenDA). Notably, I believe this requires writing a groth16 verifier in Mina.

@rpanic: “tbh, bridging attestations over from celestia through some quorum of validators is a horrible idea. It removes all the properties of why we built DA in the first place resulting in really bad guarantees. But that is something that I find concerning with current DA archs anyways.”

Is there a way to use existing products to achieve so eg Celestia (prob no?), Avail, Eigenlayer?

“Afaik, no. They seem to all rely on some sort of state-root attestation on an L1 contract that is done by some quorum of validators (how that works in detail I don’t know). What we can instead do on Mina is prove the consensus of the DA layer itself (like the Mina L1).
This enables us to remove that trust assumption and at the same time reduce settlement cost.
EigenDA does it best, because they provide signatures of all the attesting validators, so that is pretty strong. Still not as strong as proving consensus itself though.
So the problem with integrating existing solutions is: 1. Prove the state inclusion inside Kimchi (most of them are KZG-based => difficult) 2. Have that attested state root coming from the DA solution on the Mina L1 (either they collaborate on that or we bridge it from ETH).”

A Mina-aligned DA layer instead of using one of these other DA layers (Cardano is going through the same discussion at the moment and they also use Ouroboros so anybody interested in this may want to check out their new paper on this topic https://twitter.com/rom1_pellerin/status/1719318640498241980 )

“The DA I specced out and envision allows appchains to verify availability by simply merging in another proof, settlement cost stays constant. This also means that devs mostly don’t have to change any of their existing appchain-design patterns.”

@teddyjfpender: “What @rpanic46 is saying about the transaction cost must not get lost here. Running an app-chain, bespoke computation layer, L2, etc. need not pay more than anyone else to settle a transaction on the L1 ledger. Transactions need not be replayed, only verified; no gas or wasted computation.”

@rpanic: “And we don’t want to add extra cost for proving DA as well. If we follow the traditional architectures that don’t utilize zk, that would be the case, so we can improve on that by building a zk-native DA layer.”

What’s the difficulty and limitation of settling on the Mina L1 then, and how would you do it?

@rpanic: “The difficulty is not in the settlement itself, but in what you want to settle. Every Mina smart contract is its own mini-rollup in the end. The problem with DA is that we have quite low limits on events and actions. But this is by design, and DA should always be separated imo.
But the external DA layer has to have good guarantees, which DACs don’t have. Therefore we need a strong, well-designed, L1-aligned DA layer that integrates seamlessly into the current DX”

@teddyjfpender: “Couldn’t agree more with @rpanic46; he speaks eminent sense. What I would like to know more about, and study, is how a DA layer can be optimized for specific applications & use-cases; gaming and DeFi might be quite different, but perhaps there is a root of common requirements.”

3 Likes

Hi @teddyjfpender, I think it’s a good idea to have a community call to detail this proposal. Each part of the proposal needs different expertise to build out; no doubt we need more experts to join this discussion.

1 Like

I’d be happy to host that!

2 Likes

I really like what @SebastienGllmt mentioned, thanks for bringing that clarity! There are three discussions going on in parallel, and I’d like to separate these topics out to prevent commingling them.

  1. How can we get a data storage protocol that is Mina-aligned (that’s what the first post in this thread is about)

The flexibility to build one’s own data storage solution on Mina is quite nice, particularly if one does not want to be constrained to solutions that are perhaps otherwise unoptimised for their application; in a stoic-ish way, developing zkApps on Mina does require one to think thrice and act once. However, this is not appealing to every Web3 developer when Ethereum, Solana, Algorand et al. each give developers the luxury of easily storing rather large amounts of data on-chain; it’s a friendlier onboarding experience.

What are people’s thoughts on a data storage protocol that is Mina-aligned? How would you desire to interact with it as a developer?

  2. How can we integrate existing DA layers with Mina (Celestia, Avail, EigenDA). Notably, I believe this requires writing a groth16 verifier in Mina

For integrating existing DA layers with Mina, one thing I am not totally sure of is the design pattern/architecture of those zkApps. It would be great if people could share their thoughts on what an end-to-end flow would look like for integrating existing DA layers with Mina (sequence diagrams are welcome!).

Also there is some fantastic work going on in the Navigator’s program on writing a groth16 verifier in o1js: GitHub - onurinanc/o1js-groth16!

  3. How can we have a Mina-aligned DA layer instead of using one of these other DA layers

Same question, different scenario as above: it would be great if people could share their thoughts on what an end-to-end flow would look like for building a Mina-aligned DA layer (sequence diagrams are welcome!).

Mithril 2.0 is interesting & naturally I think Mina is well suited for this set of features given its zk-native nature. To extend the question above, what are people’s thoughts on a public-private DA layer; the ability to prove properties about a specific zkApp’s state without revealing specific data…?
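To illustrate what I mean by that last question, a tiny o1js-style sketch of “prove a property about state without revealing it”: the zkApp stores only a hash commitment, and a method takes the underlying value as a private input, checks it against the commitment, and asserts the property. The names here are hypothetical and the exact o1js API may differ between versions.

```ts
import { Field, Poseidon, SmartContract, State, method, state } from 'o1js';

class PrivateStateApp extends SmartContract {
  // Commitment to some private value (e.g. a balance) kept off-chain.
  @state(Field) commitment = State<Field>();

  // Prove “the committed value is at least `threshold`” without revealing the value.
  // All method arguments are private inputs to the proof.
  @method proveAtLeast(value: Field, salt: Field, threshold: Field) {
    const commitment = this.commitment.getAndRequireEquals();
    Poseidon.hash([value, salt]).assertEquals(commitment);
    value.assertGreaterThanOrEqual(threshold);
  }
}
```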

3 Likes

We previously made a proposal under zkIgnite for an app-level DA integration that is opt-in and non-invasive to the chain itself.

We already made a few strides in this direction, but since the proposal didn’t get funded, we weren’t able to continue with it. I’m also attaching an image that outlines the same.

The actual architecture has changed a little bit since this diagram was created; I’ll create an updated flow.

1 Like

We’re also creating an integration with Avail and are in talks with the Avail team, since proving inclusion in Avail is arguably more efficient when doing so in a Kimchi proof

3 Likes

What about EigenDA? They are primarily based on KZG commitments and are launching soon: Intro to EigenDA: Hyperscale Data Availability for Rollups

1 Like

Here is my proposed solution for the Data Availability Layer. Succinct Labs implemented VectorX as a light client for Avail’s consensus. The VectorX light client tracks both the state of Avail’s GRANDPA consensus and the Vector data commitments.

BlobstreamX is also a light client for Celestia’s consensus layer that tracks the data commitments.

By implementing these with o1js, we can track the data commitments. Alongside this, a PLONK verifier will be needed.
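As a very rough sketch of what tracking these commitments in o1js could look like: a zkApp keeps the latest bridged data commitment in its state and only accepts an update together with a recursive proof that the light-client verification succeeded. Everything below is a placeholder (the ZkProgram body, the names, and the wrapping of the external PLONK/Groth16 proof do not exist yet), assuming the o1js 0.x ZkProgram API.

```ts
import { Field, SmartContract, State, ZkProgram, method, state } from 'o1js';

// Placeholder: a ZkProgram that would wrap verification of the external
// light-client proof (VectorX / BlobstreamX) and expose the new data
// commitment as its public output. The body here is a stand-in.
const LightClient = ZkProgram({
  name: 'light-client',
  publicOutput: Field, // the data commitment attested by the light client
  methods: {
    verify: {
      privateInputs: [],
      method(): Field {
        // A real implementation would verify the wrapped PLONK/Groth16 proof here.
        return Field(0);
      },
    },
  },
});

class LightClientProof extends ZkProgram.Proof(LightClient) {}

class DataCommitmentTracker extends SmartContract {
  // Latest data commitment bridged from the external DA layer.
  @state(Field) latestCommitment = State<Field>();

  @method update(proof: LightClientProof) {
    proof.verify(); // recursively verify the light-client proof
    this.latestCommitment.set(proof.publicOutput);
  }
}
```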

For example, you can see the latest VectorX proofs from the right: Succinct Platform.

Here is the example proof: Succinct Platform.

You can see an example PLONK verifier here: https://platform-artifacts.f29dee52805d2df0d34ac5b3f297e6bd.r2.cloudflarestorage.com/main/releases/633bae47-2178-4d24-9fae-c40f9b93ac2b/FunctionVerifier.sol?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=9808523fba11f9f61dc8e51a858d537d%2F20240215%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20240215T190308Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&x-id=GetObject&X-Amz-Signature=35de9e401f7db9c7001849c0afa249f7d69942c1c612a463045747f511b7c687.

Note that SuccinctX uses Plonky2, which is (PLONK + FRI + Goldilocks field), wrapping this into (PLONK + FRI + BN254), then into GNARK (PLONK + BN254). So implementing a PLONK verifier would solve the problem.

Here is my related repository for the BN254 part: GitHub - onurinanc/o1js-groth16

1 Like