ZK Regex in o1js (Enhancements)

This topic is to discuss the proposal submitted by @Shigoto-dev19
Please see below for the details of the proposal and discussion.

14th June,2024
Current status: Funded
Funding Note: We see this proposal to introduce a package/tooling that could be very useful to build several applications in the ecosystem and thus has a high impact. The builder is part of the ecosystem and has delivered successfully so the risk is classified as low. Budget is inline with the work proposed.

2nd June, 2024
Current status: Under Consideration.
Opened for community discussion on : 02 June, 2024.

Title

Further Development of ZK Regex in o1js

Project Background

  • ZK Regex in o1js was a project developed as part of the Mina Navigators S1 program during March and April 2024.
  • The project evolved from initial research and theoretical learning to a functional circuit compiler by the end of April 2024.
  • This work is inspired by the original zk-regex by zkEmail, a Rust compiler that outputs regex Circom or Halo2 circuits as strings in the terminal given inputs as a JSON file.
  • I can confidently declare that zk-regex in o1js excels in both performance and developer experience (devX) compared to the original work.
  • ZK Regex in o1js outputs a regex circuit with a single command, given a regex pattern and optional sub-patterns to reveal if desired.
  • ZK Regex represents a groundbreaking advancement in zero-knowledge proof ZKP technology, enabling the verification of text data within a ZKP framework.

For more details on the project background, please refer to the project’s README.

Proposal Overview

The zk-regex project was developed in a short amount of time and still has room for improvement, which is the main purpose of this proposal.

Problems

  1. Currently, zk-regex-o1js is a compiler that relies on process arguments to write a specific regex circuit into the terminal.
    • Some users asked to use the exisiting zk-regex as an npm package but so far, this is not possible because the compiler is not structured as an exectuable binary to be published.
    • In order to use the regex compiler, it is required to clone the repo and follow the readme to run the following possible commands.
  2. Some regex patterns require a deep understanding of syntax support.
  3. The compiler code has interconnecting components that make it delicate to change and difficult for other developers to contribute.
  4. It’s challenging to implement conditional control flow on field element processing due to the algebraic nature of zk circuits. Users rely on Provable.switch or Provable.if to build nested flow to validate conditions
    • Example: f1 or f2 or (f1 and f3 and f4) or (range(f2, f3)).

Solutions

  1. Wrap the compiler into a CLI using the oclif/commander package to create a zk-regex CLI and publish the binary as an npm package.
  2. Parse raw regex patterns to specify the exact number of repetitions for certain inputs.
  3. Refactor the compiler code into a class to separate the regex circuit building process and facilitate component tweaking.
  4. Utilize regex theory to operate on field elements instead of UTF-8 bytes. Users can use the regex utility to generate this control flow:
    • Enter regex pattern (f1 | f2 | (f1 f3 f4) | [f2-f3])
    • Copy generated circuit from terminal
    • Paste the circuit as a function inside a zkapp
    • Use the function to validate the control flow
    • Also possible for any desired regex/flow pattern

Impact

  • Adoption: Enhances the Mina ecosystem by providing a tool for processing text/strings inside circuits, bridging many web2 protocols to Mina.
  • Novel Applications: With regex theory operating on finite fields, it is possible to build unprecedented and novel primitives in o1js, expanding the toolkit available for developers.
    • For example, the (a|b|[0-9]) regex pattern matches the character a alone, b alone, or any digit between 0 and 9.
    • The current compiler matches the UTF8 bytes of the pattern characters, in this case, 61 for a, 62 for b, and the range between 30 and 39 for UTF8 digits.
    • The circuit accepts the bytes as UInt8 field elements, but the compiler can be upgraded to process full field elements or Bool inputs for weaving complex multiplexers or any other pattern following DFA theory.
    • In this example, if we replace a and b with certain field elements, i.e., f1 for a and f2 for b, and 0 and 9 with f3 and f4, we create a multiplexer that returns Bool(true) for the equality of f1 or f2 alone or any field in the range between f3 and f4.
    • This is a simple example, but imagine the extent of implementations that DFA inside a circuit can open up for developers on Mina.
  • Attraction: zk-regex will attract developers to Mina by offering ready-to-use primitives for deploying web2 protocols.
  • Innovation: zk-regex in o1js is the first adaptation of the original zk-regex in a different proof system, following the release of work in Circom and Halo2 by the zkEmail team.

Audience

  • Developers in the Mina ecosystem.

Architecture & Design

Detailed Design/Architecture

  • For development workflow details, please refer to the project’s readme.

Vision

  • Currently, the compiler outputs a regex circuit in the terminal as a string to be copied.
    • Ideally, the compiler would output generic circuits at runtime.
    • Develop a generic compiler that operates on arbitrary field elements or other types (UInt32, Bool, UInt64, etc.) in o1js.
    • This proposal is a step toward realizing this vision.
  • Support rich regex syntax similar to common programming languages.

Existing Work

Production Timeline

  • The project is already operational in o1js.
  • Requesting funding for one month, with the possibility of extending with a second proposal depending on future progress/changes.

Budget & Milestones

Deliverables

  • An npm package with an executable to generate zk-regex circuits in o1js based on various regex patterns.
  • Expanded regex pattern syntax support:
    • Specific range grouping (e.g., [b-x]).
    • Specific repetitions (e.g., 2, 3, 5 times).
  • Refactor the compiler code into a class to separate the regex circuit building process.
  • Innovative example of a regex circuit operating on full-size field elements or Bool instead of UTF-8 bytes.
  • Updated documentation for easy usability and adoption.

Project Timeline

  • 1 month.

Budget Requested

  • 8,000 MINA.

Wallet Address

  • B62qpeTKjVfijTqEmpBmCej2kj2oEpoMTWrbt1U48EZEz22aP9FsFvX

Team Info

Proposer Github

Proposer Experience

  • Built various projects during Navigators S1:
    • sha256-o1js.
    • mina-battleships.
    • zk-regex-o1js.
  • Conducted a mina-mastermind project during the Encode: ZK Bootcamp.
  • Building zkEmail and its primitives in o1js.

Team Members

  • Mahmoud Kaffel.

Achievements

  • ZK Hack Istanbul “Chewing Glass” prize winner for building technically difficult projects (sha256-o1js). Also won the o1Labs prize for the crypto primitives category.
  • Navigators S1 badge.
  • zkIgnite cohort3 grantee for building zkEmail in o1js.

Risks & Mitigations

  • Complexity: zk-regex development is delicate.
  • Syntax Expansion: Expanding syntax support might require significant changes to AST parsing and extensive testing.
  • Code Refactoring: Wrapping compiler code might disrupt component connections; if issues arise, changes will be reverted to maintain functionality.

So the improvements will allow to derive functions logically composed of simpler regex functions by extending the currenly supported syntax instead of dealing with o1js.Provable control flow functions?

For a complicated control flow, it’s possible to use a simple regex pattern and compile a function instead of weaving the logic manually with Provable.if or Provable.switch APIs.

Expanding the regex syntax is a separate deliverable that will aid in abstracting long or repetitive regex patterns.

I hope this clarifies things! :sunny:

hey, great work!

I wanted to understand bit more why there is need to make the package as CLI ?
Maybe I’m oversimplifying it but if a developer want to create some particular circuit based on some pattern, ‘in-line’ code could potentially create it.

Maybe that can be somehow useful

regardless even as CLI I see it as very valuable package !

Good point! Thank you for the comment! This question was the motivation to add a CLI deliverable to the proposal.

I’m not sure if it’s possible to compile binary executable files from the ‘Compiler API’ link you attached. Before the proposal, I even tried using external packages to generate binary files from the existing TS regex compiler files to publish as npm packages, but they didn’t work and things got chaotic.

Initially, my intention was to have zk-regex in o1js as executable ‘in-line’ code. This would allow Mina devs to execute the regex commands as shown in the project’s readme, offering greater flexibility and integration within a project’s codebase.

However, some devs requested the project as an npm package. To accommodate this, the regex compiler would be better suited as a binary executable CLI. CLI packages like oclif or commander offer cross-platform commands in npm package scripts, enhancing usability.

Additionally, a CLI provides better in-terminal documentation, abstracts complexity, and is more practical for developers. For example, instead of opening the codebase and executing a command with a specific regex pattern, a global executable package would streamline the process and be more convenient.

All in all, while ‘in-line’ code is sufficient and operational, the proposed CLI package will be more practical, readable, and easier to use.

This application is approved for Funding.

We see this proposal to introduce a package/tooling that could be very useful to build several applications in the ecosystem and thus has a high impact. The builder is part of the ecosystem and has delivered successfully so the risk is classified as low. Budget is inline with the work proposed.

1 Like