ZK Regex in o1js (Enhancements)

This topic is to discuss the proposal submitted by @Shigoto-dev19
Please see below for the details of the proposal and discussion.

16th July,2024
The end date of this project is extended to 22nd, July after discussing with the proposer

14th June,2024
Current status: Funded
Funding Note: We see this proposal to introduce a package/tooling that could be very useful to build several applications in the ecosystem and thus has a high impact. The builder is part of the ecosystem and has delivered successfully so the risk is classified as low. Budget is inline with the work proposed.

2nd June, 2024
Current status: Under Consideration.
Opened for community discussion on : 02 June, 2024.

Title

Further Development of ZK Regex in o1js

Project Background

  • ZK Regex in o1js was a project developed as part of the Mina Navigators S1 program during March and April 2024.
  • The project evolved from initial research and theoretical learning to a functional circuit compiler by the end of April 2024.
  • This work is inspired by the original zk-regex by zkEmail, a Rust compiler that outputs regex Circom or Halo2 circuits as strings in the terminal given inputs as a JSON file.
  • I can confidently declare that zk-regex in o1js excels in both performance and developer experience (devX) compared to the original work.
  • ZK Regex in o1js outputs a regex circuit with a single command, given a regex pattern and optional sub-patterns to reveal if desired.
  • ZK Regex represents a groundbreaking advancement in zero-knowledge proof ZKP technology, enabling the verification of text data within a ZKP framework.

For more details on the project background, please refer to the project’s README.

Proposal Overview

The zk-regex project was developed in a short amount of time and still has room for improvement, which is the main purpose of this proposal.

Problems

  1. Currently, zk-regex-o1js is a compiler that relies on process arguments to write a specific regex circuit into the terminal.
    • Some users asked to use the exisiting zk-regex as an npm package but so far, this is not possible because the compiler is not structured as an exectuable binary to be published.
    • In order to use the regex compiler, it is required to clone the repo and follow the readme to run the following possible commands.
  2. Some regex patterns require a deep understanding of syntax support.
  3. The compiler code has interconnecting components that make it delicate to change and difficult for other developers to contribute.
  4. It’s challenging to implement conditional control flow on field element processing due to the algebraic nature of zk circuits. Users rely on Provable.switch or Provable.if to build nested flow to validate conditions
    • Example: f1 or f2 or (f1 and f3 and f4) or (range(f2, f3)).

Solutions

  1. Wrap the compiler into a CLI using the oclif/commander package to create a zk-regex CLI and publish the binary as an npm package.
  2. Parse raw regex patterns to specify the exact number of repetitions for certain inputs.
  3. Refactor the compiler code into a class to separate the regex circuit building process and facilitate component tweaking.
  4. Utilize regex theory to operate on field elements instead of UTF-8 bytes. Users can use the regex utility to generate this control flow:
    • Enter regex pattern (f1 | f2 | (f1 f3 f4) | [f2-f3])
    • Copy generated circuit from terminal
    • Paste the circuit as a function inside a zkapp
    • Use the function to validate the control flow
    • Also possible for any desired regex/flow pattern

Impact

  • Adoption: Enhances the Mina ecosystem by providing a tool for processing text/strings inside circuits, bridging many web2 protocols to Mina.
  • Novel Applications: With regex theory operating on finite fields, it is possible to build unprecedented and novel primitives in o1js, expanding the toolkit available for developers.
    • For example, the (a|b|[0-9]) regex pattern matches the character a alone, b alone, or any digit between 0 and 9.
    • The current compiler matches the UTF8 bytes of the pattern characters, in this case, 61 for a, 62 for b, and the range between 30 and 39 for UTF8 digits.
    • The circuit accepts the bytes as UInt8 field elements, but the compiler can be upgraded to process full field elements or Bool inputs for weaving complex multiplexers or any other pattern following DFA theory.
    • In this example, if we replace a and b with certain field elements, i.e., f1 for a and f2 for b, and 0 and 9 with f3 and f4, we create a multiplexer that returns Bool(true) for the equality of f1 or f2 alone or any field in the range between f3 and f4.
    • This is a simple example, but imagine the extent of implementations that DFA inside a circuit can open up for developers on Mina.
  • Attraction: zk-regex will attract developers to Mina by offering ready-to-use primitives for deploying web2 protocols.
  • Innovation: zk-regex in o1js is the first adaptation of the original zk-regex in a different proof system, following the release of work in Circom and Halo2 by the zkEmail team.

Audience

  • Developers in the Mina ecosystem.

Architecture & Design

Detailed Design/Architecture

  • For development workflow details, please refer to the project’s readme.

Vision

  • Currently, the compiler outputs a regex circuit in the terminal as a string to be copied.
    • Ideally, the compiler would output generic circuits at runtime.
    • Develop a generic compiler that operates on arbitrary field elements or other types (UInt32, Bool, UInt64, etc.) in o1js.
    • This proposal is a step toward realizing this vision.
  • Support rich regex syntax similar to common programming languages.

Existing Work

Production Timeline

  • The project is already operational in o1js.
  • Requesting funding for one month, with the possibility of extending with a second proposal depending on future progress/changes.

Budget & Milestones

Deliverables

  • An npm package with an executable to generate zk-regex circuits in o1js based on various regex patterns.
  • Expanded regex pattern syntax support:
    • Specific range grouping (e.g., [b-x]).
    • Specific repetitions (e.g., 2, 3, 5 times).
  • Refactor the compiler code into a class to separate the regex circuit building process.
  • Innovative example of a regex circuit operating on full-size field elements or Bool instead of UTF-8 bytes.
  • Updated documentation for easy usability and adoption.

Project Timeline

  • 1 month.

Budget Requested

  • 8,000 MINA.

Wallet Address

  • B62qpeTKjVfijTqEmpBmCej2kj2oEpoMTWrbt1U48EZEz22aP9FsFvX

Team Info

Proposer Github

Proposer Experience

  • Built various projects during Navigators S1:
    • sha256-o1js.
    • mina-battleships.
    • zk-regex-o1js.
  • Conducted a mina-mastermind project during the Encode: ZK Bootcamp.
  • Building zkEmail and its primitives in o1js.

Team Members

  • Mahmoud Kaffel.

Achievements

  • ZK Hack Istanbul “Chewing Glass” prize winner for building technically difficult projects (sha256-o1js). Also won the o1Labs prize for the crypto primitives category.
  • Navigators S1 badge.
  • zkIgnite cohort3 grantee for building zkEmail in o1js.

Risks & Mitigations

  • Complexity: zk-regex development is delicate.
  • Syntax Expansion: Expanding syntax support might require significant changes to AST parsing and extensive testing.
  • Code Refactoring: Wrapping compiler code might disrupt component connections; if issues arise, changes will be reverted to maintain functionality.

So the improvements will allow to derive functions logically composed of simpler regex functions by extending the currenly supported syntax instead of dealing with o1js.Provable control flow functions?

For a complicated control flow, it’s possible to use a simple regex pattern and compile a function instead of weaving the logic manually with Provable.if or Provable.switch APIs.

Expanding the regex syntax is a separate deliverable that will aid in abstracting long or repetitive regex patterns.

I hope this clarifies things! :sunny:

hey, great work!

I wanted to understand bit more why there is need to make the package as CLI ?
Maybe I’m oversimplifying it but if a developer want to create some particular circuit based on some pattern, ‘in-line’ code could potentially create it.

Maybe that can be somehow useful

regardless even as CLI I see it as very valuable package !

Good point! Thank you for the comment! This question was the motivation to add a CLI deliverable to the proposal.

I’m not sure if it’s possible to compile binary executable files from the ‘Compiler API’ link you attached. Before the proposal, I even tried using external packages to generate binary files from the existing TS regex compiler files to publish as npm packages, but they didn’t work and things got chaotic.

Initially, my intention was to have zk-regex in o1js as executable ‘in-line’ code. This would allow Mina devs to execute the regex commands as shown in the project’s readme, offering greater flexibility and integration within a project’s codebase.

However, some devs requested the project as an npm package. To accommodate this, the regex compiler would be better suited as a binary executable CLI. CLI packages like oclif or commander offer cross-platform commands in npm package scripts, enhancing usability.

Additionally, a CLI provides better in-terminal documentation, abstracts complexity, and is more practical for developers. For example, instead of opening the codebase and executing a command with a specific regex pattern, a global executable package would streamline the process and be more convenient.

All in all, while ‘in-line’ code is sufficient and operational, the proposed CLI package will be more practical, readable, and easier to use.

This application is approved for Funding.

We see this proposal to introduce a package/tooling that could be very useful to build several applications in the ecosystem and thus has a high impact. The builder is part of the ecosystem and has delivered successfully so the risk is classified as low. Budget is inline with the work proposed.

1 Like

Here is the mid-point progress report for the zk-regex enhancements:

  • Added Features:

    • Implemented support for specific range expressions.
    • Adapted negation and escape patterns for these ranges.
    • Added support for specific repetitions, e.g., {3}.
  • Enhanced Meta Character Support:

    • Refined and added support for various meta characters, including:
      • \w for word characters
      • \W for non-word characters
      • \d for digits
      • \D for non-digits
      • \s for whitespace characters
      • \S for non-whitespace characters
  • Ensured Comprehensive Syntax Parsing:

    • Verified that all syntax parsing components work together seamlessly.
    • Examples of supported patterns include:
      • [a-s5-9]{3}
      • [^A-X0-3]
      • [^c-g5-9A-F]{2}
      • \d\w{5}
      • [5-8]\W[c-f]{5}[^A-D]
  • Performance Improvements:

    • Improved the performance of substring tests, reducing execution time by more than half.
  • Code Refinements:

    • Refined notations of DFA-related terms.
    • Restructured and refactored compiler code into a class.
    • Refactored substring tests.
2 Likes

The end date of this project is extended to 22nd, July after discussing with the proposer

1 Like

Repository Link: GitHub - Shigoto-dev19/zk-regex-o1js: A library for verifying zero-knowledge regular expressions on the Mina blockchain.

With these milestones achieved, the deliverables of this proposal are successfully completed. Here is a report detailing the progress and enhancements made after the midpoint milestones:

  • ZK Regex CLI Development

    • Build a CLI with the same functionality as the zk-regex script.
    • Split reveal options into two variadic options for better collection of inputs for subpatterns or manually entered transitions.
    • Add a checker to validate manually entered reveal transitions:
      • Previously, the compiler wouldn’t complain if some transitions were entered that didn’t exist among the full pattern transitions.
      • It used to attach such transition pairs but would give an error when using the circuit, as non-existent transitions would cause the circuit to throw an error.
      • This was impractical and allowed room for user mistakes.
      • Now, the checker will complain if a non-existent transition pair is entered, logging the first invalid transition pair in the terminal.
      • It will also log all the transitions of the full regex pattern to provide insight into valid transitions that can be revealed.
      • This keeps the user informed about valid reveal transitions and prevents the entry of invalid transitions.
    • Add a --functionName option to specify the function name for the compiled regex circuit:
      • Previously, users would manually add the function name in the TypeScript file where the regex function body was pasted.
      • With this CLI option, the circuit is compiled directly with the functionName specified.
    • Add a --filePath option to manage file output for the compiled regex circuit:
      • Requires the --functionName option to specify the function name for correct integration.
      • If the file already exists, only the compiled regex circuit will be appended.
      • If the file does not exist, it will be created with an import statement for required types and the regex circuit will be added.
      • Previously, users had to manually create the file, import the necessary statements, and paste the compiled regex circuit code.
    • Capture and prepend the command used to execute the CLI as a comment in the compiled regex function:
      • This adds the command used to run the CLI as a comment at the top of the compiled regex circuit function.
      • It helps automate and document the command execution, making it easier to record and reproduce zk-regex commands.
  • Add a new shell script to automate the compilation of regex examples:

    • The ./src/compileExamples.sh script clears the existing examples file and writes all updated regex circuits.
    • This simplifies recompiling regex circuits following updates to the compiler or CLI changes.
    • Running the tests after executing the script ensures the integrity of the changes.
    • Previously, the process involved manually compiling, copying, and pasting circuits into the examples file and then running tests, which was time-consuming and error-prone.
    • The new script streamlines the process, making it quick and efficient to update examples and verify changes with no more manual effort.
  • Add various field regex examples that operate on full field elements, demonstrating how regex DFA theory can be leveraged to create different primitives for processing field arrays and handling conditional control flow:

    • This includes inclusionRegex for matching ranges and specific exact values.
    • sequenceRegex to validate the existence of a sequence of fields or a subarray within a field array.
    • countInRangeRegex for counting values within a specified range in an array of fields.
    • excludeRegex to validate that an array of fields does NOT contain particular field elements.
    • Unit tests to showcase the integrity and usage of these functionalities.
  • Update README

    • Update instructions following CLI commands.
    • Add more details on instructions and exception handling.
    • Update graphs to reflect the latest changes.
  • Publish NPM package containing the ZK Regex CLI

    • Install the package by running npm i -g zk-regex-o1js.
    • Use zkr or zk-regex as a binary in the terminal.