# RegSet

**Repository Path**: mirrors_allenai/RegSet

## Basic Information

- **Project Name**: RegSet
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-05-01
- **Last Updated**: 2026-02-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# RegSet

## Datasets

The `data` directory contains `.jsonl` files with train/dev/test splits of the Exploration, Hard, and Mixed datasets.

## Scripts

This repository also contains scripts for generating data and computing attributes of data instances. The Python scripts in the root directory are as follows:

- `exps2.py` provides definitions of the `regex` type, as well as functions for enumerating `regex`s (for use in sampling) and computing their properties.
- `dfa2.py` provides definitions of the `dfa` (Deterministic Finite Automaton) type, as well as methods for computing properties of regular languages and converting from `regex` to `dfa`.
- `sample_v2.py` generates a cache of `regex`s and the Exploration set.
- `make_hard_set.py` generates the Hard dataset from the cached `regex`s not used in the Exploration training set.
- `compute_properties.py` provides additional helper functions for computing attributes.
- `cache_loader.py` provides utilities for loading the cached `regex`s generated by `sample_v2.py`.

Generate the Exploration dataset:

```
python sample_v2.py \
    -d \
    [--depths ]
```

Generate the Hard dataset:

```
python make_hard_set.py
```
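The `.jsonl` splits in the `data` directory can be read one JSON object per line. The following is a minimal sketch using only the Python standard library, not code from this repository: the helper names `read_jsonl` and `load_split` are hypothetical, and the README does not document the split file names or record fields, so the file name is left as a parameter.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def load_split(data_dir: Path, filename: str) -> List[Dict[str, Any]]:
    """Load every record of one split file into memory.

    `filename` is whatever the split file is called inside `data_dir`;
    the actual names are an assumption the caller has to supply.
    """
    return list(read_jsonl(data_dir / filename))
```

Calling `load_split(Path("data"), name)` with the name of one of the split files returns a list of dictionaries. Because the record schema is not described here, inspecting the keys of the first record is a reasonable first step before relying on any particular field.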