# lc-nl2sql

**Repository Path**: mirrors_GoogleCloudPlatform/lc-nl2sql

## Basic Information

- **Project Name**: lc-nl2sql
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-25
- **Last Updated**: 2026-02-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Long Context Evaluation for NL2SQL

This is not an officially supported Google product. This project is not
eligible for the [Google Open Source Software Vulnerability Rewards
Program](https://bughunters.google.com/open-source-security).

We share the codes, artifacts and experiments for evaluating our long context LLM-based NL2SQL [arxiv](https://arxiv.org/abs/2501.12372). The original experiments were run using `gemini-1.5-pro` and `gemini-1.5-flash` -- via [Vertex AI](https://cloud.google.com/vertex-ai/docs/reference/rest). If you find this work, the experiments and insights useful, please cite (to appear at VLDB25)
```
@article{chung2025long,
  title={Is Long Context All You Need? Leveraging LLM's Extended Context for NL2SQL},
  author={Chung, Yeounoh and Kakkar, Gaurav T and Gan, Yu and Milne, Brenton and Ozcan, Fatma},
  journal={arXiv preprint arXiv:2501.12372},
  year={2025}
}
```

Some utility functions are taken from [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub) and stored under `lc_nl2sql/third_party/` under MIT license.

All the experiments are packaged under `lc_nl2sql/experiments/` as individual shell script.


## Setup
The dependencies are tracked insdie `pyproject.toml`. We recommend `poetry` to manage the pakcages.
```
# at the root of the project
pip install poetry
poetry install
```

The full example generation relies on separate column selection results, we used the LLM-based approach from [CHESS](https://arxiv.org/abs/2405.16755). 
The example generation process in `sql_data_process.py` takes this optional parameter, 
```
--filtered_schema_file lc_nl2sql/data/bird/col_selection_schema.csv
```
The column selection results csv files for different benchmakrs are also provided under `lc_nl2sql/data/{benchmark}/`

For some table retrieval experiments require separate table retrieval results, some were directly dumped from our production pipelien. The example generation process in `sql_data_process.py` takes this optional parameter, 
```
--tbr_selection_file lc_nl2sql/data/bird/crs_dump.json \
```
The results are stored under `lc_nl2sql/data/bird/crs_dump.json` and `lc_nl2sql/data/kaggle/tbr_dump.json`.

We ran experiments for 4 different NL2SQL benchmarks (BIRD, SPIDER 1.0, KaggleDBQA, BEAVER). Running the experiments require setting `input_data_path` (required) `input_table_path` (required)`output_file_path` (required -- where to store the prepared dataset) `db_folder_path` (required) and other optional experiment specific parameters. 
we recommend installing/copying each benchamrk dataset under `lc_nl2sql/data/{benchmark}/` for convenience and compatibility with the existing experiment scripts and the paths.

For data preparation,
```
python lc_nl2sql/data_process/sql_data_process.py \
  --input_data_path lc_nl2sql/data/bird/dev/dev.json \
  --input_table_path lc_nl2sql/data/bird/dev/dev_tables.json \
  --output_file_path "$input_file_sk100" \
  --db_folder_path lc_nl2sql/data/bird/dev/dev_databases \
  --tbr_selection_file lc_nl2sql/data/bird/crs_dump.json \
  --num_col_values 10 \
  --use_hint 1 \
  --filtered_schema_file lc_nl2sql/data/bird/col_selection_schema.csv \
  --use_column_filtering 1 \
  --synthetic_examples 1 \
  --num_examples 100
```

For inference/prediction,
```
python lc_nl2sql/predict/predict.py \
  --predicted_input_filename "$input_file_sk100" \
  --num_beams 10 \
  --temperature 0.5 \
  --use_self_correction 1 \
  --use_disambiguation 1 \
  --db_folder_path lc_nl2sql/data/bird/dev/dev_databases \
  --predicted_out_filename "lc_nl2sql/output/pred/bird_benchmark"
```

## Runnning the experiments
Run the experiment script as follows:
```
poetry bash lc_nl2sql/experiments/bird_benchmark.sh
```
The output and artifacts will be stored under `lc_nl2sql/output/pred/` or as specified in the experiment script (e.g., `--predicted_out_filename "lc_nl2sql/output/pred/bird_benchmark"`).
This also dumps some execution statistics (e.g., latency, n_retries) at the root (e.g., `stats_bird_benchmark.txt`)

## Evaluation
We have added evaluation scripts for different benchmarks. For instance,
```
poetry run bash lc_nl2sql/scripts/process_bird_output.sh bird_benchmark_flash lc_nl2sql/output/pred/
```
will output 
```
Processing file: lc_nl2sql/output/pred//bird_benchmark_flash
start calculate
                     simple               moderate             challenging          total               
count                925                  464                  145                  1534                
====================================== Exec Accuracy =====================================
accuracy             72.00                57.76                53.79                65.97               
===========================================================================================
Finished evaluation
```