# TACO-Instructions

**Repository Path**: droliven/TACO-Instructions

## Basic Information

- **Project Name**: TACO-Instructions
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-08
- **Last Updated**: 2025-07-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# :sparkles: [CVPR 2024] TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding :sparkles:

[CVPR 2024] Official repository of "TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding".

### :page_with_curl:[Paper](https://arxiv.org/pdf/2401.08399.pdf) | :house:[Project](https://taco2024.github.io/) | :movie_camera:[Video](https://youtu.be/bIgHylU1oZo) | :file_folder:[Dataset (pre-released version)](https://1drv.ms/f/s!Ap-t7dLl7BFUfmNkrHubnoo8LCs?e=1h0Xhe) | :file_folder:[Dataset](https://www.dropbox.com/scl/fo/8w7xir110nbcnq8uo1845/AOaHUxGEcR0sWvfmZRQQk9g?rlkey=xnhajvn71ua5i23w75la1nidx&st=9t8ofde7&dl=0)

#### Authors

Yun Liu, Haolin Yang, Xu Si, Ling Liu, Zipeng Li, Yuxiang Zhang, Yebin Liu, Li Yi

## Data Instruction

### Pre-released Version

The [pre-released version](https://1drv.ms/f/s!Ap-t7dLl7BFUfmNkrHubnoo8LCs?e=1h0Xhe) contains 244 high-quality motion sequences spanning 137 triplets. Please refer to the "Data Visualization" section for data usage.

We back up the data at [BaiduNetDisk](https://pan.baidu.com/s/1gANrhzdUyvsUGXcDB4xMfQ?pwd=kg7j). Some of the files are split due to file size limitations.
To get the original zip files, please use the following commands:

```
cat Allocentric_RGB_Videos_split.* > Allocentric_RGB_Videos.zip
cat Egocentric_Depth_Videos_split.* > Egocentric_Depth_Videos.zip
```

Dataset contents:

* **244** high-quality motion sequences spanning **137** tool-action-object triplets
* **206** high-resolution object models (10K~100K faces per object mesh)
* Hand-object pose and mesh annotations
* Egocentric RGB-D videos
* **8**-view allocentric RGB videos

### Whole Dataset (Version 1)

The [whole dataset (version 1)](https://www.dropbox.com/scl/fo/8w7xir110nbcnq8uo1845/AOaHUxGEcR0sWvfmZRQQk9g?rlkey=xnhajvn71ua5i23w75la1nidx&st=9t8ofde7&dl=0) contains 2317 motion sequences. Please refer to the "Data Visualization" section for data usage. [This link](https://www.dropbox.com/scl/fo/6wux06w26exuqt004eg1a/AM4Ia7pK_b0DURAVyxpHLuY?rlkey=e76q06hyj9yqbahhipmf5ij1o&st=c30zhh8s&dl=0) is a backup of this dataset.

Dataset contents:

* **2317** motion sequences spanning **151** tool-action-object triplets
* **206** high-resolution object models (10K~100K faces per object mesh)
* Hand-object pose and mesh annotations
* Egocentric RGB-D videos
* **12**-view allocentric RGB videos
* Camera parameters
* Automatic hand-object 2D segmentations
* Marker-removed 12-view allocentric RGB videos

If you have questions about the dataset, please contact ```yun-liu22@mails.tsinghua.edu.cn```.

## Data Organization

The files of the dataset are organized as follows:

```x
|-- Allocentric_RGB_Videos
  |--
    |--
      |-- 22070938.mp4
      |-- 22139905.mp4
      ...
    |-- ...
  |-- ...
|-- Egocentric_Depth_Videos
  |--
    |--
      egocentric_depth.avi
    |-- ...
  |-- ...
|-- Egocentric_RGB_Videos
  |--
    |--
      color.mp4
    |-- ...
  |-- ...
|-- Hand_Poses
  |--
    |--
      left_hand_shape.pkl
      left_hand.pkl
      right_hand_shape.pkl
      right_hand.pkl
    |-- ...
  |-- ...
|-- Object_Poses
  |--
    |--
      target_.npy
      tool_.npy
    |-- ...
  |-- ...
|-- Object_Models
  |-- 001_cm.obj
  ...
  |-- 218_cm.obj
|-- Marker_Removed_Allocentric_RGB_Videos
  |--
    |--
      |-- 22070938.mp4
      |-- 22139905.mp4
      ...
    |-- ...
  |-- ...
```

## Data Visualization

[1] Environment Setup:

Our code is tested on Ubuntu 20.04 with an NVIDIA GeForce RTX 3090 (driver version 535.146.02, CUDA 12.2). Please install the environment using the following commands:

```x
conda create -n taco python=3.9
conda activate taco
# install PyTorch >= 1.7.1 (we use PyTorch 1.11.0)
# install PyTorch3D >= 0.6.1 (we use PyTorch3D 0.7.2)
pip install -r requirements.txt
```

[2] Download the [MANO models](https://mano.is.tue.mpg.de/), and put ```MANO_LEFT.pkl``` and ```MANO_RIGHT.pkl``` in the folder ```dataset_utils/manopth/mano/models```.

[3] Visualize Hand-Object Poses in a Fixed View:

```x
cd dataset_utils
python visualization.py --dataset_root --object_model_root --triplet --sequence_name --save_path --device
```

For example, if you select the following data sequence:

```x
python visualization.py --dataset_root --object_model_root --triplet "(stir, spoon, bowl)" --sequence_name "20231105_019" --save_path "./example.gif" --device "cuda:0"
```

The resulting visualization is written to ```--save_path``` (here, ```./example.gif```).

To visualize hand-object poses in the dataset's third-person (allocentric) views, please change the constants ```IMAGE_SIZE```, ```INTRINSIC```, and ```EXTRINSIC``` to the values from the data folder ```Allocentric_Camera_Parameters```. In the JSON files of the allocentric camera parameters:

* ```imgSize``` is the resolution of the video ([width, height])
* ```K``` is the 3x3 camera intrinsic matrix
* ```R``` is the 3x3 rotation matrix of the camera extrinsic
* ```T``` is the 3D translation vector of the camera extrinsic
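The sketch below shows how the four camera fields fit together. It builds a 4x4 extrinsic matrix from ```R``` and ```T``` and projects 3D points into an allocentric image. The helper names (`camera_from_dict`, `extrinsic_matrix`, `project_points`, `load_camera`) are hypothetical and not part of `dataset_utils`.

It also assumes that ```R``` and ```T``` map world coordinates to camera coordinates (X_cam = R X_world + T), and that each JSON file holds a single camera entry. Neither convention is stated above, so please check them against the repository code before relying on this.

```python
import json

import numpy as np


def camera_from_dict(params):
    """Parse one allocentric camera entry into NumPy arrays (hypothetical helper).

    Expects the keys described above: imgSize ([width, height]),
    K (3x3 intrinsics), R (3x3 rotation), T (3D translation).
    """
    width, height = (int(v) for v in params["imgSize"])
    K = np.asarray(params["K"], dtype=np.float64).reshape(3, 3)
    R = np.asarray(params["R"], dtype=np.float64).reshape(3, 3)
    T = np.asarray(params["T"], dtype=np.float64).reshape(3)
    return (width, height), K, R, T


def extrinsic_matrix(R, T):
    """Stack R and T into a 4x4 extrinsic matrix.

    ASSUMPTION: world-to-camera convention, X_cam = R @ X_world + T.
    """
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = T
    return E


def project_points(points_world, K, R, T):
    """Project (N, 3) world points to (N, 2) pixel coordinates (u, v)."""
    points_world = np.asarray(points_world, dtype=np.float64).reshape(-1, 3)
    # World -> camera coordinates (same ASSUMPTION as extrinsic_matrix).
    points_cam = points_world @ R.T + T
    # Camera -> homogeneous pixel coordinates, then the perspective divide.
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def load_camera(json_path):
    """Load a JSON file holding a single camera entry (ASSUMPTION about layout)."""
    with open(json_path, "r") as f:
        return camera_from_dict(json.load(f))
```

Consider `K = [[1000, 0, 960], [0, 1000, 540], [0, 0, 1]]` with an identity `R` and a zero `T`. Under these assumptions, the point (0.1, 0, 1) projects to pixel (1060, 540). If a camera file turns out to store several cameras, pass each entry to `camera_from_dict` directly instead of calling `load_camera`.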
[4] Visualize Hand-Object Poses in Egocentric Frames:

```x
cd dataset_utils
python project_pose_to_egocentric_view.py --dataset_root --object_model_root --triplet --sequence_name --save_path --device
```

For example, if you select the following data sequence:

```x
python project_pose_to_egocentric_view.py --dataset_root --object_model_root --triplet "(brush, brush, box)" --sequence_name "20231006_163" --save_path "./egocentric_example.gif" --device "cuda:0"
```

The resulting visualization is written to ```--save_path``` (here, ```./egocentric_example.gif```).

[5] Parse Egocentric Depth Videos:

Please use the following command for each video:

```x
ffmpeg -i -f image2 -start_number 0 -vf fps=fps=30 -qscale:v 2
```

For example:

```x
mkdir ./decode
ffmpeg -i ./egocentric_depth.avi -f image2 -start_number 0 -vf fps=fps=30 -qscale:v 2 ./decode/%05d.png
```

Each depth image is a 1920x1080 uint16 array. The depth scale is 1000 (i.e., depth values are stored in millimeters).

## Data Lists

We provide four data lists in the folder ```data_lists```:

* ```data_lists/v1_overall_data_sequences.txt``` (2317 sequences) is the overall data list of the whole dataset (Version 1). All data modalities except the egocentric and marker-removed videos are available for these sequences.
* ```data_lists/v1_egocentric_data_available_sequences.txt``` (2212 sequences) lists the sequences of the whole dataset (Version 1) for which all data modalities are available.
* ```v1_overall_data_train_test_split.txt``` (2317 sequences) is the train-test split of the whole dataset (Version 1):
  + "train" is the training set
  + "test_1" is the test set S1 (no generalization)
  + "test_2" is the test set S2 (geometry-level generalization)
  + "test_3" is the test set S3 (interaction-level generalization)
  + "test_4" is the test set S4 (compound generalization)
* ```v1_marker_removed_data_available_sequences.txt``` (2120 sequences) lists the sequences of the whole dataset (Version 1) for which marker-removed allocentric RGB videos are available.

## How to Obtain the Real Objects?
We purchased the objects from the [Jingdong online mall](https://www.jd.com/) in China. We provide the URLs of most of the objects in ```object_additional_information/object_purchase_list.xlsx```.

## Citation

If you find our work useful in your research, please consider citing:

```
@inproceedings{liu2024taco,
  title={Taco: Benchmarking generalizable bimanual tool-action-object understanding},
  author={Liu, Yun and Yang, Haolin and Si, Xu and Liu, Ling and Li, Zipeng and Zhang, Yuxiang and Liu, Yebin and Yi, Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21740--21751},
  year={2024}
}
```

## License

This work is licensed under a [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).

## Email

If you have any questions, please contact ```yun-liu22@mails.tsinghua.edu.cn```.