UNOBench tutorial

Data, starter kit, and baseline reproduction

This page is a compact guide for getting started with UNOBench. It explains which repository to use for each task and links to the detailed documentation maintained in the method, challenge, and dataset repositories.

Dataset

UNOBench on Hugging Face

Download RGB images, Set-of-Mark images, annotations, challenge queries, and metadata.

Open dataset README
Starter kit

UNOBench Challenge example

Use the minimal runnable examples to verify query files, prediction format, and local evaluators.

Open challenge README
Baseline

UnoGrasp reproduction code

Run UnoGrasp inference and evaluation on the released synthetic small split with public checkpoints.

Open method README

Which repo should I use?

Goal Use this repo Go to details
Download or inspect UNOBench files and metadata. Hugging Face dataset repo Dataset structure
Check challenge input and output format before submitting. Challenge starter kit Challenge quick start
Reproduce UnoGrasp inference and evaluation. UnoGrasp method repo Method quick start

Recommended workflow

1Download the dataset

Start from the Hugging Face dataset repo. It contains the full file list, download commands, archive extraction instructions, and metadata descriptions.

hf download FBK-TeV/UNOBench \
  --repo-type dataset \
  --local-dir ./UNOBench/UNOBenchSyn
See full download instructions

2Choose an evaluation setting

UNOBench supports a Set-of-Mark setting and a natural-language setting. Use the challenge starter kit to understand the exact query and prediction formats.

Setting Input Expected output Details
SoM Set-of-Mark image + query object ID Obstructing object IDs Track 1
NLP RGB image + natural-language description of query object Point coordinates on the obstructing objects Track 2

3Run either the starter kit or the full method

For a format sanity check, run the challenge starter kit evaluators. For reproducing the released UnoGrasp results, use the method repo with the released checkpoints and the synthetic small split.

Dataset at a glance

Both evaluation settings can be used by VLM-based methods. The Set-of-Mark setting also supports non-language classical methods, graph-based reasoning, and modular robotic pipelines.

RGB / NLP input

UNOBench RGB example
Query object red and orange toy drill
Expected target output [(x1, y1)]
Note Any point on the yellow detergent bottle is accepted.

Set-of-Mark input

UNOBench Set-of-Mark example
Query object ID 2
Expected target IDs [3]

Metadata

Adapting Metadata to Your Method

You can use the provided UNOBench metadata to generate datasets tailored to your own method. For example, UnoGrasp uses a prompt-generation script to convert UNOBench metadata into VQA-style data with human instruction prompts and the UnoGrasp system prompt for training and evaluation.