BioProVLA-Agent
Affordable, protocol-driven, vision-enhanced embodied multi-agent system using Vision-Language-Action (VLA) models for biological laboratory manipulation.
| Affiliation | Key Laboratory of Smart Manufacturing in Energy Chemical Process and Department of Computer Science and Engineering, East China University of Science and Technology, with Ruijin Hospital (Shanghai Jiao Tong University School of Medicine) and Shihezi University |
| First introduced | 2026-05 (arXiv:2605.07306) |
| Lifecycle stages | Experiment design (protocol-to-execution decomposition with closed-loop verification on physical labware) |
| Autonomy level | Semi-autonomous — closed-loop-capable with explicit human-intervention triggers at verification failure points |
| Domain focus | Biological wet-lab manipulation (transparent and reflective labware) |
| Availability | Unknown — no repository announced in the paper |
Approach
BioProVLA-Agent takes natural-language biological protocols as its task interface and couples protocol understanding with closed-loop reasoning and embodied execution. The system comprises four cooperating agents:
- Guiding Decision Agent — coordinates task scheduling, execution flow, retry decisions, and exception handling.
- Tailored LLM Protocol Agent — transforms unstructured biological protocols into executable and verifiable subtask units with action instructions, preconditions, completion criteria, and knowledge-based indices.
- VLM-RAG Verification Agent — reasons over real-time visual observations, robot states, retrieved operation knowledge, and reference success/failure examples to assess task readiness and completion before and after execution.
- VLA Embodied Agent — executes verified subtasks via a lightweight VLA policy.
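The division of labor among the four agents can be illustrated with a minimal control-loop sketch. This is not the authors' implementation: the `Subtask` fields mirror the subtask contents listed above, but `run_protocol`, the `Verifier` and `Executor` callables, the `max_attempts` default, and the toy task strings are all hypothetical names chosen for illustration. The sketch assumes that a failed precondition check, or a completion check that still fails after the allowed attempts, hands control to a human operator.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Subtask:
    """One executable, verifiable unit derived from a protocol step."""
    instruction: str      # action instruction handed to the embodied policy
    precondition: str     # condition checked before execution
    completion: str       # condition checked after execution
    knowledge_index: str  # key for retrieving operation knowledge


# Hypothetical interfaces: the real agents wrap an LLM, a VLM with retrieval,
# and a VLA policy; here they are plain callables so the loop can be tested.
Verifier = Callable[[str, str], bool]  # (criterion, knowledge_index) -> passed?
Executor = Callable[[str], None]       # runs one action instruction


def run_protocol(subtasks: List[Subtask], verify: Verifier, execute: Executor,
                 max_attempts: int = 2) -> Tuple[bool, List[str]]:
    """Run subtasks in order; stop and request a human on verification failure."""
    log: List[str] = []
    for st in subtasks:
        # Readiness check before acting.
        if not verify(st.precondition, st.knowledge_index):
            log.append(f"HUMAN: precondition not met for '{st.instruction}'")
            return False, log
        for attempt in range(1, max_attempts + 1):
            execute(st.instruction)
            # Completion check after acting.
            if verify(st.completion, st.knowledge_index):
                log.append(f"OK: '{st.instruction}' (attempt {attempt})")
                break
            log.append(f"RETRY: '{st.instruction}' (attempt {attempt} failed)")
        else:
            # Every attempt was used without a passing completion check.
            log.append(f"HUMAN: completion not verified for '{st.instruction}'")
            return False, log
    return True, log


def demo() -> Tuple[bool, List[str]]:
    """Toy run with stub agents: the first attempt fails, the retry succeeds."""
    state = {"tube_on_rack"}
    calls = {"n": 0}

    def verify(criterion: str, _index: str) -> bool:
        return criterion in state

    def execute(instruction: str) -> None:
        calls["n"] += 1
        if instruction == "load tube into centrifuge" and calls["n"] >= 2:
            state.add("tube_in_centrifuge")

    plan = [Subtask("load tube into centrifuge", "tube_on_rack",
                    "tube_in_centrifuge", "centrifuge_loading")]
    return run_protocol(plan, verify, execute)
```

Calling `demo()` runs the loop against stub agents whose first execution attempt does nothing, so the log records one retry followed by a verified completion. Keeping the verifier and executor behind plain callables is a design choice of this sketch, intended only to show where readiness checks, completion checks, retries, and human hand-off sit relative to each other.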
To address wet-lab visual perturbations (transparent labware, specular reflections, illumination shifts, overexposure), the authors introduce AugSmolVLA, an online visual augmentation strategy applied during fine-tuning rather than as a separate offline pipeline. The system runs on a low-cost robotic platform with roughly US$800–850 of hardware.
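The idea of online augmentation, where perturbations are re-sampled each time a training image is drawn instead of being precomputed, can be sketched as follows. This is an illustrative stand-in and not the AugSmolVLA recipe: the specific transforms (exposure gain, gamma shift, a single synthetic highlight), their parameter ranges, and the names `augment` and `augmented_batches` are assumptions, and images are represented as nested lists of floats in [0, 1] only to keep the example dependency-free. A real fine-tuning pipeline would apply comparable operations to tensors inside the data loader.

```python
import random
from typing import Iterator, List

Image = List[List[List[float]]]  # H x W x 3, channel values in [0, 1]


def clamp01(x: float) -> float:
    """Clip a value to the valid intensity range."""
    return max(0.0, min(1.0, x))


def augment(img: Image, rng: random.Random) -> Image:
    """Return a perturbed copy: exposure gain, gamma shift, one bright highlight."""
    h, w = len(img), len(img[0])
    gain = rng.uniform(0.7, 1.8)    # values above 1 push toward overexposure
    gamma = rng.uniform(0.7, 1.4)   # crude illumination / contrast shift
    # Synthetic specular highlight: a bright blob at a random location.
    cy, cx = rng.randrange(h), rng.randrange(w)
    radius = max(1.0, 0.15 * min(h, w))
    strength = rng.uniform(0.0, 0.8)

    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            d2 = (y - cy) ** 2 + (x - cx) ** 2
            glare = strength * max(0.0, 1.0 - d2 / radius ** 2)
            row.append([clamp01(clamp01(c * gain) ** gamma + glare)
                        for c in img[y][x]])
        out.append(row)
    return out


def augmented_batches(dataset: List[Image], batch_size: int,
                      rng: random.Random) -> Iterator[List[Image]]:
    """Yield batches, sampling fresh augmentations every time an image is drawn."""
    for start in range(0, len(dataset), batch_size):
        yield [augment(img, rng) for img in dataset[start:start + batch_size]]
```

Because `augment` is called inside the batch generator, each epoch sees differently perturbed views of the same demonstrations, and no augmented copies need to be stored on disk.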
Validation
The system is validated on a hierarchical biological manipulation benchmark covering 15 atomic tasks, 6 composite workflows, and 3 representative bimanual tasks. Example tasks include centrifuge-tube loading, tube sorting, waste disposal, cap twisting, and liquid pouring. AugSmolVLA is evaluated under normal and high-exposure visual settings against the ACT, X-VLA, and original SmolVLA baselines.
Notable results
- AugSmolVLA improves execution stability over ACT, X-VLA, and the original SmolVLA across normal and high-exposure settings, with pronounced gains in precise placement, transparent-object manipulation, composite workflows, and visually degraded scenes.
- Demonstrates closed-loop, protocol-centered manipulation on a sub-US$1000 hardware platform.
- Adds a vision-enabled, affordable embodied wet-lab counterpart to higher-cost autonomous laboratory systems.
Primary paper
arXiv:2605.07306
Other references
None yet.
Code
Unknown — no repository announced in the paper.