BioProVLA-Agent

Affordable, protocol-driven, vision-enhanced embodied multi-agent system using Vision-Language-Action (VLA) models for biological laboratory manipulation.

   
Affiliation: Key Laboratory of Smart Manufacturing in Energy Chemical Process and Department of Computer Science and Engineering, East China University of Science and Technology, with Ruijin Hospital (Shanghai Jiao Tong University School of Medicine) and Shihezi University
First introduced: 2026-05 (arXiv:2605.07306)
Lifecycle stages: Experiment design (protocol-to-execution decomposition with closed-loop verification on physical labware)
Autonomy level: Semi-autonomous (closed-loop-capable, with explicit human-intervention triggers at verification failure points)
Domain focus: Biological wet-lab manipulation (transparent and reflective labware)
Availability: Unknown (no repository announced in the paper)

Approach

BioProVLA-Agent uses natural-language biological protocols as the task interface and couples protocol understanding with closed-loop reasoning and embodied execution. The system comprises four cooperating agents:

  • Guiding Decision Agent — coordinates task scheduling, execution flow, retry decisions, and exception handling.
  • Tailored LLM Protocol Agent — transforms unstructured biological protocols into executable and verifiable subtask units with action instructions, preconditions, completion criteria, and knowledge-based indices.
  • VLM-RAG Verification Agent — reasons over real-time visual observations, robot states, retrieved operation knowledge, and reference success/failure examples to assess task readiness and completion before and after execution.
  • VLA Embodied Agent — executes verified subtasks via a lightweight VLA policy.
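
The division of labor above can be illustrated with a minimal control-loop sketch. It is not the authors' implementation: the `Subtask` fields mirror the four elements the Protocol Agent is described as producing, while `run_protocol`, the `verify` and `execute` callables, the `max_retries` value, and the `"needs_human"` status are hypothetical names chosen for illustration. The stubs at the bottom stand in for the VLM-RAG Verification Agent and the VLA Embodied Agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Subtask:
    """One executable, verifiable unit (field names are illustrative)."""
    instruction: str           # action instruction passed to the policy
    precondition: str          # checked before execution
    completion_criterion: str  # checked after execution
    knowledge_index: str       # key for retrieved operation knowledge


def run_protocol(
    subtasks: List[Subtask],
    verify: Callable[[str, str], bool],  # stand-in for the verification agent
    execute: Callable[[str], None],      # stand-in for the embodied agent
    max_retries: int = 2,                # assumed retry budget
) -> List[Tuple[str, str]]:
    """Run subtasks in order; stop and flag a human on unrecoverable failure."""
    log: List[Tuple[str, str]] = []
    for st in subtasks:
        # Readiness check before execution.
        if not verify(st.precondition, st.knowledge_index):
            log.append((st.instruction, "needs_human"))
            break
        # Execute, then check completion; retry up to max_retries times.
        for _attempt in range(max_retries + 1):
            execute(st.instruction)
            if verify(st.completion_criterion, st.knowledge_index):
                log.append((st.instruction, "done"))
                break
        else:
            # All attempts failed verification: hand over to a human.
            log.append((st.instruction, "needs_human"))
            break
    return log


# Deterministic stubs: the second execution attempt succeeds.
state = {"tube_in_rack": False}
attempts: List[str] = []


def fake_execute(instruction: str) -> None:
    attempts.append(instruction)
    if len(attempts) >= 2:
        state["tube_in_rack"] = True


def fake_verify(criterion: str, knowledge_index: str) -> bool:
    if criterion == "tube visible on bench":
        return True
    if criterion == "tube seated in rack":
        return state["tube_in_rack"]
    return False


demo = [Subtask("place tube in rack", "tube visible on bench",
                "tube seated in rack", "centrifuge-tube-loading")]
demo_log = run_protocol(demo, fake_verify, fake_execute)
```

In this sketch the retry decision and the hand-off to a human sit in the coordinating loop, while verification is called both before and after each execution, which matches the roles described for the Guiding Decision Agent and the Verification Agent.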

To address wet-lab visual perturbations (transparent labware, specular reflections, illumination shifts, overexposure), the authors introduce AugSmolVLA, an online visual augmentation strategy applied during fine-tuning rather than as a separate offline pipeline. The system runs on a low-cost robotic platform (~US$800–850 hardware).
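
The difference between online and offline augmentation can be shown with a small sketch. The specific transforms used by AugSmolVLA are not listed here, so the exposure gain and brightness shift below are assumptions chosen only to mimic overexposure; `augment_exposure`, `training_batches`, and all parameter values are hypothetical. Images are plain nested lists of grayscale values in [0, 1] to keep the example dependency-free.

```python
import random
from typing import Iterator, List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def augment_exposure(image: Image, rng: random.Random,
                     max_gain: float = 1.8, max_shift: float = 0.2) -> Image:
    """Return a brightened copy of the image, clamped to [0, 1]."""
    gain = rng.uniform(1.0, max_gain)    # multiplicative exposure gain
    shift = rng.uniform(0.0, max_shift)  # additive brightness shift
    return [[min(1.0, max(0.0, p * gain + shift)) for p in row]
            for row in image]


def training_batches(dataset: List[Image], steps: int, batch_size: int,
                     rng: random.Random,
                     p_augment: float = 0.5) -> Iterator[List[Image]]:
    """Yield batches, drawing a fresh perturbation each time a sample is used.

    Nothing is written back to the dataset: augmentation happens online,
    inside the data-loading step of fine-tuning.
    """
    for _ in range(steps):
        batch = rng.sample(dataset, batch_size)
        yield [augment_exposure(img, rng) if rng.random() < p_augment else img
               for img in batch]


rng = random.Random(0)
img: Image = [[0.0, 0.5], [0.9, 1.0]]
bright = augment_exposure(img, rng)
batches = list(training_batches([img] * 4, steps=3, batch_size=2, rng=rng))
```

Because perturbations are sampled at each step, the policy sees a different exposure level whenever a frame is revisited, and no augmented copy of the dataset needs to be stored.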

Validation

The authors evaluate the system on a hierarchical biological manipulation benchmark covering 15 atomic tasks, 6 composite workflows, and 3 representative bimanual tasks, with example operations such as centrifuge-tube loading, tube sorting, waste disposal, cap twisting, and liquid pouring. AugSmolVLA is compared against ACT, X-VLA, and the original SmolVLA under both normal and high-exposure visual settings.

Notable results

  • AugSmolVLA improves execution stability over ACT, X-VLA, and the original SmolVLA across normal and high-exposure settings, with pronounced gains in precise placement, transparent-object manipulation, composite workflows, and visually degraded scenes.
  • Demonstrates closed-loop, protocol-centered manipulation on a sub-US$1000 hardware platform.
  • Adds a vision-enabled, affordable embodied wet-lab counterpart to higher-cost autonomous laboratory systems.

Primary paper

Du et al., “BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation,” arXiv:2605.07306 (2026).

Other references

None yet.

Code

Unknown — no repository announced in the paper.