SurgΣ: Large-Scale Multimodal Data and Foundation Models for Surgery
"Excited to share SurgΣ — a new collaborative initiative between NUS, CUHK (Qi Dou), SJTU (Yutong Ban), and NVIDIA Cosmos-H-Surgical (Daguang Xu), with the mission of constructing a large-scale surgical video database with high-quality annotations and advancing multimodal foundation models for surgical intelligence & autonomy.
SurgΣ consists of a family of complementary foundation models designed to advance surgical understanding, reasoning, and autonomy:
• Basic Surgical Action (BSA: https://lnkd.in/gs6ZWm7q) — a unified model that recognizes ten basic action types commonly occurring across diverse surgical procedures.
• SurgVLM (https://lnkd.in/gDk2uYxq) — a multimodal vision–language model enabling diverse surgical tasks within a unified framework.
• Surg-R1 (https://lnkd.in/gy7PJP6z) — a multimodal foundation model with hierarchical reasoning for interpretable decision support.
• Cosmos-H-Surgical (https://lnkd.in/gnigeNDS) — a surgical world model enabling scalable robot policy learning from surgical videos.
These models are built upon SurgΣ-DB, a large-scale database with high-quality annotations for diverse tasks; its current first version contains ~5.98M multimodal conversations spanning 18 surgical tasks.
More details: https://lnkd.in/g_NP7jJn"
Source: LinkedIn