SurgΣ: Large-Scale Multimodal Data and Foundation Models for Surgery
"Excited to share SurgΣ — a new collaborative initiative between NUS, CUHK (Qi Dou), SJTU (Yutong Ban), and NVIDIA Cosmos-H-Surgical (Daguang Xu), with the mission of constructing a large-scale surgical video database with high-quality annotations and advancing multimodal foundation models for surgical intelligence & autonomy.
SurgΣ consists of a family of complementary foundation models designed to advance surgical understanding, reasoning, and autonomy:
• Basic Surgical Action (BSA: https://lnkd.in/gs6ZWm7q) — a unified model that recognizes ten basic action types commonly occurring across diverse surgical procedures.
• SurgVLM (https://lnkd.in/gDk2uYxq) — a multimodal vision–language model enabling diverse surgical tasks within a unified framework.
• Surg-R1 (https://lnkd.in/gy7PJP6z) — a multimodal foundation model with hierarchical reasoning for interpretable decision support.
• Cosmos-H-Surgical (https://lnkd.in/gnigeNDS) — a surgical world model enabling scalable robot policy learning from surgical videos.
These models are built upon SurgΣ-DB, a large-scale database with high-quality annotations for diverse tasks; its current first version contains ~5.98M multimodal conversations spanning 18 surgical tasks.
More details: https://lnkd.in/g_NP7jJn"
Source: LinkedIn