SurgΣ: Large-Scale Multimodal Data and Foundation Models for Surgery
Excited to share SurgΣ — a new collaborative initiative between NUS, CUHK (Qi Dou), SJTU (Yutong Ban), and NVIDIA Cosmos-H-Surgical (Daguang Xu), with the mission of constructing a large-scale surgical video database with high-quality annotations and advancing multimodal foundation models for surgical intelligence and autonomy.

SurgΣ consists of a family of complementary foundation models designed to advance surgical understanding, reasoning, and autonomy:

• Basic Surgical Action (BSA: https://lnkd.in/gs6ZWm7q) — a unified model that recognizes 10 types of basic actions common across diverse surgical procedures.
• SurgVLM (https://lnkd.in/gDk2uYxq) — a multimodal vision–language model enabling diverse surgical tasks within a unified framework.
• Surg-R1 (https://lnkd.in/gy7PJP6z) — a multimodal foundation model with hierarchical reasoning for interpretable decision sup...






