Surgical robotics beyond NVIDIA GTC
"Training a surgical robot is one of the hardest problems in medicine. Real surgeries are rare, unpredictable, and ethically irreplaceable. And traditional simulators? They're hand-coded and can't realistically replicate how tissue stretches, bleeds, or reacts to a blade. So NVIDIA solved two problems at once. First, they built the world's largest open surgical robotics dataset. It's called Open-H.
- 26,500+ surgical task demonstrations
- Nearly 5 million video-and-motion frame pairs
- Collected across 9 robot platforms and 10+ institutions worldwide.
This is real surgical knowledge: how tools move, how tissue responds, how procedures actually unfold, captured, standardised, and made available for AI to learn from. But a dataset alone isn't enough, so NVIDIA built Cosmos-H, an engine that turns that data into infinite simulations. Cosmos-H is a world model for surgery. Give it a single frame of a surgical video, tell it exactly how the robot is going to move, and it shows you what happens next. Tissue deforming. Tools interacting. Smoke, reflections, blood, all rendered realistically. Not from rules. From learning. It learned surgery the way humans do: by watching a lot of it.
Here's what that unlocks: Instead of running 600 test scenarios on a physical surgical robot over 2 days, you can run them in simulation in 40 minutes. CMR Surgical is already doing this with their Versius robot, generating synthetic surgical sequences entirely in simulation, before anything goes near a patient.
Open-H gave the AI real-world surgical knowledge at scale. Cosmos-H turns that knowledge into an engine that generates endless realistic scenarios on demand."
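The throughput claim above works out to roughly a 72× speedup. A quick back-of-the-envelope check, using only the figures quoted in the post (600 scenarios, 2 days physical vs. 40 minutes simulated; the per-scenario time is derived, not reported):

```python
# Back-of-the-envelope check of the simulation speedup claimed above.
# Figures from the post: 600 test scenarios, 2 days on a physical robot
# vs. 40 minutes in simulation.
scenarios = 600
physical_minutes = 2 * 24 * 60   # 2 days of physical robot time
simulated_minutes = 40           # the same scenarios in simulation

speedup = physical_minutes / simulated_minutes
per_scenario_sim_seconds = simulated_minutes * 60 / scenarios

print(f"speedup: {speedup:.0f}x")                  # 72x
print(f"seconds per simulated scenario: {per_scenario_sim_seconds:.1f}")  # 4.0
```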
"Cosmos-H-Surgical-Predict is a fine-tuned world foundation model for surgical robotics, built on NVIDIA's Cosmos platform. Fine-tuned from Cosmos-Predict2.5-2B, it takes a first-frame image and a text description as input and predicts the next 92 frames of surgical video. This enables synthetic data generation (SDG) for training downstream policy models for surgical robotics. The functionality matches the original Cosmos-Predict2.5-2B, with one key difference: Cosmos-H-Surgical-Predict drops text-only video generation and requires a first-frame image alongside the text description.
Cosmos-H-Surgical-Transfer is a fine-tuned world foundation model for surgical robotics applications, built on NVIDIA's Cosmos platform. Fine-tuned from Cosmos-Transfer2.5-2B, the model transfers control input videos (depth maps, segmentation masks, edge maps, or blurred RGB) into photorealistic surgical videos. This bridges the simulation-to-real (sim2real) gap by converting synthetic/CG-rendered videos into photorealistic equivalents."
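The Predict model's interface described above boils down to a simple contract: one conditioning frame plus a text prompt in, 92 predicted frames out (93 frames of video total). Here is a minimal sketch of that contract using a NumPy stub; the class and method names are illustrative placeholders, not the actual Cosmos API:

```python
import numpy as np

# Illustrative sketch of the I/O contract of Cosmos-H-Surgical-Predict.
# SurgicalPredictStub is a hypothetical placeholder, NOT the real Cosmos API:
# it only echoes the shapes the model card specifies (1 frame + text in,
# 92 future frames out).
class SurgicalPredictStub:
    NUM_FUTURE_FRAMES = 92  # per the Cosmos-H-Surgical-Predict description

    def predict(self, first_frame: np.ndarray, prompt: str) -> np.ndarray:
        h, w, c = first_frame.shape
        # A real world model would run inference here; we return zeros of
        # the right shape to make the contract concrete and testable.
        future = np.zeros((self.NUM_FUTURE_FRAMES, h, w, c),
                          dtype=first_frame.dtype)
        return np.concatenate([first_frame[None], future], axis=0)

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # one endoscope frame
video = SurgicalPredictStub().predict(frame, "grasp and retract tissue")
print(video.shape)   # (93, 480, 640, 3): first frame + 92 predicted frames
```

The Transfer model inverts the direction of this pipeline: instead of rolling a real frame forward, it maps a synthetic control video (depth, segmentation, edges, or blurred RGB) to a photorealistic one, frame for frame.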
Source: Hugging Face, LinkedIn


