EndoTracker: Robustly Tracking Any Point in Endoscopic Surgical Scene

"EndoTracker: Robustly Tracking Any Point in Endoscopic Surgical Scene" by Lalithkumar Seenivasan, Joanna Cheng, Jin Fang, Jin Bai, Roger D. Soberanis-Mukul, Jan Emily Mangulabnan, S. Swaroop Vedula, Masaru Ishii, Gregory Hager, Russell H. Taylor & Mathias Unberath has been published in the LNCS proceedings of the International Workshop on Foundation Models for General Medical AI.


Abstract
The most exciting frontiers of contemporary endoscopic video processing include tissue and instrument tracking and scene reconstruction, together with their applications in surgical navigation, mixed reality visualization, and surgical automation. These tasks are enabled by task-specific techniques such as segmentation, object tracking, structure from motion and simultaneous localization and mapping (SfM and SLAM), and neural rendering. While these techniques vary in purpose and methodology, fundamentally they all rely on the ability to reliably and robustly establish point correspondences across video frames. However, point tracking in endoscopic scenes is difficult due to the lack of distinct yet repetitive features, varying illumination, and the continuously changing visual appearance of corresponding points. A dense point-tracking model capable of reliably establishing point correspondences across video frames could catalyze endoscopic video processing and its downstream applications, and drive significant advances in surgical data science. While foundation models for tracking any point have recently been proposed, they are trained on large simulated and natural-scene video datasets and must be adapted to endoscopic scenes to address domain-specific challenges. In this work, we present a dense point-tracking foundation model for endoscopic scenes, built by fine-tuning a public foundation model on a large custom dataset comprising 13k endoscopic video sequences (314k frames). We present three benchmark datasets with ground-truth point correspondences to quantitatively evaluate point-tracking performance in variable endoscopic scenes. Through quantitative analysis, we find that models fine-tuned on endoscopic scenes outperform out-of-the-box models, especially under conservative thresholds for tracking success, suggesting improved suitability for downstream tasks in which tracking errors propagate.
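The abstract's point about "conservative thresholds for tracking success" refers to scoring a tracker by the fraction of points whose predicted positions fall within a small pixel radius of the ground truth, evaluated at progressively tighter radii. The sketch below is a minimal illustration of that style of metric (similar in spirit to TAP-Vid-style position accuracy), not the paper's exact protocol; the function name, array layout, and threshold values are assumptions for illustration.

```python
import numpy as np

def tracking_accuracy(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Fraction of visible points tracked within each pixel threshold.

    pred, gt: (num_points, num_frames, 2) arrays of (x, y) positions.
    visible:  (num_points, num_frames) boolean ground-truth visibility mask.
    Returns a dict mapping each threshold (in pixels) to the fraction of
    visible point-frame pairs whose tracking error is within that radius.
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # per-point, per-frame L2 error
    err = err[visible]                        # score only visible points
    return {t: float((err <= t).mean()) for t in thresholds}

# Toy usage: simulate a tracker with ~3 px noise on synthetic ground truth.
rng = np.random.default_rng(0)
gt = rng.uniform(0, 256, size=(8, 30, 2))
pred = gt + rng.normal(0, 3.0, size=gt.shape)
vis = np.ones((8, 30), dtype=bool)
print(tracking_accuracy(pred, gt, vis))
```

Tighter thresholds expose small correspondence errors that looser thresholds hide, which is exactly why they matter for downstream tasks such as SLAM or neural rendering, where per-frame errors compound over a sequence.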

Source: Springer
