
SurgiTrack: Fine-grained multi-class multi-tool tracking in surgical videos | Literature Express: Large Vision Models in Medical Imaging Applications

Title


SurgiTrack: Fine-grained multi-class multi-tool tracking in surgical videos


01

Literature Express Introduction

Surgical tool tracking plays a crucial role in computer-assisted surgery systems, providing valuable support for a range of applications, including skill assessment (Pedrett et al., 2023), visual servoing (Xu et al., 2023), navigation (Xu et al., 2022), laparoscope positioning (Dutkiewicz et al., 2005), safety and risk-zone assessment (Richa et al., 2011), and augmented reality (Martin-Gomez et al., 2023). Compared with tool detection, which only identifies target tools in individual frames, tool tracking goes a step further, additionally estimating and predicting tool positions in subsequent frames of the video.

Traditional tool tracking relied on classical machine learning methods based on color, texture, SIFT, and geometric features (Pezzementi et al., 2009; Sznitman et al., 2012; Alsheakhali et al., 2015; Dockter et al., 2014; Du et al., 2016). Recent advances in deep learning (Bouget et al., 2017; Lee et al., 2019; Nwoye et al., 2019; Zhao et al., 2019a,b; Robu et al., 2021; Nwoye, 2021; Fathollahi et al., 2022; Wang et al., 2022; Rueckert et al., 2023) have ushered in a new era, enabling the extraction of more robust features for tool re-identification (re-ID). Despite this remarkable progress, many challenges remain. Existing work has focused mainly on single-tool tracking (Zhao et al., 2019b), single-class multi-tool tracking (Fathollahi et al., 2022), or multi-class single-tool tracking (Nwoye et al., 2019). In real surgical scenes, however, multiple classes of tools are typically used at the same time, calling for multi-class multi-tool tracking, an area that has remained underexplored for lack of the necessary datasets.

Recently, a new dataset named CholecTrack20 (Nwoye et al., 2023) was introduced, providing the support needed for multi-class multi-tool tracking. The dataset also defines three distinct trajectory perspectives: (1) the tool's full lifetime over the course of the operation, (2) the tool's circulation cycle inside the body, and (3) the duration of the tool's visibility within the camera's field of view (see Fig. 1). Tracking tools under all three perspectives simultaneously is referred to as multi-perspective tracking. CholecTrack20 offers rich multi-perspective tracking annotations that can accommodate diverse surgical needs, yet to date no deep learning model has been applied to it for automatic tool tracking.

To develop a method for multi-perspective multi-class multi-tool tracking in surgical videos, we first benchmark 10 state-of-the-art detection methods on the CholecTrack20 dataset and conduct an extensive ablation study of re-ID methods suited to the surgical domain. The re-ID module plays a key role in managing the temporal consistency of tool identities in surgical videos. Challenges persist, however, due to the tools' complex motion patterns, frequent occlusions, and the limited field of view of the surgical scene. In particular, when multiple instances of the same tool class share the same appearance, re-identifying them after occlusion, after they move out of the camera's view, or after re-insertion into the surgical scene is a formidable task.
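To make the three trajectory perspectives concrete, the sketch below shows how one physical tool can carry three identities with different lifetimes. This is our illustration only; the class and method names are hypothetical and not from the paper:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MultiPerspectiveTrack:
    """One physical tool instance with three co-existing track identities.

    - the visibility ID resets whenever the tool leaves the camera's view;
    - the intracorporeal ID resets whenever the tool is withdrawn from the body;
    - the intraoperative ID persists for the whole procedure.
    """
    tool_class: str                            # e.g. "grasper"
    intraoperative_id: int                     # never reset during the procedure
    intracorporeal_id: Optional[int] = None    # None while out of the body
    visibility_id: Optional[int] = None        # None while out of camera view

    def on_exit_camera_view(self):
        # Tool slid behind tissue or past the image border: only the
        # visibility trajectory terminates.
        self.visibility_id = None

    def on_exit_body(self):
        # Tool pulled out through the trocar: both the visibility and the
        # intracorporeal trajectories terminate.
        self.visibility_id = None
        self.intracorporeal_id = None
```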

Unlike existing methods, our preliminary experiments show that relying on tool appearance cues alone to distinguish tracks is suboptimal, especially when separating instances of the same class. To address this, we introduce domain knowledge, specifically tool usage patterns and tool-operator information. The latter, the tool operator, refers to the hand of the surgeon manipulating the tool and is more discriminative than appearance when separating same-class tool instances. Operator information, however, is not directly observable in endoscopic surgical images, which makes its automatic prediction a challenge. Motivated by these findings, we propose SurgiTrack, a novel deep learning method for surgical tool tracking. SurgiTrack approximates the tool operator's hand by the tool's originating direction and employs an attention mechanism to encode the tool's direction of motion, effectively modeling the invisible surgeon hands or trocar placements for tool re-identification. Our model design allows the direction estimator to be trained by self-supervision on datasets without operator labels, with performance comparable to supervised training; this ensures our method can be explored on surgical datasets lacking operator annotations. Furthermore, to handle the multi-perspective nature of tool trajectories, our network associates tracks with a harmonizing bipartite matching graph algorithm which, beyond conventional linear assignment, resolves identity conflicts across perspectives and improves the overall accuracy of track identity re-assignment.
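As a rough intuition for why the originating direction can stand in for the unobserved operator: the shaft of a laparoscopic tool points back toward the trocar it entered through, and different hands use different trocars. The sketch below derives a coarse entry-direction angle geometrically; it is our illustrative stand-in, not the paper's attention-based estimator, and all function names are ours:

```python
import math

def originating_direction(box_cx, box_cy, tip_x, tip_y):
    """Coarse direction-of-origin angle (radians) of a tool.

    The vector from the tool tip back toward the box center roughly
    follows the shaft, i.e. points toward the trocar and the operating
    hand. SurgiTrack learns this cue with an attention-based estimator;
    this geometric proxy is only for intuition.
    """
    return math.atan2(box_cy - tip_y, box_cx - tip_x)

def same_operator(angle_a, angle_b, tol=math.radians(20)):
    """Two detections plausibly share an operator if their origin angles agree."""
    diff = abs(angle_a - angle_b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff) < tol
```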

In summary, our contributions are threefold: (1) we formalize multi-perspective tool tracking modeling and benchmark state-of-the-art methods on the CholecTrack20 dataset; (2) we develop SurgiTrack, a tool tracking model relying on self-supervised attention-based motion-direction estimation and harmonizing bipartite graph matching; and (3) we extensively evaluate tool tracking across trajectory perspectives, covering different video frame rates and various visual challenges such as bleeding, smoke, and occlusion.

Together, these contributions advance research on surgical tool tracking and promote further development of computer-assisted surgical systems and AI-driven interventional technologies.

Abstract


Accurate tool tracking is essential for the success of computer-assisted intervention. Previous efforts often modeled tool trajectories rigidly, overlooking the dynamic nature of surgical procedures, especially tracking scenarios like out-of-body and out-of-camera views. Addressing this limitation, the new CholecTrack20 dataset provides detailed labels that account for multiple tool trajectories in three perspectives: (1) intraoperative, (2) intracorporeal, and (3) visibility, representing the different types of temporal duration of tool tracks. These fine-grained labels enhance tracking flexibility but also increase the task complexity. Re-identifying tools after occlusion or re-insertion into the body remains challenging due to high visual similarity, especially among tools of the same category. This work recognizes the critical role of the tool operators in distinguishing tool track instances, especially those belonging to the same tool category. The operators' information is, however, not explicitly captured in surgical videos. We therefore propose SurgiTrack, a novel deep learning method that leverages YOLOv7 for precise tool detection and employs an attention mechanism to model the originating direction of the tools, as a proxy to their operators, for tool re-identification. To handle diverse tool trajectory perspectives, SurgiTrack employs a harmonizing bipartite matching graph, minimizing conflicts and ensuring accurate tool identity association. Experimental results on CholecTrack20 demonstrate SurgiTrack's effectiveness, outperforming baselines and state-of-the-art methods with real-time inference capability. This work sets a new standard in surgical tool tracking, providing dynamic trajectories for more adaptable and precise assistance in minimally invasive surgeries.


Method


We present SurgiTrack, a deep learning method for surgical tool tracking based on tool direction-of-motion features. SurgiTrack is designed as a multi-class multi-object tracking (MCMOT) model capable of tracking tools jointly across multiple trajectory perspectives, namely visibility, intracorporeal, and intraoperative. The motivation to track beyond the camera's field of view is to offer more flexible trajectories that ensure continuous and reliable identification of surgical tools, tailored to the complex dynamics of a surgical scene, preventing errors and maintaining safety even when tools temporarily move out of view. The architecture of our proposed tracking model is conceptually divided into the main components of object tracking: spatial detection and data association, with the latter further split into re-identification feature modeling and track identity matching, as illustrated in Fig. 3(a).
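Following that decomposition, a per-frame association step could look like the minimal sketch below. This is our illustration: `detector` and `direction_embedder` are placeholders for SurgiTrack's YOLOv7 detector and attention-based direction estimator, and the cosine-distance cost and 0.5 acceptance gate are assumed values, not the paper's. SurgiTrack additionally harmonizes the assignment across the three trajectory perspectives rather than running a single linear assignment:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def track_frame(frame, active_tracks, detector, direction_embedder):
    """One step of a detect -> embed -> associate tracking loop (sketch)."""
    detections = detector(frame)                        # boxes + class labels
    embeddings = direction_embedder(frame, detections)  # direction features

    # Cost matrix between live tracks and new detections: cosine distance
    # between direction embeddings, with a prohibitive cost forbidding
    # cross-class matches.
    cost = np.full((len(active_tracks), len(detections)), 1e6)
    for i, trk in enumerate(active_tracks):
        for j, det in enumerate(detections):
            if trk.tool_class == det.tool_class:
                sim = np.dot(trk.embedding, embeddings[j]) / (
                    np.linalg.norm(trk.embedding) * np.linalg.norm(embeddings[j]))
                cost[i, j] = 1.0 - sim

    # Hungarian (linear) assignment over the cost matrix.
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 0.5]
    return matches, detections, embeddings
```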


Conclusion


In this work, we propose SurgiTrack, a novel deep learning approach for multi-class multi-tool tracking in surgical videos. Our approach utilizes an attention-based deep learning model for tool identity association by learning the tool motion direction, which we conceive as a proxy linking the tools to the operating surgeons' hands via the trocars. We demonstrate that the motion direction features are superior to location, appearance, and similarity features for the re-identification of surgical tools, given the non-distinctiveness of most tools' appearance, especially those from the same or similar classes. We show that the direction features can be learned in three different paradigms of full, weak, and self-supervision depending on the availability of training labels. We also design a harmonizing bipartite matching graph to enable non-conflicting and synchronized tracking of tools across the three perspectives of intraoperative, intracorporeal, and visibility within the camera field of view, which represent the various ways of considering the temporal duration of a tool trajectory. Additionally, we benchmark several deep learning methods for tool detection and tracking on the newly introduced CholecTrack20 dataset and conduct ablation studies on the suitability of existing re-identification features for accurate tool tracking. Our proposed model emerges as a promising solution for multi-class multi-tool tracking in surgical procedures, showcasing adaptability across different training paradigms and demonstrating strong performance on essential tracking metrics. We also evaluate our model across different surgical visual challenges such as bleeding, smoke, occlusion, camera fouling, and light reflection, and present insightful findings on their impact on visual tracking in surgical videos. Qualitative results also show that our method is effective in handling challenging situations compared to the baselines and can effortlessly track tools irrespective of the video frame sampling rate.
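To illustrate the three supervision paradigms named above, the sketch below pairs each with a plausible training signal. The loss choices are our reading of the setup (operator labels for full supervision, track-ID labels for weak supervision, temporal embedding consistency for self-supervision) and not the paper's exact objectives:

```python
import torch
import torch.nn.functional as F

def direction_loss(z_t, z_prev, operator_head=None, labels=None, track_ids=None):
    """Pick a training signal for direction embeddings by label availability.

    z_t, z_prev: (N, D) embeddings of the same tool instances at times t
    and t-k. Exactly one paradigm applies per dataset (illustrative only).
    """
    if labels is not None and operator_head is not None:
        # Full supervision: classify the operating hand (e.g. MSRH) from
        # the embedding via a linear head.
        return F.cross_entropy(operator_head(z_t), labels)
    if track_ids is not None:
        # Weak supervision: embeddings sharing a track ID should be
        # similar, different IDs dissimilar (a contrastive-style signal).
        same = (track_ids[:, None] == track_ids[None, :]).float()
        sim = F.cosine_similarity(z_t[:, None], z_t[None, :], dim=-1)
        return F.binary_cross_entropy_with_logits(sim * 10.0, same)
    # Self-supervision: the same instance should keep a consistent
    # direction embedding across nearby frames.
    return (1.0 - F.cosine_similarity(z_t, z_prev, dim=-1)).mean()
```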


Results


First, our base detector, YOLOv7 (Wang et al., 2023a), yields 80.6% AP@0.5 and 56.1% AP@0.5:0.95 for tool detection at an inference speed of 20.6 FPS, demonstrating its effectiveness as a detector for our tracking model.
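For readers unfamiliar with the metric, AP@0.5:0.95 is the average precision averaged over IoU matching thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention). A minimal sketch, with the per-threshold AP computation left abstract:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def ap_50_95(ap_at_threshold):
    """Average AP over IoU thresholds 0.50, 0.55, ..., 0.95.

    `ap_at_threshold(t)` must return the dataset AP computed with IoU
    matching threshold t.
    """
    thresholds = np.arange(0.50, 0.96, 0.05)
    return float(np.mean([ap_at_threshold(t) for t in thresholds]))
```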

6.1. Results of surgeon operator prediction

We measure the quality of the direction features used for track re-identification in this task. The proposed Estimator, built on the EfficientNet-B0 (Tan and Le, 2019) backbone, demonstrates remarkable performance in surgical tool re-identification across different supervision settings, as evidenced by the results presented in Table 3. Leveraging EfficientNet-B0's known efficiency and speed, it outperforms several strong baselines (a Siamese baseline, ResNet (He et al., 2016), ViT (Dosovitskiy et al., 2020), CrossViT (Chen et al., 2021), etc.) in capturing essential features for tool re-identification, achieving the highest mean Average Precision (mAP) of 81.2% under the supervised setting. The inclusion of an attention head enhances its ability to learn direction features, which is crucial for distinguishing between similar tool instances, thereby outperforming the baseline EfficientNet-B0 (+4.2% mAP). Category-wise, the main surgeon's right hand (MSRH), which is the busiest and handles most of the tools, exhibits the greatest detection difficulty with a 53% AP.

In the self-supervised setting, the proposed Estimator showcases its versatility by consistently achieving high re-identification accuracy (≥88%) over various time intervals, including longer time differences reaching back to the start of the video (e.g., from t−25 to t and from t₀ to t). This demonstrates that the learned direction-aware features effectively maintain consistency even in challenging long-term tracking scenarios, thanks to the innovative image preprocessing technique of "image slicing and padding", which addresses the visibility issue of tool shafts in images and enhances the model's ability to capture directional information. The direction features align closely with the tool operator's hand direction from the trocar port, emphasizing their relevance as robust re-identification features for surgical tool tracking. This analysis focuses on the grasper because it is the only tool with multiple instances when considering the intraoperative trajectory. We do, however, observe slightly inferior performance when models are weakly supervised on track ID labels, which is the same supervisory signal used by conventional appearance re-ID models (Zhang et al., 2022; Aharon et al., 2022; Wang et al., 2023b).
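Table 3's embedding temporal consistency can be read as the fraction of instances whose direction embedding at time t still matches the same instance's embedding at t−k. A minimal way to compute such a score, under our assumption of cosine similarity with nearest-neighbor matching:

```python
import numpy as np

def temporal_consistency(emb_t, emb_tk, ids_t, ids_tk):
    """Fraction of instances at time t whose nearest neighbor among the
    embeddings at time t-k carries the same track identity.

    emb_t, emb_tk: (N, D) and (M, D) L2-normalized direction embeddings.
    ids_t, ids_tk: corresponding track identities.
    """
    sims = emb_t @ emb_tk.T          # cosine similarity matrix (N, M)
    nearest = sims.argmax(axis=1)    # best match at t-k for each instance at t
    correct = sum(ids_t[i] == ids_tk[j] for i, j in enumerate(nearest))
    return correct / len(ids_t)
```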


Figure


Fig. 1. Surgical tool tracking demonstrating (top) qualitative fine-grained tracking results across multiple tools, classes, and perspectives and (bottom) superior quantitative results compared to the state-of-the-art.


Fig. 2. Overview of CholecTrack20 dataset showing localization, tracking, and associated labels (Nwoye et al., 2023).


Fig. 3. Overview of our proposed tool tracking model showing: (a) the full architecture of SurgiTrack and its major component modules. One of these is the YOLO-based detector; another is the Siamese-based surgical tool direction estimator, whose full architectural detail is shown in (b), including an optional head for surgeon operator classification. The last component of SurgiTrack is the harmonizing bipartite graph matching (HBGM) algorithm for tool track identity association under multiple perspectives of tool trajectories (visibility, intracorporeal, and intraoperative); the full pipeline is shown in (c).


Fig. 4. Impact of direction estimation in tracking surgical tools at varying video sampling rates (i.e., 1, 5, and 25 frames per second, FPS). A demonstration is included in the qualitative results video.


Fig. 5. Performance assessment of SurgiTrack amidst surgical visual challenges. Overall performance is tabulated at the top, followed by quantitative and qualitative results showcasing tracking performance on specific visual-challenge frames. Values in black denote comparable performance (within the average range, ±1.0). Values in green indicate above-average performance, while red values indicate performance falling below average. The breakdown explores distinct tracking metrics focusing on detection, localization, and association or re-identification. A demo is included in the qualitative results video.


Fig. 6. Qualitative results of SurgiTrack in comparison with some existing methods. Bounding boxes represent tool detection, the tool name represents tool classification, the number in block parentheses represents track identity, and the scribble represents the tracklet (max. 2 s). Green indicates correctness, red indicates failure. A demo is included in the qualitative results video.


Fig. 7. Qualitative results of SurgiTrack in comparison with a state-of-the-art method (BoT-SORT) on tracking across variable frame rates (1 FPS, 5 FPS, and 25 FPS). The thick blue bounding box represents tool detection at the current time, dotted gray bounding boxes represent detections at previous time steps, the tool name represents tool classification, and the track identity number is written above each box. A demo is included in the qualitative results video.


Table


Table 1. Summary of the CholecTrack20 dataset statistics (Nwoye et al., 2023).


Table 2. Experiment and hyperparameter settings.


Table 3. Surgeon operator prediction based on direction feature embeddings for tool track re-identification; results show the mean and per-class average precision (% AP) and embedding temporal consistency (% accuracy) from time t−k to t (grasper only).


Table 4. Ablation study on track re-identification features, using the intraoperative trajectory perspective at 25 FPS.


Table 5. Ablation study on linear association algorithms and approaches for combining multiple re-ID costs.


Table 6. Multi-perspective multi-tool tracking results at 25 FPS.


Table 7. Class-wise tracking accuracy and impact of the KB algorithm on state-of-the-art models [intraoperative perspective at 25 FPS].



Original article: https://blog.csdn.net/weixin_38594676/article/details/145210814
