We propose a new multilabel classifier, called LapTool-Net, to detect the presence of surgical tools in each frame of a laparoscopic video. The novelty of LapTool-Net is that it exploits the correlation among the usage of different tools, and between the tools and the tasks; that is, the context of the tools' usage. To this end, the co-occurrence pattern of the tools is used to design a decision policy for a multilabel classifier built on a Recurrent Convolutional Neural Network (RCNN) architecture, which extracts spatial and temporal features simultaneously. In contrast to previous multilabel classification methods, the RCNN and the decision model are trained end-to-end using a multitask learning scheme. To overcome the high imbalance and avoid overfitting caused by the lack of variety in the training data, a high down-sampling rate is chosen based on the more frequent tool combinations. Furthermore, in a post-processing step, the predictions for all frames of a video are corrected by a bi-directional RNN designed to model the long-term order of the tasks. LapTool-Net was trained on a publicly available dataset of laparoscopic cholecystectomy. The results show that LapTool-Net significantly outperforms existing methods, even while using fewer training samples and a shallower architecture.
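As a rough illustration of the combination-based down-sampling idea mentioned above, the following minimal sketch treats each distinct set of co-occurring tools as one combination, counts how often each combination appears, and caps the number of frames kept per combination. This is an assumption-laden toy, not the authors' procedure: the function names, the `max_per_combo` parameter, and the three-tool label layout are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def combination_counts(labels):
    """Count how often each tool combination (a tuple of 0/1 flags) occurs."""
    return Counter(tuple(row) for row in labels)


def downsample_by_combination(labels, max_per_combo, seed=0):
    """Return sorted frame indices, keeping at most max_per_combo frames
    for every distinct tool combination (hypothetical balancing rule)."""
    rng = random.Random(seed)
    by_combo = defaultdict(list)
    for idx, row in enumerate(labels):
        by_combo[tuple(row)].append(idx)

    kept = []
    for indices in by_combo.values():
        if len(indices) > max_per_combo:
            # Frequent combinations are randomly thinned out.
            indices = rng.sample(indices, max_per_combo)
        kept.extend(indices)
    return sorted(kept)


# Toy per-frame labels for three hypothetical tools.
labels = [[1, 0, 0]] * 6 + [[1, 1, 0]] * 3 + [[0, 0, 1]] * 1

counts = combination_counts(labels)
kept = downsample_by_combination(labels, max_per_combo=2)
kept_counts = combination_counts([labels[i] for i in kept])
```

In this toy, the frequent combination `(1, 0, 0)` is reduced from six frames to two, while the rare combination `(0, 0, 1)` keeps its single frame, so the retained set is far less dominated by the most common tool usage pattern.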