Fig. 1. Selected results of liver tumor segmentation in CT image slices.
Segmenting organs or tumors from medical data (e.g., ultrasound videos, CT or MRI volumes) is one of the fundamental tasks in medical informatics and diagnosis, and has thus received long-term attention. We study a general framework of interactive image-sequence segmentation that can be adaptively applied to different types of medical data. Our system accurately segments an image based on very few user scribbles (Fig. 2) and automatically propagates the segmentation through consecutive image series, resulting in spatio-temporal tumor volume extraction.
Fig. 2. The general segmentation framework for a single image.
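To make the single-image step concrete, below is a minimal sketch of scribble-driven region scoring in Python, assuming grayscale slices stored as NumPy arrays; the two-component Gaussian mixtures and the plain threshold are illustrative stand-ins for the paper's actual model and inference, not its method.

```python
# A minimal sketch of scribble-driven region scoring (illustrative only):
# fit foreground/background appearance models from the user scribbles and
# score every pixel by the log-likelihood ratio between the two.
import numpy as np
from sklearn.mixture import GaussianMixture

def region_scores(image, fg_scribble, bg_scribble, n_components=2):
    """Per-pixel foreground log-likelihood ratio from scribble statistics.

    image:       2D float array (a CT/ultrasound slice).
    fg_scribble: boolean mask of foreground scribble pixels.
    bg_scribble: boolean mask of background scribble pixels.
    """
    fg = GaussianMixture(n_components).fit(image[fg_scribble].reshape(-1, 1))
    bg = GaussianMixture(n_components).fit(image[bg_scribble].reshape(-1, 1))
    pixels = image.reshape(-1, 1)
    # Positive values favor the tumor (foreground) appearance model.
    ratio = fg.score_samples(pixels) - bg.score_samples(pixels)
    return ratio.reshape(image.shape)

def segment(image, fg_scribble, bg_scribble):
    # Thresholding the ratio at zero stands in for the full inference.
    return region_scores(image, fg_scribble, bg_scribble) > 0.0
```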
Fig. 3. The cooperative representation of a CT image with liver tumor.
We propose a cooperative model of a dual form that formulates tumor segmentation from two aspects: region partition and boundary localization. The two terms are complementary yet competing: the former extracts the tumor by its appearance/texture difference against the surrounding background, while the latter searches for the palpable tumor boundary (Fig. 3). Moreover, we allow the model to be discriminatively trained based on user-placed scribbles.
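For concreteness, such a dual energy can be sketched as follows; the symbols $\phi_r$, $\phi_b$, and $\lambda$ are illustrative notation under a standard MRF-style formulation, not the paper's exact model:

$$
E(L) \;=\; \underbrace{\sum_{p}\phi_r(I_p, L_p)}_{\text{region partition}} \;+\; \lambda\,\underbrace{\sum_{(p,q)\in\mathcal{N}}\phi_b(I_p, I_q)\,\mathbb{1}[L_p \neq L_q]}_{\text{boundary localization}}
$$

Here $L$ is the pixel labeling, $\phi_r$ scores each pixel against the scribble-trained foreground/background appearance models, $\phi_b$ is small across high-contrast edges so that label changes are encouraged to align with palpable boundaries, and $\lambda$ balances the two competing terms.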
To adapt to the different appearances of medical data, the inference of image-series segmentation iterates between two steps (Fig. 4).
Fig. 5. The process of segmenting a series of images.
Fig. 6. Results on CT images.
Fig. 7. Results on ultrasound image sequences.
| | GrabCut | GAC | STF | DRLSE | Ours(R) | Ours(B) | Ours(R+B) |
|---|---|---|---|---|---|---|---|
| subCT precision | 0.6322 | 0.4870 | 0.4910 | 0.6808 | 0.6837 | 0.0268 | 0.8469 |
| subUS precision | 0.6089 | 0.5680 | 0.5818 | 0.6995 | 0.7057 | 0.0378 | 0.7631 |
| average precision | 0.6194 | 0.5315 | 0.5546 | 0.6901 | 0.6910 | 0.0304 | 0.8008 |
| average runtime (s) | 2.494 | 6.635 | 19.716 | 65.700 | 0.565 | 0.334 | 0.684 |
Table 1. Comparison with the state of the art.
Table 1 reports the average accuracy (TP/(TP+FP+FN)) on single-image segmentation (datasets: subCT and subUS). Given the same user interactions, our approach outperforms the interactive segmentation methods GrabCut [1], GAC [2], and DRLSE [4]. Our approach also yields better segmentations than STF [3]. Compared with fully supervised methods, our framework is more efficient and requires less tedious human annotation. Moreover, our method is per-image-specific and thus adapts to various imaging conditions, whereas fully supervised methods struggle in this situation.
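For reference, the accuracy TP/(TP+FP+FN) used above is the Jaccard overlap between a predicted mask and the ground truth; a small helper, assuming both masks are boolean NumPy arrays of the same shape:

```python
# Accuracy as in Table 1: TP / (TP + FP + FN), i.e. the Jaccard overlap
# between the predicted segmentation and the ground-truth mask.
import numpy as np

def jaccard_accuracy(pred, gt):
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return tp / float(tp + fp + fn)
```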
Fig. 8. Example of refining the segmentation by simple interactions. (a) Initial segmentation. (b) Background scribble B1 is added to exclude undesired regions. (c) Foreground scribble F1 is added to further include the neglected region.
Fig. 9. The average segmentation accuracy of the k-th slice in an image sequence. The left figure plots results obtained on SYSU-CT, and the right on SYSU-US. CT+ and US+ are results obtained by the full system; CT- and US- are results obtained by the baseline system without model update. The performance of our method decreases at a reasonable rate. In general, a satisfactory result can be obtained if the user refines the segmentation every 10 slices.
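The behavior in Fig. 9 suggests a simple propagation loop: segment a slice, refresh the appearance model from the result (the "+" variants), and occasionally ask the user to refine. A hedged sketch, reusing the illustrative `segment` helper from above; `refine_interactively` stands in for the actual scribble UI and is hypothetical:

```python
# A hedged sketch of the slice-by-slice loop suggested by Fig. 9;
# details (erosion radius, refinement schedule) are illustrative choices.
from scipy.ndimage import binary_erosion

def propagate(slices, fg_scribble, bg_scribble, refine_every=10,
              refine_interactively=None):
    results = []
    for k, image in enumerate(slices):
        mask = segment(image, fg_scribble, bg_scribble)
        if refine_interactively is not None and k > 0 and k % refine_every == 0:
            # Periodic user refinement (Fig. 9 suggests roughly every 10 slices).
            mask, fg_scribble, bg_scribble = refine_interactively(image, mask)
        else:
            # Model update: the eroded interior/exterior of the current result
            # act as pseudo-scribbles for the next slice (the "+" variants).
            fg_scribble = binary_erosion(mask, iterations=3)
            bg_scribble = binary_erosion(~mask, iterations=3)
        results.append(mask)
    return results
```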
Fig. 10. Examples of CT volume segmentation.
Fig. 11. Examples of ultrasound video segmentation.