Advancing 3D Biomedical Image Segmentation Through Deep Learning

Volumetric data in biomedical analysis presents unique challenges, particularly for annotation. Annotating 3D structures slice by slice on a 2D screen is time-consuming and inefficient, especially given the redundancy between neighboring slices, and this inefficiency becomes a significant bottleneck for deep learning approaches that depend on large annotated datasets. Full annotation of 3D volumes is therefore rarely a practical route to the large, diverse training sets these models need for robust generalization.

This paper introduces a deep network for dense volumetric segmentation that requires only sparsely annotated 2D slices for training. As illustrated in Fig. 1, the approach supports two application scenarios. The first is to densify a sparsely annotated dataset by generating dense segmentations of the same volumes. The second is to learn from multiple sparsely annotated datasets and generalize to new, unseen volumes. Both scenarios are highly relevant for biomedical image analysis.

Our network architecture builds on the established U-Net framework. U-Net has an encoder-decoder structure: a contracting path captures image context, while an expanding path enables precise localization of the segmentation output [11]. The original U-Net operates in 2D; our contribution is to extend it to volumetric data. The network processes 3D volumes directly, using 3D convolutions, 3D max pooling, and 3D up-convolutional layers. To improve training, we incorporate batch normalization [4] for faster convergence and avoid architectural bottlenecks [13]. These choices are crucial for effective learning on complex 3D biomedical data.
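
The following is a minimal PyTorch sketch of one analysis-path (encoder) block with the ingredients named above: 3D convolutions, batch normalization, ReLU, and 3D max pooling. It is not the authors' Caffe implementation; the class name, channel counts, and padding choice are illustrative assumptions.

```python
# Minimal sketch of one 3D U-Net analysis-path block (assumption: PyTorch, not
# the authors' Caffe implementation; channel counts and padding are illustrative).
import torch
import torch.nn as nn

class AnalysisBlock3D(nn.Module):
    """Two 3x3x3 convolutions, each followed by batch normalization and ReLU,
    then 2x2x2 max pooling that halves every spatial dimension."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),   # batch normalization for faster convergence
            nn.ReLU(inplace=True),
            nn.Conv3d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor):
        features = self.convs(x)   # kept as the skip connection to the expanding path
        return features, self.pool(features)

# Example: a single-channel 64^3 patch, batch size 1.
features, downsampled = AnalysisBlock3D(1, 32)(torch.randn(1, 1, 64, 64, 64))
print(features.shape, downsampled.shape)  # (1, 32, 64, 64, 64) and (1, 32, 32, 32, 32)
```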

A key advantage in many biomedical applications is that good generalization can be achieved from relatively small training sets, because each image contains repetitive structures with comparatively little variation. Volumetric images amplify this effect, allowing us to train the network on only a few volumes and still generalize to new cases. To make training with sparse annotations possible, we employ a weighted loss function together with targeted data augmentation, enabling robust learning from sparsely annotated training data.
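
As a hedged illustration of training on sparse annotations, the sketch below masks out unlabeled voxels with a weighted voxel-wise cross-entropy loss. The convention that label -1 marks an unlabeled voxel, and the per-class weight vector, are assumptions for this example, not details taken from the text above.

```python
# Sketch of a weighted voxel-wise cross-entropy loss for sparsely annotated volumes.
# Assumption: label -1 marks "unlabeled" voxels; these receive zero weight so only
# the annotated slices drive the gradients.
import torch
import torch.nn.functional as F

def sparse_weighted_loss(logits: torch.Tensor,
                         labels: torch.Tensor,
                         class_weights: torch.Tensor) -> torch.Tensor:
    """logits: (N, C, D, H, W) raw scores; labels: (N, D, H, W) with -1 = unlabeled."""
    labeled = labels >= 0
    # Per-voxel cross entropy; clamp the -1 placeholders to a valid index first,
    # then zero out their contribution with the mask below.
    per_voxel = F.cross_entropy(logits, labels.clamp(min=0),
                                weight=class_weights, reduction="none")
    per_voxel = per_voxel * labeled.float()
    # Normalize by the number of labeled voxels so sparser annotation
    # does not shrink the loss magnitude.
    return per_voxel.sum() / labeled.float().sum().clamp(min=1)

# Example: 2-class problem on a 32^3 patch with a single annotated slice.
logits = torch.randn(1, 2, 32, 32, 32, requires_grad=True)
labels = torch.full((1, 32, 32, 32), -1, dtype=torch.long)
labels[0, 16] = torch.randint(0, 2, (32, 32))   # one annotated slice
loss = sparse_weighted_loss(logits, labels, torch.tensor([1.0, 3.0]))
loss.backward()
```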

We applied this method to a challenging confocal microscopy dataset of the Xenopus kidney, an organ whose complex developmental structure [7] limits the effectiveness of traditional parametric models. Qualitative results demonstrate the high quality of dense segmentations generated from sparse annotations, quantitative evaluations support these findings, and further experiments show how the number of annotated slices affects network performance. Our Caffe-based [5] network implementation is publicly available as open source (Footnote 1), encouraging further research and application of this approach.

1.1 Related Work in Deep Learning for Image Segmentation

Convolutional Neural Networks (CNNs) have revolutionized 2D biomedical image segmentation, achieving near-human accuracy [3, 11, 12]. This success has spurred efforts to apply 3D CNNs to volumetric biomedical data. Milletari et al. [9] proposed a CNN with Hough voting for 3D segmentation, but this method is not end-to-end and is limited to blob-like structures. Kleesiek et al. [6] presented an end-to-end 3D CNN, but its shallow architecture restricts its ability to analyze multi-scale structures.

Our work builds on the 2D U-Net [11], which has achieved top performance in international segmentation competitions. U-Net’s architecture and data augmentation strategies enable robust generalization from limited annotated data by exploiting biologically plausible transformations and deformations. Up-convolutional architectures such as Fully Convolutional Networks [8] and U-Net, while powerful, are not yet widely used in 3D; Tran et al. [14] applied a related architecture to videos, but with full annotation. The key contributions of this paper are demonstrating effective learning from sparsely annotated volumes and processing arbitrarily large volumes through seamless tiling, which together help overcome the annotation bottleneck in 3D biomedical image segmentation.
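
To illustrate the tiling idea, below is a minimal sketch that runs a trained network patch by patch over an arbitrarily large volume. The tile size, overlap, and the `net` callable are placeholders for this example; the paper’s seamless-tiling scheme is not reproduced here.

```python
# Illustrative patch-wise processing of an arbitrarily large volume (a sketch, not
# the paper's seamless-tiling implementation; tile size and overlap are assumptions).
import numpy as np
import torch

def segment_volume(volume: np.ndarray, net, tile: int = 64, overlap: int = 8) -> np.ndarray:
    """Run `net` on overlapping tiles of `volume` and stitch the label maps.
    Overlapping borders are simply overwritten by later tiles; border tiles may be
    smaller than `tile`, so `net` must accept variable-sized inputs (or pad patches).
    `net` maps a (1, 1, d, h, w) tensor to (1, C, d, h, w) class scores."""
    D, H, W = volume.shape
    out = np.zeros((D, H, W), dtype=np.int64)
    step = tile - overlap
    with torch.no_grad():
        for z in range(0, D, step):
            for y in range(0, H, step):
                for x in range(0, W, step):
                    z1, y1, x1 = min(z + tile, D), min(y + tile, H), min(x + tile, W)
                    patch = torch.as_tensor(volume[z:z1, y:y1, x:x1],
                                            dtype=torch.float32)[None, None]
                    labels = net(patch).argmax(dim=1)[0].cpu().numpy()
                    out[z:z1, y:y1, x:x1] = labels
    return out
```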


Fig. 1. Application scenarios for volumetric segmentation with the 3D U-Net.

References

3 Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: Proc. CVPR, pp. 447–456 (2015)

4 Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. CoRR abs/1502.03167 (2015)

5 Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proc. ACM MM, pp. 675–678 (2014)

6 Kleesiek, J., Urban, G., Hubert, A., Schwarz, D., Maier-Hein, K., Bendszus, M., Biller, A.: Deep MRI brain extraction: a 3D convolutional neural network for skull stripping. NeuroImage (2016)

7 Lienkamp, S., Ganner, A., Boehlke, C., Schmidt, T., Arnold, S.J., Schäfer, T., Romaker, D., Schuler, J., Hoff, S., Powelske, C., Eifler, A., Krönig, C., Bullerkotte, A., Nitschke, R., Kuehn, E.W., Kim, E., Burkhardt, H., Brox, T., Ronneberger, O., Gloy, J., Walz, G.: Inversin relays frizzled-8 signals to promote proximal pronephros development. PNAS 107(47), 20388–20393 (2010)

8 Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proc. CVPR, pp. 3431–3440 (2015)

9 Milletari, F., Ahmadi, S., Kroll, C., Plate, A., Rozanski, V.E., Maiostre, J., Levin, J., Dietrich, O., Ertl-Wagner, B., Bötzel, K., Navab, N.: Hough-CNN: deep learning for segmentation of deep brain regions in MRI and ultrasound. CoRR abs/1601.07014 (2016)

11 Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-24574-4_28

12 Seyedhosseini, M., Sajjadi, M., Tasdizen, T.: Image segmentation with cascaded hierarchical models and logistic disjunctive normal networks. In: Proc. ICCV, pp. 2168–2175 (2013)

13 Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. CoRR abs/1512.00567 (2015)

14 Tran, D., Bourdev, L.D., Fergus, R., Torresani, L., Paluri, M.: Deep end2end voxel2voxel prediction. CoRR abs/1511.06681 (2015)

Footnote 1. Open-source Caffe implementation.
