Full length article| Volume 9, 100448, January 01, 2022

Evaluation of clinical applicability of automated liver parenchyma segmentation of multi-center magnetic resonance images

• Author Footnotes
1 Denotes equal contributions.
Open AccessPublished:November 02, 2022

Abstract

Purpose

Automated algorithms for liver parenchyma segmentation can be used to create patient-specific models (PSM) that assist clinicians in surgery planning. In this work, we analyze the clinical applicability of automated deep learning methods together with level set post-processing for liver segmentation in contrast-enhanced T1-weighted magnetic resonance images.

Methods

UNet variants with/without attention gate, multiple loss functions, and level set post-processing were used in the workflow. A multi-center, multi-vendor dataset from Oslo laparoscopic versus open liver resection for colorectal liver metastasis clinical trial is used in our study. The dataset of 150 volumes is divided as 81:25:25:19 corresponding to train:validation:test:clinical evaluation respectively. We evaluate the clinical use, time to edit automated segmentation, tumor regions, boundary leakage, and over-and-under segmentations of predictions.

Results

The deep learning algorithm shows a mean Dice score of 0.9696 in liver segmentation, and we also examined the potential of post-processing to improve the PSMs. The time to create clinical use segmentations of level set post-processed predictions shows a median time of 16 min which is 2 min less than deep learning inferences. The intra-observer variations between manually corrected deep learning and level set post-processed segmentations show a 3% variation in the Dice score. The clinical evaluation shows that 7 out of 19 cases of both deep learning and level set post-processed segmentations contain all required anatomy and pathology, and hence these results could be used without any manual corrections.

Conclusions

The level set post-processing reduces the time to create clinical standard segmentations, and over-and-under segmentations to a certain extent. The time advantage greatly supports clinicians to spend their valuable time with patients.

1. Introduction

Colorectal cancer is considered the third-most common cancer in the world and the second leading in mortality [
• Torre L.A.
• Bray F.
• Siegel R.L.
• Ferlay J.
• Lortet-Tieulent J.
• Jemal A.
Global cancer statistics, 2012.
,
• Bray F.
• Ferlay J.
• Soerjomataram I.
• Siegel R.L.
• Torre L.A.
• Jemal A.
Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.
,
• Keum N.
• Giovannucci E.
Global burden of colorectal cancer: emerging trends, risk factors and prevention strategies.
]. About 50% of the patients with colorectal cancer develops colorectal liver metastases [
• Zhou H.
• Liu Z.
• Wang Y.
• Wen X.
• Yuan L.
• Ran X.
• Xiong L.
• Ran Y.
• Chen W.
• Wen Y.
Colorectal liver metastasis: molecular mechanism and interventional therapy.
,
• Hu D.
• Pan Y.
• Chen G.
Colorectal cancer liver metastases: an update of treatment strategy and future perspectives.
], and resection of metastases is considered to be the curative option [
• Angelsen J.-H.
• Horn A.
• Sorbye H.
• Eide G.E.
• Løes I.M.
• Viste A.
Population-based study on resection rates and survival in patients with colorectal liver metastasis in Norway.
]. Though resection can be done both by open surgery and laparoscopic surgery, laparoscopic liver surgery has more advantages in postoperative benefits with reduced blood loss, shorter hospital stays, etc. [
• Witowski J.
• Rubinkiewicz M.
• Mizera M.
• Wysocki M.
• Gajewska N.
• Sitkowski M.
• Małczak P.
• Major P.
• Budzyński A.
• Pedziwiatr M.
Meta-analysis of short-and long-term outcomes after pure laparoscopic versus open liver surgery in hepatocellular carcinoma patients.
,
• Aghayan D.L.
• Fretland A.̊smund A.
• Kazaryan A.M.
• Sahakyan M.A.
• Dagenborg V.J.
• Bjørnbeth B.A.
• Flatmark K.
• Kristiansen R.
• Edwin B.
Laparoscopic versus open liver resection in the posterosuperior segments: a sub-group analysis from the oslo-comet randomized controlled trial.
,
• Ciria R.
• Gomez-Luque I.
• Ocana S.
• Cipriani F.
• Halls M.
• Briceno J.
• Okuda Y.
• Troisi R.
• Rotellar F.
• Soubrane O.
• et al.
A systematic review and meta-analysis comparing the short-and long-term outcomes for laparoscopic and open liver resections for hepatocellular carcinoma: updated results from the European Guidelines Meeting on Laparoscopic Liver Surgery, Southampton, UK, 2017.
]. Furthermore, laparoscopic parenchyma-sparing liver resection allows radical resection with maximum preservation of the liver that contributes to decreased risk of postoperative liver failure [
• Aghayan D.L.
• Pelanis E.
• Fretland A.A.
• Kazaryan A.M.
• Sahakyan M.A.
• Røsok B.I.
• Barkhatov L.
• Bjørnbeth B.A.
• Elle O.J.
• Edwin B.
Laparoscopic parenchyma-sparing liver resection for colorectal metastases.
]. This requires preparation with extensive planning related to the assessment of resection margin and segmentation of liver anatomy from medical images by the creation of patient-specific 3D models (PSM) [
• Schneider C.
• Allam M.
• Stoyanov D.
• Hawkes D.
• Gurusamy K.
• Davidson B.
Performance of image guided navigation in laparoscopic liver surgery - a systematic review.
,
• Eason D.
• Rea P.
• Watson A.J.M.
Development of a patient-specific 3D-printed liver model for preoperative planning.
]. Computed tomography (CT) is considered the choice for imaging, but magnetic resonance (MR) gained its popularity for improved lesion-to-liver contrast and non-ionizing radiation, etc. [
• Corwin M.T.
• Fananapazir G.
• Jin M.
• Lamba R.
• Bashir M.R.
Differences in liver imaging and reporting data system categorization between mri and ct.
,
• Alabousi M.
• McInnes M.D.
• Salameh J.-P.
• Satkunasingham J.
• Kagoma Y.K.
• Ruo L.
• Meyers B.M.
• Aziz T.
• van der Pol C.B.
Mri vs. ct for the detection of liver metastases in patients with pancreatic carcinoma: a comparative diagnostic test accuracy systematic review and meta-analysis.
,
• Mirpour S.
• Motaghi M.
• Hazhirkarzar B.
• Pawlik T.M.
• Kamel I.R.
Imaging of colorectal liver metastasis.
].
Artificial intelligence (AI) - based deep learning methods may reduce the workload of creating these PSMs by performing the tasks automatically and allowing the clinicians to interpret the segmentation with less time. Nowadays, with the availability of medical data and high computational power, there is increased usability of AI techniques in segmentation and surgery planning. Effective clinical utilization of AI-created PSMs still needs to be corrected to a certain extent and requires validation by clinicians. It is important to analyze the clinical relevancy of liver segmentation using automated algorithms, especially the time spend to create clinically acceptable segmentation, etc. [
• Lustberg T.
• van Soest J.
• Gooding M.
• Peressutti D.
• Aljabar P.
• van der Stoep J.
• van Elmpt W.
• Dekker A.
Clinical evaluation of atlas and deep learning based automatic contouring for lung cancer.
].
The convolutional neural network models are playing an important role in medical images [
• Lundervold A.S.
• Lundervold A.
An overview of deep learning in medical imaging focusing on MRI.
]. Xio Han [
• Han X.
MR-based synthetic CT generation using a deep convolutional neural network method.
,

X. Han, Automatic liver lesion segmentation using A deep convolutional neural network method, CoRR abs/1704.07239 (2017). arXiv:1704.07239.

] has used a novel deep convolutional neural network, and achieved a high score in Liver Tumor Segmentation (LiTS) challenge 2017. Christ et al. [

P.F. Christ, F. Ettlinger, F. Grün, M.E.A. Elshaera, J. Lipkova, S. Schlecht, F. Ahmaddy, S. Tatavarty, M. Bickel, P. Bilic, M. Rempfler, F. Hofmann, M.D. Anastasi, S.-A. Ahmadi, G. Kaissis, J. Holch, W. Sommer, R. Braren, V. Heinemann, B. Menze, Automatic Liver and Tumor Segmentation of CT and MRI Volumes using Cascaded Fully Convolutional Neural Networks (2017). arXiv:1702.05970.

] used a cascaded UNet model on diffusion weighted-MR images, and achieved 87% Dice score. Fu et al. [
• Fu Y.
• Mazur T.R.
• Wu X.
• Liu S.
• Chang X.
• Lu Y.
• Li H.H.
• Kim H.
• Roach M.C.
• Henke L.
• Yang D.
A novel MRI segmentation method using CNN-based correction network for MRI-guided adaptive radiotherapy.
] proposed a novel method using a trio of CNNs, and achieved a Dice score of 95.3% in the liver segmentation task. Valindria et al. [

V.V. Valindria, N. Pawlowski, M. Rajchl, I. Lavdas, E.O. Aboagye, A.G. Rockall, D. Rueckert, B. Glocker, Multi-modal learning from unpaired images: application to multi-organ segmentation in CT and MRI, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 547–556.10.1109/WACV.2018.00066.

] investigated using a dual-stream encoder-decoder architecture for abdominal multiple organ segmentation on unpaired CT and MR images, and achieved a Dice score of 94.3% in the liver. The effectiveness of UNet is proven in comparison with multi-atlas segmentation in T1-weighted 3D-SPGR images, Owler et al. [

J. Owler, B. Irving, G. Ridgeway, M. Wojciechowska, J. McGonigle, M. Brady, Comparison of multi-atlas segmentation and U-Net approaches for automated 3D liver delineation in MRI, in: Annual Conference on Medical Image Understanding and Analysis, Springer, 2019, pp.478–488.10.1007/978–3-030–39343-4_41.

] concluded that UNet approaches were more effective in obtaining a median Dice of 97% using 3D UNet architecture. Chen et al.[
• Chen Y.
• Ruan D.
• Xiao J.
• Wang L.
• Sun B.
• Saouaf R.
• Yang W.
• Li D.
• Fan Z.
Fully automated multiorgan segmentation in abdominal magnetic resonance imaging with deep neural networks.
] proposed a 2D UNet with densely connected blocks for multi-organ segmentation on multi-slice MR images, and obtained the Dice score of 96.3% in liver metrics. Couteaux et al. [

V. Couteaux, M. Trintignac, O. Nempont, G. Pizaine, A.S. Vlachomitrou, P.-J. Valette, L. Milot, I. Bloch, Comparing Deep Learning strategies for paired but unregistered multimodal segmentation of the liver in T1 and T2-weighted MRI (2021). arXiv:2101.06979.

] used single and multi-channel networks which involves a comparison of segmentation between paired-unregistered T1 and T2 weighted MR images using 3D-UNet architecture, and achieved Dice score 96.1% for T1 and 93.8% for T2 sequences respectively. Given the consideration of the potential impact of UNet in literature, we decided to use a variant of UNet with shape-based prior knowledge.
Despite the fact that the CNN model delivers results with higher accuracy, they are dependent on the training features learned by the model from the available dataset. On the other hand, shape-based techniques utilizing prior knowledge have promising results in the clinical domain. The level set method, as a part of the active contour model family, has evolved over the past 20 years with a higher impact on medical image segmentation [
• Göçeri E.
Fully automated liver segmentation using sobolev gradient-based level set evolution.
]. To address the problem of over-segmentation in MR images, Chen et al. [
• Chen G.
• Gu L.
• Qian L.
• Xu J.
An improved level set for liver segmentation and perfusion analysis in MRIs.
] proposed a method by employing a multiple initialization level set method achieving a true positive fraction of 95%. Middleton et al. [
• Middleton I.
• Damper R.I.
Segmentation of magnetic resonance images using a combination of neural networks and active contour models.
] described combining neural networks together with snake models to perform segmentation in MR images, and conclude that spatial inputs can contribute to marginal improvements in segmentation performance and poor segmentation can be improved by modifying active contour models. Using a region-based level set function, Omar Ibrahim Alirr [
• Alirr O.I.
Deep learning and level set approach for liver and tumor segmentation from CT scans.
] implemented a fully convolutional neural network for liver segmentation in CT images, and achieved a Dice score of 95.6% in the LiTS dataset. Shu et al. [
• Shu X.
• Yang Y.
• Wu B.
Adaptive segmentation model for liver CT images based on neural network and level set method.
] proposed a method for liver CT images based on a level set with energy functional including the data fitting term, the length term and the bound term, and achieved a Dice score of 97.57%. All these successful level set methods confirm their effectiveness in medical images and a potential domain of research.
There are many techniques based on automated or semi-automated methods for segmentations available nowadays. Further, how effectively these segmentations are utilized for visualization and surgical planning, the time required to modify clinically relevant 3D PSMs, prospectively clinician’s experience, etc., require more studies to implement 3D PSMs in everyday clinical routine [
• Coelho G.
• Rabelo N.N.
• Vieira E.
• Mendes K.
• Zagatto G.
• de Oliveira R.S.
• Raposo-Amaral C.E.
• Yoshida M.
• de Souza M.R.
• Fagundes C.F.
• Teixeira M.J.
• Figueiredo E.G.
Augmented reality and physical hybrid model simulation for preoperative planning of metopic craniosynostosis surgery.
,
• Palkovics D.
• Mangano F.G.
• Nagy K.
• et al.
Digital three-dimensional visualization of intrabony periodontal defects for regenerative surgical treatment planning.
,
• Singh GD S.M.
Virtual surgical planning: modeling from the present to the future.
]. In this work, we study the clinical relevancy of liver segmentation on multi-center MR images (T1-sequence) using a deep learning algorithm, and with level set post-processing. We also measure the time required to create clinically acceptable segmentation for deep learning, and deep learning initialized level set post-processed segmentations. The novelty of this article includes multi-center MR data, extensive parameter optimization of the level set method, and comprehensive clinical analysis.
This paper is organized as follows: in Section 2, a detailed description of the dataset, ground truth delineation criteria, proposed liver segmentation model with data pre- and post-processing particulars together with evaluation metrics are provided. In subsequent sections, numerical and subjective results of both deep learning and level set post-processing followed by a clinical discussion of these results are presented in Section 3. In Section 4, final observations of the results obtained, conclusions, and limitations of the proposed study are presented.

2. Materials and methods

2.1 Data acquisition

Our study includes contrast-enhanced T1-weighted MR images (150 volumes) gathered through Oslo laparoscopic versus open liver resection for colorectal liver metastases clinical trial study [
• Aghayan D.L.
• Fretland A.̊smund A.
• Kazaryan A.M.
• Sahakyan M.A.
• Dagenborg V.J.
• Bjørnbeth B.A.
• Flatmark K.
• Kristiansen R.
• Edwin B.
Laparoscopic versus open liver resection in the posterosuperior segments: a sub-group analysis from the oslo-comet randomized controlled trial.
,
• Ciria R.
• Gomez-Luque I.
• Ocana S.
• Cipriani F.
• Halls M.
• Briceno J.
• Okuda Y.
• Troisi R.
• Rotellar F.
• Soubrane O.
• et al.
A systematic review and meta-analysis comparing the short-and long-term outcomes for laparoscopic and open liver resections for hepatocellular carcinoma: updated results from the European Guidelines Meeting on Laparoscopic Liver Surgery, Southampton, UK, 2017.
,
• Fretland A.A.
• Kazaryan A.M.
• Bjørnbeth B.A.
• Flatmark K.
• Andersen M.H.
• Tønnessen T.I.
• Bjørnelv G.M.W.
• Fagerland M.W.
• Kristiansen R.
• Øyri K.
• et al.
Open versus laparoscopic liver resection for colorectal liver metastases (the Oslo-CoMet Study): study protocol for a randomized controlled trial.
,
• Aghayan D.L.
• Kazaryan A.M.
• Dagenborg V.J.
• Røsok B.I.
• Fagerland M.W.
• WaalerBjørnelv G.M.
• Kristiansen R.
• Flatmark K.
• Fretland A.A.
• Edwin B.
• et al.
Long-term oncologic outcomes after laparoscopic versus open resection for colorectal liver metastases: a randomized trial.
]. The study was performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
The database consists of medical images obtained from various hospitals in Norway, therefore was performed with both Siemens and Philips MRI machines on 7 different models. The T1-weighted MR images have been acquired using various protocols and combinations of Dixon, eThrive, Vibe with various different timings between contrast injection and imaging. Hence, the database includes images with a high level of variability. The image set has 150 anonymized MR image volumes and is converted into NIfTI format.

2.1.1 Ground truth segmentation

The ground truth manual segmentation creation for the dataset was performed by a medical doctor (over 4-year medical image post-processing experience) with 3D Slicer [
• Fedorov A.
• Beichel R.
• Kalpathy-Cramer J.
• Finet J.
• Fillion-Robin J.-C.
• Pujol S.
• Bauer C.
• Jennings D.
• Fennessy F.
• Sonka M.
• Buatti J.
• Aylward S.
• Miller J.V.
• Pieper S.
• Kikinis R.
3D Slicer as an image computing platform for the Quantitative Imaging Network.
]. In Slicer, several basic and advanced segmentation tools and techniques were used to make the ground truth. These are paint, erase, margin, grow from seed, smoothing, scissors, islands, and logical operators with a threshold to mention some. Delineation criteria are the following: Ground truth includes liver tissue as a whole organ including lesions, all liver vessels, and bile ducts adjoined to the liver parenchyma viewed on the axial plane. Segmentation does not include the gallbladder and the vena-cava is segmented on one additional axial slice cranially.

2.2 Image pre-processing and augmentations

The dataset of 131 volumes is taken out of 150 volumes, and divided into 81, 25, and 25 volumes for train, validation, and test dataset respectively. The remaining 19 volumes of the dataset were used for testing the clinical applicability of the deep learning segmented results. The volume dimensions are varying as 176–640, 168–560, 60–150, and voxel spacing varies as 0.56–1.95, 0.56–1.95, 1.6–3 mm in axial, coronal, and sagittal planes respectively of the dataset.
An input volume of the dataset contains the minimum voxel intensity zero and the maximum voxel value depending on each volume which is related to the reconstruction algorithm MR imaging scanner. The huge variations lead to difficulties in learning features during training and make converge slower. The min-max normalization scales the intensity between 0 and 1 which makes the better network learning, and gradient descent algorithm converge faster. We have used intensity scaling, contrast adjustment, Gaussian smoothness, Gaussian sharpening, flipping, rotation, and elastic 3D deformation as augmentations on the fly during the training of the network with a probability of 0.3. The parameter values in each augmentation choice are randomly chosen at each epoch. The flipping axis (random flip with respect to axis 0 or 1 or 2) and rotation (− 0.4 radian to +0.4 radians) are considered.

2.3 Network

In this study, a simple UNet variant model is used. We have considered UNet with 16 channels in the first layer (denoted by UNet16), UNet with 32 channels in the first layer (denoted UNet32), and an attention gate with each UNets model.
Fig. 1(a) explains the UNet architecture of convolution layers with stride 2 which helps to down-sample and up-sample spatial resolutions. The network consists of five levels with encoder-decoder layers. In the encoder, we represent the convolutional layers with 3 × 3 × 3 kernel with stride 2 as light gray layers. The pale white and light blue layers represent batch normalization and PReLU activation functions, respectively. The black layer represents the convolutional layers of 3 × 3 × 3 kernel with stride 1. The thick blue color blocks represent transpose convolutions of 3 × 3 × 3 kernel with stride 2, and the yellow color blocks represent concatenations. The dashed lines with arrows connect different layers of the network. We applied Argmax to extract one-channel network prediction.
We used the attention gate in the UNet16 and UNet32 decoder part of the network, shown as brown circles in Fig. 1(b). It consists of convolution with 1 × 1 × 1 kernel for both the skip connections and up-convolution channels followed by additive attention with relu activation function. Then a convolution with 1 × 1 × 1 kernel and sigmoid activation function is used. Finally, it has a 3D grid-attention re-sampler that makes a 3D volume.
For training the deep learning model, an Intel(R) Core (TM) i7–7700 K CPU 4.20 GHz (8 cores), 64 GB RAM, and NVIDIA Geforce GTX 1080 Ti - PCIE - 11 GB of Video RAM was used. The networks were trained using Novagrad optimizer with a learning rate of 0.001. The NovoGrad Optimizer is a stochastic normalized gradient descent algorithm that includes the information of layer-wise second moments, computes the first moment with gradients normalized with layer-wise second moments, and decouples weight decay. In comparison with the Adam optimizer, NovoGrad takes less memory and is more numerically stable [

B. Ginsburg, P. Castonguay, O. Hrinchuk, O. Kuchaiev, V. Lavrukhin, R. Leary, J. Li, H. Nguyen, Y. Zhang, J.M. Cohen, Training deep networks with stochastic gradient normalized by layerwise adaptive second moments (2020). arxiv.org/pdf/1905.11286.pdf.

].

2.4 Loss functions

The Dice loss, generalized Dice loss, and Tversky loss functions are used to analyze the network performance. The Dice similarity coefficient (DSC) is used to measure spatial overlap between manual segmentation and predicted image segmentation. If $pi$ and $pˆi$ are voxels of expected and predicted label class values respectively, the Dice loss (DL) function (based on the Dice similarity coefficient) can be written as
$DL=1−2∑i=1Npipˆi+ϵ∑i=1Npi2+∑i=1Npˆi2+ϵ,$
(1)

where ϵ = 10−6 and N takes all voxels of the manual segmentation [
• Sudre C.H.
• Li W.
• Vercauteren T.
• Ourselin S.
• Jorge Cardoso M.
Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations.
,

F. Milletari, N. Navab, S. Ahmadi, V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation, 2016. arXiv:1606.04797.

]. The generalized Dice loss (GDL) obtained from generalized Dice score (used to evaluate multi-class K label segmentation with a single score) can be utilized for training deep CNN models described by
$GDL=1−2∑l=1Kwl∑i=1Nplipˆli∑l=1Kwl∑i=1Npli+pˆli,$
(2)

where $wl=1∑n=1Npˆln$ is the weight [
• Sudre C.H.
• Li W.
• Vercauteren T.
• Ourselin S.
• Jorge Cardoso M.
Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations.
,

F. Milletari, N. Navab, S. Ahmadi, V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation, 2016. arXiv:1606.04797.

,
• Crum W.
• Camara O.
• Hill D.
Generalized overlap measures for evaluation and validation in medical image analysis.
].
The Tversky similarity Index (TI) is balancing false positive (FP) and false negative (FN), given more weighted FN than FP, defined as
$TI=TPTP+αFP+βFN,$
(3)

where α + β = 1 and β > α. Hence, these conditions leads to αβ ≰ 0.5. If αβ = 0.5, the Tversky similarity index becomes the Dice similarity score. The Tversky loss (TL) can be derived as TL = 1 − TI, where TI is the Tversky index [
• Erdogmus D.
• Gholipour A.
Tversky loss function for image segmentation using 3D fully convolutional deep networks.
]. In the case of α and β with the restriction β > α and α + β = 1, gives the parameter choices for (αβ) = {(0.4, 0.6), (0.3, 0.7), (0.2, 0.8), (0.1, 0.9)} (change in α and β are 0.1), represented Tversky losses as TL46, TL37, TL28, TL19.

2.5 Data post-processing

A level set algorithm as post-processing to refine the deep learning liver parenchyma segmentation was used. We consider the distance regularized level set evolution (DRLSE) which is a variant of the variational level set formulation [
• Li C.
• Xu C.
• Gui C.
• Fox M.D.
Distance regularized level set evolution and its application to image segmentation.
]. The DRLSE algorithm possesses a potential function that acts as a distance regularization term, and it helps the gradient magnitude of the level set function to a minimum point. In our case, we consider a double-well potential function that has two minima. Also, DRLSE algorithm has an external energy term, that helps to move the zero-level contour to the desired location. The gradient flow, diffusion equation, can be obtained by exterimizing the total energy integral [
• Li C.
• Xu C.
• Gui C.
• Fox M.D.
Distance regularized level set evolution and its application to image segmentation.
] as
$∂ϕ∂t=μdiv(dp(∣∇ϕ∣)∇ϕ)+λδϵ(ϕ)divg∇ϕ∣∇ϕ∣+αgδϵ(ϕ),$
(4)

where the initial level set function given by ϕ(x, 0), μdp(∣ ∇ ϕ∣) is the diffusion coefficient, δ is the Dirac delta function, λ > 0, $α∈R$, and g is an edge indicator function defined by $11+∣∇Gσ*I∣2$, Gσ is a Gaussian kernel with a standard deviation σ, and I is an image on a domain Ω. The α and λ are the coefficients of area and length terms. Using DRLSE algorithm we iteratively proceed to the steady solution by following the steepest descent direction.

2.6 Evaluation metrics

In general, the evaluation of segmentation results can be classified using a confusion matrix with true positive (TP), false positive (FP), false negative (FN), and true negative (TN). We used the following performance metric for the evaluation: The Dice similarity coefficient (DSC or Dice score) is a measure of the spatial overlap between the predicted segmentation and the manual segmentation written as
$DSC=2TP2TP + FP + FN.$
(5)

The Jaccard index (JAC) is defined as the intersection over the union of the predicted segmentation and the manual segmentation defined by
$JAC=TPTP+FP+FN.$
(6)

Over segmentation (OS) measures the overlapping of the prediction and the complement of label voxels in the ground truth over a union of ground truth and prediction defined by
$OS=2∣GT¯∩Pred∣∣GT∣+∣Pred∣,$
(7)

where GT is the ground truth, Pred represents the prediction, and $GT¯$ complement of a class in the manual segmentation [
• Vania M.
• Mureja D.
• Lee D.
Automatic spine segmentation from CT images using Convolutional Neural Network via redundant generation of class labels.
,
• Nainamalai V.
• Lippert M.
• Brun H.
• Elle O.J.
• Kumar R.P.
Local integration of deep learning for advanced visualization in congenital heart disease surgical planning.
]. Under segmentation (US) measures overlapping of the complement of the prediction ($Pred¯$) and the ground truth over the union of ground truth and prediction defined by
$US=2∣GT∩Pred¯∣∣GT∣+∣Pred∣.$
(8)

The Hausdorff Distance (HD) is a spatial-distance-based measure that is used to evaluate dissimilarity between two segmentation surfaces. We consider modified Hausdorff distance (95 percentile) to remove a small subset of outlier voxels. For two finite point sets A and B, the 95 percentile Hausdorff Distance can be defined as
$HD95(A,B)=max(h(A,B),h(A,B)),$
(9)

where $h(A,B)=maxa∈Aminb∈B∣∣a−b∣∣$ is the directed Hausdorff distance, the norm ∣∣. ∣∣ represents the Euclidean distance between the ground truth A and prediction B surfaces respectively [
• Kuijf H.J.
• Biesbroek J.M.
• De Bresser e. a.
Standardized assessment of automatic segmentation of white matter hyperintensities and results of the WMH segmentation challenge.
,
• Karimi D.
• Salcudean S.E.
Reducing the Hausdorff distance in medical image segmentation with convolutional neural networks.
].

2.7 Clinical evaluation

A representative set of 19 volumes of MR images were chosen and excluded from any other parts of this work to be used purely for clinical evaluation. The deep learning network model corresponds to the best Dice score of the test dataset to infer the 19 volumes of the clinical evaluation dataset were considered. The clinical evaluation set of MR images has similar characteristics and is balanced (various manufacturers, machine models, slice thickness, lesion localization, and sizes) as the one used for the deep learning network training, validation, and test data sets. We divide the clinical evaluation into two parts clinical usability and review, and segmentation correction.

2.7.1 Clinical usability and review

A couple of meetings, consisting of at least one radiologist and one surgeon, were gathered to review the deep learning results and level-set processed results. The 19 volumes of clinical evaluation MR images, deep learning results, and level set inferences were shown side by side to the blinded review team. The following were analyzed and carried out during the meeting: Firstly, the focus of the discussion was on the accuracy of 3D model, evaluating the liver surface, OS and US, and overall understanding of the liver. Secondly, a detailed examination conducted on the axial view of the MR image was performed along with (deep learning with and without level set) segmentation overlaid with high transparency. This procedure is to evaluate and focus on the segmentation of liver parenchyma, vessels, and lesions, and to identify locations of OS and US. Finally, for each case, a group decision had to be made if segmentations were complete and after future sub-classification of internal anatomical structures could be used for liver surgery planning without any manual corrections. We collected their comments throughout the meeting and finally summarized them into key observations.

2.7.2 Segmentation correction

Manual corrections were performed on the results of deep learning inferences and level set function results of 19 volumes of the clinical evaluation dataset. The segmentation corrections were performed using 3D Slicer with the help of various tools available, including addition, removal, island removal, filling holes, median smoothing, etc. The time required to make necessary changes to the segmentation was recorded from the first change until the review and acceptance of the result. Initially, the deep learning inferences were corrected manually to meet the defined standard of segmentation. Then, the level set function results were edited manually after a blindfold period of five weeks break. The metrics for deep learning (with and without level set) inferences were evaluated against manually corrected segmentation of (deep learning with and without level set) each set. In addition to this, we have estimated intra-observer variations between manually corrected deep learning and level set segmentation to measure clinically acceptable variations for this rator.

3. Results

3.1 Deep learning results

In this study, we considered a UNet variant and UNet with attention-gate network models. We set 300 epochs for training with early stopping as 25 epochs for all experiments, and hence the training will be stopped when there is no improvement in the learning. The parameters corresponding to the best Dice score of the validation dataset were stored, and it was used to infer the test dataset. Table 1 shows the evaluation of 25 volumes test dataset metrics of the UNet16 and UNet32 and attention gate models with different losses.
Table 1Average metric values of test dataset of 25 volumes using (i) UNet16 (first block) (ii) UNet32 (second block) (iii) UNet16 with attention gate (Third block) (iv) UNet32 with attention gate (fourth block), with different loss functions. Results from UNet32 with TL19 loss is accessed for level set experiments.
NetworkLossDSCJACOSUSHD95
UNet16DL0.9612 ± 0.01360.9255 ± 0.02480.0218 ± 0.02380.0559 ± 0.02423.6158 ± 3.5482
GDL0.9579 ± 0.01550.9197 ± 0.02770.0214 ± 0.03070.0628 ± 0.02293.7245 ± 3.6924
TL460.9656 ± 0.01040.9336 ± 0.01930.0161 ± 0.01620.0528 ± 0.01943.1974 ± 2.9255
TL370.968 ± 0.01070.9382 ± 0.01980.0216 ± 0.01860.0424 ± 0.01973.1383 ± 2.921
TL280.9668 ± 0.0150.936 ± 0.02690.0273 ± 0.02960.0392 ± 0.0193.5011 ± 3.4614
TL190.9688 ± 0.0110.9397 ± 0.02030.037 ± 0.0230.0254 ± 0.01432.9778 ± 1.8407
UNet 32DL0.9623 ± 0.0150.9277 ± 0.02690.022 ± 0.0280.0534 ± 0.01932.9746 ± 2.7522
GDL0.9627 ± 0.01250.9283 ± 0.02290.0187 ± 0.01720.0559 ± 0.02482.854 ± 1.9512
TL460.9683 ± 0.01240.9389 ± 0.02270.0241 ± 0.02240.0392 ± 0.01643.0974 ± 2.9941
TL370.9678 ± 0.01160.9378 ± 0.02140.0206 ± 0.01890.0439 ± 0.01812.857 ± 2.3179
TL280.9692 ± 0.01250.9404 ± 0.0230.0208 ± 0.01930.0409 ± 0.01742.6133 ± 2.4184
TL190.9696 ± 0.01180.9413 ± 0.02170.0311 ± 0.02360.0296 ± 0.01333.2787 ± 2.7023
Att. gateUNet16DL0.9358 ± 0.02590.8804 ± 0.0440.029 ± 0.05060.0994 ± 0.04096.7862 ± 7.0749
GDL0.9507 ± 0.01840.9065 ± 0.03270.0202 ± 0.02760.0784 ± 0.03595.6354 ± 4.9625
TL460.9507 ± 0.01830.9066 ± 0.03240.0273 ± 0.03610.0712 ± 0.03015.4712 ± 4.9985
TL370.9479 ± 0.02440.9018 ± 0.04200.0359 ± 0.04890.0684 ± 0.03666.2078 ± 6.9105
TL280.9343 ± 0.04210.8793 ± 0.06750.0883 ± 0.09160.0430 ± 0.021612.1457 ± 12.932
TL190.8911 ± 0.04910.8068 ± 0.07650.1697 ± 0.10650.0482 ± 0.020019.4041 ± 9.8893
Att. gateUNet32DL0.9438 ± 0.0150.894 ± 0.02660.0175 ± 0.02120.0949 ± 0.03215.3777 ± 4.2915
GDL0.9437 ± 0.01890.8939 ± 0.03340.0134 ± 0.01570.0992 ± 0.03865.2408 ± 3.9102
TL460.9352 ± 0.02140.879 ± 0.03740.0365 ± 0.04180.0932 ± 0.0388.0229 ± 6.3321
TL370.9400 ± 0.0180.8872 ± 0.03150.0364 ± 0.03780.0837 ± 0.03137.3583 ± 6.1879
TL280.9501 ± 0.02820.9062 ± 0.04700.0574 ± 0.05940.0423 ± 0.02366.6902 ± 7.1561
TL190.9254 ± 0.04000.8635 ± 0.06520.1106 ± 0.08750.0386 ± 0.020812.1541 ± 10.7049
For the UNet16, the TL19 show the higher Dice score, less under-segmentation, and less Hausdorff distance. The FP weight α = 0.4 and the FN weight β = 0.6 show the less OS than other α, β. For the UNet32, the TL19 show the highest Dice score of 0.9696, Jaccard, and less under segmentation. The Hausdorff distance is less for the TL28 model. Less OS has been observed with the generalized Dice loss model. The best scores in Table 1 are highlighted in bold letters.
We also investigated the performance of UNet16 and UNet32 with the attention gate in the encoder part of the neural network. We observed that UNet16 and attention gate with generalized Dice loss function and TL46 shows a higher Dice score of 0.9507. The UNet32 with attention gate and TL28 model shows higher Dice, and Jaccard scores. We observed that UNet32 with the TL19 loss, without attention gate, shows a higher Dice score. Hence, the inferences of the test dataset of 25 volumes from the TL19 network model are used further for the level set optimization.

3.2 Level set results

The test dataset of 25 vol MR images was considered as images for the level set optimization. The test dataset segmentation corresponding to the best Dice score of the deep learning model UNet32 with loss function TL19 was used as initial level set functions. Our main focus in this work lies in the optimization of parameters of the partial differential equation (4). The choice of values of these parameters [
• Li C.
• Xu C.
• Gui C.
• Fox M.D.
Distance regularized level set evolution and its application to image segmentation.
] used as the base values for our optimization. In this work, we consider the time step Δt = 1, the Dirac delta function parameter ϵ = 0.2, and the Gaussian kernel parameter σ = 0.2. To observe the effect of parameter selection in terms of Dice, accuracy, OS, and US, we experimented with several parameter choices. The inner and outer iteration spatial evolution considered, as an ordered pair, (xx) such that x ∈ {5, 15, …, 65}. The value of α = [ − 3, − 7] and λ = [3,7] with varying steps Δα = − 1 and Δλ = 1. Our initial level set function which is obtained from deep learning segmentation lies within the liver domain, hence we considered negative α values.
First, we have conducted level set post-processing experiments on slices in axial, sagittal, and coronal planes independently with the same parameters. We noticed that the axial plane shows a higher Dice score than other planes. Then, we used axial plane images for the level set optimization. We observed that α = − 5, λ = 5, inner iterations = 45 and outer iterations = 25 shows optimum results. In the level set experiments, we obtained the best Dice score of 0.9743 and the lowest Dice score of 0.9231, shown in Table 2. In deep learning-based segmentation, we obtained the best Dice score of 0.9824 and the lowest Dice score of 0.9241. The best and lowest Dice score volumes from the test dataset were compared with the ground truth and level set algorithm results are shown in Figs. 2 and 3. The level set post-processing shows less OS than deep learning results, and other metrics are inferior to the deep learning scores, shown in Fig. 4. We also removed 5 voxel cubes around the edge surface of liver parenchyma of deep learning and level set post-processed segmentations, denoted AI5 and LV5. We expect that border removal can reveal more information about OS and US. Fig. 4 shows the validation results of AI5 and LV5 evaluated against AI, and LV results. The dice and US scores had a greater impact, but less with OS and HD scores.
Table 2Comparison of test dataset of 25 vol metrics of deep learning prediction and level set processed deep learning segmentation scores.
Deep learning segmentation scoresLevel set post-processed segmentation scores
VolIDDiceJACOSUSHD95DiceJACOSUSHD95
10.98120.96320.01260.02491.410.96160.9260.00910.06781.73
20.97120.9440.00940.048350.97430.94980.01510.03645.1
30.97320.94780.01030.04339.850.96930.94040.00950.05212.37
40.98240.96540.00570.029510.96140.92570.00540.07181.41
50.9750.95120.02630.023720.96560.93350.02240.04642
60.97970.96020.02340.017220.97310.94760.02430.02952.45
70.97370.94880.0340.018620.97040.94260.04060.01852
80.9810.96270.00660.031410.95520.91430.00530.08431.73
90.970.94180.02710.032850.95850.92030.04160.04148.06
100.95010.9050.05410.04567.140.94520.89610.05840.05127.14
110.97450.95030.01410.036920.9640.93060.02650.04545
120.96680.93570.00510.06142.240.93410.87640.00450.12732.24
130.97270.94680.02690.027830.96150.92590.0230.0542.45
140.96990.94150.03640.02392.240.95730.9180.03120.05432.24
150.97150.94460.03110.025930.96280.92830.02530.04913
160.96270.92810.03590.038730.9550.91380.03120.05893
170.96350.92950.01840.05472.450.94160.88960.0150.10193
180.9690.93980.04610.01620.96020.92340.03890.04072
190.96920.94030.04920.012430.96090.92480.040.03822.24
200.96940.94060.04850.01273.160.96010.92330.04010.03973
210.97350.94840.02840.02461.410.96340.92940.02340.04981.73
220.96880.93950.04620.01622.240.96260.9280.04250.03222
230.96590.9340.05270.01562.830.95990.9230.04870.03142.24
240.92410.8590.11590.0359120.92310.85720.11290.04111
250.98220.9650.01270.022910.9510.90660.00580.09211.73
mean0.96960.94130.03110.02963.27870.95810.91980.02960.05423.6343
STD0.01180.02170.02360.01332.70230.01170.02130.0230.02492.9495

3.3 Study of clinical use of results

3.3.1 Clinical evaluation

The clinical evaluation dataset, a total of 19 clinical cases of MR images, and its deep learning and level set post-processed deep learning segmentations were presented to the review team. During the meeting, participants requested to view 3D liver models colored red on a black background rather than a bright green, which made some of the surfaces less appealing. Examinations of 3D deep learning and level set post-processed segmentations show a large OS that concurs with surrounding organs. These types of segmentations are considered bad and distracting segmentations, and not suitable for clinical use without manual editing, shown in Figs. 5(a) and 5(c). In the overlay of segmentations on MR images, the axial plane confirmed various segmentation leaks into nearby organs such as the heart, spleen, stomach, and right kidney, shown in Figs. 5(e) and 5(g). The larger indents and holes on the liver surface were associated with under-segmentation (Figs. 5(m) and 5(o)), and most likely a missed lesion, although needed confirmation by viewing segmentation overlaid on an axial plane.
The participants from the clinical analysis meeting commented that some of the deep learning segmentations look more natural with a bit of a rough surface compared to the overly smooth surface of level set processed 3D models. In other cases, some larger granularity on the liver surface was removed through level set post-processing and improved liver surface. On the axial plane, determining edges of the liver parenchyma varied in quality between cases, and between deep learning inferences and level set post-processed segmentation although no significant and systematic difference has been observed.
Clinically most significant segmentation flaws were fully or partly missed lesions which were automatically categorized as incomplete segmentation and required corrections to include the whole lesion. In certain cases, the vast majority of the liver segmentations were perfect although part of the lesion has been missed. Therefore it requires some local correction for clinical acceptance or standard segmentations. In seven cases (37%), deep learning and level set processed deep learning segmentations possess all required anatomical structures as well as pathology. These liver segmentations have some OS and US though a clear understandable liver shape included necessary vessels and lesions. Therefore after manual, semi-automatic, or even automatic separation of lesions and liver vessels inside of these segmentations, these inferences could be used for surgery planning without any modifications.

3.3.2 Segmentation correction time

The deep learning inferences of the clinical evaluation dataset of 19 volumes were considered as initial segmentations and corrected manually to create clinical standard segmentations (AI-GT). The median time to complete a segmentation was 18 min (inter-quartile range: 13:18–22:46). After five weeks, the same cases of level set post-processed deep learning segmentations (LS) were corrected manually (LS-GT) with a median time was 16 min (inter-quartile range: 12:47–21:56).
To evaluate the segmentation quality of the proposed method, we evaluated the results of both deep learning and level set results against AI-GT and LS-GT. Table 3 shows the impact of these corrections in terms of evaluation metrics values. The values for (AI-GT)+ (LS-GT) are obtained using “logical or” operation, and (AI-GT). (LS-GT) is obtained using Hadamard products of AI-GT and LS-GT. Intra-observer variation was measured to be about 3% in the Dice score between AI-GT and LS-GT. The HD95 is 1.6 mm for the AI-GT versus LS-GT which is ten times lesser than automatic method predictions and their corresponding manual corrections.
Table 3Comparison of clinical evaluation dataset of 19 volumes with manually corrected AI and LS results.
EvaluationsDiceJACOSUSHD95
AI vs AI-GT0.9540 ± 0.05610.917 ± 0.09580.0622 ± 0.10150.0297 ± 0.041314.5142 ± 20.6775
AI vs (AI-GT).(LS-GT)0.9392 ± 0.05860.8905 ± 0.09720.0975 ± 0.10920.0241 ± 0.038514.7683 ± 21.0661
AI vs (AI-GT)+ (LS-GT)0.9497 ± 0.05400.9087 ± 0.09190.0589 ± 0.09820.0417 ± 0.043114.5266 ± 20.3651
LS vs LS-GT0.9490 ± 0.05780.9081 ± 0.09810.0644 ± 0.10310.0376 ± 0.044315.0082 ± 20.6430
LS vs (AI-GT).(LS-GT)0.9483 ± 0.05770.9068 ± 0.09770.0731 ± 0.10460.0303 ± 0.041215.1672 ± 21.0178
LS vs (AI-GT)+ (LS-GT)0.9349 ± 0.05490.8822 ± 0.09170.0584 ± 0.09550.0718 ± 0.046815.0700 ± 20.2732
AI-GT vs LS-GT0.9718 ± 0.00690.9453 ± 0.0130N/AN/A1.5573 ± 0.3267

4. Discussion

Image-guided surgery systems have proven efficiency for navigation and excision in laparoscopic liver resection surgery planning along with the help of the latest techniques in segmentation, 3D model creation, registration, and tracking [
• Schneider C.
• Allam M.
• Stoyanov D.
• Hawkes D.
• Gurusamy K.
• Davidson B.
Performance of image guided navigation in laparoscopic liver surgery - a systematic review.
,
• Eason D.
• Rea P.
• Watson A.J.M.
Development of a patient-specific 3D-printed liver model for preoperative planning.
]. To create segmentations we presented an approach for the automated segmentation of liver parenchyma in T1-weighted MR images using a deep learning algorithm with and without level set post-processing. Both the deep learning and level set results were used for the clinical validation study to analyze the impact of these automated methods.
We used UNet variants with/without attention gate and obtained optimal outcomes for the T1 sequences of MR images. The literature shows that the neural network model with attention gate learns the target structures of varying shapes and sizes, and better training convergence, etc. [
• Schlemper J.
• Oktay O.
• Schaap M.
• Heinrich M.
• Kainz B.
• Glocker B.
• Rueckert D.
Attention gated networks: learning to leverage salient regions in medical images.
,
• Yeung M.
• Sala E.
• Schönlieb C.-B.
• Rundo L.
Focus U-Net: a novel dual attention-gated CNN for polyp segmentation during colonoscopy.
]. The UNet32 model without attention gate shows highest Dice score of 0.9696 for Tversky loss with α = 0.1 and β = 0.9. On the other hand, Salehi et al. [
• Salehi S.S.M.
• Erdogmus D.
• Gholipour A.
Tversky loss function for image segmentation using 3D fully convolutional deep networks.
] showed Tversky loss with α = 0.3 and β = 0.7 shows better result in their study. It is one of the important predictions from our experiment that there is a possibility to obtain a better performance depending on the choice of parameter choices in Tversky loss, the dataset, the network, etc.
In the level set experiment, we initiated our experiments with the base values from Li et al. [
• Li C.
• Xu C.
• Gui C.
• Fox M.D.
Distance regularized level set evolution and its application to image segmentation.
] and experimented on the inner and outer iteration counts for level set evolution with different α and λ values. If the boundary appears to be with weak edges, a higher α value may cause leakage. The liver and its other neighboring organs such as the heart, blood vessels, etc., can have hyper-intense values due to contrast injection that may cause weak edges. Though it is difficult to determine generalized values of the control parameters that yields optimal segmentation result, we made an effort to determine them by conducting experiments with many possible parameter choices and evaluating metric scores.
In Fig. 2, volume 4 in the test dataset represents both the AI and LS segmentation results that appear to be as close as the ground truth. Also, the image slices corresponding to the third row in Fig. 2 the show effectiveness of the level set in the case of the US. In comparison (yellow colored circles) with the ground truth image (Fig. 2(c)), the deep learning algorithm failed to segment some pixels in the boundary (Fig. 2(g)), and the level set algorithm correctly segment those missed pixels by AI (Fig. 2(k)). We have also observed that the level set fails if there is less connectivity in pixels. For example, the ground truth (Fig. 2(d)) shows a weak connectivity region, the deep learning result (Fig. 2(h)) loses its connection, and the level set result (Fig. 2(l)) appears to be completely lost (yellow colored circles).
In Fig. 3, volume 24 in the test dataset represents deep learning, and the level set result shows OS with leakage into the boundary and connecting region (yellow-colored circles). The metric scores of volume 24 in the test dataset (Table 2) show that the level set achieved lower OS, and Dice scores compared to deep learning results. Contrary to our expectations, the use of the level set method did not enhance the Dice score performance of deep learning results. The level set parameters were predetermined in this work to have an automatic process, but a semi-automatic level set might have given better results.
In general, leakage in boundary pixels is predominant and this drives less attention toward examining the inner pixel accuracy. We applied 5 × 5 × 5 and 3 × 3 × 3 voxels border removal to assess the effectiveness of the used AI and level set algorithms in terms of OS and US. Thus border removal helps to asses directly the most important inner regions. Fig. 4 shows the comparison of metrics distribution for AI, LS, AI5, and LS5. This approach enhanced the metrics of OS and US. The substantial impact could be seen in Dice and US scores, whereas the OS and HD remain less affected.
A visual inspection of AI or LS inferences overlaid with the ground truth can explain the over and under segmentations, for which we considered volume number 2 of the test dataset where both AI and level set algorithms failed to predict tumor region. The visual inspection shows that the contours of AI and LS inferences are consistent with ground truth in some slices as shown in Fig. 6(a). However, the network fails to predict the whole large lesion region, and major corrections are still needed after level set post-processing, shown in Fig. 6(b). The analysis of results from the same volume on multiple slices shows that our method fails to classify tumor regions that appear at the periphery of the liver, as shown in Fig. 6(c).
The visual analysis by clinicians The analysis of segmented 3D models from clinical evaluation shows that both the deep learning and level set segmentation required manual correction in 12 cases. The clinical acceptance of both deep learning and level set results varied between cases but overall the clinicians suggest the practical usage of inference results, after additional anatomical structure separation, in surgical planning.
After five weeks (to avoid bias) of the manual corrections of deep learning predictions, the same clinician performed manual corrections on level set predictions. The variability was measured in terms of deep learning and level set results versus their corrections. We have observed an intra-observer variability of 3% between deep learning and level set results corrected segmentations. We also evaluated metrics on these corrected segmentations (see Table 3). The correction time required for level set predictions is 2 min median time less than the correction time of deep learning predictions. It is difficult to make direct comparisons with other works in literature since different approaches, datasets, and delineation criteria were used in various published studies.
The time analysis to create clinically acceptable segmentation by manual correction of AI and level set inferences were carried out using 19 volumes of MR images by an appropriately trained medical doctor. A comparison with fully manually segmented MR images would be of value though would be challenging on a such scale. Firstly, a group of trained radiologists or medical doctors need to create manual segmentation for the test set. Secondly, AI inferences of the same images need to be corrected manually. Finally, level set inferences of the same images need to be corrected manually. It is a three-fold work with the same test dataset, and it would include many hours for complete manual segmentation and correction. The current work focuses on AI segmentation of liver parenchyma and not the full organ, and it would be even more time-consuming. Therefore a smaller scale may be planned in the future with multiple labels (parenchyma, separated vessels, and potentially liver lesions). The complete workflow evaluation could also be tested in the future measuring time needed from image acquisition to full liver segmentation approved by a clinician. This would be very relevant to working towards the use of intra-operative CT for liver resection planning and volumetric analysis e.g. living donor.
In this work, we used 256 resampled volumes as the input to the network, and scores were measured with the original data shapes. Our proposed AI algorithm performs better in terms of evaluation scores compared to other methods. Using different networks with RAM and Video RAM memory constraints, loss functions, highly variable data, network design, contrast enhancement pre-processing, etc., could lead to better results. The use of a 2D-slice-based level set for post-processing results in a stacked visualization of slice segmentation in the 3D models, which could be managed by using the volume-based level set approach. Apart from this, the use of GPU computation instead of CPU-based level set segmentation could reduce the time taken. Also, a potential way to avoid intra-observer variations to a certain level is by involving multiple clinical experts to correct the segmentations, which is however difficult as more time needs to be spent by clinicians on segmentations.

5. Conclusion

In this paper, we have analyzed the clinical applicability of two automated (deep learning and level set) algorithms for liver parenchyma segmentation of contrast-enhanced T1-weighted magnetic resonance (MR) images. The dataset possesses a certain degree of variability, which includes two different MR machine manufacturers with seven models. We have achieved the highest dice score of 0.9696 when compared to state-of-the-art deep learning algorithms in liver parenchyma segmentation. We have proved that the Tversky loss achieves higher scores other than the standard parameters. Apart from this, the importance of this research work also includes an analysis of clinical acceptance of the created segmentations, 3D models, and the time to create clinically accepted segmentations. We addressed some of the segmentation difficulties including missing tumor regions, boundary leakage, and over-segmentation of AI-based liver parenchyma segmentation by combining the merits of the deep learning algorithm with a level set approach. Our experiments show that approximately one-third of the segmentations using automated algorithms can produce results without the need for manual correction. Further research directions in this work may include a complete segmentation with lesions, vascular structures, and clinical applicability with a use case for surgery resection study.

CRediT authorship contribution statement

Varatharajan Nainamalai: Conceptualization, Methodology, Software, Validation, Formal analysis, Resources, Investigation, Data curation, Visualization, Supervision, Writing - original draft, Writing - review & editing. Pravda Jith Ray Prasad: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Visualization, Writing - original draft, Writing - review & editing. Egidijus Pelanis: Conceptualization, Methodology, Software, Validation, Formal analysis, Resources, Investigation, Data curation, Writing – original draft, Visualization, Writing – review & editing. Bjørn Edwin: Data curation, Supervision, Project administration, Funding acquisition, Investigation, Methodology, Resources, Validation, Writing - review & editing. Fritz Albregtsen: Formal analysis, Conceptualization, Supervision, Writing - review & editing. Ole Jakob Elle: Supervision, Project administration, Funding acquisition, Resources, Writing - review & editing. Rahul P. Kumar: Conceptualization, Supervision, Project administration, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The author N. Varatharajan was supported by the Advanced Analytics on Smart Data research ( ANALYST ) project funded by the IKT Pluss programme of the Research Council of Norway under Grant 270922 . Authors Pravda Jith Ray Prasad and Egidijus Pelanis are funded by the project High-Performance soft tissue Navigation ( HiPerNav ). This project has received funding from the European Union’s Horizon 2020 Research and Innovation programme under Marie Skłodowska-Curie grant agreement No. 722068 . The author Rahul P. Kumar was supported by BIA - User-driven Research-based Innovation project HoloCare Cloud funded by the Research Council of Norway under Grant 296570 . The authors would like to express their gratitude to the clinicians at “The Intervention Centre”, Oslo University Hospital for participating in the clinical examination meeting.

Ethical statement

The study was performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

References

• Torre L.A.
• Bray F.
• Siegel R.L.
• Ferlay J.
• Lortet-Tieulent J.
• Jemal A.
Global cancer statistics, 2012.
CA Cancer J. Clin. 2015; 65: 87-108https://doi.org/10.3322/caac.21262
• Bray F.
• Ferlay J.
• Soerjomataram I.
• Siegel R.L.
• Torre L.A.
• Jemal A.
Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.
CA Cancer J. Clin. 2018; 68: 394-424https://doi.org/10.3322/caac.21492
• Keum N.
• Giovannucci E.
Global burden of colorectal cancer: emerging trends, risk factors and prevention strategies.
Nat. Rev. Gastroenterol. Hepatol. 2019; 16: 713-732https://doi.org/10.1038/s41575-019-0189-8
• Zhou H.
• Liu Z.
• Wang Y.
• Wen X.
• Yuan L.
• Ran X.
• Xiong L.
• Ran Y.
• Chen W.
• Wen Y.
Colorectal liver metastasis: molecular mechanism and interventional therapy.
Signal Transduct. Target Ther. 2022; 70https://doi.org/10.1038/s41392-022-00922-2
• Hu D.
• Pan Y.
• Chen G.
Colorectal cancer liver metastases: an update of treatment strategy and future perspectives.
Surg. Pract. Sci. 2021; 7100042https://doi.org/10.1016/j.sipas.2021.100042
• Angelsen J.-H.
• Horn A.
• Sorbye H.
• Eide G.E.
• Løes I.M.
• Viste A.
Population-based study on resection rates and survival in patients with colorectal liver metastasis in Norway.
Br. J. Surg. 2017; 104: 580-589https://doi.org/10.1002/bjs.10457
• Witowski J.
• Rubinkiewicz M.
• Mizera M.
• Wysocki M.
• Gajewska N.
• Sitkowski M.
• Małczak P.
• Major P.
• Budzyński A.
• Pedziwiatr M.
Meta-analysis of short-and long-term outcomes after pure laparoscopic versus open liver surgery in hepatocellular carcinoma patients.
Surg. Endosc. 2019; 33: 1491-1507https://doi.org/10.1007/s00464-018-6431-6
• Aghayan D.L.
• Fretland A.̊smund A.
• Kazaryan A.M.
• Sahakyan M.A.
• Dagenborg V.J.
• Bjørnbeth B.A.
• Flatmark K.
• Kristiansen R.
• Edwin B.
Laparoscopic versus open liver resection in the posterosuperior segments: a sub-group analysis from the oslo-comet randomized controlled trial.
HPB. 2019; 21: 1485-1490https://doi.org/10.1016/j.hpb.2019.03.358
• Ciria R.
• Gomez-Luque I.
• Ocana S.
• Cipriani F.
• Halls M.
• Briceno J.
• Okuda Y.
• Troisi R.
• Rotellar F.
• Soubrane O.
• et al.
A systematic review and meta-analysis comparing the short-and long-term outcomes for laparoscopic and open liver resections for hepatocellular carcinoma: updated results from the European Guidelines Meeting on Laparoscopic Liver Surgery, Southampton, UK, 2017.
Ann. Surg. Oncol. 2019; 26: 252-263https://doi.org/10.1245/s10434-018-6926-3
• Aghayan D.L.
• Pelanis E.
• Fretland A.A.
• Kazaryan A.M.
• Sahakyan M.A.
• Røsok B.I.
• Barkhatov L.
• Bjørnbeth B.A.
• Elle O.J.
• Edwin B.
Laparoscopic parenchyma-sparing liver resection for colorectal metastases.
• Schneider C.
• Allam M.
• Stoyanov D.
• Hawkes D.
• Gurusamy K.
• Davidson B.
Performance of image guided navigation in laparoscopic liver surgery - a systematic review.
Surg. Oncol. 2021; 38101637https://doi.org/10.1016/j.suronc.2021.101637
• Eason D.
• Rea P.
• Watson A.J.M.
Development of a patient-specific 3D-printed liver model for preoperative planning.
Surg. Innov. 2017; 24: 145-150https://doi.org/10.1177/1553350616689414
• Corwin M.T.
• Fananapazir G.
• Jin M.
• Lamba R.
• Bashir M.R.
Differences in liver imaging and reporting data system categorization between mri and ct.
Am. J. Roentgenol. 2016; 206: 307-312https://doi.org/10.2214/ajr.15.14788
• Alabousi M.
• McInnes M.D.
• Salameh J.-P.
• Satkunasingham J.
• Kagoma Y.K.
• Ruo L.
• Meyers B.M.
• Aziz T.
• van der Pol C.B.
Mri vs. ct for the detection of liver metastases in patients with pancreatic carcinoma: a comparative diagnostic test accuracy systematic review and meta-analysis.
J. Magn. Reson. Imaging. 2021; 53: 38-48https://doi.org/10.1002/jmri.27072
• Mirpour S.
• Motaghi M.
• Hazhirkarzar B.
• Pawlik T.M.
• Kamel I.R.
Imaging of colorectal liver metastasis.
J. Gastrointest. Surg. 2021; : 1-13https://doi.org/10.1007/s11605-021-05164-
• Lustberg T.
• van Soest J.
• Gooding M.
• Peressutti D.
• Aljabar P.
• van der Stoep J.
• van Elmpt W.
• Dekker A.
Clinical evaluation of atlas and deep learning based automatic contouring for lung cancer.
• Lundervold A.S.
• Lundervold A.
An overview of deep learning in medical imaging focusing on MRI.
Z. Med. Phys. 2019; 29: 102-127https://doi.org/10.1016/j.zemedi.2018.11.002
• Han X.
MR-based synthetic CT generation using a deep convolutional neural network method.
Med. Phys. 2017; 44: 1408-1419https://doi.org/10.1002/mp.12155
1. X. Han, Automatic liver lesion segmentation using A deep convolutional neural network method, CoRR abs/1704.07239 (2017). arXiv:1704.07239.

2. P.F. Christ, F. Ettlinger, F. Grün, M.E.A. Elshaera, J. Lipkova, S. Schlecht, F. Ahmaddy, S. Tatavarty, M. Bickel, P. Bilic, M. Rempfler, F. Hofmann, M.D. Anastasi, S.-A. Ahmadi, G. Kaissis, J. Holch, W. Sommer, R. Braren, V. Heinemann, B. Menze, Automatic Liver and Tumor Segmentation of CT and MRI Volumes using Cascaded Fully Convolutional Neural Networks (2017). arXiv:1702.05970.

• Fu Y.
• Mazur T.R.
• Wu X.
• Liu S.
• Chang X.
• Lu Y.
• Li H.H.
• Kim H.
• Roach M.C.
• Henke L.
• Yang D.
A novel MRI segmentation method using CNN-based correction network for MRI-guided adaptive radiotherapy.
Med. Phys. 2018; 45: 5129-5137https://doi.org/10.1002/mp.13221
3. V.V. Valindria, N. Pawlowski, M. Rajchl, I. Lavdas, E.O. Aboagye, A.G. Rockall, D. Rueckert, B. Glocker, Multi-modal learning from unpaired images: application to multi-organ segmentation in CT and MRI, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 547–556.10.1109/WACV.2018.00066.

4. J. Owler, B. Irving, G. Ridgeway, M. Wojciechowska, J. McGonigle, M. Brady, Comparison of multi-atlas segmentation and U-Net approaches for automated 3D liver delineation in MRI, in: Annual Conference on Medical Image Understanding and Analysis, Springer, 2019, pp.478–488.10.1007/978–3-030–39343-4_41.

• Chen Y.
• Ruan D.
• Xiao J.
• Wang L.
• Sun B.
• Saouaf R.
• Yang W.
• Li D.
• Fan Z.
Fully automated multiorgan segmentation in abdominal magnetic resonance imaging with deep neural networks.
Med. Phys. 2020; 47: 4971-4982https://doi.org/10.1002/mp.14429
5. V. Couteaux, M. Trintignac, O. Nempont, G. Pizaine, A.S. Vlachomitrou, P.-J. Valette, L. Milot, I. Bloch, Comparing Deep Learning strategies for paired but unregistered multimodal segmentation of the liver in T1 and T2-weighted MRI (2021). arXiv:2101.06979.

• Göçeri E.
Fully automated liver segmentation using sobolev gradient-based level set evolution.
Int. J. Numer. Methods Biomed. Eng. 2016; 32e02765https://doi.org/10.1002/cnm.2765
• Chen G.
• Gu L.
• Qian L.
• Xu J.
An improved level set for liver segmentation and perfusion analysis in MRIs.
Trans. Info Tech. Biomed. 2009; 13: 94-103https://doi.org/10.1109/TITB.2008.2007110
• Middleton I.
• Damper R.I.
Segmentation of magnetic resonance images using a combination of neural networks and active contour models.
Med. Eng. Phys. 2004; 26: 71-86https://doi.org/10.1016/S1350-4533(03)00137-1
• Alirr O.I.
Deep learning and level set approach for liver and tumor segmentation from CT scans.
J. Appl. Clin. Med. Phys. 2020; 21: 200-209https://doi.org/10.1002/acm2.13003
• Shu X.
• Yang Y.
• Wu B.
Adaptive segmentation model for liver CT images based on neural network and level set method.
Neurocomputing. 2021; 453: 438-452https://doi.org/10.1016/j.neucom.2021.01.081
• Coelho G.
• Rabelo N.N.
• Vieira E.
• Mendes K.
• Zagatto G.
• de Oliveira R.S.
• Raposo-Amaral C.E.
• Yoshida M.
• de Souza M.R.
• Fagundes C.F.
• Teixeira M.J.
• Figueiredo E.G.
Augmented reality and physical hybrid model simulation for preoperative planning of metopic craniosynostosis surgery.
Neurosurg. Focus. 2020; 48https://doi.org/10.3171/2019.12.FOCUS19854
• Palkovics D.
• Mangano F.G.
• Nagy K.
• et al.
Digital three-dimensional visualization of intrabony periodontal defects for regenerative surgical treatment planning.
BMC Oral Health. 2020; 20https://doi.org/10.1186/s12903-020-01342-w
• Singh GD S.M.
Virtual surgical planning: modeling from the present to the future.
J. Clin. Med. 2021; 30: 23https://doi.org/10.3390/jcm10235655
• Fretland A.A.
• Kazaryan A.M.
• Bjørnbeth B.A.
• Flatmark K.
• Andersen M.H.
• Tønnessen T.I.
• Bjørnelv G.M.W.
• Fagerland M.W.
• Kristiansen R.
• Øyri K.
• et al.
Open versus laparoscopic liver resection for colorectal liver metastases (the Oslo-CoMet Study): study protocol for a randomized controlled trial.
Trials. 2015; 16: 1-10https://doi.org/10.1186/s13063-015-0577-5
• Aghayan D.L.
• Kazaryan A.M.
• Dagenborg V.J.
• Røsok B.I.
• Fagerland M.W.
• WaalerBjørnelv G.M.
• Kristiansen R.
• Flatmark K.
• Fretland A.A.
• Edwin B.
• et al.
Long-term oncologic outcomes after laparoscopic versus open resection for colorectal liver metastases: a randomized trial.
Ann. Intern. Med. 2021; 174: 175-182https://doi.org/10.7326/M20-4011
• Fedorov A.
• Beichel R.
• Kalpathy-Cramer J.
• Finet J.
• Fillion-Robin J.-C.
• Pujol S.
• Bauer C.
• Jennings D.
• Fennessy F.
• Sonka M.
• Buatti J.
• Aylward S.
• Miller J.V.
• Pieper S.
• Kikinis R.
3D Slicer as an image computing platform for the Quantitative Imaging Network.
Magn. Reson. Imaging. 2012; 30: 1323-1341https://doi.org/10.1016/j.mri.2012.05.001
6. B. Ginsburg, P. Castonguay, O. Hrinchuk, O. Kuchaiev, V. Lavrukhin, R. Leary, J. Li, H. Nguyen, Y. Zhang, J.M. Cohen, Training deep networks with stochastic gradient normalized by layerwise adaptive second moments (2020). arxiv.org/pdf/1905.11286.pdf.

• Sudre C.H.
• Li W.
• Vercauteren T.
• Ourselin S.
• Jorge Cardoso M.
Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations.
in: Cardoso M.J. Arbel T. Carneiro G. Syeda-Mahmood T. Tavares J.M.R. Moradi M. Bradley A. Greenspan H. Papa J.P. Madabhushi A. Nascimento J.C. Cardoso J.S. Belagiannis V. Lu Z. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer International Publishing, Cham2017: 240-248https://doi.org/10.1007/978-3-319-67558-9_28 (pp)
7. F. Milletari, N. Navab, S. Ahmadi, V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation, 2016. arXiv:1606.04797.

• Crum W.
• Camara O.
• Hill D.
Generalized overlap measures for evaluation and validation in medical image analysis.
IEEE Trans. Med. Imaging. 2006; 25: 1451-1461https://doi.org/10.1109/TMI.2006.880587
• Erdogmus D.
• Gholipour A.
Tversky loss function for image segmentation using 3D fully convolutional deep networks.
Mach. Learn. Med. Imaging. 2017; : 379-387https://doi.org/10.1007/978-3-319-67389-9_44
• Li C.
• Xu C.
• Gui C.
• Fox M.D.
Distance regularized level set evolution and its application to image segmentation.
IEEE Trans. Image Process. 2010; 19: 3243-3254https://doi.org/10.1109/TIP.2010.2069690
• Vania M.
• Mureja D.
• Lee D.
Automatic spine segmentation from CT images using Convolutional Neural Network via redundant generation of class labels.
J. Comput. Des. Eng. 2019; 6: 224-232https://doi.org/10.1016/j.jcde.2018.05.002
• Nainamalai V.
• Lippert M.
• Brun H.
• Elle O.J.
• Kumar R.P.
Local integration of deep learning for advanced visualization in congenital heart disease surgical planning.
Intell. Based Med. 2022; 6100055https://doi.org/10.1016/j.ibmed.2022.100055
• Kuijf H.J.
• Biesbroek J.M.
• De Bresser e. a.
Standardized assessment of automatic segmentation of white matter hyperintensities and results of the WMH segmentation challenge.
IEEE Trans. Med. Imaging. 2019; 38: 2556-2568https://doi.org/10.1109/TMI.2019.2905770
• Karimi D.
• Salcudean S.E.
Reducing the Hausdorff distance in medical image segmentation with convolutional neural networks.
IEEE Trans. Med. Imaging. 2020; 39: 499-513https://doi.org/10.1109/TMI.2019.2930068
• Schlemper J.
• Oktay O.
• Schaap M.
• Heinrich M.
• Kainz B.
• Glocker B.
• Rueckert D.
Attention gated networks: learning to leverage salient regions in medical images.
Med. Image Anal. 2019; 53: 197-207https://doi.org/10.1016/j.media.2019.01.012
• Yeung M.
• Sala E.
• Schönlieb C.-B.
• Rundo L.
Focus U-Net: a novel dual attention-gated CNN for polyp segmentation during colonoscopy.
Comput. Biol. Med. 2021; 137104815https://doi.org/10.1016/j.compbiomed.2021.104815
• Salehi S.S.M.
• Erdogmus D.
• Gholipour A.
Tversky loss function for image segmentation using 3D fully convolutional deep networks.
in: Wang Q. Shi Y. Suk H.-I. Suzuki K. Machine Learning in Medical Imaging. Springer International Publishing, Cham2017: 379-387https://doi.org/10.1007/978-3-319-67389-9_44 (pp)