An Extensible Hierarchical Multimodal Semantic Segmentation Network for Underwater Scenarios
Graphical Abstract
Abstract
Semantic segmentation is renowned for its strong target-level scene understanding and has garnered significant attention in underwater scenarios. Traditional approaches to underwater image semantic segmentation remain far from ideal due to poor imaging quality. Recently, multimodal semantic segmentation techniques have shown promising results in non-underwater scenarios, yet they face challenges of information redundancy and local semantic loss. In this work, we propose an extensible hierarchical multimodal semantic segmentation network for underwater scenarios (E-HMSNet) to address these issues. Specifically, we devise a cross-modal mask complementation (CMC) strategy that uses partially visible blocks to reconstruct inter-modality associations, thereby eliminating information redundancy. We then introduce a cross-layer semantic supplementation (CSS) module, which learns to enhance feature correlations between adjacent layers and thus effectively compensates for local information loss. Additionally, we construct a simulated underwater multimodal semantic segmentation (UWS) dataset by leveraging the structural similarities between forward-looking sonar and optical imagery. Experimental results on the public PST900 dataset demonstrate that E-HMSNet surpasses state-of-the-art baseline methods by 2.87% in mIoU and 1.79% in mAcc. On the UWS dataset, E-HMSNet improves mIoU and mAcc by 2.78% and 3.98% respectively, demonstrating its stronger adaptability to underwater scenarios.
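To make the cross-modal mask complementation idea concrete, the following is a toy sketch (not the paper's actual implementation): two modality feature maps are partitioned into patches, some patches are masked out in one modality, and the hidden regions are filled from the other modality, so the fused map keeps complementary rather than redundant patches. All function names, patch sizes, and the NumPy formulation here are illustrative assumptions.

```python
import numpy as np

def cross_modal_mask_complement(feat_a, feat_b, patch=4, keep_ratio=0.5, seed=0):
    """Toy sketch of cross-modal mask complementation.

    Patches hidden ("masked") in modality A are filled from the aligned
    modality B, so each spatial patch of the fused map comes from exactly
    one modality and redundant duplicate information is avoided.
    """
    h, w = feat_a.shape
    assert feat_b.shape == (h, w) and h % patch == 0 and w % patch == 0
    rng = np.random.default_rng(seed)
    gh, gw = h // patch, w // patch
    # Random per-patch decision: True -> keep the patch from modality A
    keep_a = rng.random((gh, gw)) < keep_ratio
    # Expand the patch-level decision to a pixel-level boolean mask
    mask = np.kron(keep_a, np.ones((patch, patch), dtype=bool))
    return np.where(mask, feat_a, feat_b)

# Minimal usage example with constant maps so patch provenance is visible
a = np.zeros((8, 8))   # stand-in for optical features
b = np.ones((8, 8))    # stand-in for sonar features
fused = cross_modal_mask_complement(a, b)
```

In the real network the complementation would act on learned multi-scale feature tensors and the mask would be driven by training rather than a fixed random seed; this sketch only shows the patch-wise "take from one modality, complete from the other" mechanism.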