Ore image segmentation with linear deformable convolution and dual-domain synergistic dynamic attention

Journal of Computer Applications, 2026, Vol. 46, Issue (5): 1692-1702. DOI: 10.11772/j.issn.1001-9081.2025050645

Frontier and comprehensive applications

Jing HU¹, Shikun CHEN¹, Fang WANG¹, Rui ZHANG¹, Yong WANG²

1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan Shanxi 030024, China
2. Civil Explosives Engineering Branch of Shanxi Coking Coal Group Company Limited, Taiyuan Shanxi 030300, China

Received: 2025-06-12 Revised: 2025-08-10 Accepted: 2025-09-09 Online: 2025-09-25 Published: 2026-05-10
Contact: Shikun CHEN
About author: HU Jing, born in 1977, from Taiyuan, Shanxi, Ph. D., professor, CCF senior member. Her research interests include image processing and deep learning.
WANG Fang, born in 1989, from Taiyuan, Shanxi, M. S., lecturer. Her research interests include medical image segmentation.
ZHANG Rui, born in 1987, from Taiyuan, Shanxi, Ph. D., associate professor, CCF senior member. His research interests include intelligent information processing.
WANG Yong, born in 1984, from Wuhan, Hubei, engineer. His research interests include artificial intelligence and big data mining.
Supported by:
Shanxi Provincial Natural Science Foundation (202203021211189, 202403021221142); Enterprise Commissioned Horizontal Project (2021035)

Abstract:

To address the blurred boundaries and insufficient accuracy in ore image segmentation caused by complex textures, irregular shapes, and uneven illumination, a segmentation network based on Linear Deformable Convolution (LDConv) and dual-domain synergistic dynamic attention, named LDDA-Net (Linear Deformable Dual-domain Attention Network), was proposed. LDDA-Net adopts an encoder-decoder architecture. Firstly, in the serial dual-feature encoder, LDConv constructs an adaptive distribution of sampling points that flexibly fits the irregular shapes of ore, while its linear characteristic keeps the computational overhead under control. Secondly, a Dynamic Attention Modulation (DAM) module was designed for spatial-domain features; it dynamically focuses on and reinforces the key information in the feature map and the ore edges through pooling sampling, a learnable attention matrix, and a boundary-sensitive weight allocation mechanism. Finally, a new Dynamic Progressive Attention Guided Loss function (DPAG Loss) was proposed; by dynamically generating attention maps in multiple stages, it guides the model during training to focus on hard-to-segment regions such as blurred boundaries and small ore particles. Together with the DAM module, DPAG Loss forms a space-loss dual-domain synergy, creating a closed feedback loop between feature perception and learning strategy. Experimental results on the self-built open-pit ore dataset (OpenPitOre dataset) and the public ore dataset (Ore dataset) show that LDDA-Net achieves a 95th-percentile Hausdorff distance (HD95) boundary error of only 16.84 mm, 11.37% lower than that of the second-best model VM-Unet; its Dice coefficient reaches 91.54%, and its mean Intersection over Union (mIoU) and Pixel Accuracy (PA) are 85.13% and 94.10%, respectively, all significantly better than those of the compared segmentation models. LDDA-Net achieves high-precision, fine-grained segmentation in complex scenes, providing reliable technical support for intelligent detection and fragmentation analysis of ore in open-pit blasting.

Key words: ore image segmentation, semantic segmentation, Linear Deformable Convolution (LDConv), attention mechanism, boundary detection, dual-domain synergy
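
The abstract reports Dice, mIoU and PA without restating their definitions. As a minimal sketch, assuming binary ore/background masks stored as nested lists of 0/1 and the common textbook definitions, the overlap metrics can be computed as below. The paper's exact evaluation protocol (for example, how classes are averaged for mIoU) is not given on this page, and the function name `overlap_metrics` is hypothetical.

```python
def overlap_metrics(pred, target):
    """Return (dice, iou, pixel_accuracy) for two binary masks of equal shape.

    Textbook definitions for the foreground class; an assumption, not the
    paper's stated evaluation code.
    """
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # ore predicted, ore present
            elif p:
                fp += 1          # ore predicted, background present
            elif t:
                fn += 1          # background predicted, ore present
            else:
                tn += 1          # background predicted, background present
    total = tp + fp + fn + tn
    union = tp + fp + fn
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    iou = tp / union if union else 1.0
    pa = (tp + tn) / total if total else 1.0
    return dice, iou, pa


# Tiny 2x3 example masks (made up for illustration).
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou, pa = overlap_metrics(pred, target)
print(f"Dice={dice:.4f} IoU={iou:.4f} PA={pa:.4f}")
```

mIoU would then be the mean of such IoU values over the classes being evaluated.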


CLC Number:

TP391.4

Jing HU, Shikun CHEN, Fang WANG, Rui ZHANG, Yong WANG. Ore image segmentation with linear deformable convolution and dual-domain synergistic dynamic attention[J]. Journal of Computer Applications, 2026, 46(5): 1692-1702.


Model	Dice/%	HD95/mm	mIoU/%	P/%	R/%	PA/%
U-net	85.20	23.11	72.50	82.52	89.63	90.55
Swin-Unet	88.83	20.13	76.50	86.50	91.20	92.85
TransUnet	87.95	21.50	73.30	82.30	91.11	92.40
TransFuse	79.61	25.30	67.86	72.57	88.22	89.50
MALUNet	87.96	21.07	78.69	85.07	91.30	92.95
VM-Unet	90.06	19.00	82.59	89.86	92.29	93.60
ViT-CoMer	89.85	19.92	83.65	89.50	90.20	93.75
LDDA-Net	91.54	16.84	85.13	90.01	91.09	94.10
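
The 11.37% figure quoted in the abstract can be checked against the HD95 column above. The short calculation below is a sketch that simply copies the two HD95 values from the table and computes the relative reduction of LDDA-Net with respect to VM-Unet; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0


hd95_vm_unet = 19.00   # mm, VM-Unet row of the table
hd95_ldda_net = 16.84  # mm, LDDA-Net row of the table

reduction = relative_reduction(hd95_vm_unet, hd95_ldda_net)
print(f"HD95 reduction vs. VM-Unet: {reduction:.2f}%")
```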

BCE	Boundary	DPAG	Dice/%	HD95/mm	mIoU/%	PA/%
√			82.23	24.45	77.65	88.15
	√		81.85	23.92	76.50	87.80
√	√		83.63	22.02	78.23	89.30
√	√	√	85.81	20.15	80.74	90.85
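
The table above shows the effect of adding DPAG Loss on top of the BCE and boundary losses, but this page does not give its formula. To illustrate only the general idea of steering a loss toward hard pixels, the sketch below weights per-pixel binary cross-entropy by a hypothetical attention value that grows with the current prediction error. This is not the authors' DPAG Loss: the function name, the weighting rule `1 + alpha * |p - y|`, and the parameter `alpha` are all assumptions made for illustration.

```python
import math


def attention_weighted_bce(probs, labels, alpha=2.0, eps=1e-7):
    """Weighted mean BCE with per-pixel weights 1 + alpha * |p - y|.

    Illustrative only; the weighting rule is an assumption, not DPAG Loss.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight = 1.0 + alpha * abs(p - y)          # larger for poorly predicted pixels
        weighted_sum += weight * bce
        weight_total += weight
    return weighted_sum / weight_total


# Flattened predictions for four pixels; the third is a "hard" ore pixel.
probs = [0.9, 0.8, 0.4, 0.1]
labels = [1, 1, 1, 0]

plain = attention_weighted_bce(probs, labels, alpha=0.0)    # ordinary mean BCE
focused = attention_weighted_bce(probs, labels, alpha=2.0)  # hard pixel counts more
print(f"plain={plain:.4f} focused={focused:.4f}")
```

With `alpha=0` the function reduces to ordinary mean BCE; raising `alpha` shifts the loss toward the poorly predicted pixel, which is the kind of emphasis on blurred boundaries and small particles that the abstract describes.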

Method	Dice/%	HD95/mm	mIoU/%	Params/10⁶	GFLOPS
Conv=3×3, N=9	85.81	20.15	80.74	11.89	7.81
LDConv, N=5	83.20	23.76	77.23	9.49	6.26
LDConv, N=7	84.24	21.25	78.99	10.94	7.16
LDConv, N=9	88.75	18.75	83.42	12.38	8.10
DefConv=3×3, N=9	88.03	19.20	82.20	13.92	9.02
LDConv, N=11	90.12	18.94	84.12	15.83	11.63
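
The ablation above varies the number of sampling points N. As a rough sketch of why the abstract describes LDConv's cost as linear, the snippet below counts only kernel weights. It assumes (a detail not stated on this page) that a kernel with N sampling points holds N weights per input-output channel pair, whereas a square k×k kernel holds k². Offset-prediction layers and biases are ignored, and the channel sizes are made up for illustration.

```python
def square_kernel_weights(c_in, c_out, k):
    """Weights of a standard k x k convolution (grows quadratically in k)."""
    return c_in * c_out * k * k


def n_point_kernel_weights(c_in, c_out, n):
    """Weights of a kernel with n sampling points (grows linearly in n).

    Assumption for illustration: one weight per sampling point per channel pair.
    """
    return c_in * c_out * n


c_in, c_out = 64, 64  # hypothetical channel sizes

for n in (5, 7, 9, 11):
    print(f"N={n:2d} sampling points: {n_point_kernel_weights(c_in, c_out, n)} weights")

for k in (3, 5):
    print(f"{k}x{k} square kernel:   {square_kernel_weights(c_in, c_out, k)} weights")
```

Under this count, each additional sampling point adds the same number of weights, so intermediate sizes such as N=5 or N=7 are available between the square-kernel sizes 3×3 and 5×5.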

Model	Encoder: ResNetV2	Encoder: Transformer	Module: LDConv	Module: DAM	n_Skip	Dice/%	HD95/mm	mIoU/%	PA/%
M0	√				3	83.52	22.58	78.15	87.95
M1	√	√			3	85.81	20.15	80.74	90.85
M2	√	√	√		3	88.75	18.75	83.42	92.12
M3	√	√		√	3	87.21	17.99	82.20	91.50
M4	√		√	√	3	89.55	18.10	84.05	92.80
M5	√	√	√	√	1	89.80	18.20	83.90	90.84
M6	√	√	√	√	2	89.46	17.85	83.88	92.30
M7	√	√	√	√	3	91.54	16.84	85.13	94.10

Model	Dice/%	HD95/mm	mIoU/%	PA/%
LDDA-Net	88.85	17.37	86.67	93.30
U-net	80.07	23.51	77.05	88.74
HiFormer	84.13	19.95	82.58	90.05
Att U-net	83.22	20.84	80.36	89.82
HRNet	82.44	21.23	80.44	89.72
SegFormer	85.21	18.26	83.94	91.41
Mask2Former	86.52	17.25	83.29	90.16
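
The HD95 column in the tables is a boundary-distance metric. A minimal sketch of one common way to compute it is given below, assuming boundaries are supplied as non-empty lists of (row, col) pixel coordinates, a nearest-rank percentile, and the maximum of the two directed 95th-percentile distances. Percentile conventions and the pixel-to-millimetre factor used by the authors are not stated on this page, so `spacing` and all function names here are hypothetical.

```python
import math


def directed_distances(src, dst):
    """For each point in src, the distance to its nearest point in dst."""
    return [min(math.dist(a, b) for b in dst) for a in src]


def percentile(values, q):
    """Nearest-rank percentile (one of several conventions in use)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def hd95(boundary_a, boundary_b, spacing=1.0):
    """95th-percentile Hausdorff distance between two non-empty point sets.

    `spacing` converts pixel units to physical units (e.g. mm per pixel);
    its value is an assumption supplied by the caller.
    """
    d_ab = directed_distances(boundary_a, boundary_b)
    d_ba = directed_distances(boundary_b, boundary_a)
    return spacing * max(percentile(d_ab, 95), percentile(d_ba, 95))


# Two parallel 4-pixel boundary segments, one pixel apart (made-up data).
predicted = [(0, 0), (0, 1), (0, 2), (0, 3)]
reference = [(1, 0), (1, 1), (1, 2), (1, 3)]
print(hd95(predicted, reference))               # distance in pixels
print(hd95(predicted, reference, spacing=0.5))  # distance in mm at 0.5 mm/pixel
```

This brute-force version compares every pair of points, so it is only suitable for small boundaries; it is meant to show the definition, not to serve as an evaluation tool.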
