Image watermarking method combining attention mechanism and multi-scale feature

Journal of Computer Applications, 2025, 45(2): 616-623. DOI: 10.11772/j.issn.1001-9081.2024030282

Section: Multimedia computing and computer simulation

Tianqi ZHANG, Shuang TAN, Xiwen SHEN, Juan TANG

School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Received: 2024-03-18; Revised: 2024-06-20; Accepted: 2024-06-25; Online: 2024-10-14; Published: 2025-02-10
Contact: Shuang TAN
About the authors: ZHANG Tianqi, born in 1971, from Meishan, Sichuan, Ph. D., professor, CCF member. His research interests include modulation and demodulation of communication signals, and blind processing.
SHEN Xiwen, born in 2000, from Chuzhou, Anhui, M. S. candidate. His research interests include speech enhancement and speech signal processing.
TANG Juan, born in 2000, from Deyang, Sichuan, M. S. candidate. Her research interests include satellite spread-spectrum signal acquisition.
Supported by: Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0836)

Abstract

Deep learning-based watermarking methods do not sufficiently emphasize the key features of an image, and they do not make effective use of the features output by intermediate convolutional layers. To improve both the visual quality of watermarked images and their robustness against noise attacks, an image watermarking method combining an attention mechanism and multi-scale features was proposed. In the encoder, an attention module was designed to focus on important image features, thereby reducing the image distortion caused by watermark embedding; in the decoder, a multi-scale feature extraction module was designed to capture image details at different levels. Experimental results on the COCO dataset show that, compared with the deep watermarking model HiDDeN (Hiding Data with Deep Networks), the proposed method increases the Peak Signal-to-Noise Ratio (PSNR) and Structural SIMilarity (SSIM) of the generated watermarked images by 11.63% and 1.29%, respectively, and reduces the average Bit Error Rate (BER) of watermark extraction under dropout, cropout, crop, Gaussian blur, and JPEG compression by 53.85%. In addition, ablation experiments confirm that adding the attention module and the multi-scale feature extraction module improves both invisibility and robustness.
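The relative changes quoted in the abstract can be checked against the HiDDeN and proposed-method rows of Tab. 2 (PSNR 30.88 dB vs 34.47 dB, SSIM 96.65% vs 97.90%) and Tab. 3 (average BER 0.13 vs 0.06). A minimal arithmetic check; the helper name `relative_change` is ours, not the paper's.

```python
def relative_change(baseline: float, proposed: float) -> float:
    """Signed relative change from baseline to proposed, in percent."""
    return (proposed - baseline) / baseline * 100.0

# HiDDeN (baseline) vs. proposed method, values taken from Tab. 2 and Tab. 3
psnr_gain = relative_change(30.88, 34.47)   # PSNR in dB
ssim_gain = relative_change(96.65, 97.90)   # SSIM in %
ber_change = relative_change(0.13, 0.06)    # average BER; negative means a reduction

print(f"PSNR +{psnr_gain:.2f}%, SSIM +{ssim_gain:.2f}%, BER {ber_change:.2f}%")
# PSNR +11.63%, SSIM +1.29%, BER -53.85%
```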

Key words: image watermarking, attention mechanism, feature extraction, robust watermarking, deep learning, adversarial training

CLC Number: TP309.7

Tianqi ZHANG, Shuang TAN, Xiwen SHEN, Juan TANG. Image watermarking method combining attention mechanism and multi-scale feature[J]. Journal of Computer Applications, 2025, 45(2): 616-623.

Figures/Tables 14

Fig. 1 Structure of proposed model

Fig. 2 Structure of encoder
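The abstract states that the encoder contains an attention module that focuses on important image features; its exact design is shown in Fig. 2 and is not reproduced in this text. Purely as an illustration of the general idea, the sketch below implements a generic squeeze-and-excitation style channel attention in plain Python. The function name, the weight matrices `w1`/`w2`, and the reduce-then-expand structure are assumptions, not the authors' module.

```python
import math

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel attention (generic illustration).

    features: C feature maps, each an H x W list of lists of floats.
    w1: hidden x C weights of the reduction layer (hypothetical, normally learned).
    w2: C x hidden weights of the expansion layer (hypothetical, normally learned).
    Returns the re-weighted feature maps and the per-channel attention weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in features]
    # Excitation: linear -> ReLU -> linear -> sigmoid gives one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value in channel c by its weight.
    scaled = [[[a * v for v in row] for row in fmap] for a, fmap in zip(weights, features)]
    return scaled, weights
```

Channels whose pooled response produces a large positive score pass through almost unchanged while the others are suppressed; in a trained network of this kind, `w1` and `w2` would be learned together with the rest of the encoder.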

Tab. 1 Noise layer types and descriptions

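Noise type	Description
Scaling	Resizes the watermarked image $I_{em}$ by a factor of $r$: the image shrinks if $r < 1$ and is enlarged otherwise
dropout	Each pixel of $I_{em}$ is kept with probability $p_d \in (0,1)$ and replaced by the pixel at the same position in the cover image $I_{co}$ with probability $1 - p_d$; the larger $p_d$, the fewer pixels are lost
Gaussian blur	A Gaussian kernel re-weights the neighbourhood of each pixel of $I_{em}$ to smooth the image; the attack strength is set by the kernel's standard deviation $\sigma$, and the larger $\sigma$, the smoother the result
JPEG compression	$I_{em}$ is divided into 8×8 blocks, each block is transformed by the DCT, and the resulting frequency-domain coefficients are quantized; the attack strength is set by the quality factor $q \in (50,100)$, and the larger $q$, the more detail is preserved and the higher the image quality
Salt-and-pepper noise	Each pixel of $I_{em}$ is replaced by a black pixel with probability $p_s$ and by a white pixel with probability $p_s$, and is kept with probability $1 - 2p_s$, so $p_s \in (0, 0.5)$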
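Three of the attacks in Tab. 1 (dropout, salt-and-pepper noise, and Gaussian blur) act on pixels or small neighbourhoods, and can be sketched on grayscale images stored as lists of lists. Under the assumptions made here, $I_{em}$ is the watermarked image, $I_{co}$ is the cover image, and pixel values are 8-bit. Because a pixel survives salt-and-pepper noise with probability $1 - 2p_s$, $p_s$ cannot exceed 0.5. The Gaussian kernel radius of $\lceil 3\sigma \rceil$ and the clamping at image borders are our choices, not the paper's. These are plain simulations, not the differentiable noise layers a network would be trained with.

```python
import math
import random

def dropout(i_em, i_co, p_d, rng):
    """Keep each watermarked pixel with probability p_d, else use the cover pixel."""
    return [[e if rng.random() < p_d else c for e, c in zip(row_e, row_c)]
            for row_e, row_c in zip(i_em, i_co)]

def salt_and_pepper(i_em, p_s, rng):
    """Black with prob. p_s, white with prob. p_s, unchanged with prob. 1 - 2*p_s."""
    if not 0.0 <= p_s <= 0.5:
        raise ValueError("p_s must lie in [0, 0.5] so that 1 - 2*p_s >= 0")
    out = []
    for row in i_em:
        new_row = []
        for v in row:
            u = rng.random()
            if u < p_s:
                new_row.append(0)        # pepper: black pixel
            elif u < 2 * p_s:
                new_row.append(255)      # salt: white pixel
            else:
                new_row.append(v)        # pixel kept
        out.append(new_row)
    return out

def gaussian_kernel(sigma, radius=None):
    """Normalized 2-D Gaussian kernel; the radius defaults to ceil(3 * sigma)."""
    if radius is None:
        radius = math.ceil(3 * sigma)
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def gaussian_blur(img, sigma):
    """Gaussian-weighted mean of each pixel's neighbourhood, clamping at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + r][dx + r] * img[yy][xx]
            row.append(acc)
        out.append(row)
    return out
```

Passing a seeded `random.Random` keeps the attacks reproducible. For example, `dropout(i_em, i_co, 0.3, random.Random(0))` corresponds to the dropout setting $p_d = 0.3$ used in Tab. 3.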

Fig. 3 Noise layer effect
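The other two attacks in Tab. 1, scaling and JPEG compression, change the image size or its frequency content rather than individual pixels. The sketch below simulates both on 8-bit grayscale images. It makes several assumptions of its own: nearest-neighbour interpolation for scaling, the standard JPEG luminance quantization table with libjpeg-style quality scaling, and no chroma subsampling or entropy coding (entropy coding is lossless, so it does not affect pixel values). HiDDeN, for instance, trains with differentiable approximations of JPEG. This sketch applies real rounding, so it is not differentiable.

```python
import math

def scale(img, r):
    """Nearest-neighbour resize by factor r (r < 1 shrinks, r > 1 enlarges)."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * r)), max(1, round(w * r))
    return [[img[min(h - 1, int(y / r))][min(w - 1, int(x / r))] for x in range(nw)]
            for y in range(nh)]

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K, Table K.1).
BASE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quant_table(q):
    """Scale BASE_Q by quality q in [1, 100] the way libjpeg does."""
    s = 5000 // q if q < 50 else 200 - 2 * q
    return [[min(255, max(1, (b * s + 50) // 100)) for b in row] for row in BASE_Q]

def _alpha(k):
    # Orthonormal DCT-II normalization for N = 8.
    return math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)

_COS = [[math.cos((2 * x + 1) * u * math.pi / 16) for x in range(8)] for u in range(8)]

def dct2(block):
    """2-D DCT-II of an 8x8 block; coef[u][v] pairs u with rows, v with columns."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _COS[u][x] * _COS[v][y]
                                         for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]

def idct2(coef):
    """Inverse of dct2."""
    return [[sum(_alpha(u) * _alpha(v) * coef[u][v] * _COS[u][x] * _COS[v][y]
                 for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]

def jpeg_block(block, q):
    """Level shift, DCT, quantize/dequantize with quality q, inverse DCT, round and clip."""
    table = quant_table(q)
    coef = dct2([[p - 128 for p in row] for row in block])
    deq = [[round(coef[u][v] / table[u][v]) * table[u][v] for v in range(8)]
           for u in range(8)]
    return [[min(255, max(0, round(v + 128))) for v in row] for row in idct2(deq)]
```

A smaller $q$ gives larger quantization steps, so more high-frequency coefficients round to zero. This is why the JPEG column in Tab. 3 ($q = 80$) tends to show the highest BERs.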

Fig. 4 Structure of decoder
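According to the abstract, the decoder's multi-scale feature extraction module captures image details at different levels, which addresses the under-use of intermediate convolutional outputs. The module's actual layout is shown in Fig. 4 and is not reproduced here. The sketch below illustrates only the underlying idea. Stacked 3×3 layers have growing receptive fields, so keeping every intermediate output, rather than only the last one, yields features at several scales. Fixed mean filters stand in for learned convolutions, and all names are hypothetical.

```python
def mean3x3(img):
    """One 3x3 mean-filter layer with zero padding (stand-in for a learned convolution)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y + dy][x + dx]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if 0 <= y + dy < h and 0 <= x + dx < w) / 9.0
             for x in range(w)] for y in range(h)]

def multi_scale_features(img, depth=3):
    """Stack `depth` layers and keep every intermediate output.

    After k layers each output pixel depends on a (2k+1) x (2k+1) input window,
    so the returned list holds same-sized feature maps at `depth` scales,
    ready to be concatenated along the channel axis.
    """
    features, x = [], img
    for _ in range(depth):
        x = mean3x3(x)
        features.append(x)
    return features
```

Feeding in a single bright pixel makes the growing receptive fields visible: its influence covers 3×3, 5×5, and 7×7 regions in the three returned maps.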

Fig. 5 Structure of discriminator

Fig. 6 Subjective effect of image invisibility of different methods

Fig. 7 Heat maps before and after introducing attention module

Tab. 2 PSNR and SSIM of watermarked images generated by different methods

Method	PSNR/dB	SSIM/%
HiDDeN-NN	35.61	98.63
Proposed method-NN	41.09	99.65
HiDDeN	30.88	96.65
Proposed method	34.47	97.90
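Tab. 2 measures invisibility with PSNR and SSIM. The sketch below shows how these metrics are commonly computed for 8-bit grayscale images. The paper's own evaluation code is not given here. The SSIM in this sketch treats the whole image as a single window, with the usual constants $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$. The standard definition instead averages over local (typically 11×11 Gaussian) windows, so values from this simplification will differ from the table.

```python
import math

MAX_I = 255.0  # peak pixel value for 8-bit images

def _flatten(img):
    return [float(v) for row in img for v in row]

def psnr(ref, test):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    a, b = _flatten(ref), _flatten(test)
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10 * math.log10(MAX_I ** 2 / mse)

def mse_from_psnr(p):
    """Invert the PSNR formula to get the mean squared error it implies."""
    return MAX_I ** 2 / 10 ** (p / 10)

def global_ssim(ref, test):
    """SSIM computed over the whole image as one window (a simplification)."""
    a, b = _flatten(ref), _flatten(test)
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) * (x - mu_a) for x in a) / n
    var_b = sum((y - mu_b) * (y - mu_b) for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * MAX_I) ** 2, (0.03 * MAX_I) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2))
```

Inverting PSNR makes the gap in Tab. 2 concrete. For 8-bit images, 30.88 dB (HiDDeN) corresponds to a mean squared error of about 53, while 34.47 dB (proposed method) corresponds to about 23, so the embedding distortion energy is more than halved.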

Fig. 8 BERs of different methods with different attack strengths

Tab. 3 Performance comparison of different methods on COCO dataset

	Invisibility		Robustness (BER under different noise attacks)
Method	PSNR/dB	SSIM/%	dropout ($p_d = 0.3$)	cropout ($p = 0.3$)	crop ($p = 0.035$)	Gaussian blur ($\sigma = 2$)	JPEG compression ($q = 80$)	Average	Parameters/10⁶
HiDDeN	30.88	96.65	0.07	0.06	0.12	0.04	0.37	0.13	0.45
ReDMark	35.93	96.60	0.08	0.08	0.12	0.50	0.25	0.21	0.13
IGA	—	—	0.22	0.13	0.26	0.19	0.13	0.19	—
SSLW	33.50	84.12	0.12	0.49	0.20	0.01	0.17	0.20	27.70
ARWGAN	35.87	96.88	0.04	0.04	0.04	0.03	0.14	0.06	1.50
Proposed method	35.92	98.14	0.04	0.02	0.02	0.03	0.17	0.06	0.55
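Robustness in Tab. 3 is measured by the bit error rate: the fraction of embedded watermark bits that the decoder recovers incorrectly. The sketch below defines BER and recomputes the "Average" column as the plain mean of the five per-attack BERs. After rounding to two decimals, this matches every row. The data layout and names are ours.

```python
def bit_error_rate(sent, received):
    """Fraction of positions where the extracted bit differs from the embedded bit."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have equal length")
    return sum(s != r for s, r in zip(sent, received)) / len(sent)

# Per-attack BERs from Tab. 3, in column order:
# dropout, cropout, crop, Gaussian blur, JPEG compression.
TAB3_BER = {
    "HiDDeN": [0.07, 0.06, 0.12, 0.04, 0.37],
    "ReDMark": [0.08, 0.08, 0.12, 0.50, 0.25],
    "IGA": [0.22, 0.13, 0.26, 0.19, 0.13],
    "SSLW": [0.12, 0.49, 0.20, 0.01, 0.17],
    "ARWGAN": [0.04, 0.04, 0.04, 0.03, 0.14],
    "Proposed": [0.04, 0.02, 0.02, 0.03, 0.17],
}

# Mean over the five attacks, rounded like the table's "Average" column.
averages = {name: round(sum(bers) / len(bers), 2) for name, bers in TAB3_BER.items()}
```

Equal averages can hide different profiles. ARWGAN and the proposed method both average 0.06, but the proposed method has a lower BER under cropout and crop (0.02 vs 0.04) and a higher BER under JPEG compression (0.17 vs 0.14). It also uses roughly 37% of ARWGAN's parameters (0.55 × 10⁶ vs 1.50 × 10⁶).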

Fig. 9 Images from different datasets and their watermarked versions

Tab. 4 Comparison of results of proposed method on different datasets

	Invisibility		Robustness (BER under different noise attacks)
Dataset	PSNR/dB	SSIM/%	Scaling ($r = 0.8$)	dropout ($p_d = 0.3$)	Gaussian blur ($\sigma = 2$)	JPEG compression ($q = 80$)	Salt-and-pepper noise ($p_s = 0.1$)	Average
COCO	34.47	97.90	0.01	0.07	0.03	0.04	0.07	0.04
ImageNet	34.88	97.75	0.02	0.08	0.03	0.05	0.07	0.05
VOC 2012	35.10	97.83	0.02	0.08	0.03	0.05	0.07	0.05
NaSC-TG2	37.21	99.52	0.03	0.08	0.03	0.06	0.07	0.05
Animal	35.74	97.89	0.02	0.07	0.02	0.06	0.07	0.05
Intel	34.56	98.24	0.03	0.09	0.03	0.03	0.07	0.05

Tab. 5 Comparison of ablation experimental results

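	Invisibility		Robustness (BER under different noise attacks)
Method	PSNR/dB	SSIM/%	Scaling ($r = 0.8$)	dropout ($p_d = 0.3$)	Gaussian blur ($\sigma = 2$)	JPEG compression ($q = 80$)	Salt-and-pepper noise ($p_s = 0.1$)	Average
w/o am (without attention module)	33.42	97.77	0.19	0.07	0.07	0.09	0.05	0.09
w/o mf (without multi-scale feature extraction module)	30.79	96.76	0.08	0.10	0.14	0.09	0.14	0.11
Proposed method	34.47	97.90	0.01	0.07	0.03	0.04	0.07	0.04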
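The ablation results in Tab. 5 can be read as the gain each module contributes when it is restored. Below is a small arithmetic sketch over the table's PSNR, SSIM, and average-BER columns; the dictionary layout and the helper name `module_gain` are ours.

```python
FULL = {"PSNR": 34.47, "SSIM": 97.90, "BER_avg": 0.04}
ABLATED = {
    "w/o am": {"PSNR": 33.42, "SSIM": 97.77, "BER_avg": 0.09},  # attention module removed
    "w/o mf": {"PSNR": 30.79, "SSIM": 96.76, "BER_avg": 0.11},  # multi-scale module removed
}

def module_gain(full, ablated):
    """Full model minus ablated model; positive PSNR/SSIM and negative BER are improvements."""
    return {k: round(full[k] - ablated[k], 2) for k in full}

for name, metrics in ABLATED.items():
    print(name, module_gain(FULL, metrics))
```

Restoring the multi-scale module yields the larger gains in PSNR (3.68 dB vs 1.05 dB) and in average BER. The attention module, however, matters most for the scaling attack: removing it raises that BER from 0.01 to 0.19.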

References

1	KASHYAP N, SINHA G R. Image watermarking using 3-level Discrete Wavelet Transform (DWT)[J]. International Journal of Modern Education and Computer Science, 2012, 4(3): 50-56.
2	ABRAHAM J, PAUL V. An imperceptible spatial domain color image watermarking scheme[J]. Journal of King Saud University - Computer and Information Sciences, 2019, 31(1): 125-133.
3	ALI M, AHN C W, PANT M. A robust image watermarking technique using SVD and differential evolution in DCT domain[J]. Optik, 2014, 125(1): 428-434.
4	LU S P, WANG R, ZHONG T, et al. Large-capacity image steganography based on invertible neural networks[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10811-10820.
5	LU J, NI J, SU W, et al. Wavelet-based CNN for robust and high-capacity image watermarking[C]// Proceedings of the 2022 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2022: 1-6.
6	PLATA M, SYGA P. Robust spatial-spread deep neural image watermarking[C]// Proceedings of the IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications. Piscataway: IEEE, 2020: 62-70.
7	TEOH Y J, LING H C, WONG W K, et al. A hybrid SVD-based image watermarking scheme utilizing both U and V orthogonal vectors for robustness and imperceptibility[J]. IEEE Access, 2023, 11: 51018-51031.
8	ZHONG R Z, XIE H B. Robust image watermarking algorithm based on visual saliency and quantization exponential modulation[J]. Journal of Electronic Measurement and Instrumentation, 2020, 34(3): 17-27. (in Chinese)
9	YUAN Z, LIU D, ZHANG X, et al. DCT-based color digital image blind watermarking method with variable steps[J]. Multimedia Tools and Applications, 2020, 79(41/42): 30557-30581.
10	ZHANG T Q, ZHOU L, LIANG X M, et al. A robust watermarking algorithm based on Blob-Harris and NSCT-Zernike[J]. Journal of Electronics and Information Technology, 2021, 43(7): 2038-2045. (in Chinese)
11	FANG H, JIA Z, MA Z, et al. PIMoG: an effective screen-shooting noise-layer simulation for deep-learning-based watermarking network[C]// Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM, 2022: 2267-2275.
12	MAHAPATRA D, AMRIT P, SINGH O P, et al. Autoencoder-convolutional neural network-based embedding and extraction model for image watermarking[J]. Journal of Electronic Imaging, 2023, 32(2): No.021604.
13	WANG X, MA D, HU K, et al. Mapping based residual convolution neural network for non-embedding and blind image watermarking[J]. Journal of Information Security and Applications, 2021, 59: No.102820.
14	BALUJA S. Hiding images in plain sight: deep steganography[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 2066-2076.
15	ZHU J, KAPLAN R, JOHNSON J, et al. HiDDeN: hiding data with deep networks[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11219. Cham: Springer, 2018: 682-697.
16	HAO K, FENG G, ZHANG X. Robust image watermarking based on generative adversarial network[J]. China Communications, 2020, 17(11): 131-140.
17	ZHAO Z, LI J, LUO Z, et al. Remote sensing image scene classification based on an enhanced attention module[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 18(11): 1926-1930.
18	FU J, LIU J, JIANG J, et al. Scene segmentation with dual relation-aware attention network[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(6): 2547-2560.
19	YAN C, HAO Y, LI L, et al. Task-adaptive attention for image captioning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(1): 43-51.
20	CHEN B, TAN W, COATRIEUX G, et al. A serial image copy-move forgery localization scheme with source/target distinguishment[J]. IEEE Transactions on Multimedia, 2021, 23: 3506-3517.
21	ZHANG H, LI Y. Digital watermarking via inverse gradient attention[C]// Proceedings of the 9th International Conference on Behavioural and Social Computing. Piscataway: IEEE, 2022: 1-3.
22	YU C. Attention based data hiding with generative adversarial networks[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 1120-1128.
23	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
24	AHMADI M, NOROUZI A, KARIMI N, et al. ReDMark: framework for residual diffusion watermarking based on deep networks[J]. Expert Systems with Applications, 2020, 146: No.113157.
25	FERNANDEZ P, SABLAYROLLES A, FURON T, et al. Watermarking images in self-supervised latent spaces[C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 3054-3058.
26	HUANG J, LUO T, LI L, et al. ARWGAN: attention-guided robust image watermarking model based on GAN[J]. IEEE Transactions on Instrumentation and Measurement, 2023, 72: No.5018417.
27	DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255.
28	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
29	ZHOU Z, LI S, WU W, et al. NaSC-TG2: natural scene classification with Tiangong-2 remotely sensed imagery[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14: 3228-3242.
