TBFormer

TBFormer: Two-Branch Transformer for Image Forgery Localization

Liu, Y., Lv, B., Jin, X., Chen, X., & Zhang, X. (2023). TBFormer: Two-Branch Transformer for Image Forgery Localization. In ArXiv: Vol. abs/2302.13004. https://doi.org/10.48550/arXiv.2302.13004


Overview

For image forgery detection, the paper proposes a network with two Transformer-based feature-extraction branches, one for RGB features and one for noise features, which do not share weights. It also proposes an Attention-aware Hierarchical-feature Fusion Module (AHFM) to fuse the features of the two branches. The AHFM uses position attention to embed the features from the two domains into a common feature space. Finally, a Transformer decoder that additionally takes learnable category embeddings reconstructs the features to generate the predicted mask.

The authors point out that existing methods combining the RGB and noise domains are all CNN-based. Other networks that do use Transformers, such as ObjectFormer, generally do not take the image directly as input: they first extract features with a CNN and only then perform patch embedding. Methods such as ET do use multiple Transformer layers for feature extraction, but only from the RGB domain, and they also use a CNN decoder.

Technical Contributions

  1. A new, fully Transformer-based network with two feature-extraction branches for forgery localization.

  2. A new attention-aware hierarchical-feature fusion module that efficiently combines the features of the two branches from different domains.

  3. A Transformer decoder that reconstructs features to generate the predicted mask.

Model

TBFormer network architecture diagram

As shown in the figure, the network has an RGB branch and a noise branch. The RGB image $\boldsymbol{I}_{c} \in \mathbb{R}^{H \times W \times 3}$ is passed through BayarConv (from "Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection") to obtain its noise map $\boldsymbol{I}_{n} \in \mathbb{R}^{H \times W \times 3}$. The RGB image is then split into 16 × 16 patches that form a sequence $\boldsymbol{X}_{c}=\left\{\boldsymbol{x}_{c}^{(1)}, \boldsymbol{x}_{c}^{(2)}, \cdots, \boldsymbol{x}_{c}^{(N)}\right\}$, where $\boldsymbol{x}_{c}^{(i)} \in \mathbb{R}^{16 \times 16 \times 3}$ and $N=H / 16 \times W / 16$. Each patch is flattened into a 1D vector and linearly projected, and the resulting vectors form the patch embedding sequence $\boldsymbol{P}_{c}=\left\{\boldsymbol{p}_{c}^{(1)}, \boldsymbol{p}_{c}^{(2)}, \cdots, \boldsymbol{p}_{c}^{(N)}\right\} \in \mathbb{R}^{N \times L}$. As shown in the figure, the position embeddings are added directly to the corresponding patch embeddings, giving the final input sequence $\boldsymbol{E}_{c}=\left\{\boldsymbol{e}_{c}^{(1)}, \boldsymbol{e}_{c}^{(2)}, \ldots, \boldsymbol{e}_{c}^{(N)}\right\} \in \mathbb{R}^{N \times L}$, where $\boldsymbol{e}_{c}^{(i)}=\boldsymbol{p}_{c}^{(i)}+\text{pos}_{c}^{(i)}$.
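The input stage of each branch can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the 5 × 5 BayarConv kernel, the random weights and the small embedding width L = 32 are hypothetical demo choices. The constraint itself (centre tap fixed to −1, remaining taps rescaled to sum to 1) follows Bayar and Stamm's constrained convolution, and the noise branch gets its own, unshared patch-embedding weights.

```python
import numpy as np

def bayar_constrain(w):
    """Apply the Bayar constraint to conv weights of shape (C_out, C_in, k, k):
    the centre tap becomes -1 and the other taps are rescaled to sum to 1, so
    each kernel outputs 'prediction from neighbours minus pixel', a noise residual."""
    w = w.copy()
    c = w.shape[-1] // 2
    w[..., c, c] = 0.0
    w /= w.sum(axis=(-2, -1), keepdims=True)
    w[..., c, c] = -1.0
    return w

def conv2d_same(x, kernel):
    """Zero-padded 'same' convolution (cross-correlation, as in deep-learning
    frameworks). x: (H, W, C_in), kernel: (C_out, C_in, k, k) -> (H, W, C_out)."""
    p = kernel.shape[-1] // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    win = np.lib.stride_tricks.sliding_window_view(xp, kernel.shape[-2:], axis=(0, 1))
    return np.einsum("hwcij,ocij->hwo", win, kernel)   # win: (H, W, C_in, k, k)

def patch_embed(img, W_e, pos, p=16):
    """Split (H, W, C) into N = H/p * W/p patches, flatten each to p*p*C values,
    project linearly to L dims and add position embeddings: E = P + pos."""
    H, W, C = img.shape
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    P = x.reshape(-1, p * p * C) @ W_e                  # (N, L)
    return P + pos

rng = np.random.default_rng(0)
H = W = 64; L = 32                                       # demo sizes only
I_c = rng.random((H, W, 3))
w_bayar = bayar_constrain(rng.random((3, 3, 5, 5)))     # hypothetical 5x5 kernel
I_n = conv2d_same(I_c, w_bayar)                          # noise map, same size as I_c
N = (H // 16) * (W // 16)
E_c = patch_embed(I_c, rng.normal(0, 0.02, (16 * 16 * 3, L)), rng.normal(0, 0.02, (N, L)))
E_n = patch_embed(I_n, rng.normal(0, 0.02, (16 * 16 * 3, L)), rng.normal(0, 0.02, (N, L)))
print(I_n.shape, E_c.shape, E_n.shape)                  # (64, 64, 3) (16, 32) (16, 32)
```

For the 512 × 512 inputs used in the experiments, each branch yields N = 32 × 32 = 1024 patch tokens of 16 × 16 × 3 = 768 raw values each before projection.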

ęŽ„ē€å°†č¾“å…„åŗåˆ—å–‚čæ›ē”±12äøŖTransformerå±‚ļ¼ˆå¤šå¤“č‡Ŗę³Øę„åŠ›ęØ”å—å’Œäø€äøŖå¤šå±‚ę„ŸēŸ„ęØ”å—(čæ™äøå°±ę˜ÆCNN?MLP)ļ¼‰ē»„ęˆēš„ē‰¹å¾ęå–å™Øļ¼Œē„¶åŽę”¶é›†ē¬¬4,8,12å±‚ēš„č¾“å‡ŗ$\boldsymbol{T}{c}^{(4)}, \boldsymbol{T}{c}^{(8)}, \boldsymbol{T}_{c}^{(12)}$ļ¼›

$$\boldsymbol{T}_{c}=\left\{\boldsymbol{T}_{c}^{(4)}, \boldsymbol{T}_{c}^{(8)}, \boldsymbol{T}_{c}^{(12)}\right\}=f_{c}\left(\boldsymbol{E}_{c}\right)$$

$$\begin{aligned} \boldsymbol{M}_{c}^{(i)} & =\text{MSA}_{c}^{(i)}\left(\text{LN}\left(\boldsymbol{T}_{c}^{(i-1)}\right)\right)+\boldsymbol{T}_{c}^{(i-1)} \\ \boldsymbol{T}_{c}^{(i)} & =\text{MLP}_{c}^{(i)}\left(\text{LN}\left(\boldsymbol{M}_{c}^{(i)}\right)\right)+\boldsymbol{M}_{c}^{(i)} \end{aligned}$$

$$\text{SA}_{c}^{(i)}\left(\boldsymbol{T}_{c}^{(i-1)}\right)=\text{softmax}\left(\boldsymbol{Q}_{c}^{(i)}\left(\boldsymbol{K}_{c}^{(i)}\right)^{\mathrm{T}} / \sqrt{L}\right) \boldsymbol{V}_{c}^{(i)}$$

$$\boldsymbol{Q}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cQ}}^{(i)}, \quad \boldsymbol{K}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cK}}^{(i)}, \quad \boldsymbol{V}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cV}}^{(i)}$$

where $\boldsymbol{W}_{\mathrm{cQ}}^{(i)}, \boldsymbol{W}_{\mathrm{cK}}^{(i)}, \boldsymbol{W}_{\mathrm{cV}}^{(i)}$ are the learnable projection matrices of layer $i$, and $\boldsymbol{T}_{c}^{(0)}=\boldsymbol{E}_{c}$.
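The pre-norm layer above, stacked 12 times with taps at layers 4, 8 and 12, can be sketched in NumPy. Assumptions not stated in the text: no biases or LayerNorm affine parameters, a GELU (tanh approximation) MLP with 4× hidden width, an output projection `Wo` after the heads (standard in multi-head attention but not written in the equations), and 4 heads of width L/heads, each scaled by √(L/heads). With a single head the scaling reduces to the √L of the SA equation.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """LayerNorm over the feature axis (learnable gain/bias omitted)."""
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def msa(x, p, heads):
    """Multi-head self-attention: each head computes softmax(Q K^T / sqrt(d)) V."""
    N, L = x.shape
    d = L // heads
    def split(t):                                   # (N, L) -> (heads, N, d)
        return t.reshape(N, heads, d).transpose(1, 0, 2)
    q, k, v = split(x @ p["Wq"]), split(x @ p["Wk"]), split(x @ p["Wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))      # (heads, N, N)
    return (attn @ v).transpose(1, 0, 2).reshape(N, L) @ p["Wo"]

def mlp(x, p):
    h = x @ p["W1"]
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h ** 3)))  # GELU
    return h @ p["W2"]

def extractor(E, params, heads=4, taps=(4, 8, 12)):
    """Pre-norm Transformer stack f(E); returns {layer index: output} for tapped layers."""
    T, out = E, {}
    for i, p in enumerate(params, start=1):
        M = msa(layer_norm(T), p, heads) + T        # M^(i) = MSA(LN(T^(i-1))) + T^(i-1)
        T = mlp(layer_norm(M), p) + M               # T^(i) = MLP(LN(M^(i))) + M^(i)
        if i in taps:
            out[i] = T
    return out

def init_params(depth, L, rng, std=0.02):
    shapes = {"Wq": (L, L), "Wk": (L, L), "Wv": (L, L), "Wo": (L, L),
              "W1": (L, 4 * L), "W2": (4 * L, L)}
    return [{k: rng.normal(0, std, s) for k, s in shapes.items()} for _ in range(depth)]

rng = np.random.default_rng(0)
N, L = 16, 32                                       # demo sizes only
E_c, E_n = rng.normal(size=(N, L)), rng.normal(size=(N, L))
T_c = extractor(E_c, init_params(12, L, rng))       # RGB branch
T_n = extractor(E_n, init_params(12, L, rng))       # noise branch: same design, separate weights
print({i: t.shape for i, t in T_c.items()})         # {4: (16, 32), 8: (16, 32), 12: (16, 32)}
```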

åŒę ·ēš„ļ¼ŒåœØå™Ŗå£°åˆ†ę”ÆäøŠļ¼Œä»„ē›øåŒēš„ęØ”å—ļ¼Œä½†ę˜Æäøå…±äŗ«ēš„ęƒé‡ć€‚ē“§ęŽ„ē€ę„åˆ°AHFMęØ”å—ļ¼Œčæ›č”Œäø¤äøŖåˆ†ę”Æēš„ē‰¹å¾ēš„čžåˆć€‚ē”±äŗŽäø¤äøŖåˆ†ę”Æēš„ē‰¹å¾ē›øå·®č¾ƒå¤§ļ¼Œę‰€ä»„åœØę³Øę„åŠ›ę„ŸēŸ„å±‚ę¬”ē‰¹å¾ęØ”å—é‡Œé¢ļ¼Œä½œč€…ęž„å»ŗäŗ†äø€äøŖä½ē½®ę³Øę„åŠ›ęØ”å—ļ¼ˆposition attention PAļ¼‰ęØ”å—ć€‚å¦‚äø‹å›¾ę‰€ē¤ŗļ¼Œåˆ†åˆ«å°†ä»Žē‰¹å¾ęå–å™Øē¬¬4/8/12å±‚å¾—åˆ°ēš„ē‰¹å¾å›¾ļ¼Œé¦–å…ˆē»čæ‡č½¬ē½®ē„¶åŽreshapeęˆäø‰ē»“å‘é‡ļ¼›ęŽ„ē€å°†äø¤äøŖåˆ†ę”Æēš„č½¬ē½®å˜ę¢åŽēš„ē‰¹å¾ē›øåŠ ļ¼ˆconcatenateļ¼Œä»„é€šé“ē»“åŗ¦ļ¼‰ļ¼Œå†ē»čæ‡å·ē§Æļ¼Œå†ę¬”ē»čæ‡äø‰äøŖäøåŒå·ē§Æę øēš„å·ē§Æļ¼Œå¾—åˆ°äø‰äøŖę–°ēš„ē‰¹å¾å›¾ļ¼Œå†ē»čæ‡softmaxå¾—åˆ°ä½ē½®ę³Øę„åŠ›ęƒé‡ļ¼Œęœ€åŽčæ›äø€ę­„å¾—åˆ°čžåˆēš„ē‰¹å¾å›¾ć€‚ä»„åŒę ·ēš„ę–¹å¼å¾—åˆ°ē¬¬å…«å±‚ļ¼Œ12å±‚ļ¼Œäø‰äøŖčžåˆåŽēš„ē‰¹å¾å›¾ļ¼Œē»čæ‡é€äøŖå…ƒē“ ēš„ē›øåŠ ļ¼Œ3*3ēš„å·ē§Æļ¼Œę‰¹ę ‡å‡†åŒ–ļ¼ŒReLU激擻得encoderé˜¶ę®µęœ€åŽēš„čžåˆēš„ē‰¹å¾å›¾ć€‚

ä½ē½®ę³Øę„åŠ›ęØ”å—
$$\boldsymbol{A}^{(4)}=\text{softmax}\left(\left(\boldsymbol{T}^{(4\_1)}\right)^{\mathrm{T}} \boldsymbol{T}^{(4\_2)}\right)$$

$$\boldsymbol{Z}^{(4)}=\text{Conv}^{(4)}\left(\alpha^{(4)}\left(\boldsymbol{T}^{(4\_3)} \boldsymbol{A}^{(4)}\right)_{\text{reshape}} \oplus \hat{\boldsymbol{T}}^{(4)}\right)$$

$$\boldsymbol{Z}=\text{Conv}\left(\boldsymbol{Z}^{(12)} \oplus \boldsymbol{Z}^{(8)} \oplus \boldsymbol{Z}^{(4)}\right)$$
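The AHFM steps can be sketched in NumPy as follows. This is a hypothetical illustration, not the released implementation. Assumptions: 1×1 kernels for the concat-fusion conv and the three attention convs (as in DANet-style position attention), a 3×3 kernel for $\text{Conv}^{(l)}$, a fixed scalar `alpha` in place of the learnable $\alpha^{(l)}$, a softmax that normalizes each column of $\boldsymbol{A}$ so that $\boldsymbol{T}^{(l\_3)}\boldsymbol{A}$ is a weighted average over positions, and batch normalization reduced to per-channel standardization for a batch of one.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conv(x, kernel):
    """Zero-padded 'same' convolution. x: (C_in, h, w), kernel: (C_out, C_in, k, k)."""
    p = kernel.shape[-1] // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = np.lib.stride_tricks.sliding_window_view(xp, kernel.shape[-2:], axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", win, kernel)     # win: (C_in, h, w, k, k)

def to_map(T, h, w):
    """Token sequence (N, L) -> feature map (L, h, w): transpose, then reshape."""
    return T.T.reshape(-1, h, w)

def pa_level(Tc, Tn, p, h, w):
    """One AHFM level: concatenate both branches, conv, then position attention."""
    T_hat = conv(np.concatenate([to_map(Tc, h, w), to_map(Tn, h, w)]), p["fuse"])
    T1, T2, T3 = (conv(T_hat, p[key]).reshape(-1, h * w) for key in ("q", "k", "v"))
    A = softmax(T1.T @ T2, axis=0)                       # (N, N); each column sums to 1
    PA = (T3 @ A).reshape(T_hat.shape)                   # every position mixes all positions
    return conv(p["alpha"] * PA + T_hat, p["out"])       # Conv(alpha * PA (+) T_hat)

def ahfm(feats_c, feats_n, params, h, w):
    Z = sum(pa_level(feats_c[l], feats_n[l], params[l], h, w) for l in (4, 8, 12))
    Z = conv(Z, params["final"])                         # 3x3 conv on the element-wise sum
    mu, var = Z.mean(axis=(1, 2), keepdims=True), Z.var(axis=(1, 2), keepdims=True)
    Z = (Z - mu) / np.sqrt(var + 1e-5)                   # simplified batch norm
    return np.maximum(Z, 0.0)                            # ReLU

rng = np.random.default_rng(0)
h = w = 4; L = C = 32; Ck = C // 8                       # demo sizes only

def g(*shape):
    return rng.normal(0, 0.05, shape)

feats_c = {l: rng.normal(size=(h * w, L)) for l in (4, 8, 12)}
feats_n = {l: rng.normal(size=(h * w, L)) for l in (4, 8, 12)}
params = {l: {"fuse": g(C, 2 * L, 1, 1), "q": g(Ck, C, 1, 1), "k": g(Ck, C, 1, 1),
              "v": g(C, C, 1, 1), "out": g(C, C, 3, 3), "alpha": 0.5} for l in (4, 8, 12)}
params["final"] = g(C, C, 3, 3)
Z = ahfm(feats_c, feats_n, params, h, w)
print(Z.shape)                                           # (32, 4, 4)
```

At the paper's 512 × 512 resolution, h = w = 32, so each attention matrix $\boldsymbol{A}^{(l)}$ has 1024 × 1024 ≈ 1.05 million entries per level.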

ęŽ„ē€ę„åˆ°äŗ†č§£ē é˜¶ę®µļ¼Œē›“ęŽ„å½“åščÆ­ä¹‰åˆ†å‰²ēš„ä»»åŠ”ę„åÆ¹å¾…ļ¼Œč®¾ē½®äø¤äøŖåÆå­¦ä¹ ēš„ē±»åˆ«åµŒå…„ļ¼ˆēœŸå®žēš„ļ¼ŒēÆ”ę”¹ēš„ļ¼‰ę„čæ›äø€ę­„å­¦ä¹ ēœŸå®žēš„å’ŒēÆ”ę”¹ēš„ē‰¹å¾č”Øē¤ŗļ¼Œčæ™äø¤äøŖē±»åˆ«åµŒå…„å’Œčžåˆē‰¹å¾ēš„å—åµŒå…„äø€čµ·č¾“å…„åˆ°č§£ē å™Øēš„äø¤äøŖTransformerå±‚é‡Œé¢ļ¼Œę„å¾—åˆ°é¢„ęµ‹ęŽ©ē ć€‚äøŗäŗ†å¾—åˆ°čžåˆē‰¹å¾ēš„å—åµŒå…„ļ¼Œpatch embeddingsļ¼Œé¦–å…ˆę˜Æreshape,transposeļ¼Œē„¶åŽēŗæę€§ę˜ å°„ļ¼›čæ™äŗ›åµŒå…„å’Œē±»åˆ«åµŒå…„äø€čµ·č¾“å…„åˆ°Transformerå±‚ļ¼Œē»čæ‡ę­£åˆ™åŒ–äøŠé‡‡ę ·ē­‰ę“ä½œå¾—åˆ°ęœ€åŽēš„é¢„ęµ‹ęŽ©ē ć€‚

$$\ddot{\boldsymbol{Y}}=L_{2}\left(f_{\text{proj}}(\ddot{\boldsymbol{Z}})\right)\left(L_{2}\left(f_{\text{proj}}(\ddot{\boldsymbol{S}})\right)\right)^{\mathrm{T}}$$

$$\boldsymbol{M}=\text{softmax}\left(\text{Upsample}(\ddot{\boldsymbol{Y}})\right)$$

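In the formulas above, $\ddot{\boldsymbol{Z}}$ denotes the embeddings of the encoder's fused features and $\ddot{\boldsymbol{S}}$ the category embeddings, both taken at the output of the decoder's Transformer layers. Each is passed through $f_{\text{proj}}$ (a linear projection) and $L_2$ normalization, so their product $\ddot{\boldsymbol{Y}}$ holds the cosine similarity between every patch and each category. $\ddot{\boldsymbol{Y}}$ is then reshaped to the $H/16 \times W/16$ patch grid, upsampled to the input resolution and passed through a softmax to obtain the predicted mask $\boldsymbol{M}$.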
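The decoder's mask computation can be sketched in NumPy as below. It is a simplified, hypothetical illustration: a residual single-head attention layer without MLP stands in for each of the two decoder Transformer layers, one shared projection matrix plays the role of $f_{\text{proj}}$ (matching the formula), row 0 of `S` is taken as "authentic" and row 1 as "forged", and nearest-neighbour upsampling replaces whatever interpolation the paper uses, for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2n(x, eps=1e-12):
    """L2-normalise each row."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def dec_layer(tokens, Wq, Wk, Wv):
    """Residual single-head self-attention over [patch tokens; category tokens]."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    return tokens + softmax(q @ k.T / np.sqrt(tokens.shape[1])) @ v

def predict_mask(Z_map, S, W_in, layers, W_proj, scale=16):
    C, h, w = Z_map.shape
    Z = Z_map.reshape(C, h * w).T @ W_in          # reshape + transpose + linear -> (N, L)
    tokens = np.concatenate([Z, S])               # append category embeddings: (N + 2, L)
    for Wq, Wk, Wv in layers:
        tokens = dec_layer(tokens, Wq, Wk, Wv)
    Z_dd, S_dd = tokens[: h * w], tokens[h * w:]  # Z-double-dot (N, L), S-double-dot (2, L)
    Y = l2n(Z_dd @ W_proj) @ l2n(S_dd @ W_proj).T # cosine scores per patch and class: (N, 2)
    Y = Y.reshape(h, w, 2).repeat(scale, axis=0).repeat(scale, axis=1)  # upsample
    return softmax(Y, axis=-1)                    # M: per-pixel class probabilities

rng = np.random.default_rng(0)
C, h, w, L = 32, 4, 4, 32                         # demo sizes only
S = rng.normal(0, 0.02, (2, L))                   # learnable category embeddings
layers = [tuple(rng.normal(0, 0.05, (L, L)) for _ in range(3)) for _ in range(2)]
M = predict_mask(rng.normal(size=(C, h, w)), S, rng.normal(0, 0.05, (C, L)), layers,
                 rng.normal(0, 0.05, (L, L)))
forged = M[..., 1] >= 0.5                         # binarised prediction
print(M.shape, forged.shape)                      # (64, 64, 2) (64, 64)
```

Because both factors are L2-normalized, every entry of $\ddot{\boldsymbol{Y}}$ lies in [−1, 1] before the softmax.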

Experiments

Setup

Large synthetic datasets covering splicing, copy-move and inpainting (removal) were built: 140,432 images for training, 7,787 for validation and 7,787 for testing. Four public datasets, NIST, CASIA v1.0, IMD20 and Realistic, are additionally used for evaluation. The metrics are F1-score, IoU and AUC; when the predicted mask is binarized, the threshold is 0.5. Implementation details: all input images are 512 × 512, the optimizer is SGD with polynomial learning-rate decay $lr=lr_{0}\left(1-\text{iter}_{\text{current}} / \text{iter}_{\text{total}}\right)^{0.9}$, the batch size is 8, and the model is trained for 15 epochs.
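The learning-rate schedule and the thresholded metrics can be checked numerically with the sketch below. Assumptions: one "iter" is one mini-batch step, so an epoch is 140,432 / 8 = 17,554 iterations and 15 epochs are 263,310; `lr0 = 0.01` is a placeholder because the base rate is not given here; returning 1.0 when prediction and ground truth are both empty is one common convention, not necessarily the paper's. AUC is omitted for brevity.

```python
import numpy as np

def poly_lr(lr0, it, total, power=0.9):
    """Polynomial decay: lr = lr0 * (1 - it / total) ** power."""
    return lr0 * (1 - it / total) ** power

def f1_iou(prob, gt, thr=0.5):
    """Pixel-level F1 and IoU after binarising the predicted mask at `thr`."""
    pred = prob >= thr
    gt = gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    denom = tp + fp + fn
    f1 = 2 * tp / (2 * tp + fp + fn) if denom else 1.0   # empty/empty convention assumed
    iou = tp / denom if denom else 1.0
    return float(f1), float(iou)

iters_per_epoch = 140432 // 8                  # 17554 mini-batches per epoch
total = 15 * iters_per_epoch                   # 263310 iterations in total
print(poly_lr(0.01, 0, total), poly_lr(0.01, total, total))   # 0.01 0.0
print(f1_iou(np.array([[0.9, 0.2], [0.7, 0.1]]), np.array([[1, 1], [0, 0]])))
```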

Results

As shown in the table, the authors compare TBFormer with the methods below and obtain good results throughout.

compare with SOTA

In the ablation study, the authors analyze the contribution of each branch and of the fusion module.

ablation study

Thoughts

The results are quite good. However, in forgery detection this framework (RGB features combined with stacked frequency features or a noise map) has arguably been explored quite a lot already, so the paper spends relatively little effort explaining why the two should be combined. Instead, it argues that its combination differs from others' in using only Transformers, a reason that seems acceptable but not very strong. It also puts the emphasis on the module that fuses the two kinds of features.

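My questions would be these. A noise map is actually of no use for copy-move, presumably because the copied region comes from the same image and carries the same noise characteristics, and the same goes for frequency information as a whole. Using a noise map also seems to capture only high-frequency features, while some methods combine high- and low-frequency features. Is all of this just black magic?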
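A toy NumPy experiment can make the copy-move point concrete. It only illustrates the intuition under a synthetic model (a linear intensity ramp plus i.i.d. Gaussian noise, and a fixed 4-neighbour high-pass residual instead of the learned BayarConv); it says nothing about TBFormer's actual behaviour. A copy-moved region carries the host image's own noise, so its residual statistics should come out close to the background's (ratio near 1), whereas a region spliced from a noisier source should stand out (ratio near 3).

```python
import numpy as np

def residual(x):
    """High-pass residual: pixel minus the mean of its 4 neighbours (interior only).
    It is exactly zero for a linear intensity ramp, so only the noise survives."""
    return x[1:-1, 1:-1] - 0.25 * (x[:-2, 1:-1] + x[2:, 1:-1] + x[1:-1, :-2] + x[1:-1, 2:])

rng = np.random.default_rng(0)
n, r = 256, 48
yy, xx = np.mgrid[0:n, 0:n]
img = 0.5 * yy + 0.3 * xx + rng.normal(0, 2.0, (n, n))     # host image, noise sigma = 2

copy_move = img.copy()                                       # region copied within the image
copy_move[150:150 + r, 150:150 + r] = img[20:20 + r, 20:20 + r]

donor = 0.5 * yy + 0.3 * xx + rng.normal(0, 6.0, (n, n))    # hypothetical noisier source
splice = img.copy()
splice[150:150 + r, 150:150 + r] = donor[150:150 + r, 150:150 + r]

def noise_ratio(x):
    """Residual std inside the pasted region divided by residual std of the background."""
    res = residual(x)
    inside = res[152:150 + r - 3, 152:150 + r - 3]          # stay clear of the paste border
    background = res[:100, 100:]                             # untouched area
    return inside.std() / background.std()

print(round(noise_ratio(copy_move), 2), round(noise_ratio(splice), 2))
```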

Then there is the data: the authors generated a training set of over a hundred thousand images, while the public evaluation datasets seem comparatively few, and most of them appear easy enough that most methods reach 90%+. So does it make sense, when generating the training set, to also generate a dataset for evaluation?

GitHub

https://github.com/free1dom1/TBFormer
