TBFormer
TBFormer: Two-Branch Transformer for Image Forgery Localization
Liu, Y., Lv, B., Jin, X., Chen, X., & Zhang, X. (2023). TBFormer: Two-Branch Transformer for Image Forgery Localization. In ArXiv: Vol. abs/2302.13004. https://doi.org/10.48550/arXiv.2302.13004

Overview
For image tampering detection, the paper proposes two Transformer feature-extraction branches, one for RGB features and one for noise features, which do not share weights. It also proposes an Attention-aware Hierarchical-feature Fusion Module (AHFM) for fusing the features of the two branches: position attention embeds the features from the two domains into a single feature domain for representation. Finally, a Transformer decoder with added category embeddings reconstructs the features to generate the predicted mask.
The authors note that existing methods combining the RGB and noise domains are all built on CNN architectures. Other architectures that do use Transformers, such as ObjectFormer, generally do not take the image directly as input; they first extract features with a CNN and only then perform patch embedding. Others, such as ET, use multiple Transformer layers to extract features, but only from the RGB domain, and they also build a CNN decoder.
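Technical contributions
A new fully Transformer-based network architecture with two feature-extraction branches for tampering localization.
A new attention-aware hierarchical-feature fusion module that efficiently combines the features of the two branches, which come from different domains.
A Transformer decoder that reconstructs features to generate the predicted mask.
Model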

As the figure shows, there is an RGB branch and a noise branch. The RGB-domain image $\boldsymbol{I}_{c} \in \mathbb{R}^{H \times W \times 3}$ is passed through BayarConv (Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection) to obtain the noise map $\boldsymbol{I}_{n} \in \mathbb{R}^{H \times W \times 3}$. The RGB image is then split into 16 x 16 patches that form the sequence $\boldsymbol{X}_{c}=\left\{\boldsymbol{x}_{c}^{(1)}, \boldsymbol{x}_{c}^{(2)}, \cdots, \boldsymbol{x}_{c}^{(N)}\right\}$, where $\boldsymbol{x}_{c}^{(i)} \in \mathbb{R}^{16 \times 16 \times 3}$ and $N=H / 16 \times W / 16$. Each patch in the sequence is reshaped into a 1-D vector and linearly projected, and the resulting vectors form the patch embedding sequence $\boldsymbol{P}_{c}=\left\{\boldsymbol{p}_{c}^{(1)}, \boldsymbol{p}_{c}^{(2)}, \cdots, \boldsymbol{p}_{c}^{(N)}\right\} \in \mathbb{R}^{N \times L}$. The corresponding position encodings, as shown in the figure, are added directly to the matching embeddings to form the final input sequence $\boldsymbol{E}_{c}=\left\{\boldsymbol{e}_{c}^{(1)}, \boldsymbol{e}_{c}^{(2)}, \ldots, \boldsymbol{e}_{c}^{(N)}\right\} \in \mathbb{R}^{N \times L}$, where $\boldsymbol{e}_{c}^{(i)}=\boldsymbol{p}_{c}^{(i)}+\text{pos}_{c}^{(i)}$.
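The patch-splitting and embedding step above can be sketched in a few lines of NumPy. This is only an illustration of the shapes involved, not the authors' code: the embedding width `L = 64`, the random `proj_weight`, and the random `pos` table are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def patch_embed(image, proj_weight, pos, patch=16):
    """Split an H x W x C image into patches, flatten, project, add positions."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0, "image size must be divisible by patch size"
    gh, gw = H // patch, W // patch
    # (gh, patch, gw, patch, C) -> (gh, gw, patch, patch, C) -> (N, patch*patch*C)
    x = image.reshape(gh, patch, gw, patch, C).transpose(0, 2, 1, 3, 4)
    x = x.reshape(gh * gw, patch * patch * C)
    p = x @ proj_weight   # patch embeddings P_c with shape (N, L)
    return p + pos        # input sequence E_c: e^(i) = p^(i) + pos^(i)

rng = np.random.default_rng(0)
H = W = 512                      # input size used in the experiments
L = 64                           # hypothetical embedding width
N = (H // 16) * (W // 16)        # number of patches
image = rng.random((H, W, 3))
proj_weight = rng.standard_normal((16 * 16 * 3, L)) * 0.02  # stand-in for learned weights
pos = rng.standard_normal((N, L)) * 0.02                    # stand-in for position encodings
E_c = patch_embed(image, proj_weight, pos)
print(E_c.shape)
```

For a 512 x 512 input this gives N = 32 x 32 = 1024 tokens, one per 16 x 16 patch.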
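The input sequence is then fed into a feature extractor made up of 12 Transformer layers, each consisting of a multi-head self-attention module and a multi-layer perceptron (MLP) module, and the outputs of layers 4, 8, and 12 are collected: $\boldsymbol{T}_{c}^{(4)}, \boldsymbol{T}_{c}^{(8)}, \boldsymbol{T}_{c}^{(12)}$.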
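Similarly, the noise branch uses the same architecture but with unshared weights. Next comes the AHFM, which fuses the features of the two branches. Because the features of the two branches differ considerably, the authors build a position attention (PA) module inside the AHFM. As the figure shows, the feature maps taken from layers 4, 8, and 12 of each feature extractor are first transposed and reshaped into 3-D tensors. The transformed features of the two branches are then concatenated along the channel dimension and passed through a convolution, followed by three convolutions with different kernels that yield three new feature maps. A softmax produces the position attention weights, from which the fused feature map is obtained. Layers 8 and 12 are fused in the same way, giving three fused feature maps. These are summed element-wise and passed through a 3 x 3 convolution, batch normalization, and ReLU to give the final fused feature map of the encoder stage.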
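The fusion steps described above can be sketched as follows. The notes do not spell out the exact form of the PA module, so this sketch assumes a common query/key/value form of position attention with a residual connection. It also replaces every convolution with a 1 x 1 projection (a plain matrix multiply) and skips the final 3 x 3 convolution and batch normalization. All weights and token values are random, hypothetical stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def to_map(tokens, gh, gw):
    """Transpose an (N, L) token sequence and reshape it to an (L, gh, gw) map."""
    return tokens.T.reshape(tokens.shape[1], gh, gw)

def pa_fuse(t_c, t_n, gh, gw, w):
    """Fuse one layer's RGB and noise tokens with position attention (assumed form)."""
    f = np.concatenate([to_map(t_c, gh, gw), to_map(t_n, gh, gw)], axis=0)  # (2L, gh, gw)
    f = w["in"] @ f.reshape(f.shape[0], -1)       # 1x1 conv as matmul: (L, N)
    q, k, v = w["q"] @ f, w["k"] @ f, w["v"] @ f  # three projections of f
    attn = softmax(q.T @ k, axis=-1)              # (N, N), each row sums to 1
    fused = v @ attn.T + f                        # attention output plus residual (assumed)
    return fused.reshape(-1, gh, gw), attn

def make_weights(rng, L):
    """Hypothetical random weights; a trained model would learn these."""
    w = {name: rng.standard_normal((L, L)) * 0.1 for name in ("q", "k", "v")}
    w["in"] = rng.standard_normal((L, 2 * L)) * 0.1
    return w

rng = np.random.default_rng(0)
gh = gw = 4   # tiny patch grid for illustration
L = 8         # hypothetical embedding width
N = gh * gw
fused_sum = np.zeros((L, gh, gw))
for layer in (4, 8, 12):
    t_c = rng.standard_normal((N, L))  # stand-in for the RGB-branch output of this layer
    t_n = rng.standard_normal((N, L))  # stand-in for the noise-branch output of this layer
    fused, attn = pa_fuse(t_c, t_n, gh, gw, make_weights(rng, L))
    fused_sum += fused                 # element-wise sum over the three layers
encoder_out = np.maximum(fused_sum, 0.0)  # ReLU; 3x3 conv and batch norm omitted
print(encoder_out.shape)
```

The attention matrix is N x N, so every spatial position attends to every other position in the concatenated feature. That is one plausible reading of how PA maps the two domains into a common one.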

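Next comes the decoding stage, which treats the problem directly as a semantic segmentation task. Two learnable category embeddings (authentic and tampered) are set up to further learn the authentic and tampered feature representations. These two category embeddings, together with the patch embeddings of the fused feature, are fed into the decoder's two Transformer layers to produce the predicted mask. The patch embeddings of the fused feature are obtained by a reshape and transpose followed by a linear projection. The output of the Transformer layers then goes through normalization and upsampling to give the final predicted mask.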
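In the formula, Z denotes the embeddings of the encoder's fused feature and S denotes the category embeddings. Applying proj (a linear projection) and L2 (normalization) gives the patch-wise values Y, which are then upsampled to give the predicted mask M.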
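The formula itself did not survive in these notes, so the sketch below is one plausible reading rather than the paper's exact definition. It assumes Y is the product of the L2-normalized projected patch embeddings and the L2-normalized projected category embeddings, and it uses nearest-neighbour upsampling as a stand-in for whatever interpolation the authors use. The weights `w_z` and `w_s`, the width `L`, and the grid size are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def predict_mask(z, s, w_z, w_s, gh, gw, scale=16):
    """Assumed form: Y = L2(proj(Z)) @ L2(proj(S))^T, then M = upsample(Y)."""
    y = l2_normalize(z @ w_z) @ l2_normalize(s @ w_s).T  # (N, 2) patch-vs-class similarities
    y = y.T.reshape(-1, gh, gw)                           # (2, gh, gw): one map per category
    m = y.repeat(scale, axis=1).repeat(scale, axis=2)     # nearest-neighbour upsampling
    return y, m

rng = np.random.default_rng(0)
gh = gw = 4   # tiny patch grid for illustration
L = 8         # hypothetical embedding width
z = rng.standard_normal((gh * gw, L))   # stand-in for decoder output patch embeddings Z
s = rng.standard_normal((2, L))         # stand-in for category embeddings S (authentic, tampered)
w_z = rng.standard_normal((L, L))       # hypothetical projection weights
w_s = rng.standard_normal((L, L))
Y, M = predict_mask(z, s, w_z, w_s, gh, gw)
print(Y.shape, M.shape)
```

Under this assumption each entry of Y is a cosine similarity between a patch and a category, so it lies in [-1, 1] before upsampling.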
Experiments
Setup
Large datasets were generated for splicing, copy-move, inpainting, and removal: 140432 images for training, 7787 for validation, and 7787 for testing. For testing, four public datasets are also used for evaluation: NIST, CASIA v1.0, IMD20, and Realistic. The metrics are F1-score, IoU, and AUC, with 0.5 as the threshold when binarizing the predicted mask. Some implementation details: all input images are 512x512, the optimizer is SGD, the learning rate follows the polynomial decay $lr=lr_{0}\left(1-\text{iter}_{\text{current}} / \text{iter}_{\text{total}}\right)^{0.9}$, the batch size is 8, and training runs for 15 epochs.
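The learning-rate schedule and the thresholded metrics above are simple enough to write out directly. In this sketch the initial rate `lr0 = 0.01` and the toy prediction values are hypothetical, since the notes do not give them. Whether a score of exactly 0.5 counts as tampered is also an assumption (here it does not).

```python
def poly_lr(lr0, it, total, power=0.9):
    """Polynomial decay: lr = lr0 * (1 - it / total) ** power."""
    return lr0 * (1 - it / total) ** power

def f1_iou(pred, gt, thr=0.5):
    """Binarize predictions at thr, then compute pixel-level F1 and IoU."""
    tp = fp = fn = 0
    for p, g in zip(pred, gt):
        b = 1 if p > thr else 0   # assumption: strictly greater than the threshold
        if b == 1 and g == 1:
            tp += 1
        elif b == 1 and g == 0:
            fp += 1
        elif b == 0 and g == 1:
            fn += 1
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return f1, iou

lr0 = 0.01                      # hypothetical initial learning rate
schedule = [poly_lr(lr0, it, 100) for it in range(0, 101, 25)]
pred = [0.9, 0.8, 0.2, 0.6]     # toy per-pixel tampering scores
gt = [1, 1, 0, 0]               # toy ground-truth mask
f1, iou = f1_iou(pred, gt)
print(schedule, f1, iou)
```

With these toy values there are two true positives and one false positive, so F1 is 0.8 and IoU is 2/3.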
Results
As the table shows, the authors compare against several other methods and achieve relatively good results throughout.

The ablation study analyzes the effect of the branches and the effect of the fusion module.

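Thoughts
The results are quite good. However, this framework of RGB plus frequency or a noise map has probably already been explored a lot in tampering detection, so the paper spends little effort explaining why the two should be combined. Instead it argues that its combination differs from others in that it uses only Transformers. That justification seems acceptable, but also not especially strong. The paper also puts emphasis on the module that combines the two kinds of features.
My questions would be: a noise map is actually of no use for copy-move; and, looking at frequency as a whole, a noise map seems to account only for high-frequency features, whereas some methods combine high-frequency and low-frequency features. What about those?
Also, the generated training set has over a hundred thousand images, while the datasets used for evaluation seem fairly small, and on most of them most methods seem to reach 90%+ quite easily. So when generating the training set, does it make sense to generate data aimed at the evaluation datasets?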