We introduce Noise-Coded Illumination (NCI), a scene-level intervention that helps detect spatiotemporal manipulations of video by coding the light in an environment with noise.
The Problem
The proliferation of advanced tools for manipulating video has led to an arms race, pitting those who wish to sow disinformation against those who
want to detect and expose it. Unfortunately, time favors the ill-intentioned in this race, with fake videos growing increasingly difficult to distinguish from real ones.
At the root of this trend is a fundamental advantage held by those manipulating media: equal access to a distribution of what we consider authentic (i.e., "natural") video.
Our Contribution
We show how coding very subtle, noise-like modulations into the illumination of a scene can help combat this advantage by creating an information asymmetry that
favors verification. Our approach effectively adds a temporal watermark to any video recorded under coded illumination. However, rather than encoding a specific message,
this watermark encodes an image of the unmanipulated scene as it would appear lit only by the coded illumination. We show that even when an adversary knows that our technique
is being used, creating plausible coded fake video amounts to solving a second, more difficult version of the original adversarial content creation problem at an information disadvantage.
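To make this concrete, here is a minimal sketch, not the paper's implementation, of how such a code image could be recovered under a simple linear image-formation model: each frame is a static scene image plus the code image scaled by the known, zero-mean code signal. The function name and the least-squares estimator are our illustrative assumptions.

    import numpy as np

    def recover_code_image(frames, code):
        """frames: (T, H, W) grayscale video; code: (T,) known code signal."""
        c = code - code.mean()                  # use a zero-mean copy of the code
        # Least-squares estimate: C(x) = sum_t c(t) I(x, t) / sum_t c(t)^2
        return np.tensordot(c, frames, axes=(0, 0)) / np.dot(c, c)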
Target Application
Our work targets high-stakes settings like public events and interviews, where the content on display is a likely target for manipulation and where the illumination can be controlled but the cameras capturing video cannot.
Method
Coding the Illumination in the Scene
We experimented with two types of light sources: computer monitors, for applications such as video conferencing, and stage lights, for larger-scale events.
Like many lights, the stage light we purchased adopts the
ANSI 0-10V dimming standard, which we can use to inject our time-varying illumination code.
However, we found that the dimming system has a low-pass filter with a sub-1Hz cutoff, which prevented us from directly injecting our 12Hz code signal.
We fix this by adjusting the filter components and bypassing the light's internal pulse-width-modulation (PWM) section, a simple modification that manufacturers could incorporate and that gives us a bandwidth of
over 100Hz. We use an ESP32 microcontroller running our compiled C code to modulate the light with our code signal, with interactive control of the average brightness and signal amplitude.
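As an illustration of the drive signal, here is a hedged Python sketch of a noise-like code riding on a 0-10V dimming voltage. The real system runs compiled C on the ESP32; the sample rate, chip rate, and the brightness/amplitude parameter values are illustrative assumptions, not values from the paper.

    import numpy as np

    def dimmer_signal(duration_s, fs=240, chip_rate=12, brightness=6.0,
                      amplitude=0.3, seed=0):
        """Noise-coded 0-10V control signal: a random binary code around a mean level."""
        rng = np.random.default_rng(seed)
        # Noise-like binary code at the chip rate
        chips = rng.choice([-1.0, 1.0], size=int(duration_s * chip_rate))
        code = np.repeat(chips, int(fs / chip_rate))  # hold each chip for fs/chip_rate samples
        volts = brightness + amplitude * code         # modulate around the average brightness
        return np.clip(volts, 0.0, 10.0)              # stay within the 0-10V dimming range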
Detecting Temporal Manipulations
We can use local evidence of the code signal in a video to create alignment matrices that map each point in time in the video to a point in our code signal.
Discontinuities in the alignment matrix indicate that a video has been temporally tampered with.
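One plausible way to build such an alignment matrix, a sketch under our own assumptions rather than the paper's exact method, is to correlate short windows of a signal recovered from the video (e.g., detrended mean frame brightness) against every window of the reference code; tampering then shows up as jumps in the best-alignment path.

    import numpy as np

    def alignment_matrix(recovered, code, win=32):
        """A[i, j]: normalized correlation of recovered[i:i+win] with code[j:j+win]."""
        def windows(x):
            w = np.lib.stride_tricks.sliding_window_view(x, win)
            w = w - w.mean(axis=1, keepdims=True)
            return w / (np.linalg.norm(w, axis=1, keepdims=True) + 1e-8)
        return windows(recovered) @ windows(code).T

    # Each video time i should align to code time argmax_j A[i, j]; a path
    # that jumps or fails to advance at a 1:1 rate suggests cuts or splices.
    def alignment_path(A):
        return A.argmax(axis=1)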
Many interviews use a tri-camera setup like the one shown above. Such setups are especially vulnerable to temporal tampering, since an adversary can easily cut between cameras to
change the meaning of what was said, as shown below (view the original and edited videos and alignment matrices here).
In the original dialogue of our scene, the
interviewee expresses concern about fake video in political campaigns (mid left). The maliciously edited version splices in footage from an earlier response
given during a sound check that makes it look like the interviewee supports and encourages the use of fake video for spreading disinformation (mid right).
Our recovered alignment matrix displays the original timing of each clip, which shows that the answers in the manipulated video came from footage recorded
before the corresponding questions, indicating that they were maliciously taken out of context.
Detecting Spatial Manipulations
By assigning N uncorrelated code signals to the light sources in a scene, we can separate a video into N components,
each lit only by the light sources carrying the corresponding code, as shown above (a sketch of the separation follows below). Lighting is typically one of the hardest things to
fake in a video. With our approach, an adversary needs to somehow recover the illumination codes, which our supplemental analysis suggests is difficult for general types of manipulation, and
then fake the lighting in N+1 images instead of just 1. Below, we show an example of analyzing a spatially tampered video from an uninformed adversary, meaning one
that does not know our technique is being used.
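Before turning to that example, here is a minimal sketch of the separation itself, assuming the same linear model as before with N codes plus a static component; the per-pixel least-squares estimator is our illustrative choice, not necessarily the paper's.

    import numpy as np

    def separate_sources(frames, codes):
        """frames: (T, H, W) video; codes: (N, T) zero-mean code signals.

        Returns the static image S and the N per-source code images C.
        """
        T, H, W = frames.shape
        X = np.column_stack([np.ones(T), codes.T])      # (T, N+1) design matrix
        coef, *_ = np.linalg.lstsq(X, frames.reshape(T, -1), rcond=None)
        coef = coef.reshape(-1, H, W)
        return coef[0], coef[1:]                        # S: (H, W), C: (N, H, W)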
In the code images from the fake video (bottom), we see several signs of manipulation. The added content does not
appear to reflect the coded light. Code images also make it easier to see shadows that are masked by other light sources in the original frame. The shadow that
the person’s head casts from the LED lamp is barely visible in the original video due to an uncoded ceiling light. In code image 1, however, all other sources
are removed, leaving this part of the shadow clearly visible. The corresponding code shadows for the added content are missing from the manipulated code images. We
recommend zooming in to better view these results.
Check out our experiment repository
and supplemental material for results on diverse scenes, including those recorded outdoors and with dark-skinned subjects,
as well as analysis of human flicker sensitivity to coded light, adversarial attacks from informed adversaries, and the effects of real-world phenomena such as video compression.
Acknowledgments
We thank all those who volunteered to be subjects in our test scenes.
This work was supported in part by an NDSEG fellowship to P.M. and by the Pioneer Centre for AI (DNRF grant number P1).