Yes, and it’s already happening. Deepfake voice and video exploit the same trust assumptions that human brains and biometric systems share: if it looks and sounds like someone you know, it must be them. Attackers weaponize that assumption to bypass authentication systems and social-engineer victims into transferring money or handing over credentials.
Pithy Security | Cybersecurity FAQs – The Details
Question: Can deepfake voice and video be used to steal your identity or money?
Asked by: DeepSeek V3
Answered by: Mike D (MrComputerScience) from Pithy Security.
Why GANs Make Fake Audio and Video So Convincing
The engine behind deepfakes is the Generative Adversarial Network (GAN). A GAN pits two neural networks against each other in a loop: a generator that creates synthetic output, and a discriminator that evaluates whether that output is real or fake. The generator fails, updates its weights, and tries again; the discriminator catches the fake, updates its own weights, and raises the bar. This adversarial loop runs for thousands of iterations, until the generator produces output the discriminator can no longer reliably reject.
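That loop can be sketched as a toy numeric "GAN" in pure Python. This is not a real neural network — the "generator" here is a single parameter and the "discriminator" a running estimate of the real distribution, both invented for illustration — but it shows the adversarial feedback structure: each side updates against the other until the fakes become statistically indistinguishable from real samples.

```python
import random

random.seed(0)
REAL_MEAN = 10.0  # the "real data" distribution is centered here

gen_param = 0.0       # generator's single learnable parameter
disc_estimate = 0.0   # discriminator's internal model of "real"

for step in range(2000):
    real = random.gauss(REAL_MEAN, 0.5)        # genuine sample
    fake = gen_param + random.gauss(0.0, 0.5)  # generator's forgery

    # Discriminator update: refine its model of the real distribution.
    disc_estimate += 0.05 * (real - disc_estimate)

    # Generator update: move in the direction that fools the discriminator.
    gen_param += 0.05 * (disc_estimate - fake)

# gen_param drifts toward REAL_MEAN: the forgeries end up looking
# like real samples to this discriminator.
print(round(gen_param, 1))
```

In a real GAN both sides are deep networks trained by gradient descent on a classification loss, but the dynamic is the same: the discriminator's corrections are exactly the training signal that improves the generator.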
For voice cloning, the model trains on recorded audio samples and learns to replicate not just words, but tone, cadence, and breathing patterns. For video, the model maps facial geometry, landmark positions, and micro-expressions onto a source actor's movements. The larger and cleaner the training data, the harder the output is to distinguish from genuine media. The uncomfortable truth: the same discriminator architecture used to catch fakes is what makes the generator better at producing them.
How Deepfakes Break Biometric Authentication at the System Level
Biometric authentication makes one core assumption: the physical trait being verified is present in real time and belongs to the genuine person. Deepfakes attack that assumption at two separate layers.
The first is presentation attacks: feeding a synthetic face or voice directly to the sensor. A face recognition system compares the submitted input against a stored template and returns a match score. If the deepfake is good enough, the score clears the threshold. The system was never designed to ask whether the face is real, only whether it matches.
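A minimal sketch of that threshold check (the embeddings, dimensions, and threshold below are all made up for illustration — real systems compare high-dimensional vectors from a face-recognition model): the verification logic scores similarity against the stored template and nothing else.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

THRESHOLD = 0.9
stored_template = [0.1, 0.8, 0.3, 0.5]  # enrolled embedding (illustrative)

def verify(submitted_embedding):
    # The system asks only "does it match?", never "is it real?"
    return cosine_similarity(submitted_embedding, stored_template) >= THRESHOLD

genuine = [0.11, 0.79, 0.31, 0.52]   # live capture of the real user
deepfake = [0.12, 0.78, 0.29, 0.51]  # synthetic face tuned to the same identity

print(verify(genuine))   # clears the threshold
print(verify(deepfake))  # clears it too — the score can't tell them apart
```

The decision is purely a similarity threshold; realness has to be established by a separate mechanism, which is exactly where liveness detection comes in.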
The second is injection attacks, and they are more dangerous. Instead of spoofing the camera, the attacker hijacks the video stream at the API layer, feeding pre-generated synthetic media directly into the verification pipeline before liveness detection can run. The camera sees nothing. The server receives a clean synthetic image. Without client-side integrity checks and secure enclaves protecting the data path from sensor to server, the authentication system never had a chance.
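One way to defend that data path is to authenticate it cryptographically. A hedged sketch using an HMAC (the key provisioning, function names, and payloads here are illustrative, not any vendor's API): a legitimate device signs each captured frame with a key held in its secure enclave, and the server rejects anything it cannot verify — including a pristine synthetic frame injected at the API layer.

```python
import hashlib
import hmac

# Illustrative: in practice this key lives in a secure enclave and
# never leaves the device.
DEVICE_KEY = b"per-device-secret-provisioned-at-enrollment"

def sign_frame(frame_bytes: bytes) -> bytes:
    # On a legitimate device, each sensor frame is signed at capture time.
    return hmac.new(DEVICE_KEY, frame_bytes, hashlib.sha256).digest()

def server_accepts(frame_bytes: bytes, tag: bytes) -> bool:
    # Server verifies the frame came from an unmodified sensor path.
    expected = hmac.new(DEVICE_KEY, frame_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

camera_frame = b"raw-sensor-bytes-from-the-real-camera"
injected_frame = b"synthetic-deepfake-bytes-injected-at-the-api"

print(server_accepts(camera_frame, sign_frame(camera_frame)))  # accepted
print(server_accepts(injected_frame, b"\x00" * 32))            # rejected
```

Note `hmac.compare_digest` for constant-time comparison. This only helps if the signing key genuinely cannot be extracted from the client, which is why hardware-backed attestation matters.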
When Liveness Detection Actually Stops a Deepfake Attack
Liveness detection is the primary technical countermeasure, and it works by asking a question the deepfake cannot easily answer in real time: prove you are physically present right now.
Active liveness detection prompts unpredictable challenges: blink now, turn left, smile. Pre-recorded synthetic video cannot respond to a randomized sequence it has not seen before. Passive liveness detection analyzes biological signals the static model struggles to replicate: subtle skin texture variation, micro-expressions, natural eye movement patterns, and photoplethysmography signals in the face.
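The active-challenge idea can be sketched in a few lines (the challenge names and verification policy below are illustrative, not a real vendor's protocol): the server draws a random sequence at verification time, so a pre-recorded clip cannot anticipate it.

```python
import secrets

CHALLENGES = ["blink", "turn_left", "turn_right", "smile", "nod"]

def issue_challenge_sequence(length: int = 3) -> list:
    # Server-side randomness: a canned deepfake video cannot know in
    # advance which gestures will be requested, or in what order.
    return [secrets.choice(CHALLENGES) for _ in range(length)]

def verify_responses(issued: list, observed: list) -> bool:
    # Observed gestures must match the issued sequence exactly, in order.
    return issued == observed

challenge = issue_challenge_sequence()
print(challenge)  # e.g. a random 3-gesture sequence, fresh each session
```

A live user performs the gestures as prompted; pre-generated video can only replay what it already contains. (Real-time face-swap tools complicate this, which is why passive signals and channel integrity still matter.)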
The weakest link is not the detection algorithm. It is the data path between the sensor and the server. Even a perfect liveness check fails if an injection attack bypasses the camera entirely. Robust defenses require three things working together: liveness detection at the sensor, tamper-resistant SDK integrity checks on the client, and encrypted, authenticated pipelines that verify the data originated from a legitimate, unmodified device. Break any one of those three, and the liveness check becomes irrelevant.
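Those three requirements compose as a strict AND — a sketch of the decision logic (the flag names are illustrative):

```python
def accept_authentication(liveness_passed: bool,
                          client_integrity_ok: bool,
                          channel_authenticated: bool) -> bool:
    # Defense in depth: all three checks must hold. Any single failure
    # (e.g., an injection attack defeating channel authentication)
    # voids the other two.
    return liveness_passed and client_integrity_ok and channel_authenticated

print(accept_authentication(True, True, True))   # only this passes
print(accept_authentication(True, True, False))  # perfect liveness, still rejected
```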
What This Means For You
- Verify any unexpected voice or video request for money or credentials through a separate, pre-established channel, never by calling back a number the requester provides.
- Disable voice-print authentication on your financial accounts if your bank offers it as a standalone factor; use hardware MFA or app-based authentication instead.
- Audit any identity verification workflows your organization uses and confirm the vendor implements both active liveness detection and injection attack prevention at the API layer.
- Establish a shared verbal codeword with your family or executive team now, before an attacker clones a voice and asks for an emergency wire transfer.
Want Cybersecurity Breakdowns Like This Every Week?
Subscribe to Pithy Security for no-fluff cybersecurity breakdowns delivered weekly.
Subscribe (Free) → pithysecurity.substack.com
Read the archives (Free) → pithysecurity.substack.com/archive
You’re reading Ask Pithy Security. Got a question? Email ask@pithysecurity.com (include your Substack pub URL for a free backlink).
