10 Seconds of Watching: Inside an RTSP Camera Hijacking

Every other protocol we monitor involves machines attacking machines. SSH bots brute-force passwords. Redis bots inject cron jobs. Docker exploits mount filesystems. The attacker wants compute, persistence, or a foothold. The target is a server. The intent is technical.
RTSP is different.
When someone connects to an unprotected camera on port 554, they are not looking for a server. They are looking for a window. A window into someone's living room. Their hallway. Their front door. And if that camera has no password, the window is open.
Our honeypot sensor network runs fake IP cameras on port 554 alongside 15 other protocol traps. What connects to them speaks the RTSP protocol fluently, sets up video streams, requests audio, and watches. The feed it receives is synthetic. There is no camera. There is no room. There is no one on the other side.
But the person connecting doesn't know that.
At 22:47 local time on a Monday night in Sofia, someone connected to one of our cameras. This is what happened.
The House That Doesn't Exist
Our RTSP honeypot emulates a consumer IP camera. It speaks the Real Time Streaming Protocol and answers OPTIONS, DESCRIBE, SETUP, and PLAY commands with standards-compliant responses. It advertises an H.264 video stream at 2688x1520 resolution (4 megapixels, a common format for modern Hikvision and Dahua cameras) and an audio track. It provides a session ID, negotiates transport, and begins "streaming" frames.
To any RTSP client, this looks like a working camera. The SDP description matches real camera firmware. The URL structure uses Hikvision-format parameters. The stream resolution is plausible. The frame rate is consistent.
Behind the camera is a hallway that doesn't exist, in a house that was never built, on a street that has no name. Our sensor sits in the wall where the camera would be and waits for someone to look through it.
Port 554 is open. There is no password.
We don't have to wait long.
22:47:17 — Someone Tries the Handle
A connection opens from 146.70.245.131. Sofia, Bulgaria. The client identifies itself as Lavf58.76.100. That's libavformat, part of the FFmpeg multimedia framework. This is not someone manually typing an RTSP URL into VLC out of curiosity. This is automated tooling. A scanner. Something designed to move from camera to camera, efficiently, quietly, at scale.
The first request:
OPTIONS rtsp://[...]:554/?chID=1&streamType=main&linkType=tcp RTSP/1.0
CSeq: 1
User-Agent: Lavf58.76.100
OPTIONS asks the server what methods it supports. "What can you do?" Our sensor responds with the standard list: OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN. The full vocabulary of a working camera.
This is the equivalent of trying a door handle. Not pushing the door open. Just resting a hand on the handle and feeling whether it turns.
It turns.
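The exchange above can be sketched in a few lines of Python. This is a minimal illustration, not the sensor's actual code: the function names are hypothetical, and because the sensor address is elided in the capture, the documentation address 192.0.2.10 stands in for it.

```python
def parse_rtsp_request(raw: str) -> dict:
    """Split an RTSP/1.0 request into its request line and headers."""
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, url, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "url": url, "version": version, "headers": headers}


def build_options_response(cseq: str) -> str:
    """Answer OPTIONS by listing the supported methods in a Public header."""
    methods = "OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN"
    return f"RTSP/1.0 200 OK\r\nCSeq: {cseq}\r\nPublic: {methods}\r\n\r\n"


# Assumption: 192.0.2.10 is a placeholder for the elided sensor address.
raw_request = (
    "OPTIONS rtsp://192.0.2.10:554/?chID=1&streamType=main&linkType=tcp RTSP/1.0\r\n"
    "CSeq: 1\r\n"
    "User-Agent: Lavf58.76.100\r\n"
    "\r\n"
)
request = parse_rtsp_request(raw_request)
response = build_options_response(request["headers"]["cseq"])
```

The response echoes the request's CSeq value, which is how an RTSP client pairs each answer with the request that triggered it.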
22:47:17.200 — Looking Through the Keyhole
Two hundred milliseconds later. The second request:
DESCRIBE rtsp://[...]:554/?chID=1&streamType=main&linkType=tcp RTSP/1.0
Accept: application/sdp
CSeq: 2
User-Agent: Lavf58.76.100
DESCRIBE asks for the Session Description Protocol response. This is the camera's spec sheet: what codecs it uses, what resolution it streams at, how many tracks it has, what media types are available.
Our sensor responds. Two tracks. Track 1: H.264 video, 2688x1520, 20 frames per second. Track 2: audio.
The URL parameters are worth noting. chID=1 is channel 1. streamType=main is the primary stream, the full-resolution feed, not the low-quality sub-stream that most cameras offer for bandwidth-constrained connections. linkType=tcp requests a reliable TCP transport rather than UDP.
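Pulling those parameters out of a request URL takes only the standard library. The sketch below is illustrative: `stream_params` is a hypothetical helper, and 192.0.2.10 is a placeholder for the elided sensor address.

```python
from urllib.parse import urlsplit, parse_qs


def stream_params(url: str) -> dict:
    """Return host, port, path, and query parameters of an RTSP URL."""
    parts = urlsplit(url)
    # parse_qs returns lists; each parameter appears once here, so keep the first value.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.hostname, "port": parts.port, "path": parts.path, **query}


# Assumption: placeholder address, since the real one is elided in the capture.
params = stream_params("rtsp://192.0.2.10:554/?chID=1&streamType=main&linkType=tcp")
wants_full_resolution = params.get("streamType") == "main"
wants_tcp = params.get("linkType") == "tcp"
```

Logging `streamType` and `linkType` as separate fields makes it easy to count how many scanners ask for the main stream rather than a sub-stream.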
They don't want the grainy thumbnail. They want the sharp picture. The 4-megapixel picture. Clear enough to read a name on an envelope. Clear enough to see a face.
And they want it reliably. No dropped frames. No stuttering. TCP.
The door handle turned. Now they're pressing their face against the glass and looking in.
22:47:18 — Opening the Window
Two more requests arrive in rapid succession. Two hundred milliseconds apart. Mechanical precision.
SETUP rtsp://[...]/trackID=1 RTSP/1.0
Transport: RTP/AVP/TCP;unicast;interleaved=0-1
CSeq: 3

SETUP rtsp://[...]/trackID=2 RTSP/1.0
Transport: RTP/AVP/TCP;unicast;interleaved=2-3
CSeq: 4
Session: 9A9ED1FA1E2C49E7
Two SETUP commands. One for each track.
Track 1 is video. That's expected. Someone scanning cameras wants to see.
Track 2 is audio.
They also want to listen.
The SETUP command negotiates the transport for each media track, telling the server how to deliver the data. RTP/AVP/TCP;unicast;interleaved=0-1 means "send the video data interleaved on the same TCP connection, channels 0 and 1." The audio goes on channels 2 and 3. Clean, efficient, leaving no separate ports open that might attract attention.
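A Transport header like the two above is a semicolon-separated list, so it is straightforward to take apart. The following is a minimal sketch with a hypothetical `parse_transport` helper; it handles only the parameter shapes seen in this capture.

```python
def parse_transport(value: str) -> dict:
    """Parse an RTSP Transport header into profile, flags, and parameters."""
    profile, *params = value.split(";")
    parsed = {"profile": profile, "flags": [], "interleaved": None}
    for param in params:
        key, has_value, val = param.partition("=")
        if not has_value:
            parsed["flags"].append(key)  # bare flag such as "unicast"
        elif key == "interleaved":
            low, _, high = val.partition("-")
            # By convention the first channel carries RTP, the second RTCP.
            parsed["interleaved"] = (int(low), int(high or low))
        else:
            parsed[key] = val
    return parsed


video = parse_transport("RTP/AVP/TCP;unicast;interleaved=0-1")
audio = parse_transport("RTP/AVP/TCP;unicast;interleaved=2-3")
```

With both headers parsed, a sensor can confirm that the two tracks claim non-overlapping channel pairs on the same TCP connection.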
Our sensor assigns session ID 9A9ED1FA1E2C49E7 and accepts both tracks.
The window is now open. Video and audio. Eyes and ears.
22:47:18.700 — Watching
PLAY rtsp://[...]:554/?chID=1&streamType=main&linkType=tcp RTSP/1.0
Range: npt=0.000-
CSeq: 5
Session: 9A9ED1FA1E2C49E7
PLAY. Range: npt=0.000- means "start from the beginning, play indefinitely." Don't stop until I say stop.
Our sensor begins "streaming." 2688x1520 H.264 video at 20 frames per second. 189 frames over 9.5 seconds. The audio track runs alongside it.
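The Range header and the frame count can both be sanity-checked with a little arithmetic. This sketch uses a hypothetical `parse_npt_range` helper that understands only the seconds form of normal play time used in this session, not the other NPT notations the protocol allows.

```python
def parse_npt_range(value: str):
    """Parse 'npt=<start>-<end>' in seconds; an empty end means open-ended."""
    unit, _, span = value.partition("=")
    if unit.strip() != "npt":
        raise ValueError(f"unsupported range unit: {unit!r}")
    start, _, end = span.partition("-")
    return float(start), (float(end) if end else None)


start, end = parse_npt_range("npt=0.000-")

# Figures from the session: 189 frames over 9.5 seconds on a 20 fps stream.
observed_fps = 189 / 9.5
expected_frames = 20 * 9.5
```

The observed rate works out to just under 20 frames per second, which is consistent with a nominal 20 fps stream cut off one frame short of 190.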
On their end, in Sofia, at nearly 11 PM on a Monday night, a window opens on a screen. What they expect to see is a room. A hallway. A parking lot. A storefront. A nursery. Whatever is on the other side of whatever camera they just found.
What they actually see is nothing. Synthetic frames from a sensor that emulates a camera but has no lens, no housing, no wall to mount on, no room to look into. The feed is technically valid H.264. It describes a camera that doesn't exist, mounted in a house that was never built.
They watch for 10 seconds.
10 seconds is enough. Enough to confirm the feed is live. Enough to evaluate what the camera is pointed at. Enough to decide whether this window is worth coming back to, or whether to move on to the next one.
22:47:28 — Stepping Back Into the Dark
TEARDOWN rtsp://[...]:554/?chID=1&streamType=main&linkType=tcp RTSP/1.0
CSeq: 6
Session: 9A9ED1FA1E2C49E7
TEARDOWN. The session ends. The connection closes. Total time: 10.354 seconds. Six RTSP requests. One session. Both video and audio. Full resolution.
The scanner moves on. The next camera. The next window. The next house.
In the silence that follows, our sensor logs the session and waits for the next connection. It doesn't have to wait long.
The One Who Stayed
Two hours before the Sofia session, a different sensor captured a different kind of visitor.
94.228.209.182. Amsterdam, Netherlands. Lavf59.27.100, a newer FFmpeg build. Same protocol. Same handshake. Same lack of authentication. But a different animal entirely.
The Sofia scanner was a cataloger. Quick, mechanical, efficient. Ten seconds, clean TEARDOWN, next camera. It was building a list. Shopping.
The Amsterdam session lasted 45 seconds. 884 frames. And it never said goodbye.
OPTIONS rtsp://[...]:554/11 RTSP/1.0
CSeq: 1
User-Agent: Lavf59.27.100
The URL is different. No chID=1&streamType=main&linkType=tcp. Just /11. A generic channel path. Where the Sofia scanner carried a Hikvision-specific key ring, this one carries a skeleton key. It doesn't care what brand the camera is. It tries every door on the corridor.
The handshake completes in 102 milliseconds. OPTIONS, DESCRIBE, SETUP, SETUP, PLAY. Both video and audio, same as Sofia. Full resolution. The stream begins.
Then something interesting happens in the SETUP requests.
SETUP rtsp://10.200.0.2:554/11/trackID=1 RTSP/1.0
Transport: RTP/AVP/TCP;unicast;interleaved=0-1
CSeq: 3
The OPTIONS and DESCRIBE targeted the sensor's public IP. But SETUP and PLAY switched to 10.200.0.2. A private RFC 1918 address. The scanner's RTSP client parsed the SDP response from DESCRIBE and rewrote the URL using the address it found there, revealing its own internal network topology.
10.200.0.2 tells us this scanner is behind a NAT or a VPN. It's trying to stay hidden. But the RTSP protocol betrayed it. In the process of setting up the stream, it leaked the internal IP address of its own network. A face pressed against the glass, reflected back.
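Spotting an RFC 1918 address inside a request URL is easy to automate. The sketch below is one way to do it, with a hypothetical helper name; it checks the three RFC 1918 blocks explicitly rather than relying on a broader "private" test, and 192.0.2.10 again stands in for the elided public address.

```python
import ipaddress
from urllib.parse import urlsplit

RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def url_targets_rfc1918(url: str) -> bool:
    """Return True when the host part of the URL is an RFC 1918 address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname rather than an IP literal
    return any(address in network for network in RFC1918_NETWORKS)


setup_is_private = url_targets_rfc1918("rtsp://10.200.0.2:554/11/trackID=1")
# Assumption: placeholder for the sensor's public address.
options_is_private = url_targets_rfc1918("rtsp://192.0.2.10:554/11")
```

Comparing the host in each request against the address the client actually connected to is what makes a mid-session switch like this one stand out in the logs.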
The stream plays. Ten seconds pass. Twenty seconds. Thirty.
At 30 seconds, instead of disconnecting, the scanner sends a command the Sofia session never used:
GET_PARAMETER rtsp://10.200.0.2:554/11 RTSP/1.0
CSeq: 6
Session: DBD5B09271BC4055
GET_PARAMETER with an empty body is the conventional RTSP keepalive. It asks for nothing and carries no payload. It exists for one purpose: to tell the server "I'm still here. Don't close the connection. I'm still watching."
The Sofia scanner tasted. This one is feeding.
Fifteen more seconds of streaming. Then the connection drops. No TEARDOWN. No polite goodbye. The session simply ends at the 45-second mark, either because the scanner moved on without bothering to close the connection, or because something else pulled its attention away.
884 frames of 4-megapixel video. Both audio and video tracks. A keepalive to extend the session. And a leaked internal IP address left behind like a fingerprint on a windowsill.
Two sessions. Same day. Same protocol. Same fake cameras.
The scout checks the window and moves on. The viewer pulls up a chair.
The Two Sessions, Stripped Bare
Session 1: The Scout (Sofia)
| Time (UTC) | Method | Purpose |
|---|---|---|
| 20:47:17.881 | OPTIONS | "What can this camera do?" |
| 20:47:18.085 | DESCRIBE | "Show me the stream spec" |
| 20:47:18.292 | SETUP (track 1) | Set up video (H.264, 2688x1520, 20fps) |
| 20:47:18.505 | SETUP (track 2) | Set up audio |
| 20:47:18.729 | PLAY | Start streaming |
| 20:47:28.233 | TEARDOWN | Clean disconnect |
848 milliseconds from first contact to streaming. 9.5 seconds of watching. Clean goodbye. Gone.
Session 2: The Viewer (Amsterdam)
| Time (UTC) | Method | Purpose |
|---|---|---|
| 18:46:47.237 | OPTIONS | "What can this camera do?" |
| 18:46:47.262 | DESCRIBE | "Show me the stream spec" |
| 18:46:47.288 | SETUP (track 1) | Set up video (H.264, 2688x1520, 20fps) |
| 18:46:47.314 | SETUP (track 2) | Set up audio |
| 18:46:47.339 | PLAY | Start streaming |
| 18:47:17.363 | GET_PARAMETER | Keepalive: "I'm still watching" |
| — | No TEARDOWN | Connection dropped at 45s |
102 milliseconds from first contact to streaming. 45 seconds of watching. No goodbye. Internal IP leaked in SETUP/PLAY requests.
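The summary figures under each table follow directly from the timestamps. The sketch below recomputes them; the `label_session` heuristic is a hypothetical illustration of the scout/viewer distinction, not a rule our sensors are stated to use.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S.%f"


def ms_between(start: str, end: str) -> int:
    """Whole milliseconds between two same-day HH:MM:SS.mmm timestamps."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta // timedelta(milliseconds=1)


def label_session(methods: list) -> str:
    """Hypothetical heuristic: a keepalive suggests a viewer, a clean exit a scout."""
    if "GET_PARAMETER" in methods:
        return "viewer"
    if methods and methods[-1] == "TEARDOWN":
        return "scout"
    return "unknown"


# Timestamps copied from the two tables (UTC).
scout_setup_ms = ms_between("20:47:17.881", "20:47:18.729")       # OPTIONS to PLAY
scout_watch_ms = ms_between("20:47:18.729", "20:47:28.233")       # PLAY to TEARDOWN
viewer_setup_ms = ms_between("18:46:47.237", "18:46:47.339")      # OPTIONS to PLAY
viewer_keepalive_ms = ms_between("18:46:47.339", "18:47:17.363")  # PLAY to GET_PARAMETER

scout_label = label_session(["OPTIONS", "DESCRIBE", "SETUP", "SETUP", "PLAY", "TEARDOWN"])
viewer_label = label_session(["OPTIONS", "DESCRIBE", "SETUP", "SETUP", "PLAY", "GET_PARAMETER"])
```

The results reproduce the 848 ms and 102 ms handshake times quoted above and place the keepalive almost exactly 30 seconds after PLAY.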
What the URL Tells Us
rtsp://[...]:554/?chID=1&streamType=main&linkType=tcp
This URL format is native to Hikvision and Dahua IP camera firmware, the two largest manufacturers of IP cameras worldwide. chID=1 selects camera channel 1 (for multi-channel NVRs, this would be the first camera). streamType=main requests the primary high-resolution stream. linkType=tcp forces a TCP connection.
The scanner is specifically targeting cameras running Hikvision or Dahua firmware. It knows the URL format. It knows which stream to request. This is not generic RTSP probing. This is purpose-built camera reconnaissance.
What the User-Agent Tells Us
Lavf58.76.100 is libavformat version 58.76.100, part of the FFmpeg project. FFmpeg is the industry-standard multimedia framework. Legitimate uses include video transcoding, media analysis, and stream recording.
Illegitimate uses include connecting to unprotected cameras at 11 PM and watching.
The libavformat version maps to a specific FFmpeg release: 58.76.100 ships with the FFmpeg 4.4 series. This build is not ancient and not bleeding-edge. A stable, maintained toolkit. Someone who keeps their tools updated.
What the Audio Track Tells Us
Most automated camera scanners only set up the video track. Video confirms whether the camera is live and what it's pointed at. That's sufficient for building a catalog of accessible cameras.
This scanner set up both video and audio. It wanted to hear what was happening in the room.
That's a different intent. A scanner that only checks video is mapping. A scanner that checks audio is surveilling. The difference between casing a house from the street and pressing a glass against the wall.
Why This Matters
There are over 1 billion surveillance cameras deployed worldwide, a number that has been growing at roughly 15% per year. A significant fraction of these are consumer and small-business IP cameras connected directly to the internet with default or no credentials.
Shodan, the search engine for internet-connected devices, consistently indexes hundreds of thousands of RTSP streams accessible without authentication. These aren't hidden. They're findable by anyone with a search query.
What our honeypot captures is the demand side of that equation. Someone, using purpose-built tooling, systematically connecting to cameras and watching. Not exploiting a vulnerability. Not deploying malware. Just... looking. Through windows that were left open.
The technical barrier is zero. RTSP is a standard protocol. FFmpeg is free, open-source software. The cameras are indexed. The default passwords are public. Our username database shows the credentials attackers try most often against RTSP: admin, root, 888888, 666666. Hikvision's old default password was 12345. Dahua's was admin.
Neither session even attempted authentication. Each connected and immediately started streaming. Either the scanners already knew these cameras had no password, or they specifically target cameras running in unauthenticated mode. On many consumer cameras, RTSP authentication is disabled by default.
What RTSP Scanners Do With Access
Camera access at scale has a supply chain:
Cataloging. Scanners like the one in this capture connect, check the feed for 10 seconds, and categorize what they see. Indoor residential. Outdoor commercial. Cash register. Parking lot. Nursery. The catalog is the product.
Aggregation. Catalogs are compiled into collections and sold on forums and Telegram channels. Buyers browse by category, location, and camera type. Some collections contain thousands of feeds organized by country.
Live viewing platforms. Websites aggregate thousands of unprotected camera feeds and make them browsable. Some have existed for over a decade. They are not hidden. They index cameras by country, city, and sometimes street.
Recording. Longer sessions record footage for later use. Blackmail. Social engineering. Stalking. The recording itself becomes leverage.
Credential testing. Some scanners use camera access as a pivot. Default credentials on cameras are often reused on other devices on the same network. A camera with the password admin is on a network where the router might also have the password admin.
IOCs
Session 1: The Scout
| Field | Value |
|---|---|
| Source IP | 146.70.245.131 |
| Source port | 44979 |
| Destination port | 554 |
| Country | Bulgaria (Sofia) |
| User-Agent | Lavf58.76.100 (FFmpeg/libavformat) |
| Session duration | 10.354 seconds |
| Frames | 189 |
| Authentication | None attempted |
| Stream path | /?chID=1&streamType=main&linkType=tcp (Hikvision) |
| Audio | Yes |
| Disconnect | Clean TEARDOWN |
Session 2: The Viewer
| Field | Value |
|---|---|
| Source IP | 94.228.209.182 |
| Source port | 54352 |
| Destination port | 554 |
| Country | Netherlands (Amsterdam) |
| User-Agent | Lavf59.27.100 (FFmpeg/libavformat) |
| Session duration | 44.932 seconds |
| Frames | 884 |
| Authentication | None attempted |
| Stream path | /11 (generic) |
| Audio | Yes |
| Disconnect | No TEARDOWN (connection dropped) |
| Leaked internal IP | 10.200.0.2 (visible in SETUP/PLAY requests) |
Camera URL Patterns
Hikvision/Dahua format (Session 1):
rtsp://{target}:554/?chID=1&streamType=main&linkType=tcp
Generic channel format (Session 2):
rtsp://{target}:554/11
If your camera logs show DESCRIBE or SETUP requests from unknown IPs using either URL pattern, someone is attempting to view your feed.
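Matching logged request URLs against these two patterns can be sketched as follows. The function name and return labels are hypothetical, the host in the examples is a placeholder, and the matcher covers only the exact patterns observed in this capture.

```python
from urllib.parse import urlsplit


def match_observed_pattern(url: str):
    """Return which observed URL pattern a request matches, or None."""
    parts = urlsplit(url)
    if parts.path == "/" and parts.query == "chID=1&streamType=main&linkType=tcp":
        return "session-1"
    # Session 2 used /11, with /11/trackID=N on its SETUP requests.
    if parts.path == "/11" or parts.path.startswith("/11/"):
        return "session-2"
    return None


# Assumption: 192.0.2.10 stands in for {target}.
first = match_observed_pattern("rtsp://192.0.2.10:554/?chID=1&streamType=main&linkType=tcp")
second = match_observed_pattern("rtsp://192.0.2.10:554/11/trackID=1")
other = match_observed_pattern("rtsp://192.0.2.10:554/live")
```

A match is only a hint: legitimate clients for the same camera models use the same paths, so the source IP still has to be checked against the addresses you expect.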
Securing Your Cameras
IP camera security is consistently one of the worst categories in IoT. The defaults are bad, the firmware update cycle is slow, and most owners never change the settings after initial setup. Here's what actually prevents this:
- Set a strong RTSP password. Many cameras ship with RTSP authentication disabled. Enable it. Use a unique password. Not admin. Not 12345. Not 888888. The scanner in this capture didn't even try a password. A password of any kind would have stopped this session cold.
- Disable unauthenticated RTSP access. Some cameras allow both authenticated and anonymous streams on different paths. Check your camera's RTSP settings and disable any anonymous stream paths.
- Don't expose port 554 to the internet. If you need remote access to your cameras, use a VPN such as WireGuard or Tailscale. If your camera is accessible from the public internet on port 554, it is being scanned. Not "might be." Is.
- Change default credentials. Hikvision's legacy default was admin/12345. Dahua's was admin/admin. Both manufacturers have improved, but millions of older cameras are still running with factory passwords. The username database shows these credentials are attempted thousands of times daily across our sensor network.
- Update firmware. Camera manufacturers periodically patch remotely exploitable vulnerabilities. Hikvision CVE-2021-36260 allowed unauthenticated command injection via crafted requests to the camera's built-in web server. If your camera is running firmware from 2020, it is vulnerable to things that were patched years ago.
- Monitor for unauthorized RTSP connections. If your camera supports logging, check for connections from unknown IPs. An OPTIONS followed by DESCRIBE followed by SETUP from a foreign IP at 11 PM is not your phone checking the baby monitor.
- Block known scanners. SikkerGuard pulls our threat blacklist and blocks known malicious IPs at the firewall level. Both IPs from this post are tracked in our blacklist feed. Blocking them at the network edge prevents the RTSP handshake from ever completing.
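The monitoring advice in the list above can be sketched as a small log check. This is a minimal illustration under stated assumptions: the log is taken to be a list of (source IP, RTSP method) pairs, which is a hypothetical format, and the allowlist address is made up for the example.

```python
from collections import defaultdict

HANDSHAKE = ("OPTIONS", "DESCRIBE", "SETUP")


def suspicious_sources(events, allowed_ips):
    """Return unknown IPs whose requests contain OPTIONS, DESCRIBE, SETUP in order."""
    methods_by_ip = defaultdict(list)
    for ip, method in events:
        if ip not in allowed_ips:
            methods_by_ip[ip].append(method)
    flagged = []
    for ip, methods in methods_by_ip.items():
        remaining = iter(methods)
        # Each membership test consumes the iterator, so this checks for an ordered subsequence.
        if all(step in remaining for step in HANDSHAKE):
            flagged.append(ip)
    return flagged


# Assumed log format; 192.168.1.20 is a made-up trusted device on the local network.
events = [
    ("192.168.1.20", "OPTIONS"),
    ("146.70.245.131", "OPTIONS"),
    ("192.168.1.20", "DESCRIBE"),
    ("146.70.245.131", "DESCRIBE"),
    ("146.70.245.131", "SETUP"),
    ("203.0.113.7", "OPTIONS"),
]
flagged = suspicious_sources(events, allowed_ips={"192.168.1.20"})
```

An allowlist keeps the check quiet for your own devices, while a lone OPTIONS probe that never proceeds to SETUP is left unflagged.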
The Window Was Never Real
All data in this post was captured by our production honeypot sensor network. There is no camera. There is no lens. There is no hallway, no room, no house. Our RTSP sensor is a protocol emulator that speaks fluent RTSP, advertises plausible camera specifications, negotiates transport, and "streams" synthetic frames. The person on the other end of this session connected to what they believed was a real camera and watched what they believed was a real feed.
They were looking through a window that opens onto nothing.
But the next camera they connect to might be real. And the person on the other side of that camera, asleep in their house at 11 PM on a Monday night, will never know someone was watching.
That's the part of RTSP that keeps the logs cold.
Every IP referenced in this post is tracked in our threat database. Look up any IP at https://sikkerapi.com, or query the check endpoint for structured threat data. For automated protection, install SikkerGuard or pull our scored blacklists directly into your firewall.
Browse the full threat landscape to see what our sensors are capturing across all 16 protocols, including RTSP activity.