~$ Enumerating the Lightshot CDN
Tags: Information Security, Development, Privacy
The Premise
LightShot is a free online screen capture service. Its features include, but are not limited to, editing a screenshot, copying it to the clipboard and finally uploading the captured image to LightShot's CDN.
That last part is the one of interest to us, as commandergirl mentioned (see below) that LightShot's CDN (Content Delivery Network) is quite easy to enumerate.
As a fun project I wanted to see how easy it was to enumerate all of the images uploaded to LightShot.
Spoiler alert, it is really easy.
Kowalski, Analysis!
The CDN is accessible via a URL that looks like hxxps[:]//prnt[.]sc/x with x being an alphanumeric string drawn from the [0-9a-z] character set.
I uploaded two test images to LightShot's CDN, approximately 25 seconds apart, and got back the following codes:
- 1qvc84l
- 1qvc8ny
We can start by noticing that both codes appear to be sequential. Assuming we are in the [0-9a-z] space (base 36), the gap between them shows that a lot of images are constantly being uploaded to the CDN. The difference between 1qvc84l and 1qvc8ny is:
- '4m' to '4z' = 14 codes
- '50' to 'mz' = 18*36 = 648 codes
- 'n0' to 'nx' = 34 codes
We can thus see there are 696 URL segments between my two uploads (uploads excluded), which comes down to approximately 28 images per second. That's a lot of images (in my best Phil Swift tone).
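The gap can also be computed directly by treating each code as a base-36 integer. This is a minimal sketch that assumes the codes are plain base-36 counters using the digits 0-9 followed by a-z, which is my reading of the two samples and not anything LightShot documents. The helper names (decode, encode, codes_between) are made up for illustration.

```python
import string

# Assumed alphabet: digits first, then lowercase letters (base 36).
ALPHABET = string.digits + string.ascii_lowercase


def decode(code: str) -> int:
    """Interpret a URL segment as a base-36 integer."""
    return int(code, 36)


def encode(number: int) -> str:
    """Turn a non-negative integer back into a base-36 URL segment."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def codes_between(first: str, second: str) -> int:
    """Count the segments strictly between two codes (both excluded)."""
    return abs(decode(second) - decode(first)) - 1


gap = codes_between("1qvc84l", "1qvc8ny")
rate = gap / 25  # roughly 25 seconds elapsed between the two uploads
print(gap, round(rate, 1))
```

The same encode helper is what makes walking the namespace trivial: the code that follows any given one is simply encode(decode(code) + 1).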
But how many images could that represent? Well, using bad math, I would have told you "at least 27M images", which would be completely wrong. Someone who has been doing math for a living mentioned that just for the 7-character segment, considering that the segment cannot start with a 0 (from experimentation), we get 35 * 36^6 ~= 76.19e9, or about 76.19 billion possibilities. Summing over segment lengths 1 to 7, we thus get 35 * 36^6 + 35 * 36^5 + 35 * 36^4 + 35 * 36^3 + 35 * 36^2 + 35 * 36 + 36 = 36^7 ~= 78.36e9, or about 78.36 billion possibilities. That is a lot. Thankfully we seem to be at 1qxxxxx, which is somewhere near the beginning of the 7-character space (relative to the 78.36 billion possibilities).
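For anyone who wants to redo the counting, here is a short sketch of the same sums. It leans on the same assumptions as the text: a base-36 alphabet, and no leading 0 on multi-character segments, which comes from experimentation and is not a documented rule. The variable names are mine.

```python
# Multi-character segments are assumed not to start with "0": the first
# character has 35 choices and every following character has 36.
seven_char = 35 * 36**6

# Lengths 2 to 7 follow that rule; the single-character case is counted
# as 36, matching the last term of the sum in the text.
total = sum(35 * 36 ** (length - 1) for length in range(2, 8)) + 36

# Rough position of the "1q....." prefix inside the whole namespace.
position = int("1q00000", 36)

print(f"{seven_char:,}")
print(f"{total:,}")
print(f"{position / total:.1%}")
```

Under these assumptions the 1q prefix sits a bit under 5% of the way into the namespace, which is what "near the beginning" means in practice.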
For each of these URL segments, my initial impression was that the page takes a few seconds to load and pulls in the image asynchronously, which calls for a more roundabout automated collection method. The image is available in one of two ways: either in the image source or as a data blob.
To automate this, I turned to my favorite language: Python. Also in my quiver were the Selenium WebDriver library, BeautifulSoup4 and Multiprocess.
By iterating over the namespace, opening each page via Selenium in a headless Chrome, then grabbing the page source and jamming it into BS4, we can find the image and its blob, convert the blob and save it in the relevant file format.
You can check out that first functioning attempt here.
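The blob conversion step is the least obvious part, so here is a rough sketch of it on its own, with the Selenium and BS4 parts left out. It assumes the blob is a base64 data URL of the form data:image/<type>;base64,<payload>, which is a guess about its shape and may not match what the page actually serves. The function names are hypothetical and are not taken from the linked script.

```python
import base64
import re

# Assumed shape of the blob: "data:image/<type>;base64,<payload>".
DATA_URL = re.compile(
    r"^data:image/(?P<kind>[a-z0-9.+-]+);base64,(?P<payload>.+)$",
    re.IGNORECASE | re.DOTALL,
)


def decode_data_url(src: str) -> tuple[str, bytes]:
    """Return (file extension, raw image bytes) for a base64 data URL."""
    match = DATA_URL.match(src.strip())
    if match is None:
        raise ValueError("not a base64 image data URL")
    kind = match.group("kind").lower()
    extension = "jpg" if kind == "jpeg" else kind
    return extension, base64.b64decode(match.group("payload"))


def save_image(code: str, src: str) -> str:
    """Write the decoded image to '<code>.<extension>' and return the filename."""
    extension, data = decode_data_url(src)
    filename = f"{code}.{extension}"
    with open(filename, "wb") as handle:
        handle.write(data)
    return filename
```

Naming the file after the URL segment keeps a one-to-one mapping between what was fetched and what ended up on disk.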
The quicker "I hadn't noticed that" approach
Meta tags. Apparently the LightShot page uses the <meta property="og:image" content="x"> tag, which contains a link to an image... which can be scraped using the urllib library. Thanks Night for pointing that out.
Refactoring my code around this made it much easier, faster and less expensive to use, since Selenium and BS4 became redundant. You can find it here.
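As an illustration of the meta tag approach, here is a minimal standard-library sketch, not the linked script. It assumes the page exposes a single og:image meta tag as described above. The class and function names are my own, and the User-Agent value is a placeholder.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import Request, urlopen


class OgImageParser(HTMLParser):
    """Collect the content of the first <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.image_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.image_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image":
            self.image_url = attributes.get("content")


def extract_og_image(html: str) -> Optional[str]:
    """Return the og:image link found in an HTML document, or None."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.image_url


def fetch_og_image(page_url: str) -> Optional[str]:
    """Download one page and return its og:image link, if any."""
    # Placeholder User-Agent; some servers reject urllib's default one.
    request = Request(page_url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_og_image(html)
```

Splitting the parsing from the fetching means the parser can be checked against a saved page without touching the network.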
I had not removed parallel processing on my first try, and this method is "so" efficient that it tripped CloudFlare's DoS mitigation (or, as CloudFlare claims, the owner banned my IP specifically, see below). Oops, I apologize. However, CloudFlare is protecting a service used to share non-consensual pornography, scams and PII (regardless of its own ToS), which would make a GDPR-specialized risk management firm sweat. I "fixed" it to continue my "research" and everything was nice and functional.
PII farming
I did just mention scams, NCP and PII, so here are some (heavily redacted) examples!
It's a literal goldmine for people who wish to be malicious.
As for why I am not going through a vulnerability disclosure process: the idea of scraping LightShot has been around for a while (see below) and no one at LightShot seems to care.
In conclusion, this is a case of *le big sigh*. Hope you've enjoyed, see you around.