Reverse Engineering Solscan’s $200/mo API
This is a very quick explanation (and demonstration) of how poorly the Solscan website is designed and how it was fairly trivial to reverse engineer their public website to access the same data API they sell for $200/mo to their users.
GitHub code at the end.
For those who don’t know, this is what solscan.io looks like:
It’s a very powerful website that lets you explore all transactions and accounts on the Solana blockchain. That makes it very useful, especially if you want to build tools that automate on-chain actions, such as trading bots.
Of course, they generate their revenue by selling a data API that lets you access all of this data.
The only problem is that the pricing is quite high: even the cheapest plan starts at $200 per month.
Pretty expensive if you ask me.
So I started wondering if there was a way to scrape the data directly from their website, but I thought:
“I bet they are protecting the site from scraping. If they value the API so much, I’m sure they made it impossible to scrape any kind of data.”
I could not have been more wrong.
So I opened the dev tools and watched the Network requests as the page loaded, and immediately noticed a bunch of XMLHttpRequest calls that were loading all the juicy transaction data on the page.
See, at this point, I immediately thought that something was wrong. Most of the time, analytics websites like this one (which often sell access to their data through paid API subscriptions) make sure to protect themselves very well from scraping, for obvious reasons.
To do this, they usually rely on server-side rendering, so that all sensitive data is rendered on the server and the client (you and your browser) only receives the already rendered static HTML with all the data.
This way, the browser never has to call an internal API, so none is exposed.
Of course, this was not the case for Solscan.
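To make the contrast concrete, here is a minimal sketch of the server-side rendering idea described above. Everything in it (the function name render_transfers_html, the field names, the sample row) is hypothetical and only for illustration; it does not describe how Solscan or any specific site works. The point is that the server merges the data into the HTML itself, so the browser never needs a JSON endpoint to call.

```python
from html import escape


def render_transfers_html(transfers: list[dict]) -> str:
    """Render transfer rows into static HTML on the server (hypothetical helper)."""
    rows = []
    for t in transfers:
        # Escape every value so the data can never inject markup into the page.
        rows.append(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                escape(str(t["from_address"])),
                escape(str(t["to_address"])),
                escape(str(t["amount"])),
            )
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Made-up sample data, used only to show the output shape.
sample = [{"from_address": "SENDER", "to_address": "RECEIVER", "amount": 42}]
page = render_transfers_html(sample)
print(page)
```

A scraper can still parse that HTML, of course, but there is no ready-made data API sitting behind the page to replay.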
As you can see, these endpoints load exactly the transaction data shown on the page. Now take a look at the official documentation of their API:
Oh… yeah, it’s the same.
Now, looking at the request headers, there is one specific header which is obviously responsible for authentication. It’s called sol-aut and it seems to be a random string of characters. What I noticed is:
- Every request has a different sol-aut token.
- If you try to reuse the same sol-aut token for another request, it gets rejected.
This means that for every request, a new token is generated. Let’s take a look at the JavaScript code which generates the request.
We can see that the request is initiated by _app-0387a288f339cc14.js. Let’s open it and search for the string sol-aut.
“It will be obfuscated for sure, there is no way they left unobfuscated strings inside it.”
Oh… never mind.
Not only did they leave the string clearly readable in the code, they did not even try to hide how it is generated.
As you can see, the sol-aut header is generated by a function called generateRandomString().
generateRandomString() {
    let e = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789==--",
        t = Array(16).join().split(",").map(function() {
            return e.charAt(Math.floor(Math.random() * e.length))
        }).join(""),
        r = Array(16).join().split(",").map(function() {
            return e.charAt(Math.floor(Math.random() * e.length))
        }).join(""),
        n = Math.floor(31 * Math.random()),
        i = "".concat(t).concat(r),
        o = [i.slice(0, n), "B9dls0fK", i.slice(n)].join("");
    return o
}
The function is super simple:
- Build a 32-character random string (two 16-character halves) using "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789==--" as the character set.
- Insert the fixed substring "B9dls0fK" at a random position between 0 and 30.
- The result is a 40-character token.
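The token layout produced by generateRandomString() can be sketched as a quick structural check. This is only a sanity test of the format (total length, character set, marker position); the helper name looks_like_sol_aut is made up for this example, and passing the check says nothing about whether the server would accept the token.

```python
CHARSET = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789=-")
MARKER = "B9dls0fK"


def looks_like_sol_aut(token: str) -> bool:
    """Check that a token matches the layout produced by generateRandomString()."""
    if len(token) != 32 + len(MARKER):  # 32 random characters + 8-character marker
        return False
    if not set(token) <= CHARSET:  # the marker's characters are all in the charset
        return False
    # The JS picks n = Math.floor(31 * Math.random()), so the marker starts at 0..30.
    return any(token[n:n + len(MARKER)] == MARKER for n in range(31))


with_marker = looks_like_sol_aut("a" * 10 + MARKER + "b" * 22)  # marker at index 10
without_marker = looks_like_sol_aut("a" * 40)                   # right length, no marker
print(with_marker, without_marker)
```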
Once I understood how the sol-aut token is created, I wrote a simple Python script to test it:
import random

import requests


def generate_solauth_token() -> str:
    """
    Generate a valid sol-aut token used to authenticate requests to the Solscan API.
    """
    chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789==--"
    t = "".join(random.choice(chars) for _ in range(16))
    r = "".join(random.choice(chars) for _ in range(16))
    # Mirror Math.floor(31 * Math.random()) from the JS: an integer from 0 to 30.
    n = random.randrange(31)
    i = t + r
    return i[:n] + "B9dls0fK" + i[n:]


def send_api_request(url, headers=None, url_params=None) -> dict:
    """
    Send a request to the Solscan API and return the "data" field of the response.
    """
    base_url = "https://api-v2.solscan.io/v2"
    default_headers = {
        "Accept": "application/json, text/plain, */*",
        "sol-aut": generate_solauth_token(),
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Referer": "https://solscan.io/",
        "Origin": "https://solscan.io",
        "Connection": "keep-alive",
    }
    if headers:
        default_headers.update(headers)
    # Join with exactly one slash so "account/transfer" becomes ".../v2/account/transfer".
    full_url = f"{base_url}/{url.lstrip('/')}"
    response = requests.get(full_url, headers=default_headers, params=url_params)
    result = response.json()
    # Check that the "data" key is present in the response.
    if result.get("data") is None:
        raise Exception(f"Failed to get data from {url}")
    return result["data"]


result = send_api_request(
    "account/transfer",
    url_params={
        "address": "oQPnhXAbLbMuKHESaGrbXT17CyvWCpLyERSJA9HCYd7",
        "page": 1,
        "page_size": 10,
    },
)
print(result)
{
    "success": true,
    "data": [
        {
            "block_id": 324724027,
            "trans_id": "7bUmBXDUf3uUurQKaNvLqCft2dMTjW5rsYWDMZqY3sYQLQ6oVn1beK95zUyq9KZ4mZeqCS3BEJ53BFvwLX7cNy8",
            "block_time": 1741170411,
            "activity_type": "ACTIVITY_SPL_TRANSFER",
            "from_address": "oQPnhXAbLbMuKHESaGrbXT17CyvWCpLyERSJA9HCYd7",
            "from_token_account": "oQPnhXAbLbMuKHESaGrbXT17CyvWCpLyERSJA9HCYd7",
            "to_address": "Cw8CFyM9FkoMi7K7Crf6HNQqf4uEMzpKw6QNghXLvLkY",
            "to_token_account": "Cw8CFyM9FkoMi7K7Crf6HNQqf4uEMzpKw6QNghXLvLkY",
            "token_address": "So11111111111111111111111111111111111111111",
            "token_decimals": 9,
            "amount": 30000084,
            "flow": "out",
            "value": 4.380012264
        },
        ...
    ]
}
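As a small illustration of working with a record like the one above, the sketch below converts it into human-friendly values. It assumes, based only on the field names in the sample, that amount is expressed in the token’s smallest unit (so it is divided by 10 ** token_decimals) and that block_time is a Unix timestamp in seconds. The helper name summarize_transfer is hypothetical.

```python
from datetime import datetime, timezone
from decimal import Decimal


def summarize_transfer(record: dict) -> dict:
    """Convert raw fields of one transfer record into readable values (assumed semantics)."""
    # Assumption: "amount" is in the smallest unit, scaled by "token_decimals".
    amount = Decimal(record["amount"]) / (Decimal(10) ** record["token_decimals"])
    # Assumption: "block_time" is seconds since the Unix epoch.
    when = datetime.fromtimestamp(record["block_time"], tz=timezone.utc)
    return {"amount": amount, "time": when, "flow": record["flow"]}


# Fields copied from the sample response above.
record = {"block_time": 1741170411, "token_decimals": 9, "amount": 30000084, "flow": "out"}
summary = summarize_transfer(record)
print(summary["amount"], summary["time"].isoformat(), summary["flow"])
```

Decimal is used instead of float so the division by a power of ten stays exact.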
Easy.
Of course, this is not a legitimate way to use their data: it probably violates their TOS and probably won’t last forever (if they ever notice it or bother to fix it). Still, it shows how you can build scraping solutions by reverse engineering websites.
Web scraping is 80% reverse engineering. Open the network tab and spend some time really dissecting and reconstructing how the code works, and sometimes, like in this case, you will be able to pull off amazing results.
Now, my main concern is that they sell a service for $200/mo that they don’t even seem to care about protecting properly. This is not a good look for a professional company like Solscan, which has surely built great software.
Also, they lack even very basic security measures on this API endpoint, such as:
- TLS fingerprint rejection
- Rate limiting
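To illustrate the rate limiting item, here is a minimal sketch of one common approach, a token bucket. It is a generic example, not a description of anything Solscan runs; the class name TokenBucket and the numbers are assumptions chosen for the demo. A real deployment would typically keep one bucket per client (for example per IP address or API key) in shared storage.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(5)]  # 5 requests at the same instant
fake_now[0] += 2.0                          # two seconds pass, two tokens refill
later = [bucket.allow() for _ in range(3)]
print(burst, later)
```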
I would suggest they fix these issues, even if it is against my own interests.
If you want to support me and experiment with this, I’ve created a GitHub repository from which you can install the code directly as a Python package.
Leave a star if you appreciate the work, and help me grow by following my X profile.