What is a hidden API, and why it is the most reliable way to get data
When a page loads prices, listings, or search results, that data usually does not live in the HTML — it arrives from an internal JSON endpoint the page calls in the background. We call that a hidden API. For public, factual data, reading it is almost always more reliable than parsing the rendered page.
What "hidden" actually means
Nothing nefarious. A modern web page is a thin shell that, once loaded, calls one or more of its own backend endpoints to fetch the real content as structured JSON. These endpoints are "hidden" only in the sense that they are not documented for outside use and not shown in the address bar — but your own browser uses them every single time you open the page. They are part of how the public site delivers its public data to you.
Why calling it beats parsing HTML
- Stability. A redesign can rewrite every CSS class and break an HTML scraper overnight. The JSON field `price` tends to stay `price` through redesign after redesign. Fewer moving parts means fewer 2 a.m. failures.
- Cleaner data. You get typed fields (numbers as numbers, dates in a consistent machine-readable format, nested objects) instead of scraping text out of formatted markup and re-parsing it. Less guesswork, fewer edge-case bugs.
- Completeness. The endpoint often returns more than the page shows: extra attributes, stock levels, identifiers, pagination metadata. You frequently get richer data than what is visible.
- Efficiency. One JSON call can return what would otherwise require rendering a full page with images and scripts, which means a lighter, more respectful footprint on the source site.
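To make the "cleaner data" point concrete, here is a minimal Python sketch that turns a JSON response into typed records. The payload shape and every field name in it (`items`, `price`, `in_stock`, `updated_at`, `total_pages`) are invented for illustration; a real endpoint will use its own names.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Hypothetical response body. Real endpoints use their own field names.
SAMPLE_RESPONSE = """
{
  "items": [
    {"id": "sku-1", "title": "Desk lamp", "price": 24.5, "in_stock": true,
     "updated_at": "2024-05-01T09:30:00+00:00"},
    {"id": "sku-2", "title": "Bookshelf", "price": 89.0, "in_stock": false,
     "updated_at": "2024-05-02T14:00:00+00:00"}
  ],
  "page": 1,
  "total_pages": 3
}
"""


@dataclass
class Listing:
    id: str
    title: str
    price: float
    in_stock: bool
    updated_at: datetime


def parse_listings(body: str) -> list[Listing]:
    """Map each JSON item onto a typed record; no text scraping involved."""
    payload = json.loads(body)
    return [
        Listing(
            id=item["id"],
            title=item["title"],
            price=float(item["price"]),
            in_stock=bool(item["in_stock"]),
            # JSON has no native date type; this assumes ISO 8601 strings.
            updated_at=datetime.fromisoformat(item["updated_at"]),
        )
        for item in payload["items"]
    ]
```

The price arrives as a number and the stock flag as a boolean, so there is no currency-symbol stripping or "In stock" string matching of the kind an HTML scraper would need.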
How to find a hidden API (DevTools, Network, XHR)
You can do this yourself in any browser, on any public page, in a couple of minutes:
- Open the page and press F12 (or right-click → Inspect) to open DevTools.
- Go to the Network tab and filter by Fetch/XHR. This hides images and scripts so you see only data requests.
- Reload the page, or trigger the action you care about — paginate, search, open a listing.
- Watch the requests appear. Click the ones that return JSON and look at the Response / Preview tab.
- When you spot the request whose response contains your fields — the prices, the listings, the search results — you have found the hidden API. Note its URL, method, and the parameters it takes.
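Once you have noted the URL, method, and parameters, reproducing the request outside the browser might look like the sketch below. It uses only the Python standard library. The URL `https://example.com/api/search`, the parameter names `q` and `page`, and the user agent string are placeholders, not values from any real site.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url: str, params: dict, user_agent: str) -> Request:
    """Rebuild the request seen in DevTools: same URL, method, and parameters."""
    url = f"{base_url}?{urlencode(params)}"
    headers = {
        "Accept": "application/json",
        # Identify your client honestly; this value is a placeholder.
        "User-Agent": user_agent,
    }
    return Request(url, headers=headers, method="GET")


def fetch_json(request: Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (performs network I/O)."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters, as they might appear in the Network tab.
request = build_request(
    "https://example.com/api/search",
    {"q": "desk lamp", "page": 2},
    user_agent="example-feed-reader/0.1",
)
```

Building the request separately from sending it lets you compare `request.full_url` against what DevTools showed before any traffic goes out.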
From there, the work is reading that endpoint reliably: correct parameters, sensible pagination, a respectful request rate, and handling for the cases where the site expects a real browser session first. That last part is exactly what our guide on Cloudflare-protected public sites covers.
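As a rough illustration of sensible pagination at a respectful request rate, the sketch below walks pages one at a time and pauses between calls. It assumes a hypothetical response shape with `items` and `total_pages` keys, and it takes the page-fetching function as an argument so the loop itself stays independent of any particular site.

```python
import time
from typing import Callable, Iterator


def iter_items(
    fetch_page: Callable[[int], dict],
    delay_seconds: float = 1.0,
    max_pages: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield items page by page, pausing between requests.

    Assumes each page is a dict with "items" and "total_pages" keys
    (hypothetical names; adapt to the endpoint you found).
    """
    page = 1
    while page <= max_pages:  # hard cap in case total_pages is missing or wrong
        payload = fetch_page(page)
        yield from payload.get("items", [])
        if page >= payload.get("total_pages", page):
            break  # last page reached, or the endpoint did not report a total
        sleep(delay_seconds)  # pace requests instead of hammering the endpoint
        page += 1
```

The `max_pages` cap and the injectable `sleep` are design choices for this sketch: the first guards against runaway loops, the second makes the pacing easy to test without waiting.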
We keep a working starting point in the open: see the hidden-api-extraction-template repository for a clean structure you can build on.
Doing it the compliant way
A hidden API is still the site's infrastructure, so the same rules apply as to any access of a public source:
- Read the site's Terms of Service and respect them. If the terms forbid automated access, that is a stop sign.
- Honor `robots.txt` and any rate limits. Pace your requests; do not hammer the endpoint.
- Stick to public, factual, non-PII data: catalog and listing fields, not personal information and not anything behind a login.
- You operate and own the resulting feed. We build it to be a good citizen of the source.
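The `robots.txt` rule above can be checked in code before any request is sent. The sketch below uses Python's standard `urllib.robotparser`; the sample rules, the URLs, and the user agent name are made up for illustration, and a real check would load the site's actual file. It covers only `robots.txt`, not the Terms of Service, which still need to be read by a person.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration; a real check would read the site's own file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# Hypothetical user agent and URLs.
allowed = rules.can_fetch("example-feed-reader", "https://example.com/api/listings?page=1")
blocked = rules.can_fetch("example-feed-reader", "https://example.com/account/orders")
delay = rules.crawl_delay("example-feed-reader")  # seconds, or None if not specified
```

If `can_fetch` returns `False` for the endpoint, treat it the same way as a Terms of Service prohibition: a stop sign. When `crawl_delay` returns a value, it is a reasonable lower bound for the pause between requests.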