Request Builder
This page lets you build and test API requests interactively.
POST /v1/scrape
Query Parameters
api_token
string (required)
A private API token is required for authentication when making any request to our API.
To get your own API token, please create an account here.
render_js
boolean (default: true)
By default, we use a headless browser (render_js = true) that executes JavaScript on the page before delivering the final HTML result. This is useful for several reasons:
- Bypassing anti-scraping mechanisms: Some websites detect and block traditional HTTP requests (e.g., using Cloudflare). A headless browser helps simulate real user behavior and avoid detection.
- Handling dynamically loaded content: Many websites load content using JavaScript frameworks such as React, Vue, and Angular, which a traditional scraper might miss.
- Automating web interactions: You can define scenarios to fill out forms, click buttons, scroll pages, and interact with elements programmatically.
- Capturing screenshots: The headless browser allows taking a screenshot of the web page after it has fully loaded for verification or monitoring purposes.
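As a minimal sketch of how the query parameters on this page could be attached to a request URL, the helper below builds the query string with Python's standard library. The host `https://api.example.com` is a placeholder (this page only gives the path `/v1/scrape`), and sending booleans as lowercase `true`/`false` is an assumption rather than documented behavior. How the target page URL is passed is not covered in this section, so it is left out.

```python
from urllib.parse import urlencode

# Hypothetical host: this page only documents the path (/v1/scrape).
BASE_URL = "https://api.example.com"


def build_scrape_url(api_token, **params):
    """Return the /v1/scrape URL with the given query parameters attached."""
    query = {"api_token": api_token}
    for name, value in params.items():
        # Assumption: booleans are sent as lowercase "true"/"false".
        if isinstance(value, bool):
            value = "true" if value else "false"
        query[name] = value
    return f"{BASE_URL}/v1/scrape?{urlencode(query)}"


# Disable the headless browser and shorten the timeout to 60 seconds.
print(build_scrape_url("YOUR_API_TOKEN", render_js=False, timeout=60000))
```

Since the endpoint is a POST, the resulting URL would be used together with a request body carrying the body parameters.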
timeout
integer (default: 180000)
The maximum time in milliseconds that the Scraping Rocket API will wait before timing out a request.
Setting a value that is too small can cause the request to stop before the execution is fully complete. This is why we set timeout = 180000 by default, ensuring the request has enough time to execute.
proxy_type
enum<string> (default: regular)
There are three available types of proxies:
- regular: Our regular proxies located in the US.
- premium: Uses our premium rotated proxy, allowing you to select any country for the proxy location. This is very useful for websites with anti-scraping mechanisms, as these proxies are difficult to detect.
- custom: Uses your own proxy. You can specify your proxy in the proxy_url parameter.
show_only
enum<string> (default: rich_response)
You can choose the output format you want to receive from the request:
- rich_response: Returns a JSON file containing:
  - statusCode: The HTTP status code of the request.
  - statusInfo: Additional status details.
  - url: The requested URL.
  - message: Any relevant message regarding the request.
  - duration: The total time taken to process the request.
  - screenshotData: The screenshot (if requested).
  - extractedData: The extracted data (if the extraction function is applied).
  - html: The HTML result of the request.
  The Content-Type in the response header is set to Content-Type: application/json
- html: Returns only the HTML scraping result. The Content-Type in the response header is set to Content-Type: text/html
- screenshot: Returns only screenshot data as a JSON file, containing:
  - A screenshot URL.
  - A Base64-encoded version of the screenshot.
  The Content-Type in the response header is set to Content-Type: application/json
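Because the body format depends on show_only, client code can branch on the Content-Type response header. The sketch below is one possible way to do that; the function name `parse_scrape_response` is hypothetical and the sample body is made up for illustration, using only field names listed above.

```python
import json


def parse_scrape_response(content_type, body):
    """Decode a response body according to its Content-Type header."""
    # Drop any "; charset=..." suffix before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        # rich_response and screenshot are both delivered as JSON.
        return json.loads(body)
    if media_type == "text/html":
        # show_only=html returns only the HTML result.
        return body
    raise ValueError(f"unexpected Content-Type: {content_type}")


# Made-up sample body, for illustration only.
rich = parse_scrape_response(
    "application/json",
    '{"statusCode": 200, "url": "https://example.com", "html": "<html></html>"}',
)
print(rich["statusCode"], rich["html"])
```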
block_resources
boolean (default: true)
By default, we block resources such as JavaScript files within the HTML to prevent the script from waiting for them to load before returning the result. This significantly speeds up the request.
However, in some cases, these resources are necessary for the page to load correctly. If that's the case, you'll need to set block_resources=false to ensure all files are executed before the result is returned.
This parameter is only available when JavaScript rendering is enabled.
wait_browser
enum<string> (default: load)
This parameter instructs the headless browser on which assets or data it should wait for before returning the result. There are four options to choose from:
- load: Use this when you need the entire page, including all assets (like images, scripts, and stylesheets), to be fully loaded.
- domcontentloaded: Choose this if you only need the DOM to be ready and want to avoid waiting for external resources to load.
- networkidle: Opt for this when you want to ensure the page is stable and there are no significant network requests happening.
- commit: Select this for the fastest response, as it proceeds as soon as the initial HTTP request is committed, without waiting for the DOM or resources to load.
Each option serves a specific use case, so pick the one that best aligns with your requirements.
This parameter is only available when JavaScript rendering is enabled.
resolve_captchas
boolean (default: false)
We rely on a third-party service to handle CAPTCHAs, including Google's reCAPTCHA and Cloudflare CAPTCHAs.
This approach can increase the response time because the page needs to be fully loaded before the CAPTCHA can be processed, and the CAPTCHA-solving process itself also requires additional time to complete.
This parameter is only available when JavaScript rendering is enabled.
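Several parameters accept only a fixed set of values, and some are documented as available only when JavaScript rendering is enabled. A client-side check along the following lines can catch such mistakes before a request is sent. This is a sketch, not the API's own validation: the function name `check_params` is hypothetical, and it treats query and body parameters as one dictionary (js_scenario is a body parameter).

```python
ALLOWED_VALUES = {
    "proxy_type": {"regular", "premium", "custom"},
    "show_only": {"rich_response", "html", "screenshot"},
    "wait_browser": {"load", "domcontentloaded", "networkidle", "commit"},
}

# Parameters documented as available only with JavaScript rendering.
BROWSER_ONLY = {"block_resources", "wait_browser", "resolve_captchas", "js_scenario"}


def check_params(params):
    """Return a list of problems found in a dict of request parameters."""
    problems = []
    for name, allowed in ALLOWED_VALUES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    # render_js defaults to true, so only an explicit False disables it.
    if params.get("render_js", True) is False:
        for name in sorted(BROWSER_ONLY.intersection(params)):
            problems.append(f"{name} requires render_js=true")
    return problems


print(check_params({"render_js": False, "wait_browser": "load"}))
```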
Body Parameters
extractor_function
string
This is a Cheerio function designed to parse the HTML code obtained from scraping and extract specific data.
It can be configured to extract any desired data from the webpage.
We've simplified the process of generating this Cheerio function using our AI-powered selector.
The page is fully loaded in an iframe, allowing you to simply click on the elements you want to extract.
The Cheerio function is then automatically generated without requiring any coding, and you can reuse it for other similar URLs.
js_scenario
string
List of scenarios that enable you to execute various actions, such as clicking buttons, filling forms, and scrolling, prior to scraping the desired HTML.
For more details, please check here.
This parameter is only available when JavaScript rendering is enabled.
headers
string
You can send custom headers with the scraping request.
HTTP headers are key-value pairs, separated by a colon (:).
The headers parameter that is sent should be in JSON format, like this:
{"header-name-1": "header-value-1", "header-name-2": "header-value-2", ...}
For example, if you want to send a User-Agent and Content-Type, it should look like this:
{"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36", "Content-Type": "application/json"}
Cookies can be sent as part of the headers parameter, or you can include them in a separate cookies parameter.
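Since the headers parameter is a string containing JSON, it is usually easier to build it from a dictionary than to write it by hand. The sketch below shows one way to do so; the helper name `encode_headers` is hypothetical, and folding cookies into a standard `Cookie` header is an assumption about how cookies would be sent inside headers.

```python
import json


def encode_headers(headers, cookies=None):
    """Serialize custom headers to a JSON string for the headers parameter."""
    merged = dict(headers)
    if cookies:
        # Standard Cookie header syntax: "name=value; name2=value2".
        merged["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return json.dumps(merged)


print(encode_headers({"Content-Type": "application/json"}, cookies={"session": "abc123"}))
```

Using json.dumps also takes care of escaping quotes or backslashes that appear in header values.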