Request Builder

This page lets you build and test API requests interactively.

POST /v1/scrape

Query Parameters

api_token

string (required)
A private API token is required for authentication when making any request to our API.

To get your own API token, please create an account here.

url

string (required)
This parameter is the full URL, including the protocol (HTTP/HTTPS), of the web page that you want to scrape or from which you want to extract data.
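
As a rough illustration of how the two required query parameters fit together, here is a minimal Python sketch that assembles the request URL for POST /v1/scrape. The host in BASE_URL is a placeholder (this page does not state the API host), build_scrape_url is a hypothetical helper, and sending booleans as lowercase "true"/"false" is an assumption.

```python
from urllib.parse import urlencode

# Placeholder host: the real API host is not given on this page.
BASE_URL = "https://api.example.com"


def build_scrape_url(api_token: str, target_url: str, **options) -> str:
    """Return the /v1/scrape URL with its query parameters percent-encoded."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("url must include the protocol (http:// or https://)")
    params = {"api_token": api_token, "url": target_url}
    for name, value in options.items():
        # Assumption: booleans travel as lowercase "true" / "false".
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE_URL}/v1/scrape?{urlencode(params)}"
```

The target URL is percent-encoded as a whole, so any query string it carries is not mistaken for parameters of the scraping request itself.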

render_js

boolean

(default: true)

By default, we use a headless browser (render_js = true) that executes JavaScript on the page before delivering the final HTML result.

This is useful for several reasons:

  • Bypassing anti-scraping mechanisms: Some websites detect and block traditional HTTP requests (e.g., using Cloudflare). A headless browser helps simulate real user behavior and avoid detection.
  • Handling dynamically loaded content: Many websites load content using JavaScript frameworks such as React, Vue, and Angular, which a traditional scraper might miss.
  • Automating web interactions: You can define scenarios to fill out forms, click buttons, scroll pages, and interact with elements programmatically.
  • Capturing screenshots: The headless browser allows taking a screenshot of the web page after it has fully loaded for verification or monitoring purposes.

timeout

integer

(default: 180000)

The maximum time in milliseconds that the Scraping Rocket API will wait before timing out a request.

Setting a value that is too small can cause the request to stop before the execution is fully complete.

This is why we set timeout = 180000 (3 minutes) by default, ensuring the request has enough time to execute.
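
Because timeout is expressed in milliseconds while most HTTP clients take seconds, a small conversion helper can avoid unit mistakes. The sketch below is only a suggestion: client_timeout_seconds and the safety margin are hypothetical and not part of the API. The idea is to let the client wait slightly longer than the API does, so the API's own timeout response can still arrive.

```python
DEFAULT_API_TIMEOUT_MS = 180_000  # default value documented on this page


def client_timeout_seconds(api_timeout_ms: int = DEFAULT_API_TIMEOUT_MS,
                           margin_s: float = 5.0) -> float:
    """Convert the API timeout (ms) to a client-side timeout (s) plus a margin."""
    if api_timeout_ms <= 0:
        raise ValueError("timeout must be a positive number of milliseconds")
    # The margin is an assumption; tune it to your network conditions.
    return api_timeout_ms / 1000 + margin_s
```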

proxy_type

enum<string>

(default: regular)

There are three available types of proxies:

- regular: Our regular proxies located in the US.

- premium: Uses our premium rotating proxies, allowing you to select any country for the proxy location. This is very useful for websites with anti-scraping mechanisms, as these proxies are difficult to detect.

- custom: Uses your own proxy. You can specify your proxy in the proxy_url parameter.

country_code

enum<string>
If the parameter proxy_type = "premium", you can choose the country where the premium proxy is located.

This parameter contains the country code of the selected country.

All countries are available with no exceptions.

proxy_url

string
If the parameter proxy_type = "custom", you can use your own proxy by entering its URL in this field.
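
The relationship between proxy_type, country_code, and proxy_url can be checked on the client before a request is sent. The following Python sketch encodes the rules described above; proxy_params is a hypothetical helper, and treating a missing proxy_url as an error for proxy_type = "custom" is an assumption, since this page does not say how the API reacts to that combination.

```python
from typing import Optional

PROXY_TYPES = ("regular", "premium", "custom")


def proxy_params(proxy_type: str = "regular",
                 country_code: Optional[str] = None,
                 proxy_url: Optional[str] = None) -> dict:
    """Return the proxy-related query parameters after basic consistency checks."""
    if proxy_type not in PROXY_TYPES:
        raise ValueError(f"proxy_type must be one of {PROXY_TYPES}")
    if country_code is not None and proxy_type != "premium":
        raise ValueError("country_code applies only when proxy_type is 'premium'")
    if proxy_url is not None and proxy_type != "custom":
        raise ValueError("proxy_url applies only when proxy_type is 'custom'")
    if proxy_type == "custom" and not proxy_url:
        # Assumption: a custom proxy cannot be used without its URL.
        raise ValueError("proxy_type 'custom' needs a proxy_url")
    params = {"proxy_type": proxy_type}
    if country_code is not None:
        params["country_code"] = country_code
    if proxy_url is not None:
        params["proxy_url"] = proxy_url
    return params
```

The country code is passed through unchanged, because the expected format of the code is not specified here.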

show_only

enum<string>

(default: rich_response)

You can choose the output format you want to receive from the request:

- rich_response: Returns a JSON file containing:

  • statusCode: The HTTP status code of the request.
  • statusInfo: Additional status details.
  • url: The requested URL.
  • message: Any relevant message regarding the request.
  • duration: The total time taken to process the request.
  • screenshotData: The screenshot (if requested).
  • extractedData: The extracted data (if the extraction function is applied).
  • html: The HTML result of the request.
The Content-Type header of the response is set to application/json.

- html: Returns only the HTML scraping result.

The Content-Type header of the response is set to text/html.

- screenshot: Returns only screenshot data as a JSON file, containing:

  • A screenshot URL.
  • A Base64-encoded version of the screenshot.
The Content-Type header of the response is set to application/json.
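
Since the response body is JSON for rich_response and screenshot but raw HTML for html, a client can branch on the Content-Type header. Here is a minimal Python sketch of that dispatch; parse_scrape_response is a hypothetical helper, and decoding the body as UTF-8 is an assumption.

```python
import json


def parse_scrape_response(content_type: str, body: bytes):
    """Return a dict for JSON responses and a str for HTML responses."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    text = body.decode("utf-8")  # assumption: the body is UTF-8 encoded
    if media_type == "application/json":
        return json.loads(text)
    if media_type == "text/html":
        return text
    raise ValueError(f"unexpected Content-Type: {content_type!r}")
```

With show_only = rich_response, the returned dict would expose the fields listed above, for example result["statusCode"] and result["html"].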

screenshot

boolean

(default: false)

Take a screenshot of the full page.

This returns a JSON file containing:

  • A screenshot URL where the image can be accessed.
  • A Base64-encoded version of the screenshot.
This parameter is only available when JavaScript rendering is enabled.
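
To turn the Base64-encoded screenshot back into image bytes, something like the following Python sketch could be used. The key under which the Base64 string appears in the JSON is not documented on this page, so the hypothetical decode_screenshot helper takes the string directly; handling an optional "data:...;base64," prefix is a defensive assumption.

```python
import base64


def decode_screenshot(encoded: str) -> bytes:
    """Decode a Base64 screenshot string into raw image bytes."""
    # Assumption: the value may or may not carry a data-URI prefix.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded, validate=True)
```

The resulting bytes can be written to disk with open(path, "wb").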

block_resources

boolean

(default: true)

By default, we block resources such as JavaScript files within the HTML to prevent the script from waiting for them to load before returning the result. This significantly speeds up the request.

However, in some cases, these resources are necessary for the page to load correctly. If that's the case, you'll need to set block_resources=false so that all resources are loaded before the result is returned.

This parameter is only available when JavaScript rendering is enabled.

wait_browser

enum<string>

(default: load)

This parameter instructs the headless browser on which assets or data it should wait for before returning the result. There are four options to choose from:

- load: Use this when you need the entire page, including all assets (like images, scripts, and stylesheets), to be fully loaded.

- domcontentloaded: Choose this if you only need the DOM to be ready and want to avoid waiting for external resources to load.

- networkidle: Opt for this when you want to ensure the page is stable and there are no significant network requests happening.

- commit: Select this for the fastest response, as it proceeds as soon as the initial HTTP request is committed, without waiting for the DOM or resources to load.

Each option serves a specific use case, so pick the one that best aligns with your requirements.

This parameter is only available when JavaScript rendering is enabled.

resolve_captchas

boolean

(default: false)

We rely on a third-party service to handle CAPTCHAs, including Google's reCAPTCHA and Cloudflare CAPTCHAs.

This approach can increase the response time because the page needs to be fully loaded before the CAPTCHA can be processed, and the CAPTCHA-solving process itself also requires additional time to complete.

This parameter is only available when JavaScript rendering is enabled.

css_selector

string
Specify a CSS selector (such as a class .className or an ID #idName) to wait for before returning the result.

The script will pause until the element matching the selector is present in the DOM.

This parameter is only available when JavaScript rendering is enabled.

Body Parameters

extractor_function

string
This is a Cheerio function designed to parse the HTML code obtained from scraping and extract specific data.

It can be configured to extract any desired data from the webpage.

We've simplified the process of generating this Cheerio function using our AI-powered selector.

The page is fully loaded in an iframe, allowing you to simply click on the elements you want to extract.

The Cheerio function is then automatically generated without requiring any coding, and you can reuse it for other similar URLs.

js_scenario

string
A list of scenarios that let you execute actions such as clicking buttons, filling in forms, and scrolling before scraping the desired HTML.

For more details, please check here.

This parameter is only available when JavaScript rendering is enabled.
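
Several parameters on this page (screenshot, block_resources, wait_browser, resolve_captchas, css_selector, and js_scenario) are marked as available only when JavaScript rendering is enabled. A client-side check such as the Python sketch below can flag conflicting settings before a request is sent. js_only_conflicts is a hypothetical helper, and how the API itself treats these parameters when render_js is false (ignoring them or returning an error) is not stated here.

```python
# Parameters documented on this page as requiring render_js = true.
JS_ONLY_PARAMS = frozenset({
    "screenshot", "block_resources", "wait_browser",
    "resolve_captchas", "css_selector", "js_scenario",
})


def js_only_conflicts(params: dict) -> list:
    """Return the JS-only parameters that are set while render_js is disabled."""
    # render_js defaults to true, so a missing key means rendering is enabled.
    if params.get("render_js", True):
        return []
    return sorted(name for name in params if name in JS_ONLY_PARAMS)
```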

headers

string
You can send custom headers with the scraping request.

HTTP headers are key-value pairs, separated by a colon (:).

The headers parameter that is sent should be in JSON format, like this:
{"header-name-1": "header-value-1", "header-name-2": "header-value-2", ...}

For example, if you want to send a User-Agent and Content-Type, it should look like this:
{"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36", "Content-Type": "application/json"}

Cookies can be sent as part of the headers parameter, or you can include them in a separate cookies parameter.
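
Because headers is a string parameter holding JSON, it is safer to serialize a dictionary than to write the JSON by hand. A minimal Python sketch follows; encode_headers is a hypothetical helper, and how the resulting string is placed in the request body is not covered on this page.

```python
import json


def encode_headers(headers: dict) -> str:
    """Serialize custom headers to the JSON string expected by `headers`."""
    for name, value in headers.items():
        # HTTP header names and values are text, so reject anything else early.
        if not isinstance(name, str) or not isinstance(value, str):
            raise TypeError("header names and values must be strings")
    return json.dumps(headers)
```

For example, encode_headers({"Content-Type": "application/json"}) yields a string in the same form as the examples above.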

cookies

string
Cookies are small pieces of data that websites store on a user's device, such as a computer or smartphone, through their web browser. They allow websites to retain information about the user, like login status, preferences, or tracking details, to improve and personalize the browsing experience.

When making requests, you can include cookies in two ways:

  • Within the headers parameter: Send them as part of the Cookie header in the format:
    {"Cookie": "cookie_name_1=cookie_value_1; cookie_name_2=cookie_value_2"}
  • Using this dedicated cookies parameter: Pass them directly in the format:
    cookie_name_1=cookie_value_1; cookie_name_2=cookie_value_2
Both methods achieve the same goal of sending cookies to the server.
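
The two ways of sending cookies can be produced from the same dictionary. The Python sketch below builds both forms; format_cookies and cookie_header_param are hypothetical helpers, and the sketch assumes cookie names and values contain no characters that need escaping.

```python
import json


def format_cookies(cookies: dict) -> str:
    """Build the `name=value; name=value` string for the cookies parameter."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def cookie_header_param(cookies: dict) -> str:
    """Build the JSON string for the headers parameter carrying a Cookie header."""
    return json.dumps({"Cookie": format_cookies(cookies)})
```

Use one form or the other for a given request; this page does not say which takes precedence if both are supplied.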