Creating Cheerio JS Extractors

JS extractors are small JavaScript functions that you send to the ScrapingRocket API along with the target website's URL. Each extractor receives the scraped content as a string, runs in the ScrapingRocket cloud, and can use the Cheerio HTML parser to pull clean, structured data out of the page's HTML.
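
As a rough illustration, an extractor could look like the sketch below. The function name `extract` and its `(input, cheerio)` parameters are assumptions made for this example, not a confirmed ScrapingRocket signature; the Extractor Sandbox shows the exact shape the API expects. The Cheerio calls themselves (`load`, `text`, `map`, `get`) are standard.

```javascript
// Hypothetical extractor sketch. The name `extract` and the
// (input, cheerio) parameters are assumed for illustration only.
function extract(input, cheerio) {
  // Parse the scraped HTML string into a queryable Cheerio document.
  const $ = cheerio.load(input);

  // Return a plain object so the result serializes cleanly to JSON.
  return {
    // Text of the page's <title> element, without surrounding whitespace.
    title: $('title').text().trim(),
    // Text of every <h2> element, collected into a plain array.
    headings: $('h2').map((i, el) => $(el).text().trim()).get(),
  };
}
```

To try a function like this locally, install Cheerio via npm and call `extract(html, require('cheerio'))` with any HTML string.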

Why use JS Extractors?

JS extractors are optional. If you call the ScrapingRocket API from your own server (for example, from a Python or Node.js environment), you may not need a JS extractor at all. In that case, you can process the raw ScrapingRocket output locally with a library such as BeautifulSoup in Python or Cheerio installed via npm, which is often more convenient.

However, if you call the ScrapingRocket API from a no-code environment such as Make.com, extractors become very useful: they return clean JSON that no-code tools can process easily.

Developing JS Extractors with Visual Selector

At ScrapingRocket, we simplify the process by offering a Visual Selector that helps you create Cheerio extractors quickly. You can visually select the elements you want to extract from a page, and the Visual Selector will automatically generate the corresponding Cheerio configuration.

Features of the Visual Selector:

  • Real-time updates: View the generated Cheerio configuration in real time as you select elements from the webpage.
  • Multiple formats: Export the generated Cheerio configurations into various formats that you can use in your scraping tasks.
  • Copy-paste functionality: Easily copy your Cheerio configuration and paste it into the HTML API Request Builder.

To get started with the Visual Selector, visit our Extractor Sandbox.

Integrating Extractors into API Requests

Once you've created your Cheerio configuration using the Visual Selector, you can easily integrate it into your scraping requests through the HTML API Request Builder. This allows you to pass your custom extractor directly to the ScrapingRocket API to retrieve clean, structured data.

Check out our HTML API Request Builder for more information on how to use your extractors in API requests.
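
The HTML API Request Builder produces the actual request for you. Purely to illustrate the general idea of sending an extractor alongside the target URL, here is a minimal sketch that serializes a function to its source text and attaches it to a request URL. The endpoint `https://api.example.com/v1/scrape` and the parameter names `api_key`, `url`, and `extractor` are hypothetical placeholders, not the real ScrapingRocket API; use the values the Request Builder generates.

```javascript
// Sketch only: the endpoint and all query parameter names are
// hypothetical placeholders, not the documented ScrapingRocket API.
function buildRequestUrl(endpoint, apiKey, targetUrl, extractorFn) {
  const url = new URL(endpoint);
  url.searchParams.set('api_key', apiKey);   // hypothetical parameter name
  url.searchParams.set('url', targetUrl);    // hypothetical parameter name
  // Function.prototype.toString() returns the function's source text;
  // searchParams.set() percent-encodes it for safe use in a URL.
  url.searchParams.set('extractor', extractorFn.toString()); // hypothetical
  return url.toString();
}

// Hypothetical extractor; the (input, cheerio) signature is assumed.
function extract(input, cheerio) {
  const $ = cheerio.load(input);
  return { title: $('title').text().trim() };
}

const requestUrl = buildRequestUrl(
  'https://api.example.com/v1/scrape', // placeholder endpoint
  'YOUR_API_KEY',
  'https://example.org/',
  extract
);
console.log(requestUrl);
```

Serializing the function with `toString()` keeps the extractor as ordinary, editable JavaScript in your own code while still letting it travel as a string.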

Captcha Resolution

If the target website is protected by a CAPTCHA, you can use our Captcha Resolution feature to handle these challenges seamlessly. Simply enable CAPTCHA solving in the request options, and ScrapingRocket will automatically resolve the CAPTCHA for you.

Pricing

To learn more about our pricing plans, visit our Pricing Page.