Looking for the best web scraper for your project?

Allow us to compare two of the most popular options on the market.

Scrapy and ParseHub are both powerful and useful web scraping tools. Today, we will put them head-to-head to determine which is the better fit for your scraping project.

Scrapy Introduction

Scrapy is probably the most popular open-source framework for web scraping. It's been around since at least 2008. It started out as an open-source release of a Python framework built to scrape a large number of websites for a commercial enterprise.

The framework turned out to be so successful on its own that its creators formed a company around it: Scrapinghub.

ParseHub Introduction

ParseHub is a full-fledged web scraper. It comes as a free desktop app with premium features. Hundreds of users and businesses around the world use ParseHub daily for their web scraping needs.

ParseHub was built to be an incredibly versatile web scraper, with useful features such as a user-friendly UI, page navigation, IP rotation and more.

In this article, we will first compare the visual web scraping tool ParseHub to Scrapy, the open-source Python project. We will then compare ParseHub to Scrapinghub's paid service, which runs Scrapy spiders for a fee.

ParseHub and Scrapy Comparison (Plus Portia)

Comparing ParseHub to Scrapy is somewhat of an apples-to-oranges comparison because one is a UI tool and the other is a programming library. A more apples-to-apples comparison would be with Portia, the associated open-source project also built by Scrapinghub.

We’ve gone ahead and compared Portia and ParseHub in an in-depth guide.

But since Scrapy is so established, we will confine this article to the first comparison.

ParseHub Features vs Scrapy Features

FEATURE | PARSEHUB | SCRAPY
Authoring environment | Desktop app (Mac, Windows and Linux) | Python plus the scrapy command-line tool
Scraper logic | Variables, loops, conditionals, function calls (via templates) | Variables, loops, conditionals, function calls (arbitrary Python)
JavaScript, Ajax and dynamic content | Yes | With external libraries
Pop-ups, infinite scroll, hover content | Yes | With external libraries
Debugging | Visual debugger | Python logs
Knowledge of HTML and HTTP | None required | Required
Selecting elements | Point-and-click, CSS selectors, XPath | CSS selectors, XPath
Transforming data | Regex, JavaScript expressions | Regex, arbitrary Python
Speed | Fast parallel execution | Fast parallel execution
Hosting | Hosted on a cloud of hundreds of ParseHub servers | Hosted on your local machine or your own servers; you can pay Scrapinghub to host it for you
IP rotation | Included in paid plans | Requires a paid external service
Sites (AKA spiders, scrapers, projects) | Free plan: 5, $99/month: 20, $499/month: 120 | Limited by your infrastructure or as sold by Scrapy Cloud
Support | Free professional support | Community support
Data export | CSV, JSON, API | CSV, JSON, API
Run-time configuration | Passed in as a JSON object | Passed on the command line or via arbitrary Python
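Both tools let you select elements with XPath and clean up the results with regular expressions. As a rough, standard-library-only sketch (not ParseHub's or Scrapy's actual API), the snippet below uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath, to pick product names and prices out of a small, well-formed HTML fragment, then uses a regex to turn the price text into numbers. The markup, class names and the extract_prices function are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment. Real pages are rarely valid XML,
# which is why Scrapy's own selectors (built on lxml) are far more forgiving.
SAMPLE_HTML = """
<html><body>
  <div class="product"><span class="name">Lamp</span><span class="price">$1,299.00</span></div>
  <div class="product"><span class="name">Desk</span><span class="price">Price: $85</span></div>
</body></html>
"""

# Captures the number after a dollar sign, allowing thousands separators and decimals.
PRICE_RE = re.compile(r"\$([\d,]+(?:\.\d+)?)")


def extract_prices(html: str) -> dict[str, float]:
    """Select name/price pairs with XPath-style queries, then clean prices with a regex."""
    root = ET.fromstring(html)
    results = {}
    # ElementTree understands a subset of XPath, including attribute predicates.
    for product in root.findall(".//div[@class='product']"):
        name = product.findtext("span[@class='name']", default="").strip()
        raw_price = product.findtext("span[@class='price']", default="")
        match = PRICE_RE.search(raw_price)
        if name and match:
            # Drop thousands separators before converting to a number.
            results[name] = float(match.group(1).replace(",", ""))
    return results


print(extract_prices(SAMPLE_HTML))  # {'Lamp': 1299.0, 'Desk': 85.0}
```

In an actual Scrapy spider, the selection step would typically go through response.xpath() or response.css() instead of ElementTree, while the regex cleanup could stay ordinary Python.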

ParseHub offers most of the web scraping power and scale of Scrapy in a much easier-to-use package. Because we're actually big fans of Scrapy, we still recommend it for a few situations:

  • Tight integration with an existing Python codebase and infrastructure
  • Crawling hundreds of websites and grabbing all of the HTML code

ParseHub Pricing vs Scrapy Pricing

Scrapinghub is a paid service for running web scrapers (AKA spiders or projects) created with the open-source Python framework Scrapy. It is equivalent to ParseHub's "run on server" and "run on a schedule" services, which are integrated into the ParseHub desktop app.

At first glance, the main difference between the two services appears to be their pricing. ParseHub packages its capabilities into conventional software-as-a-service (SaaS) plans: Free, Standard ($99/month) and Professional ($499/month). Scrapinghub prices its service in $9 "Scrapy Cloud units", similar to infrastructure-as-a-service (IaaS) offerings such as Amazon EC2.

ParseHub clearly defines how many pages a minute it will provide for each plan. Scrapinghub offers additional "concurrent crawls" for $9 each. For a closer dollar-for-dollar comparison, you would have to calculate how many "Scrapy Cloud units" you need to run your project at the same speed as a ParseHub paid plan.
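That calculation is simple arithmetic once you know how many pages a minute a single Scrapy Cloud unit sustains for your particular spider. The minimal sketch below assumes you have measured that figure yourself (the 60 pages/min used here is made up), and the 200 pages/min target is likewise hypothetical; only the $9 unit price comes from this article.

```python
import math

UNIT_PRICE_USD = 9  # price of one Scrapy Cloud unit, as quoted above


def scrapy_cloud_cost(target_pages_per_min: float, pages_per_min_per_unit: float) -> tuple[int, int]:
    """Return (units needed, monthly cost in USD) to reach a target crawl speed.

    pages_per_min_per_unit is an assumption you would need to measure for your
    own spider; it varies with the site, the spider and its settings.
    """
    if pages_per_min_per_unit <= 0:
        raise ValueError("throughput per unit must be positive")
    # Round up: a fractional unit still has to be bought as a whole unit.
    units = math.ceil(target_pages_per_min / pages_per_min_per_unit)
    return units, units * UNIT_PRICE_USD


# Hypothetical: match a 200 pages/min target, assuming one unit handles 60 pages/min.
units, cost = scrapy_cloud_cost(200, 60)
print(units, cost)  # 4 36
```

Rounding up with math.ceil reflects that capacity is sold in whole units, so a target slightly above a multiple of the per-unit throughput costs a full extra unit.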

Additional Features

ParseHub bundles all its features into a single package that you can upgrade or downgrade as needed. Scrapinghub, however, splits several web scraping components into separate products, and their costs can quickly add up if you choose the paid options.

For example, ParseHub and Scrapinghub both offer IP rotation, but Scrapinghub sells it as a separate service, Crawlera, with plans ranging from $25 a month to $500 or more a month.
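Tallying the unbundled route means adding up the separately billed pieces. In this rough sketch, the unit count is a hypothetical input and the $25 proxy figure is only the Crawlera starting price quoted above; a real bill depends on the tiers you actually choose.

```python
UNIT_PRICE_USD = 9.0  # one Scrapy Cloud unit, as quoted in this article


def unbundled_monthly_cost(units: int, proxy_plan_usd: float = 0.0) -> float:
    """Add the cost of Scrapy Cloud units to a separately billed IP-rotation plan."""
    if units < 0 or proxy_plan_usd < 0:
        raise ValueError("units and prices must be non-negative")
    return units * UNIT_PRICE_USD + proxy_plan_usd


# Hypothetical: 4 units plus an entry-level $25/month proxy plan.
total = unbundled_monthly_cost(4, proxy_plan_usd=25)
print(total)  # 61.0
```

Before weighing a total like this against a $99 or $499 ParseHub plan, make sure both options deliver the same crawl speed, since the plans differ in pages per minute as well as price.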

Free Plans

Both services offer a free plan that includes multiple projects and hundreds of pages or more.

We recommend you try out the free plans for both tools first before making a decision on paid plans. Visit our download page to start web scraping for free with ParseHub now.

Final Thoughts: ParseHub vs Scrapinghub

As we mentioned earlier, ParseHub vs Scrapinghub is somewhat of an apples-to-oranges comparison. ParseHub is designed to work at a higher level, bundling together most of the features that Scrapinghub offers.

ParseHub is also a better choice if you do not have the technical knowledge to build and deploy spiders on your own.

Alternatively, you may work with a business that provides 'Big Data' and data engineering services.

Scrapinghub is a good choice if you are already convinced that Scrapy is for you. If you are just starting out, we encourage you to try ParseHub, which will get you up and running more easily and quickly.

[This post was originally written on July 15, 2016 and updated on August 9, 2019]