Web scraping can unlock invaluable insights for businesses of all kinds.
As a result, many companies will hire someone to take care of their web scraping projects.
If you’re interested in offering this service, you’ve come to the right place. We’ve put together a complete guide on how to offer web scraping services for your clients.
Before we get into the details, let’s review some of the basics.
What is web scraping?
Web scraping is a term used for collecting information from websites on the internet. Most commonly, this is done via automated tools or bots.
Clients who are interested in web scraping services will typically have a goal in mind like "I want all names, phone numbers, addresses and email addresses for every contact on this directory". Then, the scraping tool will crawl that website to extract that information and export it to the desired format (Excel, JSON, XML, etc.).
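As a rough illustration of that extract-and-export flow, here is a minimal Python sketch that uses only the standard library. The HTML snippet, the class names (contact, name, phone) and the ContactParser helper are all hypothetical and invented for this example. A real directory page would need its own selectors and, in practice, usually a dedicated scraping library or tool.

```python
import json
from html.parser import HTMLParser

# Hypothetical directory markup, invented for illustration only.
SAMPLE_HTML = """
<ul>
  <li class="contact"><span class="name">Ada Example</span><span class="phone">555-0100</span></li>
  <li class="contact"><span class="name">Bob Sample</span><span class="phone">555-0101</span></li>
</ul>
"""

# Fields we want to pull out of each contact entry (assumed class names).
FIELDS = {"name", "phone"}


class ContactParser(HTMLParser):
    """Collects one dict per <li class="contact"> element."""

    def __init__(self):
        super().__init__()
        self.contacts = []
        self._current_field = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "contact":
            # A new contact entry starts here.
            self.contacts.append({})
        elif tag == "span" and css_class in FIELDS and self.contacts:
            # Remember which field the upcoming text belongs to.
            self._current_field = css_class

    def handle_data(self, data):
        if self._current_field:
            self.contacts[-1][self._current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current_field = None


def scrape_contacts(html):
    """Parse the HTML and return a list of contact dicts."""
    parser = ContactParser()
    parser.feed(html)
    return parser.contacts


contacts = scrape_contacts(SAMPLE_HTML)

# Export step: JSON here, but the same list could be written as CSV or XML.
print(json.dumps(contacts, indent=2))
```

The parsing and the export are kept separate on purpose: the client's requested format only changes the last step, not the extraction logic.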
What do people use scraping for?
There are many reasons why people and businesses scrape data from a website. Here are some of the most common use cases that we have found:
- Scraping lead information from directories: either individual contact information or company information to populate CRMs. For example, scraping platforms such as Yelp or Yellow Pages.
- Market research: scrape pricing and other information on products from eCommerce websites, vehicles on dealership sites, trips on travel sites or property information from real estate sites
- Collect data from sites for various research purposes
- Build aggregators that collect blog posts, classified ads or jobs
- Scrape data from an old website to move the content over to a new website, where export or API are not available
- Scrape stock or cryptocurrency rates regularly
- Scrape reviews and comments for sentiment analysis
How can I scrape data from the web?
There are different ways of scraping information:
- Custom web scrapers: these are scrapers built by programmers in a variety of programming languages
- Web scraping software: these are tools that allow you to scrape data from the web without any programming knowledge.
Custom Web Scrapers
These are typically built by programmers in a variety of languages. People commonly use libraries like Scrapy, Beautiful Soup and Selenium to build them.
Pros:
- Highly customizable and tailored to your needs
- If you hire someone to build it, little time investment on your part
Cons:
- Difficult to maintain without programming knowledge
- If you have hired someone, you need to contact and pay them each time an issue arises or a change is required
- Each website requires that an entirely new scraper be built for it
Web Scraping Software
Many companies provide software that allows you to scrape data without any programming knowledge. Some examples include Import.io, Diffbot, Portia and our own software, ParseHub.
Pros:
- Users can typically set up web scrapers with little or no technical knowledge
- In some cases, such as ParseHub, there is support available to set up your scraping projects
- Users can also maintain their project without having to contact a developer
- Pricing can start off quite low, with many providers offering a free version
- You can use the same piece of software to scrape multiple different sites, rather than building one new scraper for each new site you’d want to scrape
Cons:
- Some web scraping software cannot handle more complex sites
- You will need to invest time in learning how to use the software. In some cases, this can be pretty easy depending on the software
- In some cases, you will need to be more technical to use more advanced features
How popular is web scraping?
The demand for web scraping services is high and rising.
At the time of writing, a search for "web scraping" on Upwork returns 833 jobs, and Freelancer.com shows 1,129.
We used ParseHub to quickly scrape the Freelancer.com "Websites, IT & Software" category and, of the 477 skills listed, "Web scraping" was in 21st position.
How much do people usually charge for web scraping services?
The cost for scraping a website varies, with some online freelancers offering extremely low prices such as $10/website.
However, scraping companies will tend to charge a higher price.
An experiment on Web Scraping services pricing
We contacted several scraping companies with a quote request for a weekly scrape of 6000 products on Amazon in four categories to extract the title, price, brand, description, ASIN, rating, number of reviews and the "Sold by" name and URL.
These are some of the quotes we received:
- $400 initial setup and $4,500 - $5,000 USD per year for managed services (assuming a middle ground of $4,750, this is about $396/month maintenance)
- $99 initial setup, $79/month for monthly maintenance and $5 per 10000 records per month (assuming 6000 records per week, this adds on $12 per month for a total of $91/month maintenance)
- $149 initial setup and $100/month maintenance
- $329 initial setup and the first 10,000 lines of data. After that, each additional record will be charged at $0.005/line. At 24,000 products per month, this is about $120/month maintenance
While it's a small sample, it is about a $244 average for the initial setup and a $177 average monthly maintenance fee. These prices are more or less in line with those reported by scraping.pro. Not too shabby if the project isn't too complex!
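For anyone who wants to check or adapt the numbers, the arithmetic behind those averages can be sketched in a few lines of Python. The figures come straight from the four quotes above. The assumptions are that a month counts as four weekly scrapes (24,000 records) and that, for the fourth quote, every monthly record is billed at the per-line rate, as in the estimate above.

```python
# A month is treated as four weekly scrapes of 6,000 products.
RECORDS_PER_MONTH = 6000 * 4  # 24,000

quotes = [
    # Midpoint of the $4,500-$5,000 yearly fee, spread over 12 months.
    {"setup": 400, "monthly": 4750 / 12},
    # $79 base fee plus $5 per 10,000 records.
    {"setup": 99, "monthly": 79 + 5 * RECORDS_PER_MONTH / 10000},
    # Flat monthly fee.
    {"setup": 149, "monthly": 100},
    # $0.005 per record, applied to all monthly records.
    {"setup": 329, "monthly": 0.005 * RECORDS_PER_MONTH},
]

avg_setup = sum(q["setup"] for q in quotes) / len(quotes)
avg_monthly = sum(q["monthly"] for q in quotes) / len(quotes)

print(f"Average setup fee: ${avg_setup:.2f}")
print(f"Average monthly fee: ${avg_monthly:.2f}")
```

Swapping in your own record volume or quotes makes it easy to see how per-record pricing compares with flat monthly fees as a project grows.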
One-time Web Scraping jobs
A lot of these quotes are based on ongoing web scraping jobs. But what about one-time jobs?
ParseHub offers a free plan for your one-time scraping needs. Additionally, we can set up and run the entire scraping job for you - just contact us to request a quote.
How difficult is it to provide web scraping services?
That depends! It depends on both the method you use (programming vs software) and the complexity of the website.
The above Amazon project, for example, takes an experienced user about 10 - 15 minutes to build and test on ParseHub.
What questions should I ask a potential client?
To understand the complexity of a web scraping project, we typically ask:
- What website are they trying to scrape?
- On that website, what specific elements are they interested in?
- What format would they like the data to be extracted to and how would they like that data formatted?
- Approximately how many pages will they be scraping?
- How regularly do they require this data?
On receiving this information we will look at the website and quickly build a sample project to understand how complex it will be. Things to consider are: how structured is the data? Is the layout they are requesting possible?
If the client is scraping a high volume of pages frequently, there may also be issues with the website attempting to block that traffic. In this case, rotating proxies (either from a pool of proxies or using custom proxies for that client) will usually be a requirement.
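For those building a custom scraper, proxy rotation can be sketched roughly as follows in Python, using only the standard library. This is a simple round-robin approach under stated assumptions, not a description of how any particular service implements it. The proxy addresses are hypothetical placeholders from a documentation-only IP range, and next_opener and fetch are illustrative helper names.

```python
import itertools
import urllib.request

# Hypothetical proxy addresses (documentation-only IP range); replace with real ones.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

# Endless round-robin iterator over the pool.
_proxy_cycle = itertools.cycle(PROXY_POOL)


def next_opener():
    """Return (proxy, opener), where the opener routes traffic through the next proxy."""
    proxy = next(_proxy_cycle)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)


def fetch(url, timeout=10):
    """Fetch a URL through the next proxy in the pool (performs a real network request)."""
    _, opener = next_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Each call to fetch uses the next proxy in the pool, so consecutive requests come from different addresses. A production setup would typically also retry on failure and drop proxies that stop responding.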
ParseHub offers IP rotation to reduce the chance of being blocked when scraping popular websites.
Is web scraping legal?
Web scraping is legal in most cases. While we cannot provide legal expertise, we would encourage you to read some of the following literature and always check the terms of service of the website you are scraping.
- Dear Canada: Accessing Publicly Available Information on the Internet Is Not a Crime
- Judge Orders LinkedIn to Allow Startup Access to User Data
You can read more on the legality of web scraping here: Is web scraping legal?
In summary, web scraping is a highly in-demand skill that you can learn with relative ease. It is a great opportunity for agencies, consultants and freelancers to add web scraping to their service line-up.
[This post was originally written on May 7, 2019 and updated on August 1, 2019]