What is web scraping?
Web scraping is the automated collection of information from websites.
Clients who are interested in web scraping services typically have a goal in mind, such as "I want all names, phone numbers, addresses and email addresses for every contact on this directory". The scraper crawls that website, extracts the information and exports it to the desired format (Excel, JSON, XML, etc.).
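To make the export step concrete, here is a minimal sketch in Python using only the standard library. The contact records are made up for illustration, and the function names `to_json` and `to_csv` are our own. CSV is used as the Excel-friendly format, since Excel opens CSV files directly.

```python
import csv
import io
import json

# Hypothetical records, shaped like the directory example above.
contacts = [
    {"name": "Ada Example", "phone": "555-0100",
     "address": "1 Sample St", "email": "ada@example.com"},
    {"name": "Bob Example", "phone": "555-0101",
     "address": "2 Sample St", "email": "bob@example.com"},
]


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_json(contacts))
print(to_csv(contacts))
```

In a real project the same records would come from the scraper rather than a hard-coded list, and the output would be written to a file instead of printed.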
What do people use scraping for?
There are many reasons people scrape data from websites. Here are some of the most common use cases that we have found:
- Scraping lead information from directories: either individual contact information or company information to populate CRMs
- Market research: scrape pricing and other information on products from eCommerce websites, vehicles on dealership sites, trips on travel sites or property information from real estate sites
- Collect data from sites for various research purposes
- Build aggregators that collect blog posts, classified ads or jobs
- Scrape data from an old website to move the content over to a new website, when no export option or API is available
- Scrape stock or cryptocurrency rates regularly
- Scrape reviews and comments for sentiment analysis
How can I scrape data from the web?
There are different ways of scraping information:
- Custom web scrapers: these are scrapers built by programmers in a variety of programming languages
- Web scraping software: these are tools that allow you to scrape data from the web without any programming knowledge required
Custom Web Scrapers
These are typically built by programmers in a variety of languages. People commonly use libraries like Scrapy, Beautiful Soup and Selenium to build them.
Pros:
- Highly customizable and tailored to your needs
- If you hire someone to build it, little time investment on your part
Cons:
- Difficult to maintain without programming knowledge
- If you have hired someone, you need to contact and pay them each time an issue arises or a change is required
- Each website requires that an entirely new scraper be built for it
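To give a feel for what a custom scraper involves, here is a minimal sketch in Python. To keep it self-contained it uses the standard library's `html.parser` rather than the libraries named above, and it parses a hard-coded snippet instead of downloading a page. The markup, the class names `contact`, `name` and `email`, and the `ContactParser` class are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded directory page.
SAMPLE_HTML = """
<ul>
  <li class="contact"><span class="name">Ada Example</span><span class="email">ada@example.com</span></li>
  <li class="contact"><span class="name">Bob Example</span><span class="email">bob@example.com</span></li>
</ul>
"""


class ContactParser(HTMLParser):
    """Collect the text of <span class="name"> and <span class="email">."""

    FIELDS = {"name", "email"}

    def __init__(self):
        super().__init__()
        self.contacts = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "contact":
            # Each contact list item starts a new record.
            self.contacts.append({})
        elif tag == "span" and css_class in self.FIELDS and self.contacts:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            current = self.contacts[-1]
            current[self._field] = current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_contacts(html):
    """Return a list of {"name": ..., "email": ...} dictionaries."""
    parser = ContactParser()
    parser.feed(html)
    return parser.contacts


print(scrape_contacts(SAMPLE_HTML))
```

Note how the parser is tied to this one page layout. A site with different markup would need different rules, which is why each website tends to need its own scraper.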
Web Scraping Software
There are many companies that provide software that allows you to scrape data without any programming knowledge. Some examples include: Import.io, Diffbot, Portia and our own software, ParseHub.
Pros:
- Users can typically set up web scrapers with little or no technical knowledge and are provided with support
- Users can also maintain their project without having to contact a developer
- Pricing can start off quite low, with many providers offering a free version
Cons:
- Some web scraping software cannot handle more complex sites
- You will need to invest time into learning the software
- In some cases, you will need to be more technical to use more advanced features
How popular is web scraping?
The demand for web scraping services is high.
A search for "web scraping" on Upwork shows that there are currently 833 jobs and Freelancer.com shows 1129.
We scraped the Freelancer.com "Websites, IT & Software" category and, of the 477 skills listed, "Web scraping" was in 21st position.
How much do people usually charge for web scraping services?
The cost for scraping a website varies, with some online freelancers offering extremely low prices such as $10/website.
However, scraping companies tend to charge a higher price. We contacted several scraping companies with a quote request for a weekly scrape of 6000 products on Amazon in four categories to extract the title, price, brand, description, ASIN, rating, number of reviews and the "Sold by" name and URL. These are some of the quotes we received:
- $400 initial setup and $4500 - $5000 USD per year for managed services (assuming a middle ground of $4750, this is about $396/month maintenance)
- $99 initial setup, $79/month for monthly maintenance and $5 per 10000 records per month (assuming 6000 records per week, this adds on $12 per month for a total of $91/month maintenance)
- $149 initial setup and $100/month maintenance
- $329 initial setup and the first 10,000 lines of data. After that, each additional record will be charged at $0.005/line. At 24,000 products per month, this is about $120/month maintenance
While it's a small sample, the averages come to about $244 for the initial setup and about $177 per month for maintenance. These prices are more or less in line with those reported by scraping.pro. Not too shabby if the project isn't too complex!
We did not request quotes for one-time scrapes, but some users may simply want one rather than scraping the site on a regular basis. We provide this service at a cost of $699/website, for example.
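For anyone who wants to check the arithmetic, here is a short Python sketch that recomputes the averages from the four quotes listed above. It assumes four weeks per month (24,000 products) and, for the last quote, treats the included 10,000 lines as a one-time allowance, which is how the monthly figures in the list were worked out.

```python
# Setup fees from the four quotes, in USD.
setup_fees = [400, 99, 149, 329]

# Weekly scrape of 6000 products, approximated as four weeks per month.
products_per_month = 6000 * 4

monthly_fees = [
    4750 / 12,                            # midpoint of $4500 - $5000 per year
    79 + 5 * products_per_month / 10000,  # flat fee plus $5 per 10,000 records
    100,                                  # flat monthly fee
    0.005 * products_per_month,           # $0.005 per record
]

average_setup = sum(setup_fees) / len(setup_fees)
average_monthly = sum(monthly_fees) / len(monthly_fees)

print(f"Average setup: ${average_setup:.0f}")
print(f"Average monthly: ${average_monthly:.0f}")
```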
How difficult is it to provide web scraping services?
That depends, both on the method you use (programming vs. software) and on the complexity of the website.
The above Amazon project, for example, takes an experienced user about 10 - 15 minutes to build and test on ParseHub. If you are not familiar with the software, it may take longer.
What questions should I ask a potential client?
In order to understand the complexity of a web scraping project, typically we ask:
- What website are they trying to scrape?
- On that website, what specific elements are they interested in?
- What format would they like the data to be extracted to and how would they like that data formatted?
- Approximately how many pages will they be scraping?
- How regularly do they require this data?
On receiving this information, we look at the website and quickly build a sample project to understand how complex it will be. Things to consider are: how structured is the data? Is the layout they are requesting possible?
If the client is scraping a high volume of pages frequently, there may also be issues with the website attempting to block that traffic. In this case, rotating proxies (either from a pool of proxies or using custom proxies for that client) will usually be a requirement.
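As a rough illustration of proxy rotation, here is a minimal round-robin sketch in Python using only the standard library. The proxy addresses are placeholders, and the helper names `make_proxy_rotation`, `build_opener_for` and `fetch` are our own. A production setup would also need retries and handling for proxies that stop responding.

```python
import itertools
import urllib.request

# Hypothetical proxy addresses; replace with a real pool.
PROXY_POOL = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]


def make_proxy_rotation(proxies):
    """Return an endless iterator that cycles through the proxies in order."""
    return itertools.cycle(proxies)


def build_opener_for(proxy_url):
    """Build an opener that sends HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


def fetch(url, rotation, timeout=10):
    """Fetch url through the next proxy in the rotation and return the body."""
    opener = build_opener_for(next(rotation))
    with opener.open(url, timeout=timeout) as response:
        return response.read()


rotation = make_proxy_rotation(PROXY_POOL)
```

Each call to `fetch(url, rotation)` goes out through the next proxy in the pool, so consecutive requests to the same site arrive from different addresses.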
Is web scraping legal?
Web scraping is legal in most cases. While we cannot provide legal expertise, we would encourage you to read some of the following literature and always check the terms of service of the website you are scraping.
- Dear Canada: Accessing Publicly Available Information on the Internet Is Not a Crime
- Judge Orders LinkedIn to Allow Startup Access to User Data
In summary, web scraping is a highly in-demand skill that you can learn with relative ease. It is a great opportunity for agencies, consultants and freelancers to add web scraping to their service line-up.