Web scrapers come in many different forms, from simple browser plugins to more robust software applications. Depending on the web scraper you’re using, you might or might not be able to scrape multiple pages of data in a single run.
Today, we will review how to use a free web scraper to scrape multiple pages of data, including pages with two different kinds of navigation.
For this, we will use ParseHub, a free and powerful web scraper that can extract data from any website.
Pagination with ParseHub
If you have never used ParseHub before, do not fret. It is actually quite easy to use while still being incredibly powerful.
In basic terms, ParseHub works by loading the website you’d like to scrape and letting you click on the specific data you want to extract.
Taking it a step further, you can also instruct ParseHub to interact with or click on specific elements of a page in order to browse to other pages that contain more data. That means you can use ParseHub as a pagination web scraper that clicks its way through multiple pages.
Scraping Multiple Pages on a Website
A website’s pagination (or the lack thereof) can take many different forms. Let’s break down how to deal with each of these scenarios while scraping data.
Clicking on the “Next Page” Button
This is probably the most common scenario you will find when scraping multiple pages of data. Here’s how to deal with it:
1. In ParseHub, click on the PLUS (+) sign next to your page selection and choose the Select command.
2. Using the Select command, click on the “Next Page” link (usually at the bottom of the page you’re scraping). Rename your new selection to NextPage.
3. Expand your NextPage selection by using the icon next to it and delete both Extract commands under it.
4. Using the PLUS (+) sign next to your NextPage selection, choose the Click command.
5. A pop-up will appear asking you if this is a “Next Page” link. Click on “Yes” and enter the number of times you’d like to repeat the process of clicking on this button. (If you want to scrape 5 pages of data in total, you’d enter 4 repeats, since the first page is already loaded.)
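For readers who want to see the logic behind these steps, here is a minimal Python sketch of the same idea: follow the “Next Page” link a fixed number of times, where the number of repeats is one less than the number of pages you want. This is not how ParseHub is implemented. The `NextLinkFinder` class, the `scrape_pages` function, and the in-memory `SITE` dictionary are all hypothetical names made up for this illustration, and `SITE` stands in for real page loads.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Remembers the href of the first <a> whose text is exactly 'Next Page'."""

    def __init__(self):
        super().__init__()
        self._open_href = None  # href of the <a> tag we are currently inside
        self.next_href = None   # href of the "Next Page" link, once found

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._open_href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._open_href and self.next_href is None and data.strip() == "Next Page":
            self.next_href = self._open_href

    def handle_endtag(self, tag):
        if tag == "a":
            self._open_href = None


def scrape_pages(fetch, start_url, total_pages):
    """Visit up to total_pages pages by following the next link total_pages - 1 times."""
    repeats = total_pages - 1  # the first page is already loaded
    visited = [start_url]
    url = start_url
    for _ in range(repeats):
        html = fetch(url)
        if html is None:
            break  # page could not be loaded
        finder = NextLinkFinder()
        finder.feed(html)
        if finder.next_href is None:
            break  # no "Next Page" link, so this was the last page
        url = finder.next_href
        visited.append(url)
    return visited


# Hypothetical in-memory site standing in for real page loads.
SITE = {
    f"/page/{n}": f'<p>Results {n}</p><a href="/page/{n + 1}">Next Page</a>'
    for n in range(1, 10)
}

print(scrape_pages(SITE.get, "/page/1", total_pages=5))
```

Asking for 5 pages results in 4 link clicks, which matches the repeat count entered in the pop-up.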
No “Next Page” Button
Sometimes, there might be no “Next Page” link for pagination. In these cases, there might just be links to specific page numbers, as in the image below.
Here’s how to navigate through these with ParseHub:
1. In ParseHub, click on the PLUS (+) sign next to your page selection and click on the current page number (in this case, page 1). Rename your selection to CurrentPage.
2. Click on the PLUS (+) sign next to the CurrentPage selection and add a Relative Select command.
3. Using the Relative Select command, click on the current page number and then on the next page number. An arrow will appear to show the connection you’re creating. Rename this selection to NextPage.
4. Now, use the PLUS (+) sign next to the NextPage selection to add a Click command.
5. A pop-up will appear asking you if this is a “Next Page” link. Click on “Yes” and enter the number of times you’d like to repeat this process. (If you want to scrape 5 pages of data in total, you’d enter 4 repeats.)
6. ParseHub will now load the next page of results. Scroll all the way down and check that the NextPage Relative Select command you created is now selecting Page 3 rather than Page 2 again. If it is still selecting Page 2, click on Page 2 and then on Page 3 to train ParseHub accordingly.
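The check in the last step matters because the next-page selection has to be relative to the current page, not tied to one fixed link. The short Python sketch below illustrates the difference under simplified assumptions: pages are represented only by their numbers, and `follow_relative` and `follow_fixed` are hypothetical helper names invented for this example, not ParseHub functions.

```python
def follow_relative(page_numbers, start, repeats):
    """Always move to the page number right after the current one."""
    visited = [start]
    current = start
    for _ in range(repeats):
        if current + 1 not in page_numbers:
            break  # no higher page number left to click
        current += 1
        visited.append(current)
    return visited


def follow_fixed(target, start, repeats):
    """Always click the same link, regardless of the current page."""
    return [start] + [target] * repeats


pages = [1, 2, 3, 4, 5, 6, 7]
print(follow_relative(pages, start=1, repeats=4))  # [1, 2, 3, 4, 5]
print(follow_fixed(2, start=1, repeats=4))         # [1, 2, 2, 2, 2]
```

The fixed version keeps reloading Page 2, which is the behavior you are ruling out when you confirm that the selection has moved on to Page 3.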
Other Methods of Scraping Multiple Pages
You might also be interested in scraping multiple pages by searching through a list of keywords or by loading a predetermined list of URLs.
These are tasks that ParseHub can easily tackle as well. Check out our Help Center for these guides:
- How to scrape by entering a list of keywords into a search box
- How to scrape by loading a list of URLs
You now know how to scrape multiple pages’ worth of data from a website.
However, we know that websites come in many different shapes and forms. The methods highlighted in this article might not work for your specific project.
If that’s the case, reach out to us at hello(at)parsehub.com and we’ll be happy to assist you with your project.
If the website is an infinite scroll page, you can read our tutorial here: Scraping infinite scroll pages