Founded in 1996, Indigo Books and Music Inc. has become Canada’s largest book, toy, gift, fashion and lifestyle retailer. Indigo has over 86 bookstores in Canada, some of them operating under the Chapters name, a chain it acquired in 2001. Indigo also owns over 123 smaller stores under the brand names Coles, Indigospirit and The Book Company. Beyond its many locations across Canada, Indigo also sells a wide variety of products on its website, from fiction novels to beauty, wellness and fashion accessories!
In this guide, we will show you how to scrape hundreds or even thousands of products from Indigo’s website, with ParseHub, our free web scraper!
Begin by downloading ParseHub for free, so you can follow along with this tutorial.
Let’s start scraping!
Because of the way products are displayed on Indigo’s website, we need to scroll down the page and load more products before we can scrape them.
- Begin by opening ParseHub and clicking the “New Project” button.
- Enter the URL of the Indigo page you wish to scrape. We will scrape books on sale using this URL: https://www.chapters.indigo.ca/en-ca/books/shop-by-price/942081-cat.html
- Click the PLUS(+) button next to your page selection, click Advanced and choose Scroll.
- Under Scroll, set the scroll to repeat 2 times, to ensure we reach the bottom of the page.
- Next, manually scroll down the page until you see the “Load More” button.
- Click the PLUS(+) button next to your page selection and choose “Select”.
- Click the “Load More” button to select it, then expand the selection, delete its extraction, and rename the selection to “LoadMore”.
- Click the PLUS(+) button next to your LoadMore selection and choose “Click”.
- Click “No” on the popup, since we are just loading products, and continue with “Stay on Current Template”
Now that we have the loading functionality, it’s time to loop multiple times, using ParseHub’s Jump command, until we have loaded the desired number of products:
- Click the PLUS(+) next to the page selection, and choose the “Jump” command.
- Under “Jump to the selection labeled”, enter “page” so we can jump back to the start.
- Under “Maximum jumps”, enter the number of additional pages you wish to load. We will choose 3, which means 4 pages of products in total!
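To make the page arithmetic concrete, here is a minimal Python sketch of the "Load More" loop described above. It is only a model of the idea, not ParseHub's internals: the `load_products` function, the made-up catalog and the page size of 12 are all assumptions chosen for illustration.

```python
def load_products(catalog, page_size, max_jumps):
    """Return the products visible after the first page plus up to max_jumps extra loads.

    Hypothetical model: catalog is the full product list, page_size is how many
    products each "Load More" click reveals.
    """
    visible = catalog[:page_size]  # the first page is shown without any click
    jumps = 0
    # Keep going while jumps remain and there is still something left to load.
    while jumps < max_jumps and len(visible) < len(catalog):
        # "Click Load More": reveal the next batch of products.
        visible = catalog[:len(visible) + page_size]
        # "Jump" back to the start of the template and count the jump.
        jumps += 1
    return visible


catalog = [f"book-{i}" for i in range(1, 101)]  # 100 made-up products
loaded = load_products(catalog, page_size=12, max_jumps=3)
print(len(loaded))  # 48, i.e. 4 pages of 12 products
```

With 3 jumps the model ends up with the first page plus 3 extra loads, which matches the 4 pages mentioned above. The loop also stops early if there is nothing left to load.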
Now that we have preloaded the products, we can begin the scraping process:
- Firstly, click the PLUS(+) button next to your page selection and choose “Select”.
- Click the first product’s name, then the second one, to extract all the products.
- Rename this selection on the left to “book”.
- To get the author for each book, first, click the PLUS(+) button on your “book” selection.
- Choose “Relative Select” and click the first book, and then point and click the arrow to its author.
- Rename this selection on the left to “author”.
To scrape prices we will use the “Relative Select” tool again but need to add some conditional logic to clean up the data.
- Begin by clicking the PLUS(+) button next to your “book” selection again and choose “Relative Select”.
- Click the first product’s name, then hover over its price.
- On your keyboard, zoom out the selection by holding CTRL/CMD+1, until the whole price div is outlined, and click to finalize the selection.
- Rename this selection on the left to “price”.
- Now we need to format the data. Begin by expanding the price extraction.
- Next, click the PLUS(+) button next to your “book” selection and choose “Conditional”.
- Drag the Conditional command in between the price and its extraction.
- Finally, in the expression, enter: !$e.outerHTML.contains("strikeout")
After following these steps, you should only get clean and current prices!
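If the conditional is hard to picture, the following Python sketch mimics what the expression does: it keeps a price element only when its outer HTML does not contain the word "strikeout". The HTML snippets and the `current_prices` helper are hypothetical and written for illustration. Indigo's real markup may look different.

```python
import re

# Hypothetical outer HTML of the price elements found for one product.
price_elements = [
    '<div class="price strikeout">$24.99</div>',  # old, crossed-out price
    '<div class="price">$17.49</div>',            # current price
]


def current_prices(elements):
    """Keep only the text of elements whose outer HTML lacks "strikeout"."""
    kept = []
    for outer_html in elements:
        # Same test as the conditional: skip anything marked as struck out.
        if "strikeout" not in outer_html:
            # Strip the tags so only the visible price text remains.
            text = re.sub(r"<[^>]+>", "", outer_html).strip()
            kept.append(text)
    return kept


print(current_prices(price_elements))  # ['$17.49']
```

The check is a plain substring test, so it would also skip any element that merely mentions "strikeout" somewhere in its markup.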
Scraping eCommerce websites with ParseHub is usually easier than this. However, if you made it this far, you should have learned some new ParseHub features!
To begin scraping, click the green “Get Data” button on the left pane. You can choose to test your scrape, run it, or even schedule it. In our case, we chose “Run”, which got us 4 pages of data in total, as we specified earlier using ParseHub’s Jump command.
If you followed our steps, your data export should look similar to this:
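The exact layout depends on how you named your selections, but as a rough sketch, a JSON export could contain one "book" list whose entries carry a name, an author and a price. The snippet below assumes that shape and uses made-up placeholder records, not real Indigo data. It shows one way to turn the price text into numbers in Python. The `load_books` helper is hypothetical.

```python
import json

# Placeholder export text. The keys mirror the selection names used in this
# tutorial ("book", "author", "price"); the values are invented examples.
export_text = """
{"book": [
  {"name": "Example Title A", "author": "Author One", "price": "$12.50"},
  {"name": "Example Title B", "author": "Author Two", "price": "$8.00"}
]}
"""


def load_books(text):
    """Parse the export and convert each price string to a float (or None)."""
    data = json.loads(text)
    rows = []
    for item in data.get("book", []):
        price_text = item.get("price", "")
        # Drop the currency symbol and thousands separators before converting.
        digits = price_text.replace("$", "").replace(",", "").strip()
        rows.append({
            "name": item.get("name"),
            "author": item.get("author"),
            "price": float(digits) if digits else None,
        })
    return rows


for row in load_books(export_text):
    print(row["name"], row["author"], row["price"])
```

Books without a price come back with `None` instead of raising an error, which keeps the later steps simple if some products were filtered out by the conditional.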
If you need help with your web scraping, feel free to reach out to our live chat support on the ParseHub homepage.
Happy Scraping! 📚