Overstock.com is a global online discount retailer that sells discounted and wholesale furniture, rugs, decor, home improvement goods, and more. It was founded in 1999, when the website primarily sold liquidated and closeout furniture. Today the website also sells new furniture at competitive prices. As we will see when scraping products, most of its items are on sale, which attracts many customers around the world.
The company has over 1,600 employees and generates over 2 billion dollars in revenue. In this guide, we will be scraping furniture on Overstock.com using ParseHub, our free web scraper.
Let’s start scraping!
- Firstly, open ParseHub on your PC, Mac or Linux system.
- Click the “New Project” button.
- You can now enter the Overstock.com URL you wish to scrape. We will be scraping living room chairs with this URL: https://www.overstock.ca/Home-Garden/Living-Room-Chairs/2737/subcat.html
- Click the first product’s name to extract it. The rest of the product names should turn yellow.
- Click the next product’s name to train the ParseHub scraping algorithm.
- Now all 59 products should be extracted. Rename this selection on the left to “product”.
Using ParseHub’s Relative Select tool, we will be able to scrape additional details from each product, such as prices, ratings and more.
- First, click the PLUS(+) icon next to the “product” selection, and choose “Relative Select”.
- Click the first product’s name, and then hover over the price.
- Hold CTRL/CMD+2 to zoom into the price text node, and click the price.
- You may have to repeat this for the second product to train the algorithm.
- All prices should now be extracted. Rename this selection to “price” on the left.
- You can expand the extraction and remove the URL, as it’s the same as the product URL.
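The extracted prices come out as text, so you may want to convert them to numbers once the data is exported. Here is a minimal Python sketch of that cleanup step. It assumes the price text looks something like "$1,234.56" or "Sale $249.99"; the exact format on the site may differ, and `parse_price` is a helper name we made up for illustration, not part of ParseHub.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumption: a price is the first run of digits in the text, optionally
# with thousands separators and a decimal part, e.g. "$1,234.56".
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[Decimal]:
    """Return the first number found in a price string, or None if there is none."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return Decimal(match.group().replace(",", ""))
```

Using `Decimal` rather than `float` keeps the cents exact, which is handy if you later sum or compare prices.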
For ratings, we will follow the same steps we used for prices, with one additional step to get the rating out of five.
- Begin by clicking the product’s PLUS(+) icon, and choose Relative Select.
- Click the product’s name and hover over the star ratings.
- This time zoom out by holding CTRL/CMD+1, until the whole rating div is shown.
- Now click the rating div to extract it. You may need to repeat this process for the next product.
- Rename this selection on the left to “rating”, expand the data, and delete the URL extraction.
- Finally, click the rating extraction, and under the Extract dropdown, choose “title Attribute”.
- Now you should see the product ratings out of five on the data preview below!
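Because the rating comes from a title attribute, it arrives as a short piece of text rather than a number. The sketch below shows one way to pull the number out in Python after export. It assumes the text starts with the score, for example "4.5 out of 5 stars"; we have not confirmed the exact wording Overstock.com uses, and `parse_rating` is a hypothetical helper name.

```python
import re
from typing import Optional


def parse_rating(title: str, max_rating: float = 5.0) -> Optional[float]:
    """Return the first number in the title text if it is a plausible rating."""
    # Assumption: the score is the first number in the text.
    match = re.search(r"\d+(?:\.\d+)?", title)
    if match is None:
        return None
    value = float(match.group())
    # Reject numbers outside the rating scale (for example a review count).
    return value if 0.0 <= value <= max_rating else None
```

Returning `None` for anything that does not look like a rating makes it easy to spot products without reviews instead of silently recording a wrong score.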
To scrape multiple pages of products, we need to use ParseHub’s pagination.
- Begin by scrolling down the webpage until you see the page navigation bar.
- Click the PLUS(+) icon next to your “page” selection and choose “Select”.
- Click the right arrow button. It should be an SVG.
- Rename this selection to “pagination” on the left, and expand it to delete the extractions.
- Click the PLUS(+) button next to your “pagination” selection and choose “Click”.
- Choose “Yes” as this is a next-page button.
- Finally, choose the number of additional pages you wish to scrape. We chose 2 to scrape 3 pages in total.
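The page count works out as the first page plus the number of additional pages. As a quick sanity check, this small Python sketch does the arithmetic and gives a rough upper bound on the number of rows to expect. It assumes every page lists the same number of products as the first one (59 in our case), which may not hold for the last page; both function names are our own.

```python
def total_pages(additional_pages: int) -> int:
    """The first page plus the additional pages chosen in the pagination step."""
    if additional_pages < 0:
        raise ValueError("additional_pages must be zero or greater")
    return additional_pages + 1


def max_rows(additional_pages: int, products_per_page: int) -> int:
    """Upper bound on scraped rows, assuming every page is full."""
    return total_pages(additional_pages) * products_per_page


print(total_pages(2))   # 3 pages in total
print(max_rows(2, 59))  # at most 177 products
```

If your run returns far fewer rows than this estimate, that is a hint that the pagination or the product selection needs another look.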
IP Rotation (Bypass Blocks)
At the time of writing, ParseHub’s IP Rotation is required when scraping Overstock.com. Without IP Rotation, you will get empty results. Note: this is a paid ParseHub feature.
- To turn on IP Rotation, click the settings cog at the top left of ParseHub, and choose “Settings”.
- Tick the Rotate IP Addresses checkbox, and confirm the decision on the popup.
- You are now ready to begin scraping without blocks.
You are now ready to start your scraping project on ParseHub’s servers! To begin scraping the Overstock.com data, click the green “Get Data” button on the left pane. You can choose to Test, Run or Schedule your scrape. We chose Run to scrape the 3 pages a single time!
If you followed this guide correctly, your data should look similar to this:
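If you download the results as JSON, a few lines of Python are enough to start working with them. The sketch below is only an illustration: it assumes the export has a top-level "product" list whose items carry "name", "url", "price" and "rating" keys, matching the selection names used in this guide. The sample values are made-up placeholders, not real scraped data, and `load_products` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

# Placeholder data in the assumed export shape (not real results).
SAMPLE_EXPORT = """
{"product": [
  {"name": "Example Chair A", "url": "https://example.com/a",
   "price": "$199.99", "rating": "4.5 out of 5 stars"},
  {"name": "Example Chair B", "url": "https://example.com/b",
   "price": "$89.00"}
]}
"""


def load_products(raw_json: str) -> List[Dict[str, Any]]:
    """Flatten the assumed export into one dict per product."""
    data = json.loads(raw_json)
    rows = []
    for item in data.get("product", []):
        rows.append({
            "name": item.get("name", ""),
            "url": item.get("url", ""),
            # A product may lack a price or rating key, so default to None.
            "price": item.get("price"),
            "rating": item.get("rating"),
        })
    return rows


products = load_products(SAMPLE_EXPORT)
for row in products:
    print(row["name"], row["price"], row["rating"])
```

To use it on your own export, read the downloaded file into a string and pass it to `load_products` in place of `SAMPLE_EXPORT`.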
We also have a guide on scraping any e-commerce website with ParseHub. If you’re having trouble with ParseHub, you can reach out to our live chat support.
Happy Scraping! 🛋️