The hardest part about web scraping can be getting to the data you want to scrape.
For example, you might want to scrape data from a search results page for a number of keywords.
You might set up a separate scraping project for each keyword.
However, there are powerful web scrapers that can automate the searching process and scrape the data you want.
Today, we will set up a web scraper to search through a list of keywords and scrape data for each one.
A Free and Powerful Web Scraper
For this project, we will use ParseHub, a free and powerful web scraper that can scrape data from any website. Make sure to download and install ParseHub for free before we get started.
We will also scrape data from Amazon’s search result page for a short list of keywords.
Searching and Scraping Data from a List of Keywords
Now it’s time to set up our project and start scraping data.
- Install and Open ParseHub. Click on “New Project” and enter the URL of the website you will be scraping from. In this case, we will scrape data from Amazon.ca. The page will then render inside the app and allow you to start extracting data.
- Now, we need to give ParseHub the list of keywords we want to search for. To do this, click on the settings icon at the top left and click on “Settings”.
- Under the “Starting Value” section you can enter your list of keywords either as a CSV file or in JSON format right in the text box below it.
- If you’re using a CSV file to upload your keywords, make sure you have a header cell. In this case, it will be the word “keywords”.
- Once you’ve submitted your list of keywords, click on “Back to Commands” to go back to your project. Click on the PLUS (+) sign next to your “page” selection, click on Advanced and click on the “Loop” command.
- By default, your list of keywords will be selected as the list of items to loop through. If not, make sure to select “keywords” from the dropdown.
- Click on the PLUS (+) sign next to your “For each item in keywords” selection and choose the “Begin New Entry” command. This command will be named “list1” by default.
- Click on the PLUS (+) sign next to your “list1” tool and choose the “Select” command.
- With the select command, click directly on the Amazon search bar to select it.
- This will create an Input command. Under it, choose “expression” from the dropdown and enter the word “item” in the text box.
- Next, we will have ParseHub record the keyword alongside each result. To do this, click on the PLUS (+) sign next to the “list1” command and choose the “Extract” command.
- Under the extract command, enter the word “item” into the first text box.
- Now, let’s tell ParseHub to perform the search for the keywords in the list. Click on the PLUS (+) sign next to your “list1” selection and choose the select command.
- Click on the Search Button to select it and rename it to “search_bar”.
- Click on the PLUS (+) sign next to your “search_bar” selection and choose the “Click” command.
- A pop-up will appear asking you if this is a “next page” button. Click on “No” and rename your new template to “search_results”.
- Now, let’s navigate to the search results page of the first keyword on the list and extract some data.
- Start by switching over to Browse Mode on the top left and searching for the first keyword on the list.
- Once the page renders, make sure you are still working on your new “search_results” template by selecting it with the tabs on the left.
- Now, turn off Browse Mode and click on the name of the first result on the page to select it. It will be highlighted in green to indicate that it has been selected.
- The rest of the products on the page will be highlighted in yellow. Click on the second one on the list to select them all.
- ParseHub is now extracting the product name and URL for each product on the first page of results for each keyword.
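If your keyword list lives in a script or spreadsheet, the “Starting Value” input from the steps above could be generated rather than typed by hand. The following is a minimal sketch, not an official ParseHub tool: the sample keywords are made up, and the JSON shape (a list stored under a “keywords” key) is an assumption chosen to match the “keywords” header used in this guide.

```python
import csv
import io
import json


def keywords_to_csv(keywords):
    """Return CSV text with a 'keywords' header cell and one keyword per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["keywords"])  # header cell, as described in the steps above
    for keyword in keywords:
        writer.writerow([keyword])
    return buffer.getvalue()


def keywords_to_json(keywords):
    """Return JSON text with the list under a 'keywords' key (assumed shape)."""
    return json.dumps({"keywords": list(keywords)}, indent=2)


if __name__ == "__main__":
    sample = ["barbecue", "tent", "hiking boots"]  # hypothetical keywords
    print(keywords_to_csv(sample))   # save as a .csv file to upload
    print(keywords_to_json(sample))  # or paste into the text box
```

Using the `csv` module instead of joining strings by hand means keywords that contain commas or quotes are escaped correctly.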
Do you want to extract more data from each product? Check out our guide on how to scrape product data from Amazon including prices, ASIN codes and more.
Do you want to extract more pages worth of data? Check out our guide on how to add pagination to your project and extract data from more than one page of results.
Running your Scrape
It is now time to run your project and export the results.
To do this, click on the green “Get Data” button on the left sidebar.
Here you will be able to test, run or schedule your scrape job. In this case, we will run it right away.
ParseHub is now off to extract all the data you’ve selected. Once your scrape is complete, you will be able to download it as a CSV or JSON file.
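Once you have the CSV file, you may want to check how many results came back for each keyword. The sketch below groups product names by keyword. The column names “keyword” and “name” are hypothetical; open your own export and pass the actual column headers it contains.

```python
import csv
import io
from collections import defaultdict


def group_by_keyword(csv_text, keyword_column="keyword", name_column="name"):
    """Group product names by the keyword that produced them.

    The default column names are placeholders, not ParseHub's real headers.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row[keyword_column]].append(row[name_column])
    return dict(grouped)


if __name__ == "__main__":
    # Made-up sample standing in for a downloaded export.
    sample_export = (
        "keyword,name\n"
        "tent,Example Tent A\n"
        "tent,Example Tent B\n"
        "stove,Example Stove\n"
    )
    for keyword, names in group_by_keyword(sample_export).items():
        print(keyword, len(names))
```

To use it on a real file, read the file's contents with `open(path, newline="", encoding="utf-8").read()` and pass the text to `group_by_keyword`.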
You now know how to search through a list of keywords and extract data from each result.
What website will you scrape first?