Yelp is one of the largest business directory websites on the internet.
With over 90 million monthly visitors across its website and mobile app, Yelp receives valuable new information from users and businesses every day.
But how do you accurately and quickly collect this information in a useful format? After all, Yelp does not have a simple “export” feature to collect all the business information you might need.
Web scraping is the answer.
Web Scraping and ParseHub
Web scraping allows you to easily select any content on a webpage and extract it into a spreadsheet or API. This way, you can generate massive lists of high-quality leads in minutes.
If you’d like to learn more about web scraping, read our guide on what web scraping is and what it’s usually used for.
In order to quickly scrape Yelp data, we will use ParseHub, a free and powerful web scraper with a suite of incredibly useful features. Make sure to download it for free before starting your web scraping project.
Scraping Yelp Data
Alright, so let’s get scraping. For this example, let’s assume that we are a distributor of disposable coffee cups in Toronto. As a result, we are interested in building a list of coffee shops in Toronto with their phone number, address and other details.
- First, we find the URL for Yelp’s result page for the keyword “coffee shop”.
- Next, make sure to download, install and open ParseHub to set up our scraping project.
- In ParseHub, click on New Project and enter the URL we’ve selected. The webpage will now be rendered inside the app.
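If you need result pages for several keywords or cities, the search URL from the first step can also be built in a short script. This is a minimal sketch that assumes Yelp's search page takes the keyword in a `find_desc` parameter and the location in a `find_loc` parameter; compare the output with the URL shown in your own browser before relying on it. The helper name `build_search_url` is made up for this example.

```python
from urllib.parse import urlencode

# Assumption: Yelp's search page reads the keyword from `find_desc`
# and the location from `find_loc`. Verify this in your browser.
BASE_URL = "https://www.yelp.com/search"


def build_search_url(keyword: str, location: str) -> str:
    """Return a search results URL for a keyword and a location."""
    query = urlencode({"find_desc": keyword, "find_loc": location})
    return f"{BASE_URL}?{query}"


url = build_search_url("coffee shop", "Toronto, ON")
print(url)
```

The printed URL is what you would paste into ParseHub's New Project field.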
Scraping Business Contact Information
- After the page is rendered, you will be able to make your first selection. Click on the first business name to select. It will then turn green to indicate it has been selected.
- The rest of the business names will then turn yellow. Click on the next business name to select all of them. They should all be green now.
- Now on the left sidebar, rename your selection to business.
- Next, click on the PLUS(+) sign next to the business selection and choose the Relative Select command. Then click on the first business name and then on the phone number next to it (an arrow will appear connecting the two).
- Repeat the previous step to also scrape the business address and neighbourhood.
Scraping Rating and Reviews
Scraping rating scores and review numbers from Yelp will require some advanced ParseHub knowledge. Yelp’s site is coded in a way that might make a simpler web scraper not work.
Luckily, ParseHub can easily tackle this and we will make it a snap by walking you through the process.
- First, we will once again use Relative Select. We click on the business name first, and then on the star rating itself. (You might notice that the star ratings won't be highlighted when hovering over them. That's OK: you can still click on this element and extract data.)
- Feel free to rename the selection to rating.
- You will notice that by default ParseHub does not extract any data. So we will go into the extract command settings on the left sidebar.
- Here, we will use the extract command and choose “aria-label Attribute”. This will now update your project with the correct information.
- Lastly, use Relative Select one last time to also scrape the number of reviews for each business.
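To see why the rating has to come from an attribute rather than from visible text, here is a small Python sketch of the same idea outside ParseHub. The HTML snippet is a made-up stand-in, not Yelp's real markup, and the label format ("4.5 star rating") is an assumption; the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class AriaLabelCollector(HTMLParser):
    """Collect every aria-label attribute value found in the markup."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "aria-label" and value:
                self.labels.append(value)


def rating_from_label(label):
    """Pull the leading number out of a label such as '4.5 star rating'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", label)
    return float(match.group(1)) if match else None


# Hypothetical snippet: the rating element has no text, only an attribute.
sample_html = (
    '<div role="img" aria-label="4.5 star rating"></div>'
    "<span>128 reviews</span>"
)

collector = AriaLabelCollector()
collector.feed(sample_html)
ratings = [rating_from_label(label) for label in collector.labels]
print(ratings)  # [4.5]
```

The star element contains no text of its own, which is why a text-only extraction comes back empty while the attribute holds the score.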
Pro Tip: Reviews are important! A business with lots of reviews and a high rating is more likely to stay in business for a long time and become a long-term quality customer.
Dealing with Pagination
ParseHub is now ready to scrape the entire first page of results for your keyword. Next, we will instruct it to scrape the next couple of pages of results.
- On the left sidebar, click on the PLUS(+) sign on the page selection. Then use the select command.
- With the select command chosen, click on the “Next” link at the bottom of the Yelp page.
- By default, ParseHub will extract the link text and URL. We will use the icon next to this selection and remove these 2 items. Feel free to rename the selection to next.
- Use the PLUS(+) sign next to the next selection and choose the click command.
- A pop-up will appear asking if this is a “Next” button. Click “Yes” and enter the number of times you’d like to click this button. For now, we’ll do 5 in order to scrape the first 6 pages of results.
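The click count and the page count differ by one because the first page is scraped before any click happens. The sketch below illustrates that logic with a made-up 10-page site; it is not how ParseHub works internally, and `scrape_pages` and `fake_next` are hypothetical names.

```python
def scrape_pages(first_page, get_next, max_clicks):
    """Visit the first page, then follow "Next" at most max_clicks times."""
    visited = [first_page]
    current = first_page
    for _ in range(max_clicks):
        current = get_next(current)
        if current is None:  # no "Next" link: last page of results
            break
        visited.append(current)
    return visited


# Hypothetical stand-in for a results site with 10 numbered pages.
def fake_next(page, last_page=10):
    return page + 1 if page < last_page else None


pages = scrape_pages(first_page=1, get_next=fake_next, max_clicks=5)
print(pages)  # [1, 2, 3, 4, 5, 6]
```

Five clicks give six pages. A larger click count simply stops when no "Next" link is left.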
Running your scrape
Now that you’re done setting up your scraping project, it’s time to run it!
Use the Get Data button on the left sidebar and click on Run to start your scraping job. Depending on how many pages you’ve chosen, the time for it to be completed will vary. In our case, the 6 pages were scraped in about a minute.
Pro Tip: For longer, more time-consuming jobs, always do a Test Run first. You don't want to wait for a long job to finish only to find that the file is not formatted the way you want.
Once your scrape is completed, you will be able to download it as an Excel sheet or a JSON file.
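If you download the JSON file, a few lines of Python can turn it into a filtered lead list. This sketch assumes the export is an object with a `business` list (matching the selection name used above) and that each entry has `name`, `phone`, `rating` and `reviews` fields. Those field names, the sample records and the thresholds are all hypothetical, so adjust them to match your own project.

```python
import csv
import io
import json

# Hypothetical export: real field names depend on how you named your selections.
sample_export = """
{"business": [
  {"name": "Cafe A", "phone": "(416) 555-0101",
   "rating": "4.5 star rating", "reviews": "128"},
  {"name": "Cafe B", "phone": "(416) 555-0102",
   "rating": "3 star rating", "reviews": "12"}
]}
"""


def load_leads(raw_json, min_rating=4.0, min_reviews=50):
    """Keep businesses that meet the rating and review-count thresholds."""
    data = json.loads(raw_json)
    leads = []
    for row in data.get("business", []):
        rating = float(row["rating"].split()[0])   # "4.5 star rating" -> 4.5
        reviews = int(row["reviews"].split()[0])   # "128" -> 128
        if rating >= min_rating and reviews >= min_reviews:
            leads.append({"name": row["name"], "phone": row["phone"],
                          "rating": rating, "reviews": reviews})
    return leads


def to_csv(leads):
    """Serialize the leads to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "phone", "rating", "reviews"])
    writer.writeheader()
    writer.writerows(leads)
    return buffer.getvalue()


leads = load_leads(sample_export)
print(to_csv(leads))
```

To use a real export, read the downloaded file into a string and pass it to `load_leads` in place of `sample_export`.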
Lead Generation and Web Scraping on Yelp
Lead generation with web scraping unlocks a new world of potential for your business. Gone are the hours spent putting information together and the money spent on lists from lead generation companies.
High-quality lead lists are now just a click away.
Or maybe you’ll use this data for an API and build your next awesome project with it.
If you’ve built something awesome with the help of ParseHub, let us know over on Twitter.