News websites are full of valuable data.
This kind of data can be used for sentiment analysis, financial analysis and much more.
As a result, you might want to scrape data from a news website and export it to an Excel spreadsheet for further analysis.
Using a web scraper makes this an easy task to complete.
Free and Easy Web Scraping
For this project, we will use ParseHub, a free and powerful web scraper that can extract data from any website. Make sure to download and install ParseHub for free.
A web scraper lets you render the website you’re looking to scrape and click on the data you want to extract. The scraper then automates the process and exports the data to an Excel spreadsheet.
For this example, we will scrape the news feed page for Newsweek.
Web Scraping a news site
It’s time to get our web scraping project started. If you haven’t installed ParseHub yet, do so before continuing.
- Open ParseHub and click on “New Project”. Enter the URL you want to scrape; in this case, the URL of Newsweek’s news feed page. ParseHub will now render the website inside the app.
- Start by clicking on the title of the first news article on the page. It will be highlighted in green to indicate that it’s been selected.
- The rest of the headlines on the page will be highlighted in yellow. Click on the second one on the page to select them all. They will all now be highlighted in green. In the left sidebar, rename your selection to headline.
- Click on the PLUS (+) sign next to your headline selection and choose the “relative select” command.
- Using the Relative Select command, click on the headline of the first article and then on the category above it. An arrow will appear to show the association you’re creating. Rename your selection to category.
- Repeat the previous two steps to also add the article’s byline, renaming that selection to byline.
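For readers curious what a relative select captures, the sketch below does the same grouping by hand with Python’s standard-library `html.parser`: each headline is paired with the category and byline that sit inside the same article block. The markup, the `category`/`byline` class names and the `ArticleParser` class are hypothetical; Newsweek’s real HTML will differ, and this is only a conceptual analogue, not how ParseHub works internally.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; a real news page will differ.
SAMPLE_HTML = """
<article>
  <span class="category">Politics</span>
  <h3><a href="/story-1">First headline</a></h3>
  <span class="byline">By Jane Doe</span>
</article>
<article>
  <span class="category">Tech</span>
  <h3><a href="/story-2">Second headline</a></h3>
  <span class="byline">By John Roe</span>
</article>
"""


class ArticleParser(HTMLParser):
    """Groups headline, category and byline per <article> block,
    roughly the association a relative select records."""

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per completed <article>
        self.current = None  # dict for the <article> being parsed
        self.field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self.current = {}
        elif self.current is not None:
            if tag == "h3":
                self.field = "headline"
            elif tag == "span" and attrs.get("class") in ("category", "byline"):
                self.field = attrs["class"]

    def handle_endtag(self, tag):
        if tag == "article" and self.current is not None:
            self.records.append(self.current)
            self.current = None
        elif tag in ("h3", "span"):
            self.field = None

    def handle_data(self, data):
        # Ignore whitespace between tags; store text only inside a tracked field.
        if self.current is not None and self.field and data.strip():
            self.current[self.field] = data.strip()


parser = ArticleParser()
parser.feed(SAMPLE_HTML)
for record in parser.records:
    print(record)
```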
Want to learn how to scrape more data? Check out our in-depth guide on how to scrape data from any website.
ParseHub is now pulling the data you’ve selected from the first page of news articles. We will now tell ParseHub to scrape additional pages of articles.
- Click on the PLUS (+) sign next to your page selection and choose the “Select” command.
- Scroll all the way down to the bottom of the page and click on the “next page” button. Rename your selection to next.
- Use the icon next to your next selection to expand it.
- Now delete both extractions under this command.
- Click on the PLUS (+) sign next to your next selection and choose the “Click” command.
- A pop-up will appear asking whether this is a “next page” link. Click on “Yes” and enter the number of times you’d like to repeat this process; in this case, we will repeat it 5 more times.
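The next-page setup amounts to a bounded loop: scrape the current page, follow the next link, and stop after the chosen number of repeats or when no next link exists. The Python sketch below models that logic against a hypothetical in-memory site (`PAGES`); the page URLs and the `scrape_with_pagination` helper are illustrative assumptions, and it assumes “repeat it 5 more times” means five pages after the first.

```python
# Hypothetical in-memory "site": 8 pages, each with two headlines and a next link.
PAGES = {
    f"/news?page={n}": {
        "headlines": [f"Story {n}-1", f"Story {n}-2"],
        "next": f"/news?page={n + 1}" if n < 8 else None,  # last page has no next link
    }
    for n in range(1, 9)
}


def scrape_with_pagination(start_url, pages, max_next_clicks):
    """Scrape start_url, then follow the 'next' link up to max_next_clicks times."""
    results = []
    url = start_url
    clicks = 0
    while url is not None:
        page = pages[url]
        results.extend(page["headlines"])
        if clicks == max_next_clicks:
            break  # reached the requested number of repeats
        url = page.get("next")  # None ends the loop on the last page
        clicks += 1
    return results


# First page plus 5 more pages = 6 pages, 12 headlines.
headlines = scrape_with_pagination("/news?page=1", PAGES, 5)
print(len(headlines), headlines[-1])
```

Capping the clicks mirrors the repeat count you enter in the pop-up, while the `None` check covers sites that run out of pages sooner.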
Running your scrape
It is now time to run our scraping project. To do this, click on the green “Get Data” button on the left sidebar.
Here, you will be able to test, run or schedule your project. In this case, we will run it right away.
ParseHub will now collect the data you’ve requested from the website. Once the scrape is complete, you will be notified.
Note: Keep in mind that some news websites block IP addresses they detect scraping. To get around this, you might need to turn on IP Rotation in ParseHub.
Once your run is complete, you will be able to download the data as a CSV/Excel sheet or a JSON file.
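As one example of the further analysis mentioned at the start, the Python sketch below counts articles per category in an exported CSV using only the standard library. It assumes columns named `headline`, `category` and `byline` after the selections made earlier; ParseHub’s actual column headers may differ, so check the first row of your own file and adjust the key in the hypothetical `articles_per_category` helper. The sample rows are made up.

```python
import csv
import io
from collections import Counter

# Made-up sample export; the column names are an assumption, check your own file.
SAMPLE_CSV = """headline,category,byline
First headline,Politics,By Jane Doe
Second headline,Tech,By John Roe
Third headline,Politics,By Jane Doe
"""


def articles_per_category(csv_file):
    """Count how many scraped articles fall into each category."""
    reader = csv.DictReader(csv_file)
    return Counter(row["category"].strip() for row in reader)


counts = articles_per_category(io.StringIO(SAMPLE_CSV))
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

To read a real export instead of the sample, open the file with `open("your_export.csv", newline="", encoding="utf-8")` and pass the file object to `articles_per_category`; the filename here is a placeholder.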
You now know how to scrape data from any news website. If you run into any issues while setting up your project, reach out to us via the live chat on our site and we’ll be happy to assist you.