In this blog post, we will teach you how to scrape news headlines, authors and content from DigitalJournal.com or any other news outlet. To get started, download our free web scraper, ParseHub!

Digital Journal has been covering news since 1998 and publishes over 70,000 articles every month. It was founded in Mississauga, Ontario, Canada. In 2001, Digital Journal began publishing magazines, which it distributed all over Canada. It now works with hundreds of large-scale publishers and PR firms. As a large repository of the latest news and trends, it offers many news categories to scrape, including international news, tech, science, social media, business, entertainment, lifestyle and sports.

Let’s start scraping news articles!

Step 1: Scraping Headlines

  1. Begin by opening the ParseHub application and clicking the blue “New Project” button.
  2. Enter the Digital Journal URL you wish to scrape. We will be scraping the latest tech articles from this URL: https://www.digitaljournal.com/tech-science
  3. Once the page loads on ParseHub’s browser, click the first article’s headline to extract it.
  4. The rest of the articles will turn yellow. Click the next one to train the algorithm.
  5. Rename this selection on the left to “article”.

Step 2: Scraping Article Details

  1. Click the PLUS(+) button next to your “article” selection, and choose “Click”.
  2. Choose “No” on the popup. This is not a next page button; we are clicking into each article to extract its contents.
  3. Create a new template with the green “Create New Template” button.
  4. Wait for the article page to load, then click its image to extract it.
  5. Rename this selection on the left to “image”.
  6. Click the PLUS(+) button next to the “page” selection and choose “Select”.
  7. Hover over the article’s text content, press and hold CTRL/CMD+1 until the full div is selected, then click your mouse to extract it.
  8. Rename this extraction to “content” on the left.

Step 3: Begin Digital Journal Scraping

Now that you have extracted the headlines and set up a ParseHub template to extract each article’s inner data, you are ready to begin scraping!

Click the green “Get Data” button on the left to begin. You can choose to Test, Run or Schedule your scrape. We chose “Run” to scrape a single time on ParseHub’s server. Your scrape results should look similar to this:

P.S. You can also scrape HTML data from each article, instead of just text.
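If you download your results as JSON, a short script can flatten them into a spreadsheet-friendly CSV. The Python sketch below is only an illustration under stated assumptions: it assumes the export has a top-level “article” list whose entries carry “name” and “url” for the headline link plus the “image” and “content” fields named in this tutorial. The sample data, the example.com URLs and the helper names flatten_articles and to_csv are hypothetical, so check your own export for the exact keys before relying on it.

```python
import csv
import io
import json

# Hypothetical sample shaped like the selections named in this tutorial.
# The keys and values are assumptions; your real export may differ.
SAMPLE_EXPORT = """
{
  "article": [
    {
      "name": "Example headline one",
      "url": "https://example.com/tech-science/example-one",
      "image": "https://example.com/images/example-one.jpg",
      "content": "First paragraph of the example article."
    },
    {
      "name": "Example headline two",
      "url": "https://example.com/tech-science/example-two",
      "content": "An article that has no image."
    }
  ]
}
"""

# Column order for the CSV output (assumed field names).
FIELDS = ["name", "url", "image", "content"]


def flatten_articles(export_text):
    """Return one dict per article, filling missing fields with ''."""
    data = json.loads(export_text)
    rows = []
    for item in data.get("article", []):
        rows.append({field: item.get(field, "") for field in FIELDS})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_articles(SAMPLE_EXPORT)
csv_text = to_csv(rows)
print(csv_text)
```

To use it on a real export, read the downloaded file's text and pass it to flatten_articles in place of SAMPLE_EXPORT. Missing fields are filled with empty strings so that articles without an image still produce a complete row.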

Need help scraping articles, trends or other websites? Contact our live chat support.

Happy Scraping! 💻