There are several ways of keeping track of the stock market.
From finance websites to modern investing apps.
However, you might want to access information on stocks in a format that is more convenient than a website and that gives you more details than an investing app. After all, the more information you have about the stocks you are investing in, the better investment decisions you can make.
In this case, we will go over the process of pulling information from Yahoo Finance into an excel spreadsheet.
Yahoo Finance and Web Scraping
Using a web scraper, you will be able to choose a specific set of stocks from Yahoo Finance and extract the exact information you’d need from each stock. For this example, we will extract information from the stocks in the Berkshire Hathaway Portfolio.
To complete this task, we will use ParseHub, an incredibly powerful and free web scraping tool.
- Make sure to download and open ParseHub. Then, click on New Project.
- Once you’ve created your new project, submit the URL you’d like to scrape. In this case, we have selected the Yahoo Finance page that keeps track of the Berkshire Hathaway Portfolio.
Scraping Yahoo Finance
- Once you submit the URL for your project, ParseHub will render the webpage. You will now be able to select the first element you’d like to extract.
- Scroll down to the list of stocks and click on the first stock symbol on the list. In this case, that’s AMZN. It will be highlighted in green to indicate that it has been selected.
- On the left sidebar, rename the selection to stock. ParseHub is now pulling the Symbol and details URL for this stock.
- Now, we will select the rest of the stocks in the list which are highlighted in yellow. Click on the second symbol on the list to select them all. They will all now be highlighted in green.
- We will now ask ParseHub to also pull the current date so we know what day the information belongs to. To do this, click on the PLUS(+) sign next to your stock selection.
- Click on Advanced to choose the Extract command. On the new extract command, use the dropdown and choose “Today’s Date”. Then rename your extraction to date.
- Repeat Step 5 to choose the Relative Select command.
- Using the Relative Select command, click on the first symbol on the list and then on the next field next to it. An arrow will appear to show the association. Then rename your selection to company.
- Repeat steps 7-8 to select the rest of the data fields, including Last Price, Change, Change Percentage, Market Time, Volume, Average Volume and Market Cap. Rename all your selections accordingly. Your project should now look like this:
Scraping the Stock Details Page
In the Yahoo Finance site, if you click through the details page for a specific stock, you can access additional details that are not present in the page we just scraped.
As a result, we will now tell ParseHub to click on each stock on the list we just scraped and pull additional data from these pages.
- Next to the stock selection, click on the PLUS(+) symbol and select the “click” command.
- A Click Setup screen will pop up, asking if this is a “next page” button. Click “No” and choose “Create New Template”. Name your new template details_page and click on “Create New Template”.
- ParseHub will now display the details page for the first stock on the list and create a Select command.
- Start by selecting the first label on the details table. In this case, it’s “Previous Close”. It will be highlighted in green to indicate that it has been selected.
- The rest of the labels on the table will be highlighted in yellow. Click on the rest of the labels to select them all, they will now be highlighted in green. Rename your selection to labels.
- Use the icon next to the labels selection to expand it.
- Delete the “begin new entry in labels” command.
- Now click on the PLUS(+) sign next to the labels selection and use the Conditional command. This will allow us to select which labels we want to pull information from.
- For our first Conditional command, we will use the following expression: $e.text.contains(“Previous Close”)
- Next, click on the PLUS(+) sign next to the conditional command to add a Relative Select command. Now use this command to click on the Previous Close label and then on the value beside it.
- Now ParseHub will extract the stock’s Previous Close into its own column. You can now copy-paste the conditional command we’ve created to pull additional labels. Just make sure to edit the conditional expression for each. For example, the expression for Open will be: $e.text.contains("Open”).
- Lastly, using drag and drop, make sure your selections are aligned correctly and not nested within each other. Your final template should look something like this:
Running and Scheduling your Scrape Job
Your scrape job is now complete. However, you might not want to extract the data right in the middle of the day.
You might be more interested in pulling data right before the stock market opens. We will now ask ParseHub to run our scrape job daily at 9AM EST, before the markets open.
Pro-Tip: Project Scheduling is a paid ParseHub feature.
To do this, on the left sidebar click on “Get Data”. Then click on the Schedule button next to the Run button.
On this menu, enter the time schedule the time you’d like to run the scrape at and click on “Save and Schedule”.
Exporting this kind of financial data on a schedule can be quite valuable.
But you might be interested in taking it to the next level.For example, you can setup your scrape job to output to a Google Spreadsheet to create an online document with all the data you have extracted.