Is web scraping legal?
This question is asked a lot. In fact, according to Google Trends, searches for the term “web scraping legal” have been on a steady rise over the past 4 years.
This comes as no surprise given the growth of web scraping and the many recent legal cases related to it.
Today, we will go over a few notable legal cases and the insight of a tech lawyer to break down the topic and answer the question of whether web scraping is legal.
Web Scraping Publicly Available Data
First, we have to make a clear distinction about the type of data we are talking about when discussing the legality of web scraping.
Publicly available data is data that can be accessed by anyone with an internet connection, for example a public LinkedIn profile or a Craigslist listing.
How to know if data on the internet is considered publicly available:
- The user who posted said data has decided to make it public.
- A user does not need to create an account or log in to access the data.
- The website’s robots.txt does not block web scrapers or spiders.
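The third criterion above is the only one that can be checked mechanically. As a minimal sketch, assuming Python and its standard-library `urllib.robotparser` module, the check could look like the following. The function name `is_scraping_allowed`, the user agent `example-bot`, and the sample robots.txt contents are all hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_scraping_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: every crawler is blocked from /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.com/listings/123",
        "https://example.com/private/profile",
    ):
        allowed = is_scraping_allowed(SAMPLE_ROBOTS_TXT, "example-bot", url)
        print(url, "allowed" if allowed else "blocked")
```

In practice the robots.txt text would be downloaded from the target site first. A passing check only covers the robots.txt criterion; the other two criteria in the list still have to be judged separately.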
On the other hand, collecting and scraping private data exists in a completely different realm of legality. The most notable example is Cambridge Analytica and its collection of private data from Facebook users.
In this article, we will be referring exclusively to publicly available data.
Notable Web Scraping Legal Cases
Legal cases are some of the best resources when looking at the legality of any activity. We will review 2 recent and notable legal cases surrounding web scraping.
LinkedIn vs hiQ Labs
hiQ Labs is a data analytics firm that focuses on workforce data and people analytics. Their analysis provides insights for their clients about specific industries.
One of the ways that hiQ Labs collected data to fuel their insights was by scraping data from public LinkedIn profiles.
In response, LinkedIn blocked hiQ Labs’ tools from accessing this publicly available data and served the company with a cease and desist letter. LinkedIn argued that hiQ Labs’ activities violated the Computer Fraud and Abuse Act (CFAA).
hiQ fought back by filing a suit and obtaining a preliminary injunction in 2017. The district court found that hiQ was “likely to succeed” on its claims that accessing publicly available data was not a violation of the CFAA.
Computer Fraud and Abuse Act (CFAA)
The Computer Fraud and Abuse Act criminalizes accessing protected computers and servers without authorization, or in a way that exceeds authorized access.
Publicly available data requires no authorization to access in the first place, so there is a disconnect between the CFAA and the automated access of publicly available data.
As a result, the 9th US Circuit Court of Appeals upheld hiQ’s injunction in September 2019.
While this is not a Supreme Court ruling or the creation of a specific law that protects web scraping, it definitely paves the way for a potential future verdict.
Craigslist vs Padmapper and others
In a similar case from 2017, Craigslist filed a suit against a number of startups (including Padmapper) which scraped Craigslist data to support their services.
The trial court declined to toss the case, which worried the defendants. As a result, the case was settled out of court.
Cases like these are now probably less likely following the hiQ Labs vs LinkedIn case.
In his piece, Jason calls for the US Congress or the US Supreme Court to make a decision on the legality of web scraping. He claims this is needed in order to achieve an “open and healthy internet”.
While we are definitely not lawyers, we have a similar take to Jason’s.
If a website or user makes the decision to make their data public, then scraping it should be legal.
We believe that in 20 years, people will be surprised to learn that web scraping existed in a legal grey area in our time.
The legality of web scraping is still relatively up in the air. But that doesn’t mean web scraping is illegal, either.
The hiQ vs LinkedIn case could provide the resolution this issue needs if the US Supreme Court rules on it at some point.