Data scraping is a method of extracting large amounts of data from websites. To accomplish this, data scraping programs employ automated “bots” or “spiders” that harvest the information contained on your website and convert that information to their own use, commercial or otherwise.
Surprisingly, according to a report issued in December 2013 by Incapsula (a Web security company), bots account for 61.5 percent of all website traffic. While not all bot traffic is bad (think Google search bots), the Incapsula report claims that nearly 30 percent of all bot traffic is malicious in nature, with five percent of all traffic coming from data-scraping software.
The problem with data scraping
According to the Incapsula report, the damage caused by scraping bots includes:
• Website content theft and duplication
• Theft of email addresses for spam purposes
• Reverse engineering of pricing and business models
These activities in turn could lead to increased bandwidth usage or network problems and could ultimately lead to complaints by legitimate users of the site.
How to prevent data scraping
Unfortunately, preventing data scraping is difficult because the technology generally mimics the behavior of a real person. There are, however, tools that help website owners combat improper scraping of their sites. Services such as Distil Networks are being developed to track and block those bots and spiders.
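To give a rough sense of how automated traffic can be told apart from human visitors, here is a minimal sketch of one common heuristic: counting how many requests each client makes within a short time window. It is an illustration only and does not describe how Distil Networks or any other commercial product works. The class name SlidingWindowLimiter, its parameters and the sample IP address are all hypothetical, and a request-rate check on its own will miss scrapers that deliberately crawl slowly.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: flags clients that send too many requests too quickly."""

    def __init__(self, max_requests=60, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client identifier (for example an IP address) to the
        # timestamps of its recent accepted requests.
        self._hits = defaultdict(deque)

    def allow(self, client_id, now=None):
        """Record one request; return False if the client is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # Looks automated: block or challenge this request.
        hits.append(now)
        return True


# Example: at most 3 requests per 10 seconds from one (made-up) address.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 1.0, 2.0, 3.0, 12.0)]
print(results)  # [True, True, True, False, True]
```

In this example the fourth request arrives while three earlier ones are still inside the 10-second window, so it is refused; by the fifth request the earlier ones have expired and access is allowed again. A real deployment would likely combine a check like this with other signals and would log refused requests so that a persistent scraper can be identified later.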
You should also know that numerous legal causes of action can support a lawsuit against a party that illegally scrapes data from your site, especially if the scraper’s commercial exploitation of your data causes you financial harm. These include, but are not limited to, copyright infringement, breach of contract and violations of the Computer Fraud and Abuse Act (CFAA).
Causes of action
2. Computer Fraud and Abuse Act. The Computer Fraud and Abuse Act (18 U.S. Code § 1030(a)(2)) imposes criminal and civil liability on “whoever … intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains … information from any protected computer.”
Content on this site or content controlled by the providers is provided solely for your personal, non-commercial use. Such content may not be framed, copied, reproduced, republished, uploaded, posted, transmitted, distributed, scraped, spidered and/or exploited in any way, including by email or other electronic means for commercial use without proper license agreements with Website Owner. Such use of the content is considered theft and will be prosecuted.
Lisa Allen is an associate attorney at the Lotus Law Center, where she specializes in helping small business clients with their legal needs. The Lotus Law Center was founded as a way to make legal services affordable for all sizes of businesses. Focusing on the practice of business and technology law, the Lotus Law Center provides premium personal and professional responses to the legal needs of business clients at an affordable fixed or hourly rate. This article is not meant as legal advice and is for general information purposes only. Consult your attorney for your specific needs. Contact her at Lisa@lotuslawcenter.com.