The Big Data movement taking hold in Financial Services is based in large part on the aggregation of data from different sources, often using web scraping tools such as bots or spiders. And while it's easy to see the web as an open source of information for the taking, it's worth remembering that it really isn't. So before finding yourself on the wrong end of a cease and desist letter or court order, take a good look at the terms of service.
These words of warning come from The Practicing Law Institute's lecture, "Everyone's Doing It, But Is It Legal? Web Scraping & Online Data Harvesting," as part of the Social Media 2014: Addressing Corporate Risks seminar.
Some Pointers
Are you scraping data from other sites? Interested in scraping? Or worried about being scraped? Cendali and Anthony Dreyer, Esq., at Skadden, Arps, Slate, Meagher & Flom LLP, offer up this bit of advice:
If you are interested in scraping:
- Consider what information you need to crawl, and how you intend to use it (is it copyrighted?)
- Consider how often you need to crawl (repeated crawling can weigh on a site's servers, potentially triggering liability)
- Consider what others in the same industry are doing
- Respect Robots.txt files (Robots.txt is a text file that website owners can place in the website's hierarchy to instruct automated software not to crawl the site)
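As a rough illustration of the last two pointers, the sketch below uses Python's standard `urllib.robotparser` module to check whether a URL may be crawled and how long to wait between requests. The Robots.txt content, the crawler name `example-research-bot`, the `example.com` URLs and the five-second fallback delay are all assumptions made up for this example. A real crawler would download the file from the target site, and passing this check says nothing about what the site's terms of service allow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; a real crawler
# would download this from the target site's /robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

USER_AGENT = "example-research-bot"  # hypothetical crawler name


def load_rules(robots_txt):
    """Parse robots.txt text into a rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetch(parser, url, default_delay=5.0):
    """Return (allowed, delay_seconds) for a URL.

    default_delay is an assumed fallback used when the site does not
    publish a Crawl-delay; it is not a legal or industry standard.
    """
    allowed = parser.can_fetch(USER_AGENT, url)
    delay = parser.crawl_delay(USER_AGENT)
    return allowed, float(delay) if delay is not None else default_delay


rules = load_rules(EXAMPLE_ROBOTS_TXT)
public_page = plan_fetch(rules, "https://example.com/quotes/today")
private_page = plan_fetch(rules, "https://example.com/private/report")

print(public_page)   # allowed flag and seconds to wait before the next request
print(private_page)
```

A crawler built along these lines would skip any URL whose first value is `False` and sleep for the returned number of seconds between requests, which speaks to the concern about weighing on a site's servers.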
If you are concerned about being scraped:
- Use Robots.txt
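For site owners, a minimal sketch of what using Robots.txt can look like: the Python below renders a Robots.txt file from a small policy table and then checks the result with the standard `urllib.robotparser` module. The bot name `hypothetical-price-bot`, the paths and the `render_robots_txt` helper are invented for this example. Keep in mind that Robots.txt is a request that well-behaved crawlers honor, not a technical barrier.

```python
from urllib.robotparser import RobotFileParser


def render_robots_txt(policy):
    """Render {user_agent: [disallowed_path, ...]} as robots.txt text."""
    blocks = []
    for agent, paths in policy.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        blocks.append("\n".join(lines))
    # Blank lines separate the per-agent blocks.
    return "\n\n".join(blocks) + "\n"


# Hypothetical policy: shut one named bot out of the whole site and keep
# every other crawler away from two sensitive sections.
policy = {
    "hypothetical-price-bot": ["/"],
    "*": ["/account/", "/pricing/"],
}
robots_txt = render_robots_txt(policy)
print(robots_txt)

# Sanity-check the file the way a compliant crawler would read it.
checker = RobotFileParser()
checker.parse(robots_txt.splitlines())
blocked_bot = checker.can_fetch("hypothetical-price-bot", "https://example.com/")
other_bot_pricing = checker.can_fetch("some-other-bot", "https://example.com/pricing/list")
other_bot_about = checker.can_fetch("some-other-bot", "https://example.com/about")
```

The generated text would be served at the top level of the site as `/robots.txt`.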
3 Kinds of Contract Agreements for Websites
Perhaps the best defense, for the scraper or the scrapee, is the Terms of Service. If the site's user agreement does not explicitly state that scraping is prohibited, then the scraper may have better legal ground to stand on in court. Naturally, an ironclad terms of service (coupled with cease and desist orders) helps protect the site being scraped.
Of course, things aren't always that simple. According to Dreyer, court cases in this arena have demonstrated that how the contract agreement is displayed is also an important element in a defense. There are three kinds of online agreements for websites:
- Click-wrap: These require users to consent to terms and conditions by clicking that pesky "I Agree" or "I Accept" button before they can proceed to use a website. These are generally considered enforceable because the click is a clear, affirmative act of assent. Courts acknowledge that users don't really read the terms of agreement, but users who skip them do so at their own risk.
- Browse-wrap: This is a link to the terms and conditions posted on a website for users to click on if interested; clicking it is not required to use the site. It is usually found at the very bottom of a webpage on a toolbar. In this case user consent is implied by continued use of the site. However, the visibility and accessibility of the link play an extremely significant role in court.
Cendali adds that while marketers are often at battle with legal over the size and prominence of terms of service, a company's best defense is to make sure all the terms of service are prominent. Use all the terms, like scraping, crawling, spidering, data harvesting, etc., and don't feel bad about bothering viewers with bigger notices.
Case in point: after a few interesting court cases of its own over data scraping, Ticketmaster now has what can be considered a very clear, very bold and all-capitalized browse-wrap link at the bottom of the webpage. "People may not like it on their site but you find creative ways to show it," offers Cendali.