100 Petabytes Too Big Data? Well, Not Enough for Facebook
How much data is enough? Something like 105 terabytes every 30 minutes, or a massive total of 100 petabytes? If you think that is too much, you are wrong: there is one company for which even this gigantic volume is not enough. None other than our favorite social networking site, Facebook, whose Hadoop cluster currently weighs in at 100 petabytes, has outgrown the limits of Hadoop.
Every day, Facebook receives 2.7 billion Likes, and 2.5 billion content items are shared on the site. It uses Hadoop to power many of its features, such as messaging, to optimize its advertising performance, and to conduct data analysis. Through data analysis on Hadoop, it compares the effectiveness of features or advertisements against each other for specific demographics, and it uses the results to tweak features and improve targeting.
Facebook uses Hive, an open source project it created, which is the most widely used access layer within the company for querying Hadoop using a subset of SQL, and HiPal, the social network's homegrown, closed source end-user tool. It needs both to handle and analyze its gigantic volume of data. While Hive gives Facebook business intelligence, HiPal complements it by enabling data discovery, query authoring, charting, and dashboard creation in graphical form.
So the scenario is this: Facebook, which describes its cluster as the world's largest Hadoop cluster, has reached the upper limit of raw Hadoop capacity.
It has also started the Prism project to overcome one limitation of Hadoop: a Hadoop cluster must confine its data to one physical data center location. Prism adds a logical abstraction layer so that a Hadoop cluster can run across multiple data centers, effectively removing limits on capacity. Facebook says it will open-source Prism soon.
After all, it is certain to expand its database!