How Facebook stores, manages, and manipulates thousands of terabytes of data with high speed and high efficiency

Priyanka Bhide
Mar 12, 2021

Have you ever seen one of the videos on Facebook that shows a “flashback” of posts, likes, or images, like the ones you might see on your birthday or on the anniversary of becoming friends with someone? If so, you have seen an example of Big Data at work.

Every day, we feed Facebook’s data beast with mounds of information. Every 60 seconds, 136,000 photos are uploaded, 510,000 comments are posted, and 293,000 status updates are shared. That is a LOT of data.
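To make these per-minute figures more concrete, the short Python sketch below scales them to a full day. It assumes the rates quoted above hold constant around the clock, which real traffic does not, so treat the results as rough orders of magnitude rather than measured totals.

```python
# Per-minute figures quoted in the article.
PER_MINUTE = {
    "photos uploaded": 136_000,
    "comments posted": 510_000,
    "status updates": 293_000,
}

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day


def daily_totals(per_minute: dict[str, int]) -> dict[str, int]:
    """Scale per-minute counts to one day, assuming a constant rate."""
    return {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}


if __name__ == "__main__":
    for name, total in daily_totals(PER_MINUTE).items():
        print(f"{name}: {total:,} per day")
```

Under that constant-rate assumption, the figures work out to roughly 196 million photos, 734 million comments, and 422 million status updates per day.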

At first, this information may not seem to mean very much. But with data like this, Facebook knows who our friends are, what we look like, where we are, what we are doing, our likes, our dislikes, and so much more. Some researchers even say Facebook has enough data to know us better than our therapists!

Apart from Google, Facebook is probably the only company that possesses such a high level of detailed customer information. The more people use Facebook, the more information it amasses. Facebook invests heavily in its ability to collect, store, and analyze data, but it does not stop there. Beyond analyzing the data users give it directly, Facebook has other ways of determining user behavior:

  1. Tracking cookies: Facebook tracks its users across the web by using tracking cookies. If a user is logged into Facebook and simultaneously browses other websites, Facebook can track the sites they are visiting.
  2. Facial recognition: One of Facebook’s latest investments has been in facial recognition and image processing capabilities. Using the image data that users share, Facebook can track its users across the internet and across other Facebook profiles.
  3. Tag suggestions: Facebook suggests who to tag in user photos through image processing and facial recognition.
  4. Analyzing Likes: A study showed that a range of highly sensitive personal attributes can be predicted accurately just by analyzing a user’s Facebook Likes. Work by researchers at Cambridge University and Microsoft Research shows how patterns of Facebook Likes can accurately predict your sexual orientation, satisfaction with life, intelligence, emotional stability, religion, alcohol and drug use, relationship status, age, gender, race, and political views, among many other attributes.
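The idea behind predicting attributes from Likes can be illustrated with a toy classifier. The sketch below is not the researchers’ method: it fits a plain logistic regression, by batch gradient descent, to a tiny made-up matrix in which each row is a hypothetical user, each column a hypothetical Page, and a 1 means the user liked that Page. The binary label stands in for some sensitive attribute. Real studies work with far larger matrices.

```python
import math

# Hypothetical user-by-Page matrix: 1 = the user liked that Page.
# Pages 0 and 1 tend to be liked by users with the attribute,
# pages 2 and 3 by users without it.
LIKES = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
]
# Hypothetical binary attribute for each user (1 = has it).
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, row):
    """Probability that a user with this Like vector has the attribute."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = predict_proba(w, b, row) - target  # d(loss)/d(logit)
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    w, b = train_logistic(LIKES, LABELS)
    print("weights per Page:", [round(x, 2) for x in w])
    print("P(attribute | likes pages 0 and 1):",
          round(predict_proba(w, b, [1, 1, 0, 0]), 3))
```

The learned weights show which Pages push the prediction up or down; with enough users and Pages, the same principle lets seemingly harmless Likes reveal sensitive traits.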

Facebook Inc. analytics chief Ken Rudin says, “Big Data is crucial to the company’s very being.” He goes on to say, “Facebook relies on a massive installation of Hadoop, a highly scalable open-source framework that uses clusters of low-cost servers to solve problems. Facebook even designs its hardware for this purpose. Hadoop is just one of many Big Data technologies employed at Facebook.”
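Hadoop’s original processing model is MapReduce: a job maps each input record to key-value pairs, the framework groups (shuffles) those pairs by key across the cluster, and a reduce step combines each group into a result. The sketch below simulates those three phases in a single Python process to count likes per post. The event records are hypothetical, and this is not Hadoop’s actual Java API, which spreads each phase across many servers.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical activity log: (user_id, action, post_id)
EVENTS = [
    ("u1", "like", "p1"),
    ("u2", "like", "p1"),
    ("u3", "comment", "p1"),
    ("u1", "like", "p2"),
    ("u4", "like", "p1"),
    ("u2", "comment", "p2"),
]


def map_phase(event):
    """Map: emit (post_id, 1) for every like; ignore other actions."""
    _user, action, post_id = event
    if action == "like":
        yield post_id, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine every value for one key into a single result."""
    return key, sum(values)


def run_job(records, n_mappers=2):
    # Split the input the way a cluster would hand chunks to separate mappers.
    chunks = [records[i::n_mappers] for i in range(n_mappers)]
    mapped = chain.from_iterable(map_phase(r) for chunk in chunks for r in chunk)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(run_job(EVENTS))  # {'p1': 3, 'p2': 1}
```

Because each mapper only sees its own chunk and each reducer only sees one key’s values, the work can be split across many cheap machines, which is what makes the model scale.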

Here is one example that shows how Facebook uses its Big Data.

Example — The Flashback

To mark its 10th anniversary, Facebook offered its users the option of viewing and sharing a video that traces the course of their activity on the social network from the date of registration to the present. Called the “Flashback,” this video is a collection of the photos and posts that received the most comments and likes, set to nostalgic background music.
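The article does not say how Facebook chose the content for each Flashback beyond “most comments and likes.” Assuming a simple engagement score (likes plus comments, weighted equally, which is a hypothetical choice), the selection step could be sketched like this: rank every post by score, keep the top k, then put them back in chronological order so the video runs from registration to the present. The `Post` type, `engagement`, `pick_flashback`, and the sample posts are all illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    posted_on: date
    caption: str
    likes: int
    comments: int


def engagement(post: Post) -> int:
    """Hypothetical score: likes and comments weighted equally."""
    return post.likes + post.comments


def pick_flashback(posts: list[Post], k: int = 3) -> list[Post]:
    """Keep the k most-engaged posts, then order them oldest to newest."""
    top = sorted(posts, key=engagement, reverse=True)[:k]
    return sorted(top, key=lambda p: p.posted_on)


POSTS = [
    Post(date(2010, 5, 1), "Graduation", likes=120, comments=40),
    Post(date(2011, 7, 4), "Road trip", likes=30, comments=5),
    Post(date(2012, 1, 15), "New job", likes=200, comments=80),
    Post(date(2013, 9, 9), "Lunch", likes=10, comments=2),
    Post(date(2013, 11, 30), "Wedding", likes=300, comments=150),
]

if __name__ == "__main__":
    for post in pick_flashback(POSTS):
        print(post.posted_on, post.caption, engagement(post))
```

Sorting twice keeps the two concerns separate: popularity decides what makes the cut, and the date decides the order in which it is shown.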

Other videos have been created since then, including those you can view and share in celebrating a “Friendversary,” the anniversary of two people becoming friends on Facebook. You’ll also be able to see a special video on your birthday.

Two Common Big Data Mistakes

Ken Rudin states that companies that rely on Big Data often owe their frustration to two mistakes:

  1. They rely too much on one technology, such as Hadoop. Facebook does run a massive Hadoop installation and even designs its in-house hardware for it, but Hadoop is only one piece of its toolkit. Mr. Rudin says, “The analytic process at Facebook begins with a 300 petabyte data analysis warehouse. To answer a specific query, data is often pulled out of the warehouse and placed into a table so that it can be studied. The team also built a search engine that indexes data in the warehouse. These are just some of the many technologies that Facebook uses to manage and analyze information.”
  2. Companies use Big Data to answer meaningless questions. Mr. Rudin also says, “At Facebook, a meaningful question is defined as one that leads to an answer that provides a basis for changing behavior. If you can’t imagine how the answer to a question would lead you to change your business practices, the question isn’t worth asking.”
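Mr. Rudin mentions a search engine that indexes data in the warehouse but does not describe how it works. A common data structure for this kind of search is an inverted index, which maps each term to the set of records containing it, so a query only touches matching records instead of scanning everything. The sketch below is a minimal, hypothetical version over a few made-up table descriptions; it answers multi-word queries by intersecting the sets for each term.

```python
import re
from collections import defaultdict

# Hypothetical warehouse tables and their descriptions.
TABLES = {
    "t1": "Daily active users by country",
    "t2": "Photo uploads by country",
    "t3": "Daily photo uploads by device",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


if __name__ == "__main__":
    index = build_index(TABLES)
    print(search(index, "daily photo"))  # ['t3']
    print(search(index, "country"))      # ['t1', 't2']
```

Building the index costs one pass over the data up front; after that, each lookup is proportional to the number of matching records rather than the size of the whole warehouse.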

Thank You!
