Analyse your Personal Facebook Data with Python

emmy adigun
Published in Analytics Vidhya
5 min read · Dec 17, 2020

image credit: towardsdatascience.com

I have been trying my hand at social media analysis. I started with Twitter, but I had trouble creating a Twitter developer account for API access, so I switched to Facebook and tried to analyze my post count since joining in 2009. Yeah, that’s about 11 years ago. Interesting!! Follow me!!

1. Download your Facebook data

Facebook lets us download a file of all our activity on the site. You can download your data here. You may not need all of the selected/checked items, though: the file can be large, depending on how often and how long you have used Facebook. Mine is 1MB.

We will answer these questions while analyzing our posts:

  • How often do I post?
  • Am I using Facebook more or less than I used to?

Back on the Facebook data download page, we deselect all options and check only posts. We also change the requested file format from HTML to JSON, then hit Create File.

After the file creation, Facebook then notifies you that the file is ready for download. This could take a while depending on your Facebook history.

Download the zip file, unzip it and look for the ‘posts’ folder and a file in it called ‘your_posts_1.json’.

2. Data Cleaning

Now it’s time to get our hands dirty..lool. Fire up Jupyter Notebook or VS Code and start by importing our libraries. We import pandas and then read the Facebook JSON file we just downloaded into a DataFrame.

The built-in pd.read_json() function can interpret our JSON data as a DataFrame automatically.
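
Here is a minimal sketch of this step. Since your own export isn't available here, the code first writes a tiny made-up file that loosely imitates a Facebook posts export (the sample records and the temporary path are assumptions). With real data you would pass the path to your own ‘your_posts_1.json’ instead.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical records loosely imitating the layout of a posts export
sample_posts = [
    {"timestamp": 1260000000, "data": [{"post": "Hello Facebook"}], "title": "First post"},
    {"timestamp": 1300000000, "attachments": [{"data": []}], "title": "Photo only"},
]

# Stand-in for the real 'posts/your_posts_1.json' from the unzipped download
path = os.path.join(tempfile.mkdtemp(), "your_posts_1.json")
with open(path, "w") as f:
    json.dump(sample_posts, f)

# Read the JSON file straight into a DataFrame
df = pd.read_json(path)
print(df.columns.tolist())
print(len(df))
```

Fields missing from a record (such as ‘data’ on an image-only post) show up as NaN in the DataFrame.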

Our focus here isn’t on the actual post content or attached media files. We are concerned with frequency — how often new posts are being made.

Next we clean our data by dropping some of the unnecessary columns. Some rows say NaN: these were posts that included only images, no text. The images would be linked in the ‘attachments’ column. We will also rename the ‘timestamp’ column to ‘date’.
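
A rough sketch of the cleaning step on a made-up frame. Which columns count as unnecessary is a judgment call; dropping ‘attachments’ and ‘title’ here is an assumption, and your own export may contain other columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the kind of columns the export produces
df = pd.DataFrame({
    "timestamp": pd.to_datetime([1260000000, 1300000000], unit="s"),
    "attachments": [np.nan, [{"data": []}]],
    "data": [[{"post": "Hello Facebook"}], np.nan],
    "title": ["First post", "Photo only"],
})

# Drop columns the frequency analysis does not need (assumed choice)
df = df.drop(columns=["attachments", "title"])

# Rename 'timestamp' to 'date'
df = df.rename(columns={"timestamp": "date"})
print(df.columns.tolist())
```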

At this stage, we have a little cleaner data to work with. We can get some information about the data we want to analyze.

df.info() gives you a short summary of the data.

df.shape returns the number of rows and columns; the row count is the number of posts we are analysing.

df.head() gets you the first 5 rows of the data.

df.tail() shows the last 5 rows of the DataFrame.
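
To see the four calls side by side, here is a small sketch on a made-up frame of seven posts (the dates and text are placeholders, not real data):

```python
import pandas as pd

# Hypothetical cleaned frame: seven posts with made-up dates and text
df = pd.DataFrame({
    "date": pd.date_range("2009-10-01", periods=7, freq="D"),
    "data": [f"post {i}" for i in range(7)],
})

df.info()          # prints column names, dtypes and non-null counts
print(df.shape)    # (rows, columns); rows is the number of posts
print(df.head())   # first 5 rows
print(df.tail())   # last 5 rows
```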

In the case of my personal Facebook data, I have posted 248 times in total…Waow!! My data begins back in 2009.

3. Figuring out Monthly Post Count

Looking at our data month by month makes the most sense. This means we need to group our ‘date’ column by month and count how many rows (i.e. posts) are associated with each month.

This is an example of time-series data and pandas is designed to make it relatively simple to work with. We need to do two things here:

  1. Set the date column as the index of our DataFrame
  2. Resample the data by month, counting how many posts occur in each month

For the first step, we can use set_index(). For the second step, we’ll do the following:

  1. Select a column to resample: the ‘data’ column (the dates are now in the index)
  2. Use the .resample() function with the argument ‘MS’ (for “Month Start”) to resample our data by month
  3. Use .size() to specify what we want to measure each month: in this case, the number of rows (i.e. posts) with a post date that falls within that month.
  4. Assign the resulting series to a variable called post_counts.
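
The steps above can be sketched as follows. The five post dates are made up for illustration; November 2009 is deliberately left without a post.

```python
import pandas as pd

# Hypothetical sample: five posts spread over four calendar months
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2009-10-05", "2009-10-20", "2009-12-01", "2010-01-15", "2010-01-30",
    ]),
    "data": ["a", "b", "c", "d", "e"],
})

# Step 1: make the dates the index
df = df.set_index("date")

# Step 2: resample by month start and count the rows in each month
post_counts = df["data"].resample("MS").size()
print(post_counts)
```

In this sample, the month without posts appears in post_counts with a count of 0.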

We’ve got our post counts broken down by month, and if we check them against the original data set, we can see the counts are correct.

Note that months with no posts have been correctly counted as 0 rather than simply skipped.

4. Visualize your Facebook Usage

Now it’s time to visualize our data. To do this we’ll import matplotlib (and use the %matplotlib inline magic so our chart appears in the Jupyter Notebook). We’ll also import Seaborn and NumPy, which will help us make a more readable chart.

Next, we’ll use sns.set() to set the size and font size of our chart.

Then, we’ll set the x labels to use the index of post_counts (the dates), and use sns.barplot() to create a bar chart. In the arguments for sns.barplot(), we’ll tell the function to use the x labels we defined, to plot the data in post_counts, and to make the bar color blue.

That alone would be enough to create a basic chart, but in this case, we’ll want to take a few additional steps to make the chart readable. Specifically, we’ll want to space the tick positions on the x-axis 24 months apart, so that we see a tick every other year in the resulting chart. We’ll also want to reformat the dates in the chart so that only the year is displayed.
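
Putting the charting steps together, a sketch might look like this. The monthly counts are made up (60 months starting January 2009), and the figure size, font scale and starting tick position are arbitrary choices for illustration. The dates are passed to sns.barplot() as strings so the bars sit at positions 0, 1, 2 and so on.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in Jupyter use %matplotlib inline instead
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical monthly counts: 60 months starting January 2009
index = pd.date_range("2009-01-01", periods=60, freq="MS")
post_counts = pd.Series(np.arange(60) % 7, index=index)

# Figure size and font scale (assumed values)
sns.set(rc={"figure.figsize": (20, 10)}, font_scale=2)

# One blue bar per month, drawn in index order
x_labels = post_counts.index
ax = sns.barplot(
    x=list(x_labels.strftime("%Y-%m")),
    y=post_counts.to_numpy(),
    color="blue",
)

# One tick every 24 bars (every other year), labelled with the year only
tick_positions = np.arange(0, len(x_labels), step=24)
tick_labels = list(x_labels[tick_positions].strftime("%Y"))
ax.set_xticks(tick_positions)
ax.set_xticklabels(tick_labels)
ax.set_ylabel("post counts")
```

If your data does not start in January, you may want to shift the first tick position so the ticks line up with the start of a year.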

The chart above is my personal Facebook data. I rarely posted in the early days after I joined the platform in 2009. My regular usage began in 2011 and reached peak levels around 2013. I remember I was trying to promote a small mobile app I developed..lool. Between 2014 and 2016 there was a decline and a little spike. I had a good spike in 2017, though not up to the one in 2011. After that my usage died down.
And remember, that’s just posts, not comments! There’s a whole other JSON file for comments.

I will stop here. You can also try analyzing your Facebook usage and see if it’s better than mine. Here is the summary of what we’ve done:

  • We downloaded personal usage data from Facebook (posts)
  • We read the JSON file into a pandas DataFrame
  • We broke the data down by month and counted the number of posts each month
  • We visualized the Facebook usage

Resources

https://www.dataquest.io/blog/analyze-facebook-data-python/
