LinkedIn Connection Analysis Using Python and Streamlit: Part 1

emmy adigun
Jul 28, 2022
Image Source: LinkedIn, https://www.linkedin.com/company/streamlit/

In this tutorial, we will learn how to extract your personal data from your LinkedIn account, analyze it with Python to derive insights, and finally automate the analysis using Streamlit. Yipee! Sounds interesting, right? Let’s move!

Our focus in this part is to analyze your LinkedIn connections with Python. First things first: if you don’t have a LinkedIn account, you can create one on the LinkedIn sign-up page. LinkedIn gives you access to your data, which you can download and analyze to draw insights.

Downloading your LinkedIn Account data
Check out LinkedIn’s clear guide on how to download your account data. Alternatively, you can follow the process below:

The easiest and fastest way to obtain a copy of your LinkedIn data is to initiate a data download from your Settings & Privacy page.

  1. Click the Me icon (the icon that shows your profile photo) at the top of your LinkedIn homepage.
  2. Select Settings & Privacy from the dropdown.
  3. Click Data Privacy on the left rail.
  4. Under the How LinkedIn uses your data section, click Get a copy of your data.
  5. Select the data that you’re looking for and Request archive.

You can select specific categories of data or a larger download. If you select a specific type of data, you’ll receive an email within minutes. If you select the larger download, you’ll receive an email within 24 hours. Use the link provided in the email to download the information you requested. The data will be available for download for 72 hours.

Select the data you want to analyse

In this tutorial, we will be using the Connections data. Feel free to also analyse anything else that you are curious about.

Import Libraries and Load Dataset

To begin, launch your Jupyter notebook, import the necessary libraries, load the dataset, and print out the first 20 records.
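The loading step might look roughly like the sketch below. The sample rows are made up for illustration and stand in for the exported file so that the snippet runs on its own. The filename `Connections.csv` and the `skiprows` value shown in the comment are assumptions, since the export may start with a few note lines before the header row.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the LinkedIn export.
SAMPLE_CSV = """First Name,Last Name,Company,Position,Connected On
Ada,Obi,Interswitch Group,Software Engineer,07 Jun 2019
Tunde,Bello,Microsoft,Product Designer,07 Jun 2019
Ngozi,Eze,Flutterwave,Software Engineer,12 Jan 2020
"""

# With the real export you would read the file instead, for example:
# df = pd.read_csv("Connections.csv", skiprows=3)  # skiprows is an assumption
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Show the first 20 records (fewer if the data is smaller).
print(df.head(20))
```

If the column names come out wrong with your own file, open the CSV in a text editor and adjust `skiprows` so that pandas starts at the header row.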

The figure above shows the output of my connections to date. Let’s check out the number of connections I have on LinkedIn.
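Counting the connections comes down to counting rows, since each row in the Connections data is one connection. A minimal sketch, using a tiny made-up DataFrame in place of the real export:

```python
import pandas as pd

# Hypothetical stand-in for the loaded Connections data.
df = pd.DataFrame(
    {
        "First Name": ["Ada", "Tunde", "Ngozi"],
        "Company": ["Interswitch Group", "Microsoft", "Flutterwave"],
    }
)

# Each row is one connection, so the row count is the connection count.
total_connections = len(df)
print(f"Total connections: {total_connections:,}")
```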

The number of LinkedIn connections I have stands at 1,214.

Insights
How has my connection activity changed over time?

There are a few spikes in my connection activity: some periods show a sharp rise in new connections (07-June-2019), while in others the activity dropped (12-Jan-2020).
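One way to get a view like this is to parse the ‘Connected On’ column as dates and count the connections per day or per month. The sketch below uses made-up dates, and the `"%d %b %Y"` date format is an assumption about how the export writes dates.

```python
import pandas as pd

# Hypothetical connection dates standing in for the real export.
df = pd.DataFrame(
    {"Connected On": ["07 Jun 2019", "07 Jun 2019", "12 Jan 2020", "15 Jan 2020"]}
)

# Parse the strings into datetimes (the format is an assumption).
df["Connected On"] = pd.to_datetime(df["Connected On"], format="%d %b %Y")

# New connections per day, ordered by date.
daily = df.groupby("Connected On").size().sort_index()

# Monthly totals smooth out the day-to-day noise.
monthly = df.groupby(df["Connected On"].dt.to_period("M")).size()

print(daily)
print(monthly)
```

Either series can then be handed to a line chart to make the spikes visible.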

Next is to find out where our connections are working.

We group the data by the ‘Company’ column using the ‘groupby()’ function and use the ‘count()’ function to count how many of our connections work at each company.

Let’s sort these values in descending order using the sort_values() function with ascending=False. We will sort by the count of the ‘Connected On’ column.
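The grouping and sorting described above can be sketched as follows. The rows are made up for illustration; only the column names come from the Connections data.

```python
import pandas as pd

# Hypothetical sample of the Connections data.
df = pd.DataFrame(
    {
        "First Name": ["Ada", "Tunde", "Ngozi", "Kemi", "Bola", "Uche"],
        "Company": [
            "Interswitch Group",
            "Interswitch Group",
            "Interswitch Group",
            "Microsoft",
            "Microsoft",
            "Flutterwave",
        ],
        "Connected On": ["07 Jun 2019"] * 6,
    }
)

# Count the non-null values per company, then sort by the 'Connected On'
# count so the companies with the most connections come first.
company_counts = (
    df.groupby("Company")
    .count()
    .sort_values(by="Connected On", ascending=False)
)

print(company_counts.head(10))
```

Note that `count()` counts non-null values in each column, so columns with missing entries can give slightly different totals for the same company.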

You can also sort it by any other variable like First Name, Last Name, or Position.

From the figure above, we can see that the largest groups of my connections work at Interswitch Group, Venture Garden Group, Microsoft, Freelancers, Flutterwave, etc.

Let’s visualize our data using Plotly for better insights.

Let’s use a treemap in Plotly to better visualize the company breakdown of our connections.

The treemap gives us a better view. The size of each company box represents the number of connections working at that particular company.

When you plot the treemap, you can hover over the boxes to see each individual company and the number of connections working there.

Which Positions do my connections hold?

Let’s now try to find out which specific positions our connections are occupying.
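A quick way to rank positions is `value_counts()`, which counts and sorts in one step. This is a swapped-in shortcut rather than the groupby pattern used for companies, and the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of the 'Position' column.
df = pd.DataFrame(
    {
        "Position": [
            "Software Engineer",
            "Software Engineer",
            "Software Engineer",
            "Founder",
            "Founder",
            "Product Designer",
        ]
    }
)

# Count how many connections hold each position, most common first.
position_counts = df["Position"].value_counts()

print(position_counts.head(10))
```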

From the above, we can see that most of my connections are Software Engineers, Founders, Product Designers, and Software Developers.
If you look closely, the output appears to break off after the position “Software Developer”, so we can’t see all of the positions. What we can do is find the positions that account for more than 20% of connections.

Next, I am going to count the connections in each position, compute each position’s percentage share, and apply a condition to make the selection (e.g. I can find all the positions that account for more than 20% of connections).
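The percentage calculation and the filter can be sketched as below. The data is made up, and `threshold` is a hypothetical name for the cut-off, set to 20 to match the example in the text.

```python
import pandas as pd

# Hypothetical sample: 10 connections across 4 positions.
positions = (
    ["Software Engineer"] * 5
    + ["Founder"] * 3
    + ["Product Designer"]
    + ["Software Developer"]
)
df = pd.DataFrame({"Position": positions})

# Share of connections in each position, as a percentage.
share = df["Position"].value_counts(normalize=True) * 100

# Keep only the positions above the chosen cut-off.
threshold = 20
top_positions = share[share > threshold]

print(top_positions.round(1))
```

With real data a 20% cut-off may leave very few rows, so it is worth trying a lower threshold as well.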

Let’s visualize this with Plotly.

What if we try out Word Cloud?

Let’s use WordCloud to have a better view.

We define a function called CreateWordCloud, which takes in some text and generates a word cloud from it.

This looks quite a bit better, right? Yeah, I think so too.

Conclusion

In this tutorial, we used the Connections data, analyzed it, and drew some insights from it.

You can download any other type of your LinkedIn data and perform a similar analysis.

In Part 2, we will discuss how to automate and deploy your app on Streamlit.

References

https://www.linkedin.com/pulse/how-properly-analyze-your-personal-linkedin-data-withpython-tds/
