
Visualising My YouTube Viewing History

Sigma School
16th August 2023

After reading a post by someone who visualised their Netflix viewing history, I thought to myself: what better way to show what I have learnt than to apply my newfound knowledge to analysing and visualising my own YouTube history? SPOILER: Turns out BTS wasn't my top :(

report of history activity on Youtube account

To collect my YouTube data, I went to https://takeout.google.com/settings/takeout and exported my YouTube search and watch history, along with all my subscription data.
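Before loading the export into pandas, it can help to peek at the raw records. The sketch below is only an illustration using Python's built-in json module. The two sample records are made up and mirror just the `title` and `time` fields used later in this post, so a real export may well contain more fields.

```python
import json

# Made-up sample mirroring the two fields used later in this post;
# a real Takeout export may contain more fields per record.
sample_export = '''
[
  {"title": "Searched for bts choreo", "time": "2019-03-02T10:15:00Z"},
  {"title": "Searched for streamlit tutorial", "time": "2020-07-21T18:40:00Z"}
]
'''

records = json.loads(sample_export)

# Collect every field name that appears across the records.
field_names = sorted({key for record in records for key in record})

print(len(records), "records")
print("fields:", field_names)
```

With a real export you would read the file from disk instead of using an inline string, for example with `json.load(open(path))`.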

To begin, install the required packages by running the commands below. It is a good idea to do this inside a virtual environment so that it doesn't interfere with your existing dependencies.

pip install streamlit
pip install numpy
pip install matplotlib
pip install pandas
pip install wordcloud

Now import the packages at the top of your Python file:

import pandas as pd
import streamlit as st
import numpy as np
import matplotlib.pyplot as plt
import json
from matplotlib.colors import Normalize
from wordcloud import WordCloud, ImageColorGenerator

The cool thing about Streamlit is that it automatically renders everything for you! So when I want to create a title for the web app, I can simply type in:

st.title('Youtube Data Visualisation')
st.subheader("Search History Analysis: Word Cloud")

To prepare the data, I used pandas to load the search history and split the timestamp into more detailed columns (year, month, day and day of the week) that I can work with. With those columns in place, I can easily use the GroupBy function later on to count how often I use YouTube on a yearly, monthly and daily basis.

search_df = pd.read_json('history/search-history.json')
search_df['time'] = pd.to_datetime(search_df['time'])
# Strip the "Searched for " prefix so only the query text remains
search_df['title'] = search_df['title'].str.replace('Searched for ', '', regex=False)
search_df['Year'] = search_df['time'].dt.year
search_df['Month'] = search_df['time'].dt.month_name()
search_df['Day'] = search_df['time'].dt.day
search_df['Day_of_week'] = search_df['time'].dt.day_name()
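The GroupBy counting mentioned above could look roughly like this. It is a minimal sketch on a small made-up DataFrame (`demo_df` is a hypothetical stand-in for `search_df`), not the exact code behind the charts in this post.

```python
import pandas as pd

# Hypothetical stand-in for search_df; the real one comes from the Takeout export.
demo_df = pd.DataFrame({
    "title": ["bts choreo", "streamlit tutorial", "pandas groupby", "bts live"],
    "time": pd.to_datetime([
        "2019-03-02 10:15:00",
        "2019-03-15 21:05:00",
        "2019-11-30 08:00:00",
        "2020-07-21 18:40:00",
    ]),
})
demo_df["Year"] = demo_df["time"].dt.year
demo_df["Month"] = demo_df["time"].dt.month_name()

# Count rows per year, then per (year, month) pair.
per_year = demo_df.groupby("Year").size()
per_month = demo_df.groupby(["Year", "Month"]).size()

print(per_year)
print(per_month)
```

`size()` counts rows in each group, so every search counts once regardless of its title.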

To generate a word cloud of my most-searched words, I used the wordcloud package:

wordcloud2 = WordCloud(background_color="white").generate(' '.join(search_df['title']))

I can now plot the wordcloud onto my streamlit app like this!

fig, ax = plt.subplots()
ax.imshow(wordcloud2, interpolation='bilinear')
ax.axis("off")
# Save the figure to disk, then let Streamlit display the saved image
fig.savefig('wordcloud.png')
st.image('wordcloud.png')

wordcloud of top searched words, with 'choreo' and 'bts' being the biggest

A Wordcloud of my top searched words
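If you want a rough cross-check of which words should appear largest, you can count the words yourself. This is a minimal sketch on made-up titles using the standard library's `Counter`. The wordcloud package applies its own processing (such as stop-word filtering), so the numbers will not match the cloud exactly.

```python
from collections import Counter

# Hypothetical cleaned search titles (the "Searched for " prefix already removed).
titles = ["bts choreo", "bts live", "blackpink choreo", "bts choreo tutorial"]

# Lowercase, split on whitespace, and count every word.
word_counts = Counter(word for title in titles for word in title.lower().split())

# The most frequent words should roughly match the largest words in the cloud.
top_words = word_counts.most_common(2)
print(top_words)
```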

Now let's look at my search frequency per year. I chose a pie chart because it makes the differences between years easy to see.

Pie Chart of My Search Frequency per Year


# Searches per year in chronological order (assumed definitions; not shown in the original post)
year_freq = search_df['Year'].value_counts().sort_index()
piechart_labels = year_freq.index.astype(str)

labels = piechart_labels
sizes = year_freq
explode = (0, 0.1, 0, 0)  # only "explode" the 2nd slice; assumes four years of data

fig1, ax1 = plt.subplots()
colors = ['#F0725C','#F6AC5A','#80CEBE','#B9C0EA']

ax1.pie(sizes, explode=explode, colors=colors, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

# Pass the figure explicitly so Streamlit knows what to render
st.pyplot(fig1)

To clean the data in the "time" column, I use pandas to extract the year, month, day and day of the week from each value in the column.

# Assumed file name, mirroring the search history path used earlier
watch_hist = pd.read_json('history/watch-history.json')
watch_hist['time'] = pd.to_datetime(watch_hist['time'])
watch_hist['Year'] = watch_hist['time'].dt.year
watch_hist['Month'] = watch_hist['time'].dt.month_name()
watch_hist['Day'] = watch_hist['time'].dt.day
watch_hist['Day_of_week'] = watch_hist['time'].dt.day_name()
st.write(watch_hist['Year'].value_counts())

This gives me my yearly watch frequency:

Watch frequency per year, with 2020 being the lowest and 2019 the highest
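Note that `value_counts()` orders its result by count, not by year. If you would rather list the years chronologically, one option is to sort by the index afterwards. A minimal sketch on made-up data (`years` is a hypothetical stand-in for `watch_hist['Year']`):

```python
import pandas as pd

# Hypothetical "Year" column standing in for watch_hist['Year'].
years = pd.Series([2019, 2019, 2019, 2018, 2018, 2020], name="Year")

by_count = years.value_counts()              # most-watched year first
by_year = years.value_counts().sort_index()  # chronological order

print(by_count)
print(by_year)
```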

To look at the monthly watch frequency for 2019, we can use the code below.

year_2019 = watch_hist.loc[watch_hist['Year'] == 2019]
monthly_freq = year_2019['Month'].value_counts()

# Put the months in calendar order (value_counts orders them by count)
new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
monthly_freq = monthly_freq.reindex(new_order, axis=0)

# Plot the bar chart
ax = monthly_freq.plot(kind='bar', figsize=(14,8), title="Monthly Frequency in 2019", color=colors)
ax.set_xlabel("Months")
ax.set_ylabel("Number of videos watched")
plt.savefig('monthly_freq.png')
st.write(ax.figure)

Bar Chart of Monthly Watch Frequency in 2019

Bar Chart of my Monthly Watch Frequency (2019)
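One thing to watch with `reindex`: a month with no activity at all shows up as NaN rather than 0. A possible way around this, sketched here on made-up data, is to pass `fill_value=0`. The month list is built from the standard library's `calendar` module, which gives English names under the default locale.

```python
import calendar
import pandas as pd

# Hypothetical month names standing in for year_2019['Month'].
months = pd.Series(["March", "March", "January", "November"])
monthly_counts = months.value_counts()

# calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]

# Months with no activity become 0 instead of NaN.
monthly_counts = monthly_counts.reindex(month_order, fill_value=0)

print(monthly_counts)
```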

Voila! The streamlit library will help you render the bar chart on your web app!

Cheat Sheet for Streamlit

  1. st.write() - write dataframes, bar charts, pie charts and other kinds of figures
  2. st.title() - write a title for your Web App
  3. st.subheader() - write a subheader under your title
  4. st.markdown() - This will render markdown on your Web App

Congratulations on getting to the end of this short tutorial on using Streamlit for dashboarding. Next time we will look at creating a menu for our dashboard to make it even more dynamic!

Github Repo

Email us directly at hello@sigmaschool.co!

Want to find out more about what we do?

Learn more here: https://sigmaschool.co

Let’s get social! Find us on:

Facebook: https://www.facebook.com/joinsigma/

Instagram: https://www.instagram.com/joinsigma/

Linkedin: https://linkedin.com/company/79085028/
