After reading a post by someone who visualised their Netflix viewing history, I thought to myself: what better way to show what I'd learnt than to apply it to analysing and visualising my own YouTube history? SPOILER: Turns out BTS wasn't my top :(
To collect my YouTube data, I went to https://takeout.google.com/settings/takeout and exported my YouTube search and watch history, along with my subscription data.
To begin, install the required packages by running these commands. It's a good idea to do this in a virtual environment so that it doesn't mess up your current dependencies.
pip install streamlit
pip install numpy
pip install matplotlib
pip install pandas
pip install wordcloud
Now import the packages at the start of the Python file:
import pandas as pd
import streamlit as st
import numpy as np
import matplotlib.pyplot as plt
import json
from matplotlib.colors import Normalize
from wordcloud import WordCloud, ImageColorGenerator
The cool thing about Streamlit is that it automatically renders everything for you! So when I want to create a title for the web app, I can simply type in:
st.title('Youtube Data Visualisation')
st.subheader("Search History Analysis: Word Cloud")
To prepare the data, I used pandas to parse the time column and derive more detailed columns (year, month, day and day of week) that I can work with. That way, I can easily use the GroupBy function later on to count how many times I watch YouTube on a yearly, monthly and daily basis.
# Load the search history exported from Google Takeout
search_df = pd.read_json('history/search-history.json')
# Parse the timestamps so the .dt accessors below work
search_df['time'] = pd.to_datetime(search_df['time'])
# Strip the "Searched for " prefix so only the search terms remain
search_df['title'] = search_df['title'].str.replace('Searched for ', '', regex=False)
# Derive the date parts used for grouping later on
search_df['Year'], search_df['Month'] = search_df['time'].dt.year, search_df['time'].dt.month_name()
search_df['Day'] = search_df['time'].dt.day
search_df['Day_of_week'] = search_df['time'].dt.day_name()
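To make the transformation concrete, here is a minimal sketch that runs the same steps on three made-up rows instead of the real Takeout file. The sample titles and timestamps are invented for illustration, and the helper name `add_time_columns` is hypothetical. The last step shows the kind of GroupBy count mentioned above.

```python
import pandas as pd


def add_time_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with Year, Month, Day and Day_of_week columns.

    Assumes df has a 'time' column holding ISO 8601 timestamp strings.
    """
    out = df.copy()
    out["time"] = pd.to_datetime(out["time"])
    out["Year"] = out["time"].dt.year
    out["Month"] = out["time"].dt.month_name()
    out["Day"] = out["time"].dt.day
    out["Day_of_week"] = out["time"].dt.day_name()
    return out


# Invented sample rows shaped like the search history export.
sample = pd.DataFrame(
    {
        "title": [
            "Searched for bts dynamite",
            "Searched for streamlit tutorial",
            "Searched for lofi beats",
        ],
        "time": [
            "2019-03-05T14:21:07Z",
            "2019-03-20T09:00:00Z",
            "2020-01-01T23:59:59Z",
        ],
    }
)

search_sample = add_time_columns(sample)
search_sample["title"] = search_sample["title"].str.replace(
    "Searched for ", "", regex=False
)

# Count searches per year and month with GroupBy.
searches_per_month = search_sample.groupby(["Year", "Month"]).size()
print(searches_per_month)
```

Grouping on both `Year` and `Month` keeps March 2019 separate from March of any other year, which a count on `Month` alone would not.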
To generate a word cloud of my most-searched words, I used the wordcloud package:
wordcloud2 = WordCloud(background_color="white").generate(' '.join(search_df['title']))
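A word cloud shows relative size but not exact counts. As a rough cross-check, the same joined text can be counted with the standard library. This is only a sketch: the sample titles are made up, and the simple whitespace split is not how WordCloud itself tokenises text (it also drops common stopwords by default), so the numbers will not match the cloud exactly.

```python
from collections import Counter

# Made-up search titles standing in for search_df['title'].
titles = ["bts dynamite", "bts live", "streamlit tutorial", "BTS interview"]

# Lowercase and split on whitespace, then count each word.
words = " ".join(titles).lower().split()
top_words = Counter(words).most_common(3)
print(top_words)
```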
I can now plot the wordcloud onto my streamlit app like this!
plt.imshow(wordcloud2, interpolation='bilinear')
plt.axis("off")
# Save the figure to disk, then display the saved image in the app
plt.savefig('wordcloud.png')
st.image('wordcloud.png')
A Wordcloud of my top searched words
Now let's look at my search frequency per year. I chose a pie chart because it makes each year's share of the total easy to compare.
Pie Chart of My Search Frequency per Year
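The pie chart snippet relies on two variables, `piechart_labels` and `year_freq`, that aren't defined anywhere in this post. Here is a minimal sketch of one way to build them, assuming they come from the `Year` column created earlier. The data below is invented, and the original app may have computed them differently.

```python
import pandas as pd

# Invented 'Year' values standing in for search_df['Year'].
years = pd.Series([2017, 2018, 2018, 2019, 2019, 2019, 2020], name="Year")

# Count searches per year, ordered chronologically rather than by count.
counts = years.value_counts().sort_index()

piechart_labels = [str(year) for year in counts.index]
year_freq = counts.tolist()

# One explode entry per slice, so the tuple stays in step with the data.
explode = tuple(0.1 if i == 1 else 0 for i in range(len(year_freq)))
print(piechart_labels, year_freq, explode)
```

Building `explode` from the number of slices avoids a length mismatch if the history covers more or fewer than four years.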
labels = piechart_labels
sizes = year_freq
explode = (0, 0.1, 0, 0)  # only "explode" the 2nd slice
fig1, ax1 = plt.subplots()
colors = ['#F0725C','#F6AC5A','#80CEBE','#B9C0EA']
ax1.pie(sizes, explode=explode, colors=colors, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
#ax1.pie(sizes, explode=explode, labels=labels, shadow=True,autopct='', startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
st.pyplot(fig1)  # pass the figure explicitly instead of relying on the global one
To clean the data in the "time" column of my watch history, I use pandas to extract the year, month, day and hour from each value.
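# watch_hist is the watch history DataFrame, loaded the same way as search_df
watch_hist['time'] = pd.to_datetime(watch_hist['time'])
watch_hist['Year'], watch_hist['Month'] = watch_hist['time'].dt.year, watch_hist['time'].dt.month_name()
watch_hist['Day'] = watch_hist['time'].dt.day
watch_hist['Hour'] = watch_hist['time'].dt.hour
watch_hist['Day_of_week'] = watch_hist['time'].dt.day_name()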
st.write(watch_hist['Year'].value_counts())
This gives me my yearly watch frequency.
To get the monthly watch frequency for 2019, we can filter on the year and count the months with the code below.
year_2019 = watch_hist.loc[watch_hist['Year'] == 2019]
monthly_freq = year_2019['Month'].value_counts()
# Order months correctly
new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Sort out months
monthly_freq = monthly_freq.reindex(new_order, axis=0)
# Plot the bar chart
ax = monthly_freq.plot(kind='bar', figsize=(14,8), title="Monthly Frequency in 2019", color=colors)
ax.set_xlabel("Months")
ax.set_ylabel("Frequency watch times")
plt.savefig('monthly_freq.png')
st.write(ax.figure)
Bar Chart of my Monthly Watch Frequency (2019)
Voila! The streamlit library will help you render the bar chart on your web app!
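One caveat with the reindex step: a month with no views is missing from `value_counts()`, so after reindexing it shows up as NaN rather than 0. Here is a small sketch of one way to handle that, using invented data and `calendar.month_name` for the month order (this assumes the default English locale, which matches what `month_name()` returns in pandas by default).

```python
import calendar

import pandas as pd

# Invented month names standing in for year_2019['Month'].
months = pd.Series(["March", "January", "March", "December"])

# calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]

# Reindex into calendar order and fill months that have no views with 0.
monthly_freq = months.value_counts().reindex(month_order, fill_value=0)
print(monthly_freq)
```

With `fill_value=0`, empty months appear as zero-height bars instead of being left as missing values.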
Cheat Sheet for Streamlit
Congratulations on getting to the end of this short tutorial on Streamlit for dashboarding. Next time, we will look into creating a menu for our dashboard and making it even more dynamic!
Email us directly at hello@sigmaschool.co!
Want to find out more about what we do?
Learn more here: https://sigmaschool.co