Intro to SQL with 5000 movies¶

This notebook is a part of a talk i have been giving to introduce people to SQL. Only pre-requisite to attend this talk is some general curiosity.

Acknowledgements¶

This dataset was generated from The Movie Database API. This product uses the TMDb API but is not endorsed or certified by TMDb. We used the dataset published on kaggle that includes 5000 movies

Survey¶

Experience with SQL¶

- [ ] I prefer No SQL- I use it few times a month

[ ] I wish to use it few times a month - [ ] it's my SUPERPOWER - [ ] I want to name my child Robert'; DROP TABLE Students; Ref xkcd

import os

1.¶

SQLite¶

We are using the sqlite database for this exercise. I have loaded the movies into a sqllite database.

import sqlite3

db_path = os.path.expanduser('~/temp/intro_to_sql/moviesT500_v2.db')
conn = sqlite3.connect(db_path) 
c = conn.cursor()

2.¶

pandas¶

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language

import pandas as pd

3.¶

Plotly¶

Plotly's Python graphing library makes interactive, publication-quality graphs.

import plotly.express as px

from IPython.display import display
def display_all(df):
    with pd.option_context("display.max_rows", 1000, "display.max_columns", 300,'display.max_colwidth', -1):
        display(df)
# 100% width
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

4.¶

let's take a peek into the dataset's and limit it to just 5 records¶

this will help us to look at all the columns and what values are represented in them.

sql_query = """
select * from tmdb_5000_movies
limit 5
"""
pd.read_sql_query(sql_query, conn)

5.¶

Let's look at the max and min release date for all the movies in our database¶

sql_query = """
select  min(release_date), max(release_date) from tmdb_5000_movies 
"""
pd.read_sql_query(sql_query, conn)

first movie was released in 1916 and the last one was released in 2017

6.¶

Let's count the number of movies that was released in each year¶

sql_query = """
select 
date(release_date, 'start of year') as year,
count(*) as movies
from tmdb_5000_movies 
group by
date(release_date, 'start of year')
"""
df = pd.read_sql_query(sql_query, conn)

let's plot this data onto a histogram/bar chart

fig = px.bar(
    df,
    x='year',
    y='movies',
)
fig.show()

7.¶

let's play some movie trivia with the column "keywords"¶

we narrow down to a movie with a few keywords ex. spider can be to narrow down to spiderman movies

sql_query = '''
SELECT * from tmdb_5000_movies 
WHERE 
keywords LIKE '%spider%' 

limit 5
'''


df = pd.read_sql_query(sql_query, conn)
df

8.¶

can you tweak the above SQL code to narrow down to your favorite movies.¶

	budget	genres	homepage	id	keywords	original_language	original_title	overview	popularity	production_companies	production_countries	release_date	revenue	runtime	spoken_languages	status	tagline	title	vote_average	vote_count
0	237000000	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	http://www.avatarmovie.com/	19995	[{"id": 1463, "name": "culture clash"}, {"id":...	en	Avatar	In the 22nd century, a paraplegic Marine is di...	150.437577	[{"name": "Ingenious Film Partners", "id": 289...	[{"iso_3166_1": "US", "name": "United States o...	2009-12-10	2787965087	162	[{"iso_639_1": "en", "name": "English"}, {"iso...	Released	Enter the World of Pandora.	Avatar	7.2	11800
1	300000000	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	http://disney.go.com/disneypictures/pirates/	285	[{"id": 270, "name": "ocean"}, {"id": 726, "na...	en	Pirates of the Caribbean: At World's End	Captain Barbossa, long believed to be dead, ha...	139.082615	[{"name": "Walt Disney Pictures", "id": 2}, {"...	[{"iso_3166_1": "US", "name": "United States o...	2007-05-19	961000000	169	[{"iso_639_1": "en", "name": "English"}]	Released	At the end of the world, the adventure begins.	Pirates of the Caribbean: At World's End	6.9	4500
2	245000000	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	http://www.sonypictures.com/movies/spectre/	206647	[{"id": 470, "name": "spy"}, {"id": 818, "name...	en	Spectre	A cryptic message from Bond’s past sends him o...	107.376788	[{"name": "Columbia Pictures", "id": 5}, {"nam...	[{"iso_3166_1": "GB", "name": "United Kingdom"...	2015-10-26	880674609	148	[{"iso_639_1": "fr", "name": "Fran\u00e7ais"},...	Released	A Plan No One Escapes	Spectre	6.3	4466
3	250000000	[{"id": 28, "name": "Action"}, {"id": 80, "nam...	http://www.thedarkknightrises.com/	49026	[{"id": 849, "name": "dc comics"}, {"id": 853,...	en	The Dark Knight Rises	Following the death of District Attorney Harve...	112.312950	[{"name": "Legendary Pictures", "id": 923}, {"...	[{"iso_3166_1": "US", "name": "United States o...	2012-07-16	1084939099	165	[{"iso_639_1": "en", "name": "English"}]	Released	The Legend Ends	The Dark Knight Rises	7.6	9106
4	260000000	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	http://movies.disney.com/john-carter	49529	[{"id": 818, "name": "based on novel"}, {"id":...	en	John Carter	John Carter is a war-weary, former military ca...	43.926995	[{"name": "Walt Disney Pictures", "id": 2}]	[{"iso_3166_1": "US", "name": "United States o...	2012-03-07	284139100	132	[{"iso_639_1": "en", "name": "English"}]	Released	Lost in our world, found in another.	John Carter	6.1	2124

	budget	genres	homepage	id	keywords	original_language	original_title	overview	popularity	production_companies	production_countries	release_date	revenue	runtime	spoken_languages	status	tagline	title	vote_average	vote_count
0	258000000	[{"id": 14, "name": "Fantasy"}, {"id": 28, "na...	http://www.sonypictures.com/movies/spider-man3/	559	[{"id": 851, "name": "dual identity"}, {"id": ...	en	Spider-Man 3	The seemingly invincible Spider-Man goes up ag...	115.699814	[{"name": "Columbia Pictures", "id": 5}, {"nam...	[{"iso_3166_1": "US", "name": "United States o...	2007-05-01	890871626	139	[{"iso_639_1": "en", "name": "English"}, {"iso...	Released	The battle within.	Spider-Man 3	5.9	3576
1	215000000	[{"id": 28, "name": "Action"}, {"id": 12, "nam...	http://www.theamazingspiderman.com	1930	[{"id": 1872, "name": "loss of father"}, {"id"...	en	The Amazing Spider-Man	Peter Parker is an outcast high schooler aband...	89.866276	[{"name": "Columbia Pictures", "id": 5}, {"nam...	[{"iso_3166_1": "US", "name": "United States o...	2012-06-27	752215857	136	[{"iso_639_1": "en", "name": "English"}]	Released	The untold story begins.	The Amazing Spider-Man	6.5	6586
2	139000000	[{"id": 14, "name": "Fantasy"}, {"id": 28, "na...	http://www.sonypictures.com/movies/spider-man/	557	[{"id": 1014, "name": "loss of lover"}, {"id":...	en	Spider-Man	After being bitten by a genetically altered sp...	82.502566	[{"name": "Columbia Pictures", "id": 5}, {"nam...	[{"iso_3166_1": "US", "name": "United States o...	2002-05-01	821708551	121	[{"iso_639_1": "en", "name": "English"}]	Released	With great power comes great responsibility.	Spider-Man	6.8	5265
3	85000000	[{"id": 35, "name": "Comedy"}, {"id": 10751, "...	None	9986	[{"id": 1701, "name": "hero"}, {"id": 3875, "n...	en	Charlotte's Web	Wilbur the pig is scared of the end of the sea...	14.132583	[{"name": "Paramount Pictures", "id": 4}, {"na...	[{"iso_3166_1": "DE", "name": "Germany"}, {"is...	2006-12-15	144000000	97	[{"iso_639_1": "en", "name": "English"}]	Released	Something unexpected, unbelievable, unforgetta...	Charlotte's Web	5.8	288
4	40000000	[{"id": 12, "name": "Adventure"}, {"id": 14, "...	None	24418	[{"id": 3133, "name": "vampire"}, {"id": 3986,...	en	Cirque du Freak: The Vampire's Assistant	Darren Shan is a regular teenage kid. He and h...	16.424807	[{"name": "Universal Pictures", "id": 33}]	[{"iso_3166_1": "US", "name": "United States o...	2009-10-23	28169671	109	[{"iso_639_1": "en", "name": "English"}]	Released	Meet Darren. He's sixteen going on immortal.	Cirque du Freak: The Vampire's Assistant	5.5	348