Intro to SQL with 5000 movies

This notebook is a part of a talk i have been giving to introduce people to SQL. Only pre-requisite to attend this talk is some general curiosity.

Acknowledgements

This dataset was generated from The Movie Database API. This product uses the TMDb API but is not endorsed or certified by TMDb. We used the dataset published on kaggle that includes 5000 movies

Survey

Experience with SQL

- [ ] I prefer No SQL- I use it few times a month

  • [ ] I wish to use it few times a month - [ ] it's my SUPERPOWER - [ ] I want to name my child Robert'; DROP TABLE Students; Ref xkcd
In [1]:
import os

1.

SQLite

We are using the sqlite database for this exercise. I have loaded the movies into a sqllite database.

In [2]:
import sqlite3

db_path = os.path.expanduser('~/temp/intro_to_sql/moviesT500_v2.db')
conn = sqlite3.connect(db_path) 
c = conn.cursor()

2.

pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language

In [3]:
import pandas as pd

3.

Plotly

Plotly's Python graphing library makes interactive, publication-quality graphs.

In [4]:
import plotly.express as px
In [5]:
from IPython.display import display
def display_all(df):
    with pd.option_context("display.max_rows", 1000, "display.max_columns", 300,'display.max_colwidth', -1):
        display(df)
# 100% width
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))        

4.

let's take a peek into the dataset's and limit it to just 5 records

this will help us to look at all the columns and what values are represented in them.

In [6]:
sql_query = """
select * from tmdb_5000_movies
limit 5
"""
pd.read_sql_query(sql_query, conn)
Out[6]:
budget genres homepage id keywords original_language original_title overview popularity production_companies production_countries release_date revenue runtime spoken_languages status tagline title vote_average vote_count
0 237000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.avatarmovie.com/ 19995 [{"id": 1463, "name": "culture clash"}, {"id":... en Avatar In the 22nd century, a paraplegic Marine is di... 150.437577 [{"name": "Ingenious Film Partners", "id": 289... [{"iso_3166_1": "US", "name": "United States o... 2009-12-10 2787965087 162 [{"iso_639_1": "en", "name": "English"}, {"iso... Released Enter the World of Pandora. Avatar 7.2 11800
1 300000000 [{"id": 12, "name": "Adventure"}, {"id": 14, "... http://disney.go.com/disneypictures/pirates/ 285 [{"id": 270, "name": "ocean"}, {"id": 726, "na... en Pirates of the Caribbean: At World's End Captain Barbossa, long believed to be dead, ha... 139.082615 [{"name": "Walt Disney Pictures", "id": 2}, {"... [{"iso_3166_1": "US", "name": "United States o... 2007-05-19 961000000 169 [{"iso_639_1": "en", "name": "English"}] Released At the end of the world, the adventure begins. Pirates of the Caribbean: At World's End 6.9 4500
2 245000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.sonypictures.com/movies/spectre/ 206647 [{"id": 470, "name": "spy"}, {"id": 818, "name... en Spectre A cryptic message from Bond’s past sends him o... 107.376788 [{"name": "Columbia Pictures", "id": 5}, {"nam... [{"iso_3166_1": "GB", "name": "United Kingdom"... 2015-10-26 880674609 148 [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... Released A Plan No One Escapes Spectre 6.3 4466
3 250000000 [{"id": 28, "name": "Action"}, {"id": 80, "nam... http://www.thedarkknightrises.com/ 49026 [{"id": 849, "name": "dc comics"}, {"id": 853,... en The Dark Knight Rises Following the death of District Attorney Harve... 112.312950 [{"name": "Legendary Pictures", "id": 923}, {"... [{"iso_3166_1": "US", "name": "United States o... 2012-07-16 1084939099 165 [{"iso_639_1": "en", "name": "English"}] Released The Legend Ends The Dark Knight Rises 7.6 9106
4 260000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://movies.disney.com/john-carter 49529 [{"id": 818, "name": "based on novel"}, {"id":... en John Carter John Carter is a war-weary, former military ca... 43.926995 [{"name": "Walt Disney Pictures", "id": 2}] [{"iso_3166_1": "US", "name": "United States o... 2012-03-07 284139100 132 [{"iso_639_1": "en", "name": "English"}] Released Lost in our world, found in another. John Carter 6.1 2124

5.

Let's look at the max and min release date for all the movies in our database

In [7]:
sql_query = """
select  min(release_date), max(release_date) from tmdb_5000_movies 
"""
pd.read_sql_query(sql_query, conn)
Out[7]:
min(release_date) max(release_date)
0 1916-09-04 2017-02-03

first movie was released in 1916 and the last one was released in 2017

6.

Let's count the number of movies that was released in each year

In [8]:
sql_query = """
select 
date(release_date, 'start of year') as year,
count(*) as movies
from tmdb_5000_movies 
group by
date(release_date, 'start of year')
"""
df = pd.read_sql_query(sql_query, conn)

let's plot this data onto a histogram/bar chart

In [9]:
fig = px.bar(
    df,
    x='year',
    y='movies',
)
fig.show()

7.

let's play some movie trivia with the column "keywords"

  • we narrow down to a movie with a few keywords ex. spider can be to narrow down to spiderman movies
In [10]:
sql_query = '''
SELECT * from tmdb_5000_movies 
WHERE 
keywords LIKE '%spider%' 

limit 5
'''


df = pd.read_sql_query(sql_query, conn)
df
Out[10]:
budget genres homepage id keywords original_language original_title overview popularity production_companies production_countries release_date revenue runtime spoken_languages status tagline title vote_average vote_count
0 258000000 [{"id": 14, "name": "Fantasy"}, {"id": 28, "na... http://www.sonypictures.com/movies/spider-man3/ 559 [{"id": 851, "name": "dual identity"}, {"id": ... en Spider-Man 3 The seemingly invincible Spider-Man goes up ag... 115.699814 [{"name": "Columbia Pictures", "id": 5}, {"nam... [{"iso_3166_1": "US", "name": "United States o... 2007-05-01 890871626 139 [{"iso_639_1": "en", "name": "English"}, {"iso... Released The battle within. Spider-Man 3 5.9 3576
1 215000000 [{"id": 28, "name": "Action"}, {"id": 12, "nam... http://www.theamazingspiderman.com 1930 [{"id": 1872, "name": "loss of father"}, {"id"... en The Amazing Spider-Man Peter Parker is an outcast high schooler aband... 89.866276 [{"name": "Columbia Pictures", "id": 5}, {"nam... [{"iso_3166_1": "US", "name": "United States o... 2012-06-27 752215857 136 [{"iso_639_1": "en", "name": "English"}] Released The untold story begins. The Amazing Spider-Man 6.5 6586
2 139000000 [{"id": 14, "name": "Fantasy"}, {"id": 28, "na... http://www.sonypictures.com/movies/spider-man/ 557 [{"id": 1014, "name": "loss of lover"}, {"id":... en Spider-Man After being bitten by a genetically altered sp... 82.502566 [{"name": "Columbia Pictures", "id": 5}, {"nam... [{"iso_3166_1": "US", "name": "United States o... 2002-05-01 821708551 121 [{"iso_639_1": "en", "name": "English"}] Released With great power comes great responsibility. Spider-Man 6.8 5265
3 85000000 [{"id": 35, "name": "Comedy"}, {"id": 10751, "... None 9986 [{"id": 1701, "name": "hero"}, {"id": 3875, "n... en Charlotte's Web Wilbur the pig is scared of the end of the sea... 14.132583 [{"name": "Paramount Pictures", "id": 4}, {"na... [{"iso_3166_1": "DE", "name": "Germany"}, {"is... 2006-12-15 144000000 97 [{"iso_639_1": "en", "name": "English"}] Released Something unexpected, unbelievable, unforgetta... Charlotte's Web 5.8 288
4 40000000 [{"id": 12, "name": "Adventure"}, {"id": 14, "... None 24418 [{"id": 3133, "name": "vampire"}, {"id": 3986,... en Cirque du Freak: The Vampire's Assistant Darren Shan is a regular teenage kid. He and h... 16.424807 [{"name": "Universal Pictures", "id": 33}] [{"iso_3166_1": "US", "name": "United States o... 2009-10-23 28169671 109 [{"iso_639_1": "en", "name": "English"}] Released Meet Darren. He's sixteen going on immortal. Cirque du Freak: The Vampire's Assistant 5.5 348

8.

can you tweak the above SQL code to narrow down to your favorite movies.