Oh, it is on, like a prawn who yawns at dawn.
- Andy Bernard
PrepDB.ipynb
Structure
TV Show Name
│
└───Season 1
│ │───Episode 1.mp4
│ │───Episode 2.mp4
│ │ ...
│ │───Episode 24.mp4
│ │───Episode 1.srt
│ │───Episode 2.srt
│ │ ...
│ │───Episode 24.srt
│
└───Season 2
│ │ ...
│
└───Season 3
...
I’ve used .mp4 and .srt here, but they can be other formats too.
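A folder layout like this can be walked with `pathlib`. The snippet below is only a rough sketch, not the code from the notebook: `pair_episodes` and the two extension sets are names made up for illustration, and it assumes each video and its subtitle share the same file name apart from the extension.

```python
from pathlib import Path

# Assumed extensions; adjust these to whatever formats your files use.
VIDEO_EXTS = {".mp4", ".mkv", ".avi"}
SUBTITLE_EXTS = {".srt"}


def pair_episodes(show_dir):
    """Return (season, episode, video_path, subtitle_path) tuples.

    Only episodes that have both a video and a subtitle file are returned.
    """
    pairs = []
    season_dirs = sorted(p for p in Path(show_dir).iterdir() if p.is_dir())
    for season_dir in season_dirs:
        files = list(season_dir.iterdir())
        # Index files by name without extension, e.g. "Episode 1".
        videos = {p.stem: p for p in files if p.suffix.lower() in VIDEO_EXTS}
        subs = {p.stem: p for p in files if p.suffix.lower() in SUBTITLE_EXTS}
        # sorted() is lexicographic, so "Episode 10" comes before "Episode 2".
        for stem in sorted(videos.keys() & subs.keys()):
            pairs.append((season_dir.name, stem, videos[stem], subs[stem]))
    return pairs
```

Calling `pair_episodes("TV Show Name")` would give one tuple per episode that has both files, which is enough to know which subtitle belongs to which video later on.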
The first cell in PrepDB.ipynb simply reads the names of these files. The rest of the notebook reads each subtitle file, cleans it, and then streams it into BigQuery line by line.
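To give an idea of the read-and-clean step, here is a minimal sketch that turns the text of one .srt file into a list of row dictionaries. It is not the notebook's code: `parse_srt`, `to_seconds`, and the field names (`season`, `episode`, `start_seconds`, `end_seconds`, `dialogue`) are assumptions made for illustration, and the real table schema may differ.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)
# Markup often found in subtitles: <i>...</i> tags and {\an8}-style codes.
TAGS = re.compile(r"<[^>]+>|\{[^}]+\}")


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(text, season, episode):
    """Parse SRT text into one cleaned row per subtitle cue."""
    rows = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMESTAMP.search(line)
            if match:
                break
        else:
            continue  # no timing line in this block, skip it
        g = match.groups()
        # Everything after the timing line is dialogue; join wrapped lines.
        dialogue = TAGS.sub("", " ".join(lines[i + 1:]))
        dialogue = re.sub(r"\s+", " ", dialogue).strip()
        if dialogue:
            rows.append({
                "season": season,
                "episode": episode,
                "start_seconds": to_seconds(*g[:4]),
                "end_seconds": to_seconds(*g[4:]),
                "dialogue": dialogue,
            })
    return rows
```

Rows shaped like this are plain JSON-compatible dictionaries, so they could be handed to a streaming insert such as `insert_rows_json` on the `google-cloud-bigquery` client, provided the keys match the table's columns.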
3 episodes in the GIF above were about 45,000 rows/subtitles.
BigQuery table schema
MakeGIF.ipynb
- Just set the variable selected_dialogue to the word or dialogue you want to search for.
- You will then be asked which matching dialogue you want to base your GIF on.
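The two steps above can be sketched roughly as follows. This is an illustration under assumptions, not what MakeGIF.ipynb actually does: it searches an in-memory list of rows instead of querying BigQuery, the row fields (`dialogue`, `start_seconds`, `end_seconds`) and function names are hypothetical, and it only builds an `ffmpeg` command line without running it.

```python
def find_dialogues(rows, selected_dialogue):
    """Return every row whose dialogue contains the search text (case-insensitive)."""
    needle = selected_dialogue.lower()
    return [row for row in rows if needle in row["dialogue"].lower()]


def ffmpeg_gif_command(video_path, start_seconds, end_seconds, out_path,
                       fps=10, width=480):
    """Build an ffmpeg argument list that cuts the clip and writes a GIF."""
    duration = end_seconds - start_seconds
    return [
        "ffmpeg",
        "-ss", f"{start_seconds:.3f}",         # seek to the start of the subtitle
        "-i", str(video_path),
        "-t", f"{duration:.3f}",               # keep only the subtitle's duration
        "-vf", f"fps={fps},scale={width}:-1",  # lower frame rate, resize, keep aspect
        str(out_path),
    ]


rows = [
    {"dialogue": "Oh, it is on, like a prawn.", "start_seconds": 1.0, "end_seconds": 3.5},
    {"dialogue": "Bears. Beets.", "start_seconds": 4.0, "end_seconds": 5.25},
]
selected_dialogue = "prawn"
matches = find_dialogues(rows, selected_dialogue)
# In the notebook you would be asked to pick one; here we just take the first.
chosen = matches[0]
command = ffmpeg_gif_command("Episode 1.mp4", chosen["start_seconds"],
                             chosen["end_seconds"], "prawn.gif")
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine with ffmpeg installed.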
Check out jeevz.py, where I made a chatbot to ask for GIFs!
I used this and this to make the chatbot. You should probably write your own, mine breaks easily.
These are easy enough to offer:
–> Also, in BigQuery, implement a UDF for Levenshtein distance or cosine similarity so that the user need not remember the dialogue word for word.
Any cloud-based solution will basically need the episodes to be available online so they can be downloaded into an execution environment.
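As a rough idea of what the Levenshtein UDF mentioned above would compute, here is a plain Python sketch of the edit distance plus a helper that picks the closest dialogue. It runs locally rather than inside BigQuery, and `closest_dialogue` and the `dialogue` field are hypothetical names used only for illustration.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest_dialogue(rows, query):
    """Return the row whose dialogue has the smallest edit distance to query."""
    return min(rows, key=lambda row: levenshtein(query.lower(), row["dialogue"].lower()))
```

With this, a half-remembered query such as "bears beets battlestar galactica" would still land on "Bears. Beets. Battlestar Galactica." even though the punctuation does not match. Comparing against every row is slow for a large table, which is one reason to push the comparison into the database.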