
In the last couple of months I have been working on getting my head around NumPy, Python and Pandas. Before I get into the technical challenges and the steep learning curve in a follow-up blog post (frustrating at first, but then enlightening :-)), I want to show some results!

Rather than becoming the millionth person to work on a Kaggle project, I thought I would find my own data set to play with…
So I came up with the idea of analyzing the transcripts of STNG (Star Trek: The Next Generation). I did not have to google for long before I found some nice-looking transcripts at chakoteya.net. I did some web scraping to download all the text files and put them into a Pandas DataFrame.
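For anyone who wants to try something similar, here is a minimal sketch of the parsing step; it is not necessarily what the notebook does. It assumes each transcript has already been downloaded as plain text and that every spoken line looks like `SPEAKER: text` (optionally `SPEAKER [OC]: text`). The regular expression, the function names `parse_transcript` and `build_dataframe`, and the column names `episode`, `speaker` and `line` are my own illustrative choices. The download itself is left out.

```python
import re

import pandas as pd

# Assumed transcript format: one spoken line per text line, "SPEAKER: text",
# optionally with a bracketed tag such as "PICARD [OC]: ...".
# Scene headings like "[Bridge]" and stage directions like "(Riker nods)"
# do not match the pattern and are skipped.
SPEAKER_LINE = re.compile(r"^([A-Z][A-Z0-9' .]*?)\s*(?:\[[^\]]*\])?:\s*(.+)$")


def parse_transcript(text, episode):
    """Turn one raw transcript into a list of row dicts."""
    rows = []
    for raw in text.splitlines():
        match = SPEAKER_LINE.match(raw.strip())
        if match:
            speaker, line = match.groups()
            rows.append({"episode": episode, "speaker": speaker, "line": line.strip()})
    return rows


def build_dataframe(transcripts):
    """Combine a {episode number: transcript text} dict into one DataFrame."""
    rows = []
    for episode, text in transcripts.items():
        rows.extend(parse_transcript(text, episode))
    return pd.DataFrame(rows, columns=["episode", "speaker", "line"])
```

One row per spoken line keeps the later counting simple: most questions about "who talks how much" become a `groupby` or `value_counts` call.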

Thanks to the author of the transcripts, I only had to do a little data cleaning: fixing some misspellings here, deleting some line breaks there… Not much was necessary. The transcripts are of pretty good quality!
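A cleaning step like this can be sketched as follows, assuming a DataFrame with hypothetical columns `speaker` and `line`. The entries in `SPEAKER_FIXES` are made-up examples of misspelled names, not actual errors found in the transcripts.

```python
import pandas as pd

# Hypothetical correction table: misspelled speaker name -> canonical name.
SPEAKER_FIXES = {"PICAR": "PICARD", "LA FORGE": "LAFORGE"}


def clean(df):
    """Return a cleaned copy of a DataFrame with 'speaker' and 'line' columns."""
    out = df.copy()
    # Normalize speaker names (trim, upper-case), then apply the known fixes.
    out["speaker"] = out["speaker"].str.strip().str.upper().replace(SPEAKER_FIXES)
    # Collapse stray line breaks and repeated whitespace inside a spoken line.
    out["line"] = out["line"].str.replace(r"\s+", " ", regex=True).str.strip()
    return out
```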

Long story short: have a look! Here are some examples:

The "line-pie": the distribution of spoken lines among the 25 characters with the most lines in STNG:

Picard obviously had a lot to say…

[Figure: the line-pie]
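Counting lines per character is short work in Pandas. A minimal sketch, assuming a DataFrame `df` with one row per spoken line and a hypothetical `speaker` column (the function name is illustrative too):

```python
import pandas as pd


def top_speakers(df, top_n=25):
    """Number of spoken lines for the top_n most talkative characters."""
    # value_counts() sorts in descending order, so head() keeps the biggest talkers.
    return df["speaker"].value_counts().head(top_n)
```

`top_speakers(df).plot.pie()` then draws a pie chart (it needs matplotlib installed). The pie normalizes the values itself, so each slice is a character's share of the top 25's combined lines.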


The number of episodes in which each character had the most lines:

Not surprisingly, Picard dominated 76 episodes. But who was K’EHLEYR again??

CHARACTER    EPISODES
PICARD       76
DATA         20
RIKER        16
LAFORGE      10
CRUSHER       9
WORF          8
TROI          4
LWAXANA       3
BARCLAY       2
WESLEY        2
K'EHLEYR      2 (who was that again??)
CLARA         1
CLEMENS       1
CONOR         1
JEV           1
ARMUS         1
DURKEN        1
FAJO          1
AMANDA        1
JAMESON       1
JELLICO       1
MADRED        1
MARR          1
OKONA         1
PICARD JR     1
Q             1
RAL           1
RASMUSSEN     1
RIKER 2       1
RO            1
SALIA         1
SCOTT         1
SITO          1
SPOCK         1
ALKAR         1
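One way to get a table like this, assuming a DataFrame with one row per spoken line and hypothetical columns `episode` and `speaker`: count lines per (episode, speaker) pair, keep the biggest talker of each episode, then count how often each character "wins". This is a sketch, not necessarily the notebook's method; in particular, ties here go to whichever row `idxmax` sees first, which is the alphabetically first speaker.

```python
import pandas as pd


def top_speaker_per_episode(df):
    """Return a Series mapping each episode to the speaker with the most lines."""
    # Lines per (episode, speaker) pair, as a flat table with a count column "n".
    counts = df.groupby(["episode", "speaker"]).size().reset_index(name="n")
    # Row label of the largest count within each episode (first one on ties).
    best_rows = counts.groupby("episode")["n"].idxmax()
    return counts.loc[best_rows].set_index("episode")["speaker"]


def episodes_dominated(df):
    """How many episodes each character dominated, most first."""
    return top_speaker_per_episode(df).value_counts()
```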

Picard lost his words

In the last 50 episodes, Picard had noticeably more episodes with far fewer spoken lines than his average.

[Figure: Picard had more episodes with fewer lines in the end]
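"Far fewer lines than average" needs a concrete threshold, so the sketch below assumes one: an episode counts as low if the character has fewer than half of their average lines per episode (`factor=0.5`, an arbitrary choice), and episodes without any lines count as zero. The function name and the columns `episode` and `speaker` are hypothetical, and episode numbers are assumed to sort chronologically.

```python
import pandas as pd


def low_line_episodes(df, speaker, last_n=50, factor=0.5):
    """Count episodes in which `speaker` has fewer than factor * their average
    lines per episode, split into the last `last_n` episodes and all earlier ones."""
    episodes = sorted(df["episode"].unique())
    per_episode = (
        df[df["speaker"] == speaker]
        .groupby("episode")
        .size()
        # Episodes in which the speaker never appears count as zero lines.
        .reindex(episodes, fill_value=0)
    )
    low = per_episode < factor * per_episode.mean()
    return {
        "earlier": int(low.iloc[:-last_n].sum()),
        "last": int(low.iloc[-last_n:].sum()),
    }
```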

The "Crusher-Pulaski Gap": Episodes 26 to 47

And Pulaski was never seen again?

[Figure: the Crusher-Pulaski gap]
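Gaps like this one can be found by listing the runs of consecutive episodes in which a character has no lines at all. A minimal sketch, assuming hypothetical `episode` and `speaker` columns and episode numbers that sort chronologically; the function name is illustrative.

```python
import pandas as pd


def absence_runs(df, speaker):
    """List (first, last) ranges of consecutive episodes without a line by `speaker`."""
    episodes = sorted(int(ep) for ep in df["episode"].unique())
    spoke_in = {int(ep) for ep in df.loc[df["speaker"] == speaker, "episode"]}
    runs, start, prev = [], None, None
    for ep in episodes:
        if ep in spoke_in:
            if start is not None:      # a gap just ended
                runs.append((start, prev))
                start = None
        elif start is None:            # a gap just started
            start = ep
        prev = ep
    if start is not None:              # gap lasting until the final episode
        runs.append((start, prev))
    return runs
```

`absence_runs(df, "CRUSHER")` would then return a list of (first, last) episode pairs, one per stretch of absence.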


The "Timescape" Episode

The one where three main characters at once talk far more than their average:

[Figure: the Timescape episode]
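A sketch of how such episodes could be spotted, assuming hypothetical `episode` and `speaker` columns. "Far more than average" is interpreted here as more than 1.5 times the character's own mean lines per episode (`factor=1.5`); that factor, the function name, and the list of main characters passed in (for example `["PICARD", "RIKER", "DATA", "LAFORGE"]`) are all assumptions, not the values behind the plot.

```python
import pandas as pd


def standout_episodes(df, speakers, factor=1.5, min_count=3):
    """Episodes in which at least `min_count` of `speakers` have more than
    factor * their own average number of lines per episode."""
    # Episodes x speakers table of line counts; speakers never seen get zeros.
    table = pd.crosstab(df["episode"], df["speaker"]).reindex(columns=speakers, fill_value=0)
    # Compare each column against that speaker's own mean over all episodes.
    above = table > factor * table.mean()
    mask = above.sum(axis=1) >= min_count
    return list(mask[mask].index)
```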

What you always wanted to know about Wesley

[Figure: evaluating Wesley]

And more, and more, and more diagrams and insights in the IPython notebook: github.com/…/startrekng-episodes-analysis_02.ipynb

Have fun! Any feedback is more than welcome!