Kopfknacker

~ breaking my head...

Analysing DATA2 – Star Trek and Predict Who Said What via Multinomial Naive Bayes

Sunday, 24 Jan 2016

Posted by Christoph Diefenthal in Artificial Intelligence, Data Analytics, Technology


Tags

AI, artificial intelligence, machinelearning, multinomial naive bayes, naive bayes, nlp, picard, star trek, stng, tfidf

In Part 1 of analysing the "Star Trek: The Next Generation" transcripts, I performed some statistical analysis of the characters and episodes: who has the most lines, who appears in which episodes, and so on.

In my new IPython notebook I concentrated on the text itself. This was actually my original motivation for working with the Star Trek: The Next Generation transcripts from chakoteya.net: I wanted to try out some machine learning algorithms. I came up with the idea of predicting which STNG character is the speaker of a given line of text, or even of a single word.

Predicting Who Said What

I would say the results are pretty convincing if you look at some phrases:

  • "My calculations are correct" is ascribed to Data with 78% probability.
  • Who would have doubted that it is Troi who utters a sentence like "Captain, I’m sensing a powerful mind." (73% probability)?
  • And who would use the word "Mom"? Obviously Wesley, with 88% probability.
  • "Mother", on the other hand, is a word used by Deanna Troi with 60% probability.
  • "Deanna!" is used by Riker, though not that exclusively (just 48% probability).
  • And he is called "Number One" by none other than Picard, with almost 100% probability.

Some more examples:

[Image: some_predictions]

Also, each character's most-used words are very characteristic of the characters as we know them:

[Image: most_used_words]

But have a look for yourself.
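How the notebook computes the most-used words is not shown in this post. As a rough sketch, assuming the data is available as (speaker, line) pairs, one simple approach is to count words per speaker after dropping a few stop words. The `dialogue` list, the `STOP_WORDS` set and the `most_used_words` helper below are invented for illustration and are not taken from the transcripts or the notebook.

```python
import re
from collections import Counter, defaultdict

# Invented (speaker, line) pairs, for illustration only.
dialogue = [
    ("DATA", "My calculations are correct."),
    ("DATA", "Captain, my calculations are complete."),
    ("PICARD", "Make it so, Number One."),
    ("PICARD", "Number One, make your report."),
    ("WESLEY", "Mom, I can help."),
]

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"my", "are", "it", "so", "i", "can", "your"}


def most_used_words(pairs, top_n=3):
    """Return the top_n most frequent non-stop words for each speaker."""
    counts = defaultdict(Counter)
    for speaker, text in pairs:
        # Lowercase the line and split it into simple word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        counts[speaker].update(w for w in words if w not in STOP_WORDS)
    return {
        speaker: [word for word, _ in counter.most_common(top_n)]
        for speaker, counter in counts.items()
    }


for speaker, words in most_used_words(dialogue).items():
    print(speaker, words)
```

On the real transcripts a proper stop-word list (or tf-idf weighting) would matter much more than in this toy example, because the most frequent raw words are the same for every character.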

How to get there

Getting there involved several steps, all of which were really good practice in Python, NumPy, pandas and sklearn.

  1. I had to download and clean the data, which I did in startrekng-episodes-analysis_01.ipynb.
  2. I did some statistical analysis of the dataset with Python, NumPy and pandas in startrekng-episodes-analysis_02.ipynb.
  3. Finally, I arrived at startrekng-episodes-analysis_03.ipynb, where I concentrate on predicting the speakers using two techniques: "Term Frequency – Inverse Document Frequency" weighting and "Multinomial Naive Bayes" classification ("sklearn.feature_extraction.text.TfidfVectorizer" and "sklearn.naive_bayes.MultinomialNB").
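To give an idea of how the two sklearn classes from step 3 fit together, here is a minimal sketch. It is not the notebook's code: the tiny corpus is invented rather than taken from the transcripts, the `who_said` helper is a hypothetical name, and both classes are used with their default parameters, which is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy corpus, for illustration only (not real transcript lines).
lines = [
    "my calculations are correct",
    "the calculations are complete, captain",
    "sensors detect an android",
    "make it so, number one",
    "number one, you have the bridge",
    "engage",
    "captain, I am sensing great fear",
    "I sense a powerful mind",
    "mother, please stop",
]
speakers = ["DATA"] * 3 + ["PICARD"] * 3 + ["TROI"] * 3

# Step 1: turn each line into a tf-idf weighted bag-of-words vector.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(lines)

# Step 2: fit a Multinomial Naive Bayes classifier on those vectors.
clf = MultinomialNB()
clf.fit(X, speakers)


def who_said(text):
    """Return (speaker, probability) pairs, most likely speaker first."""
    probs = clf.predict_proba(vectorizer.transform([text]))[0]
    return sorted(zip(clf.classes_, probs), key=lambda pair: pair[1], reverse=True)


for speaker, prob in who_said("number one"):
    print(f"{speaker}: {prob:.0%}")
```

The percentages quoted earlier in the post correspond to this kind of `predict_proba` output, one probability per known speaker. With only nine invented lines the numbers here are of course meaningless; the point is only the shape of the workflow.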

Practical Background

To do all this, I learned a lot from the "pandas" book and from the scikit-learn examples such as the MLComp text classification example. And obviously almost nothing could have been done without Stack Overflow.

Theoretical Background

If anyone is interested in the foundations of the algorithms, I recommend the Coursera MOOC "Probabilistic Graphical Models" by Stanford professor Daphne Koller and the "Natural Language Processing" course by Dan Jurafsky and Christopher Manning. The Udacity course on Machine Learning is also pretty helpful.
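As a compact reminder of what the two techniques compute, here are the standard textbook formulas, written with scikit-learn's default settings in mind (smoothed idf, additive smoothing with alpha = 1). The symbols are chosen for this summary only: n is the number of lines, df(t) the number of lines containing term t, x_t the weight of term t in the line being classified, N_ct the total weight of term t over all lines of speaker c, N_c the sum of N_ct over all terms, and |V| the vocabulary size.

```latex
% tf-idf weight of term t in line d
% (scikit-learn then L2-normalises each line's vector by default)
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \left( \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1 \right)

% Multinomial Naive Bayes: posterior of speaker c for line d
P(c \mid d) \;\propto\; P(c) \prod_{t} P(t \mid c)^{x_t},
\qquad
\hat{P}(t \mid c) = \frac{N_{ct} + \alpha}{N_c + \alpha\,|V|}
```

Strictly speaking, the multinomial model is defined for integer word counts; feeding it fractional tf-idf weights, as done here, is a common practical shortcut rather than something the model derives.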
