Automatic Extraction of Topics from Documents: Five Probabilistic Topic Model Tests
(Volume 2, Issue 11, November 2016) OPEN ACCESS
Author(s):
Sandra Jhean-Larose, Nicolas Leveau, Guy Denhiere, Ba-Linh Nguyen
Abstract:
In this paper, we test the capability of the Topic model to extract topics from documents (Griffiths & Steyvers, 2003, 2004; Griffiths, Steyvers & Tenenbaum, 2007). After presenting the mathematical aspects of the model and demonstrating its behavior on a small corpus, we attempt to falsify the model by manipulating (i) the size of, and similarity between, the sub-corpora, (ii) the relative weight of the sub-corpora, and (iii) the model's permeability to the scope and nature of contexts added to a fixed corpus. The model passed all five tests, demonstrating, first, that the extracted topics were relevant and congruent with the content of the corpus and, second, that their probabilities appropriately reflected the relative weight of the sub-corpora.