A Detailed Stylometric Investigation of the İnce Memed Tetralogy*
Abstract
We analyze the four İnce Memed novels of Yaşar Kemal using six style markers: “most frequent words,” “syllable counts,” “word type (or part of speech) information,” “sentence length in terms of words,” “word length in text,” and “word length in vocabulary.” For the analysis we divide each novel into text blocks of five thousand words and count the frequencies of each style marker in these blocks. The principal component analysis results show a clear separation between the first two and the last two volumes; the blocks of the first two novels are also distinguishable from each other, whereas the blocks of the last two volumes are intermixed. This parallels the fact that the author planned the last two volumes as three separate novels, but later condensed them into two. The style markers showing the best separation are “most frequent words” and “sentence length.” We use stepwise discriminant analysis to determine the best discriminators within each style marker and then use them in cross-validation. The cross-validation results agree with the principal component analysis results: for example, “most frequent words” and “sentence length” assign 87% and 81% of the text blocks, respectively, to their correct volumes. Further investigation based on multivariate analysis of variance (MANOVA) reveals how the attributes of each style marker group distinguish among the volumes.
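The block-based counting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice to drop a short final block, the `max_len` cap on word length, and the assumption that the text is already tokenized and case-normalized are all illustrative assumptions. It covers only two of the six markers (“word length in text” and “most frequent words”).

```python
from collections import Counter


def split_into_blocks(words, block_size=5000):
    """Split a token list into consecutive blocks of block_size words.

    Assumption: a short trailing remainder is dropped, so every block
    has the same length and the frequencies are comparable.
    """
    n_full = len(words) // block_size
    return [words[i * block_size:(i + 1) * block_size] for i in range(n_full)]


def word_length_profile(block, max_len=10):
    """Relative frequency of word lengths 1..max_len within one block.

    Words longer than max_len are counted in the last bin (an assumption
    made here to keep the feature vector at a fixed size).
    """
    counts = Counter(min(len(w), max_len) for w in block)
    return [counts.get(k, 0) / len(block) for k in range(1, max_len + 1)]


def frequent_word_profile(block, vocabulary):
    """Relative frequency of each word of a fixed vocabulary in one block.

    The vocabulary (e.g. the most frequent words of the whole corpus)
    is supplied by the caller; tokens are assumed to be pre-normalized.
    """
    counts = Counter(block)
    return [counts.get(v, 0) / len(block) for v in vocabulary]
```

Each block then yields one fixed-length feature vector per style marker, and those vectors are the kind of input that principal component analysis or discriminant analysis would operate on.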