Title: Entropic analysis of the role of words in literary texts

Abstract: Beyond the local constraints imposed by grammar, words concatenated in long sequences carrying a complex message show statistical regularities that may reflect their linguistic role in the message. In this paper, we perform a systematic statistical analysis of the use of words in literary English corpora. We show that there is a quantitative relation between the role of content words in literary English and the Shannon information entropy defined over an appropriate probability distribution. Without assuming any previous knowledge about the syntactic structure of language, we are able to cluster certain groups of words according to their specific role in the text.
Comments: 9 pages, 5 figures
Subjects: Statistical Mechanics (cond-mat.stat-mech); Computation and Language (cs.CL)
Cite as: arXiv:cond-mat/0109218v1 [cond-mat.stat-mech]

Submission history

From: Damián H. Zanette
[v1] Wed, 12 Sep 2001 18:08:07 GMT (180kb)
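
The abstract refers to a Shannon entropy "defined over an appropriate probability distribution" but does not say which distribution is meant. As a minimal sketch under one plausible assumption, the text can be split into equal-length parts and, for each word, the fraction of its occurrences falling in each part taken as the distribution. The function name `word_entropies`, the `num_parts` parameter, and the normalization by the logarithm of the number of parts are illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter


def word_entropies(tokens, num_parts):
    """Normalized Shannon entropy of each word's spread over text parts.

    ASSUMPTION (not stated in the abstract): the text is cut into
    `num_parts` consecutive parts of equal length, and a word's
    distribution is p_i = n_i / n, where n_i is its count in part i
    and n is its total count.
    """
    if num_parts < 2:
        raise ValueError("num_parts must be at least 2")

    # Size of each part; the last part may be shorter.
    part_size = math.ceil(len(tokens) / num_parts)

    # word -> Counter mapping part index -> occurrences in that part
    counts = {}
    for position, word in enumerate(tokens):
        part = position // part_size
        counts.setdefault(word, Counter())[part] += 1

    entropies = {}
    for word, per_part in counts.items():
        total = sum(per_part.values())
        # Shannon entropy; parts with zero occurrences contribute nothing.
        s = -sum((n / total) * math.log(n / total) for n in per_part.values())
        # Divide by log(num_parts) so values lie between 0 and 1.
        entropies[word] = s / math.log(num_parts)
    return entropies


tokens = ["the", "whale", "the", "sea", "the", "ship", "the", "ship"]
result = word_entropies(tokens, num_parts=4)
for word in sorted(result):
    print(word, round(result[word], 3))
```

Under this assumption, a word spread evenly over the whole text (here "the") has entropy close to 1, while a word confined to a single part (here "whale") has entropy 0. Words that cluster in particular regions fall in between, which is the kind of contrast the abstract attributes to differences in linguistic role.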