In corpus linguistics, a hapax legomenon (/ˈhæpəks lɪˈɡɒmɪnɒn/, also /ˈhæpæks/ or /ˈheɪpæks/; pl. hapax legomena; sometimes abbreviated to hapax, plural hapaxes) is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text. The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works but more than once in that particular work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "being said once".

The related terms dis legomenon, tris legomenon, and tetrakis legomenon (/ˈdɪs/, /ˈtrɪs/, /ˈtɛtrəkɪs/) refer respectively to double, triple, or quadruple occurrences, but are far less commonly used.

Hapax legomena are quite common, as predicted by Zipf's law, which states that the frequency of any word in a corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the distinct words are hapax legomena, and another 10% to 15% are dis legomena. Thus, in the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus.

Hapax legomenon refers to the appearance of a word or an expression in a body of text, not to either its origin or its prevalence in speech. It thus differs from a nonce word, which may never be recorded, may find currency and be widely recorded, or may appear several times in the work that coins it.

Hapax legomena in ancient texts are usually difficult to decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical (particularly Hebrew) hapax legomena sometimes pose problems in translation. Hapax legomena also pose challenges in natural language processing.

Some scholars consider hapax legomena useful in determining the authorship of written works. Harrison, in The Problem of the Pastoral Epistles (1921), made hapax legomena popular among Bible scholars when he argued that there are considerably more of them in the three Pastoral Epistles than in the other Pauline Epistles. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual.

Figure: Rank-frequency plot for words in the novel Moby-Dick. About 44% of the distinct words in this novel, such as "matrimonial", occur only once, and so are hapax legomena (red). About 17%, such as "dexterity", appear twice (so-called dis legomena, in blue). Zipf's law predicts that the words in this plot, drawn on log-log axes, should approximate a straight line with slope -1.
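Proportions like the ones quoted for Moby-Dick can be checked on any text with a short script. The following is a minimal Python sketch, not the method used to produce the plot. The tokenizer and the function names `tokenize`, `legomena`, and `share_of_vocabulary` are assumptions made for illustration, and a different definition of "word" will give different counts.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "word" is a lowercase run of letters, optionally with
    # internal apostrophes. Real corpora need a more careful tokenizer.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def legomena(text, occurrences=1):
    """Return the sorted distinct words that occur exactly `occurrences` times."""
    counts = Counter(tokenize(text))
    return sorted(word for word, n in counts.items() if n == occurrences)


def share_of_vocabulary(text, occurrences=1):
    """Return the fraction of distinct words occurring exactly `occurrences` times."""
    counts = Counter(tokenize(text))
    if not counts:
        return 0.0
    matching = sum(1 for n in counts.values() if n == occurrences)
    return matching / len(counts)


# Toy sample (hypothetical text, not taken from any corpus).
sample = "the whale and the sea and the ship sailed on the sea"

print(legomena(sample, 1))  # hapax legomena: ['on', 'sailed', 'ship', 'whale']
print(legomena(sample, 2))  # dis legomena: ['and', 'sea']
print(share_of_vocabulary(sample, 1))  # share of distinct words seen once
```

Both functions count against the set of distinct words rather than the running word count, which matches how the percentages above are stated. Passing `occurrences=3` or `occurrences=4` gives the tris and tetrakis legomena in the same way.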