A grapheme is the smallest semantically distinguishing unit in a written language, analogous to the phonemes of spoken languages. A grapheme may or may not carry meaning by itself, and may or may not correspond to a single phoneme. Graphemes include alphabetic letters, typographic ligatures, Chinese characters, numerical digits, punctuation marks, and other individual symbols of any of the world's writing systems.
A grapheme is an abstract concept, similar to a character in computing. A glyph is a specific shape that represents that grapheme in a specific typeface. For example, the abstract concept of "the Arabic numeral one" is a grapheme, which is realized by two different glyphs (allographs) in the fonts Times New Roman and Helvetica.
Graphemes are often notated within angle brackets, as ⟨a⟩, ⟨B⟩, etc. This is analogous to the slash notation (/a/, /b/) used for phonemes, and the square bracket notation used for phonetic transcriptions ([a], [b]).
Glyphs and allographs
In the same way that the surface forms of phonemes are speech sounds or phones (and different phones representing the same phoneme are called allophones), the surface forms of graphemes are glyphs (sometimes "graphs"), namely concrete written representations of symbols, and different glyphs representing the same grapheme are called allographs. Hence a grapheme can be regarded as an abstraction of a collection of glyphs that are all semantically equivalent.
For example, in written English (or other languages using the Latin alphabet), there are many different physical representations of the lowercase letter "a", such as a, ɑ, etc. But because the substitution of any of these for any other cannot change the meaning of a word, they are considered to be allographs of the same grapheme, which can be written ⟨a⟩. Italic and bold face are also allographic.
There is some disagreement as to whether capital and lower-case letters are allographs or distinct graphemes. Capitals generally occur in certain triggering contexts that do not change the word: in a proper name, for example, at the beginning of a sentence, or in an all-caps newspaper headline. Some linguists consider digraphs like the ⟨sh⟩ in ship to be distinct graphemes, but these are generally analyzed as sequences of graphemes. Ligatures such as ⟨æ⟩, however, are distinct graphemes, as are various letters with distinctive diacritics, such as ⟨ç⟩.
Types of graphemes
The principal types of graphemes are logograms, which represent words or morphemes (for example Chinese characters, the ampersand & representing the English word and, and Arabic numerals); syllabic characters, which represent syllables (as in Japanese kana); and alphabetic letters, which correspond roughly to phonemes (see next section). For a full discussion of the different types, see Writing system: Functional classification of writing systems.
Not all graphemes are phonographic (that is, write sounds). There are additional graphemic components used in writing, such as punctuation marks, mathematical symbols, word dividers such as the space, and other typographic symbols.
Correspondence between graphemes and phonemes
As mentioned in the previous section, in languages that use alphabetic writing systems, the graphemes stand in principle for the phonemes (significant sounds) of the language. In practice, however, the orthographies of such languages deviate at least somewhat from the ideal of exact grapheme–phoneme correspondence. A phoneme may be represented by a multigraph (a sequence of more than one grapheme), as the digraph sh represents a single sound in English. Conversely, a single grapheme may sometimes represent more than one phoneme, as with the Russian letter я. Some graphemes may not represent any sound at all (like the b in English debt), and the rules of correspondence between graphemes and phonemes often become complex or irregular, particularly as a result of historical sound changes that are not necessarily reflected in spelling. "Shallow" orthographies such as those of standard Spanish and Finnish have relatively regular (though not always one-to-one) correspondence between graphemes and phonemes, while those of French and English have much less regular correspondence and are known as deep orthographies.
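The two kinds of deviation described above (a multigraph standing for one phoneme, and one grapheme standing for several) can be illustrated with a minimal Python sketch. The rule table `TOY_RULES` and the function `to_phonemes` are hypothetical names invented for this illustration, and the table is a toy covering only a few symbols, not a description of any real orthography.

```python
# Toy grapheme-to-phoneme table (illustrative assumption, not a real orthography).
# Keys are graphemes or multigraphs; values are the phoneme sequences they stand for.
TOY_RULES = {
    "sh": ["ʃ"],       # a digraph representing a single phoneme
    "я": ["j", "a"],   # a single grapheme representing two phonemes
    "s": ["s"],
    "h": ["h"],
    "i": ["ɪ"],
    "p": ["p"],
}

# Try longer units first so that "sh" is matched before "s".
UNITS = sorted(TOY_RULES, key=len, reverse=True)


def to_phonemes(word):
    """Convert a word to phonemes by greedy longest-match lookup."""
    phonemes = []
    i = 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                phonemes.extend(TOY_RULES[unit])
                i += len(unit)
                break
        else:
            # No rule covers the symbol at position i.
            raise ValueError(f"no rule for {word[i]!r}")
    return phonemes


print(to_phonemes("ship"))  # ['ʃ', 'ɪ', 'p']
print(to_phonemes("я"))     # ['j', 'a']
```

A fixed table of this kind suits only a shallow orthography. Irregular spellings such as the silent b in debt cannot be captured by symbol-level rules alone and would need word-specific handling.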
Multigraphs representing a single phoneme are normally treated as combinations of separate letters, not as graphemes in their own right. However, in some languages a multigraph may be treated as a single unit for the purposes of collation; for example, in a Czech dictionary, the section for words that start with ⟨ch⟩ comes after that for ⟨h⟩. For more examples, see Alphabetical order: Language-specific conventions.
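The Czech collation convention can be sketched in Python as a sort key that treats ⟨ch⟩ as one unit ranked after ⟨h⟩. This is a simplified illustration, not a full Czech collation algorithm: it assumes lowercase input restricted to the unaccented letters a to z, and the names `ALPHABET`, `collation_units` and `czech_key` are hypothetical.

```python
# Simplified collation order: "ch" is a single unit placed after "h".
# Assumption: letters with diacritics are not handled in this sketch.
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
            "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {unit: position for position, unit in enumerate(ALPHABET)}


def collation_units(word):
    """Split a word into collation units, matching "ch" before single letters."""
    units = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            units.append("ch")
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def czech_key(word):
    """Sort key: the tuple of ranks of the word's collation units."""
    return tuple(RANK[unit] for unit in collation_units(word.lower()))


words = ["chata", "cena", "hora", "jablko", "chleba", "hrad"]
sorted_words = sorted(words, key=czech_key)
print(sorted_words)
# ['cena', 'hora', 'hrad', 'chata', 'chleba', 'jablko']
```

With plain code-point sorting, the words beginning with ch would instead fall between cena and hora, because ch would be compared as the two letters c and h.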