Alphabets: A Brief Introduction

An alphabet is the ordered, standardized set of letters used to write or print a language.

A letter is a character in an alphabet that represents one or more alternative phonemes (i.e., the fundamental sounds of a spoken language), either on its own or in combination with other letters. A character is any letter, symbol or mark used in writing or printing a language. In addition to the letters of alphabets, characters include numerals, punctuation marks and the symbols used by non-alphabetic writing systems.

Alphabets are by far the most common of the several types of systems, also called scripts, that are used to write languages. Most alphabets, including that used by the English language, are based on the Roman alphabet (also referred to as the Latin alphabet), which was first used by the ancient Romans to write Latin.

Origin of Alphabets

The word alphabet is derived from the first two letters of the Greek alphabet, alpha and beta.

The oldest known writing system is cuneiform, named after the wedge-like shapes of its characters, which were pressed into clay tablets with reed styluses. It emerged in Sumer (in the southern part of what is now Iraq) more than 5,000 years ago and was followed closely by the development of writing in Egypt and the Indus valley (in what is now Pakistan and northwestern India).

Most scholars believe that the first alphabets originated in the Near East, perhaps evolving from, or at least being influenced by, cuneiform or Egyptian hieroglyphs. The first widely used alphabet appears to have been that of the Phoenicians (who originated in what is now Lebanon), which was in use by at least 1200 BC. This alphabet contained 22 letters for consonant sounds and had no letters for vowels (as is the case with the Hebrew and Arabic alphabets, which descended from it). The Phoenicians spread their alphabet around the Mediterranean, including to the Greeks and the Etruscans (who preceded the Romans in Italy).

The Roman alphabet was adapted mainly from the Etruscan alphabet during the 7th century BC; exceptions included Y and Z, which were taken from the Greek alphabet. The Roman alphabet had 23 letters. As with the Phoenician, Greek and other early alphabets, there were no lower case (i.e., small) letters, only upper case (i.e., capital) letters. Also, there was no punctuation, and there were no spaces between words. The Romans wrote numbers with seven letters of the alphabet (i.e., Roman numerals) rather than with the Arabic numerals that are almost universally used today.
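As a small illustration of how seven letters can stand in for numbers, the following Python sketch reads a Roman numeral. It assumes the subtractive convention as it is commonly taught today (a smaller value written before a larger one is subtracted), which is not necessarily how every ancient inscription was written. The function name roman_to_int is made up for this example, and the code does not check that its input is a well-formed numeral.

```python
# The seven letters used as Roman numerals and their values.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral to an integer (no validation of the input)."""
    values = [ROMAN_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # Assumed modern convention: a smaller value placed directly before
        # a larger one is subtracted (IV = 4, IX = 9, and so on).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


print(roman_to_int("XXIII"))  # 23
print(roman_to_int("MMIV"))   # 2004
```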

The modern Roman alphabet as used to write the English language contains 26 letters, each in an upper case and a lower case version. Other characters used by English include the Arabic numerals, punctuation marks and a variety of symbols (e.g., the ampersand, the equals sign and the dollar sign). Many other modern languages that use the Roman alphabet add a variety of accents to some of the basic letters, and some also add a few extra letters.


Phonemes

A phoneme is a basic sound, or range of similar sounds, that can distinguish words in a given language. That is, changing one phoneme in a word can produce another word or make the word unintelligible. For example, changing the first phoneme of the word cat can produce a word with a very different meaning, such as rat.

Some phonemes in a particular language can be pronounced in slightly different ways and still be recognized as those phonemes. However, the range of sounds that constitutes a single phoneme in one language may contain multiple phonemes of another language. For example, in the English language the letters l and r represent two different phonemes. Consequently, there is usually no confusion when a native speaker hears the words led and red spoken. However, in the Japanese language these are not distinctive sounds, and there is a single phoneme which includes the range of sounds between the sounds represented by the English characters l and r. Thus there are no characters in the Japanese writing system which can specifically represent an l as opposed to an r; there are only characters that represent both sounds inclusive of intermediate sounds.
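The cat/rat and led/red examples can be sketched in a few lines of Python. This is only a toy model: words are written as tuples of informal sound labels (not IPA), the function names are invented for the illustration, and the merged inventory is a hypothetical stand-in for a language that treats l and r as a single phoneme.

```python
def is_minimal_pair(a, b):
    """True if two sound sequences differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def merge_sounds(word, merged):
    """Rewrite a word for an inventory in which some sounds share one phoneme."""
    return tuple(merged.get(sound, sound) for sound in word)


# Informal, hypothetical transcriptions of the words used in the text.
CAT, RAT = ("k", "a", "t"), ("r", "a", "t")
LED, RED = ("l", "e", "d"), ("r", "e", "d")

# Hypothetical inventory in which l and r fall under one phoneme, labelled "R".
L_R_MERGED = {"l": "R", "r": "R"}

print(is_minimal_pair(CAT, RAT))  # True
print(is_minimal_pair(LED, RED))  # True
# With l and r merged, led and red are no longer distinguishable.
print(merge_sounds(LED, L_R_MERGED) == merge_sounds(RED, L_R_MERGED))  # True
```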

The number of phonemes varies widely according to the language. Languages can contain from two to 30 vowels and from five to more than 100 consonants. The English language has approximately 41 phonemes (depending upon the dialect), which is above average because of its relatively large number of vowel phonemes, at 13. At the extremes are Pirahã (an indigenous language of Brazil), which has only 10 phonemes, and !Xóõ (an indigenous language of Botswana and Namibia), which has 141!

The International Phonetic Alphabet (IPA) was devised in the 1880s as a means of representing all of the several hundred phonemes that are used by the world's currently spoken languages. It was based on the Roman alphabet but added a number of letters, including variations of existing letters. The IPA is used by linguists as a basis for describing the sounds of languages and is also used in some dictionaries and textbooks to indicate pronunciation.

Phonemes and Letters

In most written languages there is not a one-to-one correspondence between letters and phonemes. That is, (1) some letters can represent more than one phoneme (though only one at a time), and (2) some phonemes can be represented by alternative letters or combinations of letters.

For example, in English the letter c represents multiple sounds, as in cat and face. Also, the c sound in cat is written with a k in many words, such as cake, and the c sound in face is written with an s in many words, such as vase. This is in contrast to some letters (such as b) that represent only a single sound and whose sound can be represented only by that letter. Some letters represent only a single sound when used by themselves but indicate a different sound when used in combination with certain other letters; e.g., the g and h in laugh indicate an f sound.

A language in which each letter represents only one phoneme and every phoneme is represented by only one letter is often popularly (but not accurately) called a phonetic language. In such a language, a writer could predict the spelling of a word given its pronunciation, and a speaker could predict the pronunciation of a word given its spelling.
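The idea of a one-to-one correspondence can be checked mechanically. The Python sketch below records a handful of (letter, sound) pairs and tests whether every letter has exactly one sound and every sound exactly one letter. The pairs and their informal sound labels are illustrative assumptions drawn from examples such as cat, face, cake and vase; they are not a real phonemic analysis of English, and the function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical toy data: (letter, sound) pairs with informal sound labels.
OBSERVATIONS = [
    ("c", "k"),  # c as in cat
    ("c", "s"),  # c as in face
    ("k", "k"),  # k as in cake
    ("s", "s"),  # s as in vase
    ("b", "b"),  # b, treated here as unambiguous
]


def build_maps(pairs):
    """Return (letter -> set of sounds, sound -> set of letters)."""
    letter_to_sounds = defaultdict(set)
    sound_to_letters = defaultdict(set)
    for letter, sound in pairs:
        letter_to_sounds[letter].add(sound)
        sound_to_letters[sound].add(letter)
    return dict(letter_to_sounds), dict(sound_to_letters)


def is_one_to_one(pairs):
    """True if each letter has one sound and each sound has one letter."""
    letter_to_sounds, sound_to_letters = build_maps(pairs)
    return (all(len(s) == 1 for s in letter_to_sounds.values())
            and all(len(l) == 1 for l in sound_to_letters.values()))


letter_to_sounds, sound_to_letters = build_maps(OBSERVATIONS)
print(sorted(letter_to_sounds["c"]))            # ['k', 's']
print(sorted(sound_to_letters["k"]))            # ['c', 'k']
print(is_one_to_one(OBSERVATIONS))              # False
print(is_one_to_one([("b", "b"), ("m", "m")]))  # True
```

When the check succeeds, spelling can be predicted from pronunciation and pronunciation from spelling, because both mappings can be followed in either direction without ambiguity.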

Some languages, such as Italian, Spanish and Finnish (the main language of Finland), have a very regular spelling system with close to a one-to-one correspondence between letters and phonemes. In fact, the Italian language has no verb corresponding to spell, because a correct pronunciation exactly corresponds to a correct spelling. In standard Spanish, it is possible to predict the pronunciation of a word from its spelling because each letter can only produce a single phoneme; however, it is not always possible to predict the spelling of a word from its pronunciation because some phonemes are represented in more than one way.

Were English to have an alphabet with a one-to-one correspondence between letters and phonemes (as has been proposed from time to time), there would be 41 letters (i.e., one for each of the 41 phonemes in the language). However, the English alphabet as it is currently used provides more than 500 ways to represent these 41 phonemes.

The relationship between the alphabet of a language and the phonemes of that language can be complicated by dialects. That is, different dialects often use different phonemes for the same word but such words usually retain the same spelling.

Comparisons of Alphabets

Among languages that are written with alphabets, the number of letters varies considerably. The smallest known alphabet is that of the Rotokas language (spoken in Bougainville, an island to the east of the Papua New Guinea mainland), which contains only eleven letters. The largest known alphabet is Armenian with 39 letters.

English and most European languages are written with alphabets based on the Roman alphabet; the major exceptions in Europe today are the Greek and the Cyrillic alphabets.

The Cyrillic alphabet traces its roots back to the development of the Glagolitic alphabet in the ninth century. It was derived mainly from the Greek alphabet by Saint Cyril and his brother Saint Methodius, both of whom were Greek monks, for use in writing religious books in the Slavic languages (a group of closely related Eastern European languages, the most widely spoken of which is Russian). The Cyrillic alphabet has been used to write more than 50 languages, including Russian, Ukrainian, Bulgarian, Serbian and other Slavic languages as well as many non-Slavic languages of the former Soviet Union (and even extending to Asiatic languages such as Mongolian). The Cyrillic alphabet as used in modern Russian contains 33 letters.

The Arabic alphabet is the world's second most widely used alphabet, and it is likewise a descendant of the Phoenician alphabet. It contains 28 basic letters and is written from right to left. There is no difference between written and printed letters, and there is no concept of upper and lower case. Most of the letters are attached to one another, even when printed, and their appearance changes according to whether they are preceded or followed by other letters or stand alone. Short vowels are usually not written. The Arabic alphabet is used to write about a hundred languages including Arabic, Kurdish (spoken by the Kurds), Persian, Pashto (spoken in Afghanistan) and Urdu (the main language of Pakistan).

There are numerous other alphabets in use today that are descended from the Phoenician alphabet but not from the Roman alphabet. A few of the more widely used ones are Armenian, Burmese, Devanagari (used by Hindi, Nepali and some other Indian languages), Georgian, Hebrew, Mongolian (the traditional, pre-Cyrillic alphabet, which is currently being revived), Thai and Tibetan. A rare example of a modern alphabet that is not descended from the Phoenician alphabet is Hangul (although its developer may have been influenced by Phoenician-descended alphabets), which is used to write Korean.

Alternatives to Alphabets

In addition to alphabets, there are two other major types of writing systems: syllabic and logographic.

A syllabary is a set of characters that represent (or approximate) the syllables of a language, with one distinct character for each possible syllable. A syllable is the next largest unit of sound in a language after a phoneme; it consists of a vowel sound or a consonant-vowel combination. Syllabaries typically contain many more characters than do alphabets.

Syllabaries are best suited to languages that have relatively simple syllable structures, such as Japanese, which has only about a hundred syllables. The English language, in contrast, contains thousands of syllables as a result of its relatively large number of vowels and its complex consonant clusters. To write English using a syllabary, every possible syllable in English would have to have a separate character, which would be quite cumbersome to remember and use (and which would provide no real advantage).
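A rough count shows why a syllabary grows so quickly as syllable structure becomes more complex. The Python sketch below multiplies the available choices for the start, middle and end of a syllable, under the simplifying assumption that every combination is allowed, so it gives an upper bound rather than a real inventory. The consonant counts are hypothetical placeholders, not measurements of any language; only the figure of 13 vowel phonemes for English comes from this article. The function name is invented for the example.

```python
def syllable_inventory_size(onsets, vowels, codas=0):
    """Upper bound on distinct syllables of the form (onset) + vowel + (coda).

    onsets and codas are the numbers of distinct consonants or consonant
    clusters allowed before and after the vowel. The + 1 in each factor
    covers the option of having no onset or no coda at all.
    """
    return (onsets + 1) * vowels * (codas + 1)


# Hypothetical simple language: 14 possible onsets, 5 vowels, no codas.
print(syllable_inventory_size(14, 5))       # 75

# Hypothetical complex language: 20 onsets, 13 vowels, 20 codas.
print(syllable_inventory_size(20, 13, 20))  # 5733
```

Even with these modest placeholder numbers, allowing consonants at the end of a syllable and adding vowels pushes the count from under a hundred into the thousands.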

The third major type of writing system, logographic, employs characters that represent objects or abstract ideas. The most important modern logographic writing system by far is Chinese, whose characters are also used, with varying degrees of modification, in Japanese and Korean (as a supplement to Hangul) and were formerly used in Vietnamese. The ancient Egyptians, the Sumerians and the Mayans also used logographic systems.

A major feature of logographic writing systems is the large number of characters required. This is particularly true for Chinese, which has more than 40,000 characters (although a well-educated person may know only about 5,000 of these), in sharp contrast to the few dozen characters that most of the world's languages each use. In fact, Chinese accounts for the majority of characters that are known to have ever existed.

Actually, the Chinese writing system is not purely logographic. This is because individual characters are often compounds which consist of an element that represents the meaning and an element that represents the pronunciation. Also, combinations of characters are sometimes used mainly for their phonetic values to represent proper nouns (e.g., names of people or places) from other languages.

Likewise, alphabetic and syllabic scripts frequently make some use of logograms and logographic values. The most common example is Arabic numerals, each of which has the same meaning regardless of which language or dialect it is used in and how it is pronounced. Other examples are symbols such as the ampersand and dollar sign. Also, individual letters sometimes have more than just a phonetic value: for example, in the English language the letter A often indicates high quality and the letter X sometimes indicates the unknown or an adult rating.

Created July 20, 2004.
Copyright © 2004 - 2006 The Linux Information Project. All Rights Reserved.