Character encoding

A character encoding is a code that pairs a set of natural language characters (such as an alphabet or syllabary) with a set of something else, such as numbers or electrical pulses. Common examples include Morse code, which encodes letters of the Roman alphabet as series of long and short depressions of a telegraph key; and ASCII, which encodes letters, numerals, and other symbols as both integers and 7-bit binary versions of those integers.
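The ASCII pairing of characters with integers and 7-bit binary values can be seen directly in a minimal Python sketch (Python 3.9 or later assumed). The helper name `ascii_codes` is hypothetical and used only for illustration. It relies on the fact that, for ASCII text, Python's `ord` returns the ASCII code.

```python
def ascii_codes(text: str) -> list[tuple[str, int, str]]:
    """Return (character, integer code, 7-bit binary string) for each ASCII character."""
    result = []
    for ch in text:
        code = ord(ch)  # Unicode code point; equal to the ASCII code for ASCII characters
        if code > 0x7F:
            raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
        result.append((ch, code, format(code, "07b")))  # zero-padded to 7 bits
    return result


for ch, code, bits in ascii_codes("Hi!"):
    print(ch, code, bits)
```

For "Hi!" this prints the integers 72, 105 and 33 alongside their 7-bit forms 1001000, 1101001 and 0100001.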

In some contexts (especially computer storage and communication) it is useful to distinguish a character repertoire, the full set of abstract characters that a system supports, from a coded character set (or character encoding), which specifies how each character in that repertoire is represented by an integer code.

In the early days of computing, most systems supported only the character repertoire of ASCII. This soon proved inadequate, and a number of ad hoc methods were used to extend it. Supporting multiple writing systems, including the CJK (Chinese, Japanese and Korean) scripts, required far more characters and a systematic approach to character encoding in place of the earlier ad hoc extensions.

For example, the full repertoire of Unicode encompasses over 100,000 characters, each assigned a unique integer code (a code point) in the range 0 to hexadecimal 10FFFF. That range holds 1,114,112 values, so not every integer in it is assigned to a character. Other common repertoires include ASCII and ISO 8859-1, which are identical to the first 128 and the first 256 code points of Unicode, respectively.
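The claim that ASCII and ISO 8859-1 coincide with the first 128 and 256 Unicode code points can be checked with Python's built-in `ascii` and `latin-1` codecs. This is a minimal sketch assuming Python 3.3 or later, where `sys.maxunicode` is always 0x10FFFF; the function names are illustrative only.

```python
import sys


def latin1_matches_unicode() -> bool:
    """Each ISO 8859-1 byte value n decodes to the Unicode code point n (0-255)."""
    return all(bytes([n]).decode("latin-1") == chr(n) for n in range(256))


def ascii_matches_unicode() -> bool:
    """Bytes 0-127 decode to code points 0-127; bytes 128-255 are not valid ASCII."""
    for n in range(256):
        try:
            decoded = bytes([n]).decode("ascii")
        except UnicodeDecodeError:
            if n < 128:
                return False  # every 7-bit value should decode
            continue
        if n >= 128 or decoded != chr(n):
            return False
    return True


# Python's code point range matches Unicode's: 0 through 0x10FFFF.
print(latin1_matches_unicode(), ascii_matches_unicode(), hex(sys.maxunicode))
```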

The term character encoding is sometimes overloaded to also mean how characters are represented as a specific sequence of bits. This involves an encoding form, which converts each integer code into a sequence of code units that suit a storage system with a fixed bit width. For example, integers greater than 65535 (hex FFFF) do not fit in 16 bits, so the UTF-16 encoding form represents each of them as a surrogate pair: two 16-bit values from the range D800 through DFFF, which is reserved and never assigned to characters (e.g., hex 10000 becomes the pair D800 DC00).

An encoding scheme then converts code units to byte sequences, taking into account issues such as platform-dependent byte order (e.g., D800 DC00 becomes 00 D8 00 DC in the little-endian byte order used by Intel x86 processors). A character set, character map, or code page shortcuts this process by mapping abstract characters directly to specific bit patterns. Unicode Technical Report #17 explains this terminology in depth and provides further examples.
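The two steps described above, the UTF-16 encoding form (splitting a large code point into a surrogate pair) and the encoding scheme (choosing a byte order), can be sketched in Python as follows. The function names are hypothetical; the results can be compared with Python's own `utf-16-le` and `utf-16-be` codecs.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Encoding form: turn one Unicode code point into 16-bit UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values cannot be encoded on their own")
    if code_point < 0x10000:
        return [code_point]  # fits in a single 16-bit code unit
    offset = code_point - 0x10000       # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return [high, low]


def utf16_bytes(code_point: int, little_endian: bool = True) -> bytes:
    """Encoding scheme: serialize the code units in the chosen byte order."""
    order = "little" if little_endian else "big"
    return b"".join(unit.to_bytes(2, order) for unit in utf16_code_units(code_point))


print([f"{u:04X}" for u in utf16_code_units(0x10000)])  # ['D800', 'DC00']
print(utf16_bytes(0x10000).hex(" ").upper())            # 00 D8 00 DC
```

Keeping the two functions separate mirrors the distinction between encoding form and encoding scheme: only `utf16_bytes` needs to know about byte order.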

Because most text uses only a small subset of Unicode, encoding schemes such as UTF-8 and UTF-16, and character maps such as ASCII (which covers only the first 128 code points), provide efficient ways to store or transmit Unicode characters using short binary words. Some of these encodings, such as UTF-8, use variable-length codes, a simple form of data compression, so that common characters take fewer bytes than rare ones while the whole repertoire remains representable.
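The effect of variable-length codes can be seen by counting how many bytes a single character occupies in different encoding schemes. This is a minimal Python sketch; `encoded_sizes` is a hypothetical helper, and the little-endian (`-le`) codec variants are used only so that Python does not prepend a byte order mark and inflate the counts.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Number of bytes one character occupies in several encoding schemes."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# 'A', 'e' with acute accent, the euro sign, and the first code point above FFFF
for ch in ["A", "\u00e9", "\u20ac", "\U00010000"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

An ASCII letter takes one byte in UTF-8 (the same as in ASCII) but four in UTF-32, while a code point above FFFF takes four bytes in all three schemes.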

See also: Chinese character encoding
