Collation

In computer science and library and information science, collation is the assembly of written information into a standard order. In common usage, this is called alphabetisation, though collation is not limited to ordering letters of the alphabet. Collating lists of words or names into alphabetical order is the basis of most office filing systems, library catalogues, and books of reference. Collation differs from classification in that classification is concerned with arranging information into logical categories, while collation is concerned with the ordering of those categories. Collation also differs from a sort algorithm: a sort algorithm decides which pairs of elements to compare, whereas collation defines the order relation (<=) that the algorithm consults to determine when to swap elements.
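The split between collation and sorting can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names collate_case_insensitive and insertion_sort are invented for this example, and the case-insensitive rule is only one possible collation.

```python
from functools import cmp_to_key


def collate_case_insensitive(a: str, b: str) -> int:
    """Collation rule: negative if a sorts before b, positive if after, 0 if tied."""
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)


def insertion_sort(items, compare):
    """Sort algorithm: chooses which pairs to compare and when to swap them."""
    result = list(items)
    for i in range(1, len(result)):
        j = i
        # Ask the collation rule whether the neighbouring pair is out of order.
        while j > 0 and compare(result[j - 1], result[j]) > 0:
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result


names = ["delta", "Alpha", "charlie", "Bravo"]

# Two different sort algorithms, one shared collation rule.
by_insertion = insertion_sort(names, collate_case_insensitive)
by_builtin = sorted(names, key=cmp_to_key(collate_case_insensitive))
```

Both calls produce the same order because they consult the same collation rule; only the choice of which pairs to compare differs.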

The simplest collation system is numerical sorting: ordering numbers by their magnitude. For example, 4 7 3 5 collates to 3 4 5 7. While this might appear to work only for numbers, computers can use this method for any information, since everything is a number to a computer. For example, a computer using ASCII code (or any of its supersets, such as Unicode) and numerical sorting would collate a b C d $ to $ C a b d. Why the curious "ASCIIbetical order"? The numerical values that ASCII uses are $ = 36, a = 97, b = 98, C = 67, and d = 100. This style of collation is commonly used, often with the refinement of converting uppercase letters to lowercase before comparing ASCII values, since most people do not expect capitalised words to jump to the head of the list. This system fails to sort numbers written as text properly, because a human-readable number stored in a computer text string is a sequence of numeric codes for its characters, compared one position at a time. For example, 156.1 (a string) is represented in ASCII as the five ordered numbers 49, 53, 54, 46, and 49; 35.29 corresponds to 51, 53, 46, 50, and 57; because 49 comes before 51, 156.1 comes before 35.29.
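The examples above can be reproduced with a short Python sketch. Python compares strings by Unicode code point, which matches ASCII for these characters; the variable names are chosen only for this illustration.

```python
symbols = ["a", "b", "C", "d", "$"]

# Plain code-point ("ASCIIbetical") order.
by_code = sorted(symbols)
codes = [ord(s) for s in by_code]

# Refinement: fold to lowercase before comparing.
folded = sorted(symbols, key=str.lower)

# Numbers stored as text are compared character by character...
numerals = ["156.1", "35.29"]
as_text = sorted(numerals)
digit_codes = [ord(ch) for ch in "156.1"]

# ...so they must be converted to numbers to be ordered by magnitude.
as_numbers = sorted(numerals, key=float)

print(by_code)     # ['$', 'C', 'a', 'b', 'd']
print(folded)      # ['$', 'a', 'b', 'C', 'd']
print(as_text)     # ['156.1', '35.29']
print(as_numbers)  # ['35.29', '156.1']
```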

A more elaborate collation system is alphabetical sorting, which orders words or names based on the order of letters in an alphabet or abjad. Each nth letter is compared with the nth letter of the other words in the list, starting at the first letter of each word and advancing to the second, third, fourth, and so on, until the order is established. For example, foo bar bibble collates to bar bibble foo because (1) f comes after b, so bar and bibble both precede foo, and (2) a comes before i, so bar precedes bibble.
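The letter-by-letter comparison can be sketched as follows. The function name compare_words is invented for this example, the words are assumed to contain only lowercase a to z, and placing a word before any longer word that begins with it is a common convention assumed here rather than something stated above.

```python
from functools import cmp_to_key


def compare_words(first: str, second: str) -> int:
    """Compare two words letter by letter; negative means first precedes second."""
    for a, b in zip(first, second):
        if a != b:
            # The first differing position settles the order.
            return -1 if a < b else 1
    # Every shared position matched: assume the shorter word comes first.
    return len(first) - len(second)


words = ["foo", "bar", "bibble"]
ordered = sorted(words, key=cmp_to_key(compare_words))
print(ordered)  # ['bar', 'bibble', 'foo']
```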

Numeric sorting on a computer and alphabetical sorting often produce the same ordering for English. The difference between computer-style numerical sorting and true alphabetical sorting becomes obvious in languages with alphabets larger than twenty-six letters. For example, the thirty-letter alphabet of Spanish treats ñ as a basic letter following n, and formerly treated ch and ll as basic letters following c and l, respectively. Ch and ll are still considered letters, but are alphabetized as digraphs. (The new alphabetization rule was issued by the Royal Spanish Academy in 1994.) (On the other hand, the letter rr follows rqu as expected.) A numeric sort would incorrectly place ñ after z and would treat ch as c + h, which was also incorrect under the older rule. Similar differences between computer numeric sorting and alphabetic sorting occur in Danish and Norwegian (aa is ordered as å at the end of the alphabet), German (ß is ordered as s + s), Icelandic (ð follows d), English (æ is ordered as a + e), and many other languages. Usually the spaces or hyphens between words are ignored.
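One way to get ñ into its proper place is to rank each letter by its position in the alphabet instead of by its character code. The sketch below is an illustration under stated assumptions: SPANISH_ALPHABET and spanish_key are names invented here, only the current rule (ñ after n, ch and ll as plain digraphs) is modelled, and accented vowels are not handled.

```python
import unicodedata

# Assumed modern ordering: the 26 basic letters plus ñ (U+00F1) after n.
SPANISH_ALPHABET = "abcdefghijklmn\u00f1opqrstuvwxyz"
RANK = {letter: position for position, letter in enumerate(SPANISH_ALPHABET)}


def spanish_key(word: str) -> list:
    """Map a word to a list of alphabet positions, ignoring spaces and hyphens."""
    # NFC keeps ñ as a single character even if the input was decomposed.
    normalized = unicodedata.normalize("NFC", word.lower())
    return [RANK[ch] for ch in normalized if ch not in " -"]


words = ["zorro", "ñu", "ola", "nube", "niño"]

by_code_point = sorted(unicodedata.normalize("NFC", w) for w in words)
by_alphabet = sorted(words, key=spanish_key)
```

Here by_code_point ends with ñu after zorro, while by_alphabet places ñu between nube and ola. A letter outside the table (for example an accented vowel) raises KeyError in this sketch; production code would normally rely on a locale-aware collation library.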

See also Latin alphabet for a list of collating rules for Latin-based alphabets.

Languages that use a syllabary or abugida instead of an alphabet (for example, Cherokee) can use approximately the same system if there is a set ordering for the symbols.

Another form of collation is radical-and-stroke sorting, used for non-alphabetic writing systems such as Chinese logographs and Japanese kanji, whose thousands of symbols defy ordering by convention. In this system, common components of characters (radicals) are identified. Characters are then grouped by their primary radical, then ordered by number of pen strokes within each radical. When there is no obvious radical or more than one radical, convention governs which is used for collation. For example, the Chinese character for "mother" (媽) is sorted as a thirteen-stroke character under the three-stroke primary radical (女).
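A radical-and-stroke ordering can be expressed as a compound sort key. The sketch below uses a small hand-entered table (CHARACTER_DATA, a name invented here) rather than a real character database, and it assumes that radical groups are themselves ordered by the stroke count of the radical, with ties between radicals broken arbitrarily by character code.

```python
# character -> (primary radical, strokes in the radical, total strokes)
# Illustrative hand-entered values, not taken from a character database.
CHARACTER_DATA = {
    "媽": ("女", 3, 13),
    "好": ("女", 3, 6),
    "他": ("人", 2, 5),
    "你": ("人", 2, 7),
    "林": ("木", 4, 8),
}


def radical_stroke_key(character: str) -> tuple:
    """Group by radical (ordered by radical stroke count), then by total strokes."""
    radical, radical_strokes, total_strokes = CHARACTER_DATA[character]
    return (radical_strokes, radical, total_strokes)


characters = ["媽", "林", "你", "好", "他"]
ordered = sorted(characters, key=radical_stroke_key)
print(ordered)  # ['他', '你', '好', '媽', '林']
```

Within the 女 group, 好 (six strokes) precedes 媽 (thirteen strokes), matching the rule that characters are ordered by stroke count within a radical.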

The radical-and-stroke system is cumbersome compared to an alphabetical system, which has only a few characters, all unambiguous. As a result, logographic languages often supplement radical-and-stroke ordering with alphabetic sorting of a phonetic conversion of the logographs. For example, the kanji word Tokyo (東京) can be sorted as if it were spelled out in the Japanese kana sequence "to-u-ki-yo-u" (とうきょう). Nevertheless, the radical-and-stroke system is the only practical method for constructing dictionaries that someone may use to look up a logograph whose pronunciation is unknown.
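Sorting by phonetic conversion amounts to using the reading, not the written form, as the sort key. In the sketch below the READINGS table is a hypothetical hand-entered lookup, and comparing hiragana by code point is only an approximation of kana order (it does not treat voiced or small kana the way a dictionary would).

```python
# Hypothetical lookup table: written form -> hiragana reading.
READINGS = {
    "東京": "とうきょう",  # Tokyo
    "大阪": "おおさか",    # Osaka
    "京都": "きょうと",    # Kyoto
}


def reading_key(word: str) -> str:
    """Sort a word as if it were spelled out in kana."""
    return READINGS[word]


places = ["東京", "大阪", "京都"]
by_reading = sorted(places, key=reading_key)
print(by_reading)  # ['大阪', '京都', '東京']
```

The order follows the readings (o, ki, to), which is unrelated to the order the kanji themselves would take under radical-and-stroke sorting or by character code.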



