Multilingual Content Development

1) Explain how characters are represented in computers.
Within a computer, each character is represented by a number, because a computer can process only numbers. As an electronic device, it ultimately distinguishes only two states, voltage ON and voltage OFF, which are written as ‘1’ and ‘0’ respectively. This notation is known as the ‘Binary system’, and the coding of characters using this system is known as ‘character encoding’. In ASCII, for example, when the key labelled ‘A’ is pressed, the computer registers it as ‘01000001’ (decimal 65) and displays ‘A’ on the screen.
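
The mapping between a character, its number and its bit pattern can be checked with a short Python sketch. The variable names are chosen only for illustration; `ord`, `chr`, `format` and `int` are standard built-ins.

```python
# Look up the number behind a character and show it as 8 binary digits.
ch = "A"
code = ord(ch)               # numeric code of the character
bits = format(code, "08b")   # same number written in binary, padded to 8 digits
print(ch, code, bits)

# Going the other way: from the bit pattern back to the character.
decoded = chr(int(bits, 2))
print(decoded)
```

For ‘A’ this gives 65 and ‘01000001’, the values quoted in the answer above.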

2) Discuss the ASCII standard for character coding
ASCII was developed by a working committee known as X3.4, whose members included IBM, AT&T and its subsidiary Teletype. It is a 7-bit character coding system (128 positions), usually stored in an 8-bit byte. Its arrangement is logical: letters and digits are assigned codes in sequence, so a sorting program can work simply by comparing ASCII values. The initial version of the code covered only uppercase letters, and a number of positions were left blank on the assumption that they could later be used for lowercase Roman letters.
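
The point about sorting can be illustrated with a minimal Python sketch. It assumes the ASCII layout in which uppercase letters have lower code values than lowercase letters; `to_upper_ascii` is a hypothetical helper written for this example, not part of any standard.

```python
# Sorting by raw code values, as a simple ASCII-based sort would do.
words = ["delta", "Alpha", "charlie", "Bravo"]
by_code = sorted(words)
print(by_code)

# Upper- and lowercase forms of a letter sit a fixed distance apart.
case_offset = ord("a") - ord("A")
print(case_offset)


def to_upper_ascii(ch: str) -> str:
    """Convert one lowercase ASCII letter to uppercase by arithmetic alone."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - case_offset)
    return ch


print(to_upper_ascii("q"))
```

Because the comparison is made on code values, all words beginning with an uppercase letter sort before those beginning with a lowercase letter.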

3) What is the Indian Standard for character coding in Indian scripts? Explain
ISCII (Indian Script Code for Information Interchange) is an 8-bit extension of ASCII. Of the 256 positions (0-255) in the 8-bit code table, ASCII characters occupy the lower half (0-127), while the characters of Indian scripts are placed in the upper half, in positions 160-255. It uses the same keyboard as ASCII, with the characters of the Indian scripts inscribed over the keys. The control characters SO (Shift Out) and SI (Shift In) are used to select ISCII and ASCII respectively. ISCII caters to the following 10 Indian scripts: Devanagari, Gujarati, Punjabi (Gurmukhi), Bengali, Assamese, Oriya, Telugu, Tamil, Malayalam and Kannada. The ISCII code table is a superset of all the characters required for these scripts. The first version was released in 1983, and the standard was adopted by the Bureau of Indian Standards (BIS) in 1991 after revisions in 1986 and 1988. (For details, read section 4 above.)
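
The split of the 8-bit code table can be sketched as a small Python function. This is only an illustration of the ranges quoted above (0-127 and 160-255). `classify_byte` is a hypothetical name, and the label "other" for positions 128-159 is an assumption made for the example, since the answer does not describe that range.

```python
def classify_byte(value: int) -> str:
    """Say which half of the 8-bit table a code value falls in."""
    if not 0 <= value <= 255:
        raise ValueError("an 8-bit code must lie between 0 and 255")
    if value <= 127:
        return "ascii"   # lower half: ordinary ASCII characters
    if value >= 160:
        return "indic"   # upper half: Indian script characters
    return "other"       # 128-159: not covered by the ranges in the text


for sample in (65, 130, 200):
    print(sample, classify_byte(sample))
```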

4) What are the important features of the UNICODE standard?
Unicode encodes characters efficiently and without redundancy. It was designed as a 16-bit code (allowing for 65,536 characters), and its code space has since been extended to 1,114,112 code points. It has the following features:

  • Plain text
  • Dynamic composition
  • Logical Order
  • Unification
  • Compatibility characters
  • Equivalent sequence
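
Three of the features listed above (dynamic composition, equivalent sequences and logical order) can be observed with Python's standard `unicodedata` module. The sketch below is illustrative only. The letter ‘é’ and the Devanagari syllable ‘कि’ are sample characters chosen for the example.

```python
import unicodedata

# Dynamic composition: 'é' can be one precomposed character
# or a base letter followed by a combining accent.
precomposed = "\u00e9"    # é as a single code point
decomposed = "e\u0301"    # e + COMBINING ACUTE ACCENT
print(len(precomposed), len(decomposed))

# Equivalent sequences: the two spellings differ code point by code point,
# but normalisation maps one onto the other.
same_raw = precomposed == decomposed
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == decomposed
print(same_raw, same_nfc, same_nfd)

# Logical order: in 'कि' the vowel sign is drawn to the left of the
# consonant, yet it is stored after the consonant.
syllable = "\u0915\u093f"
stored_names = [unicodedata.name(c) for c in syllable]
print(stored_names)
```

Searching and comparison routines generally normalise text first, so that equivalent sequences are treated as the same string.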


5) Discuss the possible applications of UNICODE.
One of the basic applications of Unicode is to provide computer-based services in languages other than English. It has several other applications, such as:

  • Multilingual encoding enables information to be presented in different regional languages.
  • Multilingual searches
  • Multilingual works such as Polyglot
  • Transliteration
  • Cross Lingual Information Retrieval (CLIR)
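
As a rough illustration of handling multilingual text, the Python sketch below guesses the script of each letter in a mixed string. It relies on an assumption: that the first word of a character's Unicode name indicates its script. That holds for the sample text but is only a heuristic, not the official Unicode Script property. `script_of` is a hypothetical helper.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Heuristic script guess: the first word of the Unicode character name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


# One string holding text in two scripts.
text = "India भारत"
scripts = {script_of(c) for c in text if c.isalpha()}
print(sorted(scripts))
```

A multilingual search or transliteration tool would need some step of this kind to decide which language-specific rules to apply to each part of the text.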


KEYWORDS
ASCII : American Standard Code for Information Interchange.
Bit : The smallest unit of digital information, with a value of either one or zero. When information is digitised, it is turned into ones and zeros, so all digital information is made up of bits.
Byte : A group of data bits that are processed together. Typically, a byte consists of 8 bits. Larger units include kilobytes, Megabytes, Gigabytes and Terabytes.
1 Byte = 8 bits
1 kilobyte = about 1,000 bytes
1 Megabyte = about 1,000,000 bytes
1 Gigabyte = about 1,000,000,000 bytes
1 Terabyte = about 1,000,000,000,000 bytes
C-DAC : Centre for Development of Advanced Computing, India
Code : A set of symbols for representing something. E.g., most computers use ASCII codes to represent characters.
File format : A format for encoding information in a file. Different file types have different formats. A file format specifies first whether the file is a binary or an ASCII file, and second, how the information is organised.
GIST : Graphics and Intelligence based Script Technology, developed by C-DAC, India. It supports the major Indic languages.
Glyphs : The particular shapes of given characters as they are displayed or rendered on screen.
Hexadecimal : A numbering system which uses a base of 16. The first ten digits are 0-9 and the next six are A-F. Hexadecimal numbers are used, for example, to specify colours on web pages: the hexadecimal code for the colour white is #FFFFFF.
ISCII : Indian Script Code for Information Interchange.
Linux : A free Unix-like operating system originally created by Linus Torvalds with the assistance of developers around the world. It is released under the GNU General Public License, and its source code is freely available to everyone.
Multilingual data : Data in more than one language in the same document or file.
UNICODE : A universal code for representing the world’s scripts in computers.

Source : IGNOU Study Materials