ABSTRACT

Computers can process a text only if its characters are represented in a form that maps onto the binary values computers operate on. This mapping is called character encoding. One such character encoding scheme, based on the English alphabet, is ASCII (American Standard Code for Information Interchange). Character encoding facilitates the storage of text in computers and the transmission of text through telecommunication networks. Character encoding, however, says nothing about the semantics, interpretation, or structure of a text. Such information about a text is called meta-information. If we want to add meta-information to a text so that it can be processed by computers, we need to encode, or mark up, the text. We can do this by inserting natural language expressions (or codes representing them) into the text, in the same character encoding the text uses, but separated from the text by specific markers. We call one such expression a tag. Together, all the tags used to encode a text constitute a markup language. The application of a markup language to a text is what we call text encoding.