Difference Between XML and XHTML

XML vs. XHTML

Extensible Markup Language (XML) is a set of rules for encoding documents in a machine-readable, electronic form. Its design goals emphasise simplicity, generality and usability over the internet. XML is a textual data format with strong Unicode support, so it can represent text in any of the world's written languages. Although XML was designed primarily for documents, it is also widely used to represent arbitrary data structures (in web services, for instance). Developers can choose from many programming interfaces for accessing XML data, and several schema systems exist to help define XML-based languages.
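As a rough illustration of XML carrying an arbitrary data structure, the sketch below reads a small record through one such programming interface, the ElementTree module from Python's standard library. The order document and its element and attribute names are invented for this example and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical order record; the element and attribute names are made up.
doc = """<order id="42">
  <item sku="A1" qty="2">Widget</item>
  <item sku="B7" qty="1">Gadget</item>
</order>"""

root = ET.fromstring(doc)

# Turn the XML tree into plain Python data (a dict holding a list of dicts).
items = [
    {"sku": item.get("sku"), "qty": int(item.get("qty")), "name": item.text}
    for item in root.findall("item")
]
order = {"id": int(root.get("id")), "items": items}

print(order)
```

The same record could equally be described by a schema, which would let a validating parser reject an order with a missing or misspelled attribute before the application ever sees it.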

Extensible Hypertext Markup Language (XHTML) is a member of the XML family of markup languages. It mirrors and extends the Hypertext Markup Language (HTML). HTML is an application of SGML, whereas XHTML is an application of XML, which is itself a more restrictive subset of SGML. Because XHTML documents must be well formed, they can be parsed with a standard XML parser, which further differentiates XHTML from HTML.
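The point about well-formedness can be sketched with Python's standard ElementTree parser, which is a generic XML parser with no knowledge of HTML. The two sample pages and the helper name `is_well_formed` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small well-formed XHTML page: every element is closed, including <br />.
xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>First line<br />second line</p></body>"
    "</html>"
)

# The same page written in the looser style HTML tolerates:
# <br> and <p> are never closed.
html_soup = (
    "<html><head><title>Demo</title></head>"
    "<body><p>First line<br>second line</body></html>"
)

def is_well_formed(markup):
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# XHTML elements live in the XHTML namespace, so lookups must qualify the names.
title = ET.fromstring(xhtml).find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title").text

print(is_well_formed(xhtml), is_well_formed(html_soup), title)
```

An HTML parser would accept both pages; the XML parser accepts only the first, which is the practical consequence of XHTML being an XML application.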

An XML document is composed entirely of Unicode characters. Apart from a small number of control characters that XML specifically excludes, any character defined in Unicode may appear in the content of an XML document. XML includes facilities for identifying the encoding of the characters that make up a document, and for expressing characters that, for one reason or another, cannot be used directly. Unicode characters are encoded into bytes in order to be stored or transmitted, and the resulting byte representations are known as encodings. XML allows the use of any of the Unicode-defined encodings, as well as other encodings whose characters also appear in Unicode, and it provides a mechanism that allows an XML processor to determine which encoding is in use.
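These mechanisms can be seen in a short sketch, again using Python's standard ElementTree parser. The note documents are invented for illustration. The first is stored as ISO-8859-1, which has no euro sign, so that character is written as the numeric character reference `&#x20AC;`; the second stores the same text directly in UTF-8. In both cases the encoding declaration on the first line tells the parser how to turn the bytes back into Unicode characters.

```python
import xml.etree.ElementTree as ET

# Same text, two byte encodings. Latin-1 cannot encode the euro sign,
# so it is expressed with a character reference instead.
latin1_bytes = (
    "<?xml version='1.0' encoding='ISO-8859-1'?>"
    "<note>café costs 3 &#x20AC;</note>"
).encode("latin-1")

utf8_bytes = (
    "<?xml version='1.0' encoding='UTF-8'?>"
    "<note>café costs 3 €</note>"
).encode("utf-8")

from_latin1 = ET.fromstring(latin1_bytes).text
from_utf8 = ET.fromstring(utf8_bytes).text

def parses(markup):
    """Return True if the XML parser accepts the given bytes."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# U+0001 is one of the control characters that XML 1.0 excludes.
control_char_ok = parses(b"<note>\x01</note>")

print(from_latin1 == from_utf8, control_char_ok)
```

The byte sequences differ, but the parsed text is identical, which is why applications that consume XML rarely need to care which encoding the author chose.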

XHTML 1.0 comes in three variants: XHTML 1.0 Strict, which includes the elements and attributes that are not marked deprecated in HTML 4.01; XHTML 1.0 Transitional, which also includes presentational elements (‘font’ and ‘strike’, for example); and XHTML 1.0 Frameset, which allows frameset documents to be defined. XHTML can also be modularized: its elements and attributes are grouped into abstract modules through which the language can be subsetted and extended. This helps XHTML extend its reach to emerging platforms (mobile devices and web-enabled television, for instance).
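A document states which of the three XHTML 1.0 variants it follows through the public identifier in its DOCTYPE declaration. The sketch below maps those identifiers to variant names. The function name `xhtml_variant` and the regular expression are assumptions made for this example; a simple pattern match like this is only a rough check, not a substitute for validating the document against its DTD.

```python
import re

# Public identifiers used in XHTML 1.0 DOCTYPE declarations.
XHTML10_VARIANTS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "Frameset",
}

# Captures the quoted public identifier that follows "PUBLIC".
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"')

def xhtml_variant(markup):
    """Return 'Strict', 'Transitional' or 'Frameset', or None if unrecognised."""
    match = DOCTYPE_RE.search(markup)
    if match is None:
        return None
    return XHTML10_VARIANTS.get(match.group(1))

page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    "<body><p>Hello</p></body></html>"
)

print(xhtml_variant(page))
```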

Summary:

1. XML is a set of rules for encoding documents; XHTML is the XML-based equivalent of HTML, and XML itself is a more restrictive subset of SGML.

2. An XML document is composed entirely of Unicode characters; XHTML 1.0 comes in three variants: XHTML 1.0 Strict, XHTML 1.0 Transitional and XHTML 1.0 Frameset.