For the biologist, the bioinformatic analysis of genes requires the
compilation of tables of gene characteristics. To do this, data is often
extracted manually from databases in an ad hoc fashion. Different databases
(TIGR, MIPS, BLAIR, and NCBI, for example) give different outputs in
different formats. We would like to be able to extract information from the
databases in a common, structured file format in a way that allows for easy
rearranging and processing of the data.
The Extensible Markup Language (XML) is being used increasingly to represent
semi-structured data and transmit it over the Internet. XML data is marked up
with tags, much like documents in the HyperText Markup Language (HTML). For
example, the following code shows one way of using XML
to mark up the protein with accession number "BAA03739.1". It is taken from
the National Center...
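
To illustrate tag-based markup and the kind of table-building described above, here is a minimal Python sketch. The element names (proteins, protein, accession, note) and the helper proteins_to_rows are hypothetical, chosen only for illustration; they do not reproduce the record format of any particular database.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the tag names below are illustrative assumptions,
# not the schema of any real database export.
SAMPLE_XML = """
<proteins>
  <protein>
    <accession>BAA03739.1</accession>
    <note>placeholder</note>
  </protein>
</proteins>
"""


def proteins_to_rows(xml_text):
    """Turn each <protein> element into a dict mapping child tag -> text."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for protein in root.iter("protein"):
        # One table row per protein; each child element becomes a column.
        row = {child.tag: (child.text or "").strip() for child in protein}
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in proteins_to_rows(SAMPLE_XML):
        print(row)
```

Because every record becomes a plain dictionary, the rows can be sorted, filtered, or written out as a table without format-specific parsing code for each source.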