What's Next:
A New Language May Ease Web Use
(The New York Times, June 3, 1999)
By Anne Eisenberg
THE latest browsers understand it, undergraduates are signing up for
classes to learn it, and - ultimate cyberspace compliment - start-ups
are forming around it.
This Internet phenomenon of the moment is XML, or Extensible Markup
Language, a new language that may one day power the second generation of
the World Wide Web.
“The Web has two main problems - it's slow and it's hard to find the one
piece of information you need when you search,” said Tim Bray, chief
technical officer at Textuality, in Vancouver, British Columbia, and one
of the architects of the new language. “XML is designed to fix both
these problems.”
Right now the lingua franca of the Web is HTML, or Hypertext Markup
Language, the most popular publishing language ever but one that has its
limitations, Mr. Bray said. “HTML is easy to learn and use, but it was
never intended for many of the complicated jobs it is being asked to
do.”
The problem lies with the kinds of tags used to mark HTML text. HTML
uses tags like <p> to show that a paragraph follows, or <H1> to indicate
a headline. But HTML tags do not mark the meaning or semantic content of
the text. For example, they can mark text to say, “This is displayed in
bold,” but they are not capable of saying, “This is the price of an
item” on an E-commerce site or “This is a date.”
“Think about ordering a book on the Web right now,” Mr. Bray said. “If
you want to know which of the books on the subject is the most recent,
or cheapest, it's a full Internet roundup each time while you ask for a
new page and an overburdened server sends it to you.”
XML will make it easier to search a site for detailed information.
XML is designed to remedy this problem by specifying not what the
information looks like but what the information is. It uses tags or
markers just as HTML does, but while HTML marks the text to say “This is
in color” or “This is in boldface,” XML marks the text to say “This is a
price” or “This is a date.” An XML-enabled browser can sort and
manipulate this data right at the desktop.
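To make the idea concrete, here is a minimal sketch in Python of what sorting such data "at the desktop" could look like. The catalog, its tag names (catalog, book, title, price, date) and the helper functions are all invented for illustration; they are not taken from any real bookseller or from the XML specification itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog. The tag names say what each value IS (a price, a date),
# not how it should look on screen. None of this comes from a real schema.
CATALOG_XML = """
<catalog>
  <book><title>Markup Basics</title><price>24.95</price><date>1997-05-01</date></book>
  <book><title>Web Data Handbook</title><price>39.50</price><date>1999-02-15</date></book>
  <book><title>Tagging Text</title><price>18.00</price><date>1998-11-30</date></book>
</catalog>
"""


def load_books(xml_text):
    """Parse the catalog once and return a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "title": book.findtext("title"),
            "price": float(book.findtext("price")),
            "date": book.findtext("date"),
        }
        for book in root.findall("book")
    ]


def cheapest(books):
    """Return the book with the lowest price."""
    return min(books, key=lambda b: b["price"])


def most_recent(books):
    """Return the newest book; YYYY-MM-DD dates compare correctly as text."""
    return max(books, key=lambda b: b["date"])


books = load_books(CATALOG_XML)
print("Cheapest:", cheapest(books)["title"])
print("Most recent:", most_recent(books)["title"])
```

Because the price and the date are labeled as such, both questions are answered from the one document already on the user's machine, with no further request to the server.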
XML was intended to replace HTML, but in the near future it is most
likely to be used in tandem with it. The new language has many practical
advantages. For example, if the user asked a travel site for flight
times between San Francisco and Chicago, the browser could receive not
only the schedule but a small program that could select the flights by
cost, time of day or even seat selection. The browser could do the work
that previously had to be done by the server.
“Multiply that by the entire Net and you have servers and networks that
are far less loaded and run more quickly,” Mr. Bray said.
XML came to life in 1996 when the World Wide Web Consortium commissioned
a group of markup language experts, organized and led by Jon Bosak of
Sun Microsystems, to deal with the limitations of HTML.
“The problem was not in having tags but in having one set of tags,” said
C. M. Sperberg-McQueen, of the University of Illinois at Chicago and a
member of the original committee. “Businesses who use the Web needed to
say, ‘This is an order number. This is a part number.’ They needed to
identify each part of their business documents.” Dr. Sperberg-McQueen
and his colleagues developed the new markup language as a simplified
version of a difficult precursor, SGML, or Standard Generalized Markup
Language, completing the job in 1998.
Many groups have already begun developing their own applications for
XML. In medical applications, for example, there is an urgent need for a
single, easy-to-handle language capable of saying “This is a patient's
name” or “This is a blood-sugar level” on the Web so that many machines
can communicate quickly. “There are very active efforts right now in the
medical community to hammer out a tag set so the disparate computers can
talk with one another more easily,” Mr. Bray said.
Indeed, whole companies are springing up around XML. Erutech, a Seattle
start-up, will offer XML-based services for the health insurance
industry. Dave Wascha, Erutech's vice president of marketing, said that
the industry has to learn new ways to exchange mountains of information
over the Internet. “It took a few years of ramping up with HTML to
realize the language wouldn't work efficiently to exchange data,” he
said.
XML is likely to spread most quickly among businesses and
organizations, not individual users, some experts predict. “I don't
believe in XML for the average person,” said Michael Macrone, an
independent Web developer in San Francisco. “The promise of a
cross-platform, universal technology is always a false one for the
ordinary person, because too many factors like the version of the
browser or the speed of the modem must be controlled.” Many people have
old browsers or slow modems, for example, and are not interested in
upgrading.
“The public thinks of their computers like their TV's or refrigerators –
they don't want to install a new freezer each year,” Mr. Macrone said.
“XML will be fine for superusers, but the average person is not going to
download the required software or put up with the flawed performance and
error messages that always ensue with any new program.”
Mr. Bosak is very much aware of positions like Mr. Macrone's.
“Certainly, once you put a product out there, it's difficult to replace
it with the next generation,” he said. If old browsers do linger, Mr.
Bosak predicts XML will live on the server and be translated to HTML for
users. Companies will continue to develop middleware - a layer of
servers that knows XML and translates it to HTML for the browsers. “This
model works, but severely limits what we intended,” Mr. Bosak said.
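The middleware arrangement Mr. Bosak describes can be pictured with a short Python sketch: the server holds the XML and hands the browser ordinary HTML. The order document, its tag names and the function xml_to_html are hypothetical, made up here for illustration, and real middleware would be considerably more elaborate.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical business document kept on the server. The tag names echo the
# "order number" and "part number" examples but follow no real standard.
ORDER_XML = (
    "<order>"
    "<order-number>A-1001</order-number>"
    "<part-number>PN-42</part-number>"
    "<price>19.99</price>"
    "</order>"
)


def xml_to_html(xml_text):
    """Translate a flat XML record into an HTML table an older browser can show."""
    root = ET.fromstring(xml_text)
    rows = []
    for child in root:
        # The tag's meaning survives only as a visible label; the browser
        # receives presentation markup and nothing else.
        label = escape(child.tag.replace("-", " "))
        value = escape(child.text or "")
        rows.append(f"<tr><td><b>{label}</b></td><td>{value}</td></tr>")
    return "<table>" + "".join(rows) + "</table>"


print(xml_to_html(ORDER_XML))
```

The output contains only display tags such as table rows and bold text, which suggests why this model limits what the designers intended: once translated, the browser can no longer tell a price from a part number.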
Mr. Bray said he looks forward to a second generation of the Web powered
by XML. “We are locking up a high proportion of our intellectual capital
in short-lived, inaccessible, proprietary formats,” he said. “I can't
open some of my old Microsoft Word files. I can't go through my own
writing for the last few years to extract all the information on one
topic.” XML deals with this problem. “XML files will be fully documented
and stable,” Mr. Bray said. “They can be read and reused in 2028, and we
won't have to pay a tax to any software vendors for the right to use
them.”
– 30 –
First published in The New York Times (June 3, 1999)