Unicode
=======

The current release of MLton does not support Unicode.  We are working
on adding support.

 * `WideChar` structure.
 * UTF-8 encoded source files.

There is no real support for Unicode in the <:DefinitionOfStandardML:Definition>;
there are only a few throw-away sentences along the lines of "ASCII
must be a subset of the character set in programs".

Neither is there real support for Unicode in the <:BasisLibrary:Basis Library>.
The general consensus (which includes the opinions of the
editors of the Basis Library) is that the `WideChar` structure is
insufficient for the purposes of Unicode.  There is no `LargeChar`
structure, which in itself is a deficiency, since a programmer can not
program against the largest supported character size.

MLton has some preliminary support for 16 and 32 bit characters and
strings.  It is even possible to include arbitrary Unicode characters
in 32-bit strings using a `\Uxxxxxxxx` escape sequence.  (This
longer escape sequence is a minor extension over the Definition which
only allows `\uxxxx`.)  This is by no means completely
satisfactory in terms of support for Unicode, but it is what is
currently available.

There are periodic flurries of questions and discussion about Unicode
in MLton/SML.  In December 2004, there was a discussion that led to
some seemingly sound design decisions.  The discussion started at:

   http://www.mlton.org/pipermail/mlton/2004-December/026396.html

There is a good summary of points at:

   http://www.mlton.org/pipermail/mlton/2004-December/026440.html

In November 2005, there was a followup discussion and the beginning of
some coding.

  http://www.mlton.org/pipermail/mlton/2005-November/028300.html

We are optimistic that support will appear in the next MLton release.

== Also see ==

The <:fxp:> XML parser has some support for dealing with Unicode
documents.
