Unicode - the Universal Character Set

The international standard ISO 10646 defines the Universal Character Set (UCS).

UCS contains the characters required to represent practically all
known languages.  This includes not only the Latin, Greek, Cyrillic,
Hebrew, Arabic, Armenian, and Georgian scripts, but also Chinese,
Japanese and Korean Han ideographs as well as scripts such as
Hiragana, Katakana, Hangul, Devanagari, Bengali, Gurmukhi, Gujarati,
Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Khmer, Bopomofo,
Tibetan, Runic, Ethiopic, Canadian Syllabics, Cherokee, Mongolian,
Ogham, Myanmar, Sinhala, Thaana, Yi, and others. For scripts not yet
covered, research on how best to encode them for computer use is
still under way, and they will be added eventually. This might
include not only hieroglyphs and various historic
Indo-European languages, but even some selected artistic scripts such
as Tengwar, Cirth, and Klingon. UCS also covers a large number of
graphical, typographical, mathematical and scientific symbols,
including those provided by TeX, PostScript, APL, MS-DOS, MS-Windows,
Macintosh, OCR fonts, as well as many word processing and publishing
systems, and more are being added.

The UCS characters 0x0000 to 0x007f are identical to those of the
classic US-ASCII character set, and the characters in the
range 0x0000 to 0x00ff are identical to those in ISO 8859-1 Latin-1.
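
The compatibility described above can be checked with a short script.
The following is only a minimal sketch in Python, assuming that the
interpreter's built-in "ascii" and "latin-1" codecs faithfully
implement US-ASCII and ISO 8859-1; the function names are made up for
this illustration. It decodes every byte value of each legacy
encoding and compares the resulting code point with the byte value.

```python
# Sketch: check that US-ASCII and ISO 8859-1 byte values coincide with
# the first 128 and 256 UCS code points, respectively.
# Assumption: Python's "ascii" and "latin-1" codecs model those sets.

def ascii_matches_ucs() -> bool:
    """True if each byte 0x00..0x7f decodes to the same code point."""
    return all(ord(bytes([b]).decode("ascii")) == b for b in range(0x80))


def latin1_matches_ucs() -> bool:
    """True if each byte 0x00..0xff decodes to the same code point."""
    return all(ord(bytes([b]).decode("latin-1")) == b for b in range(0x100))


if __name__ == "__main__":
    print("US-ASCII  == UCS 0x0000..0x007f:", ascii_matches_ucs())
    print("Latin-1   == UCS 0x0000..0x00ff:", latin1_matches_ucs())
    # LATIN SMALL LETTER E WITH ACUTE is byte 0xe9 in Latin-1 and
    # code point 0x00e9 in UCS.
    print(hex(ord("\u00e9")), "\u00e9".encode("latin-1"))
```

Note that this identity concerns character numbers only; it does not
by itself say anything about how UCS characters are serialized as
bytes.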