
Unicode

Unicode Standard

A universal character encoding standard assigning a unique code point to every character in every writing system.

Technical Detail

The Unicode Standard assigns a unique code point in the range U+0000 to U+10FFFF to each character across writing systems. UTF-8 encodes each code point in 1 to 4 bytes: ASCII characters take 1 byte, most CJK ideographs (those in the Basic Multilingual Plane) take 3 bytes, and characters outside that plane, such as emoji and rarer ideographs, take 4 bytes. UTF-16 encodes each code point in 2 or 4 bytes (one or two 16-bit code units) and is the string representation that JavaScript and Java expose to programs. Declaring the encoding explicitly prevents mojibake (garbled text) when files cross system boundaries.
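The byte widths above can be checked directly. The following is a minimal sketch, assuming a runtime that provides the global `TextEncoder` (modern browsers and Node.js); the helper name `describeChar` is hypothetical and introduced only for illustration.

```javascript
// Report the code point, UTF-8 byte count, and UTF-16 code unit count
// for the first character of a string. `describeChar` is an illustrative helper.
function describeChar(ch) {
  const codePoint = ch.codePointAt(0);
  return {
    codePoint: 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0'),
    utf8Bytes: new TextEncoder().encode(ch).length, // bytes in UTF-8
    utf16Units: ch.length,                          // 16-bit units in a JS string
  };
}

// Escapes are used so the sample does not depend on the file's own encoding.
const samples = ['A', '\u00E9', '\u4E16', '\u{1F600}'].map(describeChar);

console.log(samples);
// 'A'  → U+0041,  1 UTF-8 byte,  1 UTF-16 unit
// 'é'  → U+00E9,  2 UTF-8 bytes, 1 UTF-16 unit
// '世' → U+4E16,  3 UTF-8 bytes, 1 UTF-16 unit
// '😀' → U+1F600, 4 UTF-8 bytes, 2 UTF-16 units (a surrogate pair)
```

Note that `ch.length` counts UTF-16 code units, not characters, which is why the emoji reports a length of 2.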

Example

```javascript
// UTF-8 encode/decode
const encoder = new TextEncoder();
const decoder = new TextDecoder('utf-8');

const bytes = encoder.encode('Hello 世界');
// → Uint8Array [72, 101, 108, 108, 111, 32, 228, 184, 150, 231, 149, 140]
//   (6 one-byte ASCII characters, then 3 bytes each for 世 and 界)

decoder.decode(bytes);  // 'Hello 世界'
```
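Mojibake can be reproduced by decoding bytes with a label that does not match how they were encoded. This is a rough sketch, assuming the runtime's `TextDecoder` supports the `windows-1252` label (browsers do; Node.js needs a build with full ICU, which is the default in recent releases).

```javascript
// Encode 'café' as UTF-8; the escape keeps the sample independent of file encoding.
const utf8Bytes = new TextEncoder().encode('caf\u00E9');

// Wrong label: each of the two bytes of 'é' (0xC3 0xA9) becomes its own character.
const garbled = new TextDecoder('windows-1252').decode(utf8Bytes);

// Matching label: the original text is recovered.
const restored = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(garbled);  // 'cafÃ©'
console.log(restored); // 'café'
```

The bytes are identical in both cases. Only the declared encoding differs, which is why an explicit and correct declaration matters.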
