Unicode Converter
Convert text to Unicode code points and Unicode escape sequences back to text. Free online Unicode converter, no signup needed, runs in your browser.
Convert Text to Unicode Code Points and Back
Unicode is the universal character encoding standard. It covers the writing systems in common use today, including Latin, Cyrillic, Chinese, Arabic, Japanese, Korean, and Hindi (Devanagari), along with emoji, mathematical symbols, and many historic scripts. Every character in Unicode has a unique numeric identifier called a code point, written U+XXXX, where XXXX is four to six hexadecimal digits (U+0041 for A, U+1F600 for 😀). Our free Unicode converter handles both directions: enter text to see the Unicode code point of each character, or enter Unicode escape sequences to decode them back to the original characters.
The tool outputs standard U+XXXX notation, the canonical format used in Unicode specifications, documentation, and developer discussions. Because each code point maps to exactly one character, the notation identifies a character unambiguously even when two characters look identical on screen, and its value shows where the character sits in the Unicode code space.
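The two directions described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the converter's actual source; the function names `toCodePoints` and `fromCodePoints` are hypothetical, and the decoder assumes its input is a whitespace-separated list of U+XXXX values.

```javascript
// Format each code point in a string as U+XXXX (at least four hex digits).
function toCodePoints(text) {
  // Spreading a string iterates by code point, so surrogate pairs stay together.
  return [...text].map((ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Decode a whitespace-separated list of U+XXXX values back into text.
function fromCodePoints(notation) {
  return notation
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => {
      const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(token);
      if (!match) throw new Error(`Not a U+XXXX value: ${token}`);
      // String.fromCodePoint throws a RangeError for values above 0x10FFFF.
      return String.fromCodePoint(parseInt(match[1], 16));
    })
    .join("");
}

// "H", "é" (U+00E9), and the grinning face emoji (U+1F600)
console.log(toCodePoints("H\u00E9\u{1F600}").join(" ")); // U+0048 U+00E9 U+1F600
console.log(fromCodePoints("U+0048 U+00E9 U+1F600"));
```

Iterating with the spread operator rather than indexing by position matters here: indexing would split characters above U+FFFF into two halves.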
Unicode: The Universal Character Standard
Before Unicode, computers used dozens of incompatible encoding standards that each covered only a limited set of characters from specific languages or regions. ASCII covered English (128 characters). ISO-8859-1 extended it to cover Western European languages (256 characters). Separate encodings existed for Greek, Cyrillic, Arabic, Hebrew, Chinese, Japanese, Korean, and virtually every other script. Documents that mixed scripts from different encodings were nearly impossible to handle correctly, and text could display as garbled characters whenever it crossed encoding boundaries.
Unicode solved this by creating a single, universal standard with a code space of 1,114,112 code points (U+0000 through U+10FFFF), large enough for the characters of every writing system the standard aims to cover. Unicode 15.1, released in 2023, defines over 149,000 characters across 161 scripts, a total that includes emoji, mathematical symbols, musical notation, and historic scripts; later versions continue to add more. The first 128 Unicode code points (U+0000 through U+007F) are identical to ASCII, ensuring backward compatibility.
Unicode Code Points and Encoding Schemes
Unicode defines code points—the abstract numbers assigned to characters—separately from the byte representations used to store and transmit them. Several encoding schemes translate Unicode code points to byte sequences:
UTF-8 is the dominant encoding for web content and file storage. It uses 1 byte for ASCII characters (U+0000–U+007F); 2 bytes for U+0080–U+07FF, which covers Latin letters with diacritics, Greek, Cyrillic, Hebrew, and Arabic; 3 bytes for the rest of the Basic Multilingual Plane (U+0800–U+FFFF), which includes most Indic scripts and the common Chinese, Japanese, and Korean characters; and 4 bytes for supplementary characters (U+10000–U+10FFFF) such as most emoji and many historic scripts. Because ASCII text is already valid UTF-8, existing ASCII files and protocols work unchanged, which helped UTF-8 become the default on the web.
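The byte-length ranges above can be checked directly. The sketch below pairs a hypothetical helper, `utf8Length`, which applies the range boundaries, with the standard `TextEncoder` API, which always encodes to UTF-8, so the two results can be compared for a few sample characters.

```javascript
// Number of UTF-8 bytes needed for one code point, following the ranges above.
function utf8Length(codePoint) {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // U+0080 to U+07FF
  if (codePoint <= 0xffff) return 3; // rest of the Basic Multilingual Plane
  return 4;                          // supplementary planes
}

const encoder = new TextEncoder(); // always produces UTF-8

// "A", "é" (U+00E9), "中" (U+4E2D), and the grinning face emoji (U+1F600)
for (const ch of ["A", "\u00E9", "\u4E2D", "\u{1F600}"]) {
  const bytes = encoder.encode(ch);
  const hex = [...bytes]
    .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");
  console.log(ch, utf8Length(ch.codePointAt(0)), hex);
}
// A 1 41
// é 2 C3 A9
// 中 3 E4 B8 AD
// 😀 4 F0 9F 98 80
```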
UTF-16 uses one 2-byte code unit for characters in the Basic Multilingual Plane (U+0000–U+FFFF) and two code units (a surrogate pair, 4 bytes) for characters above U+FFFF. It's the string encoding used by the Windows API, Java, and JavaScript. JavaScript's `length` property and methods like `charCodeAt()` work in UTF-16 code units, which is why an emoji above U+FFFF, such as 😀 (U+1F600), reports a `length` of 2.
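To make the surrogate-pair behavior concrete, the sketch below computes the pair for a supplementary code point by hand and compares it with what JavaScript reports. The helper name `toSurrogatePair` is hypothetical; the arithmetic is the standard UTF-16 mapping.

```javascript
// Split a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Only U+10000 through U+10FFFF use surrogate pairs");
  }
  const offset = codePoint - 0x10000;       // 20-bit value
  const high = 0xd800 + (offset >> 10);     // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);    // bottom 10 bits
  return [high, low];
}

const face = "\u{1F600}"; // grinning face emoji

console.log(face.length);                          // 2 (UTF-16 code units)
console.log(face.charCodeAt(0).toString(16));      // d83d (high surrogate)
console.log(face.charCodeAt(1).toString(16));      // de00 (low surrogate)
console.log(face.codePointAt(0).toString(16));     // 1f600 (the full code point)
console.log([...face].length);                     // 1 (iteration is by code point)
console.log(toSurrogatePair(0x1f600).map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
```

When the goal is to count or inspect characters rather than code units, `codePointAt()` and string iteration are usually the safer choices.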
UTF-32 uses a fixed 4 bytes per code point, making random access and code point counting trivial but consuming more memory than UTF-8 for typical text. It's used internally by some programming languages and as an intermediate representation for text processing.
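As a rough illustration of that trade-off, a string can be expanded into one 32-bit value per code point. The helper `toUtf32` is a hypothetical name, and a `Uint32Array` stands in for a UTF-32 buffer in the platform's native byte order.

```javascript
// Store one 32-bit unit per code point, as UTF-32 does.
function toUtf32(text) {
  return Uint32Array.from([...text], (ch) => ch.codePointAt(0));
}

const sample = "A\u{1F600}"; // "A" followed by the grinning face emoji
const units = toUtf32(sample);

console.log(units.length);                             // 2 code points
console.log(units.byteLength);                         // 8 bytes in UTF-32
console.log(new TextEncoder().encode(sample).length);  // 5 bytes in UTF-8
console.log(units[1].toString(16));                    // 1f600, found by direct index
```

Direct indexing returns the second code point without scanning, at the cost of 8 bytes where UTF-8 needs 5.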
Practical Uses for Unicode Code Point Conversion
Debugging Encoding and Display Issues
When text displays incorrectly—showing garbled characters, question marks, or boxes—inspecting the actual Unicode code points of the problematic characters helps identify whether the issue is an encoding mismatch, an unsupported character, or a font rendering problem. A character that looks like a space but isn't behaving like one might be a non-breaking space (U+00A0), a zero-width space (U+200B), or a different space character. Inspecting the code point identifies it immediately.
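A check like the one described above could look as follows. The function name `findSuspectSpaces` and the lookup table are assumptions for illustration: the table lists only a handful of space-like characters and is not exhaustive.

```javascript
// A few space-like or invisible characters that are easy to mistake for U+0020.
// Illustrative only; Unicode defines many more.
const SPACE_LIKE = new Map([
  [0x00a0, "NO-BREAK SPACE"],
  [0x2009, "THIN SPACE"],
  [0x200b, "ZERO WIDTH SPACE"],
  [0x3000, "IDEOGRAPHIC SPACE"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE"],
]);

// Report the position and identity of each space-like character in the text.
function findSuspectSpaces(text) {
  const findings = [];
  let index = 0; // position in UTF-16 code units, matching String indexing
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (SPACE_LIKE.has(cp)) {
      findings.push({
        index,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        name: SPACE_LIKE.get(cp),
      });
    }
    index += ch.length;
  }
  return findings;
}

console.log(findSuspectSpaces("a\u00A0b\u200Bc"));
// Reports U+00A0 at index 1 and U+200B at index 3.
```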
CSS and JavaScript Character References
Special characters in CSS `content` properties and JavaScript string literals are often written as Unicode escape sequences. In CSS, `content: '\2665'` inserts a heart symbol (♥, U+2665). In JavaScript, `'\u2665'` is the same character. In ES2015 (ES6) and later, `'\u{1F600}'` is 😀 (U+1F600); the braces are required for code points above U+FFFF, because `\u` without braces takes exactly four hex digits. Our converter produces the code points needed to construct these escape sequences.
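Given a character, the matching escape sequences can be generated mechanically. The helpers `toCssEscape` and `toJsEscape` below are hypothetical names for a minimal sketch that handles a single code point at a time.

```javascript
// CSS escape: a backslash followed by the code point in hex (1 to 6 digits).
// Note: if the next character in the stylesheet is a hex digit or a space,
// add a space after the escape so the two are not read together.
function toCssEscape(ch) {
  return "\\" + ch.codePointAt(0).toString(16).toUpperCase();
}

// JavaScript escape: \uXXXX up to U+FFFF, \u{...} above it (ES2015+).
function toJsEscape(ch) {
  const cp = ch.codePointAt(0);
  const hex = cp.toString(16).toUpperCase();
  return cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : "\\u{" + hex + "}";
}

console.log(toCssEscape("\u2665"));    // \2665
console.log(toJsEscape("\u2665"));     // \u2665
console.log(toJsEscape("\u{1F600}"));  // \u{1F600}
```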
Identifying Homograph Attack Characters
Homograph attacks use Unicode characters that look visually similar to ASCII letters but have different code points—for example, the Cyrillic lowercase а (U+0430) looks nearly identical to the Latin lowercase a (U+0061). Domain names and URLs using lookalike characters can be used to create convincing phishing pages. Inspecting the code points of suspicious text reveals whether it uses actual ASCII characters or Unicode lookalikes.
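One simple heuristic for spotting such text is to check whether a single word mixes scripts. The sketch below uses JavaScript's Unicode property escapes (`\p{Script=...}`, available since ES2018); the function names and the three-script list are assumptions for illustration, and real browsers apply far more elaborate rules to internationalized domain names.

```javascript
// Scripts to look for. Illustrative only; extend the list as needed.
const SCRIPT_CHECKS = [
  ["Latin", /\p{Script=Latin}/u],
  ["Cyrillic", /\p{Script=Cyrillic}/u],
  ["Greek", /\p{Script=Greek}/u],
];

// Return the names of the listed scripts that appear in the text.
function detectScripts(text) {
  const found = new Set();
  for (const ch of text) {
    for (const [name, pattern] of SCRIPT_CHECKS) {
      if (pattern.test(ch)) found.add(name);
    }
  }
  return [...found];
}

// A word that mixes scripts is a common sign of a lookalike substitution.
function looksMixedScript(text) {
  return detectScripts(text).length > 1;
}

const genuine = "paypal";
const spoofed = "p\u0430ypal"; // second letter is Cyrillic а (U+0430)

console.log(detectScripts(genuine), looksMixedScript(genuine)); // [ 'Latin' ] false
console.log(detectScripts(spoofed), looksMixedScript(spoofed)); // [ 'Latin', 'Cyrillic' ] true
```

A mixed-script result is only a hint: legitimate text can mix scripts, and a spoof written entirely in one non-Latin script would pass this check.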
Free, Private, and Instant
The Unicode converter runs entirely in your browser. No text or code points you enter are transmitted to any server or stored anywhere. The tool is completely free with no account required and works on any device with a modern browser.