Emoji & Unicode Glossary
The essential terms behind emoji and Unicode, explained without jargon. Developer debugging string lengths? Designer choosing the right emoji? Just curious about how 👨‍👩‍👧‍👦 works? This reference has you covered.
32 terms · Alphabetically ordered · Cross-referenced
CLDR (Common Locale Data Repository)
A large collection of locale-specific data maintained by the Unicode Consortium. CLDR provides the official short names and keywords for every emoji, which is why searching "thumbs up" finds 👍 on your phone. It also defines how emoji names are translated across languages, so the same emoji can be found regardless of what language you speak.
Code Plane
A contiguous group of 65,536 (2¹⁶) codepoints within Unicode. There are 17 planes in total, numbered 0 through 16. Plane 0, the Basic Multilingual Plane (BMP), contains most everyday characters. Most emoji live on Plane 1, the Supplementary Multilingual Plane (SMP), which is why so many emoji codepoints are five hex digits starting with "1", like U+1F600 for 😀.
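As a rough illustration, the plane of a codepoint can be found by dividing its value by 65,536. The names `PLANE_SIZE` and `plane_of` are invented for this sketch and do not come from any Unicode library.

```python
PLANE_SIZE = 0x10000  # 65,536 codepoints per plane

def plane_of(codepoint: int) -> int:
    """Return the plane number (0-16) that contains the given codepoint."""
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    return codepoint // PLANE_SIZE

print(plane_of(ord("A")))   # 0, the BMP
print(plane_of(0x1F600))    # 1, the SMP
```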
Codepoint
A unique numerical value assigned to each character in the Unicode Standard. Codepoints are written in hexadecimal with a "U+" prefix. For example, the grinning face emoji is U+1F600. Simple emoji use a single codepoint, while complex emoji like flags or families are built from multiple codepoints combined together.
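A minimal sketch of the "U+" notation, assuming Python 3, where iterating over a string yields one codepoint at a time. The helper name `codepoints` is made up for the example.

```python
def codepoints(text: str) -> list:
    """List each codepoint in text using U+ notation (at least 4 hex digits)."""
    return [f"U+{ord(ch):04X}" for ch in text]

# A simple emoji is a single codepoint.
print(codepoints("\U0001F600"))            # ['U+1F600']
# A flag is built from two codepoints.
print(codepoints("\U0001F1FA\U0001F1F8"))  # ['U+1F1FA', 'U+1F1F8']
```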
Dingbats
A collection of decorative symbols and ornaments included in Unicode (block U+2700–U+27BF). Dingbats predate emoji and include characters like ✂ (scissors), ✈ (airplane), and ✉ (envelope). Many dingbat characters gained emoji presentations in later Unicode versions, making them some of the oldest symbols that now display as colorful emoji on modern devices.
Emoji
Small digital pictographs used in electronic communication. The word comes from Japanese: "e" (picture) + "moji" (character). Originally created by Shigetaka Kurita in 1999 for Japanese mobile carrier NTT DoCoMo, emoji became a global standard when added to Unicode 6.0 in 2010. Unlike emoticons, emoji are actual characters with assigned codepoints, rendered as colorful graphics by each platform.
Emoji Presentation
The colorful, graphical rendering of a character as opposed to its plain text form. Some Unicode characters exist in both text and emoji presentations. Characters with a default emoji presentation (like 😀) always render as graphics, while others (like ☺) default to text and need a Variation Selector to appear as emoji.
Emoji Sequence
A combination of multiple Unicode codepoints that together render as a single emoji. There are several types: ZWJ sequences join emoji with the Zero Width Joiner (like family emoji), flag sequences pair two Regional Indicator Symbols, keycap sequences add a combining enclosing keycap to digits, and presentation sequences use Variation Selectors.
Emoji Version
A release of the emoji specification managed by the Unicode Consortium, separate from the Unicode Standard version numbers. For example, Emoji 15.1 was released alongside Unicode 15.1, but earlier releases did not always align. Emoji versions define which new emoji are added, which sequences are recommended, and how emoji should be displayed.
Emoticon
A text-based representation of a facial expression using standard keyboard characters. Emoticons appeared in the early 1980s and predate emoji by about two decades. Western emoticons are read sideways (like :-) for a smiley), while Japanese-style emoticons (kaomoji) are read straight on. Many messaging apps automatically convert common emoticons into their emoji equivalents.
Fitzpatrick Scale
A dermatological classification of human skin color, originally developed by Thomas Fitzpatrick in 1975 to measure UV sensitivity. Unicode adopted a simplified version in 2015 (Unicode 8.0) with five skin tone modifiers for emoji. The modifiers map to Fitzpatrick Types I–II (light) through Type VI (dark), allowing users to personalize human emoji to better reflect their appearance.
Flag Sequence
A pair of Regional Indicator Symbol characters that together display as a country or territory flag emoji. The two-letter combination corresponds to an ISO 3166-1 alpha-2 country code. Not all platforms support every flag; unsupported flags typically show the two letter indicators side by side instead.
Grapheme Cluster
The technical term for what a user perceives as a single character on screen. Many emoji are composed of multiple codepoints but render as one visual unit. A four-person family emoji contains 7 codepoints (four people joined by three ZWJ characters), yet it looks like one picture. This distinction matters for developers counting "characters": string length and visual length often differ with emoji.
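To make the gap between "one picture" and string length concrete, here is a small sketch that measures the same family emoji in three units. Python's `len` counts codepoints; counting grapheme clusters themselves needs a text segmentation library, which this sketch leaves out. The variable names are illustrative.

```python
# Man, woman, girl, boy joined by three ZWJ (U+200D) characters.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

codepoint_count = len(family)                       # Python strings count codepoints
utf16_units = len(family.encode("utf-16-le")) // 2  # 16-bit units, as UTF-16 environments count
utf8_bytes = len(family.encode("utf-8"))            # bytes on disk or on the wire

print(codepoint_count, utf16_units, utf8_bytes)     # 7 11 25
```

A user still sees a single character, so none of these three numbers matches the perceived length of 1.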
Kaomoji
Japanese-style emoticons made from a wide range of Unicode characters, read face-on rather than sideways. The name combines "kao" (face) and "moji" (character). Kaomoji can express far more nuance than Western emoticons, using characters from multiple scripts. They remain popular in Japan and across Asian digital culture, even alongside modern emoji.
Keycap Sequence
An emoji sequence that displays a digit or symbol inside a rectangular button shape. It is formed by combining a base character (0–9, # or *), a Variation Selector (U+FE0F), and a Combining Enclosing Keycap character (U+20E3). Keycap emoji are commonly used for numbered lists, phone menus, and similar visual cues.
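The three-part structure can be sketched as follows. The function name `keycap` and the constant names are assumptions for illustration; only the codepoints come from the definition above.

```python
VS16 = "\uFE0F"                        # Variation Selector-16
COMBINING_ENCLOSING_KEYCAP = "\u20E3"
KEYCAP_BASES = "0123456789#*"          # the only valid base characters

def keycap(base: str) -> str:
    """Build a keycap sequence: base character + VS16 + U+20E3."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"not a keycap base: {base!r}")
    return base + VS16 + COMBINING_ENCLOSING_KEYCAP

print([f"U+{ord(c):04X}" for c in keycap("1")])  # ['U+0031', 'U+FE0F', 'U+20E3']
```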
Modifier
A special Unicode character that changes the appearance of the preceding emoji without altering its core meaning. The most common modifiers are the five skin tone modifiers based on the Fitzpatrick Scale. Modifiers only work with compatible base emoji, primarily those depicting people or body parts. Applying a modifier to an incompatible emoji has no effect.
Regional Indicator Symbol
A set of 26 Unicode characters (U+1F1E6 through U+1F1FF) corresponding to the letters A–Z. When two regional indicators are placed next to each other, they form a flag emoji matching the ISO 3166-1 country code. For example, "U" + "S" renders as 🇺🇸. These characters have no visual meaning on their own; they only produce flags when paired.
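Because the 26 indicators are laid out in alphabetical order, a two-letter country code can be mapped to a flag with simple arithmetic. This is a sketch; `flag_from_country_code` is a hypothetical helper, and it does not check whether the code is an assigned ISO 3166-1 code or whether any platform draws a flag for it.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # Regional Indicator Symbol Letter A

def flag_from_country_code(code: str) -> str:
    """Map a two-letter code such as 'US' to its pair of regional indicators."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two letters A-Z")
    # Offset each letter from 'A' and add it to the first regional indicator.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

us_flag = flag_from_country_code("us")
print([f"U+{ord(c):X}" for c in us_flag])  # ['U+1F1FA', 'U+1F1F8']
```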
Shortcode
A text alias for an emoji, typically wrapped in colons. Platforms like Slack, Discord, and GitHub each define their own shortcode sets; they are not part of the Unicode Standard. Typing a shortcode converts it into the corresponding emoji. Because shortcodes are platform-specific, the same emoji may have different shortcodes on different services.
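Shortcode expansion is essentially a table lookup. The sketch below uses a tiny, made-up table; the entries in `SHORTCODES` are illustrative and are not taken from any particular platform's list. Unknown shortcodes are left as typed.

```python
import re

# Hypothetical shortcode table; real platforms each ship their own.
SHORTCODES = {
    "thumbsup": "\U0001F44D",
    "heart": "\u2764\uFE0F",
}

def expand_shortcodes(text: str) -> str:
    """Replace :name: with its emoji when the name is in the table."""
    def lookup(match):
        return SHORTCODES.get(match.group(1), match.group(0))
    return re.sub(r":([a-z0-9_+-]+):", lookup, text)

print(expand_shortcodes("Nice work :thumbsup: :not_a_code:"))
```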
Skin Tone Modifier
One of five Unicode characters (U+1F3FB through U+1F3FF) that change the skin color of human emoji. When placed after a compatible emoji, the modifier replaces the default yellow tone with one of five Fitzpatrick-based shades. Only emoji depicting people or human body parts support these modifiers; applying them to objects or animals does nothing.
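Applying a modifier is just string concatenation, as this sketch shows. The dictionary keys are shorthand labels chosen for the example, and `with_skin_tone` is a hypothetical helper that does not check whether the base emoji actually accepts modifiers.

```python
SKIN_TONES = {
    "light": "\U0001F3FB",         # Fitzpatrick Types I-II
    "medium-light": "\U0001F3FC",  # Type III
    "medium": "\U0001F3FD",        # Type IV
    "medium-dark": "\U0001F3FE",   # Type V
    "dark": "\U0001F3FF",          # Type VI
}

def with_skin_tone(base: str, tone: str) -> str:
    """Append the skin tone modifier directly after the base emoji."""
    return base + SKIN_TONES[tone]

waving_hand = "\U0001F44B"
toned = with_skin_tone(waving_hand, "medium")
print([f"U+{ord(c):X}" for c in toned])  # ['U+1F44B', 'U+1F3FD']
```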
Slug
A URL-friendly version of an emoji name, using lowercase letters, numbers, and hyphens. For example, the emoji "Smiling Face with Open Mouth" might have the slug "smiling-face-with-open-mouth". Slugs are used in emoji reference sites (including this one) to create clean, readable URLs for each emoji page.
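One plausible way to derive a slug is sketched below. This is an assumption about how such a function might look, not the rule any specific site uses; `slugify` is a name chosen for the example.

```python
import re

def slugify(name: str) -> str:
    """Lowercase the name and collapse everything else into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")  # drop hyphens left at either end

print(slugify("Smiling Face with Open Mouth"))  # smiling-face-with-open-mouth
```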
Sticker
A larger, often animated image used in messaging apps like Telegram, WhatsApp, LINE, and WeChat. Unlike emoji, stickers are not Unicode characters; they are proprietary image files specific to each platform. Stickers can be any size or shape, often feature characters or illustrations, and are typically organized into downloadable packs. They serve a similar expressive purpose to emoji but with far more visual detail.
Text Presentation
The monochrome, outline-style rendering of a character, as opposed to its colorful emoji presentation. Some characters default to text presentation and need VS16 (U+FE0F) appended to display as emoji. Others default to emoji presentation and can be forced into text style with VS15 (U+FE0E). This dual nature exists because many emoji evolved from older text symbols.
Tofu
The blank rectangular box (□) that appears when a device cannot render a particular character or emoji. Named for its resemblance to a block of tofu, it indicates a missing glyph in the current font. Tofu commonly appears when a newer emoji is sent to an older device that hasn't been updated, or when a font doesn't include a specific Unicode character.
Unicode
An international standard, kept in sync with ISO/IEC 10646, that assigns a unique codepoint to characters from the world's writing systems, including emoji. Maintained by the Unicode Consortium, it defines nearly 150,000 characters covering 161 scripts as of Unicode 15.1. Unicode replaced the patchwork of older encoding systems (ASCII, Latin-1, Shift-JIS) with a single universal standard, making global text exchange possible.
Unicode Consortium
The non-profit organization responsible for developing and maintaining the Unicode Standard, including the official emoji set. Founded in 1991 and based in Mountain View, California, its members include Apple, Google, Microsoft, Meta, and other major tech companies. The consortium reviews emoji proposals, approves new additions each year, and coordinates with the Unicode Technical Committee (UTC) on all decisions.
UTC (Unicode Technical Committee)
The technical body within the Unicode Consortium that makes decisions about the Unicode Standard and emoji. The UTC reviews proposals for new characters and emoji, decides encoding details, and maintains the standard's technical specifications. Despite sharing initials with Coordinated Universal Time, the two are unrelated; in Unicode contexts, UTC always refers to this committee.
UTF-8
The most widely used encoding for Unicode text on the web. UTF-8 uses a variable-length scheme: ASCII characters take 1 byte, accented Latin, Greek, and Cyrillic letters take 2 bytes, most other BMP characters (including most CJK) take 3 bytes, and emoji typically take 4 bytes. Its backwards compatibility with ASCII and space efficiency made it the dominant encoding: over 98% of web pages use UTF-8 today.
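The variable-length scheme is easy to observe by encoding a few sample characters. The helper `utf8_length` and the sample labels are made up for this sketch.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "letter A": "A",                   # ASCII
    "e with acute": "\u00E9",          # accented Latin letter
    "euro sign": "\u20AC",             # elsewhere in the BMP
    "grinning face": "\U0001F600",     # emoji outside the BMP
}
for label, ch in samples.items():
    print(f"{label}: {utf8_length(ch)} byte(s)")
```

Run as written, the four samples report 1, 2, 3, and 4 bytes respectively.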
UTF-16
A Unicode encoding that uses 2 or 4 bytes per codepoint. Characters in the Basic Multilingual Plane (Plane 0) use 2 bytes, while characters outside it, including most emoji, require a surrogate pair totaling 4 bytes. UTF-16 is used internally by JavaScript, Java, and Windows. This is why JavaScript's string length for a single emoji often returns 2 instead of 1.
Surrogate Pair
A pair of 16-bit code units used in UTF-16 to represent characters outside the Basic Multilingual Plane (above U+FFFF). Since most emoji have codepoints above U+FFFF, they require surrogate pairs in UTF-16 environments like JavaScript. This is why "😀".length returns 2 in JavaScript: the single emoji is encoded as two 16-bit units.
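The arithmetic behind a surrogate pair can be sketched in a few lines: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The function name `to_surrogate_pair` is an assumption for illustration; the last line cross-checks the result against Python's own UTF-16 encoder.

```python
def to_surrogate_pair(codepoint: int) -> tuple:
    """Split a codepoint above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("only codepoints above U+FFFF need surrogates")
    offset = codepoint - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# Cross-check against the built-in big-endian UTF-16 encoder.
encoded = "\U0001F600".encode("utf-16-be")
print(encoded.hex().upper())    # D83DDE00
```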
Variation Selector
A Unicode character that controls whether a symbol displays in text or emoji style. VS15 (U+FE0E) forces text presentation: a plain, monochrome glyph. VS16 (U+FE0F) forces emoji presentation: a colorful, graphical version. For example, the heart character U+2764 defaults to a plain monochrome glyph, but adding VS16 turns it into the familiar red heart ❤️.
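Switching presentation amounts to appending the right selector. In this sketch the helpers `as_emoji` and `as_text` are hypothetical names; they first strip any selector already present so the two never stack. Whether the result actually looks different depends on the fonts and platform doing the rendering.

```python
VS15 = "\uFE0E"  # requests text presentation
VS16 = "\uFE0F"  # requests emoji presentation

def _without_selector(symbol: str) -> str:
    """Remove any trailing VS15/VS16 from the symbol."""
    return symbol.rstrip(VS15 + VS16)

def as_emoji(symbol: str) -> str:
    return _without_selector(symbol) + VS16

def as_text(symbol: str) -> str:
    return _without_selector(symbol) + VS15

heart = "\u2764"
print([f"U+{ord(c):04X}" for c in as_emoji(heart)])  # ['U+2764', 'U+FE0F']
print([f"U+{ord(c):04X}" for c in as_text(heart)])   # ['U+2764', 'U+FE0E']
```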
Vendor
A company that designs its own visual style for emoji, such as Apple, Google, Microsoft, Samsung, or Meta. Each vendor creates unique artwork for every emoji while following the Unicode specification's general guidelines. This is why the same emoji can look quite different across platforms: the Unicode Standard defines meaning and name but not exact appearance.
Wingdings
A family of symbol fonts created by Microsoft in 1990, using decorative icons in place of standard letters. Typing a regular letter would display a symbol like a phone, envelope, or smiley face instead. Wingdings (along with Webdings) were predecessors to modern emoji in spirit, though they worked by font substitution rather than as standard Unicode characters. They cannot be reliably exchanged between systems.
ZWJ (Zero Width Joiner)
A special Unicode character (U+200D) with no visible width that joins multiple emoji into a single composite picture. ZWJ sequences create family groups, profession emoji, and other combinations that would be impractical to encode individually. If a device doesn't support a particular ZWJ sequence, it gracefully falls back to showing the individual emoji side by side.
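Both the joining and the fallback behavior can be sketched with plain string operations. The helper names `zwj_sequence` and `fallback_parts` are invented for this example; the second one only imitates what a renderer shows when it lacks the combined glyph.

```python
ZWJ = "\u200D"  # Zero Width Joiner

def zwj_sequence(*emoji: str) -> str:
    """Join the given emoji into a single ZWJ sequence."""
    return ZWJ.join(emoji)

def fallback_parts(sequence: str) -> list:
    """Split a ZWJ sequence back into the emoji shown side by side on fallback."""
    return sequence.split(ZWJ)

# Woman + ZWJ + laptop forms the "woman technologist" profession emoji.
technologist = zwj_sequence("\U0001F469", "\U0001F4BB")
print(len(technologist))                  # 3 codepoints
print(len(fallback_parts(technologist)))  # 2 separate emoji
```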