Character & Word Limits 2026: Twitter, SMS, SEO, Instagram Guide
A character limit is the maximum number of Unicode code points a platform accepts in a single field: 280 for a Twitter post, 160 for a single-segment SMS in GSM-7, around 160 for a Google meta description before truncation. The number you care about depends on where you publish and whether your text contains emoji, smart quotes, or CJK characters, all of which change the math.
This guide is for social-media writers, SEO specialists, marketing copywriters, SMS senders billed per segment, and developers writing validation that has to match what Twitter, Instagram, or SMS gateways actually count. Jump to the quick reference table for the 25-platform cheat sheet, or check your draft live against six major platforms in the Word Counter, where progress bars turn red the moment you cross a limit.
Quick reference: every platform’s character and word limit
The table below covers the 30+ fields writers and developers run into most often. “Hard limit” is the platform-enforced ceiling; “Visible / above the fold” is what readers see before a truncation point; “Sweet spot” is the empirical range where content performs best.
| Platform | Hard limit | Visible / above the fold | Sweet spot | Counts emoji as |
|---|---|---|---|---|
| Twitter / X post | 280 chars | 280 | 70-100 chars | 1 codepoint |
| Twitter / X bio | 160 chars | 160 | — | 1 codepoint |
| Twitter / X display name | 50 chars | 50 | — | 1 codepoint |
| X Premium long-form | 25,000 chars | — | — | 1 codepoint |
| Instagram caption | 2,200 chars | first 125 (then “more”) | <125 for hook | 1 codepoint |
| Instagram bio | 150 chars | 150 | — | 1 codepoint |
| Instagram hashtags | max 30 | — | 5-10 | — |
| LinkedIn post | 3,000 chars | first 210 (then “see more”) | <1,300 | 1 codepoint |
| LinkedIn article | 110,000 chars | — | — | 1 codepoint |
| LinkedIn headline | 220 chars | 220 | — | 1 codepoint |
| Facebook post | 63,206 chars | ~477 desktop / ~125 mobile | <80 for organic | 1 codepoint |
| TikTok caption | 2,200 chars | first ~100 | <150 | 1 codepoint |
| YouTube title | 100 chars | 70 (search) | <60 | 1 codepoint |
| YouTube description | 5,000 chars | first 100-150 above fold | first 150 for hook | 1 codepoint |
| YouTube comment | 10,000 chars | — | — | 1 codepoint |
| Reddit title | 300 chars | — | <60 (subreddit-dependent) | 1 codepoint |
| Reddit comment | 10,000 chars | — | — | 1 codepoint |
| Discord message | 2,000 chars | 2,000 | — | 1 codepoint |
| Discord embed description | 4,096 chars | — | — | 1 codepoint |
| Slack message | 40,000 chars | — | <2,000 for readability | 1 codepoint |
| Pinterest pin description | 500 chars | first 50-60 | <125 | 1 codepoint |
| Mastodon toot | 500 chars (configurable) | 500 | — | 1 codepoint |
| Bluesky post | 300 chars | 300 | — | 1 grapheme cluster |
| Threads post | 500 chars | 500 | — | 1 codepoint |
| SEO meta description (Google) | ~160 chars desktop / ~120 mobile | 150-160 | 150-160 | 1 codepoint |
| SEO page title (Google) | ~60 chars desktop / ~50 mobile | 50-60 | 50-60 | 1 codepoint |
| Open Graph description | ~200 chars before LinkedIn/FB clip | 150-200 | 150-200 | 1 codepoint |
| Twitter Card description | 200 chars max | 200 | 150-200 | 1 codepoint |
| SMS single segment (GSM-7) | 160 chars | — | — | special — see below |
| SMS single segment (UCS-2 / emoji) | 70 chars | — | — | 1 codepoint |
| WhatsApp message text | 65,536 chars | — | — | 1 codepoint |
| Email subject line | no platform limit | ~60 desktop / ~30 mobile | <50 | 1 codepoint |
| Google Ads headline | 30 chars × 15 headlines | 30 each | 30 | 1 codepoint |
| Google Ads description | 90 chars × 4 desc | 90 each | 90 | 1 codepoint |
| App Store title | 30 chars | 30 | 30 | 1 codepoint |
| App Store subtitle | 30 chars | 30 | 30 | 1 codepoint |
| App Store description | 4,000 chars | first 252 above fold | 252 hook | 1 codepoint |
| Play Store short description | 80 chars | 80 | 80 | 1 codepoint |
| Play Store long description | 4,000 chars | first 80 above fold | 80 hook | 1 codepoint |
Content above the “sweet spot” line tends to get truncated, downranked, or cropped off the visible card. X Premium long-form and Mastodon (configurable per instance) are the rare exceptions that let you write past 500 characters without penalty. Every count above, except where SMS rules apply and on Bluesky (which counts grapheme clusters), is a Unicode code-point count: one emoji costs 1 character, not 2. To verify a draft against the six most common limits at once, paste it into the Word Counter; the progress bars catch over-limit text before you hit publish.
How characters are actually counted (Unicode code points vs UTF-16)
Three different tools can hand you three different character counts for the same string. “Character” is not a single thing: it could mean a Unicode code point, a UTF-16 code unit, or a grapheme cluster, and each platform picks one.
What is a “character”: codepoint vs code unit vs grapheme
A codepoint is a Unicode scalar value: any integer from U+0000 to U+10FFFF that Unicode has assigned to a character or marked as reserved. A code unit is the smallest piece of an encoding; UTF-16 uses 16-bit code units, UTF-8 uses 8-bit code units. A grapheme cluster is what humans perceive as a single visible character. Sometimes that means one codepoint, sometimes a base codepoint plus combining marks, sometimes a zero-width-joiner sequence like the family emoji 👨👩👧👦 (seven codepoints joined into one visible glyph).
For the string "a🌍👨‍👩‍👧" (the letter a, a globe, and a three-person family emoji built as 👨 + ZWJ + 👩 + ZWJ + 👧) the three counts disagree:
| Counting method | Result | Used by |
|---|---|---|
| UTF-16 code units (JS string.length) | 11 | Naive JavaScript code |
| Unicode code points | 7 | Twitter, Instagram, SMS gateways |
| Grapheme clusters | 3 | Bluesky, screen readers, text editors |
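The three counting methods can be checked side by side. The sketch below is a minimal illustration, assuming a runtime that ships `Intl.Segmenter` (current browsers and recent Node.js releases); the function name `countThreeWays` is made up for this example.

```javascript
// Count one string three ways: UTF-16 code units, code points, grapheme clusters.
function countThreeWays(text) {
  // Intl.Segmenter splits text into user-perceived characters (grapheme clusters).
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: text.length,                          // what string.length reports
    codePoints: [...text].length,                    // string iteration walks code points
    graphemes: [...segmenter.segment(text)].length,  // one entry per visible glyph
  };
}

// "a" + globe + family of three (man, ZWJ, woman, ZWJ, girl), written with
// escapes so the zero-width joiners cannot get lost in copy-paste.
const sample = "a\u{1F30D}\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(countThreeWays(sample)); // { codeUnits: 11, codePoints: 7, graphemes: 3 }
```

Writing the sample with escape sequences is deliberate: zero-width joiners are invisible, and a string that looks identical on screen can produce different counts if they were stripped along the way.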
Why string.length lies about emoji
JavaScript stores strings as UTF-16 internally. Any codepoint above U+FFFF (most emoji, all astral-plane characters) is encoded as a surrogate pair: two 16-bit code units. The .length property reports those two units, not one character.
"🌍".length // 2 (UTF-16 code units)
[..."🌍"].length // 1 (codepoints — what Twitter/SMS counts)
"🌍".match(/./gu).length // 1 (codepoints via regex with /u flag)
The spread operator and the /u regex flag both iterate by codepoint, which matches what Twitter, Instagram, and SMS gateways measure against their limits. A validation function that uses raw .length overcounts every surrogate pair, so it rejects tweets that are actually under the cap. And if another layer of your stack counts in a different unit, the two will disagree about the same message right at the boundary.
What about CJK and combining marks
Chinese, Japanese, and Korean ideographs are each a single codepoint and count as one character on every platform. Where they get expensive is SMS: any non-GSM-7 character flips the whole message to UCS-2 encoding, dropping the segment limit from 160 to 70 (covered in the next section).
Combining marks behave differently. The accented á written as á is one codepoint; the same á written as a + ́ (combining acute accent) is two codepoints but one grapheme cluster. Most platforms count by codepoint, so the second form costs one extra character. Bluesky is the visible exception: it counts grapheme clusters, so both forms cost 1.
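One way to keep the two spellings of á from costing different amounts is to normalize to NFC before counting. The sketch below assumes the receiving platform counts the text as you send it; whether a given platform normalizes on its side is not something this guide establishes, and `codePointLength` is a hypothetical helper name.

```javascript
// Count code points, optionally after composing base + combining mark pairs (NFC).
function codePointLength(text, normalize = true) {
  const s = normalize ? text.normalize("NFC") : text;
  return [...s].length;
}

const precomposed = "\u00E1"; // á as a single code point
const decomposed = "a\u0301"; // a followed by a combining acute accent

console.log(codePointLength(decomposed, false)); // 2
console.log(codePointLength(decomposed));        // 1
console.log(codePointLength(precomposed));       // 1
```

NFC only helps where a precomposed form exists; sequences with no composed equivalent keep their extra code points.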
Counting in different languages: quick reference
// JavaScript
[...str].length // codepoints
Array.from(str).length // codepoints
// Python 3 — len() is codepoint by default
len(s)
// Go — utf8 package
utf8.RuneCountInString(s)
// Rust — chars() iterates codepoints
s.chars().count()
// Java — codePointCount
s.codePointCount(0, s.length())
For comparison, the Base64 encoder reminds you of the other direction: when text is encoded to Base64 for transmission, every 3 bytes of UTF-8 input become 4 ASCII output characters, so the encoded length depends on the byte count, not the codepoint count. Paste a single emoji and watch the Base64 output expand to 8 characters; the same emoji that costs 1 character on Twitter takes 4 bytes in UTF-8.
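The byte-versus-codepoint difference can be computed directly. This is a small sketch using the standard `TextEncoder`; the helper name `encodedSizes` is invented here, and the Base64 figure is the padded length derived from the 3-bytes-to-4-characters rule rather than an actual encoding run.

```javascript
// Compare how "big" a string is in code points, UTF-8 bytes, and Base64 characters.
function encodedSizes(text) {
  const utf8Bytes = new TextEncoder().encode(text).length; // UTF-8 byte count
  return {
    codePoints: [...text].length,
    utf8Bytes,
    base64Chars: 4 * Math.ceil(utf8Bytes / 3), // padded Base64 output length
  };
}

console.log(encodedSizes("\u{1F30D}")); // { codePoints: 1, utf8Bytes: 4, base64Chars: 8 }
console.log(encodedSizes("abc"));       // { codePoints: 3, utf8Bytes: 3, base64Chars: 4 }
```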
To see codepoint counts (the number Twitter actually measures) on any draft, the Word Counter is Unicode-correct by default.
SMS character limit: GSM-7, UCS-2, and multi-part messages
SMS is the only major channel where adding a single emoji can literally double your bill. The reason is encoding, and the math has been the same since 1985.
The 160-character magic number: GSM-7 history
The 1985 GSM-03.38 standard fixed an SMS payload at 140 bytes. With a 7-bit character encoding, 140 bytes hold 1,120 bits ÷ 7 = 160 characters. That’s where the famous SMS character limit of 160 comes from. The GSM-7 character set covers 128 base characters plus a 10-character extension (covering { } [ ] | \ ~ ^ € and form feed). Inside the base set you get the full 160-char budget per segment; each extension character is sent as an escape code plus the character, so it costs 2.
Characters that fall outside GSM-7 and force a switch:
- All emoji
- Curly / smart quotes (“ ” ‘ ’); note these are different from the ASCII straight quotes " and '
- Most accented Latin letters beyond the small set in GSM-7 (á í ó ú â ê ã õ etc.; GSM-7 includes only é è ù ì ò à ä ö ü ñ å æ ø and a few others)
- Full-width punctuation, CJK characters, Arabic, Hebrew, Greek lowercase, Cyrillic
- Backtick ` (the tilde ~, by contrast, is in the GSM-7 extension table: it does not force a switch, but it costs 2 of your 160 chars)
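The list above can be turned into a rough encoding detector. The sketch below hard-codes the GSM-7 default alphabet and extension table as transcribed for this example (worth re-checking against the specification before relying on it), and `smsEncoding` is a hypothetical name. It reports UCS-2 length in 16-bit units, because that is what fills the 140-byte payload; a character above U+FFFF takes two of them.

```javascript
// GSM-7 default alphabet, printable entries only (transcribed for this sketch).
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";
// Extension table: each of these is sent as escape + character, so it costs 2.
const GSM7_EXTENSION = "^{}\\[~]|€\f";

function smsEncoding(text) {
  let septets = 0;
  for (const ch of text) {
    if (GSM7_BASIC.includes(ch)) septets += 1;
    else if (GSM7_EXTENSION.includes(ch)) septets += 2;
    // Any other character switches the whole message to UCS-2.
    else return { encoding: "UCS-2", units: text.length };
  }
  return { encoding: "GSM-7", units: septets };
}

console.log(smsEncoding("Hello, your code is 12345")); // { encoding: 'GSM-7', units: 25 }
console.log(smsEncoding("a~"));                        // { encoding: 'GSM-7', units: 3 }
console.log(smsEncoding("Hi 你好"));                   // { encoding: 'UCS-2', units: 5 }
```

Some carriers use national-language shift tables that widen the 7-bit set; this sketch ignores them and treats the default alphabet as the only one.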
UCS-2 trap: one emoji drops you from 160 to 70
The moment a single non-GSM-7 character appears anywhere in the message, the entire message switches to UCS-2 encoding. UCS-2 uses 16 bits per character, so 140 bytes ÷ 2 = 70 characters per segment. Some real examples:
"Hello, your code is 12345" → 25 chars, GSM-7, 1 segment
"Hello, your code is 12345 ✓" → 27 chars, UCS-2 (✓ is not in GSM-7), 1 segment (under 70)
"Hello, your code is 12345 ✅" → 27 chars, UCS-2 (emoji), 1 segment (under 70)
"Hello, “your” code is 12345 ✅" → 29 chars, smart quotes + emoji → UCS-2, 1 segment
"Hi 你好" → 5 chars, CJK → UCS-2, 1 segment
That last “Hi 你好” example is the gotcha: it’s only 5 characters, but it pays UCS-2 pricing. The next 65 characters you add still fit in the first segment; character 71 pushes the message into a second segment.
Multi-part SMS segments (concatenation)
Once you cross 160 (GSM-7) or 70 (UCS-2), the message splits into multiple segments. Each segment carries a 6-byte User Data Header (UDH) used for reassembly, which takes the space of 7 GSM-7 characters or 3 UCS-2 characters, so the available payload per segment drops:
- GSM-7 multi-part: 153 characters per segment
- UCS-2 multi-part: 67 characters per segment
The receiving phone reassembles the segments invisibly to the recipient, but billing is per segment, not per message. A 161-character GSM-7 message costs 2 segments. A 1,000-character GSM-7 message costs 7 segments (153 × 6 = 918, 7th segment carries the last 82).
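The segment arithmetic above fits in a few lines. The sketch below takes the message length and encoding as inputs (how you determine the encoding is left to the caller), and `segmentCount` is a made-up name for illustration.

```javascript
// Per-segment capacity: full size for a single segment, reduced size once
// the message is split and every part carries a reassembly header.
const SEGMENT_SIZES = {
  "GSM-7": { single: 160, multipart: 153 },
  "UCS-2": { single: 70, multipart: 67 },
};

function segmentCount(length, encoding) {
  const sizes = SEGMENT_SIZES[encoding];
  if (!sizes) throw new RangeError(`unknown encoding: ${encoding}`);
  if (length <= sizes.single) return 1; // fits in one segment, no header needed
  return Math.ceil(length / sizes.multipart);
}

console.log(segmentCount(160, "GSM-7"));  // 1
console.log(segmentCount(161, "GSM-7"));  // 2
console.log(segmentCount(1000, "GSM-7")); // 7
console.log(segmentCount(80, "UCS-2"));   // 2
```

Multiplying the result by the number of recipients gives billable segments: 100,000 recipients at 2 segments each is 200,000 segments, whatever the per-segment price.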
Cost math: when one emoji doubles your bill
Take an 80-character plain-text marketing message:
- Plain text: 80 chars → GSM-7 → 1 segment at price X
- Add one emoji: 80 chars → UCS-2 → 80 > 70 → 2 segments at price 2X
Doubling the bill from one emoji is real and it scales. A campaign of 100,000 messages at $0.0075 per segment costs $750 in GSM-7 vs. $1,500 in UCS-2, a $750 emoji. Every major SMS provider (Twilio, Bandwidth, AWS SNS, MessageBird, Vonage) bills this way. The encoding rules are GSM standard, not vendor policy. The history of byte-level encoding tradeoffs, and why ASCII / UTF-8 / UCS-2 even exist as separate standards, is covered in Understanding Base64, which is the same family of “bits into characters” problem applied to email instead of SMS.
How to keep messages in GSM-7
- Use ASCII straight quotes " and ', not smart quotes
- Use ASCII hyphen -, not em-dash — or en-dash –
- Spell out (c) and (R), not © and ®
- Avoid emoji unless the campaign budget assumes UCS-2 cost
- Provider consoles (Twilio’s, Bandwidth’s, MessageBird’s) show “encoding: GSM-7” or “UCS-2” next to the preview; verify before broadcast
The fastest sanity check during drafting is the Word Counter’s SMS progress bar, which reports against the 160-char baseline. If your text triggers UCS-2, mentally divide that 160-character budget by 2.29 (160 ÷ 70): one segment now holds 70 characters, and each part of a multi-part message holds 67.
SEO limits: meta description, title tag, OG, Twitter Card
SEO character limits are softer than platform limits (Google won’t reject your page if a meta description hits 300 characters), but the practical truncation rules matter for click-through rate. The numbers below still apply in 2026.
Meta description: 150-160 character sweet spot
Google’s desktop search results truncate the meta description around 155-165 characters; mobile clips somewhere between 100 and 120. The exact truncation point varies because Google measures display pixels, not characters. A description full of W and M glyphs hits the truncation pixel earlier than one full of i and l.
Practical writing rules:
- Target 150-160 characters total
- Put core message in the first 120 characters (mobile-safe)
- Lead with the page’s primary keyword in the first 30 characters
- End with a CTA in the last 30 characters, readable even when desktop cuts the middle
The 2017-2018 era saw Google briefly expand meta description display to 320 characters, and a generation of SEO tutorials still cites that number. Google reverted to 160 in mid-2018. Writing past 200 characters today just hides the second half.
A different failure mode: descriptions under 120 characters often get replaced entirely. Google decides your description doesn’t fully serve the query and pulls a different passage from the page body, so you lose CTR control without warning.
Title tag: 60 desktop, 50 mobile
Title tags clip at roughly 60 characters on desktop and 50 on mobile. Same pixel-based truncation as descriptions, same caveat about wide glyphs.
Sweet spot: 50-60 characters, with the target keyword in the first 30 so it survives any clip. Long-tail brand suffixes (| Brand Name) belong at the end, where truncation is least painful.
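The description and title thresholds above can be wired into a simple pre-publish check. This is a sketch only: it uses character counts as a proxy, while Google truncates by pixel width, and the function name and message strings are invented for this example.

```javascript
// Flag titles and meta descriptions that fall outside the ranges given above.
function checkSnippet(title, description) {
  const titleLen = [...title].length;
  const descLen = [...description].length;
  const notes = [];

  if (titleLen > 60) notes.push("title may clip on desktop (over 60)");
  else if (titleLen > 50) notes.push("title may clip on mobile (over 50)");

  if (descLen > 160) notes.push("description likely truncated (over 160)");
  else if (descLen < 120) notes.push("description may be replaced by a page passage (under 120)");

  return { titleLen, descLen, notes };
}

// A 55-character title and a 155-character description.
console.log(checkSnippet("a".repeat(55), "b".repeat(155)).notes);
// [ 'title may clip on mobile (over 50)' ]
```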
Pixel-width vs character-count: Google’s actual rule
Google’s SERP description container is roughly 920 pixels wide on desktop. Average character width sits around 6.5 pixels, yielding the 140-160 character empirical target. But the per-character spread is wide: i renders at about 3 pixels, M at about 11. A description of all-caps copy (“BEST WIDGETS FOR WINTER WEDDINGS”) clips substantially earlier than a lowercase equivalent.
Pre-publish previews using pixel-accurate SERP simulators are more reliable than character counters for SEO copy.
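To see why glyph mix matters, here is a deliberately crude width estimator. Only the "i" (about 3 px), "M" (about 11 px), 6.5 px average, and 920 px container figures come from the text above; the widths for "l" and "W" are assumptions added for illustration, and a real simulator would measure every glyph in the actual SERP font.

```javascript
// Hypothetical per-glyph widths in pixels. "i" and "M" follow the figures in
// the text; "l" and "W" are assumed to match them for this sketch.
const GLYPH_WIDTHS = new Map([["i", 3], ["l", 3], ["M", 11], ["W", 11]]);
const AVERAGE_WIDTH = 6.5;   // fallback for every glyph not listed
const CONTAINER_WIDTH = 920; // approximate desktop description width

function estimatePixelWidth(text) {
  let total = 0;
  for (const ch of text) total += GLYPH_WIDTHS.get(ch) ?? AVERAGE_WIDTH;
  return total;
}

function fitsDesktopSnippet(text) {
  return estimatePixelWidth(text) <= CONTAINER_WIDTH;
}

console.log(fitsDesktopSnippet("M".repeat(84)));  // false (84 × 11 = 924 px)
console.log(fitsDesktopSnippet("i".repeat(160))); // true  (160 × 3 = 480 px)
```

Under these assumed widths, 84 wide glyphs overflow the container while 160 narrow ones use barely half of it, which is the gap a plain character counter cannot see.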
OG description and Twitter Card description
The Open Graph protocol’s og:description is what Facebook, LinkedIn, Slack, and Discord render under a shared link preview. Display caps vary by platform: most clip around 200 characters, some extend to 300. The Twitter Card twitter:description is hard-capped at 200 characters in Twitter’s parser.
Sensible defaults:
- 150-200 characters for both OG and Twitter Card
- They can match your meta description, but OG can run slightly longer because OG length doesn’t affect search ranking
- Validate your structured-data choices (especially what gets pulled into OG by mistake) using the patterns in Security Best Practices, where untrusted OG metadata is a common phishing vector
What “no character limit” actually means
H1 tags, body content, and URL slugs have no platform-enforced SEO character limit, but soft limits still apply:
- H1 > 70 characters breaks visual hierarchy and skim-ability
- URL slugs technically unlimited; Google displays around 90 characters in the SERP, anything beyond is cosmetic
- Body content has no length cap, but Google ranks helpful content over padding, so word count alone is not a ranking signal
The Word Counter tracks both meta description (160) and title tag (60) live as you draft, with progress bars that turn amber and red as you approach the truncation pixel.
Social platforms: Twitter/X, Instagram, LinkedIn, Facebook, and beyond
Each platform’s character ceiling has a story behind it and a sweet spot below the hard limit where content actually performs.
Twitter / X: 280, premium 25,000, URL substitution rule
The standard Twitter character limit is 280 characters, doubled from 140 in November 2017. X Premium subscribers can post long-form content up to 25,000 characters with rich formatting, but the 280-char post is still the dominant form for organic reach.
The non-obvious rule is URL substitution. Twitter wraps every URL, no matter how long, in a 23-character t.co short link at publish time. The 23-character cost is fixed.
published_length = raw_length − URL_length + 23
Example: a draft like "Check this: https://example.com/very-long-path?id=12345" is 55 raw characters. The URL is 43 characters, so it gets replaced with a 23-char t.co link, and the published length is 55 − 43 + 23 = 35 characters. That frees 20 characters you didn’t know you had.
For pasting a long URL into a draft, the URL encoder/decoder is a quick way to verify what counts as a URL (Twitter recognizes URLs by RFC 3986 patterns, query strings and fragments included). Subdomains, schemes, ports, paths, queries, and fragments are all swallowed by the 23-character substitution.
Other Twitter fields: display name 50 chars, bio 160 chars, handle 15 chars. Threads (Meta’s Twitter equivalent) uses a 500-character limit instead.
Instagram: 2,200 caption, 30 hashtags, 125-char hook
Instagram captions allow 2,200 characters, but the feed only shows the first 125 characters before collapsing the rest behind a “… more” tap. More than half of readers never tap. The Instagram caption limit that matters for engagement is therefore 125, even though the hard limit is 2,200.
The 30-hashtag cap is hard, and attempting a 31st hashtag fails the post. The 5-10 hashtag range tends to perform best; beyond 11 the discovery boost flattens and the post starts looking like spam to the algorithm.
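The caption rules above (2,200 hard limit, 125-character hook, 30 hashtags) can be checked together before posting. This is a sketch: the hashtag pattern below (a # followed by letters, digits, or underscores) is an assumption about what counts as a hashtag, and `checkInstagramCaption` is a hypothetical helper.

```javascript
// Report caption length, hashtag count, and the text visible before "more".
function checkInstagramCaption(caption) {
  const chars = [...caption]; // code points, so emoji are not split in the hook
  const hashtags = caption.match(/#[\p{L}\p{N}_]+/gu) ?? [];
  return {
    length: chars.length,
    withinHardLimit: chars.length <= 2200,
    hashtagCount: hashtags.length,
    withinHashtagCap: hashtags.length <= 30,
    hook: chars.slice(0, 125).join(""),
  };
}

const report = checkInstagramCaption("New drop today " + "#tag ".repeat(31));
console.log(report.hashtagCount, report.withinHashtagCap); // 31 false
```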
Other fields: bio 150 chars, display name 30 chars, DM 1,000 chars.
LinkedIn: 3,000 post, 1,300 sweet spot, “see more” fold
The LinkedIn character limit for posts is 3,000, but the feed displays only the first 210 characters before the “see more” fold. Posts in the 1,200-1,500 character range win engagement on LinkedIn (multiple Buffer and Hootsuite studies converge on around 1,300 as the peak); they’re long enough to demonstrate value, short enough not to wear out the scroll.
LinkedIn Articles (the long-form publishing surface) allow 110,000 characters, which is effectively unlimited. Profile headlines cap at 220, about-section text at 2,600.
Facebook: 63,206 chars, 80-char organic sweet spot
Facebook’s 63,206-character post limit is mostly trivia; in practice posts under 80 characters get about 30% higher organic engagement than longer ones (HubSpot consistently reports this across years). Above the fold, desktop shows about 477 characters; mobile cuts at around 125.
Comment max is 8,000 characters. Reactions, shares, and click-throughs all skew toward shorter posts, so long copy belongs in the linked article, not the Facebook caption.
Newer platforms: Bluesky, Mastodon, Threads, TikTok
- Bluesky posts cap at 300 characters and are the unusual case: Bluesky counts grapheme clusters, so the seven-codepoint family emoji 👨👩👧👦 costs 1 character, not 7
- Mastodon defaults to 500 characters per toot, but instance admins can raise this to 5,000 or even unlimited; check the instance you’re posting from
- Threads uses Twitter-style 500-character limits with codepoint counting
- TikTok captions allow 2,200 characters with about 100 shown above the fold
Reddit, Discord, Slack: long-form and community defaults
- Reddit title 300 characters (subreddit moderators often enforce <60 via AutoModerator); comments 10,000 characters
- Discord standard message 2,000 characters; embed descriptions 4,096; Nitro raises to 4,000 on plain messages
- Slack message 40,000 characters; above 2,000 readability drops sharply and many recipients ignore long messages
Word count targets by content type
Character limits dominate social and SEO; word counts dominate everything else: academic work, billing, content marketing, manuscripts. The table below gives a target range and a reading-time estimate (230 wpm, the Brysbaert 2019 silent-reading meta-analysis median) for each common content type.
| Content type | Word target | Reading time @ 230 wpm | Notes |
|---|---|---|---|
| Tweet | 30-40 words | 10 sec | optimize for character, not word |
| LinkedIn post (sweet spot) | 170-250 words | 1 min | above the fold |
| Instagram caption (hook) | 20-25 words | <10 sec | first 125 chars |
| Blog post — short | 500-700 words | 2-3 min | listicle, news, hot take |
| Blog post — standard | 1,000-1,500 words | 4-7 min | tutorial, deep guide |
| Blog post — long | 2,000-3,000 words | 9-13 min | comprehensive guide |
| SEO pillar page | 2,500-5,000 words | 11-22 min | topical authority |
| Academic essay (high school) | 500-1,500 words | 2-7 min | varies by assignment |
| Academic essay (undergrad) | 1,500-3,000 words | 7-13 min | per assignment |
| NaNoWriMo daily | 1,667 words/day | — | 50K words in 30 days |
| Novel — short | 50,000-70,000 words | — | YA, mystery |
| Novel — standard | 80,000-100,000 words | — | adult fiction |
| Conference talk (12 min @ 130 wpm) | 1,500-1,600 words | speaking | rehearse to confirm |
| Podcast episode (30 min @ 130 wpm) | 3,900 words | speaking | scripted portion |
Reading time is the more useful target unit for content marketing; readers respond to a “5-minute read” label more reliably than to a “1,150 words” label. Word count remains the unit for billing (translation invoiced per source word), platform compliance (NaNoWriMo’s 50K, an academic 2,000-word ceiling), and contract terms. The Word Counter shows both in real time as you type, plus speaking time at 130 wpm for talks and podcasts.
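Converting a word count into a reading-time label or a speaking time is a one-line division at the rates used in this section (230 wpm silent reading, 130 wpm speaking). The helper names below are made up, and rounding to the nearest minute with a floor of one is a presentation choice, not a rule from the text.

```javascript
const READING_WPM = 230;  // silent-reading rate used in the table above
const SPEAKING_WPM = 130; // speaking rate used for talks and podcasts

function minutesFor(wordCount, wordsPerMinute) {
  return wordCount / wordsPerMinute;
}

// Round to whole minutes and never show "0-minute read".
function readingLabel(wordCount) {
  const minutes = Math.max(1, Math.round(minutesFor(wordCount, READING_WPM)));
  return `${minutes}-minute read`;
}

console.log(readingLabel(1150));             // "5-minute read"
console.log(minutesFor(3900, SPEAKING_WPM)); // 30
```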
6 counting mistakes that break real apps
Six recurring failures seen in shipped code and shipped marketing campaigns. Each one is paired with the symptom, the root cause, and the fix.
Mistake 1: Using string.length for character-limit validation
Symptom: A user pastes a tweet with three emoji that’s actually 278 codepoints. Your front-end validation counts 281 and refuses to submit, even though the platform would have accepted it. The error only runs in one direction: .length is never smaller than the codepoint count, so it rejects valid drafts rather than admitting oversized ones, and the gap grows with every emoji.
Root cause: String.prototype.length in JavaScript returns UTF-16 code units. Most emoji are encoded as a surrogate pair, costing 2 units. Every astral-plane character (math symbols, ancient scripts) does the same.
Fix: Iterate by codepoint with the spread operator or Array.from.
// ❌ wrong
function isUnderTwitterLimit(text) {
  return text.length <= 280;
}
// ✅ correct
function isUnderTwitterLimit(text) {
  return [...text].length <= 280;
}
For deeper regex-based codepoint iteration patterns (including grapheme cluster handling), the Regex Cheat Sheet covers the /u and /v flags and Unicode property escapes.
Mistake 2: Splitting CJK text on whitespace for word count
Symptom: A 500-character Chinese article reports as 1 word. The translation quote based on it is off by 500x.
Root cause: CJK languages don’t use word-spaces. text.split(/\s+/) returns a single token containing the entire essay.
Fix: Count each CJK ideograph as one word, which is the convention used by Microsoft Word, Google Docs, and every native CJK word processor.
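function countWordsMixed(text) {
  // CJK Unified Ideographs, Hiragana + Katakana, Hangul Syllables
  const cjkPattern = /[\u4E00-\u9FFF\u3040-\u30FF\uAC00-\uD7AF]/g;
  const cjk = (text.match(cjkPattern) || []).length;
  const latin = (text
    .replace(cjkPattern, ' ')
    .match(/[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*/g) || []).length;
  return cjk + latin;
}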
The Unicode ranges cover CJK Unified Ideographs (U+4E00 to U+9FFF), Hiragana and Katakana (U+3040 to U+30FF), and Hangul Syllables (U+AC00 to U+D7AF), which are the four blocks Microsoft Word’s word-count counts as ideographs.
Mistake 3: Forgetting Twitter URL 23-char substitution
Symptom: A draft shows 320 characters in your counter, including an 80-character URL. You spend 10 minutes trimming it, only to realize Twitter would have accepted the original at 263 characters.
Root cause: Twitter replaces every URL with a 23-character t.co link at publish time. Your raw counter doesn’t know.
Fix: Pre-compute published length using raw − URL_length + 23 for each URL. For drafts containing multiple URLs, sum the corrections. URL detection in published content follows RFC 3986, the same parsing rules the URL Encoding & Decoding guide walks through.
Mistake 4: Writing meta description to 320 chars (old guideline)
Symptom: You crafted a 280-character meta description with the CTA at the end. In Google search results, the description cuts off mid-sentence at character 158 and the CTA never appears.
Root cause: Between December 2017 and May 2018, Google briefly expanded meta description display to 320 characters. Many SEO tutorials still cite that number. Google reverted to ~160 in mid-2018 and has held there ever since.
Fix: Write to 150-160 characters. Put the primary keyword in the first 30 characters and the CTA in the last 30. Use a pixel-accurate SERP simulator for high-stakes pages; wide glyphs (W, M, K) eat the budget faster than narrow ones (i, l, t).
Mistake 5: Confusing 280 characters with 280 words
Symptom: Someone on the team writes “we need a 280-word tweet” and produces 1,500 characters of perfectly fine prose. The tweet won’t post.
Root cause: Character-versus-word confusion. The two units differ by roughly 5-6x for English prose.
Fix: Pin the rule per platform. Twitter, SMS, and SEO meta count characters. NaNoWriMo, academic assignments, translation contracts, and most content-marketing briefs count words. When in doubt, check the platform’s own counter (Twitter’s compose box, Word’s Review > Word Count) before locking the spec.
Mistake 6: Pasting smart quotes that silently switch SMS to UCS-2
Symptom: You copy a customer-receipt template from a Google Doc into your SMS sender. The original was 145 characters and shipped as one GSM-7 segment. After paste, it’s the same 145 characters but bills as 3 UCS-2 segments (67 characters each once the message is split). Costs triple across a million-message campaign.
Root cause: Google Docs and Word auto-convert " and ' to typographer’s quotes “ ” and ‘ ’. Those quotes aren’t in the GSM-7 character set, which flips the entire message to UCS-2.
Fix: Normalize before transmit:
function toGsm7Quotes(s) {
  return s
    .replace(/[“”]/g, '"')   // “ ” → "
    .replace(/[‘’]/g, "'")   // ‘ ’ → '
    .replace(/[–—]/g, '-');  // – — → -
}
Run this before billing-sensitive sends. Twilio, MessageBird, and Bandwidth all expose an encoding field on the response; log it and alert when UCS-2 appears in templates you intended as GSM-7.
FAQ
What is the difference between character count and word count?
Character count counts every character including spaces, punctuation, and emoji, measured by Unicode codepoint on most modern platforms. Word count counts whitespace-separated tokens for Latin scripts and ideograph-by-ideograph for CJK. Twitter, SMS, and SEO meta descriptions use character count. Academic essays, NaNoWriMo manuscripts, and translation invoices use word count.
Why does Twitter count emoji as 1 character but JavaScript counts them as 2?
Twitter measures by Unicode code point, and every emoji is one codepoint, one character. JavaScript’s string.length measures UTF-16 code units. Most emoji are above U+FFFF and are encoded as surrogate pairs in UTF-16, so they take two code units and .length returns 2. Use [...text].length or Array.from(text).length to get the codepoint count Twitter actually counts.
Why is the SMS character limit 160 sometimes and 70 other times?
SMS uses 7-bit GSM-7 encoding by default, giving 160 characters in a 140-byte payload. If the message contains any non-GSM-7 character (emoji, smart quotes, CJK, accented Latin beyond a small set), the whole message switches to 16-bit UCS-2 encoding and the per-segment limit drops to 70 characters. One emoji anywhere in the message triggers the switch.
What is the ideal meta description length in 2026?
Aim for 150-160 characters. Google’s desktop SERP truncates around 155-165 depending on display pixel width; mobile clips between 100 and 120. Below 120 characters Google often replaces your description entirely with a passage from page body. Lead with the primary keyword in the first 30 characters and end with the CTA in the last 30, so the message survives truncation either direction.
Does character limit include spaces and emoji?
Yes, on virtually every platform. Spaces, line breaks, punctuation, and emoji each count as one Unicode codepoint. The two exceptions worth knowing: SMS where emoji trigger the encoding switch described above, and Bluesky which counts grapheme clusters so a multi-codepoint emoji like the family 👨👩👧👦 costs 1 character instead of 7.
How is word count calculated for Chinese, Japanese, Korean text?
Each CJK ideograph counts as one word, the convention used by Microsoft Word’s Chinese-mode word count, Google Docs, native CJK editors, and every commercial translation memory system. A 500-character Chinese essay reports as 500 words. Mixed text counts CJK ideographs by character and Latin tokens by whitespace, summing the two.
How does Twitter handle URL length in the 280-character limit?
Twitter automatically wraps every URL in a 23-character t.co short link at publish time, regardless of original length. The published length follows the formula published = raw − URL_length + 23 per URL. A draft of 320 characters containing one 100-character URL ships as 243 characters. Twitter recognizes URLs by RFC 3986 patterns, so query strings and fragments are absorbed into the URL token.
Related reading
- Regex Cheat Sheet: pattern matching for character validation, Unicode property escapes
- Text Diff Online Guide: comparing two pieces of text, line by line and character by character
- URL Encoding & Decoding Guide: character escaping rules when text travels through URLs
- Understanding Base64: the other half of “bits into characters” encoding, applied to email and binary data