Text Diff Online: How to Compare Two Texts (Algorithm + 6 Use Cases)

Free text diff online — instant side-by-side + unified diff, no upload. Learn the LCS/Myers algorithm, text vs JSON diff, and 6 code-review use cases.

A text diff online tool answers one question fast: what changed between version A and version B? You paste two blocks of text, the tool runs a Longest Common Subsequence algorithm, and you get a side-by-side or unified view of every insertion, deletion, and edit, usually in under a millisecond.

This guide is for developers doing code review, SREs comparing log slices, lawyers redlining contracts, and writers reviewing edits. It covers the algorithm (LCS, Myers, Patience), the two standard views, the ignore options that fix 95% of “everything looks changed” complaints, when to reach for a JSON diff instead, six copy-paste use cases, and the pitfalls that the algorithm itself explains.

To compare two texts now, open Text Diff. It runs entirely in your browser, no upload.

1. What is a text diff?

A text diff is the smallest set of insertions and deletions that transforms one text into another, with each line marked added, removed, or unchanged. Modern diffs add a second pass at the word or character level so a one-character edit highlights only that token instead of the entire line.

1.1 Why character equality (===) is not enough

Insert one line at the top of a 200-line config file and a naive character comparison reports every single byte after the insertion point as different. The text didn’t change; its position did. A diff algorithm has to recognize that the next 199 lines are still the same lines, just shifted by one, and report a single insertion. That recognition is what LCS gives you, and it is why git, GitHub, and every code review tool ship one.
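To make the shift problem concrete, here is a minimal sketch of a purely positional comparison. It works on lines rather than bytes for brevity, and the function name and sample data are made up for illustration.

```javascript
// Naive positional comparison: line i of A is only ever compared with line i of B.
function positionalMismatches(a, b) {
  const n = Math.max(a.length, b.length);
  let mismatches = 0;
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) mismatches++;
  }
  return mismatches;
}

// A 200-line config where every line is distinct (hypothetical data).
const original = Array.from({ length: 200 }, (_, i) => `key_${i} = ${i}`);
// The same config with one new line inserted at the top.
const modified = ["# new header", ...original];

console.log(positionalMismatches(original, modified)); // 201
```

One inserted line makes every position disagree, which is exactly the noise a real diff algorithm is meant to avoid.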

1.2 Side-by-side vs unified diff

Side-by-side puts the two versions in parallel columns and color-codes the cells: green for added, red for removed, yellow for modified. Unified diff is the older text format from GNU diff, one column, - and + markers, three lines of context around each hunk. Same comparison, two presentations. Section 4 covers when to use each.

1.3 Where text diff is used

Code review on GitHub and GitLab. Local git diff output. Patches pasted into Slack. Contract redlining. Translation review. CI snapshot tests that fail with +/- output. Log timeline investigation. Comparing two .env files. Anything where two blobs of text need to be matched line by line.

Open Text Diff and paste two texts to see how this works. Every comparison runs locally inside your browser.

2. The algorithm behind text diff (LCS + Myers + Patience)

2.1 Longest Common Subsequence

Given two sequences of lines A and B, the Longest Common Subsequence is the longest list of lines that appear in both, in the same order, without requiring adjacency. Once you have the LCS, the diff is straightforward: lines in A but not in the LCS are removed, lines in B but not in the LCS are added, lines in the LCS are unchanged.

Classical LCS runs as a dynamic-programming table of size N × M. Cell (i, j) holds the length of the LCS of the first i lines of A and the first j lines of B. Fill the table left to right, top to bottom, then walk backwards from the bottom-right cell to reconstruct the edit script. Time and space are both O(N×M): fine for two files of a thousand lines each (about a million cells), slow for a hundred-thousand-line log (about ten billion cells).
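The table-and-walk-back procedure can be sketched as follows. This is an illustrative version, not the tool's actual source; the function name lcsDiff and the { op, line } output shape are assumptions.

```javascript
// Classical LCS line diff: O(N*M) time and space.
function lcsDiff(a, b) {
  const n = a.length, m = b.length;
  // dp[i][j] = LCS length of the first i lines of a and the first j lines of b.
  // Row 0 and column 0 stay 0 (an empty prefix shares nothing).
  const dp = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  // Walk back from the bottom-right cell, collecting operations in reverse.
  const ops = [];
  let i = n, j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && a[i - 1] === b[j - 1]) {
      ops.push({ op: " ", line: a[i - 1] }); // unchanged
      i--; j--;
    } else if (j > 0 && (i === 0 || dp[i][j - 1] >= dp[i - 1][j])) {
      ops.push({ op: "+", line: b[j - 1] }); // added in b
      j--;
    } else {
      ops.push({ op: "-", line: a[i - 1] }); // removed from a
      i--;
    }
  }
  return ops.reverse();
}

const before = ["clause 1", "clause 2", "clause 3"];
const after = ["clause 1", "clause 2", "clause 2a", "clause 3"];
console.log(lcsDiff(before, after).map(o => o.op + o.line).join("\n"));
```

The sketch allocates one extra row and column of zeros so the recurrence needs no bounds checks.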

2.2 Myers (1986)

Eugene Myers’ 1986 paper “An O(ND) Difference Algorithm and Its Variations” reframes the problem as a shortest path through an edit graph: nodes are positions (i, j) in the two inputs, horizontal moves are deletions, vertical moves are insertions, diagonal moves are matches. The shortest path is the minimum edit script.

Myers runs in O((N+M)D) time, where D is the size of the edit script. When the two texts are similar (the usual case for diffs), D is small and the algorithm is essentially linear. It is the default in git diff, GNU diff, and GitHub’s PR renderer. For the small, mostly similar inputs people paste into a web tool, it finishes effectively instantly.
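A minimal sketch of the greedy forward search from the paper follows. It returns only D, the length of the shortest edit script; recovering the script itself needs a saved copy of the frontier per round, which is omitted here. The function name is an assumption.

```javascript
// Myers' greedy search: for each edit count d, track the furthest x reached
// on every diagonal k = x - y, and stop when the end of both inputs is reached.
function myersEditDistance(a, b) {
  const n = a.length, m = b.length;
  const max = n + m;
  const offset = max + 1; // shifts diagonal k into a non-negative array index
  const v = new Array(2 * max + 3).fill(0);
  for (let d = 0; d <= max; d++) {
    for (let k = -d; k <= d; k += 2) {
      let x;
      if (k === -d || (k !== d && v[offset + k - 1] < v[offset + k + 1])) {
        x = v[offset + k + 1];     // step down from diagonal k+1: an insertion
      } else {
        x = v[offset + k - 1] + 1; // step right from diagonal k-1: a deletion
      }
      let y = x - k;
      // Follow the diagonal ("snake") for free while elements match.
      while (x < n && y < m && a[x] === b[y]) { x++; y++; }
      v[offset + k] = x;
      if (x >= n && y >= m) return d;
    }
  }
  return max;
}

console.log(myersEditDistance([..."ABCABBA"], [..."CBABAC"])); // 5
```

The outer loop runs D + 1 times and each round touches at most D + 1 diagonals, which is where the "cheap when the texts are similar" behavior comes from.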

2.3 Patience diff (Bram Cohen, 2005)

Patience diff takes a different approach: find lines that appear exactly once in each input (called “unique anchor lines”), match them up, and recurse on the gaps between anchors. It gives up the guarantee of a minimal edit script and its worst case is still slow, but the output usually reads much better on code.

Why? Myers minimizes the edit distance, which is mathematically optimal but visually awful when the optimal alignment crosses unrelated braces or blank lines. Patience refuses to align on common boilerplate (every file has } lines, every file has blank lines), so function boundaries stay intact. Bram Cohen devised it and Bazaar adopted it; Git ships it as git diff --patience. The closely related Histogram algorithm (git diff --histogram) is slightly faster with similar output quality.

Picture two versions of the same file where a function moved. Myers may align the closing brace of function A with the closing brace of function B and report the bodies as completely different. Patience anchors on the unique function names and reports the function as one intact block removed in one place and added in another. Same input, very different review experience.
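The anchor-selection step can be sketched like this. It is a simplified illustration: it finds lines unique to both inputs and keeps the longest run of them that appears in the same order on both sides. A quadratic longest-increasing-subsequence is used for readability where real implementations use patience sorting, and the recursion into the gaps is left out. All names are hypothetical.

```javascript
// Returns the unique common lines that can serve as anchors, in order.
function patienceAnchors(a, b) {
  // Count occurrences and remember the first index of each line.
  const index = (lines) => {
    const seen = new Map();
    lines.forEach((line, idx) => {
      const entry = seen.get(line);
      if (entry) entry.count++;
      else seen.set(line, { count: 1, idx });
    });
    return seen;
  };
  const inA = index(a), inB = index(b);

  // Candidate pairs: lines occurring exactly once in a AND exactly once in b,
  // listed in order of their position in a.
  const pairs = [];
  a.forEach((line, i) => {
    const ea = inA.get(line), eb = inB.get(line);
    if (ea.count === 1 && eb && eb.count === 1) pairs.push({ line, i, j: eb.idx });
  });

  // Longest increasing subsequence on j keeps only anchors that do not cross.
  const len = pairs.map(() => 1);
  const prev = pairs.map(() => -1);
  let best = -1;
  for (let p = 0; p < pairs.length; p++) {
    for (let q = 0; q < p; q++) {
      if (pairs[q].j < pairs[p].j && len[q] + 1 > len[p]) {
        len[p] = len[q] + 1;
        prev[p] = q;
      }
    }
    if (best === -1 || len[p] > len[best]) best = p;
  }
  const anchors = [];
  for (let p = best; p !== -1; p = prev[p]) anchors.push(pairs[p]);
  return anchors.reverse();
}

const v1 = ["function foo() {", "  return 1;", "}", "", "function bar() {", "  return 2;", "}"];
const v2 = ["function bar() {", "  return 2;", "}", "", "function foo() {", "  return 1;", "}"];
console.log(patienceAnchors(v1, v2).map(p => p.line));
```

Because } occurs twice in each version it can never become an anchor, which is the property that keeps function bodies from being stitched together across braces.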

2.4 Algorithm comparison

| Property | Myers (default) | Patience | Histogram |
| --- | --- | --- | --- |
| Time complexity | O((N+M)D) | ~O(N log N) common case | similar to Patience |
| Optimal edit distance | Yes, shortest script | No, may be longer | No, may be longer |
| Reads naturally on code | Sometimes misaligns braces and blank lines | Anchors on unique lines | Anchors on unique lines |
| Used by | git default, GNU diff, GitHub UI | git diff --patience, Bazaar | git diff --histogram |
| Best for | Speed and correctness on most inputs | Code reviews, refactor diffs | Same as Patience, slightly faster |

2.5 What this tool does

Text Diff uses classical dynamic-programming LCS with two aggressive optimizations: common prefix and suffix trimming, and a second token-level LCS pass for intra-line word diff. A diff of two two-thousand-line configs with one changed line collapses to a 1×1 DP table after trimming and renders in under a millisecond. For typical web inputs the choice between Myers and DP is invisible; both finish faster than the browser can paint the result.
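Prefix and suffix trimming is simple enough to sketch in a few lines. The version below is an illustration of the idea rather than the tool's code, and the returned field names are assumptions.

```javascript
// Strip the lines both inputs share at the start and at the end,
// so the expensive LCS table only covers the part that actually differs.
function trimCommon(a, b) {
  const limit = Math.min(a.length, b.length);
  let start = 0;
  while (start < limit && a[start] === b[start]) start++;

  let endA = a.length, endB = b.length;
  while (endA > start && endB > start && a[endA - 1] === b[endB - 1]) {
    endA--;
    endB--;
  }
  return {
    prefix: start,               // number of shared leading lines
    suffix: a.length - endA,     // number of shared trailing lines
    midA: a.slice(start, endA),  // what is left of a to diff
    midB: b.slice(start, endB),  // what is left of b to diff
  };
}

// Two 2,000-line configs that differ in a single line (hypothetical data).
const configA = Array.from({ length: 2000 }, (_, i) => `setting_${i} = on`);
const configB = configA.slice();
configB[1000] = "setting_1000 = off";

const trimmed = trimCommon(configA, configB);
console.log(trimmed.midA.length, trimmed.midB.length); // 1 1
```

After trimming, the DP step sees one line on each side instead of 2,000, which is the 1×1 table described above.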

3. Intra-line word diff: why a one-character change highlights the whole line

You change one identifier on a line and the entire line lights up red and green. Bug? No, design.

The diff first runs LCS at the line level: “line 14 was replaced.” Then for every replaced pair it runs a second LCS at the token level. Tokens are produced by splitting on Unicode word boundaries: runs of letters and digits stay together, whitespace and punctuation each become their own token. The second LCS gives you the minimal token-level edit script inside that line.

The renderer draws the full line in the highlight color so your eye finds it, then paints only the changed tokens with the bright background. The unchanged tokens around them carry a dim version of the same color, present but visually quiet. Your eye lands on the exact edit.

Example 1: identifier rename. function getUser(id) becomes function getUser(userId). The entire line is marked modified. Inside the line, only id (struck through red) and userId (bright green) carry the inline highlight. Everything else stays dim.

Example 2: log latency change. POST /api/orders 201 88ms becomes POST /api/orders 201 4200ms. The line is modified. Inline, only 88ms and 4200ms are bright, because a run of digits and letters forms a single token. The path, method, and status code stay dim, which is what an incident timeline reader needs.

When too many tokens change, word-level highlighting becomes noise. The tool falls back to a paired remove + add presentation: the original line shown removed, the new line shown added, no intra-line coloring. The threshold is roughly “more than half the tokens differ.”
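The fallback decision can be sketched as below. The article only gives the threshold as roughly "more than half the tokens differ", so the exact formula here (tokens outside the token-level LCS divided by all tokens on both sides) is an assumption, as are the function names. Tokens are passed in already split.

```javascript
// LCS length with a single rolling row instead of a full table.
function lcsLength(a, b) {
  const row = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let diag = 0; // value of the cell up-and-left of the current one
    for (let j = 1; j <= b.length; j++) {
      const up = row[j];
      row[j] = a[i - 1] === b[j - 1] ? diag + 1 : Math.max(row[j], row[j - 1]);
      diag = up;
    }
  }
  return row[b.length];
}

// Share of tokens (across both lines) that are not part of the common subsequence.
function changedFraction(tokensA, tokensB) {
  const total = tokensA.length + tokensB.length;
  if (total === 0) return 0;
  return 1 - (2 * lcsLength(tokensA, tokensB)) / total;
}

// Assumed rule: show inline highlights unless more than half the tokens changed.
function presentation(tokensA, tokensB) {
  return changedFraction(tokensA, tokensB) > 0.5 ? "paired remove + add" : "inline";
}

const oldTokens = ["function", " ", "getUser", "(", "id", ")"];
const newTokens = ["function", " ", "getUser", "(", "userId", ")"];
console.log(presentation(oldTokens, newTokens)); // inline
```

For the rename example, ten of the twelve tokens are shared, so the line stays inline; two unrelated lines would cross the threshold and render as a pair.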

The summary: line-level diff tells you which line changed; word-level diff tells you which tokens on that line carry the change. Click Sample inside Text Diff to see both views on identical input.

4. Side-by-side vs unified diff: two views, one diff

4.1 Side-by-side view

Two columns: original on the left, modified on the right. Lines that match are aligned horizontally. Added lines appear only in the right column with a green background; removed lines appear only in the left column with a red background; modified pairs sit next to each other with a yellow gutter and intra-line word highlights.

Use side-by-side when a human will read the diff: PR review, teaching, demos, walking through a contract change with a non-technical stakeholder. It is the view for eyes.

The downside: it does not transport. You cannot paste a side-by-side rendering into Slack and have anyone apply it. You cannot pipe it to patch. For sharing and applying, you need unified.

4.2 Unified diff format

Unified diff is a plain-text format that dates back to around 1990, was popularized by GNU diff, and is now standardized in POSIX. A complete example:

--- original
+++ modified
@@ -1,3 +1,4 @@
 1. The service is provided as-is.
 2. Either party may terminate with 30 days notice.
+2a. Termination notice must be in writing.
 3. Disputes are resolved in California courts.

The first two lines name the source files. The @@ -L,C +L,C @@ line is a hunk header: -L,C means starting at line L of the original, C lines are involved; +L,C says the same for the modified version. Inside the hunk, lines starting with a space are context (unchanged), - is removed, + is added.
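Reading a hunk header programmatically takes one regular expression. The sketch below uses a hypothetical helper name; it relies on the unified-format convention that an omitted count means 1.

```javascript
// Parse "@@ -L,C +L,C @@" into numbers. Returns null for non-header lines.
function parseHunkHeader(line) {
  const match = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/.exec(line);
  if (!match) return null;
  return {
    oldStart: Number(match[1]),
    oldCount: match[2] === undefined ? 1 : Number(match[2]), // omitted count means 1
    newStart: Number(match[3]),
    newCount: match[4] === undefined ? 1 : Number(match[4]),
  };
}

console.log(parseHunkHeader("@@ -1,3 +1,4 @@"));
// { oldStart: 1, oldCount: 3, newStart: 1, newCount: 4 }
```

In the example above the original side spans 3 lines and the modified side spans 4, matching three context lines plus one addition.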

Three lines of context above and below each change is the GNU default. Most tools let you change it with -U n: diff -U0 for no context, diff -U10 for ten lines. The hunk header tracks whatever you pick.

In Text Diff, click the Unified tab to switch views or click Copy unified diff to put the patch on your clipboard.

4.3 Where unified diff is portable

Unified diff travels. It is the standard interchange format for textual change.

| Destination | Accepts unified diff? | How |
| --- | --- | --- |
| GNU patch | Yes | patch -p1 < diff.patch |
| git apply | Yes | git apply diff.patch |
| GitHub PR review comment | Yes (in a ```diff block) | Renders with color |
| GitLab MR comment | Yes | Same fenced block |
| Bitbucket / Azure DevOps PR | Yes | Same fenced block |
| Slack / Discord paste | Partial | Renders as text in a code block, no color |
| VS Code “Open Patch” | Yes | Apply patch via Source Control |
| Jira / Linear issue body | Partial | Works in a code block, no apply button |

The same seven lines of ---/+++/@@ text apply with patch, apply with git apply, render in three PR platforms, and survive a Slack paste. No other diff format has comparable reach.

4.4 When to pick which

Side-by-side for review, unified for sharing and applying. If you are reading the diff yourself, the columns are faster. If anyone or anything downstream needs to consume it (a reviewer, a tool, a patch command), copy the unified format.

5. Ignore options: whitespace, case, blank lines, line endings

Most “everything looks changed” complaints are noise. Four toggles fix 95% of them.

  1. Ignore case maps A to a. Equivalent to diff -i in GNU diff (git diff has no matching flag). Use it for environment-variable comparisons, SQL keyword style audits, anywhere the convention is shouty caps versus quiet caps but the meaning is identical.
  2. Ignore all whitespace collapses every space, tab, and newline before comparison. Equivalent to git diff -w. The cure for tabs ↔ spaces reformatting, indentation rewrites, and “we switched to Prettier” diffs that touch every line.
  3. Ignore trailing spaces and tabs strips end-of-line whitespace only. The closest git flag is git diff --ignore-space-at-eol; git diff -b is broader and also ignores changes in the amount of whitespace inside a line. The cure for CRLF noise after copying between Windows and Unix machines: the trailing \r characters get filtered out and the actual content lines up.
  4. Ignore blank lines drops empty lines before diffing, comparable to git diff --ignore-blank-lines. The cure for “I added one paragraph break and now paragraph 12 looks completely different” in prose diffs.

A 200-line config that reports “87 modifications” usually drops to “4 modifications” after Ignore all whitespace. A Windows-to-Unix copy that flags every line drops to zero with Ignore trailing spaces. Each toggle is independent and persists between sessions.
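One common way to implement toggles like these is to diff on a normalized "comparison key" per line while still displaying the original text. The sketch below assumes that design; the option names and helper functions are hypothetical, and it splits on \n only so that a stray \r stays visible as trailing whitespace.

```javascript
// Build the string that is actually compared for one line.
function comparisonKey(line, opts) {
  let key = line;
  if (opts.ignoreTrailingWhitespace) key = key.replace(/[ \t\r]+$/, ""); // also drops a CRLF's \r
  if (opts.ignoreAllWhitespace) key = key.replace(/\s+/g, "");
  if (opts.ignoreCase) key = key.toLowerCase();
  return key;
}

// Turn a text into the list of keys a line-level diff would compare.
function comparisonKeys(text, opts = {}) {
  const keys = text.split("\n").map(line => comparisonKey(line, opts));
  return opts.ignoreBlankLines ? keys.filter(key => key.trim() !== "") : keys;
}

const windowsCopy = "PORT=8080\r\nHOST=a\r\n";
const unixCopy = "PORT=8080\nHOST=a\n";
const sameKeys = (x, y, opts) =>
  JSON.stringify(comparisonKeys(x, opts)) === JSON.stringify(comparisonKeys(y, opts));

console.log(sameKeys(windowsCopy, unixCopy, {}));                                 // false
console.log(sameKeys(windowsCopy, unixCopy, { ignoreTrailingWhitespace: true })); // true
```

Keeping the options independent means each one answers a single question, and turning one off restores the exact comparison for that dimension.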

CRLF vs LF. Windows line endings are \r\n; Unix is \n; classic Mac is \r. Open a Windows file in a Unix editor that does not normalize and you keep the trailing \r. Every line will diff as “the content matches but there is a \r at the end.” Ignore trailing spaces silences this without losing real changes.

One warning. Ignore options cut both ways. Turn on Ignore case and a refactor that changes LOG.error to log.Error looks identical. Turn on Ignore all whitespace and a Python indentation bug becomes invisible. Pick the toggles for the question you are asking, then turn them off when you are done.

6. Text diff vs JSON diff vs git diff: decision matrix

Text diff is line-and-word matching with no understanding of structure. That is what you want for prose and what you do not want for JSON.

6.1 Decision matrix

| Input type | Text diff | JSON diff | Git diff |
| --- | --- | --- | --- |
| Prose / Markdown / contract | Best | Wrong tool | Partial (works on tracked files only) |
| Code snippet (single file paste) | Best | Wrong tool | Partial (needs a repo) |
| Code in a repo (multi-file) | Partial | Wrong tool | Best |
| API JSON response | Wrong tool (false positives on key order) | Best | Wrong tool |
| YAML / TOML config | Partial (false positives on key order) | Best (after conversion) | Partial |
| CSV row-by-row | Partial | Wrong tool | Wrong tool |
| Log / heredoc | Best | Wrong tool | Wrong tool |
| Binary file | Wrong tool | Wrong tool | git diff --binary |

6.2 When text diff is the wrong tool

Three classic mistakes.

JSON with reordered keys. {"a":1,"b":2} and {"b":2,"a":1} are the same JSON document. A text diff reports every line as changed because they really are different lines. Use JSON Diff; it understands that JSON keys are unordered.
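To see why a structure-aware comparison treats those two documents as equal, here is a minimal sketch that sorts object keys recursively before comparing. It illustrates the idea only; it is not how JSON Diff is implemented, and the helper names are made up.

```javascript
// Rebuild a parsed JSON value with object keys in sorted order.
// Arrays keep their order, because array order is significant in JSON.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value; // strings, numbers, booleans, null
}

// Two JSON texts are equivalent if their canonical forms serialize identically.
function sameJson(textA, textB) {
  const a = JSON.stringify(canonicalize(JSON.parse(textA)));
  const b = JSON.stringify(canonicalize(JSON.parse(textB)));
  return a === b;
}

console.log(sameJson('{"a":1,"b":2}', '{"b":2,"a":1}')); // true
console.log(sameJson('{"a":1}', '{"a":"1"}'));           // false
```

Key order disappears in the canonical form while type differences such as 1 versus "1" survive, which is the distinction a plain text diff cannot make.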

YAML configs that were reformatted. Change one value, run the file through a formatter, and indentation, key order, and quoting all shift. Text diff reports a complete rewrite. Convert both files to JSON first, then compare with JSON Diff.

Multi-file refactors with renames. Git tracks renames; text diff does not. If you compare two trees by concatenating files into one blob, every cross-file move shows up as removed + added. Use git diff (or git diff --find-renames=80%) instead.

6.3 When text diff fits

Reach for a text diff whenever you need to compare two texts whose lines themselves carry meaning. Prose. Code snippets pasted from anywhere. Contract redlines. Log slices. Translation review where you are matching natural-language sentences. .env files where order matters because shells read them top to bottom.

For the deep dive on filtering noise out of JSON diffs (timestamps, request IDs, auto-generated UUIDs), read JSON Diff Ignore Timestamps & IDs.

7. Six real-world use cases (with copy-paste inputs)

Six concrete reasons to compare two texts side by side, each with copy-paste inputs you can drop straight into Text Diff.

7.1 Code review snippet, function rename

You are reviewing a PR. The author renamed id to userId and added a guard clause. Paste both versions:

// Original
function getUser(id) {
  const u = db.users.find(x => x.id === id);
  return u;
}
// Modified
function getUser(userId) {
  if (!userId) return null;
  const u = db.users.find(x => x.id === userId);
  return u;
}

The diff shows two modified lines plus one added line. Inline word highlighting marks every id → userId token; the new guard clause appears with a green background. Ignore options off. Try this in Text Diff and copy the unified output to leave as a review comment.

7.2 Contract or policy redline, one inserted clause

Fifty paragraphs of contract, one inserted clause. Paste yesterday’s version on the left and today’s on the right:

Original:
1. The service is provided as-is.
2. Either party may terminate with 30 days notice.
3. Disputes are resolved in California courts.

Modified:
1. The service is provided as-is.
2. Either party may terminate with 30 days notice.
2a. Termination notice must be in writing.
3. Disputes are resolved in California courts.

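On the three-clause excerpt above, the diff renders three unchanged lines and one added line (+2a. Termination notice must be in writing.). On the full fifty-paragraph contract, every original paragraph stays unchanged and only the new clause is marked added. Export the unified diff as the legal review trail.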

7.3 Log timeline investigation

You suspect a latency regression. Grab a slice of access logs from before and during the incident:

Before:
GET /api/users 200 14ms
POST /api/orders 201 88ms
GET /api/orders/42 200 21ms

During:
GET /api/users 200 14ms
POST /api/orders 201 4200ms
GET /api/orders/42 500 21ms

Inline highlight surfaces 88ms → 4200ms (a nearly 50× latency jump) and 200 → 500 (the order detail endpoint started failing). For richer log work (pulling fields, grouping by endpoint, computing percentiles), pair the diff with the jq command-line cheat sheet if your logs are JSON.

7.4 Translation review, preserving placeholders

You hired a new translation agency and want to verify the new copy matches the old in structure. Paste old translation on the left, new on the right. Toggle Ignore trailing spaces / tabs because translators routinely add a stray space at the end of strings.

The diff confirms every {username}, {count}, and %s placeholder stays in place; only the natural-language text changes. A missing placeholder shows up as a removed token in the inline diff, caught before you ship. If you need to compare placeholder formats themselves, the regex cheat sheet covers \{\w+\} and friends. Try this in Text Diff.

7.5 Config or .env audit, production vs staging

Compare two .env files. Turn on Ignore blank lines so paragraph-style grouping does not misalign sections. The diff shows you which keys differ in value, which keys exist in one environment but not the other, and where comments have drifted out of sync. Five minutes that prevent the “it works in staging but not in prod” debugging session.
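When you want the three answers (different values, missing keys on either side) as data rather than as highlighted lines, a small key-based comparison complements the diff. This is a simplified sketch: the parser handles only plain KEY=value lines and # comments, not quoting, export prefixes, or multi-line values, and every name in it is hypothetical.

```javascript
// Parse KEY=value lines into a Map, skipping blanks and # comments.
function parseEnv(text) {
  const vars = new Map();
  for (const raw of text.split(/\r\n|\n|\r/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not an assignment
    vars.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return vars;
}

// Report keys missing on either side and keys whose values differ.
function compareEnv(textA, textB) {
  const a = parseEnv(textA), b = parseEnv(textB);
  return {
    onlyInA: [...a.keys()].filter(key => !b.has(key)),
    onlyInB: [...b.keys()].filter(key => !a.has(key)),
    changed: [...a.keys()].filter(key => b.has(key) && a.get(key) !== b.get(key)),
  };
}

const production = "API_URL=https://api.example.com\nPOOL_SIZE=20\n\n# cache\nCACHE_TTL=300";
const staging = "API_URL=https://api.example.com\nPOOL_SIZE=5\nDEBUG=true";
console.log(compareEnv(production, staging));
```

Unlike the text diff, this ignores ordering and comments entirely, so it is a supplement for the key audit, not a replacement for reading the diff.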

7.6 Prose or draft revision

Your editor returned a draft. Paste your original on the left and the edited version on the right. Inline word diff shows you which sentences were rewritten, which were untouched, and which paragraphs were inserted. Accept or reject changes one at a time, no Track Changes feature, no Word file, no proprietary format.

8. Common pitfalls and how to read them as symptoms

Algorithm behavior explains most user pain. Five common complaints and what they actually mean.

Pitfall 1: “Every line is red after a Windows-to-Unix copy.” Symptom: every line in the diff shows as changed even though the content looks identical. Cause: trailing \r characters from CRLF line endings. Fix: toggle Ignore trailing spaces / tabs. The diff will drop to the real changes.

Pitfall 2: “I pasted JSON and 100% of lines are different.” Symptom: two JSON objects that should be equivalent show as fully changed. Cause: key reorder. Text diff treats line order as significant; JSON does not. Fix: use JSON Diff for any JSON input.

Pitfall 3: “Reformatting tabs ↔ spaces blew up the diff.” Symptom: 87 modifications, all of them indentation. Cause: your formatter changed every line’s leading whitespace. Fix: Ignore all whitespace will collapse the noise and surface the real semantic changes.

Pitfall 4: “Diff says identical but cmp disagrees.” Symptom: the diff reports no differences but a byte-level comparison says the files differ. Cause: an ignore option left on from a previous session is masking real changes. Fix: open the Ignore options panel and turn every toggle off, then re-diff.

Pitfall 5: “One short edit shows up as remove + add.” Symptom: a small change appears as a separate removed line and a separate added line instead of an inline highlight. Cause: the proportion of changed tokens crossed the inline threshold and the renderer fell back to the paired-line presentation. This is design, not a bug. Switch to the Unified view to see the classic -/+ pair that patch tools expect.

9. Privacy, performance, and when to reach for the command line

Every comparison in Text Diff runs in JavaScript inside your browser. No upload, no temporary file, no server log, no analytics on the text you paste. Safe for proprietary code, internal contracts, private logs, anything you would not be willing to paste into a third-party server.

Practical limits: about 5,000 lines or 1 MB per side. Live diff disables above 200 KB combined and switches to a manual Diff button so typing does not block the page. Above 5,000 lines the input is truncated and a warning shows. The limits exist because the diff runs on the main thread (no web worker), and a worker handoff plus serialization would cost more than the diff itself on small inputs.

When your input outgrows the browser, drop to the command line:

# Unified diff between two files
diff -u a.txt b.txt

# Same, but using git's diff engine (Patience, Histogram, color)
git diff --no-index a.txt b.txt
git diff --no-index --patience a.txt b.txt

# Syntax-highlighting diff viewer (written in Rust, optional side-by-side mode)
delta a.txt b.txt

Switch to the command line for multi-megabyte logs, binary files, multi-file repository diffs, anything where you want syntax-aware coloring like delta, or anywhere you need to pipe diff output into another tool.

10. Unicode, CJK, and RTL: international text diff notes

The tokenizer splits on Unicode word boundaries using three categories: word runs (\p{L} letters and \p{N} numbers), non-word punctuation, and whitespace. Each category produces its own tokens, so hello, world! becomes five tokens: hello, a comma, a space, world, and an exclamation mark.
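A tokenizer along these lines fits in one regular expression with Unicode property escapes. The sketch below is an approximation, not the tool's source: it treats every whitespace or punctuation character as its own token (an assumption), and it covers only the three categories just described, with no script-specific handling.

```javascript
// Word runs (letters and digits) stay together; every other character,
// whether whitespace or punctuation, becomes a single-character token.
function tokenize(line) {
  return line.match(/[\p{L}\p{N}]+|[^\p{L}\p{N}]/gu) ?? [];
}

console.log(tokenize("hello, world!"));
// [ 'hello', ',', ' ', 'world', '!' ]
console.log(tokenize("POST /api/orders 201 88ms"));
// [ 'POST', ' ', '/', 'api', '/', 'orders', ' ', '201', ' ', '88ms' ]
```

The u flag matters here: it enables \p{...} classes and makes the negated class match whole code points, so a character outside the Basic Multilingual Plane is not split in half.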

For CJK content (Chinese, Japanese, Korean), each ideograph or kana is its own token. Change one character in a Chinese sentence and only that character carries the inline highlight while the rest of the line stays dim. Paragraph-level structure is still line-based, so a sentence rewrite that adds a line break shows up as a line-level edit, not a token-level one.

For RTL languages (Arabic, Hebrew), the diff uses logical CSS directions (ms-, me- instead of ml-, mr-). On RTL locales the gutter and line columns flip naturally; inside each diff cell, text direction follows the content, so Arabic strings render right-to-left while the + and - markers stay aligned to the start gutter.

Line ending normalization recognizes \r\n (Windows), \n (Unix), and bare \r (old Mac OS through version 9). All three split as separate lines, so a file converted from one platform to another does not collapse into a single mega-line.
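Splitting on all three conventions is a one-line regular expression, as long as \r\n is tried before the single-character alternatives. A minimal sketch, with a hypothetical function name:

```javascript
// Split on Windows (\r\n), Unix (\n), and classic Mac (\r) line endings.
// \r\n must come first, otherwise a Windows ending would count as two breaks.
function splitLines(text) {
  return text.split(/\r\n|\n|\r/);
}

console.log(splitLines("one\r\ntwo\nthree\rfour"));
// [ 'one', 'two', 'three', 'four' ]
```

Note that a trailing line ending yields a final empty string, so callers that do not want a phantom last line need to drop it themselves.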

11. FAQ

How does an online text diff work?

A text diff splits both inputs into lines, runs a Longest Common Subsequence algorithm (typically Myers’ O((N+M)D)) to find the smallest set of insertions and deletions, then highlights added (green), removed (red), and unchanged (gray) lines. A second token-level LCS marks the changed words inside each modified line. Text Diff runs the entire comparison locally in your browser.

What’s the difference between text diff and JSON diff?

Text diff compares line by line, ideal for prose, code, logs, and contracts. JSON Diff understands JSON’s data model: key order is irrelevant, types are strict (1 ≠ "1"), arrays may be matched by key. Paste JSON into a text diff and key reorders or whitespace will surface as changes that JSON Diff ignores. Use text diff for unstructured content, JSON Diff for API responses and configs.

Why does the diff show whole lines changed when I only edited one word?

It doesn’t. The line is highlighted because something on it changed, but inside the highlight only the changed tokens carry the bright background (green for added, red strikethrough for removed). This is intra-line word diff: line context stays readable while your eye lands on the exact edit. When too much of a line changed for word-level highlighting to be useful, the diff falls back to a separate remove + add pair so the structure stays clean.

How do I ignore whitespace, case, or blank lines in the diff?

Toggle the Ignore options panel. Ignore case makes A and a equal. Ignore all whitespace collapses every space, tab, and newline, equivalent to git diff -w. Ignore trailing spaces and tabs is closest to git diff --ignore-space-at-eol (git diff -b is broader) and silences CRLF noise. Ignore blank lines drops empty lines so paragraph re-spacing stops misaligning the diff. Each option is independent and persists between sessions.

What is unified diff format?

Unified diff is the ---/+++/@@ -L,C +L,C @@ text format that GNU diff adopted around 1990 and that git, GitHub, GitLab, and the Unix patch command all use. Each hunk shows three context lines around the change by default, with - for removed and + for added. Copy unified output into a PR comment, paste it into git apply, or run patch -p1 < diff.patch. As long as the target file still matches the original side, it applies cleanly.

Myers vs Patience: which diff algorithm is better for code review?

Myers is the default in git diff and GNU diff, fast and mathematically minimal, but sometimes aligns unrelated blank lines or closing braces, producing diffs that read weird. Patience (Bram Cohen, 2005) anchors on lines that appear exactly once in each input and recurses between anchors, so function boundaries stay intact. Use git diff --patience (or --histogram for similar results, slightly faster) when reviewing refactors.

Is the text I paste sent to any server?

No. Every comparison in Text Diff runs locally in JavaScript inside your browser. Your text is never uploaded, logged, stored on disk, or sent to any third party. Only your UI preferences (view mode and ignore-option toggles) are saved to localStorage so the page remembers them next visit, never the text. Verify with DevTools → Network: zero requests fire when you click Diff.

How large can the two inputs be?

Practical limit is about 5,000 lines or 1 MB per side. Live diff disables above 200 KB combined and switches to a manual Diff button. Above 5,000 lines the input is truncated with a warning. For multi-megabyte files, switch to diff -u a.txt b.txt, git diff --no-index a.txt b.txt, or delta; they run outside the browser and cope with far larger inputs.
