
HTML to Markdown Converter

Convert HTML to clean Markdown in your browser — GFM tables, task lists, and links. Choose ATX/Setext headings and inline or reference links. Great for migrating web content or feeding LLMs. 100% private, no upload.

No Tracking · Runs in Browser · Free
Everything runs in your browser. Your Markdown and HTML never leave your device.
Verified against CommonMark/GFM output, lossy-conversion behavior, and heading/link style options — Go Tools Engineering Team · Jun 5, 2026

What is HTML to Markdown Conversion?

HTML to Markdown conversion takes a rendered HTML document — the tags, attributes, and nesting a browser displays — and rewrites it as Markdown, the lightweight plain-text format built for writing and version control. Where Markdown to HTML expands compact text into markup for display, this is the reverse and reductive direction: you start with rich, verbose HTML and distil it down to the small, readable set of conventions Markdown offers.

Under the hood, the converter parses your HTML into a DOM tree (the same node structure a browser builds), then walks that tree and emits the Markdown equivalent for each node it recognises. An <h2> becomes ##, a <strong> becomes **text**, a <ul> becomes a bulleted list, an <a> becomes a link, and a <table> becomes a GFM pipe table. Traversing a real DOM, rather than running regular expressions over the raw string, is what lets it handle nested lists, mixed inline formatting, and tables correctly instead of breaking on edge cases.
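
The tree walk described above can be sketched in a few lines. This is an illustrative Python sketch, not the tool's own code (which runs as JavaScript in your browser): the `Node`, `TreeBuilder`, `inline`, `blocks`, and `html_to_markdown` names are hypothetical, it assumes reasonably well-formed input, and it covers only headings, paragraphs, and a handful of inline tags.

```python
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "meta", "link"}  # elements with no end tag


class Node:
    """A tiny stand-in for a DOM node (hypothetical, for illustration)."""

    def __init__(self, tag, attrs=(), text=""):
        self.tag = tag  # element name, or "#text" for a text node
        self.attrs = dict(attrs)
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes reasonably well-formed input."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(Node("#text", text=data))


def inline(node):
    """Render a node and its descendants as inline Markdown."""
    if node.tag == "#text":
        return node.text
    inner = "".join(inline(child) for child in node.children)
    if node.tag in ("strong", "b"):
        return f"**{inner}**"
    if node.tag in ("em", "i"):
        return f"*{inner}*"
    if node.tag == "code":
        return f"`{inner}`"
    if node.tag == "a":
        return f"[{inner}]({node.attrs.get('href') or ''})"
    return inner  # unknown inline element: unwrap, keep the content


def blocks(node):
    """Collect block-level Markdown strings from a node's children."""
    out = []
    for child in node.children:
        if child.tag == "#text":
            if child.text.strip():
                out.append(child.text.strip())
        elif child.tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            out.append("#" * int(child.tag[1]) + " " + inline(child).strip())
        elif child.tag == "p":
            out.append(inline(child).strip())
        else:
            out.extend(blocks(child))  # generic container: unwrap
    return out


def html_to_markdown(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return "\n\n".join(blocks(builder.root))
```

Because every node is visited through its parent, a generic wrapper such as a <div> is handled by simply recursing into its children, which is how containers fall away while their content stays.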

You reach for this conversion when you are migrating out of HTML, not into it. Content trapped in a CMS, a WYSIWYG editor, an old web page, or a rich-text field is hard to diff, hard to review, and hard to move. Converting it to Markdown frees it into a format that lives happily in a Git repo, a static-site generator, or a notes app — and, increasingly, into a format that large language models read efficiently. The catch, which honest tools state plainly, is that the conversion is lossy: HTML can express things Markdown cannot, so some structure and every styling detail are deliberately discarded in exchange for clean, portable text.

The reverse operation — Markdown back to HTML, for when you are ready to publish or preview — is just as useful. Switch to the Markdown → HTML tab or open the dedicated Markdown to HTML converter.

HTML in:

  <h2>Pricing</h2>
  <p>Plans start at <strong>$9/mo</strong>. See the <a href="https://example.com/pricing">details</a>.</p>
  <table>
    <thead><tr><th>Plan</th><th>Price</th></tr></thead>
    <tbody><tr><td>Pro</td><td>$9</td></tr></tbody>
  </table>

Markdown out:

  ## Pricing

  Plans start at **$9/mo**. See the [details](https://example.com/pricing).

  | Plan | Price |
  | ---- | ----- |
  | Pro  | $9    |

  <!-- <div>, classes, and inline styles in the source are dropped — Markdown can't represent them. -->

Key Features

GFM-Aware Output

Targets GitHub Flavored Markdown, not just plain CommonMark: HTML tables become pipe tables, checkbox <li>s become task lists (`- [x]`), and <del>/<s> become ~~strikethrough~~. The Markdown drops straight into a README, a GitHub issue, or a docs site and renders the same way.

ATX or Setext Headings

Choose hash-prefixed ATX headings (# H1) or underlined Setext headings (=== for H1, --- for H2). Setext covers only the top two levels, so the converter falls back to ATX for H3 and deeper automatically — you never get an invalid heading.
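
The fallback rule is small enough to show directly. The following is a minimal Python sketch under assumptions: `render_heading` and its `style` parameter are hypothetical names, and the minimum underline length of three characters is a stylistic choice of this sketch, not something the tool is documented to do.

```python
def render_heading(text: str, level: int, style: str = "atx") -> str:
    """Render one heading; Setext is only used where it exists (H1/H2)."""
    if not 1 <= level <= 6:
        raise ValueError("heading level must be between 1 and 6")
    if style == "setext" and level <= 2:
        underline = "=" if level == 1 else "-"
        # Underline at least as long as the text, never shorter than 3.
        return f"{text}\n{underline * max(len(text), 3)}"
    # ATX covers every level, so it doubles as the fallback for H3-H6.
    return f"{'#' * level} {text}"
```

With `style="setext"`, levels 1 and 2 come back underlined, while level 3 and deeper come back as `###`-style ATX headings.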

Inline or Reference Links

Switch between inline links — [text](url) next to the prose — and reference links, which collect every URL into a numbered list at the foot of the document. Reference style keeps link-heavy paragraphs readable and lets you reuse a URL by label.

Fenced Code Blocks

A <pre><code> block becomes a fenced code block with triple backticks, and a language- class on the <code> element carries through as the fence's info string. Inline <code> becomes backtick spans, so snippets survive the trip intact.
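
As a rough illustration of how a language- class can become the info string, here is a hedged Python sketch. The `fenced_code` name is hypothetical, and lengthening the fence when the code itself contains backtick runs is an assumption of this sketch (a conservative way to keep the block from closing early), not documented behaviour of the converter.

```python
import re


def fenced_code(code: str, class_attr: str = "") -> str:
    """Wrap code in a backtick fence, using a language-* class as info string."""
    lang = ""
    for token in class_attr.split():
        if token.startswith("language-"):
            lang = token[len("language-"):]
            break
    # Use a fence longer than any backtick run inside the code (minimum 3).
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    fence = "`" * max(3, longest + 1)
    body = code.rstrip("\n")
    return f"{fence}{lang}\n{body}\n{fence}"
```

Calling `fenced_code("print(1)\n", "hljs language-python")` yields a three-backtick fence tagged `python`; a snippet that already contains three backticks gets a four-backtick fence instead.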

Handles Nested Lists and Tables

Walks the real DOM, so nested <ul>/<ol> structures convert to correctly indented Markdown lists and ordered lists renumber from 1. Simple tables flatten to pipe tables; genuinely complex ones fall back to raw HTML rather than losing data.

100% Private, In-Browser

Every conversion runs locally with JavaScript — your HTML and the resulting Markdown never leave your device, never hit a server, and work offline after the page loads. Safe for internal CMS exports, customer content, and unpublished pages.

Examples

Web <table> to a GFM pipe table

<table>
  <thead><tr><th>Region</th><th>Sales</th></tr></thead>
  <tbody>
    <tr><td>EMEA</td><td>1,204</td></tr>
    <tr><td>APAC</td><td>980</td></tr>
  </tbody>
</table>

| Region | Sales |
| ------ | ----- |
| EMEA   | 1,204 |
| APAC   | 980   |

A scraped or copied HTML <table> collapses into a GitHub Flavored Markdown pipe table. The <thead> row becomes the header, the dashed delimiter row is generated for you, and each <tr> becomes one pipe-delimited line — ready to drop into a README or a docs page.
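
Once the header and body cells have been read out of the DOM, producing the aligned pipe table is mostly a padding exercise. The sketch below is illustrative Python with a hypothetical `pipe_table` helper; escaping literal pipes and dropping cells beyond the header width are assumptions of the sketch.

```python
def pipe_table(header: list[str], rows: list[list[str]]) -> str:
    """Render a header row and body rows as an aligned GFM pipe table."""

    def clean(cell: str) -> str:
        # Cells are single-line; a literal pipe must be escaped.
        return cell.strip().replace("\n", " ").replace("|", "\\|")

    head = [clean(c) for c in header]
    # Pad short rows with empty cells so every row matches the header width.
    body = [[clean(c) for c in row] + [""] * (len(head) - len(row)) for row in rows]
    widths = [max([3, len(h)] + [len(r[i]) for r in body]) for i, h in enumerate(head)]

    def line(cells: list[str]) -> str:
        # zip() stops at the header width, so extra cells are ignored.
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    delimiter = line(["-" * w for w in widths])
    return "\n".join([line(head), delimiter] + [line(r) for r in body])
```

Feeding it `["Region", "Sales"]` and the two data rows from the example reproduces the four-line table shown above.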

Links: inline vs reference style

<p>Read the <a href="https://example.com/guide">setup guide</a> and the <a href="https://example.com/api">API reference</a>.</p>

Inline:
Read the [setup guide](https://example.com/guide) and the [API reference](https://example.com/api).

Reference:
Read the [setup guide][1] and the [API reference][2].

[1]: https://example.com/guide
[2]: https://example.com/api

The same anchors render two ways. Inline keeps the URL next to the text; reference style moves every URL to a numbered list at the bottom, which keeps long paragraphs readable when a sentence carries several links. Pick the style with the Links radio.
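
One plausible way to implement reference style is to hand out a number the first time each URL is seen and reuse it afterwards. This is a small Python sketch with a hypothetical `ReferenceLinks` class; it shows the numbering idea only and makes no claim about how the tool stores its labels.

```python
class ReferenceLinks:
    """Hands out numeric labels per URL and renders the footer definitions."""

    def __init__(self):
        self._labels = {}  # url -> number, kept in first-seen order

    def link(self, text: str, url: str) -> str:
        # setdefault returns the existing number when the URL was seen before.
        number = self._labels.setdefault(url, len(self._labels) + 1)
        return f"[{text}][{number}]"

    def definitions(self) -> str:
        return "\n".join(f"[{n}]: {url}" for url, n in self._labels.items())
```

While walking the document you would call `link()` for each anchor and append `definitions()` at the end; a URL that appears twice gets a single definition and both links share its label.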

Nested <ul>/<ol> to indented Markdown lists

<ul>
  <li>Build
    <ol>
      <li>Compile</li>
      <li>Bundle</li>
    </ol>
  </li>
  <li>Ship</li>
</ul>

- Build
  1. Compile
  2. Bundle
- Ship

Nesting is preserved by indentation: the inner <ol> sits two spaces under its parent <li> and switches from a `-` bullet to `1.` numbering. The converter renumbers ordered items from 1, so the Markdown stays clean even if the HTML used explicit value attributes.
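
The indentation logic can be sketched recursively. The Python below is an illustration under assumptions: `render_list` is a hypothetical helper, each item is a `(text, sublist)` pair where `sublist` is either `None` or an `(ordered, items)` pair, and children are indented to the parent's content column (two spaces under `-`, three under `1.`), which is what CommonMark needs; the tool itself may indent differently.

```python
def render_list(items, ordered=False, indent=""):
    """Return Markdown lines for a (possibly nested) list.

    items: list of (text, sublist) pairs; sublist is None or (ordered, items).
    """
    lines = []
    for number, (text, sublist) in enumerate(items, start=1):
        marker = f"{number}." if ordered else "-"
        lines.append(f"{indent}{marker} {text}")
        if sublist is not None:
            sub_ordered, sub_items = sublist
            # Align children with the parent's text, just past "marker ".
            child_indent = indent + " " * (len(marker) + 1)
            lines.extend(render_list(sub_items, sub_ordered, child_indent))
    return lines
```

Numbering always restarts at 1 inside each ordered list because `enumerate` is called afresh per level.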

A chunk of web-page HTML to clean Markdown

<article>
  <h1>Changelog</h1>
  <p>We shipped <strong>dark mode</strong> and fixed <code>parseDate()</code>.</p>
  <blockquote><p>Thanks to everyone who reported it.</p></blockquote>
</article>

# Changelog

We shipped **dark mode** and fixed `parseDate()`.

> Thanks to everyone who reported it.

Paste a slice of a real page — the <article> wrapper is dropped (Markdown has no container element), the <h1> becomes `#`, <strong> becomes `**`, inline <code> becomes backticks, and the <blockquote> becomes a `>` line. Structural wrappers with no Markdown equivalent simply fall away.

How to Convert HTML to Markdown

  1. Paste your HTML

    Drop in a copied web page, a CMS or WYSIWYG export, or a scraped HTML snippet. The DOM is parsed and serialised to Markdown in your browser as you paste, with no upload and no size cap beyond your browser's memory.

  2. Choose heading and link styles

    Pick ATX (#) or Setext (===) headings and inline or reference links. The Markdown re-renders live, so you can compare styles instantly. The output targets GitHub Flavored Markdown, with tables, task lists, and strikethrough included.

  3. Copy or download

    Click Copy to grab the Markdown, or Download to save a .md file. To go the other way, switch to the Markdown → HTML tab and paste your Markdown to get rendered HTML back.

Common Pitfalls

Expecting <div>/<span> Structure to Survive

Generic containers carry no Markdown equivalent, so they are unwrapped — their content stays but the tag, and any class or style on it, vanishes. If your layout depended on a wrapping <div> or a styled <span>, that styling is gone in the Markdown. This is expected, not a bug; Markdown simply has no way to express it.

✗ Wrong
<div class="callout warning"><span style="color:red">Heads up!</span></div>
<!-- expecting the callout box and red colour to survive -->
✓ Correct
Heads up!
<!-- container and styles dropped; only the text remains in Markdown -->

Lost <br> Line Breaks Inside Paragraphs

A <br> inside a paragraph is a hard line break, which Markdown represents with two trailing spaces before the newline (or a trailing backslash). A plain newline on its own is only a soft break, so if the hard-break marker is lost, adjacent lines reflow into one when rendered. The converter emits the hard-break form, but if you hand-edit afterward, do not strip the trailing spaces.

✗ Wrong
Line one<br>Line two
<!-- if the break form is removed, these merge into one line -->
✓ Correct
Line one  
Line two
<!-- two trailing spaces preserve the <br> as a hard break -->
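
Both hard-break spellings are easy to generate once the paragraph text has been split at each <br>. This is a minimal Python sketch with a hypothetical `join_with_hard_breaks` helper; the `style` parameter is an assumption of the sketch, not an option the tool is documented to expose.

```python
def join_with_hard_breaks(segments, style="spaces"):
    """Join text segments (split at each <br>) with Markdown hard breaks."""
    if style not in ("spaces", "backslash"):
        raise ValueError("style must be 'spaces' or 'backslash'")
    # Two trailing spaces, or a backslash, before the newline.
    marker = "  " if style == "spaces" else "\\"
    return (marker + "\n").join(segment.strip() for segment in segments)
```

The backslash form is worth considering if your editor trims trailing whitespace on save, since the marker is then a visible character.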

Deeply Nested Tables Degrading

GFM pipe tables cannot nest or hold block content. A legacy layout that puts a table (or a list, or multiple paragraphs) inside a table cell cannot become a clean pipe table — the converter flattens what it can and leaves the rest as raw HTML so nothing is lost. The fix is to simplify the source, not the output.

✗ Wrong
<table><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>
<!-- nested table can't become a flat pipe table -->
✓ Correct
<!-- Flatten to a single-level table first: -->
<table><tr><td>inner</td></tr></table>
→ | inner |
  | ----- |

Expecting <script> or Styles to Survive

<script>, <style>, and head-level elements are code and presentation, not document content, so they are stripped entirely — not converted, not preserved as raw HTML. Pasting a full page and expecting behaviour or CSS to carry into the Markdown will disappoint. Markdown is a content format; if you need the code or styling, keep the HTML.

✗ Wrong
<style>.x{color:blue}</style>
<script>track()</script>
<p>Body</p>
<!-- expecting the style and script to come through -->
✓ Correct
Body
<!-- only the content survives; <script>/<style> are dropped -->
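
The stripping step can be pictured as a parser that ignores everything while it is inside a dropped element. Below is a hedged Python sketch: `ContentText` and `visible_text` are hypothetical names, the skipped set is an assumption, and the output is plain text rather than Markdown, purely to show the filtering.

```python
from html.parser import HTMLParser


class ContentText(HTMLParser):
    """Collects text, skipping anything inside script, style, or head."""

    SKIPPED = {"script", "style", "head"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = ContentText()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.chunks)
```

Run on the snippet above, only `Body` remains; the CSS rule and the `track()` call never reach the output.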

Common Use Cases

Migrate web or CMS content into Notion, Obsidian, or a static site
Pull pages out of a CMS, a WordPress export, or an old HTML site and convert them to Markdown that drops straight into Notion, Obsidian, Hugo, or Jekyll. You trade verbose markup for portable text that lives cleanly in a Git repo and diffs sensibly in review.
Export from a WYSIWYG editor
Rich-text editors emit dense, attribute-heavy HTML. Paste that output here to recover the clean Markdown underneath — headings, lists, links, and emphasis — so the content can move into a docs pipeline or a Markdown-based knowledge base instead of staying locked in the editor.
Clean HTML into Markdown to feed LLMs and RAG pipelines
Raw HTML burns tokens on tags, scripts, and styling a model never needs. Converting a scraped page to Markdown strips that noise while keeping the structure an LLM reads well, so you fit more real content in the context window and get cleaner embeddings for retrieval.
Convert a rich-text paste into Markdown
Copy formatted text from a web page, an email, or a doc and it arrives as HTML on the clipboard. Paste it here to turn that rich text into Markdown you can commit, send in a pull request, or write into your notes — formatting preserved, clutter gone.
Archive a page as Markdown
Save the meaningful content of a web page as a small, future-proof .md file instead of a heavy HTML snapshot full of scripts and tracking. Markdown stays readable in any text editor decades from now and takes a fraction of the space.
Turn legacy HTML docs into Markdown
Old documentation written as hand-coded HTML is painful to maintain. Convert it to Markdown to bring it into a modern docs-as-code workflow — where it can be linted, reviewed in pull requests, and rendered by a static-site generator.

Technical Details

CommonMark vs GitHub Flavored Markdown Output
The converter can target plain CommonMark or, by default, the GitHub Flavored Markdown superset. CommonMark defines headings, emphasis, lists, links, images, code, and blockquotes precisely. GFM adds four constructs that map directly from common HTML: <table> → pipe table, checkbox list items → task lists, <del>/<s> → strikethrough, and bare URLs → autolinks. Because most web content uses tables and the like, GFM output is the practical default; choose CommonMark only when the destination renderer does not understand GFM extensions, in which case tables fall back to raw HTML.
Lossy, Irreversible Conversion — Stated Plainly
HTML is strictly more expressive than Markdown, so the conversion cannot be lossless, and it is worth being upfront about that. Markdown has no syntax for <div>, <span>, or other generic containers; no way to carry class names, id, inline style, colspan/rowspan, or arbitrary data-* attributes; and no representation for most semantic or layout elements. Those are unwrapped (content kept, tag dropped), discarded (attributes), or — when dropping would lose meaning — preserved as raw inline HTML. A round-trip HTML → Markdown → HTML will not reproduce the original. This is a deliberate trade: Markdown exists to be clean, diffable, and human-editable, not to mirror HTML. Most competitors gloss over this; stating it lets you decide with eyes open whether Markdown is the right target.
Style Trade-offs: ATX/Setext, Inline/Reference, Fenced/Indented
Three output choices have real trade-offs. ATX headings (#) cover all six levels and grep cleanly; Setext (underlined) only exists for H1/H2, so the tool emits it for the top two levels and falls back to ATX below. Inline links keep the URL beside the text — best for sparse links; reference links pull URLs to the document foot — best for link-dense prose and reuse by label. For code, fenced blocks (triple backticks) carry a language info string and nest safely, whereas indented (four-space) code blocks cannot express a language and break inside lists — so this converter always emits fenced blocks from <pre><code>.

Best Practices

Format the HTML Before You Convert
Minified or deeply tangled HTML — especially nested layout tables and stray inline elements — converts more cleanly when it is well-formed first. Run messy source through our HTML Formatter to pretty-print and normalise it, then convert. Clean input yields clean Markdown with fewer raw-HTML fallbacks.
Expect and Review the Lossy Drops
Treat the conversion as lossy by design. Classes, inline styles, <div>/<span> wrappers, and exotic attributes are gone in the output, and that is usually what you want for portable Markdown — but skim the result for anything semantically important that lived only in an attribute (an aria-label, a colspan-merged cell) and add it back by hand if it matters.
Pick the Link Style for the Document's Density
Use inline links for prose with a link here and there — the URL stays next to its text and the source reads naturally. Switch to reference links when a section is link-heavy or reuses the same URLs: pulling them to a numbered list at the foot keeps paragraphs scannable and avoids repeating long URLs.
Convert to Markdown Before Sending Pages to an LLM
When you feed web content to a model — for a prompt, an embedding, or a RAG store — convert the HTML to Markdown first. You strip tags, scripts, and styling that waste tokens and add noise, keep the structure the model actually uses, and fit substantially more real content inside the context window.
Verify Complex Tables After Conversion
GFM pipe tables are flat — no nested tables, no block content in cells, no merged cells. After converting a data-heavy or layout table, check the Markdown: simple grids convert perfectly, but anything with colspans or nested blocks degrades and may appear as raw HTML. Simplify the source table first if a clean pipe table matters.

Frequently Asked Questions

How are inline vs reference links handled?
You choose with the Links radio. Inline style writes each anchor as [text](url) right where it appears — compact and obvious for one or two links per paragraph. Reference style writes [text][1] in the prose and collects all the URLs as [1]: https://… definitions at the bottom of the document, which keeps text with many links readable and lets you reuse a URL by label. Both produce identical rendered output; it is purely a source-readability choice. Images follow the same rule: an <img> becomes ![alt](src) inline or ![alt][1] in reference mode.
ATX vs Setext headings — which should I use?
ATX headings prefix the line with hashes — # H1, ## H2, ### H3 — and work for all six levels. Setext headings underline the text instead: a row of = under a line makes it an H1, a row of - makes it an H2. The catch is that Setext only exists for levels 1 and 2, so this converter emits Setext for <h1>/<h2> and automatically falls back to ATX for <h3> and deeper. ATX is the more common, more portable choice and is easier to grep; pick Setext only if a downstream style guide or linter requires it.
What happens to HTML that Markdown can't represent, like <div> and <span>?
Markdown has no syntax for generic containers, so structural wrappers such as <div>, <span>, <section>, and <article> are unwrapped — their text and child elements are kept, but the tag itself disappears because there is nothing in Markdown to map it to. Class names, id attributes, inline style attributes, and data-* attributes are dropped for the same reason: Markdown carries no way to express them. When an element genuinely has no Markdown equivalent and dropping it would lose meaning, the converter leaves it as raw inline HTML rather than silently deleting the content. This is by design — see the question on whether the conversion is lossless.
Does it strip <script> and styles?
Yes. <script> and <style> elements, along with their contents, are removed entirely — they are code and CSS, not document content, and have no place in Markdown. The same goes for <link>, <meta>, and other head-level elements when you paste a whole page. Inline event handlers like onclick and CSS in style attributes are dropped as well. The result is text content only, which is exactly what you want when the Markdown is headed for a docs repo, a static-site generator, or an LLM context window. If you need the styling preserved, Markdown is the wrong target format.
How are nested tables and lists handled?
Nested lists convert cleanly: each level of <ul>/<ol> nesting becomes two spaces of indentation, and ordered lists are renumbered from 1. Tables are trickier. GitHub Flavored Markdown pipe tables are flat by specification — a table cell cannot contain another table, and it cannot contain block elements like lists or multiple paragraphs. So a simple <table> converts to a clean pipe table, but a table with a nested table inside a cell, or with block content in cells, degrades: the converter flattens what it can and falls back to leaving the complex parts as raw HTML so no data is lost. Deeply nested layout tables from legacy pages are the worst case — consider simplifying the HTML first.
Is HTML to Markdown lossless?
No, and it is important to be honest about that. HTML is far more expressive than Markdown: it has over a hundred elements and arbitrary attributes, while Markdown covers a small, deliberate set (headings, emphasis, lists, links, images, code, blockquotes, and, with GFM, tables, task lists, and strikethrough). Anything outside that set has no representation: colspans, custom attributes, inline styles, <div>/<span> structure, and most semantic wrappers are dropped or preserved only as raw HTML. Converting HTML → Markdown → HTML will not reproduce the original byte-for-byte. The conversion is lossy on purpose: the goal is clean, portable, human-editable text, not a faithful round-trip. To go back the other way, use our Markdown to HTML converter.
Can I feed the Markdown to an LLM or ChatGPT?
Yes — this is one of the best modern uses. Raw HTML wastes tokens on tags, attributes, scripts, and styling that a model does not need, and the noise can degrade retrieval quality in a RAG pipeline. Converting a page to Markdown strips that overhead while keeping the structure a model reads well: headings become hierarchy, lists stay lists, tables stay tables, and links stay links. The output is typically a fraction of the original HTML's token count, so you fit more real content in the context window. Paste a scraped page here, copy the Markdown, and drop it into your prompt, embedding step, or document store.
Are my files uploaded to a server?
No. The conversion runs entirely in your browser: the HTML is parsed into a DOM and serialised to Markdown locally with JavaScript, and nothing is transmitted, stored, or logged. You can confirm it by opening your browser's Network tab — converting triggers zero network requests. That makes the tool safe for internal CMS exports, unpublished pages, customer content, and anything under NDA. There is no upload step and no size limit beyond what your browser can comfortably hold in memory.
Does it work offline?
Yes, once the page has loaded. The DOM parser and the Markdown serialiser both run in the browser with no server round-trip, so you can convert with your network disconnected — on a plane, behind a strict firewall, or any time you would rather a page never left your machine. This falls straight out of the privacy-first design: because nothing is sent anywhere, there is nothing the tool needs the network for after the initial load.
Can I convert Markdown back to HTML?
Yes. Switch to the Markdown → HTML tab, or open the dedicated Markdown to HTML converter, paste your Markdown, and get rendered HTML with a live preview, full GFM support, and fragment, full-document, or email-inline output. The two directions pair up: use HTML → Markdown to pull existing web content into a Markdown workflow, and Markdown → HTML to publish or preview it. If the source HTML is messy, our HTML Formatter can tidy it before you convert.
