Skip to content
Back to Blog
Tutorials

Flatten Nested JSON to CSV: 5 Strategies & Decision Matrix

Compare 5 strategies to flatten nested JSON to CSV: dot notation, indexed arrays, explode rows, stringify. Decision matrix for Excel, Pandas & BigQuery.

12 min read

How to Flatten Nested JSON to CSV: 5 Strategies & Decision Matrix

The Geometry Problem

You hit the same wall every time. An API returns nested JSON and the analyst on Slack just wants a spreadsheet. mongoexport produces $oid wrappers and three levels of metadata, and BigQuery expects a flat table. To flatten nested JSON to CSV is not a syntax problem. It is a geometry problem. JSON is a tree, CSV is a grid, and you cannot move a tree into a grid without picking how the branches collapse.

There are exactly five collapse strategies. Pick the wrong one and you ship 200 columns to Excel, lose precision on a Twitter ID, or break the round-trip your pipeline depends on. Pick the right one and the conversion is a one-liner.

StrategyOne-linerBest for
Dot notationcustomer.address.cityExcel/Sheets analysis
Underscorecustomer_address_citySQL-friendly columns
Indexed arraysitems.0.sku, items.1.skuFixed-size arrays
Explode rowsRepeat parent for each childPandas/BigQuery analytics
Stringify"{\"city\":\"Seattle\"}" in one cellLossless round-trip

This guide walks each strategy, hands you a decision matrix keyed by consumer (Excel, Pandas, BigQuery, Postgres), and shows four real payloads where the right strategy is not the obvious one. If you also need the general bidirectional overview (parser libraries, streaming, encoding traps), see the CSV ↔ JSON conversion guide.

Why Nested JSON Doesn’t Fit CSV

JSON carries three kinds of structure CSV lacks. Hierarchy is an object inside an object. Sequence is an array. Mixed is the combination: arrays of objects, objects with arrays, arrays of arrays. A typical e-commerce order is all three at once.

CSV has exactly two dimensions: rows and columns. There is no third axis for “this column holds three children.” When you demand a grid from a tree, something gives. Either spread the children across more columns (and live with names like items.0.options.0.value), spread them across more rows (parent fields repeat), or cram them into one cell as text and stop treating them as structure.

Each strategy below answers that question differently. Some preserve readability and lose round-trip safety. Others do the reverse. None is universal. Match the answer to who reads the file next.

5 Flattening Strategies Compared

Strategy 1: Dot Notation (customer.address.city)

Dot notation walks from root to leaf and uses . to join keys. Each nested object becomes one column per leaf, with the path encoded in the column name.

{ "customer": { "address": { "city": "Seattle" } }, "email": "alice@example.com" }

becomes

customer.address.city,email
Seattle,alice@example.com

In Pandas, one line covers it:

import pandas as pd

data = [{"customer": {"address": {"city": "Seattle"}}, "email": "alice@example.com"}]
df = pd.json_normalize(data, sep='.')
df.to_csv("out.csv", index=False)

In JavaScript, a small recursive function suffices:

function flattenDot(obj, prefix = '', acc = {}) {
  for (const [k, v] of Object.entries(obj)) {
    const key = prefix ? `${prefix}.${k}` : k;
    if (v && typeof v === 'object' && !Array.isArray(v)) {
      flattenDot(v, key, acc);
    } else {
      acc[key] = v;
    }
  }
  return acc;
}

Pros: human-readable, Pandas default, preserves the original path. Cons: column names can run long (Kubernetes specs produce names like spec.template.spec.containers.0.resources.limits.memory), and the dot becomes ambiguous if a real key contains . (Google Analytics 4 event parameters do).

Strategy 2: Underscore Notation (customer_address_city)

Same idea, different separator. Replace . with _ and the result is SQL-safe: SELECT customer_address_city FROM events works without quoting the identifier. BigQuery, Snowflake, and Postgres all prefer this.

import pandas as pd
df = pd.json_normalize(data, sep='_')

The decision between dot and underscore is purely about the downstream tool. Excel analysts read dot more naturally; SQL engines accept underscore without complaint. Switch by changing one argument.

Pros: SQL-safe column names, BigQuery-compliant identifiers, no quoting required. Cons: ambiguity remains if a key contains _ (less common than . but still possible).

Strategy 3: Indexed Arrays (items.0.sku, items.1.sku)

Objects flatten cleanly because keys are unique. Arrays do not because length is unbounded. The indexed strategy treats array positions as path segments: items[0] becomes items.0.

{ "id": "ord-001", "items": [{"sku": "A"}, {"sku": "B"}] }

becomes

id,items.0.sku,items.1.sku
ord-001,A,B

This is the default Flatten behavior in our JSON to CSV converter. Each leaf gets its own column; position is recorded in the name.

Pros: every value gets its own cell, position preserved, no row duplication. Cons: column count explodes (100 items = 100 columns); rows with different array lengths produce ragged tables; downstream aggregation breaks (no SUM(items.*.qty)).

Strategy 4: Explode Rows (array to multiple rows)

Instead of widening the table to fit the array, lengthen it. Repeat the parent fields once per array element, and let each element become its own row.

{ "order_id": "ord-001", "customer": "Alice", "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}] }

becomes

order_id,customer,items.sku,items.qty
ord-001,Alice,A,2
ord-001,Alice,B,1

In Pandas, one line does both the explode and the normalization:

import pandas as pd

orders = [{"order_id": "ord-001", "customer": "Alice",
           "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}]
df = pd.json_normalize(orders, record_path='items', meta=['order_id', 'customer'])
df.to_csv("orders.csv", index=False)

In SQL, UNNEST does the same:

SELECT order_id, item.sku, item.qty FROM orders, UNNEST(items) AS item;

Pros: Pandas and BigQuery treat this shape natively, aggregations work (GROUP BY order_id), the schema stays narrow. Cons: parent fields duplicate in every child row (storage bloat), the 1-to-many boundary is implicit (you need an order_id), and two arrays at the same level produce a Cartesian product unless you UNNEST them carefully.

Strategy 5: Stringify (JSON-in-cell)

The other option is to not flatten at all. Serialize the entire nested value as a JSON string and put it in a single cell. The outer table stays flat; the structure is preserved verbatim inside.

{ "id": "ord-001", "items": [{"sku": "A"}, {"sku": "B"}] }

becomes

id,items
ord-001,"[{""sku"":""A""},{""sku"":""B""}]"

This is the Stringify mode in our JSON to CSV converter. The column count never explodes, the original shape is preserved byte-for-byte, and the reverse trip reconstructs the input exactly.

Pros: 100% lossless, predictable column count, round-trip safe when paired with Infer types on reverse. Cons: Excel users see escaped quotes, SQL engines need JSON functions to query into the value (JSON_EXTRACT_SCALAR in BigQuery, ->>'key' in Postgres), and spreadsheet formulas cannot reach into the cell.

5 Strategies Side-by-Side

Same input across all five: {"id":"ord-001","customer":{"name":"Alice"},"items":[{"sku":"A","qty":2},{"sku":"B","qty":1}]}.

StrategyColumnsRound-trip safeBest consumer
Dot notationgrows with arrayNoExcel analyst
Underscoregrows with arrayNoSQL warehouse
Indexed arrays2 per array slotNo (ambiguous on reverse)Fixed-size arrays
Explode rowsnarrow, 1 row per childPartial (needs key)Pandas / BigQuery
StringifyfixedYesPipeline round-trip

Decision Matrix: Which Strategy for Which Consumer

Find the consumer in the left column; the recommended strategy is on the right.

ConsumerRecommended strategyWhy
Excel / Sheets (analyst)Dot + Stringify for big arraysReadable column names; large arrays do not explode the sheet
Excel-EU (DE/FR/IT/ES)Dot + ; delimiter + UTF-8 BOMSemicolon required; BOM prevents encoding garble
Pandas (json_normalize + explode)Underscore + Explode rowsSQL-friendly columns; explode plays well with groupby
BigQuery / SnowflakeTSV + Stringify or ExplodeTab avoids quote traps; JSON_EXTRACT queries the cell
PostgreSQL COPYRFC 4180 + Underscore + flatSQL-safe columns; strict RFC quoting
MongoDB → BigQuery ETLNDJSON load directly, skip CSVBigQuery loads NDJSON natively; CSV is a detour

Excel / Google Sheets: The Locale Trap

Excel column names have no practical length limit. The real traps are three.

First, the locale split. European Excel (Germany, France, Italy, Spain) expects ; as the delimiter because , is the decimal separator. A comma-delimited CSV opens with every row collapsed into column A. The Excel preset on our json-to-csv tool switches to ; + CRLF + UTF-8 BOM in one click.

Second, scientific notation. Excel sees 9007199254740993 and renders 9.00719925474E+15. Store big integers as strings in the source JSON and turn on BOM so Excel keeps the cell as text. Our converter detects big integers automatically.

Third, the practical column limit. Excel theoretically supports 16,384 columns, but anything above ~500 becomes unmanageable. Stringify the heavy sub-trees or pre-project with jq before converting.

Pandas: json_normalize + explode

The standard pattern for nested arrays is record_path + meta in one pass:

import pandas as pd

orders = [{
    "order_id": "ord-001",
    "customer": {"name": "Alice", "city": "Seattle"},
    "items": [{"sku": "SKU-100", "qty": 2}, {"sku": "SKU-205", "qty": 1}]
}]

df = pd.json_normalize(orders, record_path='items',
    meta=['order_id', ['customer', 'name'], ['customer', 'city']], sep='_')
df.to_csv("orders.csv", index=False)

The output is one row per item with order_id, customer_name, and customer_city repeated. This beats running explode first and json_normalize after: record_path skips the intermediate object column and meta lets you control which parent fields propagate. For inputs where array elements contain nested objects, set max_level= to cap depth.

BigQuery / Snowflake: TSV + JSON-in-cell

BigQuery’s LOAD DATA is strict about CSV quoting and often misparses files with commas inside quoted text. TSV is safer because tabs almost never appear inside text fields:

bq load --source_format=CSV --field_delimiter='\t' \
  dataset.orders gs://bucket/orders.tsv \
  order_id:STRING,customer:STRING,items:STRING

When you load nested data as Stringified JSON in a single column, BigQuery still queries into it with JSON_EXTRACT_SCALAR:

SELECT order_id, JSON_EXTRACT_SCALAR(items, '$[0].sku') AS first_sku
FROM dataset.orders;

Snowflake provides the same capability through VARIANT, with path queries like items:0.sku::STRING. In both engines, Stringify + JSON path queries beats full flattening when nested arrays are large or variable-length.

PostgreSQL COPY: RFC 4180 Strict

COPY ... FROM ... WITH (FORMAT csv, HEADER true) is the strictest RFC 4180 reader you will commonly meet. Two behaviors trip people up.

First, COPY does not accept a UTF-8 BOM. The byte order mark becomes a literal prefix on the first column name (id instead of id), and every query referencing id fails silently. Turn BOM off for Postgres targets.

Second, COPY cannot natively parse nested data. Either explode arrays into multiple rows before loading, or define the destination as jsonb and stringify the nested value:

CREATE TABLE orders (order_id text PRIMARY KEY, customer text, items jsonb);
COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv, HEADER true);
SELECT order_id, items->0->>'sku' AS first_sku FROM orders;

For pipelines that already speak JSON end-to-end, skip CSV and use COPY ... FROM ... WITH (FORMAT text) with JSON-line input instead.

Real-World Payload Walkthroughs

Walkthrough 1: E-commerce Orders (customer + items array)

A typical order combines nested customer info with a variable-length items array:

[{ "id": "ord-001",
   "customer": { "name": "Alice", "address": {"city": "Seattle", "country": "US"} },
   "items": [{"sku": "SKU-100", "qty": 2}, {"sku": "SKU-205", "qty": 1}] }]

The right strategy depends on who reads the file. Finance wants one row per item to compute revenue per SKU; that is the explode strategy, producing two rows with id and customer.name repeated. Operations wants one row per order for fulfillment dashboards; that is dot notation with Stringified items so the array does not blow out the column count. Same input, two outputs, both correct for their consumer.

Try it directly: paste the payload into our JSON to CSV converter and toggle Flatten versus Stringify on the Nested option. The “Nested E-commerce Orders” example loads the same shape.

Walkthrough 2: GitHub Issues API (labels array + user object)

The /repos/{owner}/{repo}/issues endpoint returns a mixed nested shape:

[{ "id": 1001, "title": "Bug: login 404", "state": "open",
   "labels": ["bug", "priority:high"], "user": {"login": "alice"} }]

user is an object with one useful field; labels is a string array of unbounded length. The pragmatic flatten is hybrid: dot notation on user (you only care about user.login), and inline-join on labels to a single cell separated by ;:

id,title,state,labels,user.login
1001,Bug: login 404,open,bug;priority:high,alice

You cannot capture both “join arrays into a cell” and “flatten objects with dots” in a single strategy. Our converter handles the object-flatten automatically; pre-process the labels with jq (map(.labels = (.labels | join(";")))) or accept the default array-stringify behavior.

Walkthrough 3: MongoDB mongoexport ($oid + metadata)

mongoexport --jsonArray produces Extended JSON wrappers:

[{ "_id": {"$oid": "6634a1b2c3d4e5f600000001"},
   "email": "alice@example.com",
   "metadata": { "signupDate": "2026-01-15T10:30:00Z",
                 "preferences": {"newsletter": true, "theme": "dark"} } }]

The $oid wrapper produces a column literally named _id.$oid, which most SQL engines reject. Pre-process with jq to unwrap it:

mongoexport --collection=users --jsonArray | jq 'map(._id = ._id."$oid")' > users.json

For the deeply nested metadata.preferences block, choose by consumer. Analyst export: dot-flatten the whole thing; metadata.preferences.theme reads fine. Pipeline round-trip: stringify metadata to keep the structure intact. For full jq patterns that pair with CSV pipelines, see our jq command-line cheat sheet.

Walkthrough 4: Kubernetes Pod Spec (deeply nested)

A kubectl get pod -o json response is a worst case for flat strategies. The structure routinely goes six levels deep (spec.template.spec.containers.0.resources.limits.memory). Naive dot-flattening produces column names exceeding 70 characters and 200+ columns of output. Two strategies work.

Pre-project with kubectl jsonpath. Pick only the fields you actually need:

kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\t"}{.status.phase}{"\n"}{end}' > pods.tsv

Stringify the spec, flatten the metadata. Keep metadata (name, namespace, labels) flat and stringify spec into a single cell:

kubectl get pods -o json | jq 'map({name: .metadata.name, namespace: .metadata.namespace, spec: (.spec | tostring)})'

Then paste into the converter with Flatten mode. The spec column becomes one JSON cell; metadata columns stay readable. Avoid the anti-pattern kubectl get pod -o json | json-to-csv flatten with no pre-projection. The column count will be unworkable.

Round-Trip Safety: Flatten is Lossy, Stringify is Lossless

One thing most guides skip: dot notation, underscore notation, indexed arrays, and explode-rows are all one-way projections. Once you flatten with any of them, the original JSON cannot be perfectly reconstructed from the CSV alone.

Counter-examples are easy to construct. A column named customer.address.city is ambiguous between {"customer": {"address": {"city": "..."}}} and {"customer": {"address.city": "..."}}. Indexed arrays look reversible, but CSV cannot say whether items.0.sku should reconstruct to an array or an object with a numeric key. Explode-rows requires a group-by key; without order_id, you cannot tell which rows belonged to the same parent.

Only Stringify survives the round-trip. The nested value is preserved verbatim as a JSON string, so the reverse converter reads the cell, parses it, and reinserts the original intact. Convert with Stringify, save the CSV, paste into our CSV to JSON converter, turn on Infer types, and you get bytes identical to the input.

A short rule: pipeline round-trip → Stringify. Single-shot analysis or reporting → dot, underscore, or explode based on consumer.

Doing It in Our Browser Tool

The JSON to CSV converter exposes two of the five strategies directly: Flatten (combining dot notation and indexed arrays) and Stringify (preserving structure inside a cell). The other three (underscore, explode-rows, SQL-target presets) are one preprocessing step away.

A typical session takes five clicks:

  1. Validate the input with our JSON Formatter so syntax errors do not become silent conversion failures.
  2. Paste the JSON into the JSON to CSV converter. The conversion runs instantly.
  3. Set Nested to Flatten for dotted and indexed keys, or Stringify to keep arrays and objects in single cells.
  4. Pick a preset: RFC 4180 for pipelines, Excel for EU spreadsheets, TSV for warehouses, Pipe for comma-heavy text.
  5. Click Swap direction and use the CSV to JSON converter with Infer types on to verify a Stringify round-trip.

Everything runs in your browser. PII, internal exports, and production secrets stay on the page; there are zero network requests after page load. That matters for sensitive data where uploading to a third-party site is not an option.

Common Pitfalls

Six failure modes show up over and over.

  1. Column-name explosion. Kubernetes specs and GitHub PR review threads produce hundreds of leaf paths. Fix: pre-project with jq or kubectl jsonpath, or stringify heavy sub-trees while flattening metadata.
  2. Array length mismatch. Row 1 has 3 items, row 2 has 5 items. Indexed arrays produce blank cells in items.3.sku and items.4.sku for row 1. Fix: switch to explode-rows.
  3. Index keys treated as strings on reverse. When CSV-to-JSON sees items.0.sku, the 0 is technically a string key. Some reverse converters reconstruct {"0": {"sku": "A"}} instead of [{"sku": "A"}]. Fix: use Stringify for round-trips.
  4. Keys that already contain the separator. GA4 events have keys like event_params.key containing literal dots; flattening with . produces ambiguous paths. Fix: use underscore, or rename the offending keys. See our JSON5 and JSONC formatting guide for background on JSON formats with extended key support.
  5. Mixed-level types. Some rows have address as an object, others as null. Flattening produces empty cells where the object was null. Our converter’s schema-notes warning flags this so you can verify the downstream consumer.
  6. Big integers truncated by Excel. A $oid Long, a Twitter snowflake ID, or a K8s resourceVersion exceeds JavaScript’s safe range (2^53 - 1) and gets rounded silently. Excel then renders them as 9.00719925474E+15. Fix: store IDs as strings in the source JSON, enable BOM, and use the Excel preset.

FAQ

What is the best way to flatten nested JSON to CSV?

The best way to flatten nested JSON to CSV depends on the downstream consumer. Use dot notation for Excel or Google Sheets. Use explode-rows when Pandas or BigQuery will aggregate the data. Use Stringify when the CSV must round-trip back to JSON without data loss. Match the strategy to the next reader.

How do I convert a JSON array to multiple CSV rows?

Convert a JSON array to multiple CSV rows using the explode strategy: duplicate the parent fields once per array element so each becomes its own row. In Pandas, pd.json_normalize(data, record_path='items', meta=['order_id']) does it in one call. In SQL, UNNEST(items) produces the same shape. Parent keys repeat across exploded rows.

Can I round-trip CSV back to the original nested JSON?

Round-tripping CSV back to the original nested JSON works only with Stringify mode. Dot notation, underscore, indexed arrays, and explode-rows are lossy one-way projections; the reverse converter cannot perfectly reconstruct the tree. Stringify preserves arrays and objects as JSON inside a single cell, so the full round-trip is byte-identical when Infer types is on.

Why does Excel show my flattened JSON as one long column?

Excel shows your flattened JSON as one long column when you are in a European locale (Germany, France, Italy, Spain) where the comma is reserved for decimals and Excel expects semicolons as the delimiter. The Excel preset on json-to-csv switches to ; + CRLF + UTF-8 BOM in one click.

Should I use dot notation or underscore for column names?

Use dot notation when the target is Excel, Google Sheets, or Pandas; dots are the json_normalize default and read naturally. Use underscore when the target is SQL: Postgres, BigQuery, and Snowflake require quoting around identifiers containing dots, while underscores are accepted unquoted everywhere.

How does pandas json_normalize handle arrays of objects?

Pandas json_normalize handles arrays of objects through record_path and meta arguments. pd.json_normalize(data, record_path='items', meta=['order_id']) explodes items into one row per element with order_id repeated. For nested objects without arrays, the simpler pd.json_normalize(data, sep='_') produces underscore-separated column names like customer_address_city. Use max_level= to cap depth on deep trees.

What’s the column limit when flattening deeply nested JSON?

The column limit when flattening deeply nested JSON is 16,384 in Excel and effectively unbounded in CSV itself, but past 500 columns the output becomes unmanageable. Kubernetes Pod specs or GraphQL responses easily exceed this. Stringify the heavy sub-trees with the JSON to CSV converter or pre-project with jq or kubectl jsonpath.

Is jq a good tool for flattening JSON before CSV conversion?

Yes, jq is the right tool for flattening JSON before CSV conversion. It handles pre-projection (map({id, name})), pre-explosion (.[] | {id, item: .items[]}), and shape normalization in one line. The jq pipeline runs before the CSV step and controls exactly which fields reach the converter. See our jq command-line cheat sheet for patterns.

Conclusion

Five takeaways:

  1. JSON-to-CSV is a geometry problem, not a syntax problem. A tree cannot fit in a grid without choosing how the branches collapse.
  2. Five strategies cover the practical universe (dot, underscore, indexed arrays, explode-rows, Stringify). Pick by consumer.
  3. Stringify is the only lossless path. Use it for pipeline round-trips.
  4. Excel-EU and BigQuery have presets for a reason. Use them.
  5. Real payloads (mongoexport, Kubernetes specs, GitHub responses) usually need a jq or kubectl jsonpath pre-projection step first.

Try your own payload in the JSON to CSV converter. It runs locally, handles all five strategies, and round-trips losslessly with Stringify. Nothing uploads and nothing leaves the page.

Related Articles

View all articles