How to Query Parquet Files with SQL (No Python Needed)
You have a .parquet file. You want to filter, aggregate, or join it — but you do not want to spin up a Python environment, install pandas, or configure a database. Good news: you can query Parquet files with plain SQL, right now, with zero setup. And with the right tool, you get a lot more than just a query editor.
This tutorial covers two approaches: Parquet Explorer (browser-based, full-featured) and DuckDB CLI (terminal-based, scriptable). Both use DuckDB under the hood and support full SQL syntax.
Option 1: Parquet Explorer (Browser)
Parquet Explorer is a complete Parquet platform that runs DuckDB-WASM directly in your browser. There is no server, no upload, and no account. Your data stays on your machine.
Getting Started
- Open parquetexplorer.com.
- Drag and drop your .parquet file onto the page, or click to browse.
- The file loads instantly. You will see the schema (with nested types displayed in a tree view), row count, and a data preview in a virtualized table that handles millions of rows smoothly.
- Switch to the SQL editor and start writing queries.
That is it. No installation, no dependencies, no configuration files.
Beyond Just Querying
What makes Parquet Explorer more than a query tool is the workflow it wraps around the SQL engine:
- Query suggestions: Based on the schema and data in your loaded table, the editor offers contextual query suggestions — so you spend less time remembering column names and more time exploring.
- Query history: Every query you run is saved in your session history, making it easy to revisit and refine earlier analyses.
- Data profiler: Before you even write SQL, the profiler gives you per-column statistics, histograms, and semantic type detection. It automatically identifies columns containing emails, URLs, UUIDs, IP addresses, and phone numbers, and assigns a data quality score.
- Schema and metadata inspector: Browse the full type tree (including STRUCT, LIST, and MAP nesting), row group details, compression codecs, and column-level stats like min/max values, null counts, and distinct counts.
- Export results: Download query output as CSV, JSON, or Parquet (with Snappy compression).
Why This Works
Parquet Explorer embeds DuckDB-WASM — a full analytical database compiled to WebAssembly. When you drop a file, DuckDB reads it locally in your browser tab using the File System Access API. The query engine, optimizer, and storage layer all run client-side:
- Files up to several gigabytes work fine (limited by your browser’s memory).
- Query performance is surprisingly fast — DuckDB is one of the fastest analytical engines available.
- Your data never touches a network. It is fully private.
Option 2: DuckDB CLI (Terminal)
If you prefer working in the terminal, the DuckDB CLI is a lightweight, single-binary tool that reads Parquet files natively and speaks full SQL.
Installation
# macOS
brew install duckdb
# Linux (x86_64)
curl -LO https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
# Windows — download from https://duckdb.org/docs/installation
Basic Usage
# Start an interactive session
duckdb
# Or query directly from the command line
duckdb -c "SELECT * FROM 'sales.parquet' LIMIT 10"
DuckDB treats Parquet files as tables automatically. Just reference the file path in your FROM clause.
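For instance (file names here are illustrative), a bare quoted path works directly, and the read_parquet() function accepts glob patterns so you can query many files as one table:

```sql
-- Query a single file by path
SELECT COUNT(*) FROM 'sales.parquet';

-- read_parquet() accepts globs: query every file in a directory at once
SELECT * FROM read_parquet('data/*.parquet') LIMIT 10;
```

The glob form is handy for partitioned exports, where each file holds one day or one partition of the same schema.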
Practical SQL Examples
The following examples work in both Parquet Explorer and DuckDB CLI. We will use a hypothetical orders.parquet file with columns: order_id, customer_id, product, category, quantity, price, order_date, and country.
Preview Data
SELECT * FROM orders LIMIT 20;
Always start here to understand your columns and data types. In Parquet Explorer, the virtualized results table lets you scroll through even massive result sets without lag.
Filter Rows
SELECT *
FROM orders
WHERE country = 'Germany'
AND order_date >= '2025-01-01';
DuckDB pushes these predicates into the Parquet reader, skipping row groups that do not match — making filtered queries extremely fast even on large files.
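If you want to verify the pushdown yourself, prefix the query with EXPLAIN; in recent DuckDB versions the plan shows the filter attached to the Parquet scan operator rather than as a separate step:

```sql
-- Inspect the query plan; the WHERE predicates should appear
-- on the Parquet scan node itself (pushed-down filters)
EXPLAIN
SELECT *
FROM orders
WHERE country = 'Germany'
  AND order_date >= '2025-01-01';
```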
Aggregate with GROUP BY
SELECT
category,
COUNT(*) AS num_orders,
ROUND(SUM(price * quantity), 2) AS total_revenue,
ROUND(AVG(price), 2) AS avg_price
FROM orders
GROUP BY category
ORDER BY total_revenue DESC;
Date Functions
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS orders,
SUM(price * quantity) AS revenue
FROM orders
GROUP BY month
ORDER BY month;
DuckDB has extensive date/time functions: DATE_TRUNC, DATE_PART, DATE_DIFF, STRFTIME, and more.
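A quick sketch of the other functions, using columns from the hypothetical orders.parquet:

```sql
SELECT
    order_date,
    DATE_PART('dow', order_date)  AS day_of_week,   -- day of week (Sunday = 0 in DuckDB)
    DATE_DIFF('day', order_date, CURRENT_DATE) AS age_in_days,
    STRFTIME(order_date, '%Y-%m') AS year_month     -- format a date as text
FROM orders
LIMIT 5;
```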
Window Functions
SELECT
order_id,
customer_id,
price,
SUM(price) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS running_total
FROM orders;
Window functions work exactly as they do in PostgreSQL, making DuckDB feel familiar to anyone with SQL experience.
Top-N per Group
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY price * quantity DESC
) AS rn
FROM orders
)
WHERE rn <= 3;
This returns the three highest-revenue orders per category.
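DuckDB also supports the QUALIFY clause, which filters on window-function results directly and lets you drop the subquery entirely:

```sql
-- Same result as above: top three orders per category by revenue
SELECT *
FROM orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY category
    ORDER BY price * quantity DESC
) <= 3;
```

QUALIFY is to window functions what HAVING is to aggregates, so there is no need to materialize and then filter a row-number column.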
DESCRIBE and Schema Inspection
DESCRIBE SELECT * FROM orders;
This shows the column names and types. In Parquet Explorer, you also get this information visually in the schema panel — including nested type hierarchies rendered as an expandable tree.
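DuckDB also exposes the file's low-level metadata through table functions, so you can query the same row-group and compression details from SQL (the file name below is the running example):

```sql
-- Per-column, per-row-group metadata: compression codec,
-- min/max statistics, null counts, sizes
SELECT * FROM parquet_metadata('orders.parquet');

-- The raw Parquet schema as stored in the file footer
SELECT * FROM parquet_schema('orders.parquet');
```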
Export Results
In DuckDB CLI, you can write query results to a new file:
COPY (
SELECT * FROM orders WHERE country = 'Germany'
) TO 'german_orders.parquet' (FORMAT PARQUET);
In Parquet Explorer, you can export query results as CSV, JSON, or Parquet directly from the UI — no SQL COPY syntax needed.
Common Patterns and Tips
Use CTEs for Readability
WITH monthly AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
category,
SUM(price * quantity) AS revenue
FROM orders
GROUP BY month, category
)
SELECT
month,
category,
revenue,
LAG(revenue) OVER (PARTITION BY category ORDER BY month) AS prev_month,
ROUND(100.0 * (revenue - LAG(revenue) OVER (PARTITION BY category ORDER BY month))
/ LAG(revenue) OVER (PARTITION BY category ORDER BY month), 1) AS pct_change
FROM monthly
ORDER BY category, month;
Sampling Large Files
SELECT * FROM orders USING SAMPLE 1%;
DuckDB supports TABLESAMPLE and USING SAMPLE for quick exploration of large datasets.
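A couple of variations worth knowing, sketched against the same table:

```sql
-- Fixed-size sample: roughly 1,000 rows via reservoir sampling
SELECT * FROM orders USING SAMPLE 1000 ROWS;

-- Bernoulli sampling: each row is included with 10% probability
SELECT * FROM orders USING SAMPLE bernoulli(10%);
```

Percentage-based sampling is fast but approximate; use the fixed row-count form when you need a predictable sample size.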
Check for Nulls
SELECT
COUNT(*) AS total,
COUNT(*) - COUNT(customer_id) AS null_customer_ids,
COUNT(*) - COUNT(price) AS null_prices
FROM orders;
This is useful, but the data profiler in Parquet Explorer does this automatically for every column — along with histograms, distinct counts, and semantic type detection — without writing a single query.
Performance Tips
- Only select the columns you need. SELECT col1, col2 is faster than SELECT * because Parquet skips unneeded columns.
- Filter early. Push WHERE clauses as close to the base table as possible so DuckDB can use Parquet statistics to skip row groups.
- Use Parquet over CSV. If you have a choice, always query Parquet files. DuckDB can query CSV too, but Parquet is typically 10-50x faster on analytical queries. If your source data is CSV, TSV, JSON, or JSONL, you can convert it to Parquet in seconds using Parquet Explorer's format converter.
- In the browser, watch memory. Parquet Explorer runs in your browser tab, so very large files (multi-GB) may need more memory. For desktop-scale exploration, it handles most files effortlessly.
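The CSV-to-Parquet conversion mentioned above can also be done in the DuckDB CLI with a single COPY statement (file names are illustrative):

```sql
-- Read a CSV with inferred column types and write it out as Parquet
COPY (SELECT * FROM read_csv_auto('orders.csv'))
TO 'orders.parquet' (FORMAT PARQUET, COMPRESSION SNAPPY);
```

After the conversion, every query in this tutorial runs against the Parquet copy with the column-skipping and row-group-pruning benefits described above.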
When SQL Is Not Enough
For some tasks — machine learning, complex transformations, visualization — you will eventually want Python, R, or another language. But for data exploration, quality checks, quick aggregations, and ad-hoc analysis, SQL on Parquet files is hard to beat. It is fast, expressive, and requires no setup.
Start with parquetexplorer.com for instant, private access with built-in profiling, query suggestions, and format conversion. If you need scripting and automation, install the DuckDB CLI. Either way, you can go from file to insight in seconds — no Python needed.