Using the Parquet file with duckdb
DuckDB is a great tool to query data in various formats using SQL. We can leverage it to only read parts of the huge Parquet file, in order to extract only the data we need.
You'll also need the spatial extension. It is installed with:
Filters
There are many ways to use the Parquet file, we'll show here some examples of how to handle the data.
The Parquet file contains almost all Panoramax data (extracted at the end of each week), so the resulting file is huge, the first step is often to filter it to only get the data you need.
Extract all the data in a bounding box
If we want to extract all the data in Paris' bounding box, we can use the following query:
COPY (
SELECT
*
FROM
'https://api.panoramax.xyz/data/geoparquet/panoramax.parquet'
WHERE
bbox.xmin > 2.2241
and bbox.ymin > 48.8155
and bbox.xmax < 2.4533
and bbox.ymax < 48.9023
)
TO 'paris.parquet'
WITH
(FORMAT 'parquet');
This will create a new paris.parquet file, in the current directory.
The export should be quite fast
The export should be quite fast (for a query on a 20G file) because Parquet chunks are made with pictures near each others, so the query can skip a lot of chunks.
Extract data on a precise boundary
In the previous example, we extract data on a rough bounding box, but we can have an export on a precise geometry to only get the data on a city/region/country.
This example is inspired by an Overture example
-- We use overturemaps divisions datasets to get the ID of the region/city/country we want to extract
SET variable division_id = (
SELECT
id
FROM
read_parquet('s3://overturemaps-us-west-2/release/2025-10-22.0/theme=divisions/type=division/*.parquet')
WHERE
names.primary = 'Paris' and country = 'FR' and capital_of_divisions is not null
LIMIT 1
);
-- Fetch the bounds and geometry of the Region, and store it in a table
CREATE OR REPLACE TABLE bounds AS (
SELECT
id AS division_id, names.primary, geometry, bbox
FROM
read_parquet('s3://overturemaps-us-west-2/release/2025-10-22.0/theme=divisions/type=division_area/*.parquet')
WHERE
division_id = getvariable('division_id')
);
-- Extract the bounds and geometry of the division into variables for faster table scan
SET variable xmin = (select bbox.xmin FROM bounds);
SET variable ymin = (select bbox.ymin FROM bounds);
SET variable xmax = (select bbox.xmax FROM bounds);
SET variable ymax = (select bbox.ymax FROM bounds);
SET variable boundary = (select geometry FROM bounds);
-- Create a local GeoParquet file with exactly the data contained in the geometry of the chosen division
COPY(
SELECT
*
FROM
'https://api.panoramax.xyz/data/geoparquet/panoramax.parquet'
WHERE
bbox.xmin > getvariable('xmin')
AND bbox.xmax < getvariable('xmax')
AND bbox.ymin > getvariable('ymin')
AND bbox.ymax < getvariable('ymax')
AND ST_INTERSECTS(
getvariable('boundary'),
geometry
)
) TO 'paris.parquet';
Filter based on the semantic
Filtering of the semantic is a bit more complex because the semantic field is an array.
Here is a query to filter the items that have a traffic sign tag (be it on the picture, or on an annotation (a specific part of the picture)):
SELECT
*
FROM
'https://api.panoramax.xyz/data/geoparquet/panoramax.parquet'
WHERE
length(list_filter(semantics, s -> s.key = 'osm|traffic_sign')) > 0
LIMIT 10;
We can also do some statistics on the used tags with a query like:
WITH all_semantics AS (
SELECT semantics FROM 'https://api.panoramax.xyz/data/geoparquet/panoramax.parquet'
)
, sem_tags AS (
-- We unnest the semantics, to get one row for each semantic tag
SELECT tags
FROM all_semantics
, UNNEST(semantics) AS semantic_struct(tags)
WHERE tags.key NOT LIKE '%[%]%' -- we do not want the qualifiers here
)
SELECT
tags.key, tags.value, count(*) AS nb_tags
FROM sem_tags
GROUP BY tags.key, tags.value;
Filter based on the users
We can get all the data of a given user with the query:
COPY(
SELECT
*
FROM
'https://api.panoramax.xyz/data/geoparquet/panoramax.parquet'
WHERE
providers[1].name = 'PanierAvide' -- The pictures can have several providers, and the first one is the Panoramax account
) TO 'panieravide.parquet' WITH (format 'parquet');
Conversion
GeoParquet to JSON
We can also convert the generated Parquet file (it's better to do it on smaller Parquet files, on the whole file, the resulting file will be huge) to a JSON file.
COPY(
SELECT
-- we can select only the fields we want, or use * to select all
id,
geometry,
datetime, -- time of the picture
assets[1].href, -- url of the picture
semantics
FROM 'extract.parquet'
) TO 'extract.json' WITH (FORMAT JSON);
GeoParquet to CSV
We can also convert easily to get a CSV file.