Install & run

The easiest way to run this project is with Docker Compose:

git clone https://gitlab.com/panoramax/server/meta-catalog.git
cd meta-catalog/
docker compose -p meta-catalog up --build

The meta-catalog is then available at localhost:9000 (with no data yet).

You can also run each service without Docker, as explained in each service's documentation.

Note

If at some point you're lost or need help, you can contact us through issues or by email.

Database & migrations

Each service needs access to a PostGIS database.

If you use the Docker Compose approach, the database is provided and its schema kept up to date; otherwise, you need to provide it yourself.

Info

In all the following examples, the database will be:

On localhost:5432
Named panoramax
And accessed with username:password

Thus, the connection string will be postgresql://username:password@localhost:5432/panoramax.
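As a sketch of how these pieces fit together, the helper below assembles the connection strings used on this page from their parts. The connection_string function is hypothetical and not part of the project. It percent-encodes the credentials, so a password containing characters such as @ or : still yields a valid URL, and its driver argument produces the postgresql+psycopg:// form given to yoyo for the migrations.

```python
from typing import Optional
from urllib.parse import quote


def connection_string(user: str, password: str, host: str = "localhost",
                      port: int = 5432, dbname: str = "panoramax",
                      driver: Optional[str] = None) -> str:
    """Hypothetical helper building a PostgreSQL URL.

    driver="psycopg" yields the postgresql+psycopg:// scheme given to yoyo;
    the default plain postgresql:// scheme is the one used by stac-harvester.
    """
    scheme = "postgresql" if driver is None else f"postgresql+{driver}"
    # Percent-encode credentials: '@', ':' or '/' in a password would
    # otherwise be read as URL separators.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{credentials}@{host}:{port}/{dbname}"


print(connection_string("username", "password"))
# postgresql://username:password@localhost:5432/panoramax
print(connection_string("username", "password", driver="psycopg"))
# postgresql+psycopg://username:password@localhost:5432/panoramax
print(connection_string("username", "p@ss:word"))
# postgresql://username:p%40ss%3Aword@localhost:5432/panoramax
```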

Manual install

cd migrations

python -m pip install --upgrade virtualenv
virtualenv .venv
source .venv/bin/activate

pip install -e .
yoyo apply --database postgresql+psycopg://username:password@localhost:5432/panoramax

Python package with uv

cd migrations
uv sync
uv run yoyo apply --database postgresql+psycopg://username:password@localhost:5432/panoramax

Docker Compose

When using Docker Compose, the database schema is updated automatically by the migrations service.

Loading data

The harvester directory contains the code needed to harvest data from several instances into the meta-catalog.

Configuring instances

You need to add instances to the federation with the add-instance command.

Manual install

cd harvester
python -m pip install --upgrade virtualenv
virtualenv .venv
source .venv/bin/activate

pip install -e .

stac-harvester add-instance --db "postgresql://username:password@localhost:5432/panoramax" my-instance --url https://your.panoramax.org

Tip

You can add a .env file containing DB_URL=<cnx_string> to avoid passing the --db parameter every time.

Python package with uv

uv is an efficient tool for managing Python projects.

Install uv by following its documentation, then run:

cd harvester
uv sync

uv run stac-harvester add-instance --db "postgresql://username:password@localhost:5432/panoramax" my-instance --url https://your.panoramax.org

Docker Compose

docker compose -p meta-catalog -f docker-compose.yml -f docker-compose-harvester.yml run --build --rm harvester -c "stac-harvester add-instance my-instance --url https://your.panoramax.org"

Harvest all instances

Manual install


Then you can harvest all instances with the harvest-all command:

stac-harvester harvest-all --db "postgresql://username:password@localhost:5432/panoramax"

By default, the harvester crawls every instance that has not been crawled in the last 5 minutes.

In production, run this command from cron (every minute seems a good default).
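The default selection rule can be sketched in Python as follows. This is a hypothetical illustration, not the harvester's code: the instances_due function, the in-memory mapping of instance names to last-harvest times (the real harvester keeps this in the database), and the inclusive 5-minute boundary are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Assumed default delay, matching the 5 minutes described above.
DEFAULT_MIN_DELAY = timedelta(minutes=5)


def instances_due(last_harvests: Dict[str, Optional[datetime]],
                  now: datetime,
                  min_delay: timedelta = DEFAULT_MIN_DELAY) -> List[str]:
    """Hypothetical selection rule: keep instances never harvested, or whose
    last harvest is at least `min_delay` old. Returns names in sorted order."""
    return [
        name
        for name, last in sorted(last_harvests.items())
        if last is None or now - last >= min_delay
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last = {
    "recent": now - timedelta(minutes=2),   # skipped: crawled 2 minutes ago
    "stale": now - timedelta(minutes=10),   # due: older than 5 minutes
    "new": None,                            # due: never harvested
}
print(instances_due(last, now))  # ['new', 'stale']
```

With cron calling harvest-all every minute, such a rule keeps each instance from being re-crawled more often than the configured delay.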

Docker Compose

docker compose -p meta-catalog -f docker-compose.yml -f docker-compose-harvester.yml up -d

Warning

This setup has never been tested in production (the only known production deployment uses systemd); use it for testing only.

Systemd

If you want to run a systemd service that crawls automatically, see the detailed documentation section.

From our data exports

If you want to try the codebase without crawling all public instances, you can also reuse our public data exports.

Harvest a specific instance

You can also crawl a specific instance with:

stac-harvester harvest --db "postgresql://username:password@localhost:5432/panoramax" <instance_name>

Harvest type

By default, the harvest is incremental: only the data that has changed since the last harvest is crawled (using a filter on the collections endpoint like ?filter=updated > {last_harvest}).

Sometimes it can be useful to crawl all the data again, for example after a major update of the instance. This can be done with the --full-harvest option (or with the INCREMENTAL_HARVEST=false environment variable).

stac-harvester harvest --db "postgresql://username:password@localhost:5432/panoramax" --full-harvest <instance_name>
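To illustrate the difference between the two modes, here is a hypothetical sketch of the collections request URL. The /api/collections path, the quoted ISO 8601 timestamp inside the filter, the choice to list everything on a first harvest, and the collections_url helper are assumptions for illustration; the harvester's actual query may differ.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def collections_url(instance_url: str, last_harvest: Optional[datetime],
                    full_harvest: bool = False) -> str:
    """Hypothetical helper: URL of the collections listing to crawl.

    A full harvest (and, by assumption, a first harvest) lists every
    collection; an incremental one filters on `updated`, in the spirit of
    ?filter=updated > {last_harvest}.
    """
    base = instance_url.rstrip("/") + "/api/collections"  # assumed API path
    if full_harvest or last_harvest is None:
        return base
    # Assumed filter syntax: the timestamp is sent as a quoted ISO 8601 string.
    query = urlencode({"filter": f"updated > '{last_harvest.isoformat()}'"})
    return f"{base}?{query}"


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
url = collections_url("https://your.panoramax.org", last)
# Decode the query string to check the filter that would be sent.
print(parse_qs(urlparse(url).query)["filter"])
# ["updated > '2024-01-01T00:00:00+00:00'"]
print(collections_url("https://your.panoramax.org", last, full_harvest=True))
# https://your.panoramax.org/api/collections
```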

Instance configuration

The instance configuration (from the instance's /configuration endpoint) is synchronized to the database every day.

It can also be updated manually by running:

stac-harvester sync-configuration --db "postgresql://username:password@localhost:5432/panoramax" --instance instance_name --instance another_instance_name

Deleting an instance

For now, deleting an instance has to be done directly in the database:

DELETE FROM instances WHERE name = 'instance_name';

This will cascade delete all the collections, items, providers (users) and harvests linked to it.
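The cascade is presumably implemented with ON DELETE CASCADE foreign keys. The sketch below demonstrates that mechanism with Python's built-in sqlite3 module rather than PostgreSQL, on hypothetical and heavily simplified instances, collections and items tables; it is not the meta-catalog schema.

```python
import sqlite3

# Hypothetical, simplified tables: the real schema lives in PostgreSQL and
# has more tables (providers, harvests, ...).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE collections (
    id INTEGER PRIMARY KEY,
    instance_id INTEGER NOT NULL REFERENCES instances(id) ON DELETE CASCADE
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections(id) ON DELETE CASCADE
);
""")
conn.execute("INSERT INTO instances (id, name) VALUES (1, 'instance_name'), (2, 'other')")
conn.execute("INSERT INTO collections (id, instance_id) VALUES (10, 1), (20, 2)")
conn.execute("INSERT INTO items (id, collection_id) VALUES (100, 10), (200, 20)")

# Same statement as above: rows linked to the instance are removed through
# the chain of foreign keys (instance -> collections -> items).
conn.execute("DELETE FROM instances WHERE name = 'instance_name'")


def count(table: str) -> int:
    return conn.execute(f"SELECT count(*) FROM {table}").fetchone()[0]


print(count("collections"), count("items"))  # 1 1
```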

Install standalone API

The API is written in Rust.

Manual install


Info

You need Rust to build the API. The best approach is to follow the official installation documentation; for a quick install, run:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Note

You should have an up-to-date Rust toolchain: at least version 1.81.0.

To build and run the API:

cd api
cargo run --release -- --db-url=postgres://username:password@localhost:5432/panoramax
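To check that your toolchain meets the 1.81.0 minimum before building, you can run rustc --version. The hypothetical Python helper below automates that comparison; the function names are illustrative, and the parser assumes the usual "rustc X.Y.Z (commit date)" output format.

```python
import re
import subprocess
from typing import Optional, Tuple

MINIMUM_RUST = (1, 81, 0)  # minimum version stated in this documentation


def parse_rustc_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `rustc --version` output,
    which usually looks like 'rustc 1.81.0 (<commit> <date>)'."""
    match = re.search(r"rustc (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def rustc_is_recent_enough() -> bool:
    """Run the installed rustc and compare its version to MINIMUM_RUST."""
    result = subprocess.run(["rustc", "--version"], capture_output=True,
                            text=True, check=True)
    version = parse_rustc_version(result.stdout)
    return version is not None and version >= MINIMUM_RUST
```

If rustc_is_recent_enough() returns False, running rustup update should bring a rustup-managed toolchain up to date.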

Docker Compose

With Docker Compose, the API runs as the api service of the compose file.

Systemd

If you want to run the API as a systemd service, you can use and adapt the example configuration file.

Data export

If you want to publish data exports, you can use simple cron jobs to do so.

pg_dump

The simplest backup uses the pg_dump binary, exporting the public and stats schemas.

It uses zstd compression at level 5, which seems a good balance between speed and compression ratio.

Write the following script in a file, for example /usr/local/bin/panoramax-dump.sh:

#!/bin/bash
# Stop on the first error, so a failed dump never replaces the published file
set -e -u -o pipefail

# Script to dump the panoramax database

BACKUP_DIR="/data/pano-dump/pg_dump"
DATE=$(date +%Y%m%dT%H%M%S)
FILENAME="$BACKUP_DIR/panoramax-backup-$DATE.dump"

echo "Backing up Panoramax to $FILENAME"

time pg_dump --clean --if-exists --format c --dbname panoramax --no-owner --no-privileges --no-comments --schema 'public' --schema 'stats' --compress zstd:5 --file "$FILENAME"

# Replace the base file
mv "$FILENAME" "$BACKUP_DIR/panoramax.dump"

echo "Panoramax backup completed"

and add a cron entry like this one to run it every week, at midnight on Sunday:

0 0 * * 0 /usr/local/bin/panoramax-dump.sh
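The script above writes to a timestamped file first and only then moves it over the published name, so anyone downloading panoramax.dump never sees a half-written file (a rename within one filesystem is atomic on POSIX systems). A minimal Python sketch of the same pattern, assuming a hypothetical publish_export helper and a caller-supplied write function standing in for pg_dump:

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Callable


def publish_export(backup_dir: Path, base_name: str,
                   write: Callable[[Path], None]) -> Path:
    """Hypothetical helper: write an export to a timestamped file, then
    rename it over `base_name`, mirroring the script's `mv` step."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    stem, suffix = os.path.splitext(base_name)
    tmp_path = backup_dir / f"{stem}-{stamp}{suffix}"
    write(tmp_path)  # in the real script, this is pg_dump --file ...
    # If write() raised, we never get here: the published file is untouched.
    final_path = backup_dir / base_name
    # os.replace overwrites the target atomically on POSIX when both paths are
    # on the same filesystem, so readers see either the old or the new file.
    os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as tmp:
    out = publish_export(Path(tmp), "panoramax.dump",
                         lambda p: p.write_bytes(b"fake dump"))
    print(out.name, out.read_bytes())  # panoramax.dump b'fake dump'
    print(os.listdir(tmp))             # ['panoramax.dump']
```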

Backup restoration

To restore the backup, you can check the data section.

Parquet export

Apache Parquet is a powerful column-oriented data format, built as a modern alternative to CSV files. GeoParquet is an incubating Open Geospatial Consortium (OGC) standard that adds interoperable geospatial types (Point, Line, Polygon) to Parquet.

The federated catalog can export its data following the STAC GeoParquet specification (see the data section for more details).

The GeoParquet file can be generated by copying the geoparquet_export view.

Prerequisites

In order to do this, you need the PG Parquet extension installed on your database.

Write the following script in a file, for example /usr/local/bin/panoramax-parquet-export.sh:

#!/bin/bash
set -e -u -o pipefail

# Script to export the panoramax database to GeoParquet

BACKUP_DIR="/data/pano-dump/geoparquet"
DATE=$(date +%Y%m%dT%H%M%S)
FILENAME="$BACKUP_DIR/panoramax-$DATE.parquet"

echo "Exporting Panoramax to $FILENAME"

time psql -d panoramax -c "\COPY (select * from geoparquet_export) TO '$FILENAME' WITH (FORMAT 'parquet')"

# Replace the base file
mv "$FILENAME" "$BACKUP_DIR/panoramax.parquet"

echo "Panoramax export completed"

and add a cron entry like this one to run it every week, at 2 a.m. on Sunday (in the postgres user's crontab if possible):

0 2 * * 0 /usr/local/bin/panoramax-parquet-export.sh