Performance Configuration

This guide covers how to tune the RangeReader for different workloads, such as random access (tiles), sequential processing (ETL), or high-latency environments.

Caching Layers

Caching is critical when reading from remote sources (S3, HTTP) to minimize latency and cost.

Memory Cache (CachingRangeReader)

Best for: Random access, Tile servers, Metadata headers.

URI bucket = URI.create("s3://bucket/");
URI leaf = URI.create("s3://bucket/key");
try (Storage storage = StorageFactory.open(bucket);
        RangeReader s3Reader = storage.openRangeReader(leaf);
        RangeReader cachedReader = CachingRangeReader.builder(s3Reader)
            // Strategy 1: Max entries (good for header/directory blocks)
            .maximumSize(1000)

            // Strategy 2: Max memory (e.g., 128MB)
            .maxSizeBytes(128L * 1024 * 1024)

            // Strategy 3: Expiration
            .expireAfterAccess(10, TimeUnit.MINUTES)

            .build()) {
    // ...
}

Disk Cache (DiskCachingRangeReader)

Best for: Large datasets, Repeated runs, Offline capability.

try (Storage storage = StorageFactory.open(bucket);
        RangeReader s3Reader = storage.openRangeReader(leaf);
        RangeReader diskCachedReader = DiskCachingRangeReader.builder(s3Reader)
            .cacheDirectory(Path.of("/mnt/fast-ssd/cache"))
            // Hard limit on disk usage (e.g., 10GB)
            .maxCacheSizeBytes(10L * 1024 * 1024 * 1024)
            .build()) {
    // ...
}

Read Optimization

Block Alignment

Cloud storage APIs (S3, GCS) often perform better (and cost less) when reading aligned blocks rather than many tiny, fragmented ranges.

// Align reads to 64KB boundaries
RangeReader alignedReader = BlockAlignedRangeReader.builder(reader)
    .blockSize(64 * 1024) 
    .build();

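Impact: If you request bytes 100-200, the reader fetches the whole enclosing block, bytes 0-65535. A subsequent request for bytes 200-300 falls inside the same block, so with a caching layer in place it is served from the memory or disk cache without another remote request.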
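The arithmetic behind the 64KB example can be sketched in a few lines. This is only an illustration of rounding a request out to block boundaries, not the BlockAlignedRangeReader implementation; the class and method names (BlockAlignment, alignedStart, alignedEndExclusive) are hypothetical.

```java
public final class BlockAlignment {

    private BlockAlignment() {}

    /** Rounds an offset down to the start of the block that contains it. */
    public static long alignedStart(long offset, int blockSize) {
        return (offset / blockSize) * blockSize;
    }

    /** Rounds the exclusive end of a request up to the next block boundary. */
    public static long alignedEndExclusive(long offset, int length, int blockSize) {
        long end = offset + length; // exclusive end of the requested range
        return ((end + blockSize - 1) / blockSize) * blockSize;
    }

    public static void main(String[] args) {
        int blockSize = 64 * 1024;
        // Bytes 100-200 inclusive correspond to offset 100, length 101.
        long start = alignedStart(100, blockSize);
        long end = alignedEndExclusive(100, 101, blockSize);
        System.out.println(start + " to " + (end - 1)); // prints "0 to 65535"
    }
}
```

A request that straddles a boundary, such as offset 65000 with length 1000, rounds out to two blocks (0 to 131071), which is one reason larger block sizes trade fewer requests for more bytes transferred.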

Provider-Specific Tuning

Amazon S3

For deep AWS SDK customization (custom HTTP client, retry policy, credentials provider), build the S3Client yourself and pass it through S3StorageProvider.open(URI, S3Client). The returned Storage borrows the client; closing the Storage does NOT close the client.

S3Client s3Client = S3Client.builder()
    .region(Region.US_WEST_2)
    .httpClient(ApacheHttpClient.builder()
        .maxConnections(50)
        .socketTimeout(Duration.ofSeconds(10))
        .build())
    .build();

URI bucket = URI.create("s3://maps/");
URI leaf = URI.create("s3://maps/planet.pmtiles");
try (Storage storage = S3StorageProvider.open(bucket, s3Client);
        RangeReader reader = storage.openRangeReader(leaf)) {
    // ...
}

HTTP / HTTPS

The connect timeout and trust-all-certificates flag are configurable via Properties:

Properties props = new Properties();
props.setProperty("storage.http.timeout-millis", "5000");
// dev/test only:
// props.setProperty("storage.http.trust-all-certificates", "true");

URI parent = URI.create("https://server.example/data/");
URI leaf = URI.create("https://server.example/data/file.bin");
try (Storage storage = StorageFactory.open(parent, props);
        RangeReader reader = storage.openRangeReader(leaf)) {
    // ...
}

For full HttpClient customization (custom proxy, executor, SSL context, request timeout per call), build the HttpClient yourself and pass it through HttpStorageProvider.open(URI, HttpClient[, HttpAuthentication]). The returned Storage borrows the client; closing the Storage does NOT close it. The Properties path (StorageFactory.open(uri, props)) instead acquires a refcounted lease from HttpClientCache, so identical configs across multiple Storage instances share one underlying client.
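As a rough sketch of the "build it yourself" path, the snippet below configures a client with the JDK's java.net.http.HttpClient builder. It assumes that the HttpClient mentioned above is the JDK class; the class name CustomHttpClientExample and the chosen settings are illustrative only. The hand-off to HttpStorageProvider.open is left as a comment so the block stays self-contained.

```java
import java.net.http.HttpClient;
import java.time.Duration;

public final class CustomHttpClientExample {

    private CustomHttpClientExample() {}

    /** Builds a JDK HttpClient with an explicit connect timeout and redirect policy. */
    public static HttpClient buildClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .version(HttpClient.Version.HTTP_2)
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = buildClient();
        // The caller owns this client. Per the text above, it would be passed to
        // HttpStorageProvider.open(parent, client), and closing the resulting
        // Storage would leave the client open for reuse.
        System.out.println(client.connectTimeout().orElseThrow());
    }
}
```

Because the Storage only borrows the client, one such instance can back several Storage objects, and its lifecycle stays with the code that created it.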

Global Properties

For environments where code changes are difficult, you can configure defaults via system properties:

| Property | Description | Default |
| --- | --- | --- |
| storage.http.timeout-millis | Global HTTP timeout (milliseconds) | 5000 |
| storage.http.trust-all-certificates | Disable SSL verification (dev only) | false |
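System properties are normally supplied on the command line with -D, but they can also be set early in startup code. The sketch below shows the programmatic variant for the documented timeout property. The class name GlobalHttpDefaults and the value 10000 are illustrative, and it is an assumption here that the property must be set before the first Storage is opened for it to take effect.

```java
public final class GlobalHttpDefaults {

    private GlobalHttpDefaults() {}

    /** Sets the documented timeout property unless it was already supplied, e.g. via -D. */
    public static void applyDefaults() {
        if (System.getProperty("storage.http.timeout-millis") == null) {
            System.setProperty("storage.http.timeout-millis", "10000");
        }
    }

    public static void main(String[] args) {
        applyDefaults();
        System.out.println(System.getProperty("storage.http.timeout-millis"));
    }
}
```

The command-line equivalent would be along the lines of `java -Dstorage.http.timeout-millis=10000 -jar app.jar`, where app.jar is a placeholder for your application.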

Stack Recommendations

For Tile Servers

A tile server needs low latency. Stack memory caching on top of disk caching.

// 1. Base S3 Reader via Storage
URI bucket = URI.create("s3://bucket/");
URI leaf = URI.create("s3://bucket/tiles.pmtiles");
try (Storage storage = StorageFactory.open(bucket);
        RangeReader base = storage.openRangeReader(leaf);

        // 2. Disk Cache (Persistent L2)
        RangeReader l2 = DiskCachingRangeReader.builder(base)
            .cacheDirectory(cacheDir) // cacheDir: a Path defined elsewhere
            .maxCacheSizeBytes(50_000_000_000L) // 50GB
            .build();

        // 3. Memory Cache (Fast L1)
        RangeReader reader = CachingRangeReader.builder(l2)
            .maximumSize(10_000) // Keep hot tiles in RAM
            .softValues()        // Let JVM reclaim memory if needed
            .build()) {
    // serve tiles from `reader`
}
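The lookup order that this stack produces can be pictured with a minimal sketch that uses plain maps as stand-ins for the two cache layers. TwoTierLookup and its methods are hypothetical and not part of the library; the real readers also deal with eviction, byte ranges, and on-disk persistence, all of which are omitted here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

public final class TwoTierLookup {

    private final Map<Long, byte[]> memory = new HashMap<>(); // stands in for the L1 memory cache
    private final Map<Long, byte[]> disk = new HashMap<>();   // stands in for the L2 disk cache
    private final LongFunction<byte[]> remote;                // stands in for the base S3 reader
    private int remoteFetches;

    public TwoTierLookup(LongFunction<byte[]> remote) {
        this.remote = remote;
    }

    /** Checks memory first, then disk, and only then goes to the remote source. */
    public byte[] read(long blockIndex) {
        byte[] block = memory.get(blockIndex);
        if (block != null) {
            return block; // L1 hit
        }
        block = disk.get(blockIndex);
        if (block == null) {
            block = remote.apply(blockIndex); // miss in both layers
            remoteFetches++;
            disk.put(blockIndex, block);
        }
        memory.put(blockIndex, block); // promote to L1 for the next read
        return block;
    }

    /** Simulates the memory layer being emptied, e.g. after a restart. */
    public void dropMemory() {
        memory.clear();
    }

    public int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        TwoTierLookup lookup = new TwoTierLookup(i -> new byte[] {(byte) i});
        lookup.read(1);
        lookup.read(1);      // served from memory
        lookup.dropMemory();
        lookup.read(1);      // served from disk, still no second remote fetch
        System.out.println(lookup.remoteFetches()); // prints 1
    }
}
```

The point of the layering is visible in the last read: even when the memory layer is empty, the persistent layer absorbs the miss, so the remote source is contacted only once per block.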

For Data Pipelines (ETL)

ETL jobs often read large chunks sequentially, so each byte is usually read only once and memory caching helps less. Optimize for throughput instead.

try (Storage storage = StorageFactory.open(bucket);
        RangeReader base = storage.openRangeReader(leaf);
        RangeReader reader = DiskCachingRangeReader.builder(base)
            .maxCacheSizeBytes(1_000_000_000L) // 1GB buffer
            .deleteOnClose()                   // Clean up after job
            .build()) {
    // ...
}