Configuration¶
Optimize the Tileverse Range Reader for your specific use case with proper configuration.
Dependency Management¶
Using BOMs (Recommended)¶
The Tileverse provides Bills of Materials (BOMs) to simplify dependency management:
TileVerse BOM¶
Use this BOM in your application to manage Range Reader module versions:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>io.tileverse</groupId>
<artifactId>tileverse-bom</artifactId>
<version>1.0-SNAPSHOT</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<!-- Versions managed by the BOM -->
<dependency>
<groupId>io.tileverse.rangereader</groupId>
<artifactId>tileverse-rangereader-core</artifactId>
</dependency>
<dependency>
<groupId>io.tileverse.rangereader</groupId>
<artifactId>tileverse-rangereader-s3</artifactId>
</dependency>
</dependencies>
Dependencies BOM (For Library Developers)¶
If you're developing libraries that extend Range Reader or need specific third-party dependency versions:
<dependencyManagement>
<dependencies>
<!-- Import TileVerse dependencies BOM first -->
<dependency>
<groupId>io.tileverse</groupId>
<artifactId>tileverse-dependencies</artifactId>
<version>1.0-SNAPSHOT</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<!-- Then import TileVerse modules BOM -->
<dependency>
<groupId>io.tileverse</groupId>
<artifactId>tileverse-bom</artifactId>
<version>1.0-SNAPSHOT</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
This approach ensures: - Consistent dependency versions across all modules - Reduced version conflicts and dependency hell - Simplified dependency declaration (no version numbers needed) - Easier upgrades (change only the BOM version)
Performance Configuration¶
Caching Strategies¶
Memory Caching¶
Configure in-memory caching for frequently accessed ranges:
// Entry-based sizing
var reader = CachingRangeReader.builder(delegate)
.maximumSize(1000) // Limit number of cached ranges
.build();
// Memory-based sizing
var reader = CachingRangeReader.builder(delegate)
.maxSizeBytes(64 * 1024 * 1024) // 64MB memory limit
.build();
// Time-based expiration
var reader = CachingRangeReader.builder(delegate)
.maximumSize(500)
.expireAfterAccess(30, TimeUnit.MINUTES)
.build();
// Adaptive sizing (uses soft references)
var reader = CachingRangeReader.builder(delegate)
.softValues() // Let GC manage memory pressure
.build();
Disk Caching¶
Configure persistent disk caching for large datasets:
// Basic disk caching
var reader = DiskCachingRangeReader.builder(delegate)
.cacheDirectory("/tmp/rangereader-cache")
.maxCacheSizeBytes(10L * 1024 * 1024 * 1024) // 10GB
.build();
// Temporary caching (deleted on close)
var reader = DiskCachingRangeReader.builder(delegate)
.cacheDirectory("/tmp/temp-cache")
.deleteOnClose()
.build();
Read Optimization¶
Optimize read patterns for your specific use case:
// For random access patterns, use memory caching
var reader = CachingRangeReader.builder(delegate)
.maximumSize(1000) // Cache frequently accessed ranges
.build();
// For large sequential reads, use disk caching
var reader = DiskCachingRangeReader.builder(delegate)
.maxCacheSizeBytes(1024 * 1024 * 1024) // 1GB disk cache
.build();
Cloud Provider Configuration¶
Amazon S3¶
Basic Configuration¶
var reader = S3RangeReader.builder()
.uri(URI.create("s3://bucket/key"))
.region(Region.US_WEST_2)
.build();
Advanced S3 Configuration¶
// Custom S3 client configuration
var s3Client = S3Client.builder()
.region(Region.US_WEST_2)
.credentialsProvider(ProfileCredentialsProvider.create())
.overrideConfiguration(ClientOverrideConfiguration.builder()
.apiCallTimeout(Duration.ofSeconds(30))
.apiCallAttemptTimeout(Duration.ofSeconds(10))
.retryPolicy(RetryPolicy.builder()
.numRetries(3)
.build())
.build())
.build();
var reader = S3RangeReader.builder()
.client(s3Client)
.bucket("my-bucket")
.key("path/to/object")
.build();
S3-Compatible Storage¶
// MinIO or other S3-compatible services
var reader = S3RangeReader.builder()
.uri(URI.create("s3://bucket/key"))
.endpoint(URI.create("http://localhost:9000"))
.region(Region.US_EAST_1)
.forcePathStyle() // Required for MinIO
.credentialsProvider(StaticCredentialsProvider.create(
AwsBasicCredentials.create("minioadmin", "minioadmin")))
.build();
Azure Blob Storage¶
Connection String Configuration¶
var reader = AzureBlobRangeReader.builder()
.uri(URI.create("https://account.blob.core.windows.net/container/blob"))
.connectionString("DefaultEndpointsProtocol=https;AccountName=account;...")
.build();
SAS Token Configuration¶
var reader = AzureBlobRangeReader.builder()
.uri(URI.create("https://account.blob.core.windows.net/container/blob"))
.sasToken("sv=2020-08-04&ss=b&srt=sco&sp=r&se=...")
.build();
Google Cloud Storage¶
Default Credentials¶
var reader = GoogleCloudStorageRangeReader.builder()
.uri(URI.create("gs://bucket/object"))
.build(); // Uses Application Default Credentials
Service Account¶
var credentials = ServiceAccountCredentials.fromStream(
new FileInputStream("service-account.json"));
var reader = GoogleCloudStorageRangeReader.builder()
.uri(URI.create("gs://bucket/object"))
.credentials(credentials)
.build();
HTTP Configuration¶
Basic HTTP¶
HTTP with Authentication¶
Programmatic Configuration¶
Using the builder API directly:
// Basic authentication
var reader = HttpRangeReader.builder()
.uri(URI.create("https://example.com/data.bin"))
.basicAuth("username", "password")
.build();
// Bearer token
var reader = HttpRangeReader.builder()
.uri(URI.create("https://api.example.com/data.bin"))
.bearerToken("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...")
.build();
// API key
var reader = HttpRangeReader.builder()
.uri(URI.create("https://api.example.com/data.bin"))
.apiKey("X-API-Key", "your-api-key", null)
.build();
// API key with prefix
var reader = HttpRangeReader.builder()
.uri(URI.create("https://api.example.com/data.bin"))
.apiKey("Authorization", "your-api-key", "ApiKey ")
.build();
Property-Based Configuration¶
Using RangeReaderFactory
with properties for declarative configuration:
// Basic authentication
Properties props = new Properties();
props.setProperty("io.tileverse.rangereader.http.username", "username");
props.setProperty("io.tileverse.rangereader.http.password", "password");
var reader = RangeReaderFactory.create(
URI.create("https://example.com/data.bin"),
props);
// Bearer token authentication
Properties props = new Properties();
props.setProperty("io.tileverse.rangereader.http.bearer-token",
"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...");
var reader = RangeReaderFactory.create(
URI.create("https://api.example.com/data.bin"),
props);
// API key authentication
Properties props = new Properties();
props.setProperty("io.tileverse.rangereader.http.api-key-headername", "X-API-Key");
props.setProperty("io.tileverse.rangereader.http.api-key", "your-api-key-here");
var reader = RangeReaderFactory.create(
URI.create("https://api.example.com/data.bin"),
props);
// API key with prefix
Properties props = new Properties();
props.setProperty("io.tileverse.rangereader.http.api-key-headername", "Authorization");
props.setProperty("io.tileverse.rangereader.http.api-key", "your-api-key");
props.setProperty("io.tileverse.rangereader.http.api-key-value-prefix", "ApiKey ");
var reader = RangeReaderFactory.create(
URI.create("https://api.example.com/data.bin"),
props);
Available HTTP Configuration Parameters:
Parameter Key | Type | Description |
---|---|---|
io.tileverse.rangereader.http.timeout-millis | Integer | Connection timeout in milliseconds (default: 5000) |
io.tileverse.rangereader.http.trust-all-certificates | Boolean | Trust all SSL certificates, including self-signed (dev only) |
io.tileverse.rangereader.http.username | String | Username for HTTP Basic Authentication |
io.tileverse.rangereader.http.password | String | Password for HTTP Basic Authentication |
io.tileverse.rangereader.http.bearer-token | String | Bearer token for OAuth 2.0/JWT authentication |
io.tileverse.rangereader.http.api-key-headername | String | HTTP header name for API key (e.g., "X-API-Key") |
io.tileverse.rangereader.http.api-key | String | API key value |
io.tileverse.rangereader.http.api-key-value-prefix | String | Optional prefix for API key value (e.g., "ApiKey ") |
Environment Variables¶
HTTP configuration can also be provided via environment variables (useful for containerized deployments):
# Connection settings
export IO_TILEVERSE_RANGEREADER_HTTP_TIMEOUT_MILLIS=10000
export IO_TILEVERSE_RANGEREADER_HTTP_TRUST_ALL_CERTIFICATES=false
# Basic authentication
export IO_TILEVERSE_RANGEREADER_HTTP_USERNAME=myuser
export IO_TILEVERSE_RANGEREADER_HTTP_PASSWORD=mypassword
# Bearer token authentication
export IO_TILEVERSE_RANGEREADER_HTTP_BEARER_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
# API key authentication
export IO_TILEVERSE_RANGEREADER_HTTP_API_KEY_HEADERNAME=X-API-Key
export IO_TILEVERSE_RANGEREADER_HTTP_API_KEY=abc123xyz
HTTP Client Tuning¶
// For high-throughput scenarios
var reader = HttpRangeReader.builder()
.uri(URI.create("https://example.com/data.bin"))
.connectionTimeout(Duration.ofSeconds(10))
.build();
// Property-based configuration
Properties props = new Properties();
props.setProperty("io.tileverse.rangereader.http.timeout-millis", "10000");
var reader = RangeReaderFactory.create(
URI.create("https://example.com/data.bin"),
props);
Optimal Configurations by Use Case¶
High-Frequency Random Access¶
Optimize for applications that make many small, random reads:
// ✅ CORRECT: Optimize cloud storage for high-frequency random access
var reader = CachingRangeReader.builder(
S3RangeReader.builder()
.uri(URI.create("s3://bucket/data.bin"))
.region(Region.US_WEST_2)
.build())
.maximumSize(2000) // Large memory cache
.expireAfterAccess(1, TimeUnit.HOURS)
.build();
Note: Local files don't benefit from caching since the OS already provides efficient file caching.
Large Sequential Reads¶
Optimize for applications that read large chunks sequentially:
// ✅ CORRECT: Optimize HTTP for large sequential reads
var reader = DiskCachingRangeReader.builder(
HttpRangeReader.builder()
.uri(URI.create("https://cdn.example.com/large-dataset.bin"))
.withBearerToken(authToken)
.build())
.maxCacheSizeBytes(5L * 1024 * 1024 * 1024) // 5GB disk cache
.build();
Cloud Storage with Resilience¶
Optimize for cloud storage with network resilience:
// ✅ CORRECT: Multi-level caching with proper decorator order
var reader = CachingRangeReader.builder(
DiskCachingRangeReader.builder(
S3RangeReader.builder()
.uri(s3Uri)
.region(region)
.build())
.maxCacheSizeBytes(20L * 1024 * 1024 * 1024) // 20GB persistent cache
.build())
.maximumSize(1000) // Memory cache entries
.expireAfterAccess(2, TimeUnit.HOURS)
.build();
Memory-Constrained Environments¶
Optimize for environments with limited memory:
// ✅ CORRECT: Memory-constrained cloud storage optimization
var reader = CachingRangeReader.builder(
DiskCachingRangeReader.builder(
AzureBlobRangeReader.builder()
.uri(URI.create("https://account.blob.core.windows.net/container/data.bin"))
.sasToken(sasToken)
.build())
.maxCacheSizeBytes(1024 * 1024 * 1024) // 1GB disk cache
.deleteOnClose() // Clean up cache
.build())
.maximumSize(50) // Small memory cache
.softValues() // Allow GC to reclaim memory
.build();
Configuration Guidelines¶
Read Strategy Selection¶
Data Source | Recommended Strategy | Rationale |
---|---|---|
Local Files | Direct reads | OS caching is already optimal |
HTTP | Chunked reads (256KB - 1MB) | Reduce request overhead |
S3 | Large reads (1MB - 8MB) | Minimize API calls |
Azure Blob | Large reads (1MB - 4MB) | Optimize for Azure's performance |
Google Cloud | Large reads (1MB - 8MB) | Balance throughput and latency |
Cache Size Guidelines¶
Use Case | Memory Cache | Disk Cache |
---|---|---|
Interactive Applications | 50-200 entries | 1-5GB |
Batch Processing | 10-50 entries | 10-50GB |
Server Applications | 500-2000 entries | 5-20GB |
Mobile/Embedded | 10-50 entries | 100MB-1GB |
⚠️ CRITICAL: Decorator Stacking Order¶
CachingRangeReader should wrap DiskCachingRangeReader for optimal performance!
Application
↓
CachingRangeReader (memory - fast access)
↓
DiskCachingRangeReader (persistent storage)
↓
BaseReader (source-specific)
↓
Data Source
Why this order matters: - Fast access: Memory cache provides immediate results for recently accessed ranges - Persistence: Disk cache survives application restarts - Optimal fallback: Missing memory cache entries can be served from disk cache
Monitoring and Diagnostics¶
Cache Statistics¶
if (reader instanceof CachingRangeReader cachingReader) {
var stats = cachingReader.getCacheStats();
System.out.println("Hit rate: " + stats.hitRate());
System.out.println("Cache size: " + stats.estimatedSize());
System.out.println("Eviction count: " + stats.evictionCount());
}
Source Identification¶
// Useful for debugging and logging
String sourceId = reader.getSourceIdentifier();
System.out.println("Reading from: " + sourceId);
// Output examples:
// "file:///path/to/file.bin"
// "memory-cached:disk-cached:block-aligned:s3://bucket/key"
Environment Variables¶
Set these environment variables for global configuration:
# AWS credentials (for S3)
export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key
export AWS_REGION=us-west-2
# Azure credentials
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;..."
# Google Cloud credentials
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# HTTP proxy settings
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=https://proxy.company.com:8080
Extending RangeReader Functionality¶
The Tileverse Range Reader library is designed to be highly extensible, allowing you to easily add support for new data sources or custom protocols beyond the built-in file://
, http(s)://
, s3://
, and gs://
options (cloud providers like Azure are automatically detected from http(s)
URLs).
This is achieved through a powerful plugin mechanism. If you have a unique storage system or a specific way of accessing data that isn't covered by the standard implementations, you can create your own "custom reader" and integrate it seamlessly with the library.
Once your custom reader is developed and properly configured, the RangeReaderFactory
will automatically discover it and use it when you request a RangeReader
for a URI that your custom reader can handle. This means you can extend the library's capabilities without modifying its core code.
For detailed technical instructions on how to develop and register a new custom reader, please refer to the Extension Architecture section in the Developer Guide.
Resource Management Patterns¶
Short-term Usage¶
For processing individual files, batch jobs, or one-time operations, use try-with-resources:
// ✅ Short-term access pattern
public ByteBuffer processFile(URI dataSource, long offset, int length) throws IOException {
try (RangeReader reader = S3RangeReader.builder()
.uri(dataSource)
.withCaching()
.build()) {
// Validate parameters
if (offset < 0 || length < 0) {
throw new IllegalArgumentException("Offset and length must be non-negative");
}
// Check bounds
if (offset >= reader.size()) {
return ByteBuffer.allocate(0); // Empty buffer for reads beyond EOF
}
// Perform read and return
ByteBuffer result = reader.readRange(offset, length);
result.flip(); // Prepare buffer for reading
return result;
} catch (IOException e) {
logger.error("Failed to read range [{}:{}] from {}: {}",
offset, length, dataSource, e.getMessage());
throw e;
}
}
Best for: - Batch processing jobs - One-time file analysis - Command-line tools - Simple applications
Long-term Usage¶
For server applications, services, or long-running processes, manage lifecycle explicitly:
// ✅ Long-term access pattern
public class TileServer implements Closeable {
private final RangeReader reader;
private final ExecutorService executor;
public TileServer(URI dataSource) throws IOException {
// Create optimized reader for server use
this.reader = CachingRangeReader.builder(
DiskCachingRangeReader.builder(
S3RangeReader.builder()
.uri(dataSource)
.region(Region.US_WEST_2)
.build())
.maxCacheSizeBytes(5L * 1024 * 1024 * 1024) // 5GB cache
.build())
.maximumSize(2000) // Large memory cache
.recordStats() // Monitor performance
.build();
this.executor = Executors.newFixedThreadPool(10);
}
public CompletableFuture<byte[]> getTileAsync(long offset, int size) {
return CompletableFuture.supplyAsync(() -> {
try {
// Reader stays open for multiple requests
ByteBuffer buffer = reader.readRange(offset, size);
byte[] result = new byte[buffer.remaining()];
buffer.get(result);
return result;
} catch (IOException e) {
throw new RuntimeException(e);
}
}, executor);
}
@Override
public void close() throws IOException {
try {
executor.shutdown();
if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
executor.shutdownNow();
}
} catch (InterruptedException e) {
executor.shutdownNow();
Thread.currentThread().interrupt();
} finally {
reader.close(); // ⚠️ Essential: close when service shuts down
}
}
}
Best for: - Web servers - Microservices - Desktop applications - Data processing services
Key Principles¶
- Short-term: Use try-with-resources for automatic cleanup
- Long-term: Explicit lifecycle management with proper shutdown hooks
- Always close: Ensure readers are closed to release resources
- Thread safety: All readers support concurrent access safely
Next Steps¶
- Authentication: Detailed authentication setup
- Troubleshooting: Common configuration issues
- Performance Guide: Advanced optimization techniques