sdrobov/chain-index

Chain Index

Chain Index is a focused blockchain data product for teams that need reliable, queryable ERC-20 transfer data without building and operating a full custom indexing stack from scratch.

At a product level, it turns raw on-chain activity into an operational data service: transfers are ingested from an Ethereum-compatible RPC endpoint, normalized, stored in PostgreSQL, exposed over HTTP, and optionally published to Kafka for downstream consumers. At an engineering level, it is a small Go service built around finalized-block indexing, idempotent persistence, reorg handling, and production-friendly observability.

Why this exists

Direct RPC access is a poor fit for most product and data workflows. It is expensive to query repeatedly, difficult to paginate consistently, and awkward to connect to internal systems such as analytics pipelines, customer operations tooling, or risk engines.

Chain Index closes that gap by providing a compact indexing layer for a defined set of ERC-20 tokens.

It is useful when you need to:

  • power wallet, token, or treasury activity views in a product UI
  • support compliance, finance, or operations teams with searchable transfer history
  • feed internal event-driven systems from blockchain activity via Kafka
  • backfill historical token flows and then keep them updated in live mode
  • avoid coupling product features directly to RPC latency and rate limits

What the product does

For a configured list of token contract addresses, Chain Index:

  • reads finalized blocks from an Ethereum-compatible chain
  • fetches ERC-20 Transfer logs for those tokens
  • stores indexed blocks and transfer records in PostgreSQL
  • exposes transfer history and indexing stats over HTTP
  • emits transfer events to either an in-memory broker or Kafka

This makes it suitable both as a standalone indexing service and as a reusable data plane component inside a larger platform.
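
To make the log-fetching step concrete, here is a minimal sketch of decoding one ERC-20 Transfer log into a normalized record. The `RawLog` and `Transfer` types and the `DecodeTransfer`, `pad`, and `word` helpers are hypothetical names for illustration and are not taken from this repository. The topic hash is the standard keccak256 of `Transfer(address,address,uint256)`.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
	"strings"
)

// transferTopic is keccak256("Transfer(address,address,uint256)").
const transferTopic = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

// RawLog is a hypothetical, simplified view of one eth_getLogs entry.
type RawLog struct {
	Address string   // emitting contract, i.e. the token
	Topics  []string // topic0 is the event signature, then indexed arguments
	Data    string   // ABI-encoded non-indexed arguments
}

// Transfer is a hypothetical normalized record.
type Transfer struct {
	Token, From, To string
	Value           string // raw token units as a base-10 string
}

// topicToAddress extracts the 20-byte address from a 32-byte topic.
func topicToAddress(topic string) (string, error) {
	h := strings.TrimPrefix(topic, "0x")
	if len(h) != 64 {
		return "", errors.New("topic must be 32 bytes of hex")
	}
	return "0x" + strings.ToLower(h[24:]), nil
}

// DecodeTransfer accepts only logs with exactly three topics. ERC-721 uses
// the same signature hash but indexes tokenId as a fourth topic.
func DecodeTransfer(l RawLog) (Transfer, error) {
	if len(l.Topics) != 3 || !strings.EqualFold(l.Topics[0], transferTopic) {
		return Transfer{}, errors.New("not an ERC-20 Transfer log")
	}
	from, err := topicToAddress(l.Topics[1])
	if err != nil {
		return Transfer{}, err
	}
	to, err := topicToAddress(l.Topics[2])
	if err != nil {
		return Transfer{}, err
	}
	value, ok := new(big.Int).SetString(strings.TrimPrefix(l.Data, "0x"), 16)
	if !ok {
		return Transfer{}, errors.New("data is not a hex-encoded uint256")
	}
	return Transfer{Token: strings.ToLower(l.Address), From: from, To: to, Value: value.String()}, nil
}

// pad left-pads a hex address to a 32-byte topic (demo helper).
func pad(addr string) string {
	h := strings.TrimPrefix(addr, "0x")
	return "0x" + strings.Repeat("0", 64-len(h)) + h
}

// word encodes n as a 32-byte hex word (demo helper).
func word(n uint64) string { return fmt.Sprintf("0x%064x", n) }

func main() {
	l := RawLog{
		Address: "0xdac17f958d2ee523a2206206994597c13d831ec7",
		Topics: []string{
			transferTopic,
			pad("0x" + strings.Repeat("a", 40)),
			pad("0x" + strings.Repeat("b", 40)),
		},
		Data: word(1000000),
	}
	t, err := DecodeTransfer(l)
	fmt.Println(t, err)
}
```

Keeping `Value` as a decimal string avoids any assumption about the token's decimals, which matches the storage choice described later in this README.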

Product characteristics

  • Focused scope: indexes ERC-20 Transfer events for explicitly configured token addresses, not arbitrary contract events
  • Operationally simple: one Go binary with PostgreSQL, optional Kafka, and Prometheus-compatible metrics
  • Useful for both history and realtime: supports backfill mode and continuous live indexing
  • Designed for internal product teams: easy to query, easy to deploy, easy to integrate into downstream systems
  • Built for correctness over novelty: finalized-block strategy, reorg detection, and idempotent writes reduce data drift

How it works

The service follows a ports-and-adapters architecture.

  1. The indexer reads configuration from environment variables.
  2. On startup it opens PostgreSQL, applies SQL migrations automatically, and connects to the RPC endpoint.
  3. It determines the latest finalized block. If the chain does not expose finalized or safe tags, it falls back to latest minus a configurable confirmation depth.
  4. It resumes from the last stored block or from START_BLOCK for a fresh deployment.
  5. It fetches block headers in batches, validates parent-child continuity, and detects reorgs.
  6. For each batch, it fetches ERC-20 Transfer logs for the configured tokens.
  7. It upserts indexed block metadata and transfer records into PostgreSQL.
  8. It publishes each transfer to the configured broker.
  9. It updates metrics and serves query traffic over HTTP.
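
Steps 3 to 5 reduce to a small amount of arithmetic. The sketch below shows one way to pick the indexing head, choose the resume point, and split the range into batches. The function names (`FinalizedHead`, `ResumeFrom`, `Batches`) and the convention that a nil pointer means "not available" are assumptions made for this example, not the repository's actual code.

```go
package main

import "fmt"

// FinalizedHead returns the highest block considered safe to index.
// finalized is nil when the node does not expose finalized or safe tags;
// in that case the head is latest minus the confirmation depth.
func FinalizedHead(finalized *uint64, latest, confirmations uint64) uint64 {
	if finalized != nil {
		return *finalized
	}
	if latest < confirmations {
		return 0 // chain is shorter than the confirmation depth
	}
	return latest - confirmations
}

// ResumeFrom returns the next block to index: one past the last stored
// block, or startBlock on a fresh deployment (lastStored is nil).
func ResumeFrom(lastStored *uint64, startBlock uint64) uint64 {
	if lastStored != nil {
		return *lastStored + 1
	}
	return startBlock
}

// Range is an inclusive block range.
type Range struct{ From, To uint64 }

// Batches splits [from, to] into inclusive ranges of at most size blocks.
func Batches(from, to, size uint64) []Range {
	if size == 0 {
		size = 1
	}
	var out []Range
	for start := from; start <= to; {
		end := start + size - 1
		if end > to || end < start { // clamp; also guards uint64 overflow
			end = to
		}
		out = append(out, Range{From: start, To: end})
		if end == to {
			break
		}
		start = end + 1
	}
	return out
}

func main() {
	last := uint64(41)
	head := FinalizedHead(nil, 100, 12)
	fmt.Println(head) // 88
	fmt.Println(Batches(ResumeFrom(&last, 10), head, 20))
}
```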

Reliability and data integrity

Chain Index is intentionally opinionated about correctness.

  • Finalized-first indexing: the reader prioritizes finalized or safe block tags before using a confirmation-based fallback
  • Reorg-aware processing: if a stored block hash no longer matches observed chain history, the service deletes affected rows from the divergence point and replays from there
  • Idempotent storage: blocks and transfers are written with upsert semantics, so restarts and retries do not create duplicates
  • Retry-aware runtime: transient RPC quota and timeout failures are retried with backoff by the indexer supervisor
  • Explicit readiness: readiness checks verify both PostgreSQL reachability and RPC access

This makes the service suitable for production workloads where downstream users care about stable, replayable historical data.
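
The reorg handling described above can be illustrated with an in-memory model. This is a sketch under stated assumptions: stored blocks are modeled as a map from block number to hash, and `Header`, `CheckContinuity`, `DivergencePoint`, and `Rollback` are hypothetical names. The real service works against PostgreSQL rows rather than a map.

```go
package main

import "fmt"

// Header is a hypothetical, minimal block header.
type Header struct {
	Number     uint64
	Hash       string
	ParentHash string
}

// CheckContinuity verifies that headers are consecutive and that each
// header's ParentHash equals the previous header's Hash.
func CheckContinuity(hs []Header) error {
	for i := 1; i < len(hs); i++ {
		prev, cur := hs[i-1], hs[i]
		if cur.Number != prev.Number+1 || cur.ParentHash != prev.Hash {
			return fmt.Errorf("broken chain at block %d", cur.Number)
		}
	}
	return nil
}

// DivergencePoint returns the lowest observed block number whose stored
// hash differs from the observed hash. observed must be in ascending order.
func DivergencePoint(stored map[uint64]string, observed []Header) (uint64, bool) {
	for _, h := range observed {
		if hash, ok := stored[h.Number]; ok && hash != h.Hash {
			return h.Number, true
		}
	}
	return 0, false
}

// Rollback drops every stored block at or above from, so that indexing
// can replay from the divergence point.
func Rollback(stored map[uint64]string, from uint64) {
	for n := range stored {
		if n >= from {
			delete(stored, n)
		}
	}
}

func main() {
	stored := map[uint64]string{1: "0xa", 2: "0xb", 3: "0xc"}
	observed := []Header{
		{Number: 1, Hash: "0xa", ParentHash: "0x0"},
		{Number: 2, Hash: "0xb2", ParentHash: "0xa"},
		{Number: 3, Hash: "0xc2", ParentHash: "0xb2"},
	}
	if err := CheckContinuity(observed); err != nil {
		fmt.Println("continuity error:", err)
		return
	}
	if n, ok := DivergencePoint(stored, observed); ok {
		Rollback(stored, n)
		fmt.Println("replaying from block", n, "with", len(stored), "block(s) kept")
	}
}
```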

Architecture

Core runtime components:

  • Indexer service: orchestrates batching, finality tracking, reorg handling, persistence, and broker publishing
  • Ethereum RPC adapter: fetches block headers and transfer logs from an Ethereum-compatible node
  • PostgreSQL store: persists indexed blocks and transfers and serves query workloads
  • HTTP API adapter: exposes health, readiness, metrics, stats, and transfer search endpoints
  • Broker adapter: publishes transfer payloads to Kafka or stores them in memory for local development and tests
  • Observability package: exposes Prometheus metrics for RPC latency, RPC failures, and indexing lag

Data model

The product stores two core entities.

Indexed blocks:

  • block number
  • block hash
  • parent hash
  • block timestamp
  • created and updated timestamps

Transfers:

  • block number and block hash
  • transaction hash and log index
  • from and to addresses
  • token address
  • raw transfer value as a string
  • block timestamp
  • created and updated timestamps

Database indexes support common query patterns on sender, recipient, token, and reverse chronological block traversal.
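
The two entities map naturally onto Go structs. The sketch below mirrors the fields listed above and imitates the address lookup in memory. Field names, the `ForAddress` helper, and the rule that an address matches either sender or recipient are assumptions for illustration; the actual schema and query live in the repository's migrations and store code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// IndexedBlock mirrors the block fields listed in this README.
type IndexedBlock struct {
	Number     uint64
	Hash       string
	ParentHash string
	Timestamp  time.Time
	CreatedAt  time.Time
	UpdatedAt  time.Time
}

// Transfer mirrors the transfer fields listed in this README.
type Transfer struct {
	BlockNumber    uint64
	BlockHash      string
	TxHash         string
	LogIndex       uint
	From, To       string
	Token          string
	Value          string // raw value kept as a string to preserve precision
	BlockTimestamp time.Time
	CreatedAt      time.Time
	UpdatedAt      time.Time
}

// ForAddress returns transfers sent or received by addr, newest block
// first, which is the traversal order the database indexes are said to support.
func ForAddress(ts []Transfer, addr string) []Transfer {
	var out []Transfer
	for _, t := range ts {
		if strings.EqualFold(t.From, addr) || strings.EqualFold(t.To, addr) {
			out = append(out, t)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].BlockNumber != out[j].BlockNumber {
			return out[i].BlockNumber > out[j].BlockNumber
		}
		return out[i].LogIndex > out[j].LogIndex
	})
	return out
}

func main() {
	ts := []Transfer{
		{BlockNumber: 10, LogIndex: 0, From: "0xA", To: "0xB", Token: "0xT", Value: "5"},
		{BlockNumber: 12, LogIndex: 3, From: "0xC", To: "0xa", Token: "0xT", Value: "7"},
		{BlockNumber: 12, LogIndex: 1, From: "0xB", To: "0xC", Token: "0xT", Value: "9"},
	}
	for _, t := range ForAddress(ts, "0xa") {
		fmt.Println(t.BlockNumber, t.LogIndex, t.Value)
	}
}
```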

API

Health and readiness

  • GET /healthz returns a simple process health response
  • GET /readyz verifies PostgreSQL and RPC connectivity
  • GET /metrics exposes Prometheus metrics

Product and operational queries

  • GET /stats returns aggregate indexing statistics
  • GET /transfers returns transfer history filtered by wallet address and/or token address

Examples:

curl "http://localhost:8080/transfers?address=0xabc...&limit=50&offset=0"
curl "http://localhost:8080/transfers?token=0xdac17f958d2ee523a2206206994597c13d831ec7&limit=100"
curl "http://localhost:8080/stats"

The transfers endpoint requires at least one of:

  • address
  • token

Supported pagination parameters:

  • limit, default 100, max 500
  • offset, default 0
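
The parameter rules above can be expressed as a small validation function. This is a sketch, not the service's handler: `TransferQuery` and `ParseTransferQuery` are hypothetical names, and clamping an oversized `limit` to 500 (rather than rejecting it) is an assumption, since this README does not say which behavior the API uses.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
)

const (
	defaultLimit = 100
	maxLimit     = 500
)

// TransferQuery is a hypothetical parsed form of the /transfers query string.
type TransferQuery struct {
	Address, Token string
	Limit, Offset  int
}

// ParseTransferQuery validates a raw query string such as
// "token=0x...&limit=50" against the documented rules.
func ParseTransferQuery(rawQuery string) (TransferQuery, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return TransferQuery{}, err
	}
	tq := TransferQuery{Address: q.Get("address"), Token: q.Get("token"), Limit: defaultLimit}
	if tq.Address == "" && tq.Token == "" {
		return TransferQuery{}, errors.New("at least one of address or token is required")
	}
	if s := q.Get("limit"); s != "" {
		n, err := strconv.Atoi(s)
		if err != nil || n < 1 {
			return TransferQuery{}, errors.New("limit must be a positive integer")
		}
		if n > maxLimit {
			n = maxLimit // assumption: clamp instead of returning an error
		}
		tq.Limit = n
	}
	if s := q.Get("offset"); s != "" {
		n, err := strconv.Atoi(s)
		if err != nil || n < 0 {
			return TransferQuery{}, errors.New("offset must be a non-negative integer")
		}
		tq.Offset = n
	}
	return tq, nil
}

func main() {
	tq, err := ParseTransferQuery("token=0xdac17f958d2ee523a2206206994597c13d831ec7&limit=1000")
	fmt.Println(tq.Limit, tq.Offset, err) // 500 0 <nil>
}
```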

Broker output

Each indexed transfer can be published as JSON.

  • memory broker: useful for tests and local-only runs
  • Kafka broker: useful when transfer activity should feed data pipelines, alerting, enrichment, settlement, or product automations

Kafka messages use the transaction hash as the message key and the full transfer payload as the message value.
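
A minimal sketch of that message shape, using only the standard library, is shown below. The JSON field names, the `TransferEvent` and `Message` types, and the `MemoryBroker` are assumptions for illustration; the actual payload schema is defined by the repository's broker adapter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TransferEvent is a hypothetical payload; the field names are assumed.
type TransferEvent struct {
	TxHash      string `json:"tx_hash"`
	LogIndex    uint   `json:"log_index"`
	BlockNumber uint64 `json:"block_number"`
	From        string `json:"from"`
	To          string `json:"to"`
	Token       string `json:"token"`
	Value       string `json:"value"`
}

// Message is a broker-agnostic key/value pair.
type Message struct {
	Key   []byte
	Value []byte
}

// ToMessage keys the message by transaction hash and uses the JSON-encoded
// transfer as the value.
func ToMessage(e TransferEvent) (Message, error) {
	v, err := json.Marshal(e)
	if err != nil {
		return Message{}, err
	}
	return Message{Key: []byte(e.TxHash), Value: v}, nil
}

// MemoryBroker keeps published messages in a slice, which is enough for
// tests and local-only runs.
type MemoryBroker struct {
	Messages []Message
}

// Publish encodes the event and appends it to the in-memory log.
func (b *MemoryBroker) Publish(e TransferEvent) error {
	m, err := ToMessage(e)
	if err != nil {
		return err
	}
	b.Messages = append(b.Messages, m)
	return nil
}

func main() {
	b := &MemoryBroker{}
	err := b.Publish(TransferEvent{
		TxHash: "0x01", LogIndex: 3, BlockNumber: 100,
		From: "0xaa", To: "0xbb", Token: "0xcc", Value: "1000000",
	})
	if err != nil {
		fmt.Println("publish failed:", err)
		return
	}
	fmt.Printf("key=%s value=%s\n", b.Messages[0].Key, b.Messages[0].Value)
}
```

With a key-hashing partitioner, keying by transaction hash sends all transfers from the same transaction to the same partition, which keeps their relative order for consumers.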

Metrics

Prometheus metrics include:

  • chain_index_rpc_latency_seconds
  • chain_index_rpc_failures_total
  • chain_index_indexing_lag_blocks
  • chain_index_indexing_lag_seconds

These cover the two most important operational questions:

  • is the RPC dependency healthy enough to sustain indexing?
  • how far behind realtime is the indexer?
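
One plausible way to derive the two lag gauges is sketched below. The definitions used here (blocks between the finalized head and the last indexed block, and wall-clock time since the last indexed block's timestamp) are assumptions about what the metrics measure, and `LagBlocks` and `LagSeconds` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"time"
)

// LagBlocks is the number of finalized blocks not yet indexed.
func LagBlocks(finalizedHead, lastIndexed uint64) uint64 {
	if lastIndexed >= finalizedHead {
		return 0
	}
	return finalizedHead - lastIndexed
}

// LagSeconds is the age of the most recently indexed block.
func LagSeconds(now, lastIndexedBlockTime time.Time) float64 {
	d := now.Sub(lastIndexedBlockTime).Seconds()
	if d < 0 {
		return 0 // tolerate clock skew between node and indexer
	}
	return d
}

func main() {
	now := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	blockTime := now.Add(-36 * time.Second)
	fmt.Println(LagBlocks(1000, 997), LagSeconds(now, blockTime)) // 3 36
}
```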

Running locally

Prerequisites

  • Go 1.25+
  • Docker and Docker Compose
  • access to an Ethereum-compatible RPC endpoint

Option 1: run the full local stack with Docker Compose

  1. Copy the environment template.
  2. Fill in at least RPC_URL and TOKEN_ADDRESSES.
  3. Start the stack.

cp .env.example .env
docker compose up --build

The compose setup includes:

  • PostgreSQL
  • Redpanda acting as a Kafka-compatible broker
  • Redpanda Console
  • the Chain Index application

By default the app is exposed on port 8080 and the Redpanda Console on port 8081.

Option 2: run the binary directly

Start local dependencies first, then run:

go run ./cmd/indexer

Configuration

Environment variables:

| Variable | Required | Description |
| --- | --- | --- |
| DATABASE_URL | Yes | PostgreSQL connection string |
| RPC_URL | Yes | Ethereum-compatible RPC endpoint |
| HTTP_ADDR | No | HTTP listen address, default :8080 |
| LOG_LEVEL | No | debug, info, warn, or error |
| LOG_FORMAT | No | json or text |
| MODE | Yes | backfill or live |
| START_BLOCK | Yes | First block to index on a fresh deployment |
| END_BLOCK | No | Optional upper bound for backfill runs |
| BATCH_SIZE | No | Number of blocks processed per batch |
| TOKEN_ADDRESSES | Yes | Comma-separated ERC-20 token contract addresses |
| POLL_INTERVAL | No | How often live mode checks for new finalized blocks |
| RPC_TIMEOUT | No | Timeout per RPC request |
| RPC_RETRIES | No | Retry attempts for RPC calls |
| RPC_BACKOFF | No | Initial RPC retry backoff |
| RPC_FINALITY_FALLBACK_CONFIRMATIONS | No | Confirmation depth used if finalized or safe tags are unavailable |
| BROKER_KIND | Yes | memory or kafka |
| KAFKA_BROKERS | No | Comma-separated Kafka broker list |
| KAFKA_TOPIC | No | Kafka topic for transfer events |
| APP_PORT | Compose only | Host port mapping for the app container |
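
For readers who want to see how the required variables might be validated at startup, here is a minimal sketch. The `Config` struct and `Load` function are hypothetical and cover only a subset of the variables; the lookup function is injected so the example can run without touching the real environment.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Config is a hypothetical subset of the settings listed above.
type Config struct {
	DatabaseURL    string
	RPCURL         string
	HTTPAddr       string
	Mode           string
	BrokerKind     string
	StartBlock     uint64
	TokenAddresses []string
}

// Load reads configuration through lookup, which has the same signature
// as os.LookupEnv, and fails fast on missing or invalid required values.
func Load(lookup func(string) (string, bool)) (Config, error) {
	var c Config
	var startBlock, tokens string
	required := []struct {
		key string
		dst *string
	}{
		{"DATABASE_URL", &c.DatabaseURL},
		{"RPC_URL", &c.RPCURL},
		{"MODE", &c.Mode},
		{"START_BLOCK", &startBlock},
		{"TOKEN_ADDRESSES", &tokens},
		{"BROKER_KIND", &c.BrokerKind},
	}
	for _, r := range required {
		v, ok := lookup(r.key)
		if v = strings.TrimSpace(v); !ok || v == "" {
			return Config{}, fmt.Errorf("%s is required", r.key)
		}
		*r.dst = v
	}
	if c.Mode != "backfill" && c.Mode != "live" {
		return Config{}, fmt.Errorf("MODE must be backfill or live, got %q", c.Mode)
	}
	if c.BrokerKind != "memory" && c.BrokerKind != "kafka" {
		return Config{}, fmt.Errorf("BROKER_KIND must be memory or kafka, got %q", c.BrokerKind)
	}
	n, err := strconv.ParseUint(startBlock, 10, 64)
	if err != nil {
		return Config{}, fmt.Errorf("START_BLOCK: %w", err)
	}
	c.StartBlock = n
	for _, a := range strings.Split(tokens, ",") {
		if a = strings.TrimSpace(a); a != "" {
			c.TokenAddresses = append(c.TokenAddresses, strings.ToLower(a))
		}
	}
	if len(c.TokenAddresses) == 0 {
		return Config{}, fmt.Errorf("TOKEN_ADDRESSES must list at least one address")
	}
	c.HTTPAddr = ":8080" // documented default
	if v, ok := lookup("HTTP_ADDR"); ok && v != "" {
		c.HTTPAddr = v
	}
	return c, nil
}

func main() {
	// Demo values only; the URLs below are placeholders.
	env := map[string]string{
		"DATABASE_URL":    "postgres://user:pass@localhost:5432/chain_index",
		"RPC_URL":         "http://localhost:8545",
		"MODE":            "live",
		"START_BLOCK":     "19000000",
		"TOKEN_ADDRESSES": "0xdac17f958d2ee523a2206206994597c13d831ec7",
		"BROKER_KIND":     "memory",
	}
	cfg, err := Load(func(k string) (string, bool) { v, ok := env[k]; return v, ok })
	fmt.Printf("%+v %v\n", cfg, err)
}
```

In a real process, `os.LookupEnv` can be passed to `Load` directly because it has the same signature.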

Recommended deployment patterns

Typical ways product and platform teams use Chain Index:

  • As an internal data service behind dashboards or support tooling
  • As a transfer event source for stream-processing systems via Kafka
  • As a backfill worker that seeds PostgreSQL before a product launch
  • As a lightweight chain-ingestion component inside a broader fintech or web3 platform

Tradeoffs and current scope

This repository is deliberately narrow.

  • It indexes only ERC-20 Transfer events
  • Token coverage is allowlist-based through configuration
  • The query API is optimized for operational lookups; it is not a replacement for broad analytical SQL
  • It stores values as strings to preserve on-chain precision and avoid implicit decimal assumptions

That narrow scope is intentional: for many teams, a dependable and understandable indexing service is more valuable than a generic but harder-to-operate indexing platform.

Engineering notes

  • Written in Go with a small runtime footprint
  • Applies PostgreSQL migrations on startup via Goose
  • Ships as a single container image based on a distroless runtime
  • Emits logs as JSON or text, selected with LOG_FORMAT
  • Keeps infrastructure optional: Kafka can be switched off in favor of the in-memory broker for basic local flows

Who this is for

Chain Index is a good fit for:

  • product managers validating token-driven features and needing fast access to transfer history
  • backend engineers who need a dependable blockchain ingestion layer
  • data and analytics teams that want a normalized operational dataset
  • platform teams that need Kafka-ready blockchain events without maintaining a larger indexing platform

If you need a small, production-ready ERC-20 transfer indexing service that is understandable by both engineers and business stakeholders, this repository is built for that use case.
