
VoltLync

A production OCPP 1.6 charging station management system with appless UPI-QR charging and automated franchisee payouts.

Services

  • Product strategy
  • Design
  • Protocol engineering (OCPP 1.6)
  • Full-stack engineering
  • Mobile (iOS + Android) engineering
  • Payments & settlements
  • DevOps & deployment

Stack

FastAPI · Python · Tortoise ORM · PostgreSQL · Redis · OCPP 1.6 (WebSocket) · Next.js · React · TypeScript · TanStack Query · Tailwind CSS · Shadcn/ui · Leaflet · Capacitor · Vite · Clerk · Razorpay · Razorpay Route · Sentry · New Relic · Docker · Nginx · AWS EC2

Outcomes

  • 7 Chargers live in production
  • 1M+ OCPP messages handled
  • 200+ UPI payments settled

The brief

VoltLync is building the intelligent-power layer for EV charging in India — their own EVSE hardware (the VOLTNOW line) plus the software that runs the chargers, takes the payment, and pays the franchisee who owns the site. The platform had to do three things most CSMS products don’t do together: speak OCPP 1.6 fluently to every charge point in the field, accept a payment from a rider who has no app and no account, and settle each session with the right franchisee the next day — with an audit trail clean enough to hand to a finance team.

VoltLync admin dashboard

The unlock: appless UPI QR charging

Every charger in the field has a printed UPI QR stuck to it. A rider with no app, no account, and no onboarding walks up, scans the QR in any UPI app, and pays whatever amount they choose. The session starts. If consumption reaches the amount paid, the platform auto-stops the charger mid-session so the bill never exceeds the budget. When the rider unplugs, the platform bills the real energy used (with GST) and refunds whatever they didn’t consume.

The plumbing behind that flow is the most interesting part of the system:

  • Razorpay webhook → payment service. qr_code.credited events land on a webhook handler that’s idempotent against the Razorpay payment ID, runs staleness checks, resolves the user (phone → VPA → UPI_GUEST fallback), and triggers a RemoteStartTransaction against the charger.
  • Budget cached in Redis at StartTransaction. When the charger reports the session has started, the service links the QR payment to the transaction and caches the budget (the paid amount) in Redis so it’s available on every MeterValues tick without a database round-trip.
  • Budget check on every MeterValues. As kWh accumulates, the handler computes energy charge + GST + Razorpay platform fee against the cached budget. If the session has consumed what the rider paid for, the service schedules a RemoteStopTransaction — the charger stops before billing goes negative.
  • Automatic refund on StopTransaction. Final billing runs process_qr_session_billing: energy_charge = kWh × rate, gst = energy_charge × gst_percent / 100, refund = amount_paid − energy_charge − gst − platform_fee. The refund fires via Razorpay with a stable idempotency key (qr_payment_{id}) so retries are safe.
  • Actual Razorpay fee tracking. Platform fee resolution has three tiers (webhook payload → live API fetch → 2% estimated fallback), so VoltLync’s books reflect the real commission Razorpay took whenever Razorpay reports it and fall back to an estimate only as a last resort.
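
The billing formula and the per-tick budget check described above can be sketched as follows. This is a minimal illustration, not the production process_qr_session_billing: the function names, the paisa rounding, the zero floor on refunds, and the sample rate and GST values are all assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

PAISA = Decimal("0.01")


def _to_paisa(amount: Decimal) -> Decimal:
    # Assumption: amounts are rounded half-up to the nearest paisa.
    return amount.quantize(PAISA, rounding=ROUND_HALF_UP)


@dataclass(frozen=True)
class QrBill:
    energy_charge: Decimal
    gst: Decimal
    platform_fee: Decimal
    refund: Decimal

    @property
    def total(self) -> Decimal:
        return self.energy_charge + self.gst + self.platform_fee


def compute_qr_bill(kwh: Decimal, rate_per_kwh: Decimal, gst_percent: Decimal,
                    platform_fee: Decimal, amount_paid: Decimal) -> QrBill:
    """energy_charge = kWh x rate, gst = energy_charge x gst_percent / 100,
    refund = amount_paid - energy_charge - gst - platform_fee."""
    energy_charge = _to_paisa(kwh * rate_per_kwh)
    gst = _to_paisa(energy_charge * gst_percent / Decimal(100))
    refund = amount_paid - energy_charge - gst - platform_fee
    # Assumption: a negative refund is clamped to zero.
    refund = max(refund, Decimal("0.00"))
    return QrBill(energy_charge, gst, platform_fee, refund)


def budget_exhausted(kwh: Decimal, rate_per_kwh: Decimal, gst_percent: Decimal,
                     platform_fee: Decimal, budget: Decimal) -> bool:
    """Run on every MeterValues tick: True means a remote stop should be scheduled."""
    bill = compute_qr_bill(kwh, rate_per_kwh, gst_percent, platform_fee, budget)
    return bill.total >= budget


# Illustrative numbers only: 2.5 kWh at Rs 18/kWh, 18% GST, Rs 2 fee, Rs 100 paid.
bill = compute_qr_bill(Decimal("2.5"), Decimal("18"), Decimal("18"),
                       Decimal("2.00"), Decimal("100.00"))
print(bill.refund)  # 44.90
```

Using Decimal rather than float keeps every intermediate amount exact, which matters when the refund has to match the ledger to the paisa.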

The rider never sees any of this. They pay, they plug in, they leave. The refund lands within minutes.

VoltLync QR code management with revenue per charger

OCPP 1.6: the protocol work that makes the rest possible

The platform speaks OCPP 1.6 over WebSocket to every charge point. The protocol assumes reliable networks and well-behaved firmware; in practice, the field offers neither. The engineering work here is less “implement the spec” and more “absorb the ways real chargers misbehave”:

Transaction state machine with resume semantics. Transactions aren’t a binary started / stopped. The state machine runs STARTED → RUNNING → SUSPENDED → RUNNING → STOPPED, with suspended_at / resumed_at / resume_count columns tracking every transition. When a charger reboots mid-session, BootNotification suspends the transaction instead of killing it. When the charger comes back with MeterValues, the transaction resumes from the last known kWh via a GetLastMeterValue DataTransfer exchange.
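
A minimal sketch of such a state machine is shown below. The status names and the suspended_at / resumed_at / resume_count fields come from the description above; the transition table, the Transaction class, and the transition helper are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class TxStatus(str, Enum):
    STARTED = "STARTED"
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    STOPPED = "STOPPED"


# Assumed transition table: STOPPED is terminal, SUSPENDED may resume or stop.
ALLOWED_TRANSITIONS = {
    TxStatus.STARTED: {TxStatus.RUNNING, TxStatus.SUSPENDED, TxStatus.STOPPED},
    TxStatus.RUNNING: {TxStatus.SUSPENDED, TxStatus.STOPPED},
    TxStatus.SUSPENDED: {TxStatus.RUNNING, TxStatus.STOPPED},
    TxStatus.STOPPED: set(),
}


@dataclass
class Transaction:
    status: TxStatus = TxStatus.STARTED
    suspended_at: Optional[datetime] = None
    resumed_at: Optional[datetime] = None
    resume_count: int = 0


def transition(tx: Transaction, new_status: TxStatus, now: datetime) -> None:
    """Apply a status change, recording suspend/resume bookkeeping."""
    if new_status not in ALLOWED_TRANSITIONS[tx.status]:
        raise ValueError(f"illegal transition {tx.status.value} -> {new_status.value}")
    if new_status is TxStatus.SUSPENDED:
        tx.suspended_at = now
    elif tx.status is TxStatus.SUSPENDED and new_status is TxStatus.RUNNING:
        tx.resumed_at = now
        tx.resume_count += 1
    tx.status = new_status


# Walk the path described above: a reboot mid-session suspends, then resumes.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
tx = Transaction()
for step in (TxStatus.RUNNING, TxStatus.SUSPENDED, TxStatus.RUNNING, TxStatus.STOPPED):
    transition(tx, step, now)
print(tx.status.value, tx.resume_count)  # STOPPED 1
```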

Stale-reconnect defense. If the chain of suspend/timeout handlers ever fails, a silent resume would overcharge the rider. A MAX_RESUME_GAP_SECONDS=900 guard at all three resume points (MeterValues auto-resume, BootNotification per-transaction, GetLastMeterValue DataTransfer) checks the gap since the most recent activity and falls through to STALE_RECONNECT finalization if the transaction is too old to safely resume.
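
The guard itself is small. The sketch below assumes a single helper shared by all three resume points and an inclusive boundary at exactly 900 seconds; both are illustrative choices, and resume_decision is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

MAX_RESUME_GAP_SECONDS = 900


def resume_decision(last_activity_at: datetime, now: datetime) -> str:
    """Return "RESUME" if the gap is small enough, else "STALE_RECONNECT"."""
    gap_seconds = (now - last_activity_at).total_seconds()
    # Assumption: a gap of exactly MAX_RESUME_GAP_SECONDS is still resumable.
    if gap_seconds <= MAX_RESUME_GAP_SECONDS:
        return "RESUME"
    return "STALE_RECONNECT"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(resume_decision(now - timedelta(minutes=5), now))   # RESUME
print(resume_decision(now - timedelta(minutes=20), now))  # STALE_RECONNECT
```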

Socket-charger grace period. AC-socket chargers don’t reliably report Charging status to the CSMS; they sit at Available while the car draws power. For Type-2/CCS chargers, a non-charging status fails the transaction immediately. Socket chargers instead get a 5-minute grace period, tracked in Redis, during which MeterValues keep the transaction alive. charger_type_service.should_use_grace_period() enforces that only Available qualifies; Faulted / Unavailable / Reserved always fail, regardless of connector type.
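
The decision rule can be sketched as a pure function. The function name mirrors the one mentioned above, but its signature and the connector-type labels ("SOCKET", "TYPE2", "CCS") are assumptions for this example; the Redis-backed timer is omitted.

```python
GRACE_PERIOD_SECONDS = 5 * 60  # the 5-minute window described above


def should_use_grace_period(connector_type: str, status: str) -> bool:
    """Only a socket charger reporting Available earns the grace period.

    Any other status (Faulted, Unavailable, Reserved, ...) fails immediately,
    regardless of connector type.
    """
    return connector_type == "SOCKET" and status == "Available"


print(should_use_grace_period("SOCKET", "Available"))  # True
print(should_use_grace_period("SOCKET", "Faulted"))    # False
print(should_use_grace_period("CCS", "Available"))     # False
```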

Disconnect-aware suspension. When a charger’s WebSocket drops, a disconnect callback suspends every active transaction on it with a 180-second timeout. If the charger reconnects, the transaction resumes. If it doesn’t, a centralized transaction_finalizer — single source of truth for stopping a transaction — runs billing, issues QR refund if applicable, and audit-logs the stop reason.
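
One way to picture the suspend-then-timeout behaviour is an in-process asyncio timer per transaction, cancelled on reconnect. This is only a sketch: the DisconnectSuspender class, the "DISCONNECT_TIMEOUT" reason string, and the in-memory timers are assumptions, and a production system may well track the deadline elsewhere (for example in Redis).

```python
import asyncio
from typing import Awaitable, Callable, Dict

DISCONNECT_TIMEOUT_SECONDS = 180


class DisconnectSuspender:
    """Starts a timeout per suspended transaction; reconnect cancels it."""

    def __init__(self, finalize: Callable[[int, str], Awaitable[None]],
                 timeout: float = DISCONNECT_TIMEOUT_SECONDS) -> None:
        self._finalize = finalize
        self._timeout = timeout
        self._timers: Dict[int, asyncio.Task] = {}

    def on_disconnect(self, transaction_id: int) -> None:
        # Must be called from inside a running event loop.
        self._timers[transaction_id] = asyncio.create_task(self._expire(transaction_id))

    def on_reconnect(self, transaction_id: int) -> bool:
        timer = self._timers.pop(transaction_id, None)
        if timer is None:
            return False  # nothing suspended, or the timeout already fired
        timer.cancel()
        return True

    async def _expire(self, transaction_id: int) -> None:
        await asyncio.sleep(self._timeout)
        self._timers.pop(transaction_id, None)
        # Hand over to the single finalizer with a hypothetical stop reason.
        await self._finalize(transaction_id, "DISCONNECT_TIMEOUT")


async def demo() -> list:
    stopped = []

    async def finalize(tx_id: int, reason: str) -> None:
        stopped.append((tx_id, reason))

    suspender = DisconnectSuspender(finalize, timeout=0.01)  # short timeout for the demo
    suspender.on_disconnect(1)
    suspender.on_disconnect(2)
    suspender.on_reconnect(1)   # transaction 1 resumes in time
    await asyncio.sleep(0.05)   # transaction 2 times out
    return stopped


print(asyncio.run(demo()))  # [(2, 'DISCONNECT_TIMEOUT')]
```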

Pathological-flap detector. A counter tracks consecutive disconnects-without-energy-progress per transaction. It is zeroed when MeterValues show real kWh advancing and trips after MAX_DISCONNECT_RESETS_WITHOUT_PROGRESS=3, so a charger stuck in a reconnect loop doesn’t accumulate suspended sessions forever.
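
A minimal sketch of that counter follows. The FlapDetector class is hypothetical, and it assumes the detector trips on the third consecutive disconnect without progress; the real threshold semantics may differ.

```python
from dataclasses import dataclass

MAX_DISCONNECT_RESETS_WITHOUT_PROGRESS = 3


@dataclass
class FlapDetector:
    last_kwh: float = 0.0
    resets_without_progress: int = 0

    def on_meter_values(self, kwh: float) -> None:
        # Real energy progress zeroes the counter.
        if kwh > self.last_kwh:
            self.last_kwh = kwh
            self.resets_without_progress = 0

    def on_disconnect(self) -> bool:
        """Return True when the transaction should be finalized instead of suspended."""
        self.resets_without_progress += 1
        return self.resets_without_progress >= MAX_DISCONNECT_RESETS_WITHOUT_PROGRESS


detector = FlapDetector()
first = detector.on_disconnect()      # 1 disconnect without progress
detector.on_meter_values(0.4)         # energy advanced, counter zeroed
trips = [detector.on_disconnect() for _ in range(3)]
print(first, trips)  # False [False, False, True]
```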

VoltLync charger management list

Connection management with a tombstone. core/connection_manager.py centralizes every charger’s live WebSocket, heartbeat monitor (120s timeout), and OCPP command dispatch. When a command is issued to a charger that’s mid-reconnect, the tombstone mechanism prevents commands landing on a stale socket.

Invalid stop-reason sanitization. One real-world firmware we saw sends "AppStop" as a StopTransaction reason — not a valid OCPP enum value, which causes strict validation to reject the entire message. A route_message override sanitizes non-standard reasons to "Other" so the transaction still closes cleanly, and the original reason is captured in audit logs.
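
The sanitization step can be sketched on the raw OCPP-J frame, before strict schema validation runs. The set of valid reasons is the OCPP 1.6 Reason enumeration; the sanitize_stop_reason function and the way it returns the original value for audit logging are assumptions, not the actual route_message override.

```python
import json
from typing import Optional, Tuple

# Reason values defined by OCPP 1.6 for StopTransaction.req.
VALID_STOP_REASONS = frozenset({
    "DeAuthorized", "EmergencyStop", "EVDisconnected", "HardReset", "Local",
    "Other", "PowerLoss", "Reboot", "Remote", "SoftReset", "UnlockCommand",
})

CALL_MESSAGE_TYPE = 2  # OCPP-J CALL frame: [2, uniqueId, action, payload]


def sanitize_stop_reason(raw_message: str) -> Tuple[str, Optional[str]]:
    """Rewrite a non-standard StopTransaction reason to "Other".

    Returns the (possibly rewritten) message and the original reason if it
    was replaced, so the caller can record it in the audit log.
    """
    frame = json.loads(raw_message)
    is_stop_call = (
        isinstance(frame, list) and len(frame) == 4
        and frame[0] == CALL_MESSAGE_TYPE and frame[2] == "StopTransaction"
        and isinstance(frame[3], dict)
    )
    if is_stop_call:
        reason = frame[3].get("reason")
        if reason is not None and reason not in VALID_STOP_REASONS:
            frame[3]["reason"] = "Other"
            return json.dumps(frame), reason
    return raw_message, None


raw = json.dumps([2, "42", "StopTransaction", {
    "transactionId": 7, "meterStop": 1200,
    "timestamp": "2024-01-01T00:00:00Z", "reason": "AppStop",
}])
clean, original_reason = sanitize_stop_reason(raw)
print(json.loads(clean)[3]["reason"], original_reason)  # Other AppStop
```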

VoltLync charger detail with firmware and QR code

Firmware OTA, end to end

The admin uploads a .bin, picks chargers, and the platform takes it from there. Behind that one-click flow:

  • Firmware files land in a Docker named volume (backend_firmware_{env}) with MD5 checksums computed at upload time and served via a static /firmware/{filename} route.
  • Triggering an update constructs an OCPP UpdateFirmware message with a signed download URL, retrieve date, retry count, and retry interval — pre-validated (charger online, no active transaction).
  • FirmwareStatusNotification messages from the charger walk the update through Downloading → Downloaded → Installing → Installed, each step timestamped in the database. On success, charger.firmware_version is updated. On failure, the error is stored and surfaced in the dashboard.
  • Bulk updates fan out across chargers with independent state tracking so one stuck charger doesn’t hold up the fleet.
  • The Docker image handles a subtle permissions problem: the firmware volume’s root directory can be seeded under a different user across rebuilds. The entrypoint runs as root, chowns the directory to app:app, then exec gosu app drops privileges — healing volumes that would otherwise break uploads in staging.
VoltLync firmware management with OTA updates
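
The trigger step in the list above can be sketched as follows. The payload fields (location, retrieveDate, retries, retryInterval) are the ones OCPP 1.6 defines for UpdateFirmware.req; the helper names, the example URL, and the retry values are hypothetical, and URL signing is left out.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def md5_checksum(data: bytes) -> str:
    """Checksum computed once at upload time and stored alongside the file."""
    return hashlib.md5(data).hexdigest()


def can_trigger_update(charger_online: bool, has_active_transaction: bool) -> bool:
    # Pre-validation described above: charger online, no active transaction.
    return charger_online and not has_active_transaction


def build_update_firmware_call(location: str, retrieve_date: datetime,
                               retries: int, retry_interval: int) -> str:
    """Build an OCPP-J CALL frame: [2, uniqueId, "UpdateFirmware", payload]."""
    payload = {
        "location": location,
        "retrieveDate": retrieve_date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "retries": retries,
        "retryInterval": retry_interval,  # seconds between download attempts
    }
    return json.dumps([2, str(uuid.uuid4()), "UpdateFirmware", payload])


if can_trigger_update(charger_online=True, has_active_transaction=False):
    frame = build_update_firmware_call(
        "https://csms.example.com/firmware/example.bin",  # placeholder URL
        datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc),
        retries=3,
        retry_interval=60,
    )
    print(json.loads(frame)[2])  # UpdateFirmware
```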

Franchisee settlement via Razorpay Route

Each charger is owned by a franchisee — a local operator who put the hardware in the ground and takes a cut of every session. When money comes in via UPI QR, VoltLync needs to disburse the franchisee’s share automatically, with the right GST treatment, after the session settles.

We chose a nodal model over direct-QR ownership: every QR code is platform-owned, so payments always land in VoltLync’s nodal Razorpay balance first. Post-session, the platform creates a Route transfer to the franchisee’s linked account. This sidesteps the operational nightmare of every franchisee managing their own Razorpay KYC before a charger can take its first payment — and lets the platform apply commission and TDS cleanly at the ledger layer.

The settlement pipeline:

  • Linked-account creation. Admin fills business details (business type, PAN, GSTIN, registered address). franchisee_onboarding_service.create_linked_account() builds the Razorpay payload with reference_id=franchisee_{id}, category utilities / electric_vehicle_charging, and the mandatory profile.addresses.registered block. Razorpay emails the franchisee a KYC invite directly.
  • Webhook-driven status machine. account.activated, account.instantly_activated, account.activated_kyc_pending, account.under_review, account.needs_clarification, account.rejected — every KYC state transition advances the franchisee status without a human in the loop.
  • Commission ledger. When transaction_finalizer closes a session, franchisee_settlement_service.process_settlement() creates a CommissionLedgerEntry with a stable idempotency key (txn_{id}). The pure-math calculate_settlement() function computes net_excl_gst = gross − refund − pg_fee − gst_collected, deducts platform commission and TDS, and leaves the franchisee payout.
  • Route transfer with idempotency. initiate_transfer calls Razorpay with X-Transfer-Idempotency set to the ledger entry’s idempotency key. Retries dedupe server-side. Entries are skipped with ON_HOLD status when the franchisee is gated (transfers_enabled=False or funds_on_hold=True).
  • Settlement reconciliation. The settlement.processed webhook walks the settlement entity’s transfers list, flips each matching ledger entry to SETTLED, and captures the per-transfer fee Razorpay actually charged — so the books are reconciled to the paisa.
  • Retry service. A 30-minute background job picks up FAILED and ON_HOLD entries (below MAX_TRANSFER_RETRIES=3) and re-runs the transfer with the same idempotency key.
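
The commission-ledger math can be sketched as a pure function. Only the net_excl_gst formula and the txn_{id} key format come from the description above; the bases on which commission and TDS are computed, the rounding, and the sample percentages are assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

PAISA = Decimal("0.01")


@dataclass(frozen=True)
class Settlement:
    idempotency_key: str
    net_excl_gst: Decimal
    commission: Decimal
    tds: Decimal
    franchisee_payout: Decimal


def calculate_settlement(transaction_id: int, gross: Decimal, refund: Decimal,
                         pg_fee: Decimal, gst_collected: Decimal,
                         commission_percent: Decimal, tds_percent: Decimal) -> Settlement:
    net_excl_gst = gross - refund - pg_fee - gst_collected
    # Assumption: platform commission is a percentage of net_excl_gst.
    commission = (net_excl_gst * commission_percent / Decimal(100)).quantize(
        PAISA, rounding=ROUND_HALF_UP)
    share = net_excl_gst - commission
    # Assumption: TDS is withheld on the franchisee share after commission.
    tds = (share * tds_percent / Decimal(100)).quantize(PAISA, rounding=ROUND_HALF_UP)
    return Settlement(
        idempotency_key=f"txn_{transaction_id}",
        net_excl_gst=net_excl_gst,
        commission=commission,
        tds=tds,
        franchisee_payout=share - tds,
    )


# Illustrative numbers only: Rs 100 paid, Rs 44.90 refunded, Rs 2 gateway fee,
# Rs 8.10 GST, 10% commission, 2% TDS.
entry = calculate_settlement(101, Decimal("100.00"), Decimal("44.90"),
                             Decimal("2.00"), Decimal("8.10"),
                             Decimal("10"), Decimal("2"))
print(entry.idempotency_key, entry.franchisee_payout)  # txn_101 39.69
```

Keeping the calculation free of I/O means the same inputs always yield the same ledger entry, which is what makes a retry with the same idempotency key safe.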

Operator-grade observability

VoltLync OCPP message log viewer

Running a live charging network means field engineers will call you and say “charger X hasn’t worked since yesterday.” You need an answer in under a minute. The platform has:

  • Full OCPP message log with correlation IDs for every request/response pair, IN/OUT direction, exportable to CSV — so any session can be replayed end-to-end.
  • Vendor-specific signal quality tracking. The firmware sends RSSI and BER via DataTransfer vendor messages; the CSMS validates the ranges (RSSI 0–31, 99=unknown), stores them in a signal_quality table, and surfaces a live color-coded badge on every charger’s detail page with 5-second auto-refresh.
  • Structured error history. StatusNotification error events land in a charger_error table with OCPP error code, vendor error code, vendor ID, and resolution state. Auto-resolved when NoError is received. 7-day error history is one click away on any charger.
VoltLync charger error history
  • Entity audit log. Every material change — charger connected/disconnected, transaction status transition, firmware update, QR code regenerated — is recorded with actor, action, trigger, and before/after state. Filterable by time range, action, and actor type.
VoltLync audit log
  • Failure-mode metrics. record_disconnect_suspended, record_disconnect_stopped, record_zero_energy_stopped, record_billing_failed, record_stale_suspended_swept — each paired with a New Relic counter and a Sentry structured event, each linked from a runbook. When a specific failure mode spikes, oncall knows exactly which playbook to open.
  • Background retry services. A 30-minute billing retry service picks up failed transactions, orphaned QR payments, and stale suspended transactions — so transient Razorpay outages or database blips self-heal without manual intervention.
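
As an illustration of the signal-quality validation in the list above, the sketch below accepts RSSI values 0 to 31 and treats 99 as unknown, as described. The function names and the badge colour thresholds are hypothetical; the actual cut-offs are not stated.

```python
from typing import Optional

RSSI_MIN, RSSI_MAX = 0, 31
RSSI_UNKNOWN = 99


def validate_rssi(value: int) -> Optional[int]:
    """Return the RSSI if valid, None if reported as unknown; raise otherwise."""
    if value == RSSI_UNKNOWN:
        return None
    if RSSI_MIN <= value <= RSSI_MAX:
        return value
    raise ValueError(f"RSSI out of range: {value}")


def badge_color(rssi: Optional[int]) -> str:
    # Hypothetical thresholds, chosen only to show the colour-coding idea.
    if rssi is None:
        return "grey"
    if rssi >= 20:
        return "green"
    if rssi >= 10:
        return "amber"
    return "red"


print(badge_color(validate_rssi(24)))  # green
print(badge_color(validate_rssi(99)))  # grey
```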

The four user surfaces

Admin console (Next.js 15 + Shadcn)

Station CRUD with geographic coordinates, charger CRUD with OCPP credentials, transaction viewer with meter-value charts (Recharts), user and wallet management, firmware dashboard with real-time update status, QR code management with revenue-per-charger analytics, and an OCPP log explorer for deep debugging. TanStack Query drives every data surface with tuned stale times and auto-refresh intervals (5s for live signal quality, 10s for firmware update status, 30s for error history).

Rider mobile app (Capacitor + React 19 + Vite)

A native iOS/Android app built on a single React 19 codebase. Leaflet map with real-time station availability, Haversine distance calculation, color-coded markers, and deep-links to Google Maps for directions. QR scanner via @capacitor/barcode-scanner with a manual-entry fallback. Live charging screen with 2–3s refresh, real-time meter values (energy, power, voltage, current), and native Razorpay via capacitor-razorpay for wallet recharge.

Public appless surface

/my-charges — a no-auth page with the station map on top and a UPI-VPA lookup on the bottom, so an appless rider can pull up their own transaction history, refund status, and energy consumed by entering the UPI ID they paid from. No account needed.

Franchisee portal

Dashboard, stations, chargers, transactions, settlements, profile, and their own QR code management. Once a franchisee’s Razorpay KYC completes, they can regenerate each platform-owned QR into a franchisee-owned one — the retroactive compliance path that lets direct-to-franchisee flow take over from the nodal model.

Engineering decisions worth calling out

Auth is two providers, not one. Clerk handles admin and registered-user JWTs with RS256 signature verification against the JWKS endpoint (strict issuer validation, rotating keys). A second UPI_GUEST provider handles appless riders — they authenticate implicitly via the Razorpay payment that kicked off their session. The same API surface serves both, so the frontend, the mobile app, and the public /my-charges page all go through one auth layer.

One transaction finalizer. Stopping a transaction is fraught — wallet billing, QR refund, meter-value finalization, audit logging, and cache cleanup all have to happen in the right order, exactly once, across at least four different triggers (BootNotification timeout, disconnect timeout, startup sweep, three resume points). Consolidating into transaction_finalizer.finalize_stopped_transaction — idempotent, single source of truth — replaced duplicated stop-and-bill logic that had lived in three separate modules.

Razorpay idempotency keys are deliberate, not defensive. Every Razorpay mutation — refund, transfer, QR regenerate — carries a stable idempotency key tied to a domain identifier (the QR payment ID, the ledger entry ID). Retries after a network blip replay the original response instead of double-refunding or double-paying.
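
The effect of a stable key can be shown with a toy in-memory gateway that replays the first response for a repeated key. FakeGateway and refund_key are hypothetical stand-ins for illustration; they do not model the Razorpay API, only the replay behaviour described above.

```python
from typing import Dict


def refund_key(qr_payment_id: int) -> str:
    # Key format taken from the refund flow: qr_payment_{id}.
    return f"qr_payment_{qr_payment_id}"


class FakeGateway:
    """Toy payment gateway that dedupes mutations by idempotency key."""

    def __init__(self) -> None:
        self._responses: Dict[str, dict] = {}
        self.refunds_issued = 0

    def refund(self, idempotency_key: str, amount_paise: int) -> dict:
        if idempotency_key in self._responses:
            # Retry after a network blip: replay the original response.
            return self._responses[idempotency_key]
        self.refunds_issued += 1
        response = {"id": f"rfnd_{self.refunds_issued}", "amount": amount_paise}
        self._responses[idempotency_key] = response
        return response


gateway = FakeGateway()
first = gateway.refund(refund_key(55), 4490)
retry = gateway.refund(refund_key(55), 4490)  # same key, no second refund
print(first == retry, gateway.refunds_issued)  # True 1
```

Because the key is derived from a domain identifier rather than generated per request, any caller that retries the same refund, including a background job running much later, arrives at the same key.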

Database schema designed around the audit story, not the happy path. suspended_at, resumed_at, resume_count, energy_charge, gst_amount, total_billed are all first-class columns on transaction. fee_source on qr_payment records whether the platform fee came from the webhook, an API fetch, or an estimate. Every webhook event lands in webhook_event with the raw payload for replay.

Deployed on AWS EC2 with Docker Compose. Backend, frontend, nginx, Redis, and Postgres on a single box per environment (staging and production are separate EC2s, same compose topology). Non-root containers via gosu. Razorpay live keys in production — staging uses the same live keys because QR payments require live mode to actually move money, and the webhook handler skips “not found” transactions gracefully so cross-environment webhook events don’t raise errors.

How we approached it

  • Protocol-first, product-second. The OCPP message handlers and the transaction state machine were built against the 1.6 specification before any payment logic was wired in. That meant when appless UPI QR was later layered on top, it slotted into well-defined hooks at StartTransaction / MeterValues / StopTransaction — not into a pile of ad-hoc charging logic.
  • Real chargers, from day one. Every edge case in the CSMS — the sanitized stop reasons, the socket-charger grace period, the boot-notification transaction suspension, the pathological flap detector — came from field behaviour, not theory. The simulators in backend/simulators/ exercise the same message paths, but the production failure modes drove the design.
  • Nodal first, franchisee-owned later. Launching the QR flow as platform-owned meant chargers could take their first payment without any franchisee’s KYC blocking go-live. The “regenerate QR to franchisee-owned” path exists as a clean retroactive upgrade, not as the critical path.
  • Observability is a feature, not a layer. Sentry + New Relic + structured audit log + OCPP log explorer weren’t bolted on at the end — the @trace_transaction decorator, the failure-mode metric helpers, and the audit-event writes are in the same services as the business logic.

The team

From MakaraTech: a delivery manager, an architect, a senior engineer, and two engineers.
