Open-Source Tools · Python Package · PyPI

GRAX

A lightweight Python library that transforms geospatial shapefiles into machine-learning-ready graph networks using NetworkX — bridging the gap between GIS data formats and graph-based ML workflows.

Architecture: Python Package · PyPI
Tech Stack: Python · NetworkX · Fiona · Shapely · NumPy · PyPI

Shapefile → Graph

Geospatial Data to ML-Ready Networks in One Call

NetworkX

Compatible with Graph ML Frameworks and Network Analysis Tools

PyPI

Published Package — MIT Licensed, Production-Ready

The Problem

Geospatial infrastructure data lives in GIS formats — but graph ML frameworks expect graph objects, and no standard bridge between the two existed

Road networks, utility grids, pipeline systems, and other spatial infrastructure datasets are commonly stored as shapefiles, a long-standing de facto standard for GIS vector data. Shapefiles encode rich geometric and attribute information, but graph ML frameworks operate on NetworkX graphs, adjacency matrices, and node/edge feature tensors. Converting between the two requires understanding both GIS geometry parsing and graph construction conventions, involves non-trivial handling of spatial relationships, and must preserve shapefile attribute data as node and edge features for the resulting graph to be useful in ML tasks. Practitioners working at the intersection of geospatial data and graph ML have historically had to write this conversion logic from scratch for every project, an error-prone, time-consuming step that has nothing to do with the actual ML problem being solved.

The Solution

A lightweight library that parses shapefile geometry and attributes into NetworkX graph objects ready for graph ML pipelines in a single call

GRAX provides a minimal API that takes a shapefile as input and returns a NetworkX graph as output — handling all the geometry parsing, spatial relationship resolution, and attribute mapping in between. Fiona reads the shapefile geometry and attribute records, Shapely processes the geometric operations needed to establish spatial relationships between features, and NumPy handles array operations during feature extraction. The resulting NetworkX graph preserves all shapefile attributes as node and edge features, making it immediately compatible with downstream graph ML frameworks and network analysis tools. The entire conversion — from raw GIS data to a structured, feature-rich graph object — happens in a single library call with minimal dependencies and no GIS expertise required from the practitioner.

Key Outcome

A published Python package that eliminates the shapefile-to-graph conversion problem entirely — parsing geospatial infrastructure data into feature-rich NetworkX graph objects in a single call, with all shapefile attributes preserved as node and edge features, ready for immediate use in network analysis, graph ML, route optimization, and spatial risk modeling workflows.

Technical Deep Dive

Architecture & Design

Conversion Pipeline

Stage 1: Shapefile Input

Geometry (Polylines & Polygons): Spatial features encoding roads, pipelines, utility networks, and other infrastructure geometries
Attributes (Feature Properties): Per-feature attribute records to be preserved as node and edge features in the output graph

Stage 2: Geometry Parsing & Spatial Processing

Fiona (Shapefile Reading): Reads shapefile geometry records and attribute tables · Iterates features with full property access
Shapely (Geometric Operations): Processes spatial relationships between features · Resolves connectivity for graph edge construction
NumPy (Feature Processing): Array operations during attribute extraction · Numerical feature preparation for ML readiness

Stage 3: Graph Construction

Nodes (Spatial Features as Nodes): Each shapefile feature mapped to a graph node · Shapefile attributes attached as node feature dictionary
Edges (Spatial Relationships as Edges): Connectivity derived from geometric relationships · Edge attributes from spatial properties

Stage 4: ML-Ready Output

Output (NetworkX Graph Object): Feature-Rich Graph, Ready for Graph ML & Network Analysis
All shapefile attributes preserved as node and edge features · Compatible with downstream graph ML frameworks, route optimization, and spatial risk analysis tools

Stage 1

Shapefile Input

GRAX accepts any standard shapefile containing polyline or polygon geometry — the formats most commonly used to encode spatial infrastructure data such as road networks, pipeline systems, utility grids, and drainage networks. The input includes both the geometry component, which defines the spatial shape of each feature, and the attribute table, which stores the properties associated with each feature that will become node and edge features in the output graph.
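To make the two input components concrete, here is a minimal sketch of the record layout involved. Shapefile readers such as Fiona expose each feature as a GeoJSON-like mapping with a "geometry" part and a "properties" part. The sample records, the `SUPPORTED_TYPES` set, and the `split_supported` helper are hypothetical illustrations, not part of GRAX, and how GRAX treats other geometry types is not stated in this document.

```python
# Hand-written records in the GeoJSON-like layout that shapefile readers
# expose: a "geometry" mapping plus a "properties" mapping per feature.
SAMPLE_FEATURES = [
    {
        "geometry": {"type": "LineString",
                     "coordinates": [(0.0, 0.0), (1.0, 0.0)]},
        "properties": {"name": "Main St", "lanes": 2},
    },
    {
        "geometry": {"type": "Polygon",
                     "coordinates": [[(0.0, 0.0), (1.0, 0.0),
                                      (1.0, 1.0), (0.0, 0.0)]]},
        "properties": {"name": "Depot", "lanes": None},
    },
    {
        "geometry": {"type": "Point", "coordinates": (5.0, 5.0)},
        "properties": {"name": "Valve", "lanes": None},
    },
]

# Assumption for this sketch: only polyline and polygon geometries are used.
SUPPORTED_TYPES = {"LineString", "MultiLineString", "Polygon", "MultiPolygon"}


def split_supported(features):
    """Separate polyline/polygon features from everything else."""
    supported, skipped = [], []
    for feature in features:
        geometry = feature.get("geometry")
        if geometry is not None and geometry["type"] in SUPPORTED_TYPES:
            supported.append(feature)
        else:
            skipped.append(feature)
    return supported, skipped


supported, skipped = split_supported(SAMPLE_FEATURES)
print([f["properties"]["name"] for f in supported])
```

The geometry part drives graph topology in later stages, while the properties mapping is what ends up as node and edge features.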

Stage 2

Geometry Parsing & Spatial Processing

Fiona reads the shapefile and iterates over each feature, exposing its geometry and attribute record. Shapely performs the geometric operations needed to resolve spatial relationships between features — identifying which features share boundaries, intersect, or are connected — which determines graph edge construction. NumPy handles array operations during attribute extraction, preparing numerical feature values for ML-ready representation in the output graph's node and edge dictionaries.
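One simple way to resolve connectivity between polyline features is to treat two lines as connected when they share an endpoint. The sketch below shows that idea in plain Python as a stand-in for spatial predicates such as Shapely's `touches` or `intersects`. Which relationship GRAX actually uses is not specified here, so the `endpoints` and `connected_pairs` helpers and the rounding precision are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def endpoints(coords, precision=6):
    """Return the rounded first and last vertex of a polyline as a set."""
    first, last = coords[0], coords[-1]
    return {
        (round(first[0], precision), round(first[1], precision)),
        (round(last[0], precision), round(last[1], precision)),
    }


def connected_pairs(lines):
    """Return sorted index pairs of polylines sharing at least one endpoint."""
    by_point = defaultdict(set)
    for index, coords in enumerate(lines):
        for point in endpoints(coords):
            by_point[point].add(index)
    pairs = set()
    for indices in by_point.values():
        # Every pair of lines meeting at the same point is connected.
        pairs.update(combinations(sorted(indices), 2))
    return sorted(pairs)


lines = [
    [(0.0, 0.0), (1.0, 0.0)],              # 0
    [(1.0, 0.0), (1.0, 1.0), (2.0, 1.0)],  # 1: starts where 0 ends
    [(2.0, 1.0), (3.0, 1.0)],              # 2: starts where 1 ends
    [(10.0, 10.0), (11.0, 10.0)],          # 3: isolated
]
pairs = connected_pairs(lines)
print(pairs)  # [(0, 1), (1, 2)]
```

Indexing lines by endpoint avoids comparing every pair of features directly, which matters for large networks. Rounding coordinates guards against tiny floating-point mismatches between digitized endpoints.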

Stage 3

Graph Construction

Each shapefile feature is mapped to a NetworkX node, with its attribute record attached as a node feature dictionary. Edges are constructed from the spatial relationships resolved in Stage 2 — features that are geometrically connected become linked nodes in the graph, with edge attributes derived from the spatial properties of their connection. The result is a NetworkX graph that faithfully represents both the topology of the spatial network and the attributes of every feature within it.
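The mapping described above can be sketched as building node and edge records. This is a hypothetical illustration, not GRAX's implementation: the `build_graph_records` helper, the `coords` key, and the choice of a vertex-mean distance as an edge attribute are all assumptions. The records are shaped as `(node, attrs)` and `(u, v, attrs)` tuples, which is the form NetworkX's `Graph.add_nodes_from` and `Graph.add_edges_from` accept.

```python
import math


def vertex_mean(coords):
    """Average of a feature's vertices, used as a rough location."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def build_graph_records(features, pairs):
    """Turn features and connected index pairs into node and edge records."""
    # One node per feature, carrying a copy of its attribute record.
    nodes = [(i, dict(f["properties"])) for i, f in enumerate(features)]
    edges = []
    for u, v in pairs:
        a = vertex_mean(features[u]["coords"])
        b = vertex_mean(features[v]["coords"])
        # Illustrative edge attribute derived from the geometry.
        edges.append((u, v, {"distance": math.dist(a, b)}))
    return nodes, edges


features = [
    {"coords": [(0.0, 0.0), (2.0, 0.0)],
     "properties": {"material": "steel", "year": 1998}},
    {"coords": [(2.0, 0.0), (2.0, 2.0)],
     "properties": {"material": "pvc", "year": 2011}},
]
nodes, edges = build_graph_records(features, [(0, 1)])
print(nodes)
print(edges)
```

Copying each properties mapping with `dict(...)` keeps the graph's node attributes independent of the source records, so later edits to one do not silently change the other.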

Stage 4

ML-Ready Output

The output NetworkX graph is immediately compatible with downstream network analysis tools and spatial ML pipelines. Node and edge feature dictionaries preserve all shapefile attributes, making the graph ready for graph neural network tasks, centrality analysis, route optimization, and network-based risk modeling without any additional data loading. NetworkX graphs are not the native input of graph ML libraries, but PyTorch Geometric, DGL, and other major libraries provide utilities that convert NetworkX graphs into their own formats.
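Graph ML libraries ultimately work on a node feature matrix and an edge index rather than on attribute dictionaries. The sketch below shows that conversion in plain Python so the shape of the result is visible. In practice, utilities such as `torch_geometric.utils.from_networkx` or `dgl.from_networkx` perform the equivalent step. The `to_arrays` helper, the sample attribute names, and the choice of feature keys are hypothetical.

```python
def to_arrays(nodes, edges, feature_keys):
    """Build a node feature matrix and a 2 x E edge index from graph records.

    nodes: list of (node_id, attribute_dict)
    edges: list of (u, v) pairs, treated as undirected
    feature_keys: numeric attributes to use as feature columns, in order
    """
    # Map arbitrary node ids to consecutive row numbers.
    index_of = {node_id: row for row, (node_id, _) in enumerate(nodes)}
    x = [[float(attrs[key]) for key in feature_keys] for _, attrs in nodes]
    sources, targets = [], []
    for u, v in edges:
        # An undirected edge is stored as two directed entries.
        sources += [index_of[u], index_of[v]]
        targets += [index_of[v], index_of[u]]
    return x, [sources, targets]


nodes = [
    ("a", {"capacity": 120, "year": 1998}),
    ("b", {"capacity": 80, "year": 2011}),
    ("c", {"capacity": 95, "year": 2005}),
]
edges = [("a", "b"), ("b", "c")]
x, edge_index = to_arrays(nodes, edges, ["capacity", "year"])
print(x)           # [[120.0, 1998.0], [80.0, 2011.0], [95.0, 2005.0]]
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

Only numeric attributes can go straight into a feature matrix. Text attributes such as a material type would need an encoding step first, which this sketch leaves out.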

Key Design Decisions

Bridging two isolated worlds — GIS and graph ML

GIS practitioners work with shapefiles, projections, and spatial operations. Graph ML practitioners work with adjacency matrices, node feature tensors, and graph objects. These two communities use fundamentally different data representations and toolchains, with no standard bridge between them. GRAX is purpose-built as that bridge — handling all the GIS-side complexity of geometry parsing and spatial relationship resolution so that the practitioner receives a graph object they can use immediately, without needing to understand either Fiona's geometry model or Shapely's spatial operation API.

Attribute preservation as a first-class requirement

A graph structure alone — nodes and edges without features — is rarely sufficient for ML tasks. The analytical value of geospatial infrastructure data lies in the attributes attached to each feature: capacity, material type, construction year, risk rating, flow rate. GRAX preserves all shapefile attribute records as node and edge feature dictionaries in the NetworkX output, ensuring no information is lost in the conversion. The resulting graph is not just topologically correct — it is feature-complete and immediately ready for supervised or unsupervised graph ML without additional data loading steps.

Minimal dependencies by design

GRAX depends on exactly four libraries: Fiona, Shapely, NumPy, and NetworkX. All four are well-maintained, widely used, and already present in many geospatial or ML Python environments. This small footprint keeps installation quick, makes dependency conflicts with existing project environments unlikely, and adds little maintenance burden. The lightweight API surface, a single conversion call, reflects the same philosophy: do one thing well, with the minimum complexity required to do it correctly.

Tech Stack

Technology | Purpose
Python | Core language and package implementation
Fiona | Shapefile reading and geometry record iteration with full attribute access
Shapely | Geometric operations and spatial relationship resolution for edge construction
NumPy | Array operations and numerical feature processing during attribute extraction
NetworkX | Graph construction and output representation: node/edge features, network analysis compatibility
PyPI | Package distribution (pip install grax)

Results & Metrics

What the package delivers

Shapefile → Graph

Single-Call Conversion

Full geometry parsing, spatial relationship resolution, and attribute mapping in one library call

NetworkX

Graph ML Compatible

Output convertible to PyTorch Geometric, DGL, and other major graph ML frameworks through their NetworkX import utilities

PyPI

Published & MIT Licensed

Installable in one command · Open source · Free for any use

🗺️

Eliminates the shapefile-to-graph conversion problem entirely

Before GRAX, converting geospatial infrastructure data to a graph representation required understanding Fiona's geometry model, Shapely's spatial API, and NetworkX's graph construction conventions, and then writing all the glue code that connects them. This boilerplate logic had to be reimplemented from scratch for every project. GRAX replaces all of that with a single library call, letting practitioners focus entirely on the network analysis or graph ML task rather than the data format conversion problem.

🔗

Feature-complete graphs — all shapefile attributes preserved as node and edge features

The output NetworkX graph preserves every shapefile attribute as a node or edge feature dictionary entry — capacity, material, age, risk rating, flow rate, or any other domain-specific property stored in the attribute table. This means the graph is not just structurally correct but analytically complete — ready for supervised graph ML, unsupervised clustering, centrality analysis, or risk propagation modeling without any additional data loading or feature engineering steps.

⚙️

Compatible with the full graph ML and network analysis ecosystem

NetworkX is a common intermediate representation for graph ML: PyTorch Geometric, DGL, StellarGraph, and other major graph ML libraries provide utilities to convert NetworkX graphs into their native formats. GRAX output plugs directly into these conversion utilities, enabling geospatial infrastructure data to flow into graph neural network training pipelines, link prediction models, community detection algorithms, and route optimization frameworks.

🏗️

Applied in research — enabling graph-based infrastructure risk modeling

GRAX was developed to support geospatial graph construction in infrastructure risk research — where road networks, utility grids, and pipeline systems represented as shapefiles needed to be converted into graph structures for ML-based risk propagation modeling and systemic risk analysis. The library directly enables the graph-theoretic component of that research by handling the data format challenge, allowing the research focus to remain on the risk modeling methodology rather than the data engineering required to support it.