GRAX
A lightweight Python library that transforms geospatial shapefiles into machine-learning-ready graph networks using NetworkX — bridging the gap between GIS data formats and graph-based ML workflows.
Shapefile → Graph
Geospatial Data to ML-Ready Networks in One Call
NetworkX
Compatible with Graph ML Frameworks and Network Analysis Tools
PyPI
Published Package — MIT Licensed, Production-Ready
The Problem
Geospatial infrastructure data lives in GIS formats — but graph ML frameworks expect graph objects, and no standard bridge between the two existed
Road networks, utility grids, pipeline systems, and other spatial infrastructure datasets are stored as shapefiles, the standard format for GIS data. These shapefiles encode rich geometric and attribute information. Graph ML and network analysis tools, however, operate on graph objects such as NetworkX graphs, on adjacency matrices, and on node/edge feature tensors, not on shapefiles. Converting between these formats requires understanding both GIS geometry parsing and graph construction conventions. It also involves non-trivial handling of spatial relationships, and it must carry the shapefile attribute data over as node and edge features for the resulting graph to be useful in ML tasks. Practitioners working at the intersection of geospatial data and graph ML have historically had to write this conversion logic from scratch for every project. That is an error-prone, time-consuming step that has nothing to do with the actual ML problem being solved.
The Solution
A lightweight library that parses shapefile geometry and attributes into NetworkX graph objects ready for graph ML pipelines in a single call
GRAX provides a minimal API that takes a shapefile as input and returns a NetworkX graph as output — handling all the geometry parsing, spatial relationship resolution, and attribute mapping in between. Fiona reads the shapefile geometry and attribute records, Shapely processes the geometric operations needed to establish spatial relationships between features, and NumPy handles array operations during feature extraction. The resulting NetworkX graph preserves all shapefile attributes as node and edge features, making it immediately compatible with downstream graph ML frameworks and network analysis tools. The entire conversion — from raw GIS data to a structured, feature-rich graph object — happens in a single library call with minimal dependencies and no GIS expertise required from the practitioner.
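This page does not show GRAX's own API. The following is therefore only a minimal sketch of the same Fiona → Shapely → NetworkX pipeline, under stated assumptions: Shapely 2.x, one node per feature (as described in Stage 3), and an edge wherever two geometries intersect. records_to_graph and shapefile_to_graph are hypothetical names, not GRAX functions.

```python
from typing import Any, Iterable, Mapping

import networkx as nx
from shapely.geometry import shape
from shapely.strtree import STRtree


def records_to_graph(records: Iterable[Mapping[str, Any]]) -> nx.Graph:
    """One node per feature; an edge wherever two feature geometries intersect.

    Each record is a GeoJSON-like mapping with "geometry" and "properties" keys,
    which matches the features Fiona yields when iterating a shapefile.
    """
    records = list(records)
    geoms = [shape(rec["geometry"]) for rec in records]
    graph = nx.Graph()
    for i, rec in enumerate(records):
        graph.add_node(i)
        graph.nodes[i].update(dict(rec["properties"]))  # attribute record -> node features
        graph.nodes[i]["geometry"] = geoms[i]  # kept for later edge-attribute work
    tree = STRtree(geoms)  # spatial index, so not every pair needs an exact test
    for i, geom in enumerate(geoms):
        # Shapely 2.x: with a predicate, query() returns indices of matching geometries.
        for j in tree.query(geom, predicate="intersects"):
            if int(j) > i:  # drop the self-match and record each pair once
                graph.add_edge(i, int(j))
    return graph


def shapefile_to_graph(path: str) -> nx.Graph:
    """Hypothetical convenience wrapper: read features with Fiona, then convert."""
    import fiona  # imported here so records_to_graph works without Fiona installed

    with fiona.open(path) as src:
        records = [
            {"geometry": feat["geometry"], "properties": dict(feat["properties"])}
            for feat in src
        ]
    return records_to_graph(records)
```

The STRtree keeps edge construction from running an exact test on every pair of features, which matters for large networks. Records with null geometries would need to be filtered out before they reach shape().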
Key Outcome
A published Python package that replaces hand-written shapefile-to-graph conversion code. It parses geospatial infrastructure data into feature-rich NetworkX graph objects in a single call, keeps shapefile attributes as node and edge features, and produces output ready for network analysis, graph ML, route optimization, and spatial risk modeling workflows.
Technical Deep Dive
Architecture & Design
Conversion Pipeline
Stage 1: Shapefile Input
Geometry (polylines and polygons): spatial features encoding roads, pipelines, utility networks, and other infrastructure geometries.
Attributes (feature properties): per-feature attribute records to be preserved as node and edge features in the output graph.
Stage 2: Geometry Parsing & Spatial Processing
Fiona (shapefile reading): reads shapefile geometry records and attribute tables and iterates features with full property access.
Shapely (geometric operations): processes spatial relationships between features and resolves connectivity for graph edge construction.
NumPy (feature processing): array operations during attribute extraction and numerical feature preparation for ML readiness.
Stage 3: Graph Construction
Nodes (spatial features as nodes): each shapefile feature is mapped to a graph node, with its attributes attached as the node feature dictionary.
Edges (spatial relationships as edges): connectivity is derived from geometric relationships, with edge attributes taken from spatial properties.
Stage 4: ML-Ready Output
Output (NetworkX graph object): a feature-rich graph, ready for graph ML and network analysis, with shapefile attributes preserved as node and edge features and compatible with downstream graph ML frameworks, route optimization, and spatial risk analysis tools.
Stage 1
Shapefile Input
GRAX accepts any standard shapefile containing polyline or polygon geometry — the formats most commonly used to encode spatial infrastructure data such as road networks, pipeline systems, utility grids, and drainage networks. The input includes both the geometry component, which defines the spatial shape of each feature, and the attribute table, which stores the properties associated with each feature that will become node and edge features in the output graph.
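Fiona exposes this split between geometry and attributes directly: a collection's schema holds the geometry type plus a properties mapping of field names to types such as "str:80" or "int:10". The following is a small, hypothetical sketch of an input check that accepts only polyline and polygon layers and lists the attribute fields that would become graph features. attribute_fields and describe_shapefile are illustrative names, not GRAX's API.

```python
from typing import Any, Mapping

# Fiona schema geometry types that correspond to polyline and polygon shapefiles.
SUPPORTED_GEOMETRY_TYPES = {"LineString", "MultiLineString", "Polygon", "MultiPolygon"}


def attribute_fields(schema: Mapping[str, Any]) -> dict[str, str]:
    """Check a Fiona-style schema and map each attribute field to its base type."""
    geom_type = schema["geometry"].removeprefix("3D ")  # e.g. "3D LineString"
    if geom_type not in SUPPORTED_GEOMETRY_TYPES:
        raise ValueError(f"expected polyline or polygon geometry, got {geom_type!r}")
    # Field types carry a width/precision suffix, e.g. "str:80" or "float:24.15".
    return {name: ftype.split(":")[0] for name, ftype in schema["properties"].items()}


def describe_shapefile(path: str) -> dict[str, str]:
    """Hypothetical helper: open a shapefile with Fiona and inspect its schema."""
    import fiona  # only needed when reading a real file

    with fiona.open(path) as src:
        return attribute_fields(src.schema)
```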
Stage 2
Geometry Parsing & Spatial Processing
Fiona reads the shapefile and iterates over each feature, exposing its geometry and attribute record. Shapely performs the geometric operations needed to resolve spatial relationships between features — identifying which features share boundaries, intersect, or are connected — which determines graph edge construction. NumPy handles array operations during attribute extraction, preparing numerical feature values for ML-ready representation in the output graph's node and edge dictionaries.
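Which relationship counts as "connected" is a modeling choice. As a hedged illustration using standard Shapely predicates, the relationships named above can be told apart as follows. The relationship function and its labels are hypothetical and are not GRAX's internals.

```python
from shapely.geometry import LineString, Polygon
from shapely.geometry.base import BaseGeometry


def relationship(a: BaseGeometry, b: BaseGeometry) -> str:
    """Classify how two features relate using standard Shapely predicates."""
    if a.disjoint(b):
        return "disjoint"  # no shared point: no edge
    if a.touches(b):
        return "touches"  # boundaries meet, interiors do not (end-to-end roads, adjacent parcels)
    if a.crosses(b):
        return "crosses"  # e.g. two lines intersecting at an interior point
    if a.overlaps(b):
        return "overlaps"  # same-dimension features sharing part of their interiors
    return "intersects"  # any other contact, such as containment or equality


if __name__ == "__main__":
    road_a = LineString([(0, 0), (1, 0)])
    road_b = LineString([(1, 0), (2, 0)])
    parcel_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
    parcel_b = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])
    print(relationship(road_a, road_b), relationship(parcel_a, parcel_b))  # prints: touches touches
```

These predicates are exact, so digitized networks whose endpoints miss each other by tiny distances are reported as disjoint. A distance threshold such as a.distance(b) <= tol is one common workaround.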
Stage 3
Graph Construction
Each shapefile feature is mapped to a NetworkX node, with its attribute record attached as a node feature dictionary. Edges are constructed from the spatial relationships resolved in Stage 2 — features that are geometrically connected become linked nodes in the graph, with edge attributes derived from the spatial properties of their connection. The result is a NetworkX graph that faithfully represents both the topology of the spatial network and the attributes of every feature within it.
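One way to derive edge attributes from the connection itself is to inspect the intersection of the two geometries. In this minimal sketch, add_spatial_edge, contact_type, and shared_length are hypothetical names chosen for illustration; GRAX's actual edge attributes are not documented here.

```python
import networkx as nx
from shapely.geometry import LineString, Polygon
from shapely.geometry.base import BaseGeometry


def add_spatial_edge(graph: nx.Graph, u, v, a: BaseGeometry, b: BaseGeometry) -> bool:
    """Link features u and v if they connect, storing the contact as edge attributes.

    Returns True if an edge was added.
    """
    contact = a.intersection(b)
    if contact.is_empty:
        return False  # geometries do not meet: no edge
    graph.add_edge(
        u,
        v,
        contact_type=contact.geom_type,  # "Point" for a junction, a line type for a shared border
        shared_length=contact.length,  # 0.0 when the features meet only at points
    )
    return True


if __name__ == "__main__":
    g = nx.Graph()
    add_spatial_edge(g, "road_a", "road_b",
                     LineString([(0, 0), (1, 0)]), LineString([(1, 0), (2, 0)]))
    add_spatial_edge(g, "parcel_a", "parcel_b",
                     Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
                     Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]))
    print(g.edges(data=True))
```

A junction between road segments yields a Point contact with zero shared length, while adjacent parcels or service areas share a border whose length could serve as an edge weight.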
Stage 4
ML-Ready Output
The output NetworkX graph can be used directly by network analysis tools and handed on to graph ML frameworks and spatial ML pipelines. Node and edge feature dictionaries preserve the shapefile attributes, so the graph is ready for centrality analysis, route optimization, and network-based risk modeling without additional data preparation. For graph neural network tasks, categorical attributes still need a numeric encoding. PyTorch Geometric, DGL, and other major graph ML libraries do not use NetworkX as their native format, but they provide utilities that convert NetworkX graphs into their own data structures.
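In practice, torch_geometric.utils.from_networkx and dgl.from_networkx handle this conversion. To show what such a hand-off produces, here is a minimal NumPy sketch that builds a node feature matrix and a COO edge_index array in the layout PyTorch Geometric uses, with each undirected edge stored in both directions. to_arrays is a hypothetical name, not part of GRAX or either library.

```python
import networkx as nx
import numpy as np


def to_arrays(graph: nx.Graph, feature_keys: list[str]) -> tuple[np.ndarray, np.ndarray]:
    """Export numeric node features and connectivity as NumPy arrays.

    x has shape (num_nodes, len(feature_keys)); edge_index has shape
    (2, 2 * num_edges) because every undirected edge is listed in both directions.
    """
    nodes = list(graph.nodes)
    row = {node: i for i, node in enumerate(nodes)}  # node id -> row in x
    x = np.array(
        # Missing attributes become NaN; float() rejects non-numeric strings.
        [[float(graph.nodes[n].get(key, np.nan)) for key in feature_keys] for n in nodes],
        dtype=np.float64,
    ).reshape(len(nodes), len(feature_keys))
    pairs = np.array([(row[u], row[v]) for u, v in graph.edges], dtype=np.int64)
    forward = pairs.reshape(-1, 2).T  # (2, num_edges): sources on row 0, targets on row 1
    edge_index = np.concatenate([forward, forward[::-1]], axis=1)  # add reversed copies
    return x, edge_index
```

Because float() raises ValueError on values like "PVC", categorical fields have to be encoded before they can appear in x.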
Key Design Decisions
Bridging two isolated worlds — GIS and graph ML
GIS practitioners work with shapefiles, projections, and spatial operations. Graph ML practitioners work with adjacency matrices, node feature tensors, and graph objects. These two communities use fundamentally different data representations and toolchains, with no standard bridge between them. GRAX is purpose-built as that bridge — handling all the GIS-side complexity of geometry parsing and spatial relationship resolution so that the practitioner receives a graph object they can use immediately, without needing to understand either Fiona's geometry model or Shapely's spatial operation API.
Attribute preservation as a first-class requirement
A graph structure alone — nodes and edges without features — is rarely sufficient for ML tasks. The analytical value of geospatial infrastructure data lies in the attributes attached to each feature: capacity, material type, construction year, risk rating, flow rate. GRAX preserves all shapefile attribute records as node and edge feature dictionaries in the NetworkX output, ensuring no information is lost in the conversion. The resulting graph is not just topologically correct — it is feature-complete and immediately ready for supervised or unsupervised graph ML without additional data loading steps.
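Preserved attributes are not always numeric. Material type, for instance, is usually stored as a string, and a common way to make such a field usable as a GNN input is one-hot encoding. This is a minimal sketch assuming string-valued categories; one_hot_attribute is a hypothetical helper, not a documented GRAX function.

```python
import networkx as nx
import numpy as np


def one_hot_attribute(graph: nx.Graph, key: str) -> tuple[np.ndarray, list[str]]:
    """One-hot encode a categorical node attribute such as pipe material.

    Returns a (num_nodes, num_categories) matrix in graph.nodes order and the
    sorted category labels; nodes missing the attribute get an all-zero row.
    """
    values = [graph.nodes[n].get(key) for n in graph.nodes]
    categories = sorted({v for v in values if v is not None})
    column = {c: j for j, c in enumerate(categories)}  # category -> column index
    encoded = np.zeros((len(values), len(categories)), dtype=np.float32)
    for i, v in enumerate(values):
        if v is not None:
            encoded[i, column[v]] = 1.0
    return encoded, categories
```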
Minimal dependencies by design
GRAX depends on exactly four libraries (Fiona, Shapely, NumPy, and NetworkX), all of which are well-maintained, widely used, and already present in many geospatial or ML Python environments. This small footprint keeps installation quick, reduces the risk of dependency conflicts with existing project environments, and keeps the maintenance burden low. The lightweight API surface, a single conversion call, reflects the same philosophy: do one thing well, with the minimum complexity required to do it correctly.
Tech Stack
| Technology | Purpose |
|---|---|
| Python | Core language and package implementation |
| Fiona | Shapefile reading and geometry record iteration with full attribute access |
| Shapely | Geometric operations and spatial relationship resolution for edge construction |
| NumPy | Array operations and numerical feature processing during attribute extraction |
| NetworkX | Graph construction and output representation — node/edge features, network analysis compatibility |
| PyPI | Package distribution — pip install grax |
Results & Metrics
What the package delivers
Shapefile → Graph
Single-Call Conversion
Full geometry parsing, spatial relationship resolution, and attribute mapping in one library call
NetworkX
Graph ML Compatible
Output convertible via the NetworkX importers in PyTorch Geometric, DGL, and other major graph ML frameworks
PyPI
Published & MIT Licensed
Installable in one command · Open source · Free for any use
Replaces hand-written shapefile-to-graph conversion code
Before GRAX, converting geospatial infrastructure data to a graph representation required understanding Fiona's geometry model, Shapely's spatial API, and NetworkX's graph construction conventions, and writing all the glue code that connects them. This boilerplate had to be reimplemented from scratch for every project. GRAX replaces it with a single library call, letting practitioners focus on the network analysis or graph ML task rather than on data format conversion.
Feature-complete graphs — all shapefile attributes preserved as node and edge features
The output NetworkX graph preserves every shapefile attribute as a node or edge feature dictionary entry: capacity, material, age, risk rating, flow rate, or any other domain-specific property stored in the attribute table. The graph is therefore not just structurally correct but analytically complete, ready for supervised graph ML, unsupervised clustering, centrality analysis, or risk propagation modeling without additional data loading steps.
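Centrality analysis and a simple failure check both run directly on such a graph. The following is one minimal, hypothetical illustration, not the risk model used in the research described on this page; critical_features and removal_impact are illustrative names. It ranks features by NetworkX betweenness centrality and counts how many remaining features fall outside the largest connected component when one feature fails.

```python
import networkx as nx


def removal_impact(graph: nx.Graph, node) -> int:
    """Count remaining features outside the largest connected component once `node` fails."""
    damaged = graph.copy()
    damaged.remove_node(node)
    if damaged.number_of_nodes() == 0:
        return 0
    largest = max(nx.connected_components(damaged), key=len)
    return damaged.number_of_nodes() - len(largest)


def critical_features(graph: nx.Graph, top: int = 3) -> list[tuple]:
    """Rank features by betweenness centrality and attach each one's removal impact."""
    centrality = nx.betweenness_centrality(graph)
    ranked = sorted(centrality, key=centrality.get, reverse=True)[:top]
    return [(n, centrality[n], removal_impact(graph, n)) for n in ranked]


if __name__ == "__main__":
    # A chain of five road segments: the middle one is the bottleneck.
    chain = nx.path_graph(5)
    print(critical_features(chain, top=1))
```

On an already fragmented network, removal_impact also counts features that were disconnected beforehand, so subtracting that baseline gives the marginal impact of a failure. For large networks, nx.betweenness_centrality accepts a k argument to estimate centrality from a sample of source nodes.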
Compatible with the full graph ML and network analysis ecosystem
NetworkX is a widely used interchange format for graph data: PyTorch Geometric, DGL, StellarGraph, and other major graph ML libraries all provide utilities to convert NetworkX graphs into their native formats. GRAX output plugs directly into these conversion utilities, so geospatial infrastructure data can flow into graph neural network training pipelines, link prediction models, community detection algorithms, and route optimization frameworks.
Applied in research — enabling graph-based infrastructure risk modeling
GRAX was developed to support geospatial graph construction in infrastructure risk research — where road networks, utility grids, and pipeline systems represented as shapefiles needed to be converted into graph structures for ML-based risk propagation modeling and systemic risk analysis. The library directly enables the graph-theoretic component of that research by handling the data format challenge, allowing the research focus to remain on the risk modeling methodology rather than the data engineering required to support it.