Why Did Australia Choose RDF (Graph) for Their New Address Model? A Discovery from GitHub

kozylife 2025. 8. 6. 19:49

1. Prologue: An Interesting Encounter with a GitHub Repository

While browsing GitHub recently, I came across an intriguing public repository called icsm-au/addr-model. It is an official project managed by ICSM (Intergovernmental Committee on Surveying and Mapping), the body that coordinates surveying and mapping across Australia and New Zealand, and it aims to establish new address data standards for all of Australia, including Queensland. What caught my attention was how completely its approach differs from that of traditional address management systems.

 

The most striking aspect was that the new model defines addresses using RDF (Resource Description Framework), a graph-based data model. In particular, the _all.ttl file contains concrete examples of RDF data written in the Turtle format:

# Prefix declarations (@prefix addr: ..., @prefix qld-addr: ...) are omitted in this excerpt
qld-addr:ADR001 a addr:Address ;
    addr:hasStreet qld-addr:street001 ;
    addr:hasGeocode qld-addr:geo001 ;
    addr:postcode "4000" .

qld-addr:street001 a addr:Street ;
    addr:streetName "Queen Street" ;
    addr:streetType "Street" .
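To make the Turtle syntax concrete: `a` is shorthand for rdf:type, `;` starts another statement about the same subject, and `.` closes the block. The short Python sketch below writes the two blocks out as explicit (subject, predicate, object) triples. It keeps prefixed names as plain strings and wraps literal values in quotes. The describe() helper is purely illustrative.

```python
# The Turtle blocks above, written out as explicit (subject, predicate, object) triples.
# Prefixed names are kept as plain strings; quoted strings mark literal values.
# "a" in Turtle is shorthand for rdf:type.
TRIPLES = [
    ("qld-addr:ADR001", "rdf:type", "addr:Address"),
    ("qld-addr:ADR001", "addr:hasStreet", "qld-addr:street001"),
    ("qld-addr:ADR001", "addr:hasGeocode", "qld-addr:geo001"),
    ("qld-addr:ADR001", "addr:postcode", '"4000"'),
    ("qld-addr:street001", "rdf:type", "addr:Street"),
    ("qld-addr:street001", "addr:streetName", '"Queen Street"'),
    ("qld-addr:street001", "addr:streetType", '"Street"'),
]

def describe(subject, triples):
    """Collect every (predicate, object) pair stated about one subject."""
    return [(p, o) for s, p, o in triples if s == subject]

for predicate, obj in describe("qld-addr:street001", TRIPLES):
    print(predicate, obj)
```

Each `;` in the Turtle source corresponds to one more tuple with the same subject. The whole excerpt therefore amounts to seven statements.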

This raised an intriguing question: Why did a government agency choose RDF, a relatively unfamiliar graph data model, instead of the familiar relational database (RDB)? I wanted to explore the pros and cons of these two database approaches using this new address model as a case study.

 

All content in this article is based on my personal analysis and speculation drawn from publicly available GitHub repository information.

2. Approach 1: What if We Convert Turtle Data to Relational Database (RDB)?

Let's first examine a hypothetical scenario of converting Australia's Turtle data into a traditional RDB schema.

Conversion Method

Table Design

  • addresses table: Basic address information
  • streets table: Street names and types
  • localities table: Locality information
  • geocodes table: Geographic coordinate information

Column Definition

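-- Referenced tables are created first so that the foreign keys in addresses can resolve
CREATE TABLE streets (
    id BIGINT PRIMARY KEY,
    street_name VARCHAR(255) NOT NULL,
    street_type VARCHAR(50)
);

CREATE TABLE localities (
    id BIGINT PRIMARY KEY,
    locality_name VARCHAR(255) NOT NULL
);

CREATE TABLE geocodes (
    id BIGINT PRIMARY KEY,
    coordinate_type VARCHAR(50),   -- e.g. 'centroid'
    latitude DECIMAL(9, 6),
    longitude DECIMAL(9, 6)
);

CREATE TABLE addresses (
    id BIGINT PRIMARY KEY,
    street_id BIGINT,
    locality_id BIGINT,
    geocode_id BIGINT,
    postcode VARCHAR(10),
    FOREIGN KEY (street_id) REFERENCES streets(id),
    FOREIGN KEY (locality_id) REFERENCES localities(id),
    FOREIGN KEY (geocode_id) REFERENCES geocodes(id)
);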

Data Migration

Each predicate in the Turtle file (addr:streetName, addr:postcode, etc.) would need to be mapped to an appropriate table column. This calls for conversion scripts that do three things:

  • parse the RDF triples
  • resolve links between resources (such as addr:hasStreet) into foreign-key values
  • INSERT the results into the relational tables
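A conversion script of this kind might look like the following minimal sketch. It makes several assumptions:

  • The Turtle has already been parsed into (subject, predicate, object) tuples. A real pipeline would use a Turtle parser such as the rdflib library.
  • Only the addresses and streets tables are loaded, into an in-memory SQLite database.
  • The class-to-table and predicate-to-column mappings, and the surrogate integer keys, are hypothetical.

```python
import sqlite3

# Triples as a Turtle parser might yield them (literals as plain Python strings).
TRIPLES = [
    ("qld-addr:ADR001", "rdf:type", "addr:Address"),
    ("qld-addr:ADR001", "addr:hasStreet", "qld-addr:street001"),
    ("qld-addr:ADR001", "addr:postcode", "4000"),
    ("qld-addr:street001", "rdf:type", "addr:Street"),
    ("qld-addr:street001", "addr:streetName", "Queen Street"),
    ("qld-addr:street001", "addr:streetType", "Street"),
]

# Hypothetical mapping: RDF class -> table, predicate -> column.
CLASS_TO_TABLE = {"addr:Address": "addresses", "addr:Street": "streets"}
PREDICATE_TO_COLUMN = {
    "addr:hasStreet": "street_id",
    "addr:postcode": "postcode",
    "addr:streetName": "street_name",
    "addr:streetType": "street_type",
}

def migrate(triples, conn):
    # Step 1: give every typed subject a table and a surrogate integer key.
    table_of, key_of = {}, {}
    for s, p, o in triples:
        if p == "rdf:type" and o in CLASS_TO_TABLE:
            table_of[s] = CLASS_TO_TABLE[o]
            key_of[s] = len(key_of) + 1
    # Step 2: group the mapped predicates into one row per subject.
    rows = {s: {"id": key_of[s]} for s in table_of}
    for s, p, o in triples:
        if s in rows and p in PREDICATE_TO_COLUMN:
            # A link to another subject becomes that subject's key (a foreign-key value).
            rows[s][PREDICATE_TO_COLUMN[p]] = key_of.get(o, o)
    # Step 3: insert referenced rows first (streets before addresses) so foreign keys resolve.
    # Table and column names come only from the fixed mappings above, never from input data.
    for table in ("streets", "addresses"):
        for s, row in rows.items():
            if table_of[s] == table:
                cols = ", ".join(row)
                marks = ", ".join("?" for _ in row)
                conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", tuple(row.values()))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE streets (id INTEGER PRIMARY KEY, street_name TEXT, street_type TEXT)")
conn.execute(
    "CREATE TABLE addresses (id INTEGER PRIMARY KEY, "
    "street_id INTEGER REFERENCES streets(id), postcode TEXT)"
)
migrate(TRIPLES, conn)
print(conn.execute(
    "SELECT a.postcode, s.street_name FROM addresses a JOIN streets s ON a.street_id = s.id"
).fetchall())
```

Most of the effort goes into the mapping dictionaries and into ordering the inserts so that foreign keys resolve. Every new predicate in the source data means another mapping entry and, often, another column.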

Advantages (Pros)

Familiarity and Maturity

SQL and RDBs are familiar to most developers and DBAs. The ecosystem of related tools, libraries, and communities is highly mature, which makes it relatively easy to find staff and maintain systems.

 

Strict Data Integrity

Predefined schemas and constraints enforce data consistency at write time. Invalid postcode formats or references to non-existent streets can be rejected before they ever reach the table.
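As a minimal illustration, SQLite (chosen only because it ships with Python) can enforce both rules. A CHECK constraint rejects malformed postcodes, and a foreign key rejects references to missing streets. The four-digit rule reflects the Australian postcode format. The trimmed-down table layout and the try_insert() helper are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE streets (id INTEGER PRIMARY KEY, street_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        street_id INTEGER NOT NULL REFERENCES streets(id),
        -- Australian postcodes are exactly four digits
        postcode TEXT CHECK (postcode GLOB '[0-9][0-9][0-9][0-9]')
    )
""")
conn.execute("INSERT INTO streets VALUES (1, 'Queen Street')")

def try_insert(row):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO addresses VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert((1, 1, "4000")))   # existing street, valid postcode
print(try_insert((2, 1, "40A0")))   # malformed postcode: the CHECK constraint fails
print(try_insert((3, 99, "4000")))  # street 99 does not exist: the FOREIGN KEY constraint fails
```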

 

Robust Transaction Processing

RDBs provide strong ACID (Atomicity, Consistency, Isolation, Durability) guarantees for transactions. Even large-scale updates to address data can be applied safely, because a batch of changes either commits completely or not at all.
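A minimal sketch of that all-or-nothing behavior uses an in-memory SQLite database and hypothetical data. In a batch of two updates, the second violates a NOT NULL constraint, so the whole batch is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, postcode TEXT NOT NULL)")
conn.executemany("INSERT INTO addresses VALUES (?, ?)", [(1, "4000"), (2, "4000")])
conn.commit()

# A batch update in which the second statement violates the NOT NULL constraint.
updates = [("4001", 1), (None, 2)]
try:
    with conn:  # the connection context manager commits on success and rolls back on error
        for postcode, address_id in updates:
            conn.execute("UPDATE addresses SET postcode = ? WHERE id = ?", (postcode, address_id))
except sqlite3.IntegrityError as exc:
    print("batch rolled back:", exc)

# Neither update survives: the first change was undone together with the failed one.
print(conn.execute("SELECT id, postcode FROM addresses ORDER BY id").fetchall())
```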

Disadvantages (Cons)

Schema Rigidity

When new elements are added to the address system (e.g., building numbers within complexes, or floor numbers in underground shopping malls), the schema must be changed with statements such as ALTER TABLE. On large production tables this can be cumbersome and risky.
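A small SQLite sketch shows that friction for the "building numbers within complexes" case, using a hypothetical complex_building_no column. The new value cannot be stored until the schema is migrated. On large production tables, whether such a migration locks or rewrites the table depends on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, postcode TEXT)")

# A new address concept (hypothetical column name) cannot be stored before a schema change.
try:
    conn.execute("INSERT INTO addresses (id, postcode, complex_building_no) VALUES (1, '4000', 'B')")
except sqlite3.OperationalError as exc:
    print("before migration:", exc)  # fails: the column does not exist yet

# The schema migration: existing rows receive the new column with a NULL value.
conn.execute("ALTER TABLE addresses ADD COLUMN complex_building_no TEXT")
conn.execute("INSERT INTO addresses (id, postcode, complex_building_no) VALUES (1, '4000', 'B')")
print(conn.execute("SELECT * FROM addresses").fetchall())
```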

 

Difficulty in Expressing Complex Relationships

Many-to-many relationships, such as 'Building A and Building B belong to the same complex' or 'this road spans two regions', require junction tables and multiple JOINs.

 

For example, finding all addresses that are on a specific street, lie in a specific locality, and have a specific type of geocode requires a query like this:

SELECT a.* 
FROM addresses a
JOIN streets s ON a.street_id = s.id
JOIN localities l ON a.locality_id = l.id  
JOIN geocodes g ON a.geocode_id = g.id
WHERE s.street_name = 'Queen Street'
  AND l.locality_name = 'Brisbane'
  AND g.coordinate_type = 'centroid';

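Each additional relationship adds another JOIN, and many-to-many relationships add a junction table as well. The number of JOINs therefore grows with the number of related entities. Queries become harder to write and maintain, and performance can degrade as the joins multiply.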
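The many-to-many case ("this road spans across two regions") needs yet another table. Here is a minimal SQLite sketch with a hypothetical street_localities junction table and invented sample names. Even asking which localities a single street passes through takes two JOINs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE streets (id INTEGER PRIMARY KEY, street_name TEXT);
    CREATE TABLE localities (id INTEGER PRIMARY KEY, locality_name TEXT);
    -- Junction table: one row per (street, locality) pair, so a street can span many localities
    CREATE TABLE street_localities (
        street_id INTEGER REFERENCES streets(id),
        locality_id INTEGER REFERENCES localities(id),
        PRIMARY KEY (street_id, locality_id)
    );
    INSERT INTO streets VALUES (1, 'Example Road');
    INSERT INTO localities VALUES (1, 'Locality A'), (2, 'Locality B');
    INSERT INTO street_localities VALUES (1, 1), (1, 2);
""")

# "Which localities does this street pass through?" already needs two JOINs.
rows = conn.execute("""
    SELECT l.locality_name
    FROM streets s
    JOIN street_localities sl ON sl.street_id = s.id
    JOIN localities l ON l.id = sl.locality_id
    WHERE s.street_name = 'Example Road'
    ORDER BY l.locality_name
""").fetchall()
print(rows)
```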

 

Loss of Semantic Information

Semantic relationships expressed in RDF, such as addr:Address being rdfs:subClassOf geo:Feature, are reduced to plain table and column names. The meaning they carried is no longer available to queries.

3. Approach 2: What if We Implement Turtle Data in a Graph Database?

Now let's examine a scenario where Turtle data is natively stored in a graph database.

Implementation Method

Data Loading

Choose a graph database that supports RDF, such as Amazon Neptune, Stardog, or GraphDB.

 

Direct Import

Since the _all.ttl file is already graph data, it can be imported directly using the database's loader without any conversion process.
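On Amazon Neptune, for instance, that loader is the bulk loader HTTP endpoint (`/loader`, port 8182 by default). It reads files from Amazon S3 and accepts Turtle through `"format": "turtle"`. The sketch below only builds the request with the standard library and does not send it. The cluster endpoint, bucket, IAM role ARN, and region are placeholders, and the file would have to be uploaded to S3 first.

```python
import json
import urllib.request

# Placeholders: replace with your own cluster endpoint, S3 location, IAM role and region.
NEPTUNE_ENDPOINT = "https://my-neptune-cluster.example:8182"
payload = {
    "source": "s3://my-address-bucket/_all.ttl",  # the Turtle file, uploaded to S3 beforehand
    "format": "turtle",                           # loaded as-is; no conversion step
    "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    "region": "ap-southeast-2",
    "failOnError": "TRUE",
}

request = urllib.request.Request(
    f"{NEPTUNE_ENDPOINT}/loader",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Actually sending it (e.g. urllib.request.urlopen(request)) requires network access to the
# cluster, plus SigV4 request signing if IAM database authentication is enabled.
print(request.get_method(), request.full_url)
print(json.dumps(payload, indent=2))
```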

 

Model Mapping

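  • RDF subjects and objects (e.g., qld-addr:ADR001, qld-addr:street001) become nodes (vertices) in the graph. Literal values such as "4000" become leaf nodes; property-graph databases usually store them as node properties instead.
  • RDF predicates (e.g., addr:hasStreet, addr:hasGeocode) become edges (relationships) connecting the nodes.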
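The mapping above can be sketched in a few lines of Python: subjects and objects become nodes, and each predicate becomes a labelled, directed edge. This is a simplified illustration with hypothetical helper names. Literal values such as "4000" are kept as leaf nodes here, whereas a property-graph store would typically attach them to the node as properties.

```python
from collections import defaultdict

# Triples from the Turtle example; quoted strings mark literal values.
TRIPLES = [
    ("qld-addr:ADR001", "addr:hasStreet", "qld-addr:street001"),
    ("qld-addr:ADR001", "addr:hasGeocode", "qld-addr:geo001"),
    ("qld-addr:ADR001", "addr:postcode", '"4000"'),
    ("qld-addr:street001", "addr:streetName", '"Queen Street"'),
]

def to_graph(triples):
    """Subjects and objects become nodes; each predicate becomes a labelled, directed edge."""
    nodes = set()
    edges = defaultdict(list)  # node -> [(edge label, target node)]
    for subject, predicate, obj in triples:
        nodes.update((subject, obj))
        edges[subject].append((predicate, obj))
    return nodes, edges

nodes, edges = to_graph(TRIPLES)
# Follow the outgoing edges of the address node, one hop.
for label, target in edges["qld-addr:ADR001"]:
    print(f"qld-addr:ADR001 --{label}--> {target}")
```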

Advantages (Pros)

Perfect Data Model Compatibility

The data structure is preserved without conversion, so relationships like 'an address has a street' are clearly maintained. The data model itself becomes the database structure with no information loss.

 

Flexible and Scalable Structure

When new address attributes or relationship types emerge, you simply add new nodes and edges without schema changes. It's very flexible in adapting to data model evolution.
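A minimal sketch of that flexibility treats the store as a plain Python set of triples. A new kind of fact, here a hypothetical ex:buildingNumberInComplex predicate that is not taken from the repository, is recorded by adding one more triple. No migration step is needed.

```python
# A tiny in-memory triple store: a set of (subject, predicate, object) tuples.
store = {
    ("qld-addr:ADR001", "addr:hasStreet", "qld-addr:street001"),
    ("qld-addr:ADR001", "addr:postcode", "4000"),
}

# A new kind of fact arrives (hypothetical predicate). There is no ALTER TABLE step:
# it is simply one more triple.
store.add(("qld-addr:ADR001", "ex:buildingNumberInComplex", "B"))

def objects(subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)

print(objects("qld-addr:ADR001", "ex:buildingNumberInComplex"))  # ['B']
print(len(store))  # 3
```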

 

Powerful Relationship-based Queries

Relationship traversal queries such as 'find all addresses that have a specific postcode and a specific type of geocode' can be written as a single graph pattern that mirrors the data. You do not spell out JOIN clauses, although the query engine still performs the joins internally:

# PREFIX declaration for addr: omitted, as in the Turtle excerpt
SELECT ?address WHERE {
    ?address addr:postcode "4000" ;
             addr:hasGeocode ?geocode .
    ?geocode a addr:GeographicCoordinate .
}
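Under the hood, a SPARQL engine matches this basic graph pattern against the stored triples and joins the bindings of shared variables such as ?geocode. The naive matcher below is only a sketch of that idea; real engines use indexes and query planning. The data, including the rdf:type of qld-addr:geo001 and the second address, is hypothetical.

```python
# Hypothetical data: one address whose geocode is typed as addr:GeographicCoordinate.
TRIPLES = [
    ("qld-addr:ADR001", "addr:postcode", "4000"),
    ("qld-addr:ADR001", "addr:hasGeocode", "qld-addr:geo001"),
    ("qld-addr:geo001", "rdf:type", "addr:GeographicCoordinate"),
    ("qld-addr:ADR002", "addr:postcode", "4000"),  # no geocode, so it must not match
]

def match(pattern, triples, binding):
    """Yield extended bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break      # constant term does not match
        else:
            yield new

def query(patterns, triples):
    """Evaluate a basic graph pattern: each pattern joins on variables bound by earlier ones."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for prev in bindings for b in match(pattern, triples, prev)]
    return bindings

# The WHERE clause above, one tuple per triple pattern ("a" is rdf:type).
results = query([
    ("?address", "addr:postcode", "4000"),
    ("?address", "addr:hasGeocode", "?geocode"),
    ("?geocode", "rdf:type", "addr:GeographicCoordinate"),
], TRIPLES)
print([r["?address"] for r in results])  # ['qld-addr:ADR001']
```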

Inference Capability

The power of RDF and graph databases becomes especially clear with inference. For example, suppose the model states that addr:Street is a subclass of geo:Feature (addr:Street rdfs:subClassOf geo:Feature). A query for geo:Feature then also returns all Street data, without explicitly asking for addr:Street. This requires either a store with RDFS reasoning enabled or a SPARQL property path such as ?x rdf:type/rdfs:subClassOf* geo:Feature. An RDB has no built-in equivalent, so the same behavior has to be emulated, for example with recursive queries over a class-hierarchy table.
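The sketch below shows, in naive form, what such a reasoner does. It forward-chains two RDFS entailment rules over a set of triples until nothing new is derived: rdfs11 (rdfs:subClassOf is transitive) and rdfs9 (instances inherit the types of superclasses). The data and helper names are hypothetical, and production reasoners are far more efficient.

```python
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

TRIPLES = {
    ("addr:Street", SUBCLASS, "geo:Feature"),         # the schema statement from the text
    ("qld-addr:street001", RDF_TYPE, "addr:Street"),  # instance data
    ("qld-addr:ADR001", RDF_TYPE, "addr:Address"),    # no subclass link declared for addr:Address here
}

def rdfs_closure(triples):
    """Apply two RDFS entailment rules until nothing new is derived:
    rdfs11 (subClassOf is transitive) and rdfs9 (instances inherit superclass types)."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))  # rdfs11
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))  # rdfs9
        if new <= inferred:
            return inferred
        inferred |= new

def instances_of(cls, triples):
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)

print(instances_of("geo:Feature", TRIPLES))                # [] without inference
print(instances_of("geo:Feature", rdfs_closure(TRIPLES)))  # ['qld-addr:street001']
```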

Disadvantages (Cons)

Relatively High Learning Curve

Graph query languages such as SPARQL (for RDF) and Cypher (for property graphs) are less familiar than SQL, so teams need time to learn them.

 

Ecosystem Maturity

Compared to RDB, the selection of related tools, libraries, and managed services may still be limited.

 

Specific Purpose Fit

Graph databases are not optimal for every kind of data. They are much better suited to domains where the relationships between data matter, and less suited to simple list aggregation or high-volume transaction processing.

4. My Thoughts: Why Did ICSM Choose a Graph?

All content in this section is based on my personal opinion drawn from information in the addr-model repository.

 

Address data is as much about connections and relationships (belonging, adjacency, composition) between data as it is about individual data points (address names, postcodes). Therefore, I believe graph databases that can express the inherent structure of data as-is are a much more natural and powerful choice than RDB.

 

Speculation on Australian ICSM's Choice

Future-Proofing

Address systems continue to change and become more complex over time. Flexible graph models can adapt to future changes much more easily.

 

Data Integration and Connectivity (Linked Data Ecosystem)

Using RDF goes beyond simply choosing a database technology; it appears to reflect an intention to build a Linked Data ecosystem. Address data is core hub data that connects to many other kinds of administrative data, such as land, public facilities, electoral districts, and disaster management.

 

By assigning each address a unique web identifier (URI), this address data can easily be 'linked' to other government datasets such as land, building, and demographic data. This can break down data silos within government and provide the foundation for truly data-driven administration. RDF (Linked Data) technology is designed for connecting and integrating such distributed data in standardized ways.
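A minimal sketch of that linking uses two small graphs held as Python sets of triples. Because both graphs use the same URI for the address, merging them is a plain set union. Facts from either dataset can then be reached from the address node. All URIs (under example.org) and the ex: predicates are hypothetical.

```python
# Two independently published graphs (all URIs hypothetical, under example.org).
ADDR = "https://example.org/address/ADR001"
address_graph = {
    (ADDR, "addr:postcode", "4000"),
}
land_graph = {
    ("https://example.org/parcel/P42", "ex:hasAddress", ADDR),
    ("https://example.org/parcel/P42", "ex:zoning", "commercial"),
}

# Both graphs already agree on the address URI, so integrating them is a set union.
linked = address_graph | land_graph

def about(node, graph):
    """Everything the merged graph says about `node`, in either direction."""
    outgoing = sorted((p, o) for s, p, o in graph if s == node)
    incoming = sorted((s, p) for s, p, o in graph if o == node)
    return outgoing, incoming

outgoing, incoming = about(ADDR, linked)
print(outgoing)  # [('addr:postcode', '4000')]
print(incoming)  # [('https://example.org/parcel/P42', 'ex:hasAddress')]
```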

 

Semantic Clarity

'Address', 'road', and 'region' are managed as 'concepts' with clear definitions and hierarchical structures, rather than simple text. This improves data quality and enables machines to better understand and reason about the data.

 

Complex Spatial/Relational Queries

They may also have chosen the graph model to support composite queries such as "all commercial addresses within a 5km radius of Fire Station A that are also in flood risk areas," which combine spatial filtering with relationship traversal.
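In a triple store, a query like that would typically combine relationship traversal with spatial functions, for example GeoSPARQL's geof:distance. The plain-Python sketch below only illustrates the shape of the filter. It combines a haversine distance check (a spherical approximation) with usage-class and flood-area membership conditions. The coordinates, addresses, and the in_flood_area set are invented for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical data: coordinates, usage classes and flood-area membership are invented.
FIRE_STATION_A = (-27.4700, 153.0250)
addresses = {
    "ADR001": {"coords": (-27.4690, 153.0240), "usage": "commercial"},
    "ADR002": {"coords": (-27.4800, 153.0300), "usage": "residential"},
    "ADR003": {"coords": (-27.7000, 153.2000), "usage": "commercial"},
}
in_flood_area = {"ADR001", "ADR002"}  # stands in for address-to-flood-zone links in a graph

matches = sorted(
    addr_id
    for addr_id, info in addresses.items()
    if info["usage"] == "commercial"
    and addr_id in in_flood_area
    and haversine_km(*FIRE_STATION_A, *info["coords"]) <= 5.0
)
print(matches)  # ['ADR001']
```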

5. Conclusion

Database selection should start with the question "What is the inherent form of the data I'm dealing with?" rather than simply being a matter of technology stack.

 

The Australian ICSM's address model case is an excellent example of deeply understanding the structural characteristics of data and choosing the optimal technology accordingly. The adoption of new technology by a government agency, departing from traditional approaches, is particularly meaningful.

 

I encourage readers to ask the same question about their own data projects. Is a table format more natural for the data you are dealing with, or a graph format, in which connections and relationships matter more?

 

This article is based on personal analysis and speculation drawn from publicly available GitHub repository information and may differ from the actual decision-making process of Australian ICSM.