Anchor Modelling: Sixth Normal Form databases

About Anchor Modelling

Anchor Modelling moves you beyond third normal form and into sixth normal form. What does this mean? Essentially, each attribute is stored in its own table against the key, rather than in one wide table alongside other attributes. This means you can easily store metadata about that attribute and track every change to it. The historical problem with this methodology is that it makes writing queries a real pain; Anchor Modelling overcomes this by providing views that combine all the attribute data back together.
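
To make that concrete, here’s a minimal, hand-rolled sketch of the idea – the names are my own illustration, not the DDL the Anchor Modelling tool actually generates (which has its own naming conventions and metadata columns):

    -- Illustrative only: one table for the key, one table per attribute.
    CREATE TABLE CustomerAnchor (
        CustomerId INT NOT NULL PRIMARY KEY            -- just the key, nothing else
    );

    CREATE TABLE CustomerName (
        CustomerId INT           NOT NULL REFERENCES CustomerAnchor (CustomerId),
        Name       NVARCHAR(100) NOT NULL,
        ChangedAt  DATETIME2     NOT NULL,             -- every change kept, per attribute
        PRIMARY KEY (CustomerId, ChangedAt)
    );
    GO

    -- A view stitches the attributes back together so day-to-day queries stay simple.
    CREATE VIEW CustomerLatest AS
    SELECT a.CustomerId, n.Name
    FROM   CustomerAnchor a
    OUTER APPLY (SELECT TOP (1) cn.Name
                 FROM   CustomerName cn
                 WHERE  cn.CustomerId = a.CustomerId
                 ORDER BY cn.ChangedAt DESC) n;
    GO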

This system means it’s super easy to manage new or changing schema requirements, as every table is independent: no deployment takes a table offline or impacts other data. Because only data that is actually present gets stored, storage requirements shrink. And the pervasive auditing makes full change tracking, including changes to the schema, a breeze.
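
For example, adding a brand-new attribute later is just one new table (plus a regenerated view); nothing existing gets altered or taken offline. Continuing the hand-rolled sketch above:

    CREATE TABLE CustomerEmail (
        CustomerId INT           NOT NULL REFERENCES CustomerAnchor (CustomerId),
        Email      NVARCHAR(320) NOT NULL,
        ChangedAt  DATETIME2     NOT NULL,
        PRIMARY KEY (CustomerId, ChangedAt)
    );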

There is a small performance overhead to the system, even with materialised joins, so I wouldn’t recommend it as the source for reports or queries that need very fast response times. For a lot of my use cases, though, that performance hit is vastly outweighed by the agility and auditing the system gives you.

The neat thing is that you can put an anchor model alongside your existing data warehouse and slowly convert, or, if you don’t need all the anchor features or concepts, use it to prototype a model and then switch over once it’s stable.

The GUI

The Anchor Modelling guys built a fantastic GUI for developing and exploring Anchor models. You build a visual representation of the model, then use the export to get your DDL SQL, an XML representation, or even HTML documentation.

The urge to go “weeeeeeeeee!” as I play with the GUI may, or may not, have been a key deciding factor in my usage of the system 😉

MeDriAnchor

The Anchor Modelling GUI is fantastic and outputs the SQL needed for DDL – defining the database – but it doesn’t help with ETL. At the time there wasn’t an existing ETL framework for an anchor data warehouse, I needed to be able to integrate new data really quickly, and I had significant time pressures in the day job. Along with James Skipwith, over 45 working days we built the Metadata Driven Anchor Model, or MeDriAnchor for short. This enabled us to store the relationships between source systems and our schema, use the Anchor Model’s sisulator to build the DDL, and then generate the DML stored procedures to pull data from one system to another.
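
The gist of the metadata-driven part, in a deliberately simplified form (MeDriAnchor’s real metadata schema is richer than this – the names below are purely for illustration):

    -- Describe where source data lives and which anchor attribute it feeds;
    -- a generator reads these rows and emits the DML stored procedures.
    CREATE TABLE MappingMetadata (
        SourceSystem    SYSNAME NOT NULL,
        SourceTable     SYSNAME NOT NULL,
        SourceColumn    SYSNAME NOT NULL,
        TargetAnchor    SYSNAME NOT NULL,
        TargetAttribute SYSNAME NOT NULL
    );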

I’ve had pretty limited time so this has been on the back-burner for a while, but it’s a nifty system even though it’s still v1.

Why not temporal tables?

In SQL Server 2016, temporal tables will provide a historisation facility that stores old row versions in another table and gives you extra predicates for identifying rows at specific points in time. This is certainly a massive step forward and will cover a reasonable number of use cases, but it stores more data than necessary, still suffers from the problems of amending a third normal form schema, and doesn’t include the other concepts of time that an anchor model can capture.
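
For comparison, a system-versioned table in SQL Server 2016 looks something like this (illustrative names); note that the whole row is versioned, so changing one column copies every column into the history table:

    CREATE TABLE dbo.Customer (
        CustomerId INT           NOT NULL PRIMARY KEY,
        Name       NVARCHAR(100) NOT NULL,
        Email      NVARCHAR(320) NULL,
        ValidFrom  DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo    DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

    -- The extra predicate for point-in-time queries
    SELECT CustomerId, Name, Email
    FROM   dbo.Customer
    FOR SYSTEM_TIME AS OF '2016-06-01';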

Presentation

You can check out the slides I put together on Anchor Models on Sway.

About the presentation

I used Sway to produce the presentation, and it was impressively easy to build a highly visual deck in it. One thing that’s important to note, though, is that it adjusts to the output screen, so the interactions can differ between a high-res widescreen laptop, a low-res square projector, and a mobile screen.

6 Replies to “Anchor Modelling: Sixth Normal Form databases”

  1. This looks cool. So the headline is: A database with a true rewind button.

    I’d like to see how its usability fares in practice though. You’re basically giving the ability to ask ‘give me the universe as it was 3 years ago’. What I’d wonder is how many gotchas and tricky bits that introduces when trying to, say, get a continuous load of stuff over the last x years. Surely the payoff for the old way – shoehorning old data into a new schema – is that you get to write simple, smooth queries. I guess it makes me think that you should probably have a good reason to use this model rather than it being the new default. That said, I haven’t delved in and fully understood it yet.

    1. I used it when I was dropping in new columns, new tables, and new relations every few hours or days. By doing incremental loads, I was able to rapidly build things up, backfill automatically, and pull the info into reports. It gave me one heck of a responsive information facility, but at the price of slower queries. I think my next step would be to start building 3NF sinks for well-defined reports and stable areas of data, to put those queries onto faster layers. Everything would feed from the 6NF, but some areas would be served from a 3NF “cache”.

      1. I can see how adding things would be mostly unproblematic but what about changes to existing schemata?

        If I have a big query that is characterised by something like WHERE year > 2014, how can that include a column that moved or was further normalised during that period?

        1. “It depends”, but I certainly refactored a number of columns – you get new columns and you can either backfill the data (if available) or union data from schema version X with data from schema version Y (there’s a small sketch of this after these replies). There’s no impediment to backfilling; it’s just that some audit history is potentially lost.

          1. OK, so I guess the advantage is that, yes, you need a similar amount of migration work to do the backfilling, but the backfilling is optional, unlike in most 3NF use cases.

          2. Most of my backfills happened automatically. Part of the logic we had in MeDriAnchor was that if a field was empty it would populate all the results – but that backfills with only the latest value for each row, so there are no historic values unless you do a manual backfill from your original column. If you don’t do that backfill, you’d get NULLs for the column in any query for a point in time before the date of the automatic backfill.
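
As mentioned in the replies above, querying across a schema change often comes down to unioning the old and new attribute tables. A purely hypothetical sketch – these table and column names don’t come from MeDriAnchor or the Anchor Modelling tool:

    -- The “Region” attribute moved to a new table during a refactor,
    -- so a query over the whole period unions the two.
    SELECT CustomerId, LegacyRegion AS Region, ChangedAt
    FROM   CustomerRegionOld
    WHERE  ChangedAt <  '2015-01-01'
    UNION ALL
    SELECT CustomerId, Region, ChangedAt
    FROM   CustomerRegionNew
    WHERE  ChangedAt >= '2015-01-01';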
