Anchor Modelling: Sixth Normal Form databases

About Anchor Modelling

Anchor Modelling moves you beyond third normal form and into sixth normal form. What does this mean? Essentially, each attribute is stored in its own table against the key, rather than in one wide table alongside other attributes. This means you can easily store metadata about that attribute and track every change to it. The historical problem with this methodology is that it makes writing queries a real pain; Anchor Modelling overcomes this by providing views that combine all the attribute data back together.
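
To make that concrete, here’s a minimal, hand-rolled sketch of the idea – the names are my own illustration, not the DDL the Anchor Modelling tool actually generates (which has its own naming conventions and metadata columns):

    -- Illustrative only: one table for the key, one table per attribute.
    CREATE TABLE CustomerAnchor (
        CustomerId INT NOT NULL PRIMARY KEY            -- just the key, nothing else
    );

    CREATE TABLE CustomerName (
        CustomerId INT           NOT NULL REFERENCES CustomerAnchor (CustomerId),
        Name       NVARCHAR(100) NOT NULL,
        ChangedAt  DATETIME2     NOT NULL,             -- every change kept, per attribute
        PRIMARY KEY (CustomerId, ChangedAt)
    );
    GO

    -- A view stitches the attributes back together so day-to-day queries stay simple.
    CREATE VIEW CustomerLatest AS
    SELECT a.CustomerId, n.Name
    FROM   CustomerAnchor a
    OUTER APPLY (SELECT TOP (1) cn.Name
                 FROM   CustomerName cn
                 WHERE  cn.CustomerId = a.CustomerId
                 ORDER BY cn.ChangedAt DESC) n;
    GO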

This system means it’s super easy to manage new or changing schema requirements, as every table is independent: no deployment takes a table offline or impacts other data. Because only data that is actually present gets stored, storage requirements shrink. And the pervasive auditing makes full change tracking, including changes to the schema, a breeze.
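
For example, adding a brand-new attribute later is just one new table (plus a regenerated view); nothing existing gets altered or taken offline. Continuing the hand-rolled sketch above:

    CREATE TABLE CustomerEmail (
        CustomerId INT           NOT NULL REFERENCES CustomerAnchor (CustomerId),
        Email      NVARCHAR(320) NOT NULL,
        ChangedAt  DATETIME2     NOT NULL,
        PRIMARY KEY (CustomerId, ChangedAt)
    );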

There is a small performance overhead to the system, even with materialised joins, so I wouldn’t recommend it as the source for reports or queries that need very fast response times. For a lot of my use cases, though, that performance hit is vastly outweighed by the agility and auditing the system gives you.

The neat thing is that you can put an anchor model alongside your existing data warehouse and slowly convert, or, if you don’t need all the anchor features or concepts, use it to prototype a model and then switch over once it’s stable.

The GUI

The Anchor Modelling guys built a fantastic GUI for developing and exploring Anchor models. You build a visual representation of the model, then use the export to get your DDL SQL, an XML representation, or even HTML documentation.

The urge to go “weeeeeeeeee!” as I play with the GUI may, or may not, have been a key deciding factor in my usage of the system 😉

MeDriAnchor

The Anchor Modelling GUI is fantastic and outputs the SQL needed for DDL – defining the database – but it doesn’t help with ETL. At the time there wasn’t an existing ETL framework for an anchor data warehouse, I needed to be able to integrate new data really quickly, and I had significant time pressures in the day job. Along with James Skipwith, over 45 working days we built the Metadata Driven Anchor Model, or MeDriAnchor for short. This enabled us to store the relationships between source systems and our schema, use the Anchor Model’s sisulator to build the DDL, and then generate the DML stored procedures to pull data from one system to another.
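
The gist of the metadata-driven part, in a deliberately simplified form (MeDriAnchor’s real metadata schema is richer than this – the names below are purely for illustration):

    -- Describe where source data lives and which anchor attribute it feeds;
    -- a generator reads these rows and emits the DML stored procedures.
    CREATE TABLE MappingMetadata (
        SourceSystem    SYSNAME NOT NULL,
        SourceTable     SYSNAME NOT NULL,
        SourceColumn    SYSNAME NOT NULL,
        TargetAnchor    SYSNAME NOT NULL,
        TargetAttribute SYSNAME NOT NULL
    );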

I’ve had pretty limited time so this has been on the back-burner for a while, but it’s a nifty system even though it’s still v1.

Why not temporal tables?

In SQL Server 2016, temporal tables will provide a historisation facility that stores old row versions in another table and gives you extra predicates for identifying rows at specific points in time. This is certainly a massive step forward and will cover a reasonable number of use cases, but it stores more data than necessary, still suffers from the problems of amending a third normal form schema, and doesn’t include the other concepts of time that an anchor model can capture.
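
For comparison, a system-versioned table in SQL Server 2016 looks something like this (illustrative names); note that the whole row is versioned, so changing one column copies every column into the history table:

    CREATE TABLE dbo.Customer (
        CustomerId INT           NOT NULL PRIMARY KEY,
        Name       NVARCHAR(100) NOT NULL,
        Email      NVARCHAR(320) NULL,
        ValidFrom  DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo    DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

    -- The extra predicate for point-in-time queries
    SELECT CustomerId, Name, Email
    FROM   dbo.Customer
    FOR SYSTEM_TIME AS OF '2016-06-01';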

Presentation

You can check out the slides I put together on Anchor Models on Sway.

About the presentation

I used Sway to produce the presentation, and it was impressively easy to build a highly visual deck in it. One thing that’s important to note, though, is that it adjusts to the output screen, so the interactions can differ between a high-res widescreen laptop, a low-res square projector, and a mobile screen.

6 Replies to “Anchor Modelling: Sixth Normal Form databases”

  1. This looks cool. So the headline is: A database with a true rewind button.

    I’d like to see how its usability fares in practice though. You’re basically giving the ability to ask ‘give me the universe as it was 3 years ago’. What I’d wonder is how many gotchas and tricky bits that introduces when trying to, say, get a continuous load of stuff over the last x years. Surely the payoff for the old way – shoehorning old data into a new schema – is that you get to write simple, smooth queries. I guess it makes me think that you should probably have a good reason to use this model rather than it being the new default. That said, I haven’t delved in and fully understood it yet.

    1. I used it when I was dropping in new columns, new tables, and new relations every few hours or days. By doing incremental loads, I was able to rapidly build things up, backfill automatically, and pull the info into reports. It gave me one heck of a responsive information facility, but at the price of slower queries. I think my next step would be to start building 3NF sinks for well-defined reports and stable areas of data, to put those queries onto faster layers. Everything would feed from the 6NF, but some areas would be served from a 3NF “cache”.

      1. I can see how adding things would be mostly unproblematic but what about changes to existing schemata?

        If I have a big query that is characterised by something like WHERE year > 2014, how can that include a column that moved or was further normalised during that period?

        1. “It depends”, but I certainly refactored a number of columns – you get new columns and you can either backfill the data (if available) or union data from schema version X with data from schema version Y (there’s a small sketch of this after these replies). There’s no impediment to backfilling; it’s just that some audit history is potentially lost.

          1. OK, so I guess the advantage is that, yes, you need a similar amount of migration work to do the backfilling, but the backfilling is optional, unlike in most 3NF use cases.

          2. Most of my backfills happened automatically. Part of the logic we had in MeDriAnchor was that if a field was empty it would populate all the results – but that backfills with only the latest value for each row, so there are no historic values unless you do a manual backfill from your original column. If you don’t do that backfill, you’d get NULLs for the column in any query for a point in time before the date of the automatic backfill.
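
As mentioned in the replies above, querying across a schema change often comes down to unioning the old and new attribute tables. A purely hypothetical sketch – these table and column names don’t come from MeDriAnchor or the Anchor Modelling tool:

    -- The “Region” attribute moved to a new table during a refactor,
    -- so a query over the whole period unions the two.
    SELECT CustomerId, LegacyRegion AS Region, ChangedAt
    FROM   CustomerRegionOld
    WHERE  ChangedAt <  '2015-01-01'
    UNION ALL
    SELECT CustomerId, Region, ChangedAt
    FROM   CustomerRegionNew
    WHERE  ChangedAt >= '2015-01-01';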
