The problem:

We need to report on a system that is form based. Whenever there is a new form, a new table appears, and whenever a field on a form is added or amended, a new column appears in the table. Maintaining the imports of this data into a staging environment would take a lot of code and time to build manually from scratch.

What is required is something that walks through the two schemas for all relevant objects and updates our staging area’s schema accordingly.

Points for consideration:

  • Due to the level of change in the source system, all loads are dynamically generated SQL
  • Loads run from a data dictionary table, which needs to be updated when we update the schema
  • Loads occur daily
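The data dictionary’s shape can be inferred from the column names used in the queries later in the post (systemtable, table, systemfield, Field, fieldtype, fielddescription, Is_PrimaryKey). A minimal sketch – the types, lengths and comments here are assumptions, not the actual DDL:

```sql
-- Hypothetical sketch of the data dictionary that drives the loads.
-- Column names are taken from the queries later in the post;
-- data types and the extra constraints are assumptions.
CREATE TABLE _data_dictionary (
  systemtable      VARCHAR(128) NOT NULL,  -- source (MSSQL) table name
  `table`          VARCHAR(128) NOT NULL,  -- target (staging) table name
  systemfield      VARCHAR(128) NOT NULL,  -- source column, lower-cased and trimmed
  Field            VARCHAR(128) NOT NULL,  -- target column name
  fieldtype        VARCHAR(64)  NOT NULL,  -- target column data type
  fielddescription VARCHAR(255),           -- becomes the column comment
  Is_PrimaryKey    CHAR(1) DEFAULT 'n'     -- 'y' flags primary key columns
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
```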

Group_concat

On top of the existing MySQL requirements, we need group_concat – a brilliant function and very useful in this situation. Its maximum result length is raised so that tables with lots of measures can have all their column names collapsed into one string without truncation.
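By default MySQL truncates a GROUP_CONCAT result at 1,024 characters, so the limit needs raising before the dictionary queries run – something along these lines (the exact value used is an assumption):

```sql
-- Raise the GROUP_CONCAT limit for this session so column lists
-- from wide tables are not truncated; 1000000 is an illustrative value.
SET SESSION group_concat_max_len = 1000000;
```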

schemasync - pic1

Second up: list of key objects

schemasync - pic2

schemasync - pic3

Enumerate through key objects

schemasync - pic4

schemasync - pic5

 

Drop existing table

Because it’s easier to drop and recreate a table than to alter it, I drop and recreate the table – don’t do this on tables you report from directly!

schemasync - pic6

Use a variable to make the SQL dynamic

schemasync - pic7

DropTableSQL <- "drop table if exists worksmart_" + @[User::TableToCheck]

Data flow task: update data dictionary

schemasync - pic8

In the Source task, dummy SQL is required so that SSIS can resolve the column metadata at design time:

SELECT 
ltrim(lower(COLUMN_NAME)) as systemfield, 
DATA_TYPE
FROM    INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME ='TableToCheck' 
AND data_type<>'image' 
AND COLUMN_NAME NOT IN ('function','pref_name') 
ORDER BY replace(column_name,'_','')

You will then need to add an expression on the data flow task so that the SQL actually pulls the table name from a variable:
schemasync - pic9
WorksmartColumnDataSQL <-

"SELECT 
ltrim(lower(COLUMN_NAME)) as systemfield, 
DATA_TYPE
FROM    INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '" + @[User::TableToCheck] + "' 
AND data_type<>'image' 
AND COLUMN_NAME NOT IN ('function','pref_name') 
ORDER BY replace(column_name,'_','')"

Left join data to identify what is already present

schemasync - pic10

Only get new entries

schemasync - pic11
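Together, the left join and “only new entries” steps amount to an anti-join: keep the source columns that have no match in the data dictionary yet. In plain SQL the equivalent would be roughly this, where source_columns stands in for the output of the INFORMATION_SCHEMA query and the join keys are assumptions:

```sql
-- Columns in the source table that are not yet in the data dictionary
SELECT s.systemfield, s.DATA_TYPE
FROM   source_columns s
LEFT JOIN _data_dictionary d
       ON  d.systemtable = 'TableToCheck'   -- the table being checked
       AND d.systemfield = s.systemfield
WHERE  d.systemfield IS NULL;               -- no match = new column
```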

NB – at this point I go on to add some more details, but this is a similar step to the one above so I won’t cover it.

Add extra columns as required

schemasync - pic12

Insert into destination

Create table based on new metadata

Same mechanism as dropping a table – a SQL task and a variable:
TableCreationSQL <-

"select 
concat('create table ', `table`, ' (',
GROUP_CONCAT(concat(Field, ' ', fieldtype, ' comment ''', coalesce(fielddescription,''), '''')),
coalesce(concat(', primary key(', (
             select group_concat(Field) 
             from _data_dictionary d2 
             where d2.`table` = d.`table` 
             and d2.Is_PrimaryKey = 'y' 
             group by d2.`table`), ')'), ''), ') 
ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci') 
FROM _data_dictionary d 
where systemtable = '" + @[User::TableToCheck] + 
"' group by `table`"

The Result Set is then passed to the next variable to be executed
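For illustration, with an invented worksmart_example table holding two dictionary rows, the expression would yield DDL along these lines (reformatted for readability; the table, column names and types are made up):

```sql
create table worksmart_example (
  case_id int comment 'Case identifier',
  days_lost decimal(10,2) comment 'Days lost to absence',
  primary key(case_id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
```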

Perform table creation

As above, but with no result-set mapping, and the ActualTableCreationSQL variable is used

schemasync - pic13

Synchronising schema between MSSQL & MySQL with SSIS