The problem:

We need to report on a system that is form based. Whenever there is a new form, a new table appears, and whenever a field on a form is added or amended, a new column appears in the table. Maintaining the imports of this data into a staging environment would take a lot of code and time to build manually from scratch.

What is required is something that walks through the two schemas for all relevant objects and updates our staging area’s schema accordingly.

Points for consideration:

  • Due to the level of change in the source system, all loads are dynamically generated SQL
  • Loads run from a data dictionary table, which needs to be updated when we update the schema
  • Loads occur daily
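The data dictionary’s shape can be inferred from the column names used in the queries later in the post (systemtable, table, systemfield, Field, fieldtype, fielddescription, Is_PrimaryKey). A minimal sketch – the types, lengths and comments here are assumptions, not the actual DDL:

```sql
-- Hypothetical sketch of the data dictionary that drives the loads.
-- Column names are taken from the queries later in the post;
-- data types and the extra constraints are assumptions.
CREATE TABLE _data_dictionary (
  systemtable      VARCHAR(128) NOT NULL,  -- source (MSSQL) table name
  `table`          VARCHAR(128) NOT NULL,  -- target (staging) table name
  systemfield      VARCHAR(128) NOT NULL,  -- source column, lower-cased and trimmed
  Field            VARCHAR(128) NOT NULL,  -- target column name
  fieldtype        VARCHAR(64)  NOT NULL,  -- target column data type
  fielddescription VARCHAR(255),           -- becomes the column comment
  Is_PrimaryKey    CHAR(1) DEFAULT 'n'     -- 'y' flags primary key columns
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
```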

Group_concat

On top of the existing MySQL requirements, we need group_concat – a brilliant function and very useful in this situation. Its maximum result length is raised so that tables with lots of measures can have all their column names collapsed into one string without truncation.
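By default MySQL truncates a GROUP_CONCAT result at 1,024 characters, so the limit needs raising before the dictionary queries run – something along these lines (the exact value used is an assumption):

```sql
-- Raise the GROUP_CONCAT limit for this session so column lists
-- from wide tables are not truncated; 1000000 is an illustrative value.
SET SESSION group_concat_max_len = 1000000;
```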

schemasync - pic1

Second up: list of key objects

schemasync - pic2

schemasync - pic3

Enumerate through key objects

schemasync - pic4

schemasync - pic5

 

Drop existing table

Because it’s easier to drop and recreate a table than to alter it, I drop and recreate the table – don’t do this on tables you report from directly!

schemasync - pic6

Use a variable to make the SQL dynamic

schemasync - pic7

DropTableSQL <- "drop table if exists worksmart_" + @[User::TableToCheck]

Data flow task: update data dictionary

schemasync - pic8

In the Source task, dummy SQL is required so that SSIS can resolve the column metadata at design time:

SELECT 
ltrim(lower(COLUMN_NAME)) as systemfield, 
DATA_TYPE
FROM    INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME ='TableToCheck' 
AND data_type<>'image' 
AND COLUMN_NAME NOT IN ('function','pref_name') 
ORDER BY replace(column_name,'_','')

You will then need to add an expression on the data flow task so that the SQL actually pulls the table name from a variable:
schemasync - pic9
WorksmartColumnDataSQL <-

"SELECT 
ltrim(lower(COLUMN_NAME)) as systemfield, 
DATA_TYPE
FROM    INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '" + @[User::TableToCheck] + "' 
AND data_type<>'image' 
AND COLUMN_NAME NOT IN ('function','pref_name') 
ORDER BY replace(column_name,'_','')"

Left join data to identify what is already present

schemasync - pic10

Only get new entries

schemasync - pic11
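Together, the left join and “only new entries” steps amount to an anti-join: keep the source columns that have no match in the data dictionary yet. In plain SQL the equivalent would be roughly this, where source_columns stands in for the output of the INFORMATION_SCHEMA query and the join keys are assumptions:

```sql
-- Columns in the source table that are not yet in the data dictionary
SELECT s.systemfield, s.DATA_TYPE
FROM   source_columns s
LEFT JOIN _data_dictionary d
       ON  d.systemtable = 'TableToCheck'   -- the table being checked
       AND d.systemfield = s.systemfield
WHERE  d.systemfield IS NULL;               -- no match = new column
```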

NB – at this point I go on to add some more details, but this is a similar step to the one above so I won’t cover it.

Add extra columns as required

schemasync - pic12

Insert into destination

Create table based on new metadata

Same mechanism as dropping a table – a SQL task and a variable:
TableCreationSQL <-

"select 
concat('create table ', `table`, ' (',
GROUP_CONCAT(concat(Field, ' ', fieldtype, ' comment ''', coalesce(fielddescription,''), '''')),
coalesce(concat(', primary key(', (
             select group_concat(Field) 
             from _data_dictionary d2 
             where d2.`table` = d.`table` 
             and d2.Is_PrimaryKey = 'y' 
             group by d2.`table`), ')'), ''), ') 
ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci') 
FROM _data_dictionary d 
where systemtable = '" + @[User::TableToCheck] + 
"' group by `table`"

The Result Set is then passed to the next variable to be executed
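For illustration, with an invented worksmart_example table holding two dictionary rows, the expression would yield DDL along these lines (reformatted for readability; the table, column names and types are made up):

```sql
create table worksmart_example (
  case_id int comment 'Case identifier',
  days_lost decimal(10,2) comment 'Days lost to absence',
  primary key(case_id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
```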

Perform table creation

As above, but with no result-set mapping, and the ActualTableCreationSQL variable is used

schemasync - pic13

Synchronising schema between MSSQL & MySQL with SSIS