Synchronising schema between MSSQL & MySQL with SSIS
The problem:
We need to report on a system that is form based. Whenever there is a new form, a new table appears, and whenever a field on a form is new or amended, a new column appears in the table. Maintaining imports of this data into a staging environment would take a lot of code and time to build manually from scratch.
What is required is something that walks both schemas for all relevant objects and updates our staging area's schema accordingly.
Points for consideration:
- Due to the level of change in the source system, all loads are dynamically generated SQL
- Loads run from a data dictionary table (sketched below), which needs to be updated when we update the schema
- Loads occur daily
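The post never spells the data dictionary out, but the table-creation SQL further down implies its shape. A minimal sketch, with every column name an assumption inferred from that SQL:

create table _data_dictionary (
  systemtable varchar(128), -- source table name in the MSSQL system
  `table` varchar(128), -- staging table name, e.g. worksmart_ plus the source name
  field varchar(128), -- column name, lower-cased and trimmed
  fieldtype varchar(64), -- MySQL data type for the staging column
  fielddescription varchar(255), -- becomes the column comment
  Is_PrimaryKey char(1) -- 'y' for key columns
);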
Group_concat
On top of the existing MySQL requirements, we need group_concat – a brilliant function and very useful in this situation. Its length limit is raised so that tables with lots of measures can have all their column names collapsed into one string without truncation.
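The setting in question is group_concat_max_len, which defaults to a meagre 1024 bytes. Raising it for the session (or globally in my.cnf, if the loads run on their own connections) stops long column lists being silently truncated:

-- raise the limit so GROUP_CONCAT output is not truncated
SET SESSION group_concat_max_len = 1000000;

-- e.g. collapse all of a staging table's column names into one string
SELECT `table`, GROUP_CONCAT(field)
FROM _data_dictionary
GROUP BY `table`;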
Second up: list of key objects
Enumerate through key objects
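In outline: an Execute SQL Task loads the list of source tables into an object variable, and a Foreach Loop with an ADO enumerator walks that list, assigning each name in turn to @[User::TableToCheck]. A sketch of the kind of query that could build the list, assuming the form tables share a common prefix:

SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
AND TABLE_NAME LIKE 'form_%' -- hypothetical naming convention
ORDER BY TABLE_NAME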
Drop existing table
Because it's easier to drop and recreate a table than to alter it in place, that's what I do here – don't do this on tables you report directly from!
Use a variable to make the SQL dynamic
DropTableSQL <- "drop table if exists worksmart_" + @[User::TableToCheck]
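With @[User::TableToCheck] set to, say, hr_absence (a hypothetical form table used for the worked examples here), the expression evaluates to:

drop table if exists worksmart_hr_absence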
Data flow task: update data dictionary
In the source task, dummy SQL is required so the component can resolve its metadata at design time:
SELECT ltrim(lower(COLUMN_NAME)) as systemfield, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME ='TableToCheck' AND data_type<>'image' AND COLUMN_NAME NOT IN ('function','pref_name') ORDER BY replace(column_name,'_','')
You will then need to add an expression on the data flow task so that the SQL actually pulls from a variable:
WorksmartColumnDataSQL <-
"SELECT ltrim(lower(COLUMN_NAME)) as systemfield, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME ='" + @[User::TableToCheck] + "' AND data_type<>'image' AND COLUMN_NAME NOT IN ('function','pref_name') ORDER BY replace(column_name,'_','')"
Left join data to identify what is already present
Only get new entries
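In the data flow this is a left outer Merge Join against the rows already in the dictionary, followed by a Conditional Split keeping only the unmatched side. As plain SQL the logic amounts to an anti-join (a sketch, reusing the assumed dictionary columns from earlier):

SELECT s.systemfield, s.DATA_TYPE
FROM source_columns s -- hypothetical stand-in for the INFORMATION_SCHEMA feed above
LEFT JOIN _data_dictionary d
ON d.systemtable = 'hr_absence' -- the table currently being checked
AND d.field = s.systemfield
WHERE d.field IS NULL; -- keep only columns not yet in the dictionary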
NB – at this point I go on to add some more details, but this is a similar step to the above, so I won't cover it.
Add extra columns as required
Insert into destination
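The destination is the _data_dictionary table itself. For illustration, a newly discovered column would land as a row shaped something like this (all values hypothetical):

INSERT INTO _data_dictionary
(systemtable, `table`, field, fieldtype, fielddescription, Is_PrimaryKey)
VALUES
('hr_absence', 'worksmart_hr_absence', 'absence_date', 'date', 'Date of absence', 'n');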
Create table based on new metadata
Same mechanism as dropping a table – an Execute SQL Task and a variable:
TableCreationSQL <-
"select concat('create table ', `table`, ' (', GROUP_CONCAT(concat(field, ' ', fieldtype, ' comment ''', coalesce(fielddescription,''), '''')), coalesce(concat(', primary key(', (select group_concat(field) from _data_dictionary d2 where d2.`table` = d.`table` and d2.Is_PrimaryKey='y' group by d2.`table`), ')'), ''), ') ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci') FROM _data_dictionary d where systemtable='" + @[User::TableToCheck] + "' group by `table`"
The Result Set is then mapped to the next variable, ActualTableCreationSQL, to be executed.
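To make the concat gymnastics concrete: with hypothetical dictionary rows for hr_absence, the generated statement comes out something like (line breaks added for readability):

create table worksmart_hr_absence (
employee_id int comment 'Employee identifier',
absence_date date comment 'Date of absence',
primary key(employee_id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci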
Perform table creation
As above, but with no result set mapping; the statement comes from ActualTableCreationSQL.