We've got the requirement where we need to load data (usually dirty of course) from a flat file, and for every column, if the value is invalid, I'm putting a -1 in the field as it flows through the data stream - no problem. (I'm converting all invalids to -1 so they'll go into my eventual int column)
My question though is - at the end of the data flow, is there any smooth way to capture the % invalid values in each column. An example: If my table is as follows...where the # 1 signifies "clean data" and -1 signifies dirty.
col1 col2 col3 col4
1 1 1 1
1 -1 -1 1
-1 1 -1 1
-1 1 -1 1
I would want my % distribution query/ssis result to return:
col1 col2 col3 col4
.5 .25 .75 0
I do not want to do a select/ssis process for each column. Any way to do this all in one shot?
You could write a script component to track the number of good and bad values, and output them to an asynchronous output. Or execute a SQL statement after the data flow to get the statistics.|||
jwelch wrote:
execute a SQL statement after the data flow to get the statistics.
That's what I would try first....just to keep it simpler
No comments:
Post a Comment