Wednesday, March 21, 2012
Generating reports on very large and "live" databases
I am working on an application which has a table with 100 million records.
A background process keeps inserting 5 rows per second into this table.
My client generates reports from that table. They are interested in seeing reports with real-time data. This means that as soon as a record is inserted into the table, the reports should include that record as well.
At most, around 20K records are processed to generate a report. When this happens it takes a full 30 minutes to generate the report. Upon analysis I find that since the table is so huge, the indexes on that table are also very large.
What can I do to fix this? I have thought about denormalizing the table. But some programmers say that picking up data from one table is better because doing joins on multiple tables will be even slower.
Another approach is to bring in a data warehouse, but I don't know much about this except what I learnt in an MSDN session about creating data cubes. I suppose a cube can be created only when the data is static, but in my case new records are inserted every second and they have to be included in the report.
The third approach is that I create report-specific tables and a trigger (or a C program which polls for changes in the main table), and every time new records are inserted into the main table, I preprocess them. Then when the users request the report, I generate it from the preprocessed table.
But I feel the trigger will be fired too many times, and if the number of reports is significant (> 35) then the trigger/C program could become a bottleneck itself.
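For illustration, here is a minimal sketch of such a trigger, with invented table and column names (the post does not give the schema); the report then reads the small pre-processed table instead of the 100-million-row one:

-- dbo.MainTable, dbo.ReportFacts and their columns are assumptions for illustration.
CREATE TRIGGER trg_MainTable_AfterInsert
ON dbo.MainTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Copy only the newly inserted rows' contribution into the report table.
    INSERT INTO dbo.ReportFacts (ReportDate, CategoryId, Amount)
    SELECT ReportDate, CategoryId, Amount
    FROM   inserted;
END;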
What should I do? It is such a tricky problem.
Please help me and give me some advice. Thank you for your help.
regards,
Abhishek

Table size (or row count) is irrelevant unless a table/index scan is involved.
5 rows/sec insert is a negligible load on the system.
However, a report that involves 20K rows with good indexes should not take 30 minutes to run.
1) What is the query?
2) What are the indexes on this table?
3) What does the execution plan show (indexes used, type of operation, rows involved, costs)?
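As a sketch of what to look for, assuming a hypothetical dbo.BigTable filtered on a ReportDate column (the real query and schema are not shown in the thread):

-- Hypothetical table/column names, for illustration only.
SET STATISTICS IO ON;        -- logical reads per table
SET STATISTICS PROFILE ON;   -- returns the actual execution plan as a result set

SELECT ReportDate, CategoryId, Amount
FROM   dbo.BigTable
WHERE  ReportDate >= '20120301'
  AND  ReportDate <  '20120321';

SET STATISTICS PROFILE OFF;
SET STATISTICS IO OFF;

-- If the plan shows a table or clustered index scan, a covering index that
-- matches the filter and includes the selected columns usually removes it:
CREATE NONCLUSTERED INDEX IX_BigTable_ReportDate
    ON dbo.BigTable (ReportDate)
    INCLUDE (CategoryId, Amount);

With an index like that, reading 20K rows should take seconds, not minutes.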
Generating reports using a batch process
Hi all,
I have tried asking the same question in other forums. All I get is links.
Please help me. I have the following requirement:
I have the following tables:
Theater - TheaterId, TheaterName, Revenues, LocationId, StateId
State - StateId, StateName
Location - LocationId, LocationName, StateId
I want to generate reports that will tell me the revenue generated by each theater in each location in a state. I want to run a batch process which will loop through the 3 tables, passing the location and state IDs as parameters one by one. I want each report to be generated as a PDF and stored in a given location. How do I do this?
Thanks.
Two methods come to mind:
1. You can run reports from the command line using the rs command (documentation) as described in the following SQLJunkies article:
http://sqljunkies.com/Article/B197B9E7-EF3D-4D70-9B5E-47B74E57EF38.scuk
which doesn't seem to be up for me so here is the Google cache link:
http://64.233.167.104/search?q=cache:XvoyQYWNs50J:sqljunkies.com/Article/B197B9E7-EF3D-4D70-9B5E-47B74E57EF38.scuk&hl=en&gl=us&strip=1
Within the VB script that you run, query your tables and feed the parameters.
2. You could use a variant of a Data Driven Subscription. You will need the Enterprise version of SSRS to use Data Driven Subscriptions. Here is a tutorial on building one:
http://msdn2.microsoft.com/en-us/library/ms169673.aspx
The change that you will make to the standard script is in the "Define a query to retrieve subscriber data" section. You will just add to the standard query a join to your three tables and use the query results as parameters to the report.
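Either way, the driver data can come from a simple join over the tables in the post (the dbo schema prefix is assumed); each row supplies one LocationId/StateId pair that is fed to the report as its parameters, producing one PDF per location:

SELECT s.StateId, s.StateName, l.LocationId, l.LocationName
FROM   dbo.State AS s
JOIN   dbo.Location AS l
       ON l.StateId = s.StateId
ORDER BY s.StateName, l.LocationName;

-- The report's own dataset, parameterised by @LocationId and @StateId,
-- would then be along these lines:
SELECT t.TheaterId, t.TheaterName, t.Revenues
FROM   dbo.Theater AS t
WHERE  t.LocationId = @LocationId
  AND  t.StateId    = @StateId;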
I came across this article by Jason Selburg which claims to allow you to build your own data-driven subscriptions using Standard Edition:
http://www.sqlservercentral.com/columnists/jselburg/datadrivensubscriptions.asp
and his follow up:
http://www.sqlservercentral.com/columnists/jselburg/2824.asp
The reason that you are getting links is that people are willing to take the time to give assistance and pointers, but not specific solutions. The problem you describe is an interesting one, and one that I will now probably try to cover in my blog in the near future. When I do, I will post the link here and you should be able to get code that is closer to a complete solution.
Larry Smithmier
|||Dear Larry,
Thanks for the link.
vidkshi
Monday, March 19, 2012
Generating File names on the fly
I want to create a package that can process a flat file based on the current date, i.e. the name of the file contains the current date and some predefined characters.
What is the best way to process it?
Use a property expression on the ConnectionString property of your Flat File connection manager to set it to the correct filename (containing the date).
-Jamie
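For example, assuming the files land in C:\SourceFiles\ and follow a Sales_YYYYMMDD.txt naming pattern (both are assumptions; the post does not give the actual folder or pattern), the property expression on the connection manager's ConnectionString could look like this:

"C:\\SourceFiles\\Sales_"
+ (DT_WSTR, 4) YEAR(GETDATE())
+ RIGHT("0" + (DT_WSTR, 2) MONTH(GETDATE()), 2)
+ RIGHT("0" + (DT_WSTR, 2) DAY(GETDATE()), 2)
+ ".txt"

The expression is re-evaluated each time the package runs, so the connection manager always points at the current day's file.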
Generating Custom Messages to User
I'm using a Foreach Loop container to read a group of Excel files and pass their information to a SQL Server database. The process runs well, but sometimes some Excel files may not be processed correctly, so I'm using transactions in order to continue processing the remaining files. But I'd like to generate a message every time an Excel file is or is not processed. I thought that I could generate a flat file to do it, but is there any other way to accomplish this? I'm also generating a log file (in XML format), but it seems like too much information for an end user.
Any suggestions?
Regards.
You can generate messages that automatically get captured by your log provider. Is this what you want to do?
-Jamie
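As an illustration of that suggestion, a minimal Script Task sketch in VB (the variable name FileName and the message texts are assumptions, not from the thread); events fired this way are picked up by whichever log providers are enabled on the package:

' FileName must be listed in the Script Task's ReadOnlyVariables.
Dim fireAgain As Boolean = True

' On the success path of the Foreach Loop:
Dts.Events.FireInformation(0, "Excel import", _
    "Processed file: " & Dts.Variables("FileName").Value.ToString(), _
    String.Empty, 0, fireAgain)

' On the failure path:
Dts.Events.FireWarning(0, "Excel import", _
    "Could not process file: " & Dts.Variables("FileName").Value.ToString(), _
    String.Empty, 0)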
|||Hello Jamie, I'm actually using a log provider, so I get a log of the progress.
My solution was to get two kinds of messages into the log (which is an XML output file): one script task that logs a "Success file" message, and another script task that logs a "Failure process file" message. So at the end of the tasks I need to run, only one of the two script tasks logs into my file.
But is there any other way to do it?
Regards