Wednesday, March 21, 2012

Generating reports on very large and "live" databases

Hello All,
I am working on an application which has a table with 100 million records.
A background process keeps inserting 5 rows per second into this table.
My client generates reports from that table and is interested in seeing reports with real-time data. This means that as soon as a record is inserted into the table, the reports should process that record as well.
At most around 20K records are processed to generate a report, yet it takes a full 30 minutes to produce one. Upon analysis I found that since the table is so huge, the indexes on that table are also very large.
What can I do to fix this? I have thought about denormalizing the table, but some programmers say that picking up data from one table is better because doing joins across multiple tables will be even slower.
Another approach is to bring in a data warehouse, but I don't know much about this beyond what I learned in an MSDN session about creating data cubes. I suppose a cube can be built only when the data is static, but in my case new records are inserted every second and they have to be included in the report.
The third approach is to create report-specific tables and a trigger (or a C program which polls for changes in the main table), so that every time new records are inserted into the main table I preprocess them. Then, when users request a report, I generate it from the preprocessed table (a rough sketch of this idea appears below).
But I feel the trigger would fire too many times, and if the number of reports is significant (> 35) the trigger/C program could become a bottleneck itself.
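A minimal sketch of that trigger-maintained summary table, assuming a hypothetical dbo.Transactions source table with CreatedAt, Region and Amount columns and a made-up dbo.ReportSummary table (all names are illustrative, not from the original post):

-- Report-specific summary table, kept current by an insert trigger.
CREATE TABLE dbo.ReportSummary (
    ReportDate   date        NOT NULL,
    Region       varchar(50) NOT NULL,
    RowCnt       int         NOT NULL,
    TotalAmount  money       NOT NULL,
    CONSTRAINT PK_ReportSummary PRIMARY KEY (ReportDate, Region)
);
GO

CREATE TRIGGER trg_Transactions_Insert
ON dbo.Transactions
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Fold only the newly inserted rows into the summary,
    -- so the 100-million-row base table is never rescanned.
    MERGE dbo.ReportSummary AS s
    USING (
        SELECT CAST(i.CreatedAt AS date) AS ReportDate,
               i.Region,
               COUNT(*)      AS RowCnt,
               SUM(i.Amount) AS TotalAmount
        FROM inserted AS i
        GROUP BY CAST(i.CreatedAt AS date), i.Region
    ) AS n
    ON s.ReportDate = n.ReportDate AND s.Region = n.Region
    WHEN MATCHED THEN
        UPDATE SET s.RowCnt      = s.RowCnt + n.RowCnt,
                   s.TotalAmount = s.TotalAmount + n.TotalAmount
    WHEN NOT MATCHED THEN
        INSERT (ReportDate, Region, RowCnt, TotalAmount)
        VALUES (n.ReportDate, n.Region, n.RowCnt, n.TotalAmount);
END;
GO

The report query would then read only from dbo.ReportSummary, so its cost no longer depends on the size of the main table.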
What should I do? It is such a tricky problem.
Please help me and give me some advice. Thank you for your help.
regards,
Abhishek

Table size (or row count) is irrelevant unless a table/index scan is involved.
5 rows/sec insert is a negligible load on the system.
However, a report that involves 20K rows with good indexes should not take 30 minutes to run.
1) What is the query?
2) What are the indexes on this table?
3) What does the execution plan show? (indexes used, type of operation, rows involved, costs)
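For example, you can capture the actual plan for the report query and, if it shows a scan, try a covering index. A hedged sketch, assuming a hypothetical report query over dbo.Transactions filtered by CreatedAt (table, column and index names are illustrative, not taken from the original post):

-- Capture the actual execution plan for the report query
-- (in SSMS, "Include Actual Execution Plan" shows the same information).
SET STATISTICS PROFILE ON;

SELECT Region, COUNT(*) AS RowCnt, SUM(Amount) AS TotalAmount
FROM dbo.Transactions
WHERE CreatedAt >= '2012-03-01' AND CreatedAt < '2012-03-21'
GROUP BY Region;

SET STATISTICS PROFILE OFF;

-- If the plan shows a table/clustered index scan, a covering index on the
-- filter column that INCLUDEs the reported columns usually removes it:
CREATE NONCLUSTERED INDEX IX_Transactions_CreatedAt
ON dbo.Transactions (CreatedAt)
INCLUDE (Region, Amount);

With such an index the 20K qualifying rows are found by an index seek plus a short range scan of the leaf pages, so the other 100 million rows are never touched.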
