
Friday, March 23, 2012

generating seq no within a main column

Hi all,
Please see if someone can suggest a better approach for this.
I need to move 5 million rows from one table to another.
LoanID is the only column in the source table.
Source table structure
==============================
loanID
1
1
2
2
2
3
4
=====================================
The target or destination table should look like this:
=========================================
loanID  Sequence_number_with_in_LoanID
1 1
1 2
2 1
2 2
2 3
3 1
4 1
=========================================

This is what I have tried so far.
1) About 4 1/2 million of the 5 million loanIDs appear only once in the source table (this is a one-time load), so for those I do this:
----
select loanID
from SourceTable
group by loanID
having count(*) = 1

and then hard-code Sequence_number_with_in_LoanID to 1, because I know those loanIDs appear only once. This is a set-based operation, so it is
fast.
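As a sketch, that first step could be a single set-based insert, assuming the destination table is called TargetTable and has the two columns shown above:

insert into TargetTable (loanID, Sequence_number_with_in_LoanID)
select loanID, 1
from SourceTable
group by loanID
having count(*) = 1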

2) Now, the problem is with the remaining 1/2 million rows.
I had to use a cursor for this, which is very, very slow.
The other option I was thinking of is to pull the data into the front end (.NET) and do it there; I am hoping that will be faster.
The database I am working on is SQL Server 2000.
Could someone suggest a better approach for handling the remaining 1/2 million rows?
thanks in advance
-sidd

Your target table doesn't look very meaningful as you've described it. If
you're not recording any extra information on each row then what's the point
of populating the table with lots of redundant data? Much more efficient
just to add a count column:

CREATE TABLE TargetTable (loanid INTEGER PRIMARY KEY, loancount INTEGER NOT NULL CHECK (loancount > 0))

INSERT INTO TargetTable (loanid, loancount)
SELECT loanid, COUNT(*)
FROM SourceTable
GROUP BY loanid
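
With the seven sample rows above, TargetTable would then contain:

loanid  loancount
1       2
2       3
3       1
4       1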

--
David Portas
----
Please reply only to the newsgroup

Below is one method.

--create table of sequence numbers
DECLARE @MaxCount int

SELECT @MaxCount = MAX(loanCount)
FROM (
    SELECT COUNT(*) AS loanCount
    FROM SourceTable
    GROUP BY loanID
) AS loanCounts

CREATE TABLE #SequenceNumbers
(
    Sequence_number_with_in_LoanID int NOT NULL PRIMARY KEY
)

WHILE @MaxCount > 0
BEGIN
    INSERT INTO #SequenceNumbers VALUES(@MaxCount)
    SET @MaxCount = @MaxCount - 1
END

-- load target table
SELECT loanID, Sequence_number_with_in_LoanID
INTO TargetTable
FROM (
    SELECT loanID, COUNT(*) AS loanCount
    FROM SourceTable
    GROUP BY loanID
) AS SourceTableCounts
JOIN #SequenceNumbers
    ON #SequenceNumbers.Sequence_number_with_in_LoanID <= SourceTableCounts.loanCount

DROP TABLE #SequenceNumbers
GO
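
As an aside, this thread predates it, but on SQL Server 2005 and later the same result can be produced in one set-based statement with ROW_NUMBER(). A minimal sketch, reusing the table and column names from this thread (it does not apply to the SQL Server 2000 environment in the question):

-- requires SQL Server 2005 or later
SELECT loanID,
       ROW_NUMBER() OVER (PARTITION BY loanID ORDER BY loanID)
           AS Sequence_number_with_in_LoanID
INTO TargetTable
FROM SourceTable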

--
Hope this helps.

Dan Guzman
SQL Server MVP

"sidd" <siddharthkhare@.hotmail.com> wrote in message
news:af96a2d7.0312270944.2dd0ca8f@.posting.google.c om...
> Hi all,
> please see if some one can suggest a better approach for this.
> i need to move 5 million rows from one table to another.
> LoanID is the only clumn in Source table
> Source table structure
> =========-->
> ==============================
> loanID
> 1
> 1
> 2
> 2
> 2
> 3
> 4
> =====================================
> target or detination table should look like this
> =========================================
> loanID Sqeuence_number_with_in_LoanID
> 1 1
> 1 2
> 2 1
> 2 2
> 2 3
> 3 1
> 4 1
> =========================================
> this is what i have tried so far
> 1)there are about 4 & 1/2 million out of 5 million which will appear
> only once (this is one tine load) in source table.
> so if i do this
> ----
> sleect loanID
> from SourceTable
> group by loanID
> having count(*)=1
> and then i will hard code Sqeuence_number_with_in_LoanID to 1 .because
> i know they will only come once.this is set based operation so this is
> fast.
> 2)now,problem is with remainder of the 1/2 million rows.
> i had to use cursor for this which is very-very slow.
> other option i was thinking of is to pull it in the front end
> (.net)and do it there.i am hoping that will be faster.
> Database i am working on is sql server 2000.
> if some could suggest me some better approach of handling this
> remainder
> 1/2 milion rows.
> thanks in advance
>> LoanID is the only column in Source table <<

Then this is not a table by definition. Please post DDL in the
future, so we do not have to make guesses. Build a table of
sequential numbers; this is a standard SQL programming trick.
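As a sketch, such a Sequence table (the name matches the query below) can be created and filled once, up to whatever the largest per-loan count is; the 1000 here is only an illustrative upper bound:

CREATE TABLE Sequence (seq INTEGER NOT NULL PRIMARY KEY)

-- one-off setup: populate 1 .. 1000
DECLARE @i INTEGER
SET @i = 1
WHILE @i <= 1000
BEGIN
    INSERT INTO Sequence (seq) VALUES (@i)
    SET @i = @i + 1
END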

CREATE TABLE Foobar (loan_id INTEGER NOT NULL, seq_nbr INTEGER NOT NULL)

INSERT INTO Foobar (loan_id, seq_nbr)
SELECT DISTINCT S1.loan_id, seq
FROM Source AS S1, Sequence
WHERE seq <= (SELECT COUNT(*)
              FROM Source AS S2
              WHERE S1.loan_id = S2.loan_id);
But this is not a good design.

Dan,
it works great!!
thanks
sidd

Wednesday, March 21, 2012

Generating reports on very large and "live" databases

Hello All,
I am working on an application which has a table with 100 million records.
A background process keeps inserting 5 rows per second into this table.
My client is generating reports from that table. They are interested in seeing reports with real-time data. This means that as soon as a record is inserted into the table, the reports should process that record as well.
At most around 20K records are processed to generate a report. Even so, it takes a full 30 minutes to generate a report. Upon analysis I find that since the table is so huge, the indexes on that table are also very large.
What can I do to fix this? I have thought about denormalizing the table, but some programmers say that picking up data from one table is better because doing joins on multiple tables will be even slower.
Another approach is to bring in a data warehouse, but I don't know much about this except what I learnt in an MSDN session about creating data cubes. I suppose a cube can be created only when the data is static, but in my case new records are inserted every second and they have to be included in the report.
The third approach is that I create report-specific tables and a trigger (or a C program which polls for changes in the main table), and every time new records are inserted into the main table I preprocess them. Then, when the users request a report, I generate it from the preprocessed table.
But I feel the trigger will be fired too many times, and if the number of reports is significant (> 35) then the trigger/C program could become a bottleneck itself.
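To make that third approach concrete, here is a minimal sketch of such a trigger; the table and column names are hypothetical, purely for illustration:

-- Maintain a small pre-aggregated summary as rows arrive in the main table.
-- MainTable, ReportSummary, report_key and row_count are made-up names.
CREATE TRIGGER trg_MainTable_Insert ON MainTable
AFTER INSERT
AS
BEGIN
    -- bump counts for keys already present in the summary
    UPDATE r
       SET r.row_count = r.row_count + i.cnt
      FROM ReportSummary AS r
      JOIN (SELECT report_key, COUNT(*) AS cnt
              FROM inserted
             GROUP BY report_key) AS i
        ON i.report_key = r.report_key

    -- add rows for keys not yet in the summary
    INSERT INTO ReportSummary (report_key, row_count)
    SELECT i.report_key, COUNT(*)
      FROM inserted AS i
     WHERE NOT EXISTS (SELECT 1 FROM ReportSummary AS r
                        WHERE r.report_key = i.report_key)
     GROUP BY i.report_key
END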
What should I do? It is such a tricky problem.
Please help me and give me some advice. Thank you for your help.
regards,
Abhishek

Table size (or row count) is irrelevant unless a table/index scan is involved.
An insert rate of 5 rows/sec is a negligible load on the system.
However, a report that involves 20K rows with good indexes should not take 30 minutes to run.
1) what is the query,
2) what are the indexes on this table
3) what does the execution plan show?
(indexes used, type of operation, rows involved, costs)
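
One way to gather that on SQL Server 2000 is to run the report query with the plan and statistics options switched on; a minimal sketch (substitute the actual report query):

-- estimated plan only, the query is not executed
SET SHOWPLAN_TEXT ON
GO
-- ... report query here ...
GO
SET SHOWPLAN_TEXT OFF
GO

-- or execute the query and capture per-operator row counts and I/O
SET STATISTICS PROFILE ON
SET STATISTICS IO ON
-- ... report query here ...
SET STATISTICS PROFILE OFF
SET STATISTICS IO OFF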