Showing posts with label keys. Show all posts
Showing posts with label keys. Show all posts

Friday, March 23, 2012

Generating Surrogate Keys

I've been playing with a couple of ways of doing this, based on the ideas in
the article "The Cost of GUID's as Primary Keys" :
http://www.informit.com/articles/ar...2&seqNum=1&rl=1 and a
couple of other online sources
and I've put together the following:
create view builtins as
select getdate() the_date, newid() the_id
go
create function comb_id() returns uniqueidentifier
as
begin
declare @.out uniqueidentifier
select @.out = CAST(CAST(builtins.the_id AS BINARY(10))
+ CAST(builtins.the_date AS BINARY(6)) AS UNIQUEIDENTIFIER)
from builtins
return @.out
end
GO
create table x(a int, b uniqueidentifier DEFAULT ([dbo].[comb_id]()))
GO
Now, inserts to x result in a new "comb guid" being generated, which I can
trap in an insert trigger from the inserted table. I can also, if I wish,
generate ready-made IDs in the application, by reimplementing the same
algorithm.
Are there any "gotchas" of which I should be aware when using this approach?
Enviroment is SQL 2000, no immediate plans to upgrade.
Any observations welcome,
GCWhy not just use an auto-incrementing identity column?
"GC" <reply_to_group@.please> wrote in message
news:O8RKZr9XFHA.1044@.TK2MSFTNGP10.phx.gbl...
> I've been playing with a couple of ways of doing this, based on the ideas
in
> the article "The Cost of GUID's as Primary Keys" :
> http://www.informit.com/articles/ar...2&seqNum=1&rl=1 and a
> couple of other online sources
> and I've put together the following:
> create view builtins as
> select getdate() the_date, newid() the_id
> go
> create function comb_id() returns uniqueidentifier
> as
> begin
> declare @.out uniqueidentifier
> select @.out = CAST(CAST(builtins.the_id AS BINARY(10))
> + CAST(builtins.the_date AS BINARY(6)) AS UNIQUEIDENTIFIER)
> from builtins
> return @.out
> end
> GO
> create table x(a int, b uniqueidentifier DEFAULT ([dbo].[comb_id]()))
> GO
> Now, inserts to x result in a new "comb guid" being generated, which I can
> trap in an insert trigger from the inserted table. I can also, if I wish,
> generate ready-made IDs in the application, by reimplementing the same
> algorithm.
> Are there any "gotchas" of which I should be aware when using this
approach?
> Enviroment is SQL 2000, no immediate plans to upgrade.
> Any observations welcome,
> GC
>|||Why not create use a generic function to which you pass the current DateTime
and
a new Id. Like so:
Create Function dbo.CombID(@.SeedId UniqueIdentifier, @.CurDate DateTime)
As
Begin
Return Cast(Cast(@.SeedId As Binary(10))
+ Cast(@.CurDate As Binary(6)) As UniqueIdentifier)
End
Thus, you would do something like so in a table:
Create Table TableName
(
AColumn UniqueIdentifier Default (dbo.CombId(NewId(), Current_Timestamp))
)
Beyond the general gotchas of using a Guid, the only gotcha to this approach
would be that you are sacrificing a little "uniqueness" for better indexing
performance. That means that the odds are higher of a collision. Realistical
ly,
the odds are incredibly low even using the full eight bytes of the current d
ate
time.
Thomas
"GC" <reply_to_group@.please> wrote in message
news:O8RKZr9XFHA.1044@.TK2MSFTNGP10.phx.gbl...
> I've been playing with a couple of ways of doing this, based on the ideas
in
> the article "The Cost of GUID's as Primary Keys" :
> http://www.informit.com/articles/ar...2&seqNum=1&rl=1 and a
> couple of other online sources
> and I've put together the following:
> create view builtins as
> select getdate() the_date, newid() the_id
> go
> create function comb_id() returns uniqueidentifier
> as
> begin
> declare @.out uniqueidentifier
> select @.out = CAST(CAST(builtins.the_id AS BINARY(10))
> + CAST(builtins.the_date AS BINARY(6)) AS UNIQUEIDENTIFIER)
> from builtins
> return @.out
> end
> GO
> create table x(a int, b uniqueidentifier DEFAULT ([dbo].[comb_id]()))
> GO
> Now, inserts to x result in a new "comb guid" being generated, which I can
> trap in an insert trigger from the inserted table. I can also, if I wish,
> generate ready-made IDs in the application, by reimplementing the same
> algorithm.
> Are there any "gotchas" of which I should be aware when using this approac
h?
> Enviroment is SQL 2000, no immediate plans to upgrade.
> Any observations welcome,
> GC
>|||A couple of reasons, really. I figured a GUID-type is more "meaningless",
and less likely to be the subject of hard-coded assumptions in the future,
and I also quite liked the idea of being able to generate the GUID in the
application if I wanted to.
"JT" <someone@.microsoft.com> wrote in message
news:%23Ubds49XFHA.3712@.TK2MSFTNGP09.phx.gbl...
> Why not just use an auto-incrementing identity column?
> "GC" <reply_to_group@.please> wrote in message
> news:O8RKZr9XFHA.1044@.TK2MSFTNGP10.phx.gbl...
> in
> approach?
>|||Thomas Coleman wrote:
> Why not create use a generic function to which you pass the current
> DateTime and a new Id. Like so:
Just because, for no particularly good reason, I felt the function should be
niladic.

> Beyond the general gotchas of using a Guid, the only gotcha to this
> approach would be that you are sacrificing a little "uniqueness" for
> better indexing performance. That means that the odds are higher of a
> collision. Realistically, the odds are incredibly low even using the
> full eight bytes of the current date time.
I guess I could test for this in @.@.ERROR and try again if there's been a
collision.
Thanks for your responses,
GC|||>> Why not create use a generic function to which you pass the current
> Just because, for no particularly good reason, I felt the function should
be
> niladic.
Understand. But wouldn't every call need to populate your little temp table
with
a guid and a datetime? Otherwise, you'll get duplicate results.

> I guess I could test for this in @.@.ERROR and try again if there's been a
> collision.
If the column is the primary key, of course this will not be necessary. If t
he
column in question is not the primary key, then you can simply create a uniq
ue
index on that column to ensure uniqueness.
Thomas|||Thomas Coleman wrote:
> Understand. But wouldn't every call need to populate your little temp
> table with a guid and a datetime? Otherwise, you'll get duplicate
> results.
It was a view, so newid() and getdate() would be called each time it was
selected from.

>
> If the column is the primary key, of course this will not be
> necessary. If the column in question is not the primary key, then you
> can simply create a unique index on that column to ensure uniqueness.
>
I was thinking about what to do in the (unlikely) event of the algorithm
giving rise to a collision.
If @.@.ERROR is 2627 (PK violation), then try again, on the basis that you'd
have to be seriously
unlucky to get two collisions in a row.

>
> Thomas|||An Integer is only 4 bytes long. GUIDs are 36 bytes long, and even a self
generated GUID would be at least twice as long as an integer. Not very
efficient for storage or performance reasons. Why is it important to
generate the IDs by the application, is this some sort of offline
application that periodically merges data to the database?
"GC" <reply_to_group@.please> wrote in message
news:ujmllH%23XFHA.2684@.TK2MSFTNGP09.phx.gbl...
> A couple of reasons, really. I figured a GUID-type is more "meaningless",
> and less likely to be the subject of hard-coded assumptions in the future,
> and I also quite liked the idea of being able to generate the GUID in the
> application if I wanted to.
> "JT" <someone@.microsoft.com> wrote in message
> news:%23Ubds49XFHA.3712@.TK2MSFTNGP09.phx.gbl...
ideas
a
wish,
>|||First of all this is not a surrogate key accordignt o Dr. Codd's
definition. It is an artifacial key. And it happens to be long,
unverifiable and a XXXXX to validate.
Here is an implementation of the additive congruential method of
generating values in pseudo-random order and is due to Roy Hann of
Rational Commerce Limited, a CA-Ingres consulting firm. It is based on
a shift-register and an XOR-gate, and it has its origins in
cryptography. While there are other ways to do this, this code is nice
because:
1) The algorithm can be written in C or another low level language for
speed. But math is fairly simple even in base ten.
2) The algorithm tends to generate successive values that are (usually)
"far apart", which is handy for improving the performance of tree
indexes. You will tend to put data on separate physical data pages in
storage.
3) The algorithm does not cycle until it has generated every possible
value, so we don't have to worry about duplicates. Just count how many
calls have been made to the generator.
4) The algorithm produces uniformly distributed values, which is a nice
mathematical property to have. It also does not include zero.
Generalizing the algorithm to arbitrary binary word sizes, and
therefore longer number sequences, is not as easy as you might think.
Finding the "tap" positions where bits are extracted for feedback
varies according to the word-size in an extremely non-obvious way.
Choosing incorrect tap positions results in an incomplete and usually
very short cycle, which is unusable. If you want the details and tap
positions for words of one to 100 bits, see E. J. Watson, "Primitive
Polynomials (Mod 2)", Mathematics of Computation, v.16, 1962,
p.368-369.
We need to tap bits 0 and 3 to construct the 31-bit random-order
generated value Generator (which is the one most people would want to
use in practice):
UPDATE Generator31
SET keyval =
keyval/2 + MOD(MOD(keyval, 2) + MOD(keyval/8, 2), 2)*2^30;
Or if you prefer, the algorithm in C:
int Generator31 ()
{static int n = 1;
n = n >> 1 | ((n^n >> 3) & 1) << 30;
return n;
}
Do not forget to add a check digit.|||Hello, JT
A GUID is stored on 16 bytes, not 36 bytes.
Sometimes it's better to generate the IDs by the application to avoid
the round trips (necessary to execute "SELECT SCOPE_IDENTITY()").
Sometimes a "less than perfect" environment (like Microsoft Access
ADP-s) is used to develop the application, that has "SELECT @.@.IDENTITY"
hardcoded (instead of "SELECT SCOPE_IDENTITY()"). Sometimes GUID-s are
needed for replication. Sometimes...
Regarding the performance reasons, did you read the article quoted by
GC in the first post? The author suggests that there is only a 10%
performance impact due to the 4 times increase of size (the bigger
impact in that test scenario is due to the randomness of GUIDs, and
this impact can be minimized with the converting of current date trick,
according to the article).
Razvan

Wednesday, March 21, 2012

Generating primary keys

What's the best way to generate primary keys for inserting a new row in a
table. Currently I'm using code create a datareader which gets the highest
number from the primary key column and adds 1 for the new row insertion.
There is a lot of overhead in this as it requires a separate round trip to
the database each time row/rows are inserted.
I'm sure there is a much better method... Any suggestions
ThxA common method in the SLQ Server world is to define the PK column with the
IDENTITY attribute.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"jwk" <jwk@.discussions.microsoft.com> wrote in message
news:036BF33E-EC48-47B1-8B98-F78ECF70C6A6@.microsoft.com...
> What's the best way to generate primary keys for inserting a new row in a
> table. Currently I'm using code create a datareader which gets the highest
> number from the primary key column and adds 1 for the new row insertion.
> There is a lot of overhead in this as it requires a separate round trip to
> the database each time row/rows are inserted.
> I'm sure there is a much better method... Any suggestions
> Thx|||Then you use the SCOPE_IDENTITY() or @.@.IDENTITY system functions to reteive
the value within the same batch that executes the INSERT statement.
Sincerely,
Anthony Thomas
"Tibor Karaszi" <tibor_please.no.email_karaszi@.hotmail.nomail.com> wrote in
message news:%23Mg2rh8HFHA.1172@.TK2MSFTNGP12.phx.gbl...
A common method in the SLQ Server world is to define the PK column with the
IDENTITY attribute.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"jwk" <jwk@.discussions.microsoft.com> wrote in message
news:036BF33E-EC48-47B1-8B98-F78ECF70C6A6@.microsoft.com...
> What's the best way to generate primary keys for inserting a new row in a
> table. Currently I'm using code create a datareader which gets the highest
> number from the primary key column and adds 1 for the new row insertion.
> There is a lot of overhead in this as it requires a separate round trip to
> the database each time row/rows are inserted.
> I'm sure there is a much better method... Any suggestions
> Thx

Generating primary keys

What's the best way to generate primary keys for inserting a new row in a
table. Currently I'm using code create a datareader which gets the highest
number from the primary key column and adds 1 for the new row insertion.
There is a lot of overhead in this as it requires a separate round trip to
the database each time row/rows are inserted.
I'm sure there is a much better method... Any suggestions
Thx
A common method in the SLQ Server world is to define the PK column with the IDENTITY attribute.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"jwk" <jwk@.discussions.microsoft.com> wrote in message
news:036BF33E-EC48-47B1-8B98-F78ECF70C6A6@.microsoft.com...
> What's the best way to generate primary keys for inserting a new row in a
> table. Currently I'm using code create a datareader which gets the highest
> number from the primary key column and adds 1 for the new row insertion.
> There is a lot of overhead in this as it requires a separate round trip to
> the database each time row/rows are inserted.
> I'm sure there is a much better method... Any suggestions
> Thx
|||Then you use the SCOPE_IDENTITY() or @.@.IDENTITY system functions to reteive
the value within the same batch that executes the INSERT statement.
Sincerely,
Anthony Thomas

"Tibor Karaszi" <tibor_please.no.email_karaszi@.hotmail.nomail.com> wrote in
message news:%23Mg2rh8HFHA.1172@.TK2MSFTNGP12.phx.gbl...
A common method in the SLQ Server world is to define the PK column with the
IDENTITY attribute.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"jwk" <jwk@.discussions.microsoft.com> wrote in message
news:036BF33E-EC48-47B1-8B98-F78ECF70C6A6@.microsoft.com...
> What's the best way to generate primary keys for inserting a new row in a
> table. Currently I'm using code create a datareader which gets the highest
> number from the primary key column and adds 1 for the new row insertion.
> There is a lot of overhead in this as it requires a separate round trip to
> the database each time row/rows are inserted.
> I'm sure there is a much better method... Any suggestions
> Thx

Generating primary keys

What's the best way to generate primary keys for inserting a new row in a
table. Currently I'm using code create a datareader which gets the highest
number from the primary key column and adds 1 for the new row insertion.
There is a lot of overhead in this as it requires a separate round trip to
the database each time row/rows are inserted.
I'm sure there is a much better method... Any suggestions
ThxA common method in the SLQ Server world is to define the PK column with the IDENTITY attribute.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"jwk" <jwk@.discussions.microsoft.com> wrote in message
news:036BF33E-EC48-47B1-8B98-F78ECF70C6A6@.microsoft.com...
> What's the best way to generate primary keys for inserting a new row in a
> table. Currently I'm using code create a datareader which gets the highest
> number from the primary key column and adds 1 for the new row insertion.
> There is a lot of overhead in this as it requires a separate round trip to
> the database each time row/rows are inserted.
> I'm sure there is a much better method... Any suggestions
> Thx|||Then you use the SCOPE_IDENTITY() or @.@.IDENTITY system functions to reteive
the value within the same batch that executes the INSERT statement.
Sincerely,
Anthony Thomas
"Tibor Karaszi" <tibor_please.no.email_karaszi@.hotmail.nomail.com> wrote in
message news:%23Mg2rh8HFHA.1172@.TK2MSFTNGP12.phx.gbl...
A common method in the SLQ Server world is to define the PK column with the
IDENTITY attribute.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
http://www.sqlug.se/
"jwk" <jwk@.discussions.microsoft.com> wrote in message
news:036BF33E-EC48-47B1-8B98-F78ECF70C6A6@.microsoft.com...
> What's the best way to generate primary keys for inserting a new row in a
> table. Currently I'm using code create a datareader which gets the highest
> number from the primary key column and adds 1 for the new row insertion.
> There is a lot of overhead in this as it requires a separate round trip to
> the database each time row/rows are inserted.
> I'm sure there is a much better method... Any suggestions
> Thxsql

Monday, March 12, 2012

GENERATE(): Is there an alternative?

For the following fact table (3 keys, 2 measures)

Assessment_Date_PK

Machine_PK

Patch_PK

PatchApplied (tinyint 0 or 1)

PatchNotApplied (tinyint 0 or 1)

the following Named Set has been created in the cube script (SSAS 2005) utilizing the GENERATE() function:

CREATE HIDDEN SET CURRENTCUBE.[Last Patched]

AS GENERATE(

{NONEMPTY(

([Machine].[Machine Name].[Machine Name],

[Patch].[Patch Name].[Patch Name])

)},

CROSSJOIN(

{

([Machine].[Machine Name].CurrentMember,

[Patch].[Patch Name].CurrentMember)

},

{(NONEMPTY([Date].[Date].Members)).Item(COUNT(NONEMPTY([Date].[Date].Members))-1).Item(0)}

)

);

Introduction of this MDX has adversely impacted performance both 1) when the cube is processed and 2) when the cube is browsed (ie in XL07). Are there any alternatives to GENERATE that we might be able to try?

The purpose of the Named Set [Last Patched] is to provide, for each Machine/Patch combination, the last time it was assessed. With this Named Set, other calculations can be applied to answer questions regarding the last know patch status of the machines being assessed.

Thanks

+ Ron

Hi Ron,

Looks like you could try the approach discussed here in Chris Webb's blog:

Optimising GENERATE() type operations

...

Take the following Adventure Works query, which is an approximation of the one in the thread:

WITH SET MYROWS AS

GENERATE(

NONEMPTY([Customer].[Customer Geography].[Full Name].MEMBERS, [Measures].[Internet Sales Amount])

,TAIL(

NONEMPTY([Customer].[Customer Geography].CURRENTMEMBER * [Date].[Date].[Date].MEMBERS, [Measures].[Internet Sales Amount])

,1)

)

SELECT [Measures].[Internet Sales Amount] ON 0,

MYROWS ON 1

FROM

[Adventure Works]

What we're doing here is finding the last date that each customer bought something. Using the TAIL function within a GENERATE might be the obvious thing to do here, but in fact it isn't the most efficient way of solving the problem: on my machine, with a warm cache, it runs in 16 seconds whereas the query below which does the same thing only takes 6 seconds:

WITH SET MYROWS AS

FILTER(

NONEMPTY(

[Customer].[Customer Geography].[Full Name].MEMBERS

* [Date].[Date].[Date].MEMBERS

, [Measures].[Internet Sales Amount])

AS MYSET,

NOT(MYSET.CURRENT.ITEM(0) IS MYSET.ITEM(RANK(MYSET.CURRENT, MYSET)).ITEM(0))

)

SELECT [Measures].[Internet Sales Amount] ON 0,

MYROWS ON 1

FROM

[Adventure Works]

What I'm doing differently here is rather than iterating through each Customer finding the set of dates when each Customer bought something and then finding the last one, I'm saying give me a set of tuples containing all Customers and the Dates they bought stuff on and then using a FILTER to go through and find the last Date for each Customer by checking to see if the Customer mentioned in the current tuple is the same as the Customer in the next tuple in the set - if it isn't, then we've got the last Date a Customer bought something. Obviously operations like this within a GENERATE are something to be avoided if you can.

|||

I have the following MDX using the above example as a template

WITH SET MYROWS AS

FILTER(

NONEMPTY(

([Machine].[Machine Name].[Machine Name],

[Patch].[Patch Name].[Patch Name])

* [Date].[Date].MEMBERS

) AS MYSET,

NOT(MYSET.CURRENT.ITEM(0) IS MYSET.ITEM(RANK(MYSET.CURRENT, MYSET)).ITEM(0))

)

SELECT

[Measures].[Patch Applied] ON 0,

MYROWS ON 1

FROM [MyCube]

Notice the tuple (Machine Name, Patch Name)

The results of the above query are quite different from the similar query below where the tuple order is reversed.

WITH SET MYROWS AS

FILTER(

NONEMPTY(

([Patch].[Patch Name].[Patch Name],

[Machine].[Machine Name].[Machine Name])

* [Date].[Date].MEMBERS

) AS MYSET,

NOT(MYSET.CURRENT.ITEM(0) IS MYSET.ITEM(RANK(MYSET.CURRENT, MYSET)).ITEM(0))

)

SELECT

[Measures].[Patch Applied] ON 0,

MYROWS ON 1

FROM [MyCube]

Unfortunately, neither result from the above queries is desired.

Anyone have a suggestion on how to modify the FILTER's search condition to work with a tuple?

|||

WITH SET MYROWS AS

FILTER( NONEMPTY([Machine].[Machine Name].MEMBERS * [Date].[Date].MEMBERS * [Patch].[Patch Name].MEMBERS, [Measures].[Patch Applied]) AS MYSET,

NOT(MYSET.CURRENT.ITEM(0) IS MYSET.ITEM(RANK(MYSET.CURRENT, MYSET)).ITEM(0))

)

SELECT MYROWS ON 0

FROM [MyCube]

Try the above it should list the last patch applied per Machine. Just with the Machine and Patch if you and the patch per Machine, which is not the same thing J.

I run this same sort of quest on my cubes and verified the result in my DW.I would recommend you do the same.

|||

CaveM,

Thanks for the reply but this isn't what I'm looking for.

As you mentioned, the MDX you provided will list the last patch applied per Machine. What I need answered is when a machine was last assessed and the status for each patch in order to answer questions such as "Are my machines 100% patched?".

|||

WITH SET MYROWS AS

FILTER( NONEMPTY([Machine].[Machine Name].MEMBERS * [Patch].[Patch Name].MEMBERS * [Date].[Date].MEMBERS, [Measures].[Patch Applied]) AS MYSET,

NOT(MYSET.CURRENT.ITEM(1) IS MYSET.ITEM(RANK(MYSET.CURRENT, MYSET)).ITEM(1))

)

SELECT MYROWS ON 0

FROM [MyCube]

The above should tell you what you have installed per Machine. If not I'll just have to shut up Smile I tested it in my CUBE and verified it to DW, please do the same. I am still learning MDX Smile so don't take my word for it, test it Smile

|||

Thanks Deepak and CaveM,

I still need to test the performance against a production system but with the combination of your answers I have an alternative.

CaveM,

No need to shut up. Keep answering these!

Sunday, February 26, 2012

Generate Script

Hi
I am looking for a script that will generate all the foreign keys, defaults
and check constraints only from all the tables in my database.
Can anyone point me in the right direction?Right click on the tables in EM, All Tasks--> Generate Script
Go to the options tab and select what you want
http://sqlservercode.blogspot.com/
"Jaco" wrote:

> Hi
> I am looking for a script that will generate all the foreign keys, default
s
> and check constraints only from all the tables in my database.
> Can anyone point me in the right direction?|||sorry, should have said I want to do this without EM
"SQL" wrote:
> Right click on the tables in EM, All Tasks--> Generate Script
> Go to the options tab and select what you want
> http://sqlservercode.blogspot.com/
> "Jaco" wrote:
>|||Jaco,
SQL-DMO? Check out the Script method in the SQL BOL.
HTH
Jerry
"Jaco" <Jaco@.discussions.microsoft.com> wrote in message
news:3E4BAF12-3CFC-49E3-95ED-43B2E193FDDF@.microsoft.com...
> sorry, should have said I want to do this without EM
> "SQL" wrote:
>