I have an application to deal with a file and fragment it to multiple segments, then save the result into sql server database. There are many duplicated file (maybe with different file path), so first I go through all these files and compute the Md5 hash for each file, and mark duplicated file by using the [Duplicated] column.
Then everyday, I'll run this application and save the results into the [Result] table. The db schema is as below:
CREATE TABLE [dbo].[FilePath]
(
[FilePath] NVARCHAR(256) NOT NULL PRIMARY KEY,
[FileMd5Hash] binay(16) NOT NULL,
[Duplicated] BIT NOT NULL DEFAULT 0,
[LastRunBuild] NVARCHAR(30) NOT NULL DEFAULT 0
)
CREATE TABLE [dbo].[Result]
(
[Build] NVARCHAR(30) NOT NULL,
[FileMd5Hash] binay(16) NOT NULL ,
[SegmentId] INT NOT NULL,
[SegmentContent] text NOT NULL
PRIMARY KEY ([FileMd5Hash], [Build], [SegmentId])
)
And I have a requirement to join these 2 table on FileMd5Hash.
Since the number of rows of [Result] is very large, I'd like to add an int Identity column to join these to tables as below:
CREATE TABLE [dbo].[FilePath]
(
[FilePath] NVARCHAR(256) NOT NULL PRIMARY KEY,
[FileMd5Hash] binay(16) NOT NULL,
**[Id] INT NOT NULL IDENTITY,**
[Duplicated] BIT NOT NULL DEFAULT 0,
[LastRunBuild] NVARCHAR(30) NOT NULL DEFAULT 0
)
CREATE TABLE [dbo].[Result]
(
[Build] NVARCHAR(30) NOT NULL,
**[Id] INT NOT NULL,**
[SegmentId] INT NOT NULL,
[SegmentContent] text NOT NULL
PRIMARY KEY ([FileMd5Hash], [Build], [SegmentId])
)
So What's the Pros and cons of these 2 ways?
int
id is better, as it will be indexed more efficiently – Bookman