How to insert huge dummy data into SQL Server

The development team has finished their application, and as the tester I need to insert 1,000,000 records into its 20 tables for performance testing.

I have gone through the tables, and there are actually relationships between all of them.

To insert that much dummy data into the tables, I would need to understand the application completely in a very short span of time, and I do not have the dummy data yet either.

Is there any way in SQL Server to insert this much data?

Please share any approaches.

  1. Currently I am planning to create the dummy data in Excel, but I am not sure about the relationships between the tables (a BULK INSERT sketch follows this list).
  2. I found on Google that SQL Profiler will provide the order of execution, but I am still waiting for access so I can analyze this.
  3. I also found on Google that a Red Gate tool can be used.
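
For approach 1 specifically, one possibility is to export the Excel sheets as CSV files and load them with BULK INSERT. A minimal sketch, assuming a hypothetical file path and a hypothetical staging table whose columns match the CSV:

-- Hypothetical staging table matching the CSV layout; adjust to the real schema.
CREATE TABLE dbo.StagingUsers (
    User_ID  INT           NOT NULL,
    UserName NVARCHAR(100) NOT NULL
);

BULK INSERT dbo.StagingUsers
FROM 'C:\dummy\users.csv'          -- hypothetical path to the Excel-exported CSV
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2            -- skip the header row
);

-- Then INSERT...SELECT from staging into the real tables in FK order (parents first).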

Is there any script or any other solution to perform this task in a simple way?

I am very sorry if this is a common question; this is my first time working on a real-time SQL scenario, but I do have knowledge of SQL.

Charmion answered 12/2, 2015 at 15:44 Comment(6)
Looking for BulkInsert? – Minutiae
@yog241: it is a bulk insert, but the data has to be generated automatically. – Charmion
Depending on how much time and knowledge you have, I would either create a unit test method which calls the same APIs of the application to create new entities and run that in a loop for your 1,000,000 records, or I would use SQL Profiler to capture the inserts generated by the application, abstract those inserts from their IDs, and then put them in a while loop in SQL Server. I have also searched for such a data population tool in the past, and I think I did find something online, but I am not sure how it works with complex DB schemas that have lots of relationships between tables... – Junco
Have you looked at msdn.microsoft.com/en-us/library/aa833267(v=vs.100).aspx and msdn.microsoft.com/en-us/library/dd193262(v=vs.100).aspx? The data generation tool is part of Database Projects in Visual Studio 2010. – Wadsworth
@Jesus: this tool looks good, but it is not available in VS Professional. – Charmion
The tool is really good, but the client needs some realistic data. Thanks for the tool above, Jesus; it was great learning for me. – Charmion

Why don't you generate those records directly in SQL Server? Here is a script that generates a table variable with 1,000,000 rows:

DECLARE @values TABLE (DataValue int, RandValue int)

-- Recursive CTE counts from 1 to 1,000,000;
-- OPTION (MAXRECURSION 0) lifts the default 100-level recursion limit.
;WITH mycte AS
(
    SELECT 1 AS DataValue
    UNION ALL
    SELECT DataValue + 1
    FROM   mycte
    WHERE  DataValue + 1 <= 1000000
)
INSERT INTO @values (DataValue, RandValue)
SELECT
    DataValue,
    CONVERT(int, CONVERT(varbinary(4), NEWID(), 1)) AS RandValue   -- random int per row
FROM mycte m
OPTION (MAXRECURSION 0)


SELECT
    v.DataValue,
    v.RandValue,
    (SELECT TOP 1 [User_ID] FROM tblUsers ORDER BY NEWID()) AS RandomUserId   -- random foreign key
FROM @values v

In the @values table you will have a random int value (column RandValue) which can be used to generate values for other columns. You also have an example of getting a random foreign key.
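
As a hedged illustration of that idea, here is a sketch that populates a hypothetical child table (tblOrders and its columns are invented for this example, they are not from the question). One caveat: if SQL Server evaluates the uncorrelated NEWID() subquery only once, every row gets the same User_ID; referencing the outer row inside the subquery is a common trick to force per-row evaluation.

-- Hypothetical child table; adjust names and types to the real schema.
INSERT INTO tblOrders (OrderDate, Amount, [User_ID])
SELECT
    DATEADD(DAY, v.RandValue % 365, '2015-01-01'),   -- pseudo-random date around 2015
    ABS(v.RandValue % 10000),                        -- pseudo-random amount 0-9999
    (SELECT TOP 1 [User_ID]
     FROM tblUsers
     WHERE v.DataValue IS NOT NULL                   -- correlation forces per-row evaluation
     ORDER BY NEWID())                               -- random parent key
FROM @values v;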

Downstate answered 9/3, 2015 at 9:45 Comment(0)

Below is a simple procedure I wrote to insert millions of dummy records into a table. I know it is not the most efficient one, but it serves the purpose; for a million records it takes around 5 minutes. You need to pass the number of records to generate when executing the procedure.

IF EXISTS (SELECT 1 FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[DUMMY_INSERT]') AND type IN (N'P', N'PC'))
BEGIN
    DROP PROCEDURE DUMMY_INSERT
END
GO
CREATE PROCEDURE DUMMY_INSERT (
    @noOfRecords INT
)
AS
BEGIN
    DECLARE @count int
    SET @count = 1;

    -- <= (rather than <) so that exactly @noOfRecords rows are inserted
    WHILE (@count <= @noOfRecords)
    BEGIN
        INSERT INTO [dbo].[LogTable] ([UserId], [UserName], [Priority], [CmdName], [Message], [Success], [StartTime], [EndTime], [RemoteAddress], [TId])
        VALUES (1,
                'user_' + CAST(@count AS VARCHAR(256)),   -- unique user name per row
                1,
                'dummy command',
                'dummy message.',
                0,
                CONVERT(varchar(50), DATEADD(D, ROUND(RAND() * 1000, 1), GETDATE()), 121),   -- random start time
                CONVERT(varchar(50), DATEADD(D, ROUND(RAND() * 1000, 1), GETDATE()), 121),   -- random end time
                '160.200.45.1',
                1);

        SET @count = @count + 1;
    END
END
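
To call it, pass the number of rows. Assuming the slowness comes mainly from each INSERT committing in its own autocommit transaction, wrapping the call in one explicit transaction usually speeds a loop like this up considerably:

-- One million rows, each committed individually (autocommit).
EXEC DUMMY_INSERT @noOfRecords = 1000000;

-- Usually much faster: a single commit for the whole loop.
BEGIN TRANSACTION;
EXEC DUMMY_INSERT @noOfRecords = 1000000;
COMMIT TRANSACTION;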
Gena answered 17/3, 2016 at 9:12 Comment(0)

You can use a cursor to duplicate existing data, for example with this simple code:

DECLARE @SYMBOL nchar(255),   -- sample variables matching the two columns below
        @SY_ID  int
DECLARE R2 CURSOR
    FOR SELECT [Column1], [Column2]
        FROM   [TableName]
    FOR READ ONLY;
OPEN R2
FETCH NEXT FROM R2 INTO @SYMBOL, @SY_ID
WHILE (@@FETCH_STATUS = 0)     -- 0 = the last fetch succeeded
BEGIN
    -- re-insert each existing row, doubling the table
    INSERT INTO [TableName] ([Column1], [Column2])
        VALUES (@SYMBOL, @SY_ID)
    FETCH NEXT FROM R2 INTO @SYMBOL, @SY_ID
END
CLOSE R2
DEALLOCATE R2
/* wait a ... moment */
SELECT COUNT(*)                -- check result
FROM [TableName]
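
The same duplication can be done without a cursor. A set-based sketch with the same placeholders, which SQL Server will typically run much faster than row-by-row fetches:

-- Set-based equivalent: one INSERT...SELECT copies every existing row at once,
-- doubling the table per execution.
INSERT INTO [TableName] ([Column1], [Column2])
SELECT [Column1], [Column2]
FROM [TableName];

SELECT COUNT(*) FROM [TableName];   -- check result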
Chat answered 9/4, 2019 at 11:41 Comment(0)

I doubt the original poster cares anymore, but I came up with this while trying to produce bulk dummy data. Hopefully it is useful for someone else.

/* == Seed Data ==*/
if(OBJECT_ID('tempdb..#dummyData') is not null)
    DROP TABLE #dummyData

CREATE TABLE #dummyData ([id] INT IDENTITY PRIMARY KEY, [value] nvarchar(256));
INSERT #dummyData ([value]) VALUES
('Itßaly Terrazzo Sacra Nero 24x24 Honed'),
('Nero Marquina 1x3 Herringbone'),
('Myorka Blue 2x8'),
('Chrysler Driftwood Hickory Sea 12mil Wear Layer Glue Down 6.3x48.4'),
('Nero Marquina 1x3 Herringbone'),
('Broadway Lvt Ash 9x72 Rigid Core ä Click- 6.0mm/28mil Wear Layer'),
('Minetta Concreto Pearl 2.5mm/28mil Wear Layer Glue Down 18x36'),
('Speak Leather Black 24 x 24'),
('Accent Montana White 12 x 36 Glossy'),
('Baroque Crackled 1/2 X 6 Pencil Capri'),
('Banza Green Cielo 8" Hex'),
('Masia $550 Torello Nero Brillo 3/4x12 Pencil'),
('Baroque 1.000.95€ Crackled 1 3/4 X 6 Chair Rail Blanco');

/* == Main Loop ==*/
DECLARE @insertNum int = 1000000
DECLARE @top int = 0

DECLARE @seedCount int = (SELECT COUNT(*) FROM #dummyData)
DECLARE @currentCount int = @seedCount

WHILE(@currentCount < @insertNum)
BEGIN
    SET @top = @insertNum - @currentCount
    IF(@top < @insertNum)
        INSERT INTO #dummyData([value])
        SELECT TOP(@top) [value]
        FROM #dummyData

    SET @currentCount = (SELECT COUNT(*) FROM #dummyData)
END

/* == Update Step ==*/
UPDATE #dummyData
SET value = cast(id as nvarchar) + value
WHERE id > @seedCount

Seed data: set this up as needed, manually or from another source (note: you must seed at least one row). A temp table is used here for demo purposes.

Main loop: set @insertNum to the desired number of rows.

Update step: skip this if you don't need to transform the data. I could have avoided it by inserting the new id in the loop; however, that was slower by ~2 seconds.

This produced 1,000,000 unique values for parse testing. The @seedCount filter keeps the first rows unaltered; a form of control, I suppose.

Example result

SELECT * 
FROM #dummyData
WHERE ID between 10 and 15
ORDER BY id
id | value
---+--------------------------------------------------------
10 | Baroque Crackled 1/2 X 6 Pencil Capri
11 | Banza Green Cielo 8" Hex
12 | Masia $550 Torello Nero Brillo 3/4x12 Pencil
13 | Baroque 1.000.95€ Crackled 1 3/4 X 6 Chair Rail Blanco
14 | 14Itßaly Terrazzo Sacra Nero 24x24 Honed
15 | 15Nero Marquina 1x3 Herringbone

The optimizer is fairly smart -- adding the id to the value (rows 14 and up) made every row unique. That stopped the optimizer from recognizing 13 repeated strings and giving me far faster performance than could be expected from unique data.

Performance: although it is a while loop, each pass inserts the table back into itself, so the data multiplies quickly, and the TOP(@top) cap cuts it off at the number you set. A seed of 1 row hits 1,000,000 at iteration 19.

I set up two dummy data tests and ran each through both a CTE and this while loop at 1,000,000 rows. The while loop ran quicker.

From my machine (client results, average of 10 runs):

Query                   | CTE      | While
------------------------+----------+---------
This example*           | 14619 ms | 6825 ms
Random value generation | 11831 ms | 6393 ms

*Perhaps I could have joined the seed data to the CTE in a more performant way.
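
For reference, a set-based CTE generator of the kind that comparison implies might look like the sketch below; this is an assumption on my part, not the exact benchmarked query. It numbers the rows still needed and maps each number back onto a seed row with a modulo join.

-- Sketch only: assumes the seed ids are contiguous and start at 1.
DECLARE @insertNum2 int = 1000000;
DECLARE @seedCount2 int = (SELECT COUNT(*) FROM #dummyData);

WITH numbers AS (
    SELECT TOP (@insertNum2 - @seedCount2)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a CROSS JOIN sys.all_objects b   -- plenty of rows to number
)
INSERT INTO #dummyData ([value])
SELECT d.[value]
FROM numbers num
JOIN #dummyData d
  ON d.id = (num.n % @seedCount2) + 1;                    -- cycle through the seed rows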

Ayn answered 10/1 at 7:33 Comment(0)
