Avg of float inconsistency

The select returns right around 23,000 rows
The except returns between 60 and 200 rows (and not the same rows each time)
The except should return 0 rows, since it is select a except select a

PK: [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]

[tf] is a float, and I understand that float is not exact
But I naively thought avg(float) would be repeatable
Avg(float) does not appear to be repeatable

What is the solution?
TF is between 0 and 1 and I only need about 5 significant digits
I just need avg(TF) to be the same number from run to run
Decimal(9,8) gives me enough precision, and if I cast to decimal(9,8) the except properly returns 0
I can change [TF] to decimal(9,8), but it will be a bit of work and a lot of regression testing, as some of the tests that use [tf] take over a day to run
Is changing [TF] to decimal(9,8) the best solution?

  SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
       , avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
    FROM [docSVenum1] 
    JOIN [docFieldLock] 
           ON [docFieldLock].[sID] = [docSVenum1].[sID] 
          AND [docFieldLock].[fieldID] = [docSVenum1].[enumID] 
          AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */ 
    JOIN [FTSindexWordOnce] 
           ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]

except 

  SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
       , avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
    FROM [docSVenum1] 
    JOIN [docFieldLock] 
           ON [docFieldLock].[sID] = [docSVenum1].[sID] 
          AND [docFieldLock].[fieldID] = [docSVenum1].[enumID] 
          AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */ 
    JOIN [FTSindexWordOnce] 
           ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID] 

order by [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]

In this case tf is the term frequency of tf-idf
tf normalization is subjective and does not require much precision
Avg(tf) needs to be consistent from select to select or the results are not consistent
In a single select with joins I need a consistent avg(tf)
Going with decimal and a low precision for tf gave consistent results
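
For reference, this is roughly what the cast looks like in the aggregate of the query above (there is nothing special about decimal(9,8), it is simply enough precision for tf):

  SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
       , avg(CAST([FTSindexWordOnce].[tf] AS decimal(9,8))) AS [avgTFraw]
    FROM [docSVenum1]
    JOIN [docFieldLock]
           ON [docFieldLock].[sID] = [docSVenum1].[sID]
          AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
          AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
    JOIN [FTSindexWordOnce]
           ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]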

Lunnete answered 24/10, 2015 at 20:8 Comment(7)
What exactly is the problem with having seemingly indeterministic variations that are very, very close to 0? – Hypophosphate
@Frisbee For some people your problem is unclear (based on the close flag). – Throe
@Hypophosphate I just need avg(TF) to be the same number from run to run. – Lunnete
Just cast your float to the appropriate decimal in the query: AVG(CAST([FTSindexWordOnce].[tf] AS decimal(9,8))) – Klusek
@VladimirBaranov Yes, it works, but over the long run would it be more efficient to convert the column? – Lunnete
@Frisbee, you need to measure performance; it is hard to guess. Decimal with low precision (up to 9 digits) uses 5 bytes; higher precisions use more; float uses 8 bytes. But if you don't need high precision, maybe the 4-byte real is enough. Still, correctness should come first and performance second. When you use float or real your algorithms should never compare for equality; always compare floating point values as abs(x - y) < epsilon (see the sketch after these comments). – Klusek
@VladimirBaranov Going down on the input precision actually helped the output consistency. I updated the question. And thanks for the input. – Lunnete
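
A small illustration of the tolerance comparison from the last comment (the epsilon of 1e-9 is only illustrative; pick one that matches the precision you actually need):

DECLARE @x FLOAT = 0
DECLARE @i INT = 0
WHILE (@i < 10)
BEGIN
    SET @x = @x + CONVERT(FLOAT, 0.1)  -- ten additions of float 0.1 do not sum to exactly 1.0
    SET @i = @i + 1
END
SELECT CASE WHEN @x = 1.0             THEN 'exactly equal'
            WHEN ABS(@x - 1.0) < 1e-9 THEN 'equal within tolerance'
            ELSE 'different' END
-- equal within tolerance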

This is very similar to: SELECT SUM(...) is non-deterministic when adding the column-values of datatype float.

The problem is that with an inexact datatype (FLOAT/REAL) the order of arithmetic operations on floating point values matters. Demo from Connect:

DECLARE @fl FLOAT = 100000000000000000000
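-- at 1e20 one unit in the last place of a float is 16384, so each +5000 below is less than half a unit and rounds away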
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
    SET @fl = @fl + CONVERT(float, 5000)
    SET @i = @i + 1
END
SET @fl = @fl - 100000000000000000000
SELECT CONVERT(NVARCHAR(40), @fl, 2)
-- 0.000000000000000e+000
GO

DECLARE @fl FLOAT = 0
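-- starting from 0 the 100 additions are exact (sum 500000); precision is lost only when 1e20 is added and then subtracted below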
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
    SET @fl = @fl + CONVERT(float, 5000)
    SET @i = @i + 1
END
SET @fl = @fl + 100000000000000000000
SET @fl = @fl - 100000000000000000000
SELECT @fl
-- 507904

LiveDemo

Possible solutions:

  • CAST all arguments to an exact datatype like DECIMAL/NUMERIC
  • ALTER the table and change FLOAT to DECIMAL
  • try to force the query optimizer to calculate the sum in the same order

Forcing a serial plan with OPTION (MAXDOP 1) may reduce run-to-run variation when a stable result matters, but it does not guarantee one: the order of evaluation still depends on the execution plan that is chosen; see the sketch below.
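
For example, a minimal sketch of the second and third options against the question's table (decimal(9,8) comes from the question; the NOT NULL and the particular query the hint is attached to are assumptions, adjust to your schema):

-- change the column to an exact type once (everything that reads [tf] needs retesting)
ALTER TABLE [FTSindexWordOnce]
    ALTER COLUMN [tf] decimal(9,8) NOT NULL   -- assumes [tf] is currently NOT NULL

-- discourage plan variation on one query; this may reduce, but does not guarantee, run-to-run differences
SELECT [wordID], AVG([tf]) AS [avgTF]
  FROM [FTSindexWordOnce]
 GROUP BY [wordID]
OPTION (MAXDOP 1)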


It looks like the initial link is dead: WebArchive

Throe answered 24/10, 2015 at 20:31 Comment(10)
The order of evaluation of the inputs to the average will depend on a lot more than serial vs parallel, e.g. the physical access paths used, the join orders used, the join algorithms used, stream aggregate vs hash aggregate. – Prefer
@MartinSmith In the Connect answer it rather means the order between executions of the same query (the same execution plan). Of course, when statistics change and the execution plan is recompiled, it can change too. The only reliable way is to use an appropriate datatype (DECIMAL). – Throe
Thanks, this is used to find near duplicates using cosine tf-idf. That avg([FTSindexWordOnce].[tf]) is called literally billions of times in a 24-hour run. I assume altering the table to decimal is the most efficient? – Lunnete
@Frisbee If I were you I would alter the table, unless of course you have some other reason to use FLOAT. – Throe
I only did it because I thought float would be faster. But it has to be repeatable. Let me test it out before I give it the check. – Lunnete
@Frisbee Take your time and share the performance results, float vs decimal :) – Throe
@lad2025 I am getting no measurable difference in individual selects. In the run that makes billions of calls there are too many other variables. I even thought about hacking in some integer math, since if I scale it up that works at the TF level. But in the final calculation I have to take a square root, so I lost most or all of what I gained and just ended up with some messy code. And thanks. – Lunnete
@Frisbee Is TF an acronym for Term Frequency? I see now, cosine similarity measure. Are you writing a search engine? :) – Throe
@lad2025 Yes. It is embedded in the application but not sold as a search engine product. Using something like Lucene (an outstanding product, I am not putting it down) I just did not have the granularity or speed I needed. If you look me up on datascience.stackexchange.com you will get a feel for what I am doing. – Lunnete
@Frisbee Interesting website to broaden horizons :) – Throe
