I currently have a query merging two tables to create a new one for analysis. After getting some funny results when trying to chart it for presentation, I learned that some of it is fake data that was never cleaned up. I've been able to identify the data causing the problems and, for the sake of time, would like to exclude it within the query so I can move ahead with the analysis.
This fake data matches ALL these criteria:
- rate_type = Standard
- client_net_cleared = 0
- program is blank (not Null)
I identified these in SELECT with a CASE statement, but realized that to make any use of that I'd have to do another table querying everything in this one minus what's identified as meeting the above criteria based on the CASE statement. There has to be a better solution than that.
I'm currently trying to exclude these as part of the WHERE statement, but read around other question topics and found out WHERE is not very good at managing multiple sub-conditions.
What I have:
SELECT *
, CASE WHEN tad.rate_type = 'Standard'
AND tad.client_net_cleared = '0'
AND program= '' THEN 1
ELSE '0'
END AS noise
FROM tableau.km_tv_ad_data_import tad
JOIN tableau.km_tv_ad_report ga
ON ga.session_timestamp >= tad.timestamp - INTERVAL '4 minute'
AND ga.session_timestamp <= tad.timestamp + INTERVAL '5 minute'
AND ga.session_timestamp != tad.timestamp
WHERE tad.timestamp >= '2016-09-01'
AND (tad.rate_type != 'Standard'
AND tad.client_net_cleared != '0'
AND tad.program != '')
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21
Sample data set:
timestamp | rate_type | program | client_net_cleared | noise
---------------------|-----------|-----------------|--------------------|-------
2016-11-01 18:00:00 | Standard | Diving | 50 | 0
2016-12-01 21:00:00 | Holiday | Classic Albums | 100 | 0
2016-11-01 09:00:00 | FireSale | Panorama | 0 | 0
2016-10-01 12:00:00 | Standard | | 0 | 1
2016-12-01 15:00:00 | Holiday | MythBusters | 100 | 0
2016-10-01 13:00:00 | FireSale | House | 200 | 0
What I need:
Exclude rows matching ALL three criteria: rate_type = Standard, client_net_cleared = 0, program is blank (not Null).
select * from ( -- Put your query here ) a where rate_type = Standard and client_net_cleared = 0 and program is not Null
– AppendixSELECT *
. That makes it dependent on the order of the columns in theCREATE TABLE
statement. I have a feeling that he's actually grouping by all the columns, so it should just beSELECT DISTINCT *
. – Flossieflossy