Have the following sample of data in an Oracle table called MY_ATTRIBUTES:
Date Emp_No Attribute
-----------------------------------------------------
01/04/2012 1234567 APPLE
01/04/2012 1234567 ORANGE
01/04/2012 1234567 PINEAPPLE
01/04/2012 1234567 BANANA
01/04/2012 8888888 APPLE
01/04/2012 8888888 ORANGE
01/04/2012 2222222 APPLE
01/04/2012 2222222 ORANGE
01/04/2012 2222222 PINEAPPLE
Based on the above sample data, I need to return only the distinct records using only Date and Emp_No where these records contains data that have both the Attributes of ‘APPLE’ and ‘PINEAPPLE’
So based on this criteria, I would expect the following two result records only, i.e.:
01/04/2012 1234567
01/04/2012 2222222
Hopefully someone can assist with an Oracle SQL query that will return this result set based on the criteria described.
Sorry, should have also pointed out that I will need this SQL query to be used as a sub-query off the main SELECT.
Thanks.
try
SELECT DISTINCT A.Date, A.Emp_no
FROM (SELECT * FROM MY_ATTRIBUTES WHERE ATTRIBUTE = 'APPLE') A
INNER JOIN (SELECT * FROM MY_ATTRIBUTES WHERE ATTRIBUTE = 'PINEAPPLE' ) B ON B.Date = A.Date AND B.Emp_No = A.Emp_No
There may be a more efficient way, but this should work:
SELECT DISTINCT DATE, EMP_NO
FROM (
SELECT DATE, EMP_NO
FROM MY_ATTRIBUTES
WHERE ATTRIBUTE = 'APPLE'
INTERSECT
SELECT DATE, EMP_NO
FROM MY_ATTRIBUTES
WHERE ATTRIBUTE = 'PINEAPPLE'
);
If there are no duplicates (ex. 2 APPLES on the same date and emp) this might be more efficient as there are no joins:
SELECT DATE, EMP_NO
FROM MY_ATTRIBUTES
WHERE ATTRIBUTE = 'APPLE' OR ATTRIBUTE = 'PINEAPPLE'
GROUP BY DATE, EMP_NO
HAVING COUNT(*) = 2
A variant on a theme:
with cte as
( select date, empno, attribute
from my_attributes
where attribute in ('PINEAPPLE', 'APPLE') )
select *
from ( select date, emp_no
from cte
where attribute = 'PINEAPPLE'
intersect
select date, emp_no
from cte
where attribute = 'APPLE' )
The performance of any of these solutions will depend on the selectivity of ATTRIBUTE. If you only have half a dozen fruits in your bowl you're looking at a full table scain and there are only limited options for tuning those. On the other hand if the column is a greengrocer's full of fruity delights (say 150+ distinct values) then you should get return from an index. But that still depends on distribution and skew: you might not see any benefits from an index if ATTRIBUTE is 90% apples and pineapples: in that case a full table scan is still the better option.
tl;dr
Tuning is hard
You can try the following:
SELECT DATE, EMP_NO
FROM YOUR_TABLE
WHERE ATTRIBUTE = 'APPLE' OR ATTRIBUTE = 'PINEAPPLE'
GROUP BY DATE, EMP_NO
HAVING COUNT(*) = 2;
SELECT DISTINCT DATE, EMP_NO
WHERE ATTRIBUTE IN('APPLE','PINEAPPLE')
SELECT DISTINCT Emp_No, Date
FROM (
SELECT *
FROM MY_ATTRIBUTES m
WHERE m.Attribute IN ('Apple','Pineapple')
)
LIMIT 2
ORDER BY Emp_No