Given 2 tables called "table1" and "table1_hist" that structurally resemble this:
TABLE1
id status date_this_status
1 open 2008-12-12
2 closed 2009-01-01
3 pending 2009-05-05
4 pending 2009-05-06
5 open 2009-06-01
TABLE1_hist
id status date_this_status
2 open 2008-12-24
2 pending 2008-12-26
3 open 2009-04-24
4 open 2009-05-04
With table1 being the current status and table1_hist being a history table of table1, how can I return the rows for each id that has the earliest date. In other words, for each id, I need to know it's earliest status and date.
EXAMPLE:
For id 1 earliest status and date is open and 2008-12-12.
For id 2 earliest status and date is open and 2008-12-24.
I've tried using MIN(datetime), unions, dynamic SQL, etc. I've just reached tsql writers block today and I'm stuck.
Edited to add: Ugh. This is for a SQL2000 database, so Alex Martelli's answer won't work. ROW_NUMBER wasn't introduced until SQL2005.
The following code sample is completely self-sufficient, just copy and paste it into a management studio query and hit F5 =)
DECLARE @TABLE1 TABLE
(
id INT,
status VARCHAR(50),
date_this_status DATETIME
)
DECLARE @TABLE1_hist TABLE
(
id INT,
status VARCHAR(50),
date_this_status DATETIME
)
--TABLE1
INSERT @TABLE1
SELECT 1, 'open', '2008-12-12' UNION ALL
SELECT 2, 'closed', '2009-01-01' UNION ALL
SELECT 3, 'pending', '2009-05-05' UNION ALL
SELECT 4, 'pending', '2009-05-06' UNION ALL
SELECT 5, 'open', '2009-06-01'
--TABLE1_hist
INSERT @TABLE1_hist
SELECT 2, 'open', '2008-12-24' UNION ALL
SELECT 2, 'pending', '2008-12-26' UNION ALL
SELECT 3, 'open', '2009-04-24' UNION ALL
SELECT 4, 'open', '2009-05-04'
SELECT x.id,
ISNULL(y.[status], x.[status]) AS [status],
ISNULL(y.date_this_status, x.date_this_status) AS date_this_status
FROM @TABLE1 x
LEFT JOIN (
SELECT a.*
FROM @TABLE1_hist a
INNER JOIN (
SELECT id,
MIN(date_this_status) AS date_this_status
FROM @TABLE1_hist
GROUP BY id
) b
ON a.id = b.id
AND a.date_this_status = b.date_this_status
) y
ON x.id = y.id
SQL Server 2005 and later support an interesting (relatively recent) aspect of SQL Standards, "ranking/windowing functions", allowing:
WITH AllRows AS (
SELECT id, status, date_this_status,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY date_this_status ASC) AS row,
FROM (SELECT * FROM Table1 UNION SELECT * FROM Table1_hist) Both_tables
)
SELECT id, status, date_this_status
FROM AllRows
WHERE row = 1
ORDER BY id;
where I'm also using the nice (and equally "new") WITH
syntax to avoid nesting the sub-query in the main SELECT
.
This article shows how one could hack the equivalent of ROW_NUMBER
(and also RANK
and DENSE_RANK
, the other two "new" ranking/windowing functions) in SQL Server 2000 -- but that's not necessarily pretty nor especially well-performing, alas.
SELECT id,
status,
date_this_status
FROM ( SELECT *
FROM Table1
UNION
SELECT *
from TABLE1_hist
) a
WHERE date_this_status = ( SELECT MIN(date_this_status)
FROM ( SELECT *
FROM Table1
UNION
SELECT *
from TABLE1_hist
) t
WHERE id = a.id
)
This is a bit ugly, but seems to work in MS SQL Server 2005.
You can do this with an exclusive self join. Join on the history table, and then another time on all earlier history entries. In the where statement, you specify that there are not allowed to be any earlier entries.
select t1.id,
isnull(hist.status, t1.status),
isnull(hist.date_this_status, t1.date_this_status)
from table1 t1
left join (
select h1.id, h1.status, h1.date_this_status
from table1_hist h1
left join table1_hist h2
on h2.id = h1.id
and h2.date_this_status < h1.date_this_status
where h2.date_this_status is null
) hist on hist.id = t1.id
A bit of a mind-binder, but fairly flexible and efficient!
This assumes there are no two history entries with the exact same date. If there are, write the self join like:
left join table1_hist h2
on h2.id = h1.id
and (
h2.date_this_status < h1.date_this_status
or (h2.date_this_status = h1.date_this_status and h2.id < h1.id)
)
If I understand the OP correctly, a given ID may appear in TABLE1 or TABLE1_HISTORY or both.
In your result set, you want back each distinct ID and the oldest status/date associated with that ID, regardless which table the oldest one happens to be in.
So, look in BOTH tables and return any record where there is no record in either table for it's ID that has a smaller date_this_status.
Try this:
SELECT ID, status, date_this_status FROM table1 ta WHERE
NOT EXISTS(SELECT null FROM table1 tb WHERE
tb.id = ta.id
AND tb.date_this_status < ta.date_this_status)
AND NOT EXISTS(SELECT null FROM table1_history tbh WHERE
tbh.id = ta.id
AND tbh.date_this_status < ta.date_this_status)
UNION ALL
SELECT ID, status, date_this_status FROM table1_history tah WHERE
NOT EXISTS(SELECT null FROM table1 tb WHERE
tb.id = tah.id
AND tb.date_this_status < tah.date_this_status)
AND NOT EXISTS(SELECT null FROM table1_history tbh WHERE
tbh.id = tah.id
AND tbh.date_this_status < tah.date_this_status)
Three underlying assumptions here:
There are two slight optimizations we can make:
So:
SELECT ID, status, date_this_status FROM table1 ta WHERE
NOT EXISTS(SELECT null FROM table1_history tbh WHERE
tbh.id = ta.id
)
UNION ALL
SELECT ID, status, date_this_status FROM table1_history tah WHERE
NOT EXISTS(SELECT null FROM table1_history tbh WHERE
tbh.id = tah.id
AND tbh.date_this_status < tah.date_this_status)
If that is the actual structure of your tables, you can't get a 100% accurate answer, the issue being that you can have 2 different statuses for the same (earliest) date for any given record and you would not know which one was entered first, because you don't have a primary key on the history table
Ignoring the "two tables" issues for a moment, I'd use the following logic...
SELECT
id, status, date
FROM
Table1_hist AS [data]
WHERE
[data].date = (SELECT MIN(date) FROM Table1_hist WHERE id = [data].id)
(EDIT: As per BlackTigerX's comment, this assumes no id can have more than one status with the same datetime.)
The simple way to extrapolate this to two tables is to use breitak67's answer. Replace all instances of "my_table" with subqueries that UNION the two tables together. A potential issue here is that of performance, as you may find that indexes become unusable.
One method of speeding this up could be to use implied knowledge:
1. The main table always has a record for each id.
2. The history table doesn't always have a record.
3. Any record in the history table is always 'older' than the one in main table.
SELECT
[main].id,
ISNULL([hist].status, [main].status),
ISNULL([hist].date, [main].date)
FROM
Table1 AS [main]
LEFT JOIN
(
SELECT
id, status, date
FROM
Table1_hist AS [data]
WHERE
[data].date = (SELECT MIN(date) FROM Table1_hist WHERE id = [data].id)
)
AS [hist]
ON [hist].id = [main].id