TSQL: Return row(s) with earliest dates - 【StackMirror】|sql|tsql

Given 2 tables called "table1" and "table1_hist" that structurally resemble this:

TABLE1
id  status  date_this_status
1   open    2008-12-12
2   closed  2009-01-01
3   pending 2009-05-05
4   pending 2009-05-06
5   open    2009-06-01


TABLE1_hist
id  status  date_this_status
2   open    2008-12-24
2   pending 2008-12-26
3   open    2009-04-24
4   open    2009-05-04

With table1 being the current status and table1_hist being a history table of table1, how can I return the rows for each id that has the earliest date. In other words, for each id, I need to know it's earliest status and date.

EXAMPLE:

For id 1 earliest status and date is open and 2008-12-12.
For id 2 earliest status and date is open and 2008-12-24.

I've tried using MIN(datetime), unions, dynamic SQL, etc. I've just reached tsql writers block today and I'm stuck.

Edited to add: Ugh. This is for a SQL2000 database, so Alex Martelli's answer won't work. ROW_NUMBER wasn't introduced until SQL2005.

sql
tsql

2009-06-16 18:28
by GregD

The following code sample is completely self-sufficient, just copy and paste it into a management studio query and hit F5 =)

DECLARE @TABLE1 TABLE
        (
        id                  INT,
        status              VARCHAR(50),
        date_this_status    DATETIME
        )

DECLARE @TABLE1_hist TABLE
        (
        id                  INT,
        status              VARCHAR(50),
        date_this_status    DATETIME
        )

--TABLE1
INSERT  @TABLE1
SELECT  1,  'open',     '2008-12-12'    UNION ALL
SELECT  2,  'closed',   '2009-01-01'    UNION ALL
SELECT  3,  'pending',  '2009-05-05'    UNION ALL
SELECT  4,  'pending',  '2009-05-06'    UNION ALL
SELECT  5,  'open',     '2009-06-01'

--TABLE1_hist
INSERT  @TABLE1_hist
SELECT  2,  'open',     '2008-12-24'    UNION ALL
SELECT  2,  'pending',  '2008-12-26'    UNION ALL
SELECT  3,  'open',     '2009-04-24'    UNION ALL
SELECT  4,  'open',     '2009-05-04'

SELECT      x.id,
            ISNULL(y.[status], x.[status])                  AS [status],
            ISNULL(y.date_this_status, x.date_this_status)  AS date_this_status
FROM        @TABLE1 x
LEFT JOIN   (
            SELECT      a.*
            FROM        @TABLE1_hist a
            INNER JOIN  (
                        SELECT      id,
                                    MIN(date_this_status) AS date_this_status
                        FROM        @TABLE1_hist
                        GROUP BY    id
                        ) b
                    ON  a.id = b.id
                    AND a.date_this_status = b.date_this_status
            ) y
        ON  x.id = y.id

2009-06-16 20:58
by Jon Erickson

let me know if you want some further explanation on parts of this query - Jon Erickson 2009-06-16 21:04

SQL Server 2005 and later support an interesting (relatively recent) aspect of SQL Standards, "ranking/windowing functions", allowing:

WITH AllRows AS (
  SELECT id, status, date_this_status,
    ROW_NUMBER() OVER(PARTITION BY id ORDER BY date_this_status ASC) AS row,
  FROM (SELECT * FROM Table1 UNION SELECT * FROM Table1_hist) Both_tables
)
SELECT id, status, date_this_status
FROM AllRows
WHERE row = 1
ORDER BY id;

where I'm also using the nice (and equally "new") WITH syntax to avoid nesting the sub-query in the main SELECT.

This article shows how one could hack the equivalent of ROW_NUMBER (and also RANK and DENSE_RANK, the other two "new" ranking/windowing functions) in SQL Server 2000 -- but that's not necessarily pretty nor especially well-performing, alas.

2009-06-16 18:37
by Alex Martelli

+1 looks like you read the question better than I did : - Andomar 2009-06-16 18:40

Btw I think in T-Sql you'd need a subquery before you can refer to "row" in the WHERE clause - Andomar 2009-06-16 18:55

Didn't know about this syntax, nice... it does need a sub quer - Scott Nichols 2009-06-16 19:14

Ugh. This query won't work because it's SQL2000. ROW_NUMBER doesn't exist in SQL2000 - GregD 2009-06-16 20:03

Alas, @GregD, sorry, and thanks for editing your question to clarify -- I'll edit the answer to clarify a relatively recent SQL Server is needed. @Scott, the subquery (just a UNION, so pretty fast!) could I think be eliminated via much trickery of outer joins and coalescing, but the result would be slower and harder to read, so I'm not sure what would be gained thereby - Alex Martelli 2009-06-17 01:38

@Scott and @Andomar, sorry, I see what you mean now, and you're right -- sorry @Scott, thought you were referring to the sub query with the UNION!- - Alex Martelli 2009-06-17 01:47

OK, editing the answer to do it right...;- - Alex Martelli 2009-06-17 01:51

Alex: Utilizing this method I was able to solve a common problem for me (i.e. getting the first row in each group in a series of groups). Thanks, + - sindre j 2010-04-07 09:44

@sindre, always glad to help -- indeed, this feature of the SQL standard is quite handy, and, as an added bonus, you can use it in all the major relational DBs (Microsoft's, Oracle's, IBM's, open-source Postgres, etc) except MySql - Alex Martelli 2010-04-07 14:14

SELECT  id,
        status,
        date_this_status
FROM    ( SELECT    *
          FROM      Table1
          UNION
          SELECT    *
          from      TABLE1_hist
        ) a
WHERE   date_this_status = ( SELECT MIN(date_this_status)
                             FROM   ( SELECT    *
                                      FROM      Table1
                                      UNION
                                      SELECT    *
                                      from      TABLE1_hist
                                    ) t
                             WHERE  id = a.id
                           )

This is a bit ugly, but seems to work in MS SQL Server 2005.

2009-06-16 19:55
by breitak67

You can do this with an exclusive self join. Join on the history table, and then another time on all earlier history entries. In the where statement, you specify that there are not allowed to be any earlier entries.

select t1.id,
    isnull(hist.status, t1.status),
    isnull(hist.date_this_status, t1.date_this_status)
from table1 t1
left join (
    select h1.id, h1.status, h1.date_this_status
    from table1_hist h1
    left join table1_hist h2
        on h2.id = h1.id
        and h2.date_this_status < h1.date_this_status
    where h2.date_this_status is null
) hist on hist.id = t1.id

A bit of a mind-binder, but fairly flexible and efficient!

This assumes there are no two history entries with the exact same date. If there are, write the self join like:

left join table1_hist h2
    on h2.id = h1.id
    and (
        h2.date_this_status < h1.date_this_status
        or (h2.date_this_status = h1.date_this_status and h2.id < h1.id)
    )

2009-06-16 18:33
by Andomar

If I understand the OP correctly, a given ID may appear in TABLE1 or TABLE1_HISTORY or both.

In your result set, you want back each distinct ID and the oldest status/date associated with that ID, regardless which table the oldest one happens to be in.

So, look in BOTH tables and return any record where there is no record in either table for it's ID that has a smaller date_this_status.

Try this:

SELECT ID, status, date_this_status FROM table1 ta WHERE
     NOT EXISTS(SELECT null FROM table1 tb WHERE
         tb.id = ta.id
         AND tb.date_this_status < ta.date_this_status)
     AND NOT EXISTS(SELECT null FROM table1_history tbh WHERE
         tbh.id = ta.id
         AND tbh.date_this_status < ta.date_this_status)

UNION ALL

SELECT ID, status, date_this_status FROM table1_history tah WHERE
     NOT EXISTS(SELECT null FROM table1 tb WHERE
         tb.id = tah.id
         AND tb.date_this_status < tah.date_this_status)
     AND NOT EXISTS(SELECT null FROM table1_history tbh WHERE
         tbh.id = tah.id
         AND tbh.date_this_status < tah.date_this_status)

Three underlying assumptions here:

Every ID you want back will have at least one record in at least one of the tables.
There won't be multiple records for the same ID in the same table with the same date_this_status value (can be mitigated by using DISTINCT)
There won't be records for the same ID in the other table with the same date_this_status value (can be mitigated by using UNION instead of UNION ALL)

There are two slight optimizations we can make:

If an ID has a record in TABLE1_HISTORY, it will always be older than the record in TABLE1 for that ID.
TABLE1 will never contain multiple records for the same ID (but the history table may).

So:

SELECT ID, status, date_this_status FROM table1 ta WHERE
     NOT EXISTS(SELECT null FROM table1_history tbh WHERE
         tbh.id = ta.id
         )

UNION ALL

SELECT ID, status, date_this_status FROM table1_history tah WHERE
     NOT EXISTS(SELECT null FROM table1_history tbh WHERE
         tbh.id = tah.id
         AND tbh.date_this_status < tah.date_this_status)

2009-06-16 20:43
by richardtallent

If that is the actual structure of your tables, you can't get a 100% accurate answer, the issue being that you can have 2 different statuses for the same (earliest) date for any given record and you would not know which one was entered first, because you don't have a primary key on the history table

2009-06-16 20:57
by BlackTigerX

Sorry. There is a primary key on the history table. I just didn't include it in my original sample - GregD 2009-06-16 21:14

Ignoring the "two tables" issues for a moment, I'd use the following logic...

SELECT
   id, status, date
FROM
   Table1_hist AS [data]
WHERE
   [data].date = (SELECT MIN(date) FROM Table1_hist WHERE id = [data].id)

(EDIT: As per BlackTigerX's comment, this assumes no id can have more than one status with the same datetime.)

The simple way to extrapolate this to two tables is to use breitak67's answer. Replace all instances of "my_table" with subqueries that UNION the two tables together. A potential issue here is that of performance, as you may find that indexes become unusable.

One method of speeding this up could be to use implied knowledge:
1. The main table always has a record for each id.
2. The history table doesn't always have a record.
3. Any record in the history table is always 'older' than the one in main table.

SELECT
   [main].id,
   ISNULL([hist].status, [main].status),
   ISNULL([hist].date, [main].date)
FROM
   Table1          AS [main]
LEFT JOIN
(
   SELECT
      id, status, date
   FROM
      Table1_hist AS [data]
   WHERE
      [data].date = (SELECT MIN(date) FROM Table1_hist WHERE id = [data].id)
)
   AS [hist]
      ON [hist].id = [main].id

Find the oldest status for each id in the history table. (Can use its indexes)
LEFT JOIN that to the main table (which always has exactly one record for each id)
If [hist] contains a value, it's the older by definition
If the [hist] doesn't have a value, use the [main] value

2009-06-16 21:12
by MatBailie