SQL self join pairwise
Asked Answered
M

6

8

Suppose I have a table consisting of entries like

ID    Arrival Date    Arrival City    Departure Date    Departure City
1     Jun 27 2015     Berlin          Jun 20 2015       Paris
1     Jul 1 2015      Rome            Jun 29 2015       Berlin
1     Jul 30 2015     Vienna          Jul 15 2015       Rome
2     Jun 28 2015     Prague          Jun 23 2015       Vienna
2     Jul 1 2015      Rome            Jun 29 2015       Prague
2     Jul 30 2015     Vienna          Jul 15 2015       Moscow
...

and for each ID I want to join this data on itself such that observations with subsequent Departure Date and Arrival Date are grouped pairwise - i.e. a departure is paired with the previous arrival for each ID.

In the example above (where the observations are sorted for convenience) the 2nd row would be appended to the 1st, the 3rd to the 2nd, the 5th to the 4th and the 6th to the 5th (thus producing 4 rows with fields ID Arrival Date Arrival City Departure Date Departure City Arrival Date2 Arrival City2 Departure Date2 Departure City2).

There could potentially be more than three departures for each ID so a general approach is required. Also please note that there can be holes in the data where Arrival City and Departure City does not match - e.g. the Arrival City of the 5th row is not the Departure City of the 6th row but they should still be merged. In fact a major goal is to get a better view of how many holes there are in the data.

Mores answered 15/1, 2016 at 16:45 Comment(4)
could you please show what your expected result should look like?Taut
"There could potentially be more than three departures for each ID so a general approach is required" any particular query produces result sets with a fixed "shape" - the number, names and types of the columns. But it seems like you're asking for a query that produces a different number of columns depending on the input data (and also, what happens when different IDs have different numbers of departures?). It may be better to do this kind of output processing at a higher level than SQL.Ariella
What is the version of SQL Server? Please add the corresponding tag to the question instead of join. Also, based on your sample data, please show how the final result should look like.Nab
If you're using sql-server-2012 or sql-server-2014, give a look at LEAD() analytics function. This is exactly what you're looking for.Promontory
I
8

A solution is to use a CTE and consider that the difference between two consecutive rows (identified by the rowno) is 1 all the time (and also consider the dates):

;WITH CTE AS (
SELECT
    rownum = ROW_NUMBER() OVER (ORDER BY t.ID, t.arrivalDate),
    t.ID,
    t.arrivalDate, 
    t.arrivalCity, 
    t.departureDate, 
    t.departureCity
FROM #test t
)
SELECT *
FROM CTE c1
JOIN CTE c2
ON c1.ID = c2.ID 
    AND c2.departureDate > c1.arrivalDate
    AND c2.rownum - c1.rownum = 1
GO

-- structure of the #test table
CREATE TABLE #test (
    ID int,
    arrivalDate date,
    arrivalCity varchar(30),
    departureDate date,
    departureCity varchar(30)
)

SQL fiddle here: SQLFiddle

Imaginary answered 18/1, 2016 at 9:31 Comment(1)
And adding WHERE c1.arrivalCity <> c2.departureCity definitely helps *to get a better view of how many holes there are in the data *:-)Equitation
M
5

Try this:

SELECT a.id
    ,a.arrival_date
    ,a.arrival_city
    ,a.departure_date
    ,a.departure_city
    ,b.arrival_date arrival_date_2
    ,b.arrival_city arrival_city_2
    ,b.departure_date departure_date_2
    ,b.departure_city departure_city_2
FROM triptable a
JOIN triptable b ON a.id = b.id
    AND a.departure_date = (SELECT min(departure_date) FROM so34815894 x WHERE x.departure_date > b.arrival_date AND x.id = b.id)

Edited based on your comment to:

  • find the record with the earliest departure date after the previous record's arrival date, and
  • ignore the fact that the sample data's 6th record has a different departure city than the 5th record's arrival city.
Marin answered 15/1, 2016 at 19:44 Comment(1)
Thank you for your answer. In fact I do want the 6th record appended to the 5th - the mistake here was deliberate (as there can be holes in the data). I will update the OP with this clarificationMores
L
4

Not entirely sure what result set your looking for.. but I thought I'd give this a shot and see if any of these help you out.

drop table #t1 
create table #t1 (id int, ArrivalDate datetime, ArrivalCity varchar(50), Departuredate datetime, DepartureCity varchar(50))

insert into #t1 
values (1, 'Jun 27 2015', 'Berlin', 'Jun 20 2015','Paris'), 
       (1, 'Jul 1 2015', 'Rome','Jun 29 2015','Berlin'), 
       (1, 'Jul 30 2015', 'Vienna','Jul 15 2015','Rome'), 
       (2, 'Jun 28 2015','Prague','Jun 23 2015','Vienna'),
       (2, 'Jul 1 2015','Rome','Jun 29 2015','Prague'), 
       (2, 'Jul 30 2015','Vienna','Jul 15 2015','Moscow') 

select *, case when lead(departurecity) over (partition by id order by Arrivaldate) = ArrivalCity or lead(departurecity) over (partition by id order by Arrivaldate) is null then 1 else 0 end as PairID into #t2 from #t1 

update #t2 
set PairID = id 
where pairid != id 
and pairid != 0 

This is the code to start up..

select * from #t2 

will result in:

id  ArrivalDate ArrivalCity Departuredate   DepartureCity   PairID
1   2015-06-27  Berlin      2015-06-20      Paris           1
1   2015-07-01  Rome        2015-06-29      Berlin          1
1   2015-07-30  Vienna      2015-07-15      Rome            1
2   2015-06-28  Prague      2015-06-23      Vienna          2
2   2015-07-01  Rome        2015-06-29      Prague          0
2   2015-07-30  Vienna      2015-07-15      Moscow          2

Any location where the pair id = 0 ... you have a gap/baddata however you want to put it..

You could also:

select *, lead(departurecity) over (partition by ID order by ArrivalDate) as PreviousDepartureCity, lead(Departuredate) over (partition by ID order by ArrivalDate) as PreviousDepartureDate from #t2 

This will add previous departure city and date.. and you can do what you want with the nulls.. they will signify the first flight.. or a gap if the subsequent pair id = 0 ...

The select options become endless.... if null and lag(pairid) = 0 then you have the row with the gap.. if null and pair id = id.. and lag(pairid) = id then you have your first flight..

I mean I can keep going.. and give you more details but I'm not sure this is what your looking for.. Hope it helped anyway..

Good luck!

P.S Didn't see why you needed to join the table to itself .. maybe I missed the whole point..lol..sorry if that's the case..

Lull answered 18/1, 2016 at 22:5 Comment(0)
M
4

It sounds to me like you're wanting to pivot the results and put the results in additional columns. I used ROW_NUMBER() for the ordering. I concatenated the columns in a row prior to the pivot, pivoted, then used a function to reverse the concatenation.

SELECT
    p.ID,
    dbo.SplitString(p.[1], CHAR(13), 1) AS arrivalDate1,
    dbo.SplitString(p.[1], CHAR(13), 2) AS arrivalCity1,
    dbo.SplitString(p.[1], CHAR(13), 3) AS departureDate1,
    dbo.SplitString(p.[1], CHAR(13), 4) AS departureCity1,
    *
FROM
    (
        SELECT *
        FROM
        (
            SELECT
                ID,
                ROW_NUMBER() OVER (PARTITION BY ID ORDER BY arrivalDate) RowNum,
                CAST(arrivalDate AS VARCHAR(MAX)) + CHAR(13) 
                + arrivalCity + CHAR(13)
                + CAST(departureDate AS VARCHAR(MAX)) + CHAR(13)
                + departureCity TripDetails
            FROM trip t
        ) t
        PIVOT (MIN(t.TripDetails) FOR t.RowNum IN ([1], [2], [3], [4], [5] /* , ... */)) p
    ) p;

using this SplitString function

CREATE FUNCTION dbo.SplitString ( 
    @stringToSplit VARCHAR(MAX),
    @delim VARCHAR(255),
    @occurence INT )
RETURNS VARCHAR(MAX) AS
BEGIN

 DECLARE @name NVARCHAR(255);

 DECLARE @pos INT;

 DECLARE @orderNum INT;

 SET @orderNum=0;

 WHILE CHARINDEX(@delim, @stringToSplit) > 0

 BEGIN
    SELECT @orderNum=@orderNum+1;
  SELECT @pos  = CHARINDEX(@delim, @stringToSplit)  ;
  SELECT @name = SUBSTRING(@stringToSplit, 1, @pos-1);

  IF @orderNum = @occurence
  BEGIN
    RETURN @name;
  END

  SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos+1, LEN(@stringToSplit)-@pos)
 END

    SELECT @orderNum=@orderNum+1;

  IF @orderNum = @occurence
  BEGIN
    RETURN @stringToSplit;
  END

  RETURN NULL;
END
Mown answered 19/1, 2016 at 18:53 Comment(0)
K
3

This should work:

with cte as(select *, row_number() over(partition by id order by date) rn from table)
select * from cte c1
join cte c2 on c1.id = c2.id and c1.rn = c2.rn - 1
Keniakenilworth answered 19/1, 2016 at 18:9 Comment(0)
S
3

try this,

declare @t table(ID int,ArrivalDate datetime, ArrivalCity varchar(50)
,DepartureDate datetime,DepartureCity varchar(50))
insert into  @t values
(1,     'Jun 27 2015',     'Berlin',          'Jun 20 2015',       'Paris  ')
,(1,     'Jul 1 2015 ',     'Rome  ',          'Jun 29 2015',       'Berlin ')
,(1,     'Jul 30 2015',     'Vienna',          'Jul 15 2015',       'Rome    ')
,(2,     'Jun 28 2015',     'Prague',          'Jun 23 2015',       'Vienna ')
,(2,     'Jul 1 2015 ',     'Rome  ',          'Jun 29 2015',       'Prague ')
,(2  ,   'Jul 30 2015',     'Vienna',          'Jul 15 2015',       'Moscow ')

;WITH CTE
AS (
    SELECT *
        ,ROW_NUMBER() OVER (
            ORDER BY id
                ,arrivaldate
            ) rn
    FROM @t
    )
SELECT A.arrivaldate
    ,a.arrivalcity
    ,a.DepartureDate
    ,a.DepartureCity
    ,b.arrivaldate
    ,b.arrivalcity
    ,b.DepartureDate
    ,b.DepartureCity
FROM CTE A
LEFT JOIN CTE b ON a.rn + 1 = b.rn
Sporogony answered 21/1, 2016 at 3:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.