Join SQL Server tables on a like statement

Go To StackoverFlow.com

12

I am hoping this isn't a repeat. I've checked the searches and I can't seem to find a clear answer to this.

I have a table that has it's primary key set to be a UniqueIdentifier. I also have another table that has a varchar column that basically contains a url with a query string that contains guids from my first table.

So my 2 tables are like:

StateTable

StateID                                  StateName
EB06F84C-15B9-4397-98AD-4A63DA2A238E     Active

URLTable

URL
page.aspx?id=EB06F84C-15B9-4397-98AD-4A63DA2A238E

What I'm trying to do is join together URLTable and StateTable ON the value of StateID is contained in URL of URL table. I haven't really figured out the join. I've even tried just selecting the one table and tried to filter by the values in StateTable. I've tried doing something like this:

SELECT  *
FROM URLTable
WHERE     EXISTS
    (SELECT  *
     FROM  StateTable
     WHERE URL LIKE '%' + StateID + '%')

Even that doesn't work because it says I'm comparing uniqueidentifier and varchar.

Is there any way to join 2 tables using a like command and where the like command isn't comparing 2 incompatible variables?

Thank you!!

UPDATE: Let me add some additional things I should have mentioned. The query is for the purposes of building analytics reports. The tables are part of a CMS analytics package... so updating or changing the table structure is not an option.

Secondly, these tables see a very high amount of traffic since they're capturing site analytics... so performance is very much an issue. The 3rd thing is that in my example, I said id= but there may be multiple values such as id=guid&user=guid&date=date.

UPDATE 2: One more thing I just realized to my horror is that sometimes the query string has the dashes removed from the GUID.. and sometimes not.. so unless I"m mistaken, I can't cast the substring to Uniqueidentifier. Can anyone confirm? sigh. I did get it to work using

REPLACE('-','',CONVERT(varchar(50), a.AutomationStateId))

but now I'm very much worried about performance issues with this since the URL's table is very large. This might be the nature of the beast, though, unless there's anything I can do.

2012-04-04 21:31
by divamatrix


17

Cast StateID to a compatible type, e.g.

WHERE URL LIKE '%' + CONVERT(varchar(50), StateID) + '%'

or

WHERE URL LIKE N'%' + CONVERT(nvarchar(50), StateID) + N'%'

if URL is nvarchar(...)

EDIT

As pointed out in another answer, this could result in poor performance on large tables. The LIKE combined with a CONVERT will result in a table scan. This may not be a problem for small tables, but you should consider splitting the URL into two columns if performance becomes a problem. One column would contain 'page.aspx?id=' and the other the UNIQUEIDENTIFIER. Your query could then be optimized much more easily.

2012-04-04 21:35
by Phil


5

Do you know that the = is always there and always is a UNIQUEIDENTIFIER. Then you can do this:

WHERE CAST(SUBSTRING(URL, CHARINDEX('=',URL)+1,LEN(URL)) AS UNIQUEIDENTIFIER)=StateID

EDIT

As part of the comment you can also so it with a JOIN. Like this:

select 
   u.* 
from 
   urltable
join statetable s 
   on CAST(SUBSTRING(URL, CHARINDEX('=',URL)+1,LEN(URL)) AS UNIQUEIDENTIFIER)=StateID
2012-04-04 21:42
by Arion
+1 and I think you could use that syntax in a joi - paparazzo 2012-04-04 21:55
Updated the answe - Arion 2012-04-04 21:57
Would that work if I were to use something where knowing that I'm looking for will always follow the string "ID=" and it will always be the following 36 characters? Maybe something like CAST(SUBSTRING(URL, CHARINDEX('ID=',URL)+1,36) AS UNIQUEIDENTIFIER)=StateID and what is performance like on that - divamatrix 2012-04-04 22:37
I think that doing alot of string operation is not god for the performance. That all depends on how much data you have in your tables. You have some suggestions in all the answers. Run them and compare the query plan. But I still think like Phil that create a new column will be desirable in this case that contains the unique i - Arion 2012-04-04 22:55
LEN(URL) is not a lot of overhead compared to 36. If there is not 36 then the query will fail - paparazzo 2012-04-04 23:09
Thanks for your input Arion. This is one of those things where a client has specific requirements for analytics reports and I'm stuck using the table structure as it stands. Changing or updating it would break their ability to update the CMS. Thanks for your ideas though. All the answers here have definitely provided me with answers to a other issues I'm running in to as well. : - divamatrix 2012-04-04 23:10
No problem. Glad to help :- - Arion 2012-04-05 08:45
Would they object to an extra table and a trigger. If you could have a ReportURL table with the guid extracted and an index of this task would run like hot snot - Tony Hopkinson 2012-04-05 11:31


4

select u.* from urltable
join statetable s on url like N'%' + (convert(varchar(50),s.stateid) + N'%'

performance is likely to be awful

2012-04-04 21:43
by Tony Hopkinson
Thanks for the suggestion on the extra table. Actually, I think that is a great idea and would definitely work without interfering with the existing structure. Now.. I'll be honest and say I'm not a DBA. Any advice on the best way to go about building and maintaining that table? Knowing that the database sees so much traffic it's often hosted on it's own DB server, I'd probably look at adding a table to the website db rather than the analytics db - divamatrix 2012-04-05 20:34
Better if teh new table is in the same db or at least the same server, bit more work if not. Big question for the table is schema is can you get more than one url record for the same guid - Tony Hopkinson 2012-04-05 20:51
Actually, a little more investigation tells me that I'll definitely want to have the table in the same db. So that's assumed. To answer your other question, absolutely. the state table, for example, is a table that holds ids for specific types of visitors. The URL table is a complex table that basically looks at the url and session id per request and unless it finds a record that matches both the session and the url exactly, it will create a new record. I'm looking for unique base urls visited for each user type id by looking at the url field querystrings that contain their i - divamatrix 2012-04-05 22:50
As long as you can simply derive all the data for this new table from the URL record being inserted/updated, you are laughing the way to success. They main thing to bear in mind with triggers is don't go mad with them - Tony Hopkinson 2012-04-06 12:44


0

You may get a performance improvement if you build a temp table first, with the option to index the temp table. You could then also modify the schema (of your temp table) which could give you options on your join. Often when joining to BIG tables it helps to extract a subset of data to a temp table first, then join to it. Other times the overhead of the temp table is bigger than using an 'ugly' join

2014-04-08 06:30
by Adriaan Davel
Ads