Transform two strings in such a way that the distance between the input strings is 'reflected' in the distance between the output strings? - 【StackMirror】|algorithm|hash|string-matching

I have a list of user identifiers which are pretty long. The identifiers may not be exactly identical each time they come with HTTP request therefore I use fuzzy string comparison to authenticate the user. For that very reason, I couldn't hash the identifier because my fuzzy string comparison algorithm won't work with the hashed values since even slightly different plain texts yield completely different values when hashed. Now is there some algorithm algx such that distance(s1,s1') is in some way proportional to distance (algx(s1),algx(s1'))? Or is there any other way to go about the problem? Note: distance in this sense means the amount of editing needed to transform one text into another one.

2012-04-05 14:39
by NoName

Wait, this is for user authentication? Why would you ever not check for (case-insensitive) exactness - Jonathon Reinhart 2012-04-05 14:41

The identifier may change a bit because it is generated from browser configuration which might change. This is not for basic authentication but as a security reinforcement besides basic authentication - NoName 2012-04-05 14:44

Does it have to be the 'distance' between strings? Transforming so that the 'distance' between their binary representation might be easier - Widor 2012-04-05 14:49

Widor - not necessaril - NoName 2012-04-05 14:59

Sounds like you are looking for locality-sensitive hashing.

2012-04-05 14:46
by Jouni K. Seppänen

See http://stackoverflow.com/questions/5769949/locality-sensitive-hash-implementatio - Kaganar 2012-04-05 14:53

Hi Jouni K. Seppänen - is there any java implementation of it that you suggest - NoName 2012-04-06 08:51

You could use something like Levenshtein distance which measures the difference between 2 strings. There's also a PHP function of the same name.

2012-04-05 14:45
by Dan

One solution is to keep count of each alphabet and comparing the count arrays. A bad match between the counts means the strings are definitely not similar.

2012-04-05 17:21
by ElKamina