In our infinite wisdom, we decided our rows would be keyed with a tab in the middle:
item_id <tab> location
For example:
000001 http://www.url.com/page
Using the HBase shell, we cannot perform a get because the tab character isn't entered properly on the command line. We tried
get 'tableName', '000001\thttp://www.url.com/page'
without success. What should we do?
I had the same issue with binary values: my separator was \x00.
For the shell to accept binary values, you need to put them in double quotes (") instead of single quotes (').
put 'MyTable', "MyKey", 'Family:Qualifier', "\x00\x00\x00\x00\x00\x00\x00\x06Hello from shell"
Check how your tab is being encoded. My best guess is that it is UTF-8 encoded, in which case it is the same single byte as in the ASCII table (0x09), so the key would be "000001\x09http://www.url.com/page", written in double quotes.
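If you build keys in code, a small helper can print them as shell literals so you can paste them into a get. This is a minimal sketch: `shell_literal` is a hypothetical name, and it assumes the shell (JRuby) interprets \xNN escapes inside double quotes, as described above. It escapes every byte outside printable ASCII, plus `"`, `\` and `#` (the last one to avoid Ruby string interpolation).

```python
def shell_literal(key: bytes) -> str:
    """Render a row key as a double-quoted HBase shell string (hypothetical helper)."""
    out = []
    for b in key:  # iterating over bytes yields ints
        # Printable ASCII passes through, except characters that are special
        # inside a Ruby double-quoted string: the quote, backslash and '#'.
        if 0x20 <= b < 0x7F and chr(b) not in '"\\#':
            out.append(chr(b))
        else:
            out.append("\\x%02X" % b)
    return '"' + "".join(out) + '"'


key = b"000001\thttp://www.url.com/page"
print("get 'tableName', " + shell_literal(key))
# get 'tableName', "000001\x09http://www.url.com/page"
```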
On a side note, consider using the null character (\x00) as your separator. Since it sorts before every other byte, it will help you with scans.
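To illustrate the scan point: HBase orders row keys lexicographically by bytes, and Python compares `bytes` the same way, so the sketch below can simulate it. With a \x00 separator, every row for one item_id lies between `item_id + \x00` (inclusive) and `item_id + \x01` (exclusive); those two values would be the scan's start and stop rows (STARTROW/STOPROW in the shell). The function names and sample keys are hypothetical.

```python
SEP = b"\x00"  # null-byte separator; sorts before every other byte value


def rowkey(item_id: str, location: str) -> bytes:
    return item_id.encode("utf-8") + SEP + location.encode("utf-8")


def item_range(item_id: str):
    """Start (inclusive) and stop (exclusive) row covering one item_id."""
    prefix = item_id.encode("utf-8")
    return prefix + b"\x00", prefix + b"\x01"


keys = sorted([
    rowkey("000001", "http://www.url.com/page"),
    rowkey("000001", "http://www.url.com/other"),
    rowkey("0000011", "http://www.url.com/page"),  # longer id sharing the prefix
    rowkey("000002", "http://www.url.com/page"),
])
start, stop = item_range("000001")
# Simulate a range scan over the sorted keys.
matches = [k for k in keys if start <= k < stop]
print(matches)
```

Only the two rows for 000001 fall in the range; the key for 0000011 is excluded even though it shares the prefix.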
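Hope you can change the tab character. :)
Yeah, a tab is a risky choice: many MapReduce jobs (Hadoop Streaming and the default text output format, for example) use the tab as a delimiter, and it's generally a bad idea to put a tab or space inside a key that may end up in delimited output.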
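Here is a small simulation of why that bites. By default, Hadoop Streaming treats the text up to the first tab of a line as the key and the rest as the value; the function below only mimics that rule in plain Python (it is not a Hadoop API), and the sample value is made up.

```python
def split_streaming_line(line: str):
    # Mimics the default Hadoop Streaming rule: key = text up to the first tab,
    # value = the rest of the line (without that tab).
    key, _, value = line.partition("\t")
    return key, value


rowkey = "000001\thttp://www.url.com/page"  # row key that contains a tab
value = "some cell value"
line = rowkey + "\t" + value  # what a tab-delimited job would emit

key, rest = split_streaming_line(line)
print(repr(key))   # '000001'
print(repr(rest))  # 'http://www.url.com/page\tsome cell value'
```

The URL half of the row key silently moves into the value.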
You could use a double colon (::) as a delimiter instead. But what if the URL itself contains a double colon? URL-encode the URL when you store it in HBase. That way you have a standard delimiter, and the URL part of the key cannot conflict with it.
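In Python 3 (in Python 2 the same function is urllib.quote_plus):
from urllib.parse import quote_plus

DELIMITER = "::"

def make_rowkey(item_id, location):
    # quote_plus percent-encodes ':' as %3A, so "::" cannot appear in the URL part
    return item_id + DELIMITER + quote_plus(location)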
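Reading a key back is the inverse: split on the first delimiter and decode the URL part. This is a sketch assuming Python 3 and that item_id itself never contains "::"; `make_rowkey` and `parse_rowkey` are hypothetical names.

```python
from urllib.parse import quote_plus, unquote_plus

DELIMITER = "::"


def make_rowkey(item_id, location):
    return item_id + DELIMITER + quote_plus(location)


def parse_rowkey(rowkey):
    # Split on the first delimiter only. The encoded URL cannot contain "::"
    # (quote_plus turns ':' into %3A); this assumes item_id never contains it either.
    item_id, _, urlkey = rowkey.partition(DELIMITER)
    return item_id, unquote_plus(urlkey)


key = make_rowkey("000001", "http://www.url.com/page")
print(key)                # 000001::http%3A%2F%2Fwww.url.com%2Fpage
print(parse_rowkey(key))  # ('000001', 'http://www.url.com/page')
```

Even a URL containing "::" survives the round trip, because its colons are stored as %3A.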