C#: percent-encode å as %C3%A5 per RFC 5849 (OAuth 1.0)

I am trying to percent-encode å as %C3%A5, as specified by RFC 5849 (OAuth 1.0):

http://tools.ietf.org/rfc/rfc5849.txt

The expected behaviour can be seen in the GoCardless Ruby spec: https://github.com/gocardless/gocardless-ruby/blob/master/spec/utils_spec.rb

 it "encodes non-ascii alpha characters" do
    subject["å"].should == "%C3%A5"
 end

My C# code looks like this:

    private const string UnreservedChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

    public static string PercentEncode(string value)
    {
        var input = new StringBuilder();
        foreach (char symbol in value)
        {
            if (UnreservedChars.IndexOf(symbol) != -1)
            {
                input.Append(symbol);
            }
            else
            {
                input.Append('%' + String.Format("{0:X2}", (int)symbol));
            }
        }

        return input.ToString();
    }

This test is failing:

[Test]
public void It_encodes_non_ascii_alpha_characters()
{
    Util.PercentEncode("å").ShouldBe("%C3%A5"); 
}

Expected string length 6 but was 3. Strings differ at index 1.
  Expected: "%C3%A5"
  But was:  "%E5"
  ------------^

This one is failing too:

[Test]
public void It_encodes_other_non_ascii_characters()
{
    Util.PercentEncode("支払い").ShouldBe("%E6%94%AF%E6%89%95%E3%81%84");
}

Expected string length 27 but was 15. Strings differ at index 1.
 Expected: "%E6%94%AF%E6%89%95%E3%81%84"
 But was:  "%652F%6255%3044"
 ------------^

By the way, these tests do pass:

[Test]
public void It_encodes_reserved_ascii_characters()
{
    Util.PercentEncode(" !\"#$%&'()").ShouldBe("%20%21%22%23%24%25%26%27%28%29");
    Util.PercentEncode("*+,/{|}:;").ShouldBe("%2A%2B%2C%2F%7B%7C%7D%3A%3B");
    Util.PercentEncode("<=>?@[\\]^`").ShouldBe("%3C%3D%3E%3F%40%5B%5C%5D%5E%60");
}

EDIT: for anyone wanting to do this, here is the working C# code:

using System.Text;

public class Util
{
    private const string UnreservedChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

    public static string PercentEncode(string value)
    {
        var input = new StringBuilder();
        foreach (char symbol in value)
        {
            if (UnreservedChars.IndexOf(symbol) != -1)
            {
                input.Append(symbol);
            }
            else
            {
                byte[] bytes = Encoding.UTF8.GetBytes(symbol.ToString());
                foreach (byte b in bytes)
                {
                    input.AppendFormat("%{0:X2}", b);
                }
            }
        }

        return input.ToString();
    }
}

2012-04-04 18:37
by superlogical

OK, you've stated a goal and you've written a bunch of code that doesn't meet your goal. What is your question? You forgot to ask a question - Eric Lippert 2012-04-04 18:40

The spec seems to be using URL encoding. Try using HttpUtility.UrlEncode() instead of rolling your own - millimoose 2012-04-04 18:40

@Inerdial OAuth uses a slightly different encoding called 'percent encoding'; you can read about the differences here: http://tools.ietf.org/html/rfc5849#section-3 - superlogical 2012-04-05 08:11

The problem is that you're not taking this part of the RFC into consideration:

Text values are first encoded as UTF-8 octets per [RFC3629] if they are not already. This does not include binary values that are not intended for human consumption.

So you should actually use:

byte[] bytes = Encoding.UTF8.GetBytes(symbol.ToString());
foreach (byte b in bytes)
{
    input.AppendFormat("%{0:X2}", b);   // uppercase hex, so the output is %C3%A5 rather than %c3%a5
}
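
One caveat with converting a single char at a time: a character outside the Basic Multilingual Plane is stored in a .NET string as two chars (a surrogate pair), and neither half is valid UTF-8 on its own. A variant that should avoid this is to convert the whole string to UTF-8 bytes first and then walk the bytes. The following is only a sketch under that assumption; the class name OAuthPercentEncoder is made up for illustration, and the unreserved set is the one from the question.

```csharp
using System;
using System.Text;

// Hypothetical helper name, used here for illustration only.
public static class OAuthPercentEncoder
{
    // Unreserved characters, copied from the question's code.
    private const string UnreservedChars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

    public static string PercentEncode(string value)
    {
        var output = new StringBuilder();

        // Encode the whole string at once, so a surrogate pair becomes
        // a single 4-byte UTF-8 sequence instead of two invalid halves.
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            char c = (char)b;

            // Only single-byte (ASCII) values can be unreserved characters.
            if (b < 0x80 && UnreservedChars.IndexOf(c) != -1)
            {
                output.Append(c);
            }
            else
            {
                // X2 gives two uppercase hex digits per byte.
                output.AppendFormat("%{0:X2}", b);
            }
        }

        return output.ToString();
    }
}
```

With this version, OAuthPercentEncoder.PercentEncode("å") gives "%C3%A5", and a character such as U+1F600 (written "\uD83D\uDE00" in C#) gives "%F0%9F%98%80". For input that never leaves the Basic Multilingual Plane, the per-char loop above produces the same result.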

2012-04-04 18:41
by Jon Skeet

Thanks for the help Jon, works like a charm - superlogical 2012-04-04 18:53