I am trying to percent-encode å as %C3%A5, following RFC 5849 (OAuth 1.0):
http://tools.ietf.org/rfc/rfc5849.txt
The expected result is shown in the GoCardless Ruby spec: https://github.com/gocardless/gocardless-ruby/blob/master/spec/utils_spec.rb
it "encodes non-ascii alpha characters" do
  subject["å"].should == "%C3%A5"
end
My C# code looks like this:
private const string UnreservedChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

public static string PercentEncode(string value)
{
    var input = new StringBuilder();
    foreach (char symbol in value)
    {
        if (UnreservedChars.IndexOf(symbol) != -1)
        {
            input.Append(symbol);
        }
        else
        {
            input.Append('%' + String.Format("{0:X2}", (int)symbol));
        }
    }
    return input.ToString();
}
This test is failing:
[Test]
public void It_encodes_non_ascii_alpha_characters()
{
    Util.PercentEncode("å").ShouldBe("%C3%A5");
}
Expected string length 6 but was 3. Strings differ at index 1.
Expected: "%C3%A5"
But was: "%E5"
------------^
This test is failing too:
[Test]
public void It_encodes_other_non_ascii_characters()
{
    Util.PercentEncode("支払い").ShouldBe("%E6%94%AF%E6%89%95%E3%81%84");
}
Expected string length 27 but was 15. Strings differ at index 1.
Expected: "%E6%94%AF%E6%89%95%E3%81%84"
But was: "%652F%6255%3044"
------------^
BTW, the test for reserved ASCII characters does pass:
[Test]
public void It_encodes_reserved_ascii_characters()
{
    Util.PercentEncode(" !\"#$%&'()").ShouldBe("%20%21%22%23%24%25%26%27%28%29");
    Util.PercentEncode("*+,/{|}:;").ShouldBe("%2A%2B%2C%2F%7B%7C%7D%3A%3B");
    Util.PercentEncode("<=>?@[\\]^`").ShouldBe("%3C%3D%3E%3F%40%5B%5C%5D%5E%60");
}
EDIT: for anyone wanting to do this, here is the working C# code:
using System.Text;

public class Util
{
    private const string UnreservedChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

    public static string PercentEncode(string value)
    {
        var input = new StringBuilder();
        foreach (char symbol in value)
        {
            if (UnreservedChars.IndexOf(symbol) != -1)
            {
                // Unreserved characters are copied through unchanged.
                input.Append(symbol);
            }
            else
            {
                // Everything else is converted to UTF-8 and each byte is written as %XX.
                byte[] bytes = Encoding.UTF8.GetBytes(symbol.ToString());
                foreach (byte b in bytes)
                {
                    input.AppendFormat("%{0:X2}", b);
                }
            }
        }
        return input.ToString();
    }
}
The problem is that you're not taking this part of RFC 5849 (section 3.6) into consideration:
- Text values are first encoded as UTF-8 octets per [RFC3629] if they are not already. This does not include binary values that are not intended for human consumption.
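To see why the original output was %E5, it may help to compare the two values side by side. The following is only an illustrative sketch; the class and method names (EncodingDemo, CodeUnitHex, Utf8PercentHex) are made up for this example and are not part of the question's code.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingDemo
{
    // What the original code printed: the char's UTF-16 code unit as hex.
    public static string CodeUnitHex(char c)
    {
        return ((int)c).ToString("X2");
    }

    // What the RFC asks for: every UTF-8 octet of the text written as %XX.
    public static string Utf8PercentHex(string text)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            sb.AppendFormat("%{0:X2}", b);
        }
        return sb.ToString();
    }
}
```

With these helpers, CodeUnitHex('\u00E5') (å) gives "E5" and CodeUnitHex('\u652F') (支) gives "652F", which are the values in the failing test output, while Utf8PercentHex("\u00E5") gives "%C3%A5" and Utf8PercentHex("\u652F") gives "%E6%94%AF".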
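Casting the char to int gives you its UTF-16 code unit (0xE5 for å), not its UTF-8 bytes (0xC3 0xA5). So you should actually convert to UTF-8 first and percent-encode each byte:
byte[] bytes = Encoding.UTF8.GetBytes(symbol.ToString());
foreach (byte b in bytes)
{
    // RFC 5849 requires the hex digits to be uppercase, hence X2 rather than x2.
    input.AppendFormat("%{0:X2}", b);
}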
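One caveat with encoding char by char: characters outside the Basic Multilingual Plane (emoji, for example) are stored in a C# string as two surrogate chars, and converting each half to UTF-8 on its own will not produce the correct four-byte sequence. A possible variant, sketched below under the assumption that the whole input is text, converts the entire string to UTF-8 once and then walks the bytes. The class name PercentEncoding and method name Encode are placeholders, not taken from the question.

```csharp
using System.Text;

// Hypothetical variant, for illustration only.
public static class PercentEncoding
{
    private const string UnreservedChars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~";

    public static string Encode(string value)
    {
        var output = new StringBuilder();
        // Convert the whole string at once so surrogate pairs stay together.
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            char c = (char)b;
            // The unreserved set is pure ASCII, so only bytes below 0x80 can match.
            if (UnreservedChars.IndexOf(c) != -1)
            {
                output.Append(c);
            }
            else
            {
                output.AppendFormat("%{0:X2}", b);
            }
        }
        return output.ToString();
    }
}
```

For example, PercentEncoding.Encode("\U0001F600") should give "%F0%9F%98%80", and for the inputs in the tests above it should give the same results as the per-char version.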