Matthew Giannini Fri 5 Oct 2012
According to the BNF for Zinc, it looks like Str and Uri require escaping $ and `, but from a quick look at the Java reference implementation, this does not appear to be enforced or checked. The test suite does not appear to reflect this requirement either.
Can you clarify the requirements for character escaping in Str and Uri?
Thanks, matthew
Brian Frank Sat 6 Oct 2012
I copied that from Fantom. I think we should do the same thing in the Java implementation too, so I will take a look. Thanks for pointing that out.
Brian Frank Mon 15 Oct 2012
Actually what I was doing in the Java code was just disallowing any character that required escaping in HUri. But that causes some round trip problems for values you might read from a server. This is what I propose as the new BNF for Str/Uri:
The Str escapes are pretty much the standard C-language escapes plus "$". The URI escapes are the special characters that have meaning in the URI structure that you might want to use without their special behavior. For example, a file name like "File#2.txt" is something you run across, but you don't want the "#" to be interpreted as the fragment identifier. We allow \uxxxx escapes in both Str and Uri, and any Unicode char above 0x20 can also just be written directly.
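To make the proposal concrete, here is a minimal Python sketch of the encoding side. It is not the Haystack reference code. The proposed BNF is not reproduced in this thread, so the escape tables below are assumptions: the Str table is the usual C-style set plus "$", and the Uri set contains only "#", the backtick, and the backslash itself. The function names `escape_str` and `escape_uri_literal` are hypothetical, and control characters below 0x20 are assumed to fall back to \uxxxx.

```python
# Assumed escape tables; the real sets are defined by the Zinc BNF, not by this sketch.
STR_ESCAPES = {
    "\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t",
    '"': '\\"', "\\": "\\\\", "$": "\\$",
}
URI_SPECIALS = set("#`\\")  # assumption: an illustrative subset only


def _unicode_escape(ch: str) -> str:
    """Render one character as a \\uxxxx escape."""
    return "\\u%04x" % ord(ch)


def escape_str(s: str) -> str:
    """Backslash-escape a Str value (hypothetical helper)."""
    out = []
    for ch in s:
        if ch in STR_ESCAPES:
            out.append(STR_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Control chars without a short escape fall back to \uxxxx
            out.append(_unicode_escape(ch))
        else:
            # Everything else, including non-ASCII, is written as-is
            out.append(ch)
    return "".join(out)


def escape_uri_literal(s: str) -> str:
    """Escape text so URI-special chars lose their special meaning (hypothetical helper)."""
    out = []
    for ch in s:
        if ch in URI_SPECIALS:
            out.append("\\" + ch)
        elif ord(ch) < 0x20:
            out.append(_unicode_escape(ch))
        else:
            out.append(ch)
    return "".join(out)


print(escape_str("cost: $5"))            # cost: \$5
print(escape_uri_literal("File#2.txt"))  # File\#2.txt
```

Note that `escape_uri_literal` is meant for text such as a file name where "#" is data. A caller that wants a real fragment identifier would leave that "#" unescaped.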
Matthew Giannini Mon 15 Oct 2012
Why not use RFC 2396 percent escaping ("%" followed by two hex digits) for URIs, similar to java.net.URI?
file#1.txt would encode to file%231.txt
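For reference, the encoding above can be reproduced with Python's standard `urllib.parse` module. This is only an illustration of percent escaping in general, not of java.net.URI or any Haystack code. Passing `safe=""` is a choice made here so that every reserved character is escaped.

```python
from urllib.parse import quote, unquote

# '#' is code point 0x23, so it is written as "%23"
encoded = quote("file#1.txt", safe="")
print(encoded)            # file%231.txt

# Decoding restores the original file name
print(unquote(encoded))   # file#1.txt
```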
Using the suggested escaping means that decoded haystack URIs might not be usable without another level of encoding. For example,
http://www.haystack.com/file#1.txt
would need to be encoded in haystack as
http://www.haystack.com/file\#1.txt.
Decoding would yield the same result. But that representation is not actually resolvable if you paste it into a web browser. You'd have to do another round of encoding to change all Haystack URI escape sequences into percent escapes:
http://www.haystack.com/file%231.txt
So why not just use percent escapes in the first place?
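The extra round of encoding could look roughly like the Python sketch below. It assumes the backslash scheme discussed in this thread ("\" followed by a single character, or \uxxxx with four hex digits). The function name `backslash_to_percent` is hypothetical, and this is not part of any Haystack library.

```python
from urllib.parse import quote


def backslash_to_percent(escaped: str) -> str:
    """Rewrite backslash escapes as percent escapes (hypothetical helper)."""
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            # Unescaped characters are copied through unchanged
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(escaped):
            raise ValueError("dangling backslash at end of input")
        nxt = escaped[i + 1]
        if nxt == "u":
            # \uxxxx: exactly four hex digits are expected
            hex_digits = escaped[i + 2:i + 6]
            if len(hex_digits) != 4:
                raise ValueError("truncated \\u escape")
            literal = chr(int(hex_digits, 16))
            i += 6
        else:
            literal = nxt
            i += 2
        # Percent-encode the literal character as UTF-8 bytes
        out.append(quote(literal, safe=""))
    return "".join(out)


print(backslash_to_percent("http://www.haystack.com/file\\#1.txt"))
# http://www.haystack.com/file%231.txt
```

This sketch only converts characters that were backslash-escaped. Unescaped spaces or non-ASCII characters pass through untouched, so a complete converter would need to percent-encode those as well.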
Brian Frank Mon 15 Oct 2012
I debated that too originally. But here are the reasons I personally prefer backslash escapes:
common chars in filenames like space don't need to be escaped
Unicode chars (also commonly found in file systems) don't need to be escaped (which I've found to be especially buggy in % encoding libraries)
seems better to be consistent with String and C-like languages that already use backslash escapes
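The first two points can be seen by running one file name through both schemes. This is a rough Python sketch: the percent side uses the standard `urllib.parse.quote`, while the backslash side assumes a small illustrative set of special characters ("#", backtick, backslash) that is not taken from the actual BNF.

```python
from urllib.parse import quote

# A file name containing a space and non-ASCII characters ("résumé 2.txt")
name = "r\u00e9sum\u00e9 2.txt"

# Percent escaping must rewrite the space and every UTF-8 byte of each accented char
percent = quote(name, safe="")

# Backslash scheme: only characters in the assumed special set get escaped
ASSUMED_SPECIALS = set("#`\\")  # assumption, for illustration only
backslash = "".join("\\" + c if c in ASSUMED_SPECIALS else c for c in name)

print(percent)            # r%C3%A9sum%C3%A9%202.txt
print(backslash == name)  # True
```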