Recently I was involved in troubleshooting a very interesting problem with an ASMX 1.1 web service that I wanted to share. Being a self-proclaimed SOA geek, I was shocked to learn that the character type in .NET is not interoperable. That's right folks, if you build a web service and expose a character field you are not conforming to the basic XSD specification and you are subject to 10 lashes from the interop police.
How did I come across this? Well, I was helping to troubleshoot a problem with BizTalk Server where a developer was building an orchestration that had to check a simple character type to facilitate decision making in the workflow. In other words, the code needed to say "if message.node.innertext = Y ... do this". The problem started to show up when the messages were dumped to a file for tracing. Instead of "Y" as the innertext in the XML node, it was "89". Hmmmmm....well, that is actually the ASCII code for that character, so it's not as wacky as it initially seemed.
The first thought was that some character encoding setting was "kerflunkered" on the BizTalk orchestration. We looked here and there, we changed different pipelines to use xmlreceive instead of passthrough, and we tried adding some hints to the message transformers to get it to change. Unfortunately, none of these things made any difference. The next thing I looked at was the XSD definition of the web service's response message in BizTalk, and that started to point out the issue. In the type definition of the character fields you could see that it didn't use the standard "xs:" types. Instead it was a type that had a namespace of www.microsoft.com/wsdl/types. Uh oh ... now it's starting to become clearer.
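To make the symptom concrete, here is a rough sketch of that check written in Python rather than as a BizTalk orchestration. The element names (Response, Approved) are made up for illustration and are not from the real service; the only thing taken from the trace is that the node text was "89" instead of "Y".

```python
import xml.etree.ElementTree as ET

# Hypothetical traced message; element names are invented for this sketch.
TRACED_MESSAGE = "<Response><Approved>89</Approved></Response>"


def is_approved_naive(xml_text):
    """Compare the node text directly against 'Y', as the workflow tried to."""
    node = ET.fromstring(xml_text).find("Approved")
    return node.text == "Y"


def is_approved_decoded(xml_text):
    """Treat the node text as a numeric character code before comparing."""
    node = ET.fromstring(xml_text).find("Approved")
    return chr(int(node.text)) == "Y"


print(is_approved_naive(TRACED_MESSAGE))    # False: "89" is not "Y"
print(is_approved_decoded(TRACED_MESSAGE))  # True: code 89 is 'Y'
```

The naive comparison is what any consumer would write if it trusted the field to hold a character, which is why the decision in the workflow never fired.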
The next thing I had to look at was the actual response from the web service on the wire. So I started up my SOAP toolkit trace utility and captured the messages flowing back and forth to the web service and sure enough there it was staring me in the face. The response from the web service was already represented as the ASCII code before it even got back to BizTalk server. Well crap, then how does .NET always make it so easy for you to work with character fields when you're dealing with web services? I mean if this is true then every web service and consumer that has dealt with character fields would have been comparing against integer values.
Well, this is where the magic and wonder of the XmlSerializer comes in. If you open up the XmlSerializer code and look at how it serializes and deserializes character types, you'll find that it converts the value to a UInt16 and then casts it directly to a character using the XmlConvert class. This is why I titled this my love/hate relationship with the XmlSerializer. I love that this detail was abstracted away from me, but I hate that I never noticed it and that it could obviously cause all sorts of interop problems without me really even knowing it. I guess this is why the contract-first camps continue to get traction whenever you start talking about truly interoperable web services. There are just so many little things you take for granted with your type system. If you start with XSD you'd never have this problem. Bravo to BizTalk Server for sticking to its native XML tenet.
Lastly, I had to look at how this was handled in WCF (because in my opinion that's what Saturday mornings are for :) with the new System.Runtime.Serialization namespace. So I opened up Lutz's Reflector and started digging into the DataContractSerializer. To say that it's a little more sophisticated would be a gross understatement. Basically you have to first figure out how WCF deals with the built-in data types, which was initially a little tough to track down. Eventually I found the "TryCreateBuiltInDataContract" methods on the DataContract type (this was after some initial dead ends of tunnelling through the DataContractSerializer type).
What you will quickly see from this class is that each of the different known types in .NET has its own contract. I am interested in the character type, so I need to look at CharDataContract, which is a type in the root of System.Runtime.Serialization. Once you get to this type you can see there are two key methods, "ReadXmlValue" and "WriteXmlValue". The write method is the easiest, so I'll describe that first. You can see the code is simply "writer.WriteChar((char) obj)", and in that method you can see the value is written as "writer.WriteValue((int) value)". Guess what that means ... it means with WCF you'll still see the numeric character code in your XML when you are using the new data contract serializer.
Some additional inspection of the read side of this shows me that there is a "ReadElementContentAsChar" which actually calls into the same System.Xml.XmlConvert class that was used by the old serialization routine. How about that for some consistency. So you'll still be getting just about exactly the same behavior you're used to in terms of the character type when you move to WCF.
In conclusion, I'm not saying that any of this is good or bad. It just surprised me that the ASCII character codes are what is used on the wire. I suppose it makes sense when you consider all of the possible character sets that can be handled by using the ASCII/Unicode number ranges. As an additional experiment I'd like to see how the Java web service stack handles this, as that'll go a long way toward understanding how interop would truly work in these situations. If you don't want to worry about this stuff, then stay away from the character type altogether and just use strings. Sure, there are some very slight memory footprint benefits you're getting by sticking with a primitive value type like character, but if you're dealing with XML as a transport you're not really focusing that much on data transport/storage size anyway.
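If it helps to picture that round trip, here is a minimal Python sketch of the same idea. It is not the .NET code, just my approximation of the behavior described above: a character becomes an unsigned 16-bit number written as decimal text, and the reader turns that number back into a character. The function names are made up for this example.

```python
MAX_UINT16 = 0xFFFF  # largest value an unsigned 16-bit number can hold


def char_to_wire(ch):
    """Write side: one character becomes its numeric code as decimal text."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code = ord(ch)
    if code > MAX_UINT16:
        raise ValueError("character code does not fit in 16 bits")
    return str(code)


def wire_to_char(text):
    """Read side: parse the decimal text as a 16-bit code, then cast to a character."""
    code = int(text)
    if not 0 <= code <= MAX_UINT16:
        raise ValueError("value is outside the unsigned 16-bit range")
    return chr(code)


print(char_to_wire("Y"))                # 89
print(wire_to_char("89"))               # Y
print(wire_to_char(char_to_wire("N")))  # N
```

As long as both ends apply the same pair of conversions you never see the number, which is exactly why a .NET-to-.NET call hides the whole thing.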
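To show the difference the string approach makes on the wire, here is a small Python sketch that builds the same element both ways. The element name Approved and both helper functions are invented for illustration; the char-style helper just mimics the numeric encoding described in this post.

```python
import xml.etree.ElementTree as ET


def element_with_char(name, ch):
    """Char-style field: the element text is the numeric code of the character."""
    element = ET.Element(name)
    element.text = str(ord(ch))
    return ET.tostring(element, encoding="unicode")


def element_with_string(name, value):
    """String field: the element text is the value itself."""
    element = ET.Element(name)
    element.text = value
    return ET.tostring(element, encoding="unicode")


print(element_with_char("Approved", "Y"))    # <Approved>89</Approved>
print(element_with_string("Approved", "Y"))  # <Approved>Y</Approved>
```

With the string version, a plain "innertext = Y" check works for any consumer without it having to know about the numeric encoding.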