Automating XML simplification in .NET

Challenge

Recently I had to refactor many XML fragments similar to this one:

<root>
    <simpleType>some-value</simpleType>
    <complexType>
        <int>156</int>
        <bool>true</bool>
        <string>value</string>
    </complexType>
</root>

into this:

<root simpleType="some-value">
    <complexType int="156" bool="true" string="value" />
</root>

In other words, I needed to turn elements of simple types (elements that don't have any child elements) into attributes of their parent. Refactoring this by hand is boring, so I found a way to automate it. As a .NET developer, I'll use C# here.
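The rule can be checked with a minimal sketch that uses only LINQ to XML from the base class library. It lists the elements of the sample above that have no child elements (`XElement.HasElements` is false), which are the ones that should become attributes. `SimpleElementFinder` and `FindSimpleElementNames` are hypothetical names chosen for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: lists the elements that the "no child elements" rule
// would turn into attributes.
public static class SimpleElementFinder
{
    public static List<string> FindSimpleElementNames(string xml)
    {
        var doc = XDocument.Parse(xml);
        return doc.Root
            .Descendants()                 // every element below the root, in document order
            .Where(e => !e.HasElements)    // "simple" = has no child elements
            .Select(e => e.Name.LocalName)
            .ToList();
    }
}
```

For the sample XML this returns simpleType, int, bool and string. The root and complexType elements are skipped because they contain child elements.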

Solution

There is a great library called Json.NET. It can convert between XML and JSON:

If you convert an XML node like:

<root>
    <element>value</element>
</root>

to a JSON string with JsonConvert.SerializeXmlNode(), you'll get the following JSON:

{
    "root": {
        "element": "value"
    }
}

And if you convert an XML node with attributes, like:

<root attribute="value"/>

to a JSON string with JsonConvert.SerializeXmlNode(), you'll get the following JSON:

{
    "root": {
        "@attribute": "value"
    }
}

Note that the JSON property for the attribute starts with @, while the property for an element does not. The @ character is presumably borrowed from XPath, where it selects attributes: to get the attribute value with XPath, you write the expression /root/@attribute. Json.NET follows the same convention for attributes.
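Here is a small sketch of the XPath side of that analogy, using the LINQ to XML XPath extensions from System.Xml.XPath. It uses `XPathEvaluate` rather than `XPathSelectElement` because the expression selects an attribute, not an element. `XPathAttributeDemo` and `ReadAttribute` are hypothetical names.

```csharp
using System.Collections;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathAttributeDemo
{
    // Hypothetical helper: returns the value of the first attribute matched by
    // an XPath expression, or null if nothing matches.
    public static string ReadAttribute(string xml, string xpath)
    {
        var doc = XDocument.Parse(xml);
        // For a node-set expression, XPathEvaluate returns an IEnumerable of the matched nodes
        var matches = (IEnumerable)doc.XPathEvaluate(xpath);
        return matches.OfType<XAttribute>().FirstOrDefault()?.Value;
    }
}
```

For example, `XPathAttributeDemo.ReadAttribute("<root attribute=\"value\"/>", "/root/@attribute")` returns "value".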

So the solution is:

  • serialize the XML node to JSON with JsonConvert.SerializeXmlNode()
  • prefix with @ the names of all JSON properties that hold simple values (anything that is not a JSON object or array)
  • deserialize the modified JSON back into an XML node with JsonConvert.DeserializeXmlNode()

Note that there are also JsonConvert.SerializeXNode() and JsonConvert.DeserializeXNode() methods that do the same for the LINQ to XML classes. For simplicity, I'll use them in the example code below.

Here is the C# code:

using System;
using System.Linq;
using System.Xml.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class XmlSimplifyHelper
{
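    public static string SimplifyXml(string sourceXml)
    {
        // 1. XML -> JSON: elements become properties, attributes become "@"-prefixed properties
        var jsonDocument = JsonConvert.SerializeXNode(XDocument.Parse(sourceXml));
        // 2. Mark every simple value as an attribute by prefixing its name with "@"
        var jsonSimplified = SimplifyJson(JToken.Parse(jsonDocument)).ToString();
        // 3. JSON -> XML: "@"-prefixed properties are written back as attributes
        var xNode = JsonConvert.DeserializeXNode(jsonSimplified);
        return xNode.ToString();
    }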

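    private static JToken SimplifyJson(JToken json)
    {
        // Prefix with "@" every property that holds a simple value. Names that already
        // start with "@" are attributes; names starting with "#" (such as "#text") are
        // Json.NET's markers for text content and must not become attributes.
        return Rename(json, (name, token) =>
            !name.StartsWith("@", StringComparison.Ordinal)
            && !name.StartsWith("#", StringComparison.Ordinal)
            && IsSimpleType(token)
                ? "@" + name
                : name);
    }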

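    // Rebuilds the JSON tree, passing every property name through `map`.
    // JProperty.Name is read-only, so renamed properties have to be recreated.
    private static JToken Rename(JToken json, Func<string, JToken, string> map)
    {
        var prop = json as JProperty;
        if (prop != null)
        {
            // The new name is computed from the old name and the original value
            return new JProperty(map(prop.Name, prop.Value), Rename(prop.Value, map));
        }

        var arr = json as JArray;
        if (arr != null)
        {
            // Arrays have no names of their own; rename inside each item
            var cont = arr.Select(el => Rename(el, map));
            return new JArray(cont);
        }

        var o = json as JObject;
        if (o != null)
        {
            var cont = o.Properties().Select(el => Rename(el, map));
            return new JObject(cont);
        }

        // Plain values (strings, numbers, null, ...) are returned unchanged
        return json;
    }

    // Anything that is not a JSON object or array is treated as a simple value
    private static bool IsSimpleType(JToken token)
    {
        return !(token is JArray) && !(token is JObject);
    }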
}

I took the Rename() method from a Stack Overflow answer on how to rename Json.NET properties. As it turns out, property names are immutable, so the JSON has to be rebuilt with the renamed properties.
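If adding a Json.NET dependency is not an option, roughly the same transformation can be sketched directly with LINQ to XML. This is a different technique from the JSON round-trip above, not the code from this post, and it makes its own assumptions. A child element is moved only if it has no child elements and no attributes. It must also have a name that is unique among its siblings and not already used as an attribute of its parent, because an element cannot carry two attributes with the same name. Json.NET serializes repeated sibling elements as a JSON array, which the IsSimpleType() check above also leaves alone. `XmlSimplifyLinqSketch` is a hypothetical name.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical alternative to the JSON round-trip: works on LINQ to XML directly.
public static class XmlSimplifyLinqSketch
{
    public static string SimplifyXml(string sourceXml)
    {
        var doc = XDocument.Parse(sourceXml);
        Simplify(doc.Root);
        return doc.Root.ToString();
    }

    private static void Simplify(XElement element)
    {
        // Snapshot the children, because some of them are removed while iterating
        foreach (var child in element.Elements().ToList())
        {
            // Assumption: "simple" = no child elements and no attributes of its own
            bool isSimple = !child.HasElements && !child.HasAttributes;
            // Repeated sibling names, or names already used as attributes, cannot be attributes
            bool nameIsFree = element.Elements(child.Name).Count() == 1
                              && element.Attribute(child.Name) == null;

            if (isSimple && nameIsFree)
            {
                element.SetAttributeValue(child.Name, child.Value);
                child.Remove();
            }
            else
            {
                // Complex (or non-movable) child: simplify its own children
                Simplify(child);
            }
        }
    }
}
```

On the sample from the beginning of the post, this produces the target XML, apart from indentation (LINQ to XML indents with two spaces by default).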

Happy coding!
