Convert protobuf message definitions to XSD Schema


What if you have an HTTP API based on protocol buffers and you want to add XML support to it? You'd like a single source of truth, and you already have plenty of protocol buffer message definitions. Writing the XSD schema by hand is an unappealing, error-prone option. Here I'll show how you can automate XSD schema generation from existing protocol buffer message definitions.

The first step is to parse the protocol buffer message definitions into a format that is more convenient to process.

Fortunately, the protoc compiler has a --descriptor_set_out=FILE switch, which writes a FileDescriptorSet (a protocol buffer, defined in descriptor.proto) containing all of the input files to FILE.

The FileDescriptorSet contains all parsed message definitions in a structured form. But a protocol buffer is not a very convenient format for processing, so we can translate it to XML. As a C# developer, I can:

With an XML representation of the protocol buffer message definitions, we have all the possibilities of XSLT transformations at our disposal.

All of this has already been implemented in protobuf-net's great utility called protogen. Unfortunately, this utility is no longer supported, but it is included in the protobuf-net v1.0.0.280 NuGet package. Download the package and you'll find protogen in the tools subdirectory.

If you want an XML view of protocol buffer message definitions, call protogen with the xml.xslt transformation like this:

protogen -i:descriptor.proto -o:descriptor.xml -t:xml -d

The output file will look like this:

<FileDescriptorSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <file>
    <FileDescriptorProto>
      <name>descriptor.proto</name>
      <package>google.protobuf</package>
      <dependency />
      <message_type>
        <DescriptorProto>
          <name>FileDescriptorSet</name>
          <field>
            <FieldDescriptorProto>
              <name>file</name>
              <number>1</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.FileDescriptorProto</type_name>
            </FieldDescriptorProto>
          </field>
          <extension />
          <nested_type />
          <enum_type />
          <extension_range />
        </DescriptorProto>
        <DescriptorProto>
          <name>FileDescriptorProto</name>
          <field>
            <FieldDescriptorProto>
              <name>name</name>
              <number>1</number>
              <type>TYPE_STRING</type>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>package</name>
              <number>2</number>
              <type>TYPE_STRING</type>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>dependency</name>
              <number>3</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_STRING</type>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>message_type</name>
              <number>4</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.DescriptorProto</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>enum_type</name>
              <number>5</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.EnumDescriptorProto</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>service</name>
              <number>6</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.ServiceDescriptorProto</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>extension</name>
              <number>7</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.FieldDescriptorProto</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>options</name>
              <number>8</number>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.FileOptions</type_name>
            </FieldDescriptorProto>
            <FieldDescriptorProto>
              <name>source_code_info</name>
              <number>9</number>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.SourceCodeInfo</type_name>
            </FieldDescriptorProto>
          </field>
          <extension />
          <nested_type />
          <enum_type />
          <extension_range />
        </DescriptorProto>

        ...

      </message_type>
    </FileDescriptorProto>
  </file>
</FileDescriptorSet>
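
To get a feel for this structure, here is a minimal sketch that walks such an XML file and lists each top-level message with its field names. It is written in Python with the standard library only, purely as an illustration; it is not part of protogen. The SAMPLE string is a hand-trimmed copy of the output above, and list_messages is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A trimmed, hand-written sample in the same shape as the protogen output.
SAMPLE = """\
<FileDescriptorSet>
  <file>
    <FileDescriptorProto>
      <name>descriptor.proto</name>
      <package>google.protobuf</package>
      <message_type>
        <DescriptorProto>
          <name>FileDescriptorSet</name>
          <field>
            <FieldDescriptorProto>
              <name>file</name>
              <number>1</number>
              <label>LABEL_REPEATED</label>
              <type>TYPE_MESSAGE</type>
              <type_name>.google.protobuf.FileDescriptorProto</type_name>
            </FieldDescriptorProto>
          </field>
        </DescriptorProto>
      </message_type>
    </FileDescriptorProto>
  </file>
</FileDescriptorSet>
"""


def list_messages(xml_text):
    """Return {qualified message name: [field names]} for top-level messages."""
    root = ET.fromstring(xml_text)
    messages = {}
    for file_proto in root.findall("file/FileDescriptorProto"):
        package = file_proto.findtext("package", default="")
        for msg in file_proto.findall("message_type/DescriptorProto"):
            name = msg.findtext("name")
            qualified = package + "." + name if package else name
            # Collect the declared field names in document order.
            messages[qualified] = [
                f.findtext("name")
                for f in msg.findall("field/FieldDescriptorProto")
            ]
    return messages


if __name__ == "__main__":
    for message, fields in list_messages(SAMPLE).items():
        print(message, fields)
```

Nested messages and enums are ignored here; the point is only that every message, field, label and type is reachable with plain path expressions, which is exactly what makes the XML view a good input for XSLT.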

The protogen transformation mechanism is fully extensible: you can write a custom XSLT transformation and run it with the -t:<transformation> switch. Thanks to Marc Gravell!

I needed to create an XSD schema from message definitions with the following requirements:

  • use XML attributes for simple types (like strings)
  • handle required and optional correctly for attributes, with the use="required" and use="optional" XSD attributes
  • handle optional correctly for elements, with the minOccurs="0" attribute
  • allow elements in any order (using <xs:all /> instead of the more common <xs:sequence />)
  • handle repeated correctly, with maxOccurs="unbounded"
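
The rules above can be sketched per field roughly as follows. This is an illustration in Python, not the XSLT itself, and it rests on a few assumptions of mine: the SCALAR_TYPES table is a small, incomplete mapping I made up for the example, a missing <label> is treated as optional, and repeated fields are wrapped in an element named by appending "s" to the field name. The wrapper is there because XSD 1.0 does not allow maxOccurs greater than 1 directly inside <xs:all />. field_to_xsd is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Assumed, incomplete mapping from protobuf scalar types to XSD built-in types.
SCALAR_TYPES = {
    "TYPE_STRING": "xs:string",
    "TYPE_INT32": "xs:int",
    "TYPE_INT64": "xs:long",
    "TYPE_BOOL": "xs:boolean",
    "TYPE_DOUBLE": "xs:double",
}


def xs(tag):
    """Qualify a tag name with the XML Schema namespace."""
    return "{%s}%s" % (XS, tag)


def field_to_xsd(field):
    """Map one <FieldDescriptorProto> element to an xs:attribute or xs:element."""
    name = field.findtext("name")
    ftype = field.findtext("type")
    # Assumption: a missing <label> means the field is optional.
    label = field.findtext("label", default="LABEL_OPTIONAL")

    # Non-repeated simple types become XML attributes.
    if ftype in SCALAR_TYPES and label != "LABEL_REPEATED":
        use = "required" if label == "LABEL_REQUIRED" else "optional"
        return ET.Element(
            xs("attribute"),
            {"name": name, "type": SCALAR_TYPES[ftype], "use": use},
        )

    # Everything else becomes an element; ".pkg.Msg" is written as "pkg.Msg".
    type_name = SCALAR_TYPES.get(ftype) or field.findtext(
        "type_name", default=""
    ).lstrip(".")

    if label == "LABEL_REPEATED":
        # Wrap the repeated item so the wrapper itself can live inside xs:all.
        wrapper = ET.Element(xs("element"), {"name": name + "s"})
        complex_type = ET.SubElement(wrapper, xs("complexType"))
        sequence = ET.SubElement(complex_type, xs("sequence"))
        ET.SubElement(
            sequence,
            xs("element"),
            {"name": name, "type": type_name, "maxOccurs": "unbounded"},
        )
        return wrapper

    attrs = {"name": name, "type": type_name}
    if label == "LABEL_OPTIONAL":
        attrs["minOccurs"] = "0"
    return ET.Element(xs("element"), attrs)


if __name__ == "__main__":
    sample = ET.fromstring(
        "<FieldDescriptorProto><name>name</name><number>1</number>"
        "<type>TYPE_STRING</type></FieldDescriptorProto>"
    )
    print(ET.tostring(field_to_xsd(sample), encoding="unicode"))
```

Collecting the returned elements of one message into an <xs:all /> group (elements) and a trailing list of <xs:attribute /> declarations (attributes) would give one <xs:complexType /> per message.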

The result is a simple XSLT transformation, which you can see on GitHub.

Note that I have tried it on Windows only. It is possible to run .NET code under Mono, but for now the XSLT uses the Microsoft-specific msxsl:node-set() extension function, just to make XslCompiledTransform happy with it. I'm not sure whether the msxsl extensions are available in Mono.

If you apply it to descriptor.proto with

protogen -i:descriptor.proto -o:descriptor.xsd -t:xsd-attributes -d

you'll get an XSD that looks like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
  <!--Generated from: descriptor.proto-->
  <!--Namespace: google.protobuf-->
  <xs:complexType name="google.protobuf.FileDescriptorSet">
    <xs:all>
      <xs:element name="files">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="file" type="google.protobuf.FileDescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:all>
  </xs:complexType>
  <xs:complexType name="google.protobuf.FileDescriptorProto">
    <xs:all>
      <xs:element name="message_types">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="message_type" type="google.protobuf.DescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="enum_types">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="enum_type" type="google.protobuf.EnumDescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="services">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="service" type="google.protobuf.ServiceDescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="extensions">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="extension" type="google.protobuf.FieldDescriptorProto" maxOccurs="unbounded" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="options" type="google.protobuf.FileOptions" />
      <xs:element name="source_code_info" type="google.protobuf.SourceCodeInfo" />
    </xs:all>
    <xs:attribute name="name" use="optional" type="xs:string" />
    <xs:attribute name="package" use="optional" type="xs:string" />
  </xs:complexType>

  ...

</xs:schema>

Update: I have also added an xsd.xslt transformation, which generates XSD schemas for C# protobuf classes (generated by protogen.exe) that are serialized to XML with the XmlSerializer class.

Hope this helps someone! You can customize the XSLT for your needs.

Happy coding!
