Tag Archives: best practices

XML schemas compatibility


This is the fourth installment of this series about managing backward compatibility in software development. Here I talk about what makes an XML Schema backward incompatible.

I specifically address W3C XML Schemas, but the general principles apply regardless of the schema language you use.

But first, why bother with XML Schema compatibility?

In enterprise applications, XML is often used to specify configuration files or interchange formats. The rise of web services and RESTful applications on the Internet has also increased the use of XML.

Thus, making sure that existing configuration files still work with your new software or, more importantly, that other applications can still communicate with yours can really make a difference.

So, what makes a schema incompatible?

  • Changing an element or attribute type to a more restricted type (like adding constraints on an xs:string)
  • Changing the order of a sequence in a complex element
  • Removing or renaming an element or attribute from a complex type
  • Adding a mandatory element or attribute to a complex type without providing a default

Removing complex or simple types will also make it incompatible if:

  • Your schema is included or imported by other schemas or
  • You do not replace them with compatible anonymous types (compatible meaning equivalent or less strict, e.g. a simple type JavaClass, defined as an xs:string with a constraint, is replaced with a plain xs:string).

So, how do you preserve backward compatibility?

If some elements of the schema are becoming obsolete, do not remove them. Instead, mark them as deprecated in the schema documentation and, if applicable, remove their mapping to the object model (that way you will not have to maintain the code equivalent of the deprecated elements).

The best strategy I have come across so far is namespacing: if a given schema must be refactored, create a new one and change its namespace (a good practice is to include the major version of the schema in the namespace).

You then have two options:

  1. provide an XSL stylesheet that enables the migration of XML documents from the old schema to the new one
  2. provide support code to be able to read both document structures

Of course, the second solution is the most desirable from the operational point of view (and the first one is not always applicable). However, the trade-off is that it is more expensive from the development point of view. Once again, choosing who is going to do the work (the guy who develops or the guy who installs your application) is a matter of project management.
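Option 2 can be sketched as follows. This is only an illustration, assuming two hypothetical namespaces that carry the major version and a made-up "timeout" setting that moved from an attribute (version 1) to a child element (version 2); none of these names come from a real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: the major version is part of the URI.
NS_V1 = "http://example.com/schema/config/1"
NS_V2 = "http://example.com/schema/config/2"


def read_timeout(xml_text: str) -> int:
    """Return the timeout in seconds from either document version."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{NS_V1}}}config":
        # Version 1 stored the timeout as an attribute of the root element.
        return int(root.attrib["timeout"])
    if root.tag == f"{{{NS_V2}}}config":
        # Version 2 moved it to a child element.
        return int(root.findtext(f"{{{NS_V2}}}timeout"))
    raise ValueError(f"unsupported document: {root.tag}")


old_doc = '<config xmlns="http://example.com/schema/config/1" timeout="30"/>'
new_doc = ('<config xmlns="http://example.com/schema/config/2">'
           '<timeout>45</timeout></config>')

print(read_timeout(old_doc), read_timeout(new_doc))
```

Because the namespace identifies the version, the reader can dispatch on the root element alone and reject anything it does not know, rather than guessing from the document's shape.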

Database schemas compatibility

This is the third post about software compatibility. The previous ones covered project management and bugs; this one deals with database schema compatibility (I will deal with stored procedures in the chapters about code compatibility).

First of all, what does backward compatibility mean when talking about the database?

  1. Being able to retain data stored in one schema into a new one.
  2. Preserving compatibility with external systems (like report engines) that may be accessing the database directly.

Point #1 is achieved through migration tools that update the database schema; in some cases, such tools can be very tricky to write.

Point #2 is a bigger challenge. Changes that may break the database compatibility are:

  1. Removing a table or changing its name.
  2. Removing a column, changing its type (including its precision or length) or changing its name.
  3. Changing the semantics of a column (e.g. changing the valid values).
  4. Adding foreign keys.

In cases #1 and #2, if such changes cannot be avoided, a good enough solution is to implement database views that mock up the old tables based on the new ones.

The catch is that for #2 you will need to rename the actual table, which forces an update of the foreign keys in other tables and surely more code changes than initially expected. Leaving an unused column in the table may be a better solution. As usual, this is a trade-off that should be discussed at the project level.
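The view approach for cases #1 and #2 can be sketched with SQLite. The table and column names are invented for the example: assume a "customer" table whose "name" column became "full_name", so the real table is renamed to "customer_v2" and a view takes over the old name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# New schema: the table was renamed and "name" became "full_name".
conn.execute("CREATE TABLE customer_v2 (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO customer_v2 (id, full_name) VALUES (1, 'Ada Lovelace')")

# Compatibility view: same name and columns as the old table.
conn.execute(
    "CREATE VIEW customer AS SELECT id, full_name AS name FROM customer_v2"
)

# A legacy report query keeps working unchanged.
legacy_rows = conn.execute("SELECT id, name FROM customer").fetchall()
print(legacy_rows)
```

This covers external systems that only read, such as report engines. In SQLite a plain view is read-only, so systems that write to the old table would need more than this sketch.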

Point #3 is trickier because it really depends on the change and on how the column is used. Most of the time, transforming a “change” into a “remove and add new” lets you fall back to #2. Triggers can then be used to keep the old column up to date, or it can just be left unused.
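As an illustration of the trigger idea, here is a sketch in SQLite. It assumes a hypothetical "account" table whose old "active" column ('Y' or 'N') is superseded by a new "status" column with more values; the triggers keep the old column meaningful for legacy readers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id     INTEGER PRIMARY KEY,
        active TEXT,   -- old column, 'Y' or 'N', still read by legacy tools
        status TEXT    -- new column with a richer set of values
    );

    -- Derive the old column from the new one on insert...
    CREATE TRIGGER account_sync_active_insert
    AFTER INSERT ON account
    BEGIN
        UPDATE account
        SET active = CASE NEW.status WHEN 'open' THEN 'Y' ELSE 'N' END
        WHERE id = NEW.id;
    END;

    -- ...and whenever the new column changes.
    CREATE TRIGGER account_sync_active_update
    AFTER UPDATE OF status ON account
    BEGIN
        UPDATE account
        SET active = CASE NEW.status WHEN 'open' THEN 'Y' ELSE 'N' END
        WHERE id = NEW.id;
    END;
""")

conn.execute("INSERT INTO account (id, status) VALUES (1, 'open')")
conn.execute("INSERT INTO account (id, status) VALUES (2, 'open')")
conn.execute("UPDATE account SET status = 'closed' WHERE id = 2")

legacy_view = conn.execute("SELECT id, active FROM account ORDER BY id").fetchall()
print(legacy_view)
```

New code only touches "status", while anything that still reads "active" sees a value consistent with it.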

Point #4 is a problem when there are scripts that delete entries in a table. If all of a sudden there is a new foreign key that depends on this table then the script will fail, thus breaking the compatibility. I actually have no technical solution for this one. I think that only documentation can be given, but if any of you has an idea please share it with us :)
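The failure itself is easy to reproduce. The sketch below uses SQLite with made-up "customer" and "invoice" tables, and a function standing in for the external cleanup script; it only demonstrates the breakage and is not a fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO customer (id) VALUES (1)")


def cleanup_script(connection):
    """Stand-in for an external script that purges customers."""
    connection.execute("DELETE FROM customer WHERE id = 1")


# New version of the schema: a table that references customer is added.
conn.execute(
    "CREATE TABLE invoice ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO invoice (id, customer_id) VALUES (10, 1)")

try:
    cleanup_script(conn)
    script_failed = False
except sqlite3.IntegrityError:
    # The delete that used to succeed is now rejected by the new constraint.
    script_failed = True

print(script_failed)
```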

Nevertheless, remember never to make an incompatible change without a good enough reason.

About bugs and software compatibility

This is my second post about backward compatibility in software. The first one dealt with the project management aspect of software compatibility; this one talks about bugs and how, sometimes, correcting a bug can break compatibility.

First of all, coming back to my previous post on the subject, deciding whether or not to break the backward compatibility of an application is a project management matter. The decision to fix a bug when the fix breaks compatibility must not be left solely to the developer; sometimes the company may decide that compatibility should be preserved even when it comes to bugs.

Raymond Chen, a well-known developer at Microsoft, has some good examples on his blog, The Old New Thing, to illustrate this. Raymond actually gives us good insight into Microsoft's policy concerning backward compatibility of its OS.

This post from Joel Spolsky (another well-known former Microsoft employee) gives another good example: the leap-year bug deliberately created for Excel/1-2-3 compatibility.

So, to make it short, when you correct a bug, incompatibilities can appear because:

  • Either the bug has been detected and a workaround has been put in place. This workaround will have to be removed once the bug is corrected.
  • Or this was not initially considered a bug and the behavior is going to unexpectedly change.

As an example, suppose an interface exchanges strings representing date and time and you later discover that the time zone is omitted from the format. If someone developed a parser for this date and time but never expected time zone information, then adding it will break their application. This is a semantic incompatibility, but one that is brought about by a bug fix.
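The date and time example can be sketched in a few lines. The format strings and sample values are assumptions made for the illustration: a client wrote its parser against the original format, and the fixed format appends a UTC offset.

```python
from datetime import datetime

# Parser written by a client against the original, buggy format.
LEGACY_FORMAT = "%Y-%m-%dT%H:%M:%S"


def legacy_parse(text: str) -> datetime:
    return datetime.strptime(text, LEGACY_FORMAT)


before_fix = "2009-03-01T12:00:00"        # what the interface used to send
after_fix = "2009-03-01T12:00:00+01:00"   # what it sends once the bug is fixed

parsed = legacy_parse(before_fix)

try:
    legacy_parse(after_fix)
    client_broke = False
except ValueError:
    # The client never expected an offset, so the corrected value is rejected.
    client_broke = True

print(parsed, client_broke)
```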

In the case where your management decided that bug-for-bug compatibility was not necessary, the incompatibility and its potential impacts should be documented in the migration release notes.

In the case where you have to maintain the bug to maintain the compatibility, I recommend you subscribe to Raymond Chen’s blog or stop writing bugs.

How to manage software compatibility

For most software companies the ability to ship new versions of a product that will preserve clients’ data and customizations is a matter of market share. Still, this is often an afterthought and there seems to be little documentation available.

This article is the first of a series about managing backward compatibility in enterprise applications. This will not be a definitive guide but I will try to spot the common areas where incompatibilities can appear and give guidelines about managing them.

This first post is about the project management side of backward compatibility.

One of the most important things to remember about backward incompatibility is that it is mostly a matter of process and project management.

In order to find the most appropriate way of solving a compatibility issue you need to talk about it, because the solution can be driven by technical, business or project considerations. Once a solution is accepted, the reason why it has been done that way must be properly advertised (this is of the utmost importance when only documentation is provided) and rolled out.

As backward compatibility is a project concern it must be:

  1. Listed in the project risks list
  2. Considered at the project level
  3. Optionally considered at the product level (mostly when it has business impacts)

There are four ways to solve backward incompatibilities, listed here from the most desirable to the one that requires the least developer work:

  1. Ensure binary compatibility – Work is done at the development level.
  2. Provide migration tools – Work is split between development and services but emphasis is put on development.
  3. Provide thorough documentation of incompatibilities and ways to overcome them – Work is split between development and services but emphasis is put on services.
  4. Reject or postpone the change – Work is then at the product management level

As with bugs, backward compatibility cannot be guaranteed at 100%; the best thing a project manager can provide is a good measure of the risk for a given version.

When a new version is released, incompatibilities, those that have not been foreseen or at least documented, must then be treated like any other bug and become part of the maintenance process.

In the following posts I will focus on what can make an application backward incompatible and give some guidelines in order to limit those issues and ensure binary compatibility.

See also Backward Compatibility on Wikipedia.

Fun with Java files encoding

Have you ever tried to write Java code with non-ASCII characters? Like having French method names?

The other day I stumbled upon Java classes written in French. Class names like “Opération”, method names like “getRéalisateur”, and embedded log messages and comments all the same.

At first you say “not common but cool” (and you start thinking about writing code in Chinese because your boss always wondered how we could forbid clients from decompiling our classes without using an obfuscator).

But cool it is not!

Why? Because of encoding!

Here is a quiz: which encoding were those Java files saved in?

  1. UTF-8 (after all, a variant of it is how string constants are stored in class files)
  2. ASCII (come on, everybody is writing code in English)
  3. MacRoman (why not?)

Just wonder for a while.

The answer is #3, because the Java IDE (Eclipse in this case) uses the platform encoding by default to save files. And those classes were created on a Mac.

I actually had no problem reading and compiling them because I also use Eclipse on a Mac and because the Java compiler also assumes that source files are in the platform encoding.

So what, nothing wrong then? Yeah, except the integration server is running on Ubuntu and sometimes I work on Windows as well. And on those platforms the default encoding is not MacRoman…

Something interesting is that it is always like that! I mean, even when you code in plain English, chances are that your IDE is going to write the files in the platform encoding. But nobody notices because, as long as you only use characters in the ASCII-7 range, they will be encoded the same way in almost all encodings.
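The effect is easy to see by encoding a name by hand. The sketch below uses Python's standard codecs, with Windows-1252 standing in (as an assumption) for the default encoding on a Windows machine.

```python
name = "Opération"

# Bytes written to disk by an editor that uses MacRoman.
on_disk = name.encode("mac_roman")

# A tool that assumes Windows-1252 reads a different character.
as_cp1252 = on_disk.decode("cp1252")

# A tool that assumes UTF-8 cannot decode the file at all.
try:
    on_disk.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Pure ASCII text yields the same bytes in all three encodings.
ascii_name = "Operation"
same_bytes = (
    ascii_name.encode("mac_roman")
    == ascii_name.encode("cp1252")
    == ascii_name.encode("utf-8")
)

print(on_disk, as_cp1252, utf8_ok, same_bytes)
```

The accented name is a single byte on disk that means something else (or nothing valid) elsewhere, while the ASCII-only name is byte-for-byte identical everywhere.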

So what is the solution? Well, it depends on whether you really want to code in French (or in Chinese). My advice anyway is “don’t do that” and externalize localized strings. However, if you really insist, you have two solutions:

  1. Make the whole production chain encoding-explicit: configure your IDE to use UTF-8 and specify in your build that the Java compiler is going to deal with UTF-8 encoded files, for example with javac's -encoding option (UTF-8 is the better choice in most cases).
  2. Make sure you only use ASCII-7 characters in your files and replace all non-ASCII-7 characters with their \uXXXX equivalent (even in comments).

However, be aware that #1 is not always possible: you might be using processing tools that do not offer the option of using anything other than the platform encoding.
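Solution #2 can be automated. The helper below is a hypothetical sketch (the function name is mine) that rewrites every non-ASCII character as \uXXXX escapes, using UTF-16 code units as Java expects.

```python
def to_java_unicode_escapes(source: str) -> str:
    """Replace every non-ASCII character with Java \\uXXXX escapes."""
    out = []
    for ch in source:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Java escapes are UTF-16 code units, so characters outside the
        # Basic Multilingual Plane become two escapes (a surrogate pair).
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            code_unit = int.from_bytes(units[i:i + 2], "big")
            out.append("\\u{:04X}".format(code_unit))
    return "".join(out)


print(to_java_unicode_escapes("getRéalisateur"))
```

The result is pure ASCII, so it survives any platform encoding, at the price of source code that is much harder to read.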

Have fun with encoding :)
