Web Services and application integration, part 3: Data transfer

It’s easy to be wise after the fact, and it’s equally easy to look at something designed two years ago and find better ways to do it now.

I am not sure, however, that this excuses the data access mechanisms in Part 19 of the DICOM Standard. Clearly designed with one person or group’s particular implementation in mind, the data transfer interface places heavy implementation requirements on the host in order to support applications that do not wish to parse DICOM. Conversely, the requirements placed on the application make the interface poorly suited to applications that generate large quantities of data. I contend that the first version of an interface should talk existing protocols as much as possible, and should not invent new, redundant ones until they have been proven necessary.

The ‘Native DICOM Model’ in section A.1 is an XML formulation of a DICOM object, clearly designed for use by applications that choose not to parse DICOM. However, this seems like a strange goal: DICOM parsing is hardly so unusual a task that there is any shortage of candidate toolkits.

Still, this is not as concerning as section A.2, ‘Abstract Multi-Dimensional Image Model’. This section requires that hosting applications be able to interpret DICOM image data – transforming it into real-world values, interpolating, and doing esoteric things such as splitting MR sequences and CT gates, none of which is required anywhere else in the standard – and also be able to convert such data back into DICOM. I understand the motivation for this requirement, since it is clearly intended to centralise this kind of processing, but what it mostly does is put prospective implementers off. Indeed, the mere prospect of having to implement this transformation is enough to put me off.

Even after the host has bent over backwards to support the two models, the work is not yet done. It is usually reasonable to expect a host to supply data on demand to an application, and when one considers the likely use cases for this interface (PACS clients, anticipatory processing), one can usually accept the requirement to provide the same data multiple times. However, the host is also required to provide a generalised query interface over the models, using XPath. This implies that the host has efficient access to every possible model of the data, and fetching data over remote connections is probably not feasible without extensive local caching. Again, I understand the motivation, but requirements like this do not attract implementations.
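
To make the scale of that requirement concrete, here is a minimal sketch of what the host ends up committing to: evaluating whatever XPath expression the application supplies – something like //DicomAttribute[@keyword='SeriesInstanceUID']/Value, say – against a fully materialised model document. The example expression, and the assumption that the model is already held in memory as a DOM document, are mine, not the Standard’s.

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ModelQuery {
    // The host must answer arbitrary, application-supplied XPath queries over its models;
    // that only works if the whole model is locally available to query.
    public static NodeList query(Document nativeModel, String applicationXPath) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(applicationXPath, nativeModel, XPathConstants.NODESET);
    }
}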

Finally, the application gets its data. I clearly have a bias in my interpretation of the use cases, since I generally deal in large datasets (deformable registrations, warped dose volumes, contours), but once the application has generated its data it is required to cache it for as long as the host feels like making it wait – which, in the general case, means shifting it to disk. The symmetric DataExchange mechanism that must have appealed to the authors of this interface falls down here, and feels completely unnecessary.

My preference is straightforward: talk DICOM wherever possible. Part 19 already expects that DICOM data can be transferred via HTTP (for which WADO is a candidate); for a first version of the interface, I would contend that URLs, grouped by study and series, would be enough to get data to the application. I see no compelling reason to have a notification arrangement for the host’s data provision; the URLs should simply be provided at launch time. The new WADO-RS standard (Supplement 161) can also provide the data in the XML format and hence give access to the metadata alone; while I wish they’d just used the standard DICOM encoding, it’s better than the abstract image model!
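
As a sketch of how little the application would then need to do: given one of those URLs, retrieving an object is a plain HTTP GET. The Accept header and the use of HttpURLConnection are illustrative assumptions (a WADO-RS retrieval actually returns a multipart/related response), and parsing the resulting DICOM is left to whatever toolkit the application already uses.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class InstanceFetcher {
    // Fetch one object from a URL handed to the application by the host at launch.
    public static byte[] fetch(String instanceUrl) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(instanceUrl).openConnection();
        conn.setRequestProperty("Accept", "application/dicom");
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();   // hand the bytes to a DICOM toolkit from here
        } finally {
            conn.disconnect();
        }
    }
}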

Similarly, for the application providing data to the host, a lazy fetch mechanism seems unnecessary: HTTP POSTing the data to the host should be sufficient. I believe there is a storage proposal pending (edit: STOW-RS, Supplement 163); this uses multipart HTTP requests to allow several data sets to be sent at a time (including in the XML format). A prospective new Web Services interface would surely make the best use of WADO-RS and friends rather than rolling yet another data exchange mechanism.
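
For the return path, a sketch of what such a POST might look like, assuming a STOW-RS style store URL provided by the host; the boundary string and the single-part body are illustrative rather than taken from the final specification.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ResultSender {
    // POST one DICOM object to a storage URL provided by the host.
    public static int send(String storeUrl, byte[] dicomBytes) throws Exception {
        String boundary = "result-boundary";
        HttpURLConnection conn = (HttpURLConnection) new URL(storeUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "multipart/related; type=\"application/dicom\"; boundary=" + boundary);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(("--" + boundary + "\r\nContent-Type: application/dicom\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.write(dicomBytes);
            out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.US_ASCII));
        }
        return conn.getResponseCode();   // the host's response indicates success or failure
    }
}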

Next time: general thoughts and conclusion.



Web Services and application integration, part 2: Schemas and WSDL

The DICOM standard is often a victim of special interests. Unlike, for example, the IEEE, the DICOM standardisation process does not require multiple interoperable implementations; in fact, it does not require any implementations before something becomes standardised. Prior experience in implementation is desirable, but the process will not wait for implementations and testing.

As a result, Part 19 is a mess of inconsistencies. The XML data model – itself a questionable addition at this time – is specified using RELAX NG Compact, while the SOAP interface is expressed in WSDL and XSD. The WSDL itself is obviously tool-generated, and contains a reference to a Microsoft schema. Worst of all, the host and application interfaces use different namespaces.

A significant part of the point of using Web Services is to take advantage of the existing tool support, so the specification should be written to make the best use of it. I have no particular issue with RELAX NG Compact as a validation technology, but it does not enjoy the straightforward built-in support that XSD does: both .NET and Java (JAXB) can generate serialisation code automatically from XSD, so there seems to be no reason for RELAX NG to enter the picture.

Using different namespaces for the host and application interfaces would be fine if there were no overlap between the two. Unfortunately, that is not the case here. Figure 8-1 in the Standard makes this clear: host and application share the DataExchange, State, Rectangle and UID types, and all of their dependencies. Code generated from the two interfaces will therefore contain two copies of every one of these types, and duplicated code is the inevitable result. A well-thought-out interface would have none of these drawbacks.

It is possible to create a clean, simple interface in one WSDL file, with one schema used by both host and application. Because no implementation was attempted, these issues were not found during standardisation – and now they may never be fixed. I hope that a future interface will address them.

Next time: data transfer.


Web Services and application integration, part 1: Initial communication

Web Services are the RPC mechanism of choice for most software environments. Without getting too deeply into arguments over protocol superiority, it is notable that SOAP-based Web Services enjoy first-class tool support in all modern business-class programming languages and environments, and have displaced COM and CORBA in the enterprise.

A sign of the maturity of the technology is that it has even made its way into the DICOM standard, by way of Part 19: Application Hosting. This part of the Standard attempts to define an API between an application host (e.g. a PACS client) and an application (e.g. a 3D viewer or CAD software). The use of Web Services here is appropriate; however, the particular implementation has many drawbacks. In this series, I will attempt to address the drawbacks in a manner that suggests a better implementation.

Firstly, we have to address the issue of bidirectional communication. It is not enough to have a single SOAP connection between application and host, because HTTP is inherently one-way; as a result, application and host each have to implement a SOAP endpoint. This is the first of the DICOM stumbling blocks: the URL of the application’s SOAP endpoint is specified by the host (see section 7.1, Initialization). This means that the host has to pick a URL on which the application can listen – independent of the application’s implementing technology. In practice, this means picking an unused port, and that is not so easy: the only race-free way of picking an unused port is to actually create a socket on port 0, and the host cannot do that on the application’s behalf. In practice, therefore, fixed port ranges are used – placing scalability limits on the host, and forcing multiple hosts (e.g. in a multi-user system) to coordinate with one another.

There is a better approach, one that makes better use of the technology: pass only the host’s URL to the application. The host’s endpoint should then expose an operation that the application can call to pass the application’s URL back to the host. Once this initial handshake is complete, host and application each have their counterpart’s URL – but now the application has been able to choose a URL it can listen on without collision, by creating the socket itself.
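
As a minimal sketch of the application’s side of that handshake: binding a socket to port 0 lets the operating system pick a free port without a race, and the resulting URL is what the application would report back to the host. The operation name in the comment is hypothetical; Part 19 defines no such operation today.

import java.net.ServerSocket;

public class ApplicationEndpoint {
    public static void main(String[] args) throws Exception {
        // Port 0 asks the OS for any unused port - the only race-free way to get one.
        ServerSocket listener = new ServerSocket(0);
        String applicationUrl = "http://localhost:" + listener.getLocalPort() + "/application";

        // The application would now publish its SOAP endpoint on that socket and call
        // something like setApplicationUrl(applicationUrl) on the host URL it was
        // given at launch (the operation name is hypothetical).
        System.out.println("Application endpoint: " + applicationUrl);
    }
}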

Of course, any system involving user applications hosting Web services will inevitably run into issues with Windows Communication Foundation (WCF), the .NET Framework’s implementation of Web services. While Web services can be hosted in a .NET application, the default HTTP binding for WCF uses HTTP.SYS for its listener. This has one fatal drawback for applications: the administrator has to explicitly reserve URLs before an ordinary application can listen on them. That reservation is largely accomplished through a command-line tool, and is not straightforward to embed in an installer. Furthermore, HTTP.SYS cannot listen on ‘port 0’; the port must be known at install time.

To get around this limitation, it may be worth replacing the default HTTP binding with a simple HTTP implementation of one’s own. We will discuss this option later.

Next time: schemas and WSDL.


The tag-value map

There are many general-purpose data serialisation formats. The one I deal with most commonly is defined by the DICOM standard, and like many such formats it consists of a map from a set of predetermined keys to values. Each of these values can be in any of a variety of types, but the allowable type for a particular key is constrained.

Now, one could represent this as a simple map from a tag value to a suitable base type (e.g. System.Object), relying on casting operations to get and set values. This is not a great idea from the perspective of static compile-time checking, however. C# and Java generics allow for a neater solution.

At their heart, Java generics are merely sweetening for the compiler – they have no run-time representation. This tends to limit what can be done with type inference, but is adequate for our purposes. Let us consider a map from a key type to a list of values:

Map<Key, List<?>> map;

We have used List<?> here because we want to allow any value type. Getting an element from this map involves an explicit cast in code to a List of the appropriate type. However, if we wrap this map inside another object we can hide the cast in an accessor method:

class MapWrapper {
    Map<Key, List<?>> map;

    @SuppressWarnings("unchecked") // the cast cannot be checked at run time
    public <T> List<T> get(Key key) { return (List<T>) map.get(key); }
}

Nothing is yet saved because the type has to be explicitly specified in each call. If we add a generic parameter to the key object, though, we can use type inference to push the type out to the key object definition instead:

class Key<T> {...}
class MapWrapper {
    Map<Key<?>, List<?>> map;

    @SuppressWarnings("unchecked")
    public <T> List<T> get(Key<T> key) { return (List<T>) map.get(key); }
}

Now, whenever the map is accessed through that key the type is inferred from the key. If the key is stored in a single place, often referred to as a ‘key dictionary’ or (in DICOM parlance) ‘tag dictionary’, the application can globally benefit from compile-time type checking:

class Keys {
    public static final Key<String> SomeValue = new Key<String>();
    public static final Key<Integer> AnotherValue = new Key<Integer>();
}

MapWrapper w;
List<String> s = w.get(Keys.SomeValue);
int i = w.get(Keys.AnotherValue).get(0);
int i2 = w.get(Keys.SomeValue).get(0); // causes compile-time error

In C# and VB.NET, the generic type specification forms part of the concrete type. This is a powerful facility that offers many benefits, but it makes our approach slightly more involved, because there is no equivalent of Java’s wildcards. Since the type arguments of the internal map must be general, we need a base type without a type parameter for both key and value. For the list we may use the old untyped IList interface; for the key we have to add one ourselves:

abstract class KeyBase { ... }
sealed class Key<T> : KeyBase { ... }
class MapWrapper
{
    Dictionary<KeyBase, IList> map;
    public IList<T> Get<T>(Key<T> key) { return (IList<T>) map[key]; }
}

Despite this, the result does not yield any more casting at runtime than its Java equivalent.

The result of this pattern can be used effectively in many more scenarios than merely low-level data serialisation. Maps are frequently used by frameworks and plug-in architectures to handle data in an application-agnostic fashion – the ASP.NET Session object springs to mind – and this technique can be a useful alternative to wrapper classes and extensive casting.


Wrapper classes and exception handling in JNI

The JNI API is sufficiently verbose that some may be tempted to create wrapper libraries to hide the complexity and provide a more object-oriented, strictly-typed interface to their own code. This is, more often than not, a trap for the unwary. JNI calls are generally costly and should be rationed carefully, and hiding such behind a nice-looking wrapper class can cause more harm than good.

My preference, after having been burned in this fashion, is to keep things as simple as possible. A small extension of the JNIEnv class allows Java exceptions to propagate as C++ exceptions in a very tightly defined manner:

struct Environment : public JNIEnv_

Now, if we do not add virtual methods or fields to this class, the layout in memory of Environment is exactly the same as that of JNIEnv_. We can add non-virtual methods at our leisure, and even override non-virtual methods of the base class, and still be substitutable for the base class. You can now write your JNI methods using this class instead:

jobject JNICALL methodName( Environment *env, jobject _this, jint param );

Our Environment class doesn’t currently do anything new. What I propose is to override each of the JNIEnv_ methods (except for the exception-handling and Primitive methods):

   jint GetVersion();
   jclass DefineClass(const char *name, jobject loader, const jbyte *buf, jsize len);
   jclass FindClass(const char *name);
   jmethodID FromReflectedMethod(jobject method);
   jfieldID FromReflectedField(jobject field);

The implementation of each of these overrides can simply call the base class, followed by a method that we will define to handle the exceptions that may be thrown.

inline jint Environment::GetVersion()
{
   jint version = JNIEnv_::GetVersion();
   throwNativeException();
   return version;
}
inline jclass Environment::DefineClass(const char *name, jobject loader, const jbyte *buf, jsize len)
{
   jclass clazz = JNIEnv_::DefineClass(name, loader, buf, len);
   throwNativeException();
   return clazz;
}

The implementation of throwNativeException requires some explanation. We presume here that exceptions are relatively rare (i.e. you’re not using them for control flow), so catching Java exceptions and translating them into native exceptions does not impose an unacceptable overhead. Using native exceptions allows normal C++ exception handling to exit the method cleanly; however, a native exception that propagates out of a JNI method will crash the JVM. It is therefore necessary to catch the native exception inside each JNI method that uses this technique and allow the Java exception to propagate instead.

Therefore, in the Environment class:

      /**
       * Throws a JNI Java exception corresponding to this C++ exception.
       * @param t the exception to throw.
       */
      void throwJavaException( const std::exception & t )
      {
         // Ignore the C++ exception if there's a current Java exception.
         if ( !ExceptionOccurred() )
            ThrowNew( FindClass( "java/lang/Exception" ), t.what() );
      }

      /**
       * Throws a native exception corresponding to the current Java exception, if any.
       */
      void throwNativeException()
      {
         if ( ExceptionOccurred() )
            throw std::exception();
      }

JNI methods can then be implemented without constant explicit checks for Java exceptions:

Volume getVolume( Environment * env, jobject o )
{
   jclass c = env->GetObjectClass( o );
   return Volume( env->GetIntField( o, env->GetFieldID( c, "width", "I" ) ),
                  env->GetIntField( o, env->GetFieldID( c, "height", "I" ) ),
                  env->GetIntField( o, env->GetFieldID( c, "depth", "I" ) ),
                  env->GetDoubleField( o, env->GetFieldID( c, "columnSpacing", "D" ) ),
                  env->GetDoubleField( o, env->GetFieldID( c, "rowSpacing", "D" ) ),
                  env->GetDoubleField( o, env->GetFieldID( c, "sliceSpacing", "D" ) ) );
}

jobject JNICALL calculate( com::mirada::jni::Environment *env, jobject /*_this*/, jobject o )
{
   try
   {
      Volume v = getVolume( env, o );
      Point p = nativeCalculate( v );
      jclass c = env->FindClass( "java/awt/Point" );
      return env->NewObject( c, env->GetMethodID( c, "<init>", "(II)V" ), p.x, p.y );
   }
   catch ( std::exception& e )
   {
      env->throwJavaException( e );
      return 0;
   }
}

This pattern adds no overhead to the normal control flow, while hiding the clutter of Java exception handling.

It is important not to override the exception-checking methods themselves, of course. It is also inappropriate to override the ‘Primitive’ methods since the exception-checking methods are not valid within a GetPrimitive…ReleasePrimitive pair; once you have called GetPrimitive… then the only valid calls will be GetPrimitive… and ReleasePrimitive… until the last Get is paired with a Release.

In order to gain these benefits throughout your JNI code, it is important to use the new Environment class everywhere: functions written against plain JNIEnv will not throw native exceptions, because the overriding is non-virtual.

This pattern frees the developer from sprinkling ExceptionOccurred() checks after every JNI call, yet adds no run-time overhead over the manual check. It allows JNI calls to be written in a much more expressive manner, and it frees utility functions from having to signal failure through return codes or still more ExceptionOccurred() calls.


Native code and premature optimisation

Let me declare my interests first: I am a great fan of Microsoft’s .NET development tools, I build my company’s applications in Java and I am highly skilled in C++. I am therefore hardly objective, but hopefully balanced.

It’s hard to get statistics on what people write software in. Some people will tell you it’s mostly C++, some mostly VB, others a combination of C# and Java; I suspect the methodologies of all these informal studies are deeply flawed. I believe, however, that quite a few people are busy building applications in each of those technologies today. I also believe that each of them is suitable for the large majority of applications on the platforms on which it is available – so one could select any of them for a desktop application on Windows, for example.

I’m not here to fuel the Java-.NET wars, not even on the desktop, so I shall (for the moment) lump the two together under the umbrella term ‘managed’. We therefore have ‘managed code’ and ‘native code’, where for all intents and purposes the latter means C++. I am aware of the subtleties of these descriptions – garbage collectors for C++, muddying of the waters with C++/CLI, gcj and so on – but these are merely distractions from the point.

My contention is simple: that one should choose managed languages wherever possible and for the maximum amount of the application possible.

‘Possible’ here does not imply compromising the functionality of the application. For some applications, managed code may be entirely inappropriate. I hope, however, that you will challenge the assumption that it is always inappropriate for a given class of applications, because that is not true, and has not been true for some time. Device drivers aside (and, really, who among you is writing those?), the majority of applications are possible in managed code. Some environments use it exclusively (e.g. Android, Silverlight); some exclude it entirely, like iOS – but where there is a choice, one should be wary of native code.

Managed code offers many productivity benefits. It is true that some of these benefits are not artefacts of being in a managed, garbage-collected environment but simply due to a clean start – that does not, however, negate them. Your mainstream choices for desktop application development today are Java, .NET, Flash or C++ with one of the multitude of libraries, and you would need a compelling reason indeed to go with a minority technology. Server application developers have already ceased to read this article because they are unable to figure out why you would use C++ anyway. Scripting environments like Flash, HTML5 and so on will be dealt with at another time.

The most-cited reason I see for selecting C++ over another suitable technology is performance. It is true that, for any managed system, it is possible to build an equivalent or higher-performance native system using C++; it would be a peculiar environment in which that were not the case. This argument, however, deserves closer attention. ‘Fast’, ‘slow’ and the boundary between them are not absolutes. Rarely does a system need the highest possible performance for every moment of its execution, and in interactive environments the need for performance is usually restricted to small sections.

By comparison, managed environments tout several advantages over C++. The productivity gains of these environments are hard to deny, and managed systems can be made resilient with less effort because the effects of bugs are less catastrophic. These attributes have already made Java and .NET natural candidates for Web server applications. Recent advances – and ‘recent’ in this context means the last five years – have also raised the raw performance of managed environments to a level comparable with native code in many situations.

It’s also sensible to consider the wider picture. Whatever you believe about the distribution of these environments in terms of applications, it’s clear that the distribution of skills is heavily biased towards Java and C#. Universities start this trend by taking a nice, easy and safe environment and using it to teach computer science concepts at a higher level of abstraction. There will undoubtedly be a module in that degree course that introduces C++, but the average student comes out with a much better grasp of Java.

In addition, modern C++ has grown up a lot in the last twenty years. C++ isn’t just ‘C with Classes’; it caters for a bewildering array of programming styles. Template meta-programming is thrown blithely into conversations on the Boost mailing list as though it were a basic concept; functors, lambdas, iterators, templates, STL containers… and every single C++ library feels the need to add its own take on many of these concepts, too. A computer science graduate can be productive in a small way in Java in a short time, or cross-train to C# in a little longer; in a C++ environment, it is going to take a lot longer.

The points in favour of a managed environment over native C++ are many, so in the next post in this series we’ll discuss what happens when you hit the stumbling block – what happens when nothing else will do?


Morse code detection and swimming at the deep end

Morse code is deceptively simple in concept. The uninitiated view it as merely a series of dots and dashes, which don’t appear to be all that hard to detect. This makes it a perfect example of the hidden depth of seemingly straightforward use cases and user requirements, because the reality of implementation is significantly more complicated.

Accurate coders (all human, because who would need to use the relatively inefficient Morse code between computers?) have their own rhythm, but the speed of coding varies from user to user. Competent Morse coders manage between 20 and 30 words per minute (WPM); the fastest ‘high-speed’ coders can hit 60 WPM (!), while those using it as an assistive technology (e.g. people with cerebral palsy) may be in the single digits. Slowest of all, of course, are software developers testing their own Morse code detectors, for whom words can be very laborious as they hunt for the letters in the Wikipedia article.

Traditionally, Morse code is calibrated at the beginning of a transmission by sending either of the words ‘PARIS‘ (.--. .- .-. .. ...) or ‘CODEX‘ (-.-. --- -.. . -..-). This allows the receiver to calculate the average ‘dot’ timing, of which the ‘dash’, inter-character and inter-word timings are multiples. Using this calculation, it is possible to then run a clock sampling the Morse signal. From the Nyquist sampling criterion, this clock must be at least twice the dot rate, but a larger multiple would help with key de-bouncing and compensating for human variance.

Alternatively, if immediate letter-by-letter output is not required, one could attempt to infer the Morse timings from a recorded data sample. This requires that the input not consist solely of dots or solely of dashes: the characters made up solely of dots are E, H, I, S and 5, while those made up solely of dashes are M, O, T and 0. (The correspondence between the O-0 pair – three dashes and five dashes respectively – and the S-5 pair – three dots and five dots – may be only a coincidence, but it is entertaining nonetheless.) Neither group is sufficient for more than a couple of words of a coherent English sentence, so the suitability of this technique is dictated by its probable use. While it is probably entirely suitable for continuous text entry, editing would require at least timing estimation to allow single letters and words to be entered as corrections.

Taking the first approach as the more robust, then, clock-based synchronous sampling would seem to be the obvious solution for embedded applications and single keys. For entry in other situations – for example in a Web page – running a continuous timer is a power-hungry and inefficient solution. The longer calibration word, CODEX, represents 53 dot units, so typing at the competent speed of 30 WPM implies a dot period of about 38 ms. Synchronous timers at three or four times this rate are probably beyond the reach of most JavaScript engines; indeed, Windows requires the high-performance multimedia timers to improve on 16 ms resolution.

The preferred method is therefore to use the timing of the input events. The first ‘press’ input event, whether that be a mouse click, key press, touch on a touch screen or something more exotic, represents the start of the first letter. On ‘release’, a timer is started for the inter-character duration (three dots), which at over 100 ms is well within the grasp of most timers. This timer is cancelled on each successive ‘press’ and restarted on each ‘release’. When it elapses, the letter is complete and may be analysed and output, and another timer is started for a further four dots. If no input is received before this second timer elapses, a space may be output, since the inter-word spacing has been reached.
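
A minimal sketch of that timer scheme, written in Java for brevity (a JavaScript version would use setTimeout in the same shape); the dot-period helper reuses the 53-dot CODEX arithmetic from earlier, and the emit callbacks are placeholders.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class MorseGapTimer {
    // Dot period in milliseconds, derived as above: 53 dot units per word of CODEX.
    public static long dotMillisFor(int wordsPerMinute) {
        return Math.round(60000.0 / (wordsPerMinute * 53));   // ~38 ms at 30 WPM
    }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long dotMillis;
    private ScheduledFuture<?> pending;

    public MorseGapTimer(long dotMillis) { this.dotMillis = dotMillis; }

    // Called on 'press' (mouse down, key down, touch): cancel any running gap timer.
    public synchronized void press() {
        if (pending != null) pending.cancel(false);
    }

    // Called on 'release': after three dots of silence the letter is complete.
    public synchronized void release() {
        pending = scheduler.schedule(this::endOfLetter, 3 * dotMillis, TimeUnit.MILLISECONDS);
    }

    private synchronized void endOfLetter() {
        emitLetter();   // analyse and output the dots and dashes collected so far
        // A further four dots of silence means the inter-word gap has been reached.
        pending = scheduler.schedule(this::emitSpace, 4 * dotMillis, TimeUnit.MILLISECONDS);
    }

    private void emitLetter() { /* decoding of the collected elements not shown */ }
    private void emitSpace()  { /* output a space */ }
}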

Some hysteresis is desirable in these timings. Since the inter-element gap is only one dot long, while the short (inter-letter) gap is three dots long, the short and long gaps used by the decoder may usefully be shortened by some tolerance (e.g. half a dot) to allow for human variation. It is at this point that we leave the realm of exact timings and enter the field of user testing and configuration to suit a user’s style, skill level and accuracy.

Having established the start and end of each individual letter, we may now consider the problem of analysing the letter itself based on the (hopefully accurate) timings of the input events. If we allow for a timing variation of ε (as a proportion of the dot length, with ε < 1), we arrive at the following set of possible symbols for a given interval T, measured in dot lengths, between ‘press’ and ‘release’:

Interval                Symbol
T < 1 – ε               Nothing (key bounce)
1 – ε ≤ T ≤ 1 + ε       Dot
1 + ε < T < 3 – ε       ?
3 – ε ≤ T ≤ 3 + ε       Dash
T > 3 + ε               ?

The cases marked ? above are edge cases, the interpretation of which – and the subsequent action – is very much up to the implementation. Some implementations may use their presence as an indication that recalibration is required. Some may choose different tolerances for the various boundaries; it is of course possible to set those boundaries such that there are no unknown cases, at the risk of misinterpretation if recalibration really is needed. Still other implementations may be aimed at training Morse users, and give the user a gentle nudge that their coding is inaccurate. The choice is again application-specific and relates back to the user requirements, including those concerning the type of user and system configuration.
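
As a sketch of that classification, treating both ‘?’ rows as unknown and leaving any recalibration decision to the caller (the enum and its names are illustrative):

public class ElementClassifier {
    public enum Element { NONE, DOT, UNKNOWN, DASH }

    // Classify one press-to-release interval, measured in dot lengths,
    // using the tolerance epsilon from the table above.
    public static Element classify(double intervalInDots, double epsilon) {
        if (intervalInDots < 1 - epsilon) return Element.NONE;      // key bounce
        if (intervalInDots <= 1 + epsilon) return Element.DOT;
        if (intervalInDots < 3 - epsilon) return Element.UNKNOWN;   // between dot and dash
        if (intervalInDots <= 3 + epsilon) return Element.DASH;
        return Element.UNKNOWN;                                     // longer than a dash
    }
}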

This is by no means an exhaustive treatment of the Morse decoding algorithmic landscape. More advanced Morse coding machines offer dedicated buttons for symbols and short sequences (.- on a single button, for example), which can clearly improve signalling speed significantly without necessarily raising the symbol entry difficulty above that suitable for assistive uses. There are also as many different algorithms to parse Morse entry as there are implementations of it; the aforementioned deceptive simplicity of the concept leads people to derive their implementation from first principles, as I have done here, rather than using well-established algorithms. The purpose of this article is neither to propose best practice nor to give a comprehensive overview, but simply to illustrate that even small, simple tasks in software engineering can have hidden depths and pitfalls which the aspiring project planner, requirements analyst or software developer would do well to consider before diving in.
