The term taxonomy has been so widely used and abused that something referred to as a taxonomy can be just about anything, though usually it will be some sort of abstract structure. Taxonomies began with Carl von Linné[3], who in the 18th century developed a hierarchical classification system for life forms that is the basis for the modern zoological and botanical classification and naming system for species. In this paper we will use taxonomy to mean a subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy without doing anything further, though in real life you will find the term “taxonomy” applied to more complex structures as well.

The benefit of this approach is that it allows related terms to be grouped together and categorized in ways that make it easier to find the correct term to use, whether for searching or for describing an object. For example, this could help users and authors by making it clear that there are two closely related terms, “topic maps” and “XTM”, and helping them choose the right one. (Or, at least, for the users, telling them that they should perhaps try both.)

Figure 1. An example taxonomy

The figure above shows the placement of topic maps within a hypothetical taxonomical structure. As can be seen, this structure could easily help someone who is looking for information on topic maps, or classifying a document about topic maps, to pick the right terms to use.
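
As a rough illustration of how such a hierarchy can guide term selection, the sketch below stores a small taxonomy as a mapping from each term to its narrower terms. The exact hierarchy is an assumption made for the example and is not a reproduction of Figure 1; the names NARROWER, broader_of and suggest_terms are likewise hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: each term maps to its narrower terms.
# Only "topic maps" under "knowledge representation" is stated in the
# text; the placement of the other terms is assumed for illustration.
NARROWER: Dict[str, List[str]] = {
    "knowledge representation": ["topic maps"],
    "topic maps": ["XTM", "HyTM", "TMQL", "TMCL"],
}


def broader_of(term: str) -> Optional[str]:
    """Return the broader (parent) term, or None for a top-level term."""
    for parent, children in NARROWER.items():
        if term in children:
            return parent
    return None


def suggest_terms(term: str) -> List[str]:
    """Return the broader term (if any) followed by the narrower terms."""
    suggestions: List[str] = []
    parent = broader_of(term)
    if parent is not None:
        suggestions.append(parent)
    suggestions.extend(NARROWER.get(term, []))
    return suggestions


if __name__ == "__main__":
    # A user searching for "XTM" is pointed at "topic maps" as well.
    print(suggest_terms("XTM"))
    print(suggest_terms("topic maps"))
```

A search interface could show these suggestions next to the term the user typed, which is one way of telling users that they should perhaps try both “topic maps” and “XTM”.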

Note that the taxonomy helps users by describing the subjects; from the point of view of metadata there is really no difference between a simple controlled vocabulary and a taxonomy. The metadata only relates objects to subjects, whereas here we have arranged the subjects in a hierarchy. So a taxonomy describes the subjects being used for classification, but is not itself metadata; it can be used in metadata, however. The diagram below illustrates this.

Figure 2. Using the taxonomy in metadata

In this diagram, the blue lines are the metadata, while the black lines that make up the taxonomy are part of the subject-based classification scheme. The distinction is that the blue lines are statements about the paper, whereas the black line between “topic maps” and “knowledge representation” is not a statement about the paper; it’s a statement about “topic maps”. One consequence of this is that if we have another paper about “topic maps” we do not need to repeat that “topic maps” belongs under “knowledge representation”.
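
The separation between the two kinds of statements can be sketched in code. In the minimal example below, which is an illustration under stated assumptions rather than a prescribed implementation, the taxonomy and the metadata live in two separate structures. The identifiers paper-1 and paper-2, and the placement of “XTM” under “topic maps”, are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Taxonomy (the black lines): statements about subjects.
# Each term maps to its broader term; None marks a top-level term.
# The position of "XTM" is an assumption made for this example.
BROADER: Dict[str, Optional[str]] = {
    "knowledge representation": None,
    "topic maps": "knowledge representation",
    "XTM": "topic maps",
}

# Metadata (the blue lines): statements about objects.
# The paper identifiers are hypothetical.
SUBJECTS_OF: Dict[str, Set[str]] = {
    "paper-1": {"topic maps"},
    "paper-2": {"topic maps", "XTM"},
}


def ancestors(term: str) -> List[str]:
    """Return the broader terms above `term`, nearest first."""
    chain: List[str] = []
    parent = BROADER[term]
    while parent is not None:
        chain.append(parent)
        parent = BROADER[parent]
    return chain


def objects_under(term: str) -> Set[str]:
    """Return objects classified with `term` or any narrower term."""
    found: Set[str] = set()
    for obj, subjects in SUBJECTS_OF.items():
        for subject in subjects:
            if subject == term or term in ancestors(subject):
                found.add(obj)
    return found


if __name__ == "__main__":
    print(ancestors("XTM"))
    print(sorted(objects_under("knowledge representation")))
```

Note that neither paper's metadata mentions “knowledge representation”. That both papers fall under it follows from the taxonomy alone, so the statement that “topic maps” belongs under “knowledge representation” is made once, no matter how many papers are classified.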

As we said, the taxonomy provides more information about the concepts, and it does so to help the users. However, a number of important pieces of information about the concepts are not captured here, such as:

  • The fact that “XML Topic Maps” is synonymous with “XTM”.
  • The difference between “XTM” and “topic maps”. (Many users use these interchangeably, but they do not mean the same thing.)
  • The fact that “topic navigation maps” is synonymous with “topic maps”, but should no longer be used.
  • The relationships between topic maps and subject-based classification, and between topic maps and the semantic web.
  • The relationships between XTM and XML, and between HyTM and SGML.
  • The similarity between HyTM and XTM, and their difference from TMQL and TMCL, as well as the similarity between TMQL and XQuery.

All of these gaps have consequences for end users, who must search using precisely the right term, look in precisely the right places to find the terms, and so on. A taxonomy as we have defined it here cannot handle these problems, though it should be noted that many systems referred to as taxonomies can, to some extent, since they extend the basic model defined here.


