Responding directly to your question, you start by checking whether a given token is numeric, alphanumeric or alphabetic (you can use regex here) and then you classify it as such. In general, the approach you're looking for is called generalization hierarchy of tokens or hierarchical feature selection (Google it). The basic idea is that you could treat each token as a separate element, but that's not the best approach since you can't cover them all [*]. Instead, you use common features among tokens (for example, 2000
and 1981
are distinct tokens but they share a common feature of being 4 digit numbers and possibly years). Then you have a class for four digit numbers, another for alphanumeric, and so on. This process of generalization helps you to simplify your classification approach.
Frequently, if you start with a string of tokens, you need to preprocess them (for example, remove punctuation or special symbols, remove words that are not relevant, stemming, etc). But maybe you can use some symbols (say, punctuation between cities and countries - e.g. Melbourne, Australia
), so you assign that set of useful punctuation symbols to other symbol (#
) and use that as a context (so the next time you find an unknown word next to a comma next to a known country, you can use that knowledge to assume that the unknown word is a city.
Anyway, that's the general idea behind classification using an ontology (based on a taxonomy of terms). You may also want to read about part-of-speech tagging.
By the way, if you only want to have 3 categories (numeric, alphanumeric, alphabetic), a viable option would be to use edit distance (what is more likely, that UA4E30 belongs to the alphanumeric or numeric category, considering that it doesn't correspond to the traditional format of prefixed numeric strings?). So, you assume a cost for each operation (insertion, deletion, subtitution) that transforms your unknown token into a known one.
Finally, although you said you're using Protege (which I haven't used) to build your ontology, you may want to look at WordNet.
[*] There are probabilistic approaches that help you to determine a probability for an unknown token, so the probability of such event is not zero. Usually, this is done in the context of Hidden Markov Models. Actually, this could be useful to improve the suggestion given by etov.