Title pretty much sums up the question. I've noticed that some papers refer to a BILOU encoding scheme for NER as opposed to the typical BIO tagging scheme (for example, this 2009 paper by Ratinov and Roth: http://cogcomp.cs.illinois.edu/page/publication_view/199).
From working with the CoNLL 2003 data, I know that:
B stands for 'beginning' (signifies beginning of an NE)
I stands for 'inside' (signifies that the word is inside an NE)
O stands for 'outside' (signifies that the word is just a regular word outside of an NE)
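To make sure we're talking about the same thing, here is a minimal Python sketch of how I understand BIO tagging. The sentence, the `(start, end, label)` span format, and the function name `spans_to_bio` are all made up by me for illustration, and I'm assuming the variant where every entity starts with a B tag.

```python
def spans_to_bio(tokens, spans):
    """Turn entity spans into BIO tags.

    spans: list of (start, end_exclusive, label) token-index triples
    (a hypothetical format, chosen only for this example).
    """
    tags = ["O"] * len(tokens)            # everything is outside by default
    for start, end, label in spans:
        tags[start] = f"B-{label}"        # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # remaining tokens of the entity
    return tags


tokens = ["John", "Smith", "visited", "Paris", "yesterday"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
bio_tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, bio_tags):
    print(token, tag)
```

Note that under BIO the one-word entity "Paris" gets a plain B tag, the same as the first word of a longer entity.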
Meanwhile, I've been told that the letters in BILOU stand for:
B - 'beginning'
I - 'inside'
L - 'last'
O - 'outside'
U - 'unit'
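Here is my best guess at how these would map onto BIO tags, as a rough Python sketch. The big assumption (mine, not something I've confirmed) is that 'last' marks the final token of a multi-token entity and 'unit' marks an entity that is a single token long. The function name `bio_to_bilou` and the example tags are made up for illustration.

```python
def bio_to_bilou(tags):
    """Convert BIO tags to BILOU tags.

    ASSUMPTION: L = last token of a multi-token entity,
                U = entity consisting of exactly one token.
    """
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"   # does the entity go on?
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"U-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"L-{label}")
    return out


example_bio = ["B-PER", "I-PER", "O", "B-LOC", "O"]
example_bilou = bio_to_bilou(example_bio)
print(example_bilou)
```

If that reading is right, the only tokens that change are the ones at the end of an entity and the ones that form an entity on their own.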
I've also seen people reference two other tags:
E - 'end', apparently used alongside the 'last' tag
S - 'singleton', apparently used alongside the 'unit' tag
I'm pretty new to the NER literature, and I've been unable to find anything that clearly explains these tags. In particular, my questions are: what is the difference between the 'last' and 'end' tags, and what does the 'unit' tag stand for?