IL Instructions not exposed by C#

What IL instructions are not exposed by C#?

I'm referring to instructions like sizeof and cpblk - there's no class or command that executes these instructions (sizeof in C# is computed at compile time, not at runtime AFAIK).

Others?

EDIT: The reason I'm asking this (and hopefully this will make my question a little more valid) is because I'm working on a small library which will provide the functionality of these instructions. sizeof and cpblk are already implemented - I wanted to know what others I may have missed before moving on.
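For readers wondering what such a helper might look like, here is a minimal sketch of exposing the sizeof instruction for a generic type through a DynamicMethod. It is not the library described above; the name SizeOfHelper is a hypothetical chosen for illustration, and the only APIs used are the standard System.Reflection.Emit ones.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper (name is an assumption): runs the IL sizeof
// instruction for any T, which C#'s sizeof operator rejects for an
// unconstrained generic type parameter.
static class SizeOfHelper<T>
{
    // Computed once per closed generic type, then cached.
    public static readonly int Size = Build();

    static int Build()
    {
        var method = new DynamicMethod(
            "SizeOf_" + typeof(T).Name,
            typeof(int),
            Type.EmptyTypes,
            typeof(SizeOfHelper<T>).Module,
            true);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Sizeof, typeof(T)); // pushes the size in bytes
        il.Emit(OpCodes.Ret);

        var getSize = (Func<int>)method.CreateDelegate(typeof(Func<int>));
        return getSize();
    }
}

static class SizeOfProgram
{
    static void Main()
    {
        Console.WriteLine(SizeOfHelper<int>.Size);
        Console.WriteLine(SizeOfHelper<Guid>.Size);
        // For a pointer-sized type the result depends on the platform.
        Console.WriteLine(SizeOfHelper<IntPtr>.Size == IntPtr.Size);
    }
}
```

Caching the result in a static field of a generic class means the dynamic method is built only once per type argument.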

EDIT2: Using Eric's answer, I've compiled a list of instructions:

  • Break
  • Jmp
  • Calli
  • Cpobj
  • Ckfinite
  • Prefix[1-7]
  • Prefixref
  • Endfilter
  • Unaligned
  • Tailcall
  • Cpblk
  • Initblk

There were a number of other instructions which were not included in the list, which I'm separating because they're basically shortcuts for other instructions (compressed to save time and space):

  • Ldarg[0-3]
  • Ldloc[0-3]
  • Stloc[0-3]
  • Ldc_[I4_[M1/S/0-8]/I8/R4/R8]
  • Ldind_[I1/U1/I2/U2/I4/U4/I8/R4/R8]
  • Stind_[I1/I2/I4/I8/R4/R8]
  • Conv_[I1/I2/I4/I8/R4/R8/U4/U8/U2/U1]
  • Conv_Ovf_[I1/I2/I4/I8/U1/U2/U4/U8]
  • Conv_Ovf_[I1/I2/I4/I8/U1/U2/U4/U8]_Un
  • Ldelem_[I1/I2/I4/I8/U1/U2/U4/R4/R8]
  • Stelem_[I1/I2/I4/I8/R4/R8]
Anthropometry answered 18/8, 2011 at 16:15 Comment(14)
@Hans I don't see how this isn't on-topic - Hemline
I made an edit - is this better? AFAIK, it's at least not subjective... Would it be better if I posted on the MSDN forums? - Anthropometry
@Hans I don't see why it isn't on-topic either. It's not asking for an opinion, it's a simple question about a programming language with a specific answer, and I wouldn't know where else the OP could ask it on stackexchange. - Equalizer
This is actually even a fairly interesting question. Not subjective, off topic, offensive, localized, or anything else in any way that I can see. - Hemline
Well, the edit makes it better, there's at least a hint of practical. But answerable is the problem, it requires somebody from Microsoft that has access to the compiler source code. He will not want to talk about undocumented C# keywords, for one. - Octagonal
@Hans Well that's about like saying that it's impossible to tell what x86 instructions aren't exposed from gcc without looking through the source. It takes quite a bit of effort, but it's likely someone has tried to reverse-engineer and document this before. - Hemline
When did we have to demonstrate practicality? I've seen endless questions with people 'writing my own http server' - which I'm sure is just for fun. As for answerable, I wouldn't know if someone here couldn't answer that. Aren't C# and MSIL ECMA standards too? - Equalizer
I don't see how this is a question about undocumented C# keywords - a quick Google can find me those. I'm pretty sure it can be answered, or else I wouldn't have found out about sizeof and cpblk. - Anthropometry
If it's asking for a list of things then the question should be changed to a community wiki, right? - Pigheaded
Please show or provide a link to the code. - Coursing
@Dour What exactly are you asking for? The code for the instructions I've already exposed? - Anthropometry
You said "I'm working on a small library...". If this is a commercial product I can understand keeping the source private, but if not this would be a valuable project for other people. - Coursing
I'll release it on Codeplex when I get the chance :D - Anthropometry
Perfectly valid question. I'm trying to figure out if Cpblk can be construed from a C# program and I find both the question and answer most relevant to high performance .NET development. - Bernardo
P
36

"I'm referring to instructions like sizeof and cpblk - there's no class or command that executes these instructions (sizeof in C# is computed at compile time, not at runtime AFAIK)."

This is incorrect. sizeof(int) will be treated as the compile-time constant 4, of course, but there are plenty of situations (all in unsafe code) where the compiler relies upon the runtime to determine what the memory size of a structure is. Consider, for example, a structure that contains two pointers. It would be of size 8 on a 32-bit machine but 16 on a 64-bit machine. In those circumstances the compiler will generate the sizeof opcode.
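A minimal sketch of the two-pointer case, assuming the code is compiled with unsafe code enabled; the struct name TwoPointers is made up for illustration. The size of the struct cannot be known when the C# is compiled, so it is resolved when the program runs.

```csharp
using System;

// Hypothetical struct: two raw pointers, so its size tracks the pointer width.
unsafe struct TwoPointers
{
    public void* First;
    public void* Second;
}

static class SizeofDemo
{
    public static unsafe int TwoPointerSize()
    {
        // Not a compile-time constant: depends on the process bitness.
        return sizeof(TwoPointers);
    }

    static void Main()
    {
        // sizeof(int) is folded to a constant by the compiler.
        Console.WriteLine(sizeof(int));
        // 8 in a 32-bit process, 16 in a 64-bit process.
        Console.WriteLine(TwoPointerSize());
        Console.WriteLine(TwoPointerSize() == 2 * IntPtr.Size);
    }
}
```

Inspecting the compiled TwoPointerSize method with a disassembler such as ildasm is one way to check which opcode the compiler chose.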

"Others?"

I don't have a list of all the opcodes we don't produce -- I have never had a need to build such a list. However, off the top of my head I can tell you that there is no way to generate a "call indirect" (calli) instruction in C#; we are occasionally asked for that feature as it would improve performance of certain interop scenarios.
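For illustration, calli can still be reached from C# by emitting IL at run time. The sketch below builds a DynamicMethod that takes a function pointer plus an int argument and invokes the pointer with calli; the names CalliDemo, Twice and BuildInvoker are hypothetical. It calls a managed static method to stay self-contained, whereas the interop scenarios mentioned above would pass a native pointer and an unmanaged calling convention. (Later versions of the language, C# 9 and up, added function pointers written as delegate*, which compile to calli.)

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

static class CalliDemo
{
    // Target of the indirect call.
    public static int Twice(int x)
    {
        return x * 2;
    }

    // Builds: int Invoke(IntPtr fn, int arg) { return fn(arg); } using calli.
    public static Func<IntPtr, int, int> BuildInvoker()
    {
        var method = new DynamicMethod(
            "InvokeViaCalli",
            typeof(int),
            new[] { typeof(IntPtr), typeof(int) },
            typeof(CalliDemo).Module,
            true);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_1); // the int argument
        il.Emit(OpCodes.Ldarg_0); // the function pointer goes on top
        il.EmitCalli(OpCodes.Calli, CallingConventions.Standard,
                     typeof(int), new[] { typeof(int) }, null);
        il.Emit(OpCodes.Ret);

        return (Func<IntPtr, int, int>)method.CreateDelegate(
            typeof(Func<IntPtr, int, int>));
    }

    static void Main()
    {
        IntPtr fn = typeof(CalliDemo).GetMethod("Twice")
                                     .MethodHandle.GetFunctionPointer();
        Func<IntPtr, int, int> invoke = BuildInvoker();
        Console.WriteLine(invoke(fn, 21));
    }
}
```

The invoker delegate is built once and can then be reused for any pointer with the same signature, which avoids creating a delegate object per target.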

UPDATE: I just grepped the source code to produce a list of opcodes we definitely do produce. They are:

add
add_ovf
add_ovf_un
and
arglist
beq
beq_s
bge
bge_s
bge_un
bge_un_s
bgt
bgt_s
bgt_un
bgt_un_s
ble
ble_s
ble_un
ble_un_s
blt
blt_s
blt_un
blt_un_s
bne_un
bne_un_s
box
br
br_s
brfalse
brfalse_s
brtrue
brtrue_s
call
callvirt
castclass
ceq
cgt
cgt_un
clt
clt_un
constrained
conv_i
conv_ovf_i
conv_ovf_i_un
conv_ovf_u
conv_ovf_u_un
conv_r
conv_r_un
conv_u
div
div_un
dup
endfinally
initobj
isinst
ldarg
ldarg_
ldarg_s
ldarga
ldarga_s
ldc_i
ldc_r
ldelem
ldelem_i
ldelem_r
ldelem_ref
ldelem_u
ldelema
ldfld
ldflda
ldftn
ldind_i
ldind_r
ldind_ref
ldind_u
ldlen
ldloc
ldloc_
ldloc_s
ldloca
ldloca_s
ldnull
ldobj
ldsfld
ldsflda
ldstr
ldtoken
ldvirtftn
leave
leave_s
localloc
mkrefany
mul
mul_ovf
mul_ovf_un
neg
newarr
newobj
nop
not
or
pop
readonly
refanytype
refanyval
rem
rem_un
ret
rethrow
shl
shr
shr_un
sizeof
starg
starg_s
stelem
stelem_i
stelem_r
stelem_ref
stfld
stind_i
stind_r
stind_ref
stloc
stloc_s
stobj
stsfld
sub
sub_ovf
sub_ovf_un
switch
throw
unbox_any
volatile
xor

I'm not going to guarantee that that's all of them, but that is certainly most of them. You can then compare that against a list of all the opcodes and see what is missing.
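The comparison suggested here can be sketched with reflection over System.Reflection.Emit.OpCodes. The matching rule is an assumption: entries such as ldarg_ or conv_i are read as families, so an entry covers any opcode name that equals it or extends it with an underscore or a digit. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

static class OpcodeDiff
{
    // The list from this answer, whitespace separated.
    const string Produced = @"
        add add_ovf add_ovf_un and arglist beq beq_s bge bge_s bge_un bge_un_s
        bgt bgt_s bgt_un bgt_un_s ble ble_s ble_un ble_un_s blt blt_s blt_un
        blt_un_s bne_un bne_un_s box br br_s brfalse brfalse_s brtrue brtrue_s
        call callvirt castclass ceq cgt cgt_un clt clt_un constrained conv_i
        conv_ovf_i conv_ovf_i_un conv_ovf_u conv_ovf_u_un conv_r conv_r_un
        conv_u div div_un dup endfinally initobj isinst ldarg ldarg_ ldarg_s
        ldarga ldarga_s ldc_i ldc_r ldelem ldelem_i ldelem_r ldelem_ref
        ldelem_u ldelema ldfld ldflda ldftn ldind_i ldind_r ldind_ref ldind_u
        ldlen ldloc ldloc_ ldloc_s ldloca ldloca_s ldnull ldobj ldsfld ldsflda
        ldstr ldtoken ldvirtftn leave leave_s localloc mkrefany mul mul_ovf
        mul_ovf_un neg newarr newobj nop not or pop readonly refanytype
        refanyval rem rem_un ret rethrow shl shr shr_un sizeof starg starg_s
        stelem stelem_i stelem_r stelem_ref stfld stind_i stind_r stind_ref
        stloc stloc_s stobj stsfld sub sub_ovf sub_ovf_un switch throw
        unbox_any volatile xor";

    // True when name equals entry or extends it with "_..." or a digit:
    // "conv_i" covers "conv_i4", but "call" does not cover "calli".
    static bool Covers(string entry, string name)
    {
        if (!name.StartsWith(entry, StringComparison.Ordinal)) return false;
        if (name.Length == entry.Length) return true;
        if (entry.EndsWith("_", StringComparison.Ordinal)) return true;
        char next = name[entry.Length];
        return next == '_' || char.IsDigit(next);
    }

    // Names of OpCodes fields not covered by any entry in the list.
    public static List<string> Missing()
    {
        string[] produced = Produced.Split(
            (char[])null, StringSplitOptions.RemoveEmptyEntries);

        return typeof(OpCodes)
            .GetFields(BindingFlags.Public | BindingFlags.Static)
            .Where(f => f.FieldType == typeof(OpCode))
            .Select(f => f.Name.ToLowerInvariant())
            .Where(name => !produced.Any(entry => Covers(entry, name)))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    static void Main()
    {
        foreach (string name in Missing())
            Console.WriteLine(name);
    }
}
```

The exact output depends on which fields the OpCodes class exposes in the framework version used, so treat the result as a starting point rather than a definitive list.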

Pantelegraph answered 18/8, 2011 at 16:50 Comment(3)
I might add that it's impossible to do the regular sizeof on a generic type, in which case using the instruction directly (and I'm doing this via DynamicMethods) can work around this. Calli is a new one to me - might I ask how it would improve performance? - Anthropometry
@YellPika: Suppose you have a raw pointer to a vtable of a C++ object and you wish to invoke one of the methods in that vtable whose signature is known to you. There is no way to do so directly in C# today, even in unsafe code, because there is no way to express the notion of "this is a pointer to a memory location that contains code for a method that returns void and takes an int passed on the stack". What you have to do is build a delegate object with that signature and then give the pointer to the delegate constructor, and then invoke the delegate. That is very slow and memory-heavy. - Pantelegraph
Wow, this is great - I wish I could upvote you twice :D. Thanks for the great response. - Anthropometry
C
13

Based on Eric's answer, here are some I have spotted. Where I can see a reason I have indicated it; where I can't, I speculate freely. Feel free to point out where those speculations are wrong.

Break

Signals the Common Language Infrastructure (CLI) to inform the debugger that a break point has been tripped.

You would do this by calling System.Diagnostics.Debugger.Break(). That method appears not to use the instruction directly; instead it calls a BreakInternal() method baked into the CLR.

Cpblk and Cpobj

Cpblk: Copies a specified number of bytes from a source address to a destination address.

Cpobj: Copies the value type located at the address of an object (type &, * or native int) to the address of the destination object (type &, * or native int).

I presume these were added for C++/CLI (previously Managed C++), but that is purely speculation on my part. They may also be present in certain system calls but not generated normally by the compiler and provide some scope for unsafe fun and games.
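As an illustration of reaching cpblk from C#, here is a minimal sketch that emits the instruction through a DynamicMethod and uses it to copy one pinned byte array into another. The names CpblkDemo, CopyBlock and CopyViaCpblk are hypothetical.

```csharp
using System;
using System.Reflection.Emit;
using System.Runtime.InteropServices;

static class CpblkDemo
{
    delegate void CopyBlock(IntPtr destination, IntPtr source, uint byteCount);

    static readonly CopyBlock Copy = Build();

    // Emits: cpblk(destination, source, byteCount); ret
    static CopyBlock Build()
    {
        var method = new DynamicMethod(
            "CopyBlock",
            typeof(void),
            new[] { typeof(IntPtr), typeof(IntPtr), typeof(uint) },
            typeof(CpblkDemo).Module,
            true);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // destination address
        il.Emit(OpCodes.Ldarg_1); // source address
        il.Emit(OpCodes.Ldarg_2); // number of bytes, on top of the stack
        il.Emit(OpCodes.Cpblk);
        il.Emit(OpCodes.Ret);

        return (CopyBlock)method.CreateDelegate(typeof(CopyBlock));
    }

    // Copies a byte array by pinning both arrays and running cpblk.
    public static byte[] CopyViaCpblk(byte[] source)
    {
        byte[] destination = new byte[source.Length];
        GCHandle src = GCHandle.Alloc(source, GCHandleType.Pinned);
        GCHandle dst = GCHandle.Alloc(destination, GCHandleType.Pinned);
        try
        {
            Copy(dst.AddrOfPinnedObject(), src.AddrOfPinnedObject(),
                 (uint)source.Length);
        }
        finally
        {
            src.Free();
            dst.Free();
        }
        return destination;
    }

    static void Main()
    {
        byte[] copy = CopyViaCpblk(new byte[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(",", copy));
    }
}
```

Pinning matters here because cpblk works on raw addresses, and the garbage collector could otherwise move the arrays during the copy.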

Endfilter

Transfers control from the filter clause of an exception back to the Common Language Infrastructure (CLI) exception handler.

C# doesn't support exception filtering. The VB compiler doubtless makes use of this though.

Initblk

Initializes a specified block of memory at a specific address to a given size and initial value.

I am going to speculate again that this is potentially useful in unsafe code and C++/CLI.

Jmp

Exits current method and jumps to specified method.

I will speculate that this sort of trampolining may be useful to those wanting to avoid tail calls. Perhaps the DLR makes use of it?

Tailcall

Performs a postfixed method call instruction such that the current method's stack frame is removed before the actual call instruction is executed.

Discussed in depth elsewhere. Currently the C# compiler does not emit this opcode.

Unaligned

Indicates that an address currently atop the evaluation stack might not be aligned to the natural size of the immediately following ldind, stind, ldfld, stfld, ldobj, stobj, initblk, or cpblk instruction.

C# (and the CLR) makes quite a few guarantees concerning the aligned nature of much of its resulting code and data. It is not surprising that this is not emitted, but I can see why it would be included.

Unbox

Converts the boxed representation of a value type to its unboxed form.

The C# compiler prefers to use the Unbox_Any instruction exclusively for this purpose. Since it was added to the instruction set in the 2.0 release, I presume it makes generics either feasible or much simpler. At that point, using it throughout the code for everything, generics or otherwise, was either safer, simpler or quicker (or some combination of all three).


Footnote:

Prefix1, Prefix2, Prefix3, Prefix4, Prefix5, Prefix6, Prefix7, Prefixref

Infrastructure. This is a reserved instruction.

These are not instructions as such. Some IL instructions are longer than others, and the variable-length ones start with prefixes that are never valid on their own, which keeps parsing unambiguous. These prefix opcodes are reserved for that purpose so they are not used elsewhere. Doubtless someone implementing a switch-statement-based parser for an IL sequence would appreciate these, so they could trap the prefixes and maintain state.

Ceria answered 18/8, 2011 at 18:47 Comment(5)
Prefix* aren't actual CLR instructions; they are, as their name suggests, prefixes. The complete instruction is then made of two bytes - the prefix and the following byte. For example, prefix1 is FE, and stloc is FE 0E. Only prefix1 is used in the current version, but the other prefixes exist to make room for (a lot of) future opcodes. - Tabb
@Tabb thanks, I wondered if it was for variable length il instructions. Any reference for that I can link to in the answer? - Ceria
I don't have any references. If you look at Opcodes' two bytes you see that. FF for first byte means that byte is not output (single-byte instructions). Most instructions are of the FF XX form, two-byte instructions are of the FE XX form, and Prefix1 is FF FE. Prefix1..n are not defined as opcodes in the C# spec, but their byte values are the ones defined as a first byte in a multibyte opcode. - Tabb
@config cool, if you want to add that info yourself I'll community wiki the answer. - Ceria
I'm not sure they actually belong in the answer - the answer is about unused IL instructions and these aren't actually IL instructions (even though they are fields of the OpCodes class). - Tabb
L
4

One interesting example is tail.call (OpCodes.Tailcall), which would make tail call optimization possible for recursion.

Lothario answered 18/8, 2011 at 17:0 Comment(1)
see #7096657 - Zwolle
H
1

The .override IL directive (I don't know if it's the correct term, but it's certainly not an instruction) is generated by the C# compiler, but only in the special case of explicit interface implementation.

It would be interesting to be able to use it more freely, like in VB.NET, where implementing members can be aliased or even have a different access modifier than the interface member.

Heavierthanair answered 18/8, 2011 at 19:55 Comment(0)
