Translated by AI
Reading the JVM | JVM Structure Part 6: Instruction Set Summary and Types
This is a continuation of the previous post. You can find the previous post here:
This series is designed as a guide for deciphering the JVM specification.
Since the JVM specification is very long and contains many difficult topics, I will summarize the key points for each section.
Additionally, by understanding the internal structure and operating principles of the JVM, we aim to deeply understand the mechanisms of Java's performance, security, and memory management.
You can find the series here:
Chapter 2: The Structure of the Java Virtual Machine
Chapter 2 of the JVM specification is "The Structure of the Java Virtual Machine."
That being said, this chapter is particularly long and complex among the seven chapters of the JVM specification, so I will explain it in eight parts.
Here, we will cover Sections 2.11 through 2.11.4.
2.11 Instruction Set Summary
A JVM instruction consists of a 1-byte opcode that specifies the operation to be performed, followed by zero or more operands as needed.
instruction ::= opcode [operand]*
Here, the internal structure of the JVM interpreter can be simplified as follows:
do {
    // Get the position of the next instruction
    currentPc = fetchNextPC();
    if (!currentPc) {
        break; // Exit the loop if there is no next instruction
    }
    opcode = readNextOpcode(currentPc);
    switch (opcode) {
        // Cases to process instructions
    }
} while (true); // Infinite loop
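The loop above can be sketched in Java for a handful of opcodes. This is only a toy under stated assumptions: the class TinyInterpreter and its run method are hypothetical names, only nop, bipush, iload_1, istore_1, iadd, and ireturn are handled, and nothing is verified before execution. A real JVM is far more involved.

```java
final class TinyInterpreter {
    // Opcode values as listed in the JVM specification.
    static final int NOP = 0x00, BIPUSH = 0x10, ILOAD_1 = 0x1b,
                     ISTORE_1 = 0x3c, IADD = 0x60, IRETURN = 0xac;

    /** Runs the code array and returns the int popped by ireturn. */
    static int run(byte[] code) {
        int[] locals = new int[4]; // local variables (hypothetical fixed size)
        int[] stack = new int[8];  // operand stack (hypothetical fixed size)
        int sp = 0;                // index of the next free stack entry
        int pc = 0;                // program counter
        while (pc < code.length) {
            int opcode = code[pc++] & 0xff; // fetch the opcode as an unsigned byte
            switch (opcode) {
                case NOP:
                    break;
                case BIPUSH:
                    stack[sp++] = code[pc++]; // signed byte operand, sign-extended to int
                    break;
                case ILOAD_1:
                    stack[sp++] = locals[1];
                    break;
                case ISTORE_1:
                    locals[1] = stack[--sp];
                    break;
                case IADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case IRETURN:
                    return stack[--sp];
                default:
                    throw new IllegalStateException(
                        "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        throw new IllegalStateException("fell off the end of the code array");
    }

    public static void main(String[] args) {
        // bipush 5; istore_1; nop; iload_1; iload_1; iadd; ireturn
        byte[] code = {0x10, 0x05, 0x3c, 0x00, 0x1b, 0x1b, 0x60, (byte) 0xac};
        System.out.println(run(code)); // prints 10
    }
}
```

The switch on the opcode mirrors the "cases to process instructions" placeholder in the pseudocode; each case reads exactly as many operand bytes as its instruction defines.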
Storage of Operands and Bytecode
The number and size of operands vary per instruction.
When an operand is larger than 1 byte, it is stored in big-endian order (high-order byte first).
For example, an unsigned 16-bit index into the local variables is stored as two unsigned bytes (byte1, byte2), and its value is calculated as (byte1 << 8) | byte2.
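As a minimal sketch of that calculation in Java (the class and method names here are hypothetical), note that Java's byte type is signed, so each byte has to be masked before it is combined:

```java
final class OperandDecoding {
    /** Combines two operand bytes into an unsigned 16-bit value (0..65535). */
    static int readUnsigned16(byte byte1, byte byte2) {
        // Mask each byte to 0..255 first, because Java bytes are signed.
        return ((byte1 & 0xff) << 8) | (byte2 & 0xff);
    }

    /** Combines the same two bytes into a signed 16-bit value, as sipush does. */
    static int readSigned16(byte byte1, byte byte2) {
        // byte1 is deliberately left unmasked so that its sign is extended.
        return (byte1 << 8) | (byte2 & 0xff);
    }

    public static void main(String[] args) {
        System.out.println(readUnsigned16((byte) 0x27, (byte) 0x0f)); // prints 9999
        System.out.println(readSigned16((byte) 0xff, (byte) 0xfe));   // prints -2
    }
}
```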
The bytecode instruction stream is aligned only to single bytes.
Furthermore, each instruction (opcode and its operands) is laid out contiguously.
With the exception of tableswitch and lookupswitch, which pad some of their operands to 4-byte boundaries, no padding is needed to align the next instruction.
In other words, an instruction sequence like the following is also valid bytecode:
0x10 0x05 0x3c 0x00
↑ bipush
     ↑ 5 (immediate operand of bipush)
          ↑ istore_1
               ↑ nop
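To make the variable instruction length concrete, here is a small disassembler sketch for exactly these three opcodes. The class TinyDisassembler is a hypothetical name, and the output format (offset, colon, mnemonic) is an assumption loosely modeled on javap.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyDisassembler {
    /** Walks the byte stream, advancing by each instruction's own length. */
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xff;
            switch (opcode) {
                case 0x00: // nop: opcode only
                    out.add(pc + ": nop");
                    pc += 1;
                    break;
                case 0x10: // bipush: opcode plus one signed operand byte
                    out.add(pc + ": bipush " + code[pc + 1]);
                    pc += 2;
                    break;
                case 0x3c: // istore_1: opcode only, the index is implied
                    out.add(pc + ": istore_1");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException(
                        "unknown opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] code = {0x10, 0x05, 0x3c, 0x00};
        disassemble(code).forEach(System.out::println);
        // prints:
        // 0: bipush 5
        // 2: istore_1
        // 3: nop
    }
}
```

The offsets 0, 2, 3 show that instructions start wherever the previous one ended, with no padding in between.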
2.11.1 Types and the Java Virtual Machine
The Deep Relationship Between Instructions and Types
Most JVM instructions encode the type of the data they operate on. For example, both the iload and fload instructions load a value from a local variable, but the former loads an int value while the latter loads a float value. Even where the two could share an identical implementation, they have distinct opcodes.
Instructions that are specialized in this way often include type information in their names, as shown in the table below.
| Prefix | Description |
|---|---|
| b | Represents the 8-bit integer (byte) type. |
| c | Represents the 16-bit character (char) type. |
| s | Represents the 16-bit integer (short) type. |
| i | Represents the 32-bit integer (int) type. |
| l | Represents the 64-bit integer (long) type. |
| f | Represents the 32-bit floating-point (float) type. |
| d | Represents the 64-bit floating-point (double) type. |
| a | Represents the reference type (object references). |
Details will be explained in Chapter 6 (please stay tuned).
Opcode Limits and the Performance Dilemma
As mentioned earlier, JVM opcodes are represented by 1 byte. Having only 1 byte means the total number of opcode types cannot exceed 2^8 = 256.
As noted above, even a single operation such as addition needs a separate opcode for each type it supports (iadd, ladd, fadd, dadd).
For this reason, the JVM does not provide every operation for every type. In the specification's words, the instruction set is intentionally not orthogonal.
Some instructions, such as dup, pop, and swap, are untyped: they manipulate operand stack entries without regard to the type of the value. They only distinguish category 1 values from category 2 values (see "Types and Categories" below), which is why dup2 exists alongside dup. Because one opcode serves many types, the number of opcodes stays small.
Typed instructions, on the other hand, exist only where they are needed. For example, iadd adds int values and ladd adds long values, but there is no badd or sadd.
By combining typed and untyped instructions, the JVM covers the operations it needs while keeping the number of opcodes manageable. Where an operation has no variant for a given type, conversion instructions (such as d2i or i2d) fill the gap.
Example:
iload_0  ; Push int type local variable 0 onto the stack
i2d      ; Convert the int value on the stack to double type
dstore_1 ; Store the double value on the stack into local variables 1 and 2
dload_1  ; Push the double value in local variables 1 and 2 onto the stack
dup2     ; Duplicate the double value on the stack (dup only works on category 1 values)
dadd     ; Add the two double values on the stack
...
Here, dup2 is an untyped instruction: it duplicates a category 2 value such as a double just as it would a long, whereas a plain dup on a double would be rejected by the verifier. The i2d instruction is a typed instruction that converts an int value to a double value. By combining these, values of different types can be manipulated efficiently.
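For orientation, the listing above corresponds roughly to the following Java source. This is a sketch with hypothetical names, and a real compiler will not necessarily emit that exact instruction sequence; running javap -c on the compiled class shows what was actually produced.

```java
final class WidenAndAdd {
    /** Widens x to double and adds the result to itself. */
    static double widenThenDouble(int x) {
        double d = x; // implicit int-to-double widening (an i2d in bytecode)
        return d + d; // double addition (a dadd in bytecode)
    }

    public static void main(String[] args) {
        System.out.println(widenThenDouble(21)); // prints 42.0
    }
}
```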
Type and Instruction Reuse
One point deserves attention here: most instructions have no variants for the integer types byte, char, and short, and there are no instructions dedicated to the boolean type at all.
Indeed, while there is an instruction to push an int type from a local variable to the stack (iload), there is no instruction to push a byte type from a local variable to the stack (like bload).
There is a somewhat deep reason for this. For example, every time a byte or short value is pushed onto the operand stack, it is implicitly converted to an int type via sign extension. Similarly, every time a boolean or char value is pushed, it is implicitly converted to an int type via zero extension.
In this way, the JVM processes byte, char, short, and boolean values by converting them to int types, so dedicated instructions for these types are not necessary. Consequently, values of these types can be correctly manipulated using instructions that operate on int types.
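The same promotion is visible at the Java language level. The following sketch (class and method names are hypothetical) shows sign extension for byte, zero extension for char, and byte addition being carried out as int addition:

```java
final class SmallTypePromotion {
    /** byte to int: the sign bit is extended. */
    static int widenByte(byte b) {
        return b;
    }

    /** char to int: the upper bits are filled with zeros. */
    static int widenChar(char c) {
        return c;
    }

    /** Both operands are promoted to int, so the sum does not wrap at 127. */
    static int addBytes(byte a, byte b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(widenByte((byte) 0xff));           // prints -1
        System.out.println(widenChar('\uffff'));              // prints 65535
        System.out.println(addBytes((byte) 100, (byte) 100)); // prints 200
    }
}
```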
Types and Categories
Types have a concept called "categories." A category indicates how many local variables a value of the type occupies and how much operand stack depth it accounts for, rather than its exact size in memory. In the JVM, types are classified into the following two categories:
- Category 1: int, float, byte, char, short, boolean, reference, and returnAddress types
  - A value of these types occupies one local variable and counts as one unit of operand stack depth.
  - For example, an int value occupies a single local variable.
- Category 2: long and double types
  - A value of these types occupies two consecutive local variables and counts as two units of operand stack depth.
  - For example, a long value stored at local variable n also occupies local variable n + 1.
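One practical consequence of categories is how local variable indices are assigned to method parameters. The sketch below assumes a static method (so there is no `this` in local variable 0) and takes a simplified parameter string with one descriptor character per parameter: I for int, J for long, D for double, F for float, and so on. The class and method names are hypothetical.

```java
final class SlotLayout {
    /** Returns how many local variables a value of the given descriptor character occupies. */
    static int slotsFor(char descriptorChar) {
        // 'J' (long) and 'D' (double) are category 2; everything else is category 1.
        return (descriptorChar == 'J' || descriptorChar == 'D') ? 2 : 1;
    }

    /** Computes the first local variable index of each parameter of a static method. */
    static int[] parameterIndices(String paramTypes) {
        int[] indices = new int[paramTypes.length()];
        int next = 0;
        for (int i = 0; i < paramTypes.length(); i++) {
            indices[i] = next;
            next += slotsFor(paramTypes.charAt(i));
        }
        return indices;
    }

    public static void main(String[] args) {
        // Parameters (int, long, double, int) start at local variables 0, 1, 3, 5.
        System.out.println(java.util.Arrays.toString(parameterIndices("IJDI")));
        // prints [0, 1, 3, 5]
    }
}
```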
2.11.2 Load and Store Instructions
These instructions transfer values between the local variables and the operand stack of a Java Virtual Machine frame.
Loading from Local Variables
Examples of instructions that load values from local variables include:
| Instruction | Description |
|---|---|
| iload | Loads an int value from a local variable and pushes it onto the operand stack. |
| fload | Performs the same operation for the float type. |
| dload | Performs the same operation for the double type. |
| lload | Performs the same operation for the long type. |
| aload | Performs the same operation for an object reference type. |
Furthermore, by adding suffixes _0, _1, _2, or _3 to each, you can directly specify the index of the local variable.
For example, iload_0 loads an int value from local variable 0.
This allows the operand for specifying the local variable index to be omitted, reducing the instruction size.
Storing to Local Variables
Instructions that store values into local variables include:
| Instruction | Description |
|---|---|
| istore | Pops an int value from the operand stack and stores it in a local variable. |
| fstore | Performs the same operation for the float type. |
| dstore | Performs the same operation for the double type. |
| lstore | Performs the same operation for the long type. |
| astore | Performs the same operation for an object reference type. |
Similarly, you can directly specify the local variable index by adding the suffixes _0, _1, _2, or _3 to each.
Extending Local Variable Indices
Instructions that handle local variables (the ones mentioned above plus iinc and ret) specify the local variable index in the range of 0 to 255.
This is because the operand is represented by 1 byte.
However, there may be cases where a local variable index exceeds 255.
In such cases, the wide instruction is used to represent the operand with 2 bytes.
By doing so, the local variable index can be specified in the range of 0 to 65535.
Example:
iload 9999 // This is an invalid instruction.
wide iload 9999 // Therefore, use the wide instruction to specify the index with 2 bytes.
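Putting the three encodings together, an assembler might pick the shortest form for a given index. The sketch below is an illustration with hypothetical names, not how any particular compiler is implemented; the opcode values (iload 0x15, iload_0 0x1a, wide 0xc4) are those listed in the JVM specification.

```java
final class LoadEncoder {
    static final int ILOAD = 0x15, ILOAD_0 = 0x1a, WIDE = 0xc4;

    /** Encodes an int load of the given local variable index in its shortest form. */
    static byte[] encodeIload(int index) {
        if (index < 0 || index > 0xffff) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        if (index <= 3) {
            // iload_0 .. iload_3: the index is implied by the opcode itself.
            return new byte[] {(byte) (ILOAD_0 + index)};
        }
        if (index <= 0xff) {
            // iload with a 1-byte index operand.
            return new byte[] {(byte) ILOAD, (byte) index};
        }
        // wide iload with a 2-byte big-endian index operand.
        return new byte[] {(byte) WIDE, (byte) ILOAD, (byte) (index >> 8), (byte) index};
    }

    public static void main(String[] args) {
        System.out.println(encodeIload(1).length);    // prints 1
        System.out.println(encodeIload(200).length);  // prints 2
        System.out.println(encodeIload(9999).length); // prints 4
    }
}
```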
Pushing Constants onto the Stack
Instructions that push constants from "nowhere" onto the operand stack include:
| Instruction | Description |
|---|---|
| bipush | Pushes an 8-bit signed integer onto the operand stack. |
| sipush | Pushes a 16-bit signed integer onto the operand stack. |
| ldc | Loads a value from the constant pool using an 8-bit index. |
| ldc_w | Loads a value from the constant pool using a 16-bit index. |
| ldc2_w | Loads a long or double value from the constant pool using a 16-bit index. |
| iconst_<i> | Pushes an int constant onto the operand stack. |
| fconst_<f> | Pushes a float constant onto the operand stack. |
| dconst_<d> | Pushes a double constant onto the operand stack. |
| lconst_<l> | Pushes a long constant onto the operand stack. |
In these cases...
- <i> is one of m1, 0, 1, 2, 3, 4, or 5, written as iconst_m1, iconst_0, etc. Note that iconst_m1 represents -1.
- <f> is one of 0, 1, or 2.
- <d> is one of 0 or 1.
- <l> is one of 0 or 1.
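As an illustration of how these instructions divide up the range of int constants, the following sketch picks the most compact one for a given value (iconst_m1 through iconst_5, then bipush, sipush, and finally ldc). The class name and the textual output are hypothetical; in particular, "ldc 32768" is shorthand, since the real operand of ldc is a constant pool index, not the value itself.

```java
final class IntConstantPush {
    /** Returns the mnemonic of the most compact instruction that pushes the given int. */
    static String instructionFor(int value) {
        if (value == -1) {
            return "iconst_m1";
        }
        if (value >= 0 && value <= 5) {
            return "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value; // 1-byte signed immediate
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value; // 2-byte signed immediate
        }
        // Anything larger lives in the constant pool (ldc_w if the pool index exceeds 255).
        return "ldc " + value;
    }

    public static void main(String[] args) {
        System.out.println(instructionFor(5));     // prints iconst_5
        System.out.println(instructionFor(6));     // prints bipush 6
        System.out.println(instructionFor(128));   // prints sipush 128
        System.out.println(instructionFor(32768)); // prints ldc 32768
    }
}
```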
2.11.3 Arithmetic Instructions
Arithmetic instructions (typically) operate on the two values at the top of the operand stack and push the result back onto the operand stack.
These are divided into instructions that handle integer values and those that handle floating-point values. In particular, floating-point instructions implement only a subset of the IEEE 754 standard; for example, they never throw exceptions or trap on overflow, underflow, or division by zero (Reference: Reading JVM | JVM Structure Part 4 - On Object Representation and Floating-Point Arithmetic).
Instructions are categorized as follows:
| Operation | int Type | long Type | float Type | double Type |
|---|---|---|---|---|
| Addition | iadd | ladd | fadd | dadd |
| Subtraction | isub | lsub | fsub | dsub |
| Multiplication | imul | lmul | fmul | dmul |
| Division | idiv | ldiv | fdiv | ddiv |
| Remainder | irem | lrem | frem | drem |
| Negation | ineg | lneg | fneg | dneg |
| Bitwise Shift | ishl, ishr, iushr | lshl, lshr, lushr | | |
| Bitwise AND | iand | land | | |
| Bitwise OR | ior | lor | | |
| Bitwise XOR | ixor | lxor | | |
| Increment | iinc | | | |
| Comparison | | lcmp | fcmpl, fcmpg | dcmpl, dcmpg |
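A few behavioral differences between these instructions can be observed directly from Java. The sketch below uses hypothetical method names; the comments name the bytecode instruction each operator is normally compiled to for the given operand types.

```java
final class ArithmeticSemantics {
    /** Arithmetic right shift (ishr): the sign bit is preserved. */
    static int arithmeticShift(int x) {
        return x >> 1;
    }

    /** Logical right shift (iushr): zeros are shifted in from the left. */
    static int logicalShift(int x) {
        return x >>> 1;
    }

    /** Integer addition (iadd) wraps around on overflow without any exception. */
    static int add(int a, int b) {
        return a + b;
    }

    /** Integer division (idiv) throws ArithmeticException when the divisor is zero. */
    static boolean intDivisionThrows(int a, int b) {
        try {
            int unused = a / b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    /** Floating-point division (fdiv) never throws; dividing by zero yields an infinity or NaN. */
    static float floatDivide(float a, float b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(arithmeticShift(-8));       // prints -4
        System.out.println(logicalShift(-8));          // prints 2147483644
        System.out.println(add(Integer.MAX_VALUE, 1)); // prints -2147483648
        System.out.println(intDivisionThrows(1, 0));   // prints true
        System.out.println(floatDivide(1f, 0f));       // prints Infinity
    }
}
```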
2.11.4 Type Conversion Instructions
The JVM provides instructions for converting between different types. These can be used when a programmer explicitly performs type conversion in code, or to complement the non-orthogonal parts of the JVM instruction set.
Widening Numeric Conversions
The JVM provides instructions to widen values as follows:
| Instruction | Description |
|---|---|
| i2l | Converts an int value to a long value. |
| i2f | Converts an int value to a float value. |
| i2d | Converts an int value to a double value. |
| l2f | Converts a long value to a float value. |
| l2d | Converts a long value to a double value. |
| f2d | Converts a float value to a double value. |
This x2y format means converting a value of type x to type y (incidentally, 2 stands for "to").
Widening conversions do not lose information about the overall magnitude of a value. In fact, int → long and int → double conversions preserve the numeric value exactly, and so do float → double conversions.
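Regarding Floating-Point Value Conversions
The int → float, long → float, and long → double conversions may lose precision, that is, some of the least significant bits of the value. The result is the integer value rounded to the nearest representable floating-point value.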
Regarding Integer Value Conversions
The int → long conversion sign-extends the two's complement representation of the int value to fill the long value. Converting a char to an integral type, by contrast, zero-extends the representation of the char value.
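These widening rules can be checked with a few lines of Java. The class and method names below are hypothetical; each method body is a plain implicit widening, which compiles to the instruction named in its comment.

```java
final class WideningDemo {
    /** i2l: the int is sign-extended, so negative values stay negative. */
    static long intToLong(int i) {
        return i;
    }

    /** i2f: may lose low-order bits, because float has a 24-bit significand. */
    static float intToFloat(int i) {
        return i;
    }

    /** i2d: always exact, because double has a 53-bit significand. */
    static double intToDouble(int i) {
        return i;
    }

    public static void main(String[] args) {
        System.out.println(intToLong(-1));                     // prints -1
        System.out.println((long) intToFloat(16_777_217));     // prints 16777216
        System.out.println((long) intToDouble(16_777_217));    // prints 16777217
    }
}
```

The value 16,777,217 is 2^24 + 1, the smallest positive int that a float cannot represent exactly.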
Narrowing Numeric Conversions
The JVM provides instructions to narrow values as follows:
| Instruction | Description |
|---|---|
| i2b | Converts an int value to a byte value. |
| i2c | Converts an int value to a char value. |
| i2s | Converts an int value to a short value. |
| l2i | Converts a long value to an int value. |
| f2i | Converts a float value to an int value. |
| f2l | Converts a float value to a long value. |
| d2i | Converts a double value to an int value. |
| d2l | Converts a double value to a long value. |
| d2f | Converts a double value to a float value. |
Regarding Floating-Point Value Conversions
When converting floating-point values to integer types (int, long), the value is converted as follows:
- If the floating-point value is NaN (Not-a-Number), the result is 0.
- Otherwise, if the floating-point value is not ±Infinity, it is rounded toward zero to an integer. If that integer is representable in the target type, it is the result.
- Otherwise (the value is ±Infinity, or its magnitude is too large for the target type), the result is the maximum or minimum value of the target type, depending on the sign.
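The three rules above, along with the integer narrowing conversions from the table, can be observed through Java casts. The names below are hypothetical; the comments name the instruction each cast compiles to.

```java
final class NarrowingDemo {
    /** f2i: NaN becomes 0, other values round toward zero and saturate at the int range. */
    static int floatToInt(float f) {
        return (int) f;
    }

    /** d2l: same rules, with the long range as the limit. */
    static long doubleToLong(double d) {
        return (long) d;
    }

    /** i2b: keeps only the low 8 bits, then sign-extends them. */
    static byte intToByte(int i) {
        return (byte) i;
    }

    /** l2i: keeps only the low 32 bits. */
    static int longToInt(long l) {
        return (int) l;
    }

    public static void main(String[] args) {
        System.out.println(floatToInt(Float.NaN));                 // prints 0
        System.out.println(floatToInt(-2.9f));                     // prints -2
        System.out.println(floatToInt(Float.POSITIVE_INFINITY));   // prints 2147483647
        System.out.println(floatToInt(1e20f));                     // prints 2147483647
        System.out.println(doubleToLong(Double.NEGATIVE_INFINITY)); // prints -9223372036854775808
        System.out.println(intToByte(200));                        // prints -56
        System.out.println(longToInt(0x1_0000_0001L));             // prints 1
    }
}
```

Unlike the floating-point cases, the integer narrowings do not saturate: they silently discard the high-order bits, so both magnitude and sign can change.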
Summary
The JVM instruction set is composed of a combination of opcodes and operands. Most instructions are specialized for specific types, and by combining these type-specific instructions with type-independent ones and with conversion instructions, the JVM stays within the 256-opcode limit while still executing efficiently.
Types are classified into categories, and instructions behave differently for each type. Furthermore, there are instructions for loading and storing data between local variables and the operand stack, and instructions for pushing constants onto the operand stack. Arithmetic instructions are divided into those for integer values and floating-point values, and instructions for type conversion are also provided.
In the next part, we will continue the explanation of the JVM instruction set, focusing on instructions for object creation and control flow.
Happy bytecode life!
Next Part Link
References & Links
- Lindholm, T., Yellin, F., Bracha, G., & Smith, W. M. D. (2025). The Java® Virtual Machine Specification: Java SE 24 Edition.
- Lindholm, T., & Yellin, F. (1999). The Java™ Virtual Machine Specification (2nd ed.). Addison-Wesley. ISBN 978-0-201-43294-7
- Santana, O. (2024). Mastering the Java Virtual Machine. Packt Publishing. ISBN 978-1-835-46796-1