Reading the JVM | JVM Structure Part 6: Instruction Set Summary and Types

This is a continuation of the previous post. You can find the previous post here:

https://zenn.dev/peyang/articles/reading-jvm-chapter-02-9-10

This series is designed as a guide for deciphering the JVM specification.
Since the JVM specification is very long and contains many difficult topics, I will summarize the key points for each section.
Additionally, by understanding the internal structure and operating principles of the JVM, we aim to deeply understand the mechanisms of Java's performance, security, and memory management.

You can find the series here:

https://zenn.dev/peyang/articles/reading-jvm-chapter-00

Chapter 2: The Structure of the Java Virtual Machine

Chapter 2 of the JVM specification is "The Structure of the Java Virtual Machine."
That being said, this chapter is particularly long and complex among the seven chapters of the JVM specification, so I will explain it in eight parts.

This part covers Sections 2.11 through 2.11.4.

2.11 Instruction Set Summary

A JVM instruction consists of a 1-byte opcode that specifies the operation to be performed, followed by zero or more operands as needed.

instruction ::= opcode [operand]*

Here, the internal structure of the JVM interpreter can be simplified as follows:

do {
  // Get the position of the next instruction
  currentPc = fetchNextPC();
  if (!currentPc) {
    break;  // Exit the loop if there is no next instruction
  }
  opcode = readNextOpcode(currentPc);
  switch (opcode) {
    // Cases to process instructions
  }
} while (true);  // Infinite loop
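
To make the loop above concrete, here is a minimal sketch in Java that interprets six real opcodes. The class name TinyInterpreter, the run method, and the use of an ArrayDeque as the operand stack are assumptions made for illustration; an actual JVM is structured very differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy interpreter for a handful of opcodes (illustrative only). */
final class TinyInterpreter {
    // Opcode values as defined by the JVM instruction set.
    static final int NOP = 0x00, BIPUSH = 0x10, ILOAD_1 = 0x1b,
                     ISTORE_1 = 0x3c, IADD = 0x60, RETURN = 0xb1;

    /** Executes the code array and returns the local variables afterwards. */
    static int[] run(byte[] code, int maxLocals) {
        int[] locals = new int[maxLocals];
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF;   // fetch the 1-byte opcode
            switch (opcode) {
                case NOP:      break;
                case BIPUSH:   stack.push((int) code[pc++]); break; // operand: signed byte
                case ILOAD_1:  stack.push(locals[1]); break;
                case ISTORE_1: locals[1] = stack.pop(); break;
                case IADD:     stack.push(stack.pop() + stack.pop()); break;
                case RETURN:   return locals;
                default:
                    throw new IllegalStateException(
                        "opcode not covered by this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
        return locals;
    }
}
```

Calling TinyInterpreter.run(new byte[] {0x10, 0x05, 0x3c, 0x00}, 2) leaves the value 5 in local variable 1.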

Storage of Operands and Bytecode

The number and size of operands vary per instruction.
When the operand size exceeds 1 byte, it is stored in big-endian (high-order byte first).
For example, an unsigned 16-bit index into the local variables is stored as two unsigned bytes (byte1, byte2), and its value is calculated as (byte1 << 8) | byte2.
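
As a small sketch of this calculation in Java (the class and method names are hypothetical), note that Java's byte type is signed, so each byte has to be masked to the range 0 to 255 before the two are combined:

```java
final class OperandDecoding {
    /** Combines two operand bytes, high-order byte first, into an unsigned 16-bit value. */
    static int readU2(byte byte1, byte byte2) {
        // Mask each signed Java byte to 0..255 before shifting and OR-ing.
        return ((byte1 & 0xFF) << 8) | (byte2 & 0xFF);
    }

    /** Interprets the same two bytes as a signed 16-bit value, as sipush does. */
    static int readS2(byte byte1, byte byte2) {
        return (short) readU2(byte1, byte2);
    }
}
```

For instance, readU2((byte) 0x27, (byte) 0x0F) yields 9999.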

The bytecode instruction stream is aligned only to 1-byte boundaries.
Furthermore, each instruction (opcode and its operands) is arranged contiguously.
There is no need to align the next instruction using padding (the only exceptions are tableswitch and lookupswitch, which pad some of their operands to 4-byte boundaries).

In other words, an instruction sequence like the following is also valid bytecode:

0x10 0x05 0x3c 0x00
↑ bipush
     ↑ Operand of bipush: the byte value 5
          ↑ istore_1
               ↑ nop
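
The following sketch walks through this byte sequence and shows that each instruction starts immediately after the previous one. TinyDisassembler is a hypothetical helper that understands only the three opcodes used in the example.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyDisassembler {
    /** Decodes nop, bipush, and istore_1; any other opcode is rejected. */
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x00: out.add(pc + ": nop"); pc += 1; break;
                // bipush has one operand byte, so the instruction is 2 bytes long
                case 0x10: out.add(pc + ": bipush " + code[pc + 1]); pc += 2; break;
                case 0x3c: out.add(pc + ": istore_1"); pc += 1; break;
                default:
                    throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
            }
        }
        return out;
    }
}
```

For the bytes 0x10 0x05 0x3c 0x00, the result is the list "0: bipush 5", "2: istore_1", "3: nop".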

2.11.1 Types and the Java Virtual Machine

The Deep Relationship Between Instructions and Types

Most instructions supported by the JVM encode the type of the data they operate on in the opcode itself. For example, both the iload and fload instructions load a value from a local variable, but the former loads an int value while the latter loads a float value. Even if the two were implemented identically inside the VM, their opcodes are distinct.

Instructions that are specialized in this way often include type information in their names, as shown in the table below.

Prefix Description
b Represents an 8-bit integer (byte) type.
c Represents a 16-bit character (char) type.
s Represents a 16-bit integer (short) type.
i Represents a 32-bit integer (int) type.
l Represents a 64-bit integer (long) type.
f Represents a 32-bit floating-point (float) type.
d Represents a 64-bit floating-point (double) type.
a Represents a reference type.

Details will be explained in Chapter 6 (please stay tuned).

Opcode Limits and the Performance Dilemma

As mentioned earlier, JVM opcodes are represented by 1 byte. Having only 1 byte means the total number of opcode types cannot exceed 2^8 = 256.

Furthermore, as previously stated, even a single operation such as addition requires a separate opcode for each type (iadd, ladd, fadd, dadd).

Therefore, the JVM does not provide every operation for every type. The specification describes the instruction set as intentionally not orthogonal: a fully orthogonal set would offer each typed operation for all run-time data types, which would require more opcodes than fit in one byte.

Some instructions, such as pop, dup, and swap, carry no type letter at all. They manipulate operand stack entries without interpreting them, so one opcode serves every type of the same category (dup and swap work on category 1 values, while duplicating a long or double requires dup2; categories are described later in this section).

Typed instructions, on the other hand, exist only for the types that need them. For example, iadd adds int values and ladd adds long values, but there is no addition instruction for byte or short.

To fill the remaining gaps, the JVM provides instructions that convert between types (such as d2i or i2d), so a value of an unsupported type can be converted to a supported one when necessary.

Example:

iload_0  ; Push the int in local variable 0 onto the stack
i2d      ; Convert the int on top of the stack to double
dstore_1 ; Store the double from the stack into local variable 1

dload_1 ; Push the double in local variable 1 onto the stack
dup2    ; Duplicate it (a double is a category 2 value, so dup cannot be used)
dadd    ; Add the two double values on the stack
...

Here, dup2 is an untyped instruction that simply copies stack entries, while i2d is a typed instruction that converts an int value to a double value. By combining such instructions, values of different types can be handled with a limited number of opcodes.

Type and Instruction Reuse

There is a fact that must be mentioned here: most instructions do not have variants for the integer types byte, char, and short. In fact, instructions for the boolean type do not exist at all.

Indeed, while there is an instruction to push an int type from a local variable to the stack (iload), there is no instruction to push a byte type from a local variable to the stack (like bload).

The reason is that these types are widened to int whenever they are loaded. A byte or short value is sign-extended to int when it is pushed onto the operand stack, and a boolean or char value is zero-extended to int.

In this way, the JVM processes byte, char, short, and boolean values by converting them to int types, so dedicated instructions for these types are not necessary. Consequently, values of these types can be correctly manipulated using instructions that operate on int types.
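
The same widening is visible at the Java language level. The sketch below (class and method names are hypothetical) shows sign extension for byte and short, zero extension for char, and the fact that adding two bytes produces an int that must be narrowed explicitly.

```java
final class IntPromotion {
    static int widenByte(byte b)   { return b; }  // sign-extended to int
    static int widenShort(short s) { return s; }  // sign-extended to int
    static int widenChar(char c)   { return c; }  // zero-extended to int

    static byte addBytes(byte a, byte b) {
        // a + b has type int (javac emits iadd); narrowing back to byte
        // needs an explicit cast, which compiles to i2b.
        return (byte) (a + b);
    }
}
```

For example, widenByte((byte) 0xFF) is -1, whereas widenChar((char) 0xFFFF) is 65535.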

Types and Categories

Types have a concept called "categories." A category indicates how much space a value of the type takes up on the operand stack and in the local variables. In the JVM, types are classified into the following two categories:

  1. Category 1: int, float, reference, and returnAddress types (byte, char, short, and boolean are handled as int)
    • A value of these types counts as one unit of operand stack depth and occupies one local variable.
    • For example, an int value is 32 bits wide and fits in a single local variable.
  2. Category 2: long, double types
    • A value of these types counts as two units of operand stack depth and occupies two consecutive local variables.
    • For example, a long value is 64 bits wide and takes two local variables.
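
As a rough illustration of how categories affect local variable numbering, the following sketch counts the local variable slots needed for a sequence of types, such as the parameters of a static method. SlotCounter and its methods are hypothetical names.

```java
final class SlotCounter {
    /** Returns 2 for the category 2 types (long, double) and 1 for everything else. */
    static int slots(Class<?> type) {
        return (type == long.class || type == double.class) ? 2 : 1;
    }

    /** Sums the slots for a sequence of types, e.g. the parameters of a static method. */
    static int totalSlots(Class<?>... types) {
        int total = 0;
        for (Class<?> type : types) {
            total += slots(type);
        }
        return total;
    }
}
```

With the parameter types (int, long, String, double), totalSlots returns 6: the int takes slot 0, the long slots 1 and 2, the reference slot 3, and the double slots 4 and 5.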

2.11.2 Load and Store Instructions

These instructions transfer values between the local variables and the operand stack of a Java Virtual Machine frame.

Loading from Local Variables

Examples of instructions that load values from local variables include:

Instruction Description
iload Loads an int value from a local variable and pushes it onto the operand stack.
fload Performs the same operation for the float type.
dload Performs the same operation for the double type.
lload Performs the same operation for the long type.
aload Performs the same operation for an object reference type.

Furthermore, by adding suffixes _0, _1, _2, or _3 to each, you can directly specify the index of the local variable.
For example, iload_0 loads an int value from local variable 0.
This allows the operand for specifying the local variable index to be omitted, reducing the instruction size.

Storing to the Local Variables

Instructions that store values into local variables include:

Instruction Description
istore Pops an int value from the operand stack and stores it in a local variable.
fstore Performs the same operation for the float type.
dstore Performs the same operation for the double type.
lstore Performs the same operation for the long type.
astore Performs the same operation for an object reference type.

Similarly, you can directly specify the local variable index by adding the suffixes _0, _1, _2, or _3 to each.

Extending Local Variable Indices

Instructions that handle local variables (the ones mentioned above plus iinc and ret) specify the local variable index in the range of 0 to 255.
This is because the operand is represented by 1 byte.

However, there may be cases where a local variable index exceeds 255.
In such cases, the wide instruction is used to represent the operand with 2 bytes.
By doing so, the local variable index can be specified in the range of 0 to 65535.

Example:

iload 9999  // This is an invalid instruction.
wide iload 9999  // Therefore, use the wide instruction to specify the index with 2 bytes.
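
The three encodings of an int load (the short forms iload_0 to iload_3, iload with a 1-byte index, and wide iload with a 2-byte index) can be sketched as follows. LoadEncoder is a hypothetical helper; the opcode values 0x15 (iload), 0x1a (iload_0), and 0xc4 (wide) are those defined by the JVM specification.

```java
final class LoadEncoder {
    /** Returns the shortest byte encoding that loads an int from the given local variable. */
    static byte[] encodeIload(int index) {
        if (index < 0 || index > 0xFFFF) {
            throw new IllegalArgumentException("local variable index out of range: " + index);
        }
        if (index <= 3) {
            return new byte[] { (byte) (0x1a + index) };   // iload_0 .. iload_3
        }
        if (index <= 0xFF) {
            return new byte[] { 0x15, (byte) index };      // iload with a 1-byte index
        }
        // wide iload: the index is stored big-endian in two bytes
        return new byte[] { (byte) 0xc4, 0x15, (byte) (index >> 8), (byte) index };
    }
}
```

For index 9999 this produces the four bytes 0xc4 0x15 0x27 0x0f.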

Pushing Constants onto the Stack

Instructions that push constants from "nowhere" onto the operand stack include:

Instruction Description
bipush Pushes an 8-bit signed integer onto the operand stack.
sipush Handles a 16-bit signed integer.
ldc Loads a value from the constant pool using an 8-bit index.
ldc_w Loads a value from the constant pool using a 16-bit index.
ldc2_w Loads a long or double value from the constant pool using a 16-bit index.
iconst_<i> Pushes an int constant onto the operand stack.
fconst_<f> Pushes a float constant onto the operand stack.
dconst_<d> Pushes a double constant onto the operand stack.
lconst_<l> Pushes a long constant onto the operand stack.

In these cases...

  • <i> is one of m1, 0, 1, 2, 3, 4, or 5, written as iconst_m1, iconst_0, etc.
    Note that iconst_m1 represents -1.
  • <f> is one of 0, 1, or 2.
  • <d> is one of 0 or 1.
  • <l> is one of 0 or 1.
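
A compiler typically picks the shortest of these instructions for a given int constant. The sketch below illustrates that selection; ConstantPush is a hypothetical name, it relies on the JVM defining iconst_m1 and iconst_0 through iconst_5, and the "ldc" result is simplified because a real compiler would emit a constant pool index rather than the value itself.

```java
final class ConstantPush {
    /** Returns a textual form of the shortest instruction that pushes the given int. */
    static String forInt(int value) {
        if (value == -1) {
            return "iconst_m1";
        }
        if (value >= 0 && value <= 5) {
            return "iconst_" + value;                       // 1 byte, no operand
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;                       // 2 bytes
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;                       // 3 bytes
        }
        return "ldc " + value;  // simplified: the real operand is a constant pool index
    }
}
```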

2.11.3 Arithmetic Instructions

Arithmetic instructions (typically) operate on the two values at the top of the operand stack and push the result back onto the operand stack.

These are divided into instructions that handle integer values and those that handle floating-point values. In particular, instructions that handle floating-point values follow the IEEE 754 standard only with certain restrictions; for example, they never raise exceptions or traps for conditions such as overflow or division by zero (Reference: Reading JVM | JVM Structure Part 4 - On Object Representation and Floating-Point Arithmetic).

Instructions are categorized as follows:

Operation        int                  long                 float          double
Addition         iadd                 ladd                 fadd           dadd
Subtraction      isub                 lsub                 fsub           dsub
Multiplication   imul                 lmul                 fmul           dmul
Division         idiv                 ldiv                 fdiv           ddiv
Remainder        irem                 lrem                 frem           drem
Negation         ineg                 lneg                 fneg           dneg
Bitwise Shift    ishl, ishr, iushr    lshl, lshr, lushr    -              -
Bitwise AND      iand                 land                 -              -
Bitwise OR       ior                  lor                  -              -
Bitwise XOR      ixor                 lxor                 -              -
Increment        iinc                 -                    -              -
Comparison       -                    lcmp                 fcmpl, fcmpg   dcmpl, dcmpg
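
Several of these instructions correspond directly to Java operators. The sketch below (ArithmeticDemo is a hypothetical name; the instruction in each comment is what javac normally emits for that operator) shows the difference between the two right shifts and between integer and floating-point division by zero.

```java
final class ArithmeticDemo {
    static int arithmeticShift(int v, int n) { return v >> n; }       // ishr: keeps the sign bit
    static int logicalShift(int v, int n)    { return v >>> n; }      // iushr: shifts in zeros
    static int intRemainder(int a, int b)    { return a % b; }        // irem
    static double doubleDivide(double a, double b) { return a / b; }  // ddiv

    /** Reports whether integer division (idiv) throws for the given operands. */
    static boolean intDivideThrows(int a, int b) {
        try {
            int unused = a / b;
            return false;
        } catch (ArithmeticException e) {
            return true;   // idiv throws when the divisor is zero
        }
    }
}
```

For example, arithmeticShift(-8, 1) is -4 while logicalShift(-8, 1) is 2147483644, and doubleDivide(1.0, 0.0) returns positive infinity instead of throwing.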

2.11.4 Type Conversion Instructions

The JVM provides instructions for converting between different types. These can be used when a programmer explicitly performs type conversion in code, or to complement the non-orthogonal parts of the JVM instruction set.

Widening Numeric Conversions

The JVM provides instructions to widen values as follows:

Instruction Description
i2l Converts an int value to a long value.
i2f Converts an int value to a float value.
i2d Converts an int value to a double value.
l2f Converts a long value to a float value.
l2d Converts a long value to a double value.
f2d Converts a float value to a double value.

This x2y format means converting a value of type x to type y (incidentally, 2 stands for "to").

Widening conversions do not lose information about the overall magnitude of a value. In fact, int-to-long and int-to-double conversions preserve the numeric value exactly, and so do float-to-double conversions.

Regarding Floating-Point Value Conversions

Conversions from int to float, from long to float, and from long to double may lose precision: some of the least significant bits of the value can be lost, and the result is the integer value rounded to the nearest representable floating-point value.

Regarding Integer Value Conversions

The int-to-long conversion sign-extends the two's complement representation of the int value to fill the long value. A char value, by contrast, is zero-extended when it is converted to an integer type.
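
The behavior of widening conversions can be checked from Java code. In this sketch (WideningDemo is a hypothetical name), each method performs an implicit widening conversion; the comments name the instruction javac normally emits.

```java
final class WideningDemo {
    static long intToLong(int v)     { return v; }  // i2l: sign-extends
    static int charToInt(char c)     { return c; }  // char is zero-extended
    static float intToFloat(int v)   { return v; }  // i2f: may round
    static double intToDouble(int v) { return v; }  // i2d: always exact
}
```

For example, intToLong(-1) is -1L, charToInt('\uFFFF') is 65535, and intToFloat(16777217) is 16777216.0f because a float has only 24 bits of significand, whereas intToDouble(16777217) keeps the value exactly.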

Narrowing Numeric Conversions

The JVM provides instructions to narrow values as follows:

Instruction Description
i2b Converts an int value to a byte value.
i2c Converts an int value to a char value.
i2s Converts an int value to a short value.
l2i Converts a long value to an int value.
f2i Converts a float value to an int value.
f2l Converts a float value to a long value.
d2i Converts a double value to an int value.
d2l Converts a double value to a long value.
d2f Converts a double value to a float value.

Regarding Floating-Point Value Conversions

When converting floating-point values to integer types (int, long), the result is determined as follows:

  1. If the floating-point value is NaN (Not-a-Number), it is converted to 0.
  2. If the floating-point value is not ±Infinity, it is rounded toward zero to an integer, and if that integer is representable in the target type, it becomes the result.
  3. Otherwise (the value is ±Infinity, or the rounded integer is out of range), it is converted to the maximum or minimum value of the target type depending on the sign.
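
These rules can be observed through Java's cast operator. The following sketch (NarrowingDemo is a hypothetical name; the comments name the instruction javac normally emits for each cast) covers NaN, rounding toward zero, and saturation, along with two integer narrowings that simply discard high-order bits.

```java
final class NarrowingDemo {
    static int doubleToInt(double d) { return (int) d; }   // d2i
    static long floatToLong(float f) { return (long) f; }  // f2l
    static byte intToByte(int i)     { return (byte) i; }  // i2b: keeps the low 8 bits
    static char intToChar(int i)     { return (char) i; }  // i2c: keeps the low 16 bits
}
```

For example, doubleToInt(Double.NaN) is 0, doubleToInt(-3.7) is -3, doubleToInt(1e10) is Integer.MAX_VALUE, and intToByte(200) is -56.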

Summary

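The JVM instruction set is composed of a combination of opcodes and operands. Most instructions are specialized for specific types, but because only 256 opcodes are available, the set is intentionally not orthogonal: untyped stack instructions and type conversion instructions cover the operations that have no dedicated typed form.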

Types are classified into categories, and instructions behave differently for each type. Furthermore, there are instructions for loading and storing data between local variables and the operand stack, and instructions for pushing constants onto the operand stack. Arithmetic instructions are divided into those for integer values and floating-point values, and instructions for type conversion are also provided.

In the next part, we will continue the explanation of the JVM instruction set, focusing on instructions for object creation and control flow.

Happy bytecode life!

https://zenn.dev/peyang/articles/reading-jvm-chapter-02-11-5-10

  • Lindholm, T., Yellin, F., Bracha, G., & Smith, W. M. D. (2025). The Java® Virtual Machine Specification: Java SE 24 Edition.
  • Lindholm, T., & Yellin, F. (1999). The Java™ Virtual Machine Specification (2nd ed.). Addison-Wesley. ISBN 978-0-201-43294-7
  • Santana, O. (2024). Mastering the Java Virtual Machine. Packt Publishing. ISBN 978-1-835-46796-1