Translated by AI
Summary of Data Types and Data Structures
Disclaimer
This article was written to organize the author's personal ideas.
There are no clear sources, and the accuracy of the content has not been verified. Please keep this in mind.
Basic Data Types
Boolean Type
Represents a boolean value in 1 bit. 0 is false, and 1 is true.
It serves as the argument of branch instructions (GOTO instructions) and as the result of comparison instructions.
Unlike other data types, its value is held in dedicated CPU registers called flag registers.
Numerical Type
Represents numbers that can be expressed with a finite number of binary digits. Numerical types are further divided by the range of values and the meaning they represent.
Apart from the cases that use boolean values (flag registers), numerical types serve as the arguments and results of all operations.
Sign
There are signed and unsigned types.
In the signed case, the sign is indicated by the most significant bit: 0 means positive and 1 means negative. In integer types, negative numbers are represented in two's complement, which lets the same addition circuitry handle both positive and negative values.
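As a rough illustration of two's complement, the following Python sketch encodes and decodes signed integers at a fixed bit width. The function names `to_twos_complement` and `from_twos_complement` and the 8-bit default are assumptions made for this example only.

```python
def to_twos_complement(value: int, bits: int = 8) -> int:
    """Return the unsigned bit pattern that encodes `value` in two's complement."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    # Masking keeps only the low `bits` bits of the (conceptually infinite) pattern.
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern: int, bits: int = 8) -> int:
    """Interpret an unsigned bit pattern as a signed two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    # If the most significant bit is set, the value is negative.
    return pattern - (1 << bits) if pattern & sign_bit else pattern


# -5 is stored as 11111011 in 8 bits.
print(format(to_twos_complement(-5), "08b"))

# Plain unsigned addition on the raw patterns gives the right signed result:
# (-5) + 7 wraps around to 2.
raw_sum = (to_twos_complement(-5) + to_twos_complement(7)) & 0xFF
print(from_twos_complement(raw_sum))
```

The second part shows why this encoding is convenient: no special case is needed for negative operands when adding.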
Decimals
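Numbers with a fractional part are represented by specifying the position of the binary point (the exponent) separately from the sequence of binary digits (the significand, also called the mantissa).
The commonly used format, floating-point, reserves some bits of the data as an exponent part that specifies the position of the point. In IEEE 754 formats, this part sits immediately after the sign bit, ahead of the significand bits.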
Precision
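The more bits used for numerical representation, the higher the precision that can be achieved.
Single precision (32-bit) and double precision (64-bit) are the most commonly used.
The width and bit position of the exponent part in a floating-point number also depend on the precision: IEEE 754 single precision uses 8 exponent bits (starting at bit 23, counting from the least significant bit), while double precision uses 11 (starting at bit 52).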
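To make the layout concrete, here is a small sketch that splits a number into its sign, exponent, and fraction fields using Python's standard `struct` module. It assumes the IEEE 754 binary32 and binary64 layouts; the helper names `float32_fields` and `float64_fields` are made up for this example.

```python
import struct


def float32_fields(x: float) -> tuple:
    """Split x (rounded to single precision) into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF        # 8-bit biased exponent
    fraction = bits & 0x7FFFFF            # 23-bit fraction
    return sign, exponent, fraction


def float64_fields(x: float) -> tuple:
    """Split x (double precision) into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)     # 52-bit fraction
    return sign, exponent, fraction


# 1.0 has a zero fraction and an exponent equal to the bias (127 or 1023).
print(float32_fields(1.0))
print(float64_fields(1.0))
# -2.5 = -1.25 * 2**1, so the sign bit is 1 and the exponent is bias + 1.
print(float32_fields(-2.5))
```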
Character Type
Stores the numerical value assigned to a character by a specific character code. Operations on this value can convert between uppercase and lowercase, or re-encode the character in a different character code.
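The two operations mentioned above can be sketched as follows. The case conversion relies on the fact that ASCII uppercase and lowercase letters differ only in bit 0x20; the function name `swap_case_ascii` and the sample character are assumptions for illustration.

```python
def swap_case_ascii(ch: str) -> str:
    """Toggle the case of a single ASCII letter by flipping bit 0x20."""
    code = ord(ch)                      # character -> numerical value
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        code ^= 0x20                    # 'A' (0x41) <-> 'a' (0x61)
    return chr(code)                    # numerical value -> character


print(swap_case_ascii("a"), swap_case_ascii("Z"), swap_case_ascii("1"))

# The same character has different byte sequences under different encodings.
e_acute = "\u00e9"                      # LATIN SMALL LETTER E WITH ACUTE
as_utf8 = e_acute.encode("utf-8")       # two bytes
as_latin1 = e_acute.encode("latin-1")   # one byte
print(as_utf8, as_latin1)

# Re-encoding: decode from one code, then encode into another.
converted = as_latin1.decode("latin-1").encode("utf-8")
```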
Binary Type
Represents arbitrary binary data (such as compressed or encrypted data) as a sequence of 0s and 1s.
Pointer Type (Memory Address Type)
An unsigned integer that identifies a memory location, counted in addressable units. The size of one unit (number of bytes) depends on the hardware design; on most current hardware it is typically one byte. A pointer specifies the memory location to read from or write to, or the location of the next instruction to execute.
Instruction Type
A data type that specifies an operation. Instructions are placed in a contiguous region of memory and are generally executed in sequence. For a branch instruction (GOTO instruction), if the boolean argument is true, execution jumps to the specified memory address and continues from the instruction stored there.
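The interplay of instructions, a pointer to the next instruction, and a boolean flag can be sketched with a toy interpreter. This is not a model of any real CPU: the opcode names (`ADD`, `DEC`, `CMP_GT`, `JUMP_IF`, `HALT`) and the function `run` are invented for this example.

```python
def run(program, registers):
    """Execute a list of (opcode, operands...) tuples until HALT."""
    pc = 0          # program counter: "pointer" to the next instruction
    flag = False    # stand-in for a CPU flag register
    while True:
        op, *args = program[pc]
        pc += 1     # by default, continue with the following instruction
        if op == "ADD":          # ADD dst, src: dst += src
            registers[args[0]] += registers[args[1]]
        elif op == "DEC":        # DEC reg: reg -= 1
            registers[args[0]] -= 1
        elif op == "CMP_GT":     # comparison writes its result to the flag
            flag = registers[args[0]] > args[1]
        elif op == "JUMP_IF":    # branch: jump only if the flag is true
            if flag:
                pc = args[0]
        elif op == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {op}")


# Adds n, n-1, and so on down to 1 into `total` by looping with a branch.
program = [
    ("ADD", "total", "n"),   # address 0
    ("DEC", "n"),            # address 1
    ("CMP_GT", "n", 0),      # address 2
    ("JUMP_IF", 0),          # address 3: go back to address 0 while n > 0
    ("HALT",),               # address 4
]
result = run(program, {"total": 0, "n": 4})
print(result["total"])
```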
Data Structures
Memory Array Type (Vector Type)
Stores data of the same type contiguously in a fixed-length contiguous region allocated in memory.
It also manages meta-information such as a pointer to the start position, the length per data item, and the number of elements (or a pointer to the end position).
Character and binary types are generally not used on their own but as elements of a memory array; a string, for example, is an array of characters.
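A minimal sketch of this idea uses a `bytearray` as pretend memory and keeps the meta-information (start position, item size, element count) next to it. The class name `Int32Array` and the choice of little-endian 32-bit integers are assumptions for the example.

```python
import struct

memory = bytearray(64)   # pretend main memory, addressed in bytes


class Int32Array:
    """Fixed-length array of 32-bit integers stored contiguously in `memory`."""

    def __init__(self, memory, start, count):
        self.memory = memory
        self.start = start        # pointer to the first element
        self.item_size = 4        # bytes per element
        self.count = count        # number of elements

    def _address(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)
        # Element address = start + index * item size (no searching needed).
        return self.start + index * self.item_size

    def get(self, index):
        return struct.unpack_from("<i", self.memory, self._address(index))[0]

    def set(self, index, value):
        struct.pack_into("<i", self.memory, self._address(index), value)


arr = Int32Array(memory, start=8, count=3)
for i, v in enumerate([10, -20, 30]):
    arr.set(i, v)
print(arr.get(1))
```

Because the address of any element is computed directly, access time does not depend on the index.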
Multidimensional Memory Array Type (Tensor Type)
Stores data of the same type contiguously in a fixed-length contiguous region allocated in memory.
In this respect it is the same as the memory array type, but it additionally partitions the contiguously stored data along multiple dimensions.
It represents multiple dimensions by storing the meta-information of memory arrays inside another memory array.
This type is widely used in 3D rendering and neural networks, and specialized processors such as GPUs are designed to handle it and its operations efficiently.
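One common way to partition a flat buffer into dimensions, which differs slightly from the nested meta-information described above, is to keep a shape and derive row-major strides from it. The sketch below shows that alternative; the names `row_major_strides` and `flat_index` are hypothetical.

```python
def row_major_strides(shape):
    """Number of elements to skip per step along each dimension."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def flat_index(index, shape):
    """Map a multidimensional index to a position in the flat buffer."""
    if len(index) != len(shape):
        raise IndexError("wrong number of dimensions")
    for i, size in zip(index, shape):
        if not 0 <= i < size:
            raise IndexError(index)
    strides = row_major_strides(shape)
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4)                 # 2 blocks of 3 rows of 4 values
data = list(range(24))            # one contiguous buffer of 24 elements
print(row_major_strides(shape))   # (12, 4, 1)
print(data[flat_index((1, 2, 3), shape)])   # 1*12 + 2*4 + 3*1 = 23
```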
Linked List Type
Represents a sequence of data items in which each item holds the data itself and a pointer to the next item.
Unlike memory array types, it does not need to be a fixed length, which makes it easy to remove or insert elements. Furthermore, elements do not need to be of the same data type.
On the other hand, reaching a given element requires following pointers one by one from the start, so access is slower than with memory array types.
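The trade-off can be seen in a minimal singly linked list. The names `Node`, `insert_after`, `remove_after`, and `nth` are chosen for this sketch; Python object references play the role of pointers.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value    # the data itself (any type)
        self.next = next      # "pointer" to the next node, or None at the end


def insert_after(node, value):
    """Insert a new node right after `node`; only two pointers change."""
    node.next = Node(value, node.next)
    return node.next


def remove_after(node):
    """Unlink and return the node following `node` (None if there is none)."""
    removed = node.next
    if removed is not None:
        node.next = removed.next
    return removed


def nth(head, n):
    """Reach the n-th node by following n pointers from the head."""
    node = head
    for _ in range(n):
        if node is None:
            raise IndexError(n)
        node = node.next
    if node is None:
        raise IndexError(n)
    return node


def to_list(head):
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = Node(1, Node("two", Node(3.0)))   # mixed element types are fine
insert_after(head, "x")
print(to_list(head))
remove_after(head)
print(to_list(head))
```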
Struct Type
Stores multiple values of different types in a fixed-length contiguous region of memory.
Each value (field) is given a name, and the layout must be declared in advance so that the required memory can be allocated.
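Python's standard `ctypes` module can declare such a layout directly, which makes the fixed size and field offsets visible. The struct name `Sample` and its fields are hypothetical.

```python
import ctypes


class Sample(ctypes.Structure):
    # The layout is declared up front: field names and their types, in order.
    _fields_ = [
        ("x", ctypes.c_int32),       # 4 bytes
        ("y", ctypes.c_int32),       # 4 bytes
        ("weight", ctypes.c_double), # 8 bytes
    ]


s = Sample(3, -4, 1.5)

# Total size and per-field offsets are fixed by the declaration.
print(ctypes.sizeof(Sample))
print(Sample.x.offset, Sample.y.offset, Sample.weight.offset)

# Fields are accessed by name; the underlying storage is one contiguous block.
s.y = 7
raw = bytes(s)
print(len(raw))
```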
Map Type (Hash Map Type)
Associates names (keys) with pointers to the data they refer to (values).
It is usually called a hash map because the keys, typically strings, are hashed so that an entry can be located and compared quickly.
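A minimal sketch of the hashing idea, using separate chaining, is shown below. The class name `HashMap` and the fixed bucket count are assumptions for illustration; real implementations also resize as they fill up.

```python
class HashMap:
    """Toy hash map: each bucket is a list of (key, value) pairs."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash of the key selects the bucket, so only a few
        # entries have to be compared against the key itself.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


ages = HashMap()
ages.put("alice", 30)
ages.put("bob", 25)
ages.put("alice", 31)
print(ages.get("alice"), ages.get("bob"))
```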
Linked List with Map Type
A data type that maintains a linked list and a map over the same elements, so that each compensates for the other's weaknesses.
It addresses the disadvantage of the linked list type, where elements must be visited sequentially from the start to find specific data, and the disadvantage of the map type, which is poor at enumerating elements and does not guarantee their order.
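The combination can be sketched as a doubly linked list whose nodes are also indexed by key. The class name `LinkedMap` is made up; Python's built-in `dict` is used here purely as the lookup index.

```python
class _Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LinkedMap:
    """Keyed lookup through a map, ordered enumeration through a linked list."""

    def __init__(self):
        self.index = {}     # key -> node (fast lookup)
        self.head = None    # first node in insertion order
        self.tail = None    # last node in insertion order

    def put(self, key, value):
        node = self.index.get(key)
        if node is not None:
            node.value = value          # existing key keeps its position
            return
        node = _Node(key, value)
        node.prev = self.tail
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.index[key] = node

    def get(self, key):
        return self.index[key].value    # no walk through the list

    def remove(self, key):
        node = self.index.pop(key)      # found via the map, unlinked from the list
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev

    def items(self):
        node = self.head
        while node is not None:
            yield node.key, node.value
            node = node.next


ordered = LinkedMap()
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    ordered.put(k, v)
ordered.remove("b")
print(list(ordered.items()))
```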
Search Tree
Like the linked list type, it represents a series of data items in which each item holds the data and pointers to other items.
The difference from the linked list type is that an item can point to two or more other items, and the arrangement of items is governed by specific ordering rules.
Thanks to these rules, a search can skip large parts of the series instead of visiting every item.
Function Type
Consists of a sequence of instructions and a pointer to its start position; the instructions include a GOTO instruction that returns to the caller.
It can take any number of arguments and produce any number of return values, which are exchanged through memory locations shared between the caller and the callee.
Call Stack
A linked list of pointers. When a function calls another function (that is, executes a GOTO instruction to the callee's start position), the memory address of the call site is appended to the end of the list.
When the callee finishes, a GOTO instruction targets the pointer at the end of the call stack, which is then removed, and execution resumes in the caller at the instruction following the call.
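Calls and returns can be sketched with a toy interpreter in which `CALL` pushes an address and `RET` pops it. The opcodes and the function `run_with_calls` are invented for this example. For brevity it stores the address of the instruction after the call (rather than the call site itself) and uses a Python list instead of a linked list; the accumulator `acc` stands in for a memory location shared by caller and callee.

```python
def run_with_calls(program, entry=0):
    """Run (opcode, operands...) tuples; return the accumulator at HALT."""
    pc = entry          # pointer to the instruction being executed
    call_stack = []     # return addresses, most recent call last
    acc = 0             # shared location for passing values around
    while True:
        op, *args = program[pc]
        if op == "CALL":
            call_stack.append(pc + 1)   # remember where to resume
            pc = args[0]                # GOTO the callee's start position
        elif op == "RET":
            pc = call_stack.pop()       # GOTO the most recent return address
        elif op == "ADD":
            acc += args[0]
            pc += 1
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {op}")


program_with_calls = [
    ("ADD", 1),      # 0: main starts here
    ("CALL", 4),     # 1: call the function at address 4
    ("ADD", 10),     # 2: execution resumes here after the call
    ("HALT",),       # 3
    ("ADD", 2),      # 4: first function
    ("CALL", 7),     # 5: nested call to the function at address 7
    ("RET",),        # 6: return to address 2
    ("ADD", 100),    # 7: second function
    ("RET",),        # 8: return to address 6
]
print(run_with_calls(program_with_calls))   # 1 + 2 + 100 + 10 = 113
```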
Discussion