Deciphering the Ancient Language FORTRAN

In computational science laboratories, "secret recipe" FORTRAN source code, extended by generation after generation of researchers, is aged and handed down.

Professor: "Do you know FORTRAN?"
Student: "I know Fortran!"

You might hear conversations like this. Since FORTRAN is an ancient technology that is difficult to handle even with modern state-of-the-art editors, this student is going to suffer.

What is FORTRAN?

Here, I will refer to FORTRAN 77 and earlier versions as "FORTRAN" and to Fortran 90 and later as "Fortran". Several FORTRAN features have since been deemed obsolete, but the fixed-form source format is particularly characteristic.

For an example of a fixed-form program, please see the appendix of "Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory" by A. Szabo and N.S. Ostlund. This is a program distributed by the Computational Chemistry List (CCL). While it is on the more readable side, it is unreadable unless you are accustomed to the fixed form.

When you look at 30-year-old numerical computation textbooks in libraries, you can see that Pascal or BASIC was once used, but hardly anyone uses them anymore. Even Python 2.x code mostly no longer runs. FORTRAN, on the other hand, is still alive because backward compatibility has been strictly maintained, and it builds without problems even with the latest compilers. In fact, Szabo's appendix can be built and run with gfortran, although it issues warnings. For how to use gfortran on WSL, please refer to this.

For Windows
Invoke-WebRequest -Uri http://www.ccl.net/cca/software/SOURCES/FORTRAN/szabo/szabo.f -OutFile "szabo.f"
gfortran szabo.f -o szabo.exe
./szabo.exe
For Ubuntu
wget http://www.ccl.net/cca/software/SOURCES/FORTRAN/szabo/szabo.f
gfortran szabo.f -o szabo
./szabo

It is best not to touch it unless there is a specific need to rewrite it, but the unfortunate soul faced with the necessity of rewriting FORTRAN must learn it. However, since FORTRAN is probably not taught in lectures these days, you will have to suffer through researching it when you inherit the legacy source code.

What makes it so painful?

An all-uppercase program with no indentation may sap your will to read it, but tools can fix both, so they are not really major problems. Global variables are also quite dangerous, but that is not limited to FORTRAN. The biggest sources of pain are statement numbers and GOTO statements: control flow is expressed through numeric labels rather than block structure, so it is hard to follow. Let's start rewriting FORTRAN into Fortran right away, focusing on how to handle statement numbers and GOTO statements.

Coding Rules

These are the coding rules I personally propose, and I will be converting the code to conform to them.

  • Write everything in lowercase as a rule. Uppercase is hard to read.
  • Do not use statement numbers (statement labels), whether for do statements, if statements, or the fmt of write statements.
  • Do not use continue statements (always close do and if with end do and end if).
  • Use implicit none everywhere.
  • implicit integer (I-N) is prohibited.
  • implicit double precision (a-h,o-z) is prohibited.
  • Global variables are prohibited in principle.
  • Passing variables via common statements or modules is prohibited. (As with any global variable, I recommend writing the arguments out explicitly. If the argument list becomes too long, use optional arguments and the like.)
  • There is no need to turn blank lines into comment lines.
  • Use appropriate indentation.
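As a minimal sketch of what code following these rules might look like (the program and the helper `scaled_sum` are hypothetical illustrations, not taken from Szabo's appendix or from the original article):

```fortran
program rules_demo
  implicit none                      ! no implicit typing anywhere
  real(kind(1.0d0)) :: x(3), total

  x = [1.0d0, 2.0d0, 3.0d0]
  call scaled_sum(2.0d0, x, total)   ! data is passed explicitly as arguments
  print '(f8.4)', total              ! format given inline, no statement number

contains

  ! Hypothetical helper: returns factor * sum(values) through an explicit
  ! intent(out) argument instead of a common block or a module variable.
  subroutine scaled_sum(factor, values, res)
    real(kind(1.0d0)), intent(in)  :: factor
    real(kind(1.0d0)), intent(in)  :: values(:)
    real(kind(1.0d0)), intent(out) :: res
    integer :: i

    res = 0.0d0
    do i = 1, size(values)           ! do is closed by end do, not by a label
      res = res + factor * values(i)
    end do
  end subroutine scaled_sum

end program rules_demo
```

Everything the subroutine reads or writes appears in its argument list, so a reader does not have to search for common blocks to understand what it touches.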

Hello World!

This doesn't require any special explanation. For reference, the disassembly results also match perfectly.

Fixed-form
      SUBROUTINE HELLO()
      PRINT *, "HELLO"
      END SUBROUTINE
Free-form
subroutine hello()
  print *, "HELLO"
end subroutine
Output
 HELLO

FORMAT Statement

For information on edit descriptors, please refer to this site.

Fixed-form
      PRINT 10, 3.1415926535
   10 FORMAT(F8.4)
      PRINT 20, "HELLO", 1234, 3.1415926535
   20 FORMAT(3X,A10,I8,F8.4)
Free-form
  print '(f8.4)', 3.1415926535
  print '(3x,a10,i8,f8.4)', "HELLO", 1234, 3.1415926535
Output
  3.1416
        HELLO    1234  3.1416

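The output is the same, but after rewriting into free-form, the disassembly changed in exactly one line. I do not know for certain what caused this difference, and I welcome comments from experts. One plausible but unverified explanation is that gfortran passes the source line number of each I/O statement to its runtime library for error reporting; removing the FORMAT lines moves the second print statement up by one line, which would be consistent with the constant changing from 4 to 3.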

fotmat.od
-  96:	c7 85 30 fe ff ff 04 	movl   $0x4,-0x1d0(%rbp)
+  96:	c7 85 30 fe ff ff 03 	movl   $0x3,-0x1d0(%rbp)
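When the same format is needed by several statements, one possible alternative to a numbered FORMAT statement is a named character constant. The sketch below assumes a hypothetical name `fmt_row`; the second print line is added only to show the reuse.

```fortran
program format_demo
  implicit none
  ! Hypothetical name: fmt_row holds a format shared by several statements.
  character(len=*), parameter :: fmt_row = '(3x,a10,i8,f8.4)'

  print fmt_row, "HELLO", 1234, 3.1415926535
  print fmt_row, "WORLD", 5678, 2.7182818284
end program format_demo
```

This keeps the "define once, use many times" convenience of a FORMAT statement without reintroducing statement numbers.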

DO Statement

Always pair a do with an end do.

Fixed-form
      DO 10 I=1,2
   10 PRINT *, I
Free-form
  do i=1,2
    print *, i
  end do
Output
           1
           2

Looking at the above, one might mistakenly think that DO 10 I=1,2 means "execute 10 PRINT *, I twice," but it actually means "repeat everything up to and including statement number 10 twice." Take a look at this example:

Fixed-form
      DO 20 I=1,2
      PRINT *, I
   20 PRINT *, I
Free-form
  do i=1,2
    print *, i
    print *, i
  end do
Output
           1
           1
           2
           2

In this way, DO 20 I=1,2 means "once you reach statement number 20, go back." I struggled for a while because I did not understand this. Following this logic, the CONTINUE statement, which is a no-op, can serve as a substitute for end do.

Fixed-form
      DO 30 I=1,2
      PRINT *, I
   30 CONTINUE
Free-form
  do i=1,2
    print *, i
  end do
Output
           1
           2

Now that you understand the meaning of the do statement, let's look at examples involving two do statements. The following example simply has two do statements in a row.

Fixed-form
      DO 40 I=1,2
   40 PRINT *, I
      DO 50 J=3,5
   50 PRINT *, J
Free-form
  do i=1,2
    print *, i
  end do
  
  do j=3,5
    print *, j
  end do
Output
           1
           2
           3
           4
           5

While similar to the above, the next example is a nested loop. The 60 CONTINUE corresponds to two end do statements.

Fixed-form
      DO 60 I=1,2
      PRINT *, I
      DO 60 J=3,5
      PRINT *, I, J
   60 CONTINUE
Free-form
  do i=1,2
    print *, i
    do j=3,5
      print *, i, j
    end do
  end do
Output
           1
           1           3
           1           4
           1           5
           2
           2           3
           2           4
           2           5

Next is a straightforward nested loop. It's confusing because there is no explicit part corresponding to the end do.

Fixed-form
      DO 70 I=1,2
      DO 70 J=3,5
   70 PRINT *, I, J
Free-form
  do i = 1,2
    do j = 3,5
      print *, i, j
    end do
  end do
Output
           1           3
           1           4
           1           5
           2           3
           2           4
           2           5

The last one is an example of a straightforward nested loop where 80 CONTINUE corresponds to two end do statements.

Fixed-form
      DO 80 I = 1,2
      DO 80 J = 3,5
      PRINT *, I, J
   80 CONTINUE
Free-form
  do i = 1,2
    do j = 3,5
      print *, i, j
    end do
  end do
Output
           1           3
           1           4
           1           5
           2           3
           2           4
           2           5
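Legacy loops often also contain a GO TO whose target is the loop's own terminating CONTINUE, or a label just after the loop. These two jumps correspond to cycle and exit. The following sketch is not from the original article; the loop body and the variable names are made up for illustration. It keeps the labeled version and the rewritten version side by side in one free-form program and stops with an error if they disagree.

```fortran
program do_goto_demo
  implicit none
  integer :: i, legacy_sum, modern_sum

  ! Legacy style (labels kept on purpose): skip even i, stop once i exceeds 7.
  legacy_sum = 0
  do 90 i = 1, 10
    if (mod(i, 2) == 0) go to 90     ! jump to the terminating continue
    if (i > 7) go to 100             ! jump out of the loop
    legacy_sum = legacy_sum + i
90 continue
100 continue

  ! Label-free rewrite of the same loop.
  modern_sum = 0
  do i = 1, 10
    if (mod(i, 2) == 0) cycle        ! replaces go to 90
    if (i > 7) exit                  ! replaces go to 100
    modern_sum = modern_sum + i
  end do

  if (legacy_sum /= modern_sum) error stop "rewrite changed the result"
  print *, legacy_sum, modern_sum
end program do_goto_demo
```

Both sums come out as 1 + 3 + 5 + 7 = 16, so the program prints the same number twice.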

IF Statement

This is perhaps one of the factors that make automatic conversion from FORTRAN to Fortran difficult. In old programs, I often see if statements used in the form "if true, skip the following code," presumably because they are combined with goto statements. Below is an example where, if I equals 0, execution skips the code in between and jumps to 10 CONTINUE.

Fixed-form
      IF (I.EQ.0) GO TO 10
      PRINT *, 'I /= 0'
   10 CONTINUE
Free-form
  if (i/=0) then
    print *, 'I /= 0'
  end if

To rewrite this with if and end if blocks, you must test i/=0 instead of i==0. Unless you draw a flowchart and work backwards to the correct condition, it is easy to make a mistake. Perhaps this is what makes automation hard. Also, although the result is the same, the code itself changes significantly, so the disassembly also changed quite a bit.

Fixed-form
      IF (I.EQ.0) GO TO 20
      PRINT *, 'I /= 0'
      IF (I.EQ.1) GO TO 20
      PRINT *, 'I /= 1'
   20 CONTINUE
Free-form
  if (i/=0) then
    print *, 'I /= 0'
    if (i/=1) then
      print *, 'I /= 1'
    end if
  end if
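Since inverting conditions by hand is error-prone, one way to gain confidence is to run both versions of the control flow over a range of inputs and compare which branches were taken. The following is only a sketch under my own assumptions: the function names `legacy_path` and `modern_path` are hypothetical, and each print statement is replaced by setting a bit in an integer so that the two paths can be compared.

```fortran
program if_goto_check
  implicit none
  integer :: i

  do i = -2, 3
    if (legacy_path(i) /= modern_path(i)) error stop "paths differ"
  end do
  print *, "all paths agree"

contains

  ! Legacy control flow with goto; each print is replaced by a marker.
  integer function legacy_path(i) result(trace)
    integer, intent(in) :: i
    trace = 0
    if (i == 0) go to 20
    trace = trace + 1                ! stands for print *, 'I /= 0'
    if (i == 1) go to 20
    trace = trace + 2                ! stands for print *, 'I /= 1'
20  continue
  end function legacy_path

  ! Rewritten control flow with inverted conditions and nested if blocks.
  integer function modern_path(i) result(trace)
    integer, intent(in) :: i
    trace = 0
    if (i /= 0) then
      trace = trace + 1              ! stands for print *, 'I /= 0'
      if (i /= 1) then
        trace = trace + 2            ! stands for print *, 'I /= 1'
      end if
    end if
  end function modern_path

end program if_goto_check
```

If a condition were inverted incorrectly, for example by writing i == 1 in the inner block, the loop would stop with "paths differ" at the first input that exposes the mistake.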

Finally, here is the final boss: the so-called arithmetic IF statement. This requires a lot of thinking, so I will show the simplest example. I will introduce a straightforward example where the branch destination appears after the if statement.

Fixed-form
      IF (I-1) 30, 40, 50
   30 PRINT *, 'I < 1'
   40 PRINT *, 'I < 1 or I = 1'
   50 PRINT *, 'I < 1 or I = 1 or I > 1'
Free-form
  if (i<1)                    print *, 'I < 1'
  if (i<1 .or. i==1)          print *, 'I < 1 or I = 1'
  if (i<1 .or. i==1 .or. i>1) print *, 'I < 1 or I = 1 or I > 1'
Free-form
  if (i<1) then
    print *, 'I < 1'
  end if
  
  if (i<1 .or. i==1) then
    print *, 'I < 1 or I = 1'
  end if

  if (i<1 .or. i==1 .or. i>1) then
    print *, 'I < 1 or I = 1 or I > 1'
  end if

Although it is not very good practice, I also showed the one-line style that omits then and end if. Be careful: an arithmetic IF cannot always be rewritten this way; the right rewrite depends on where the statement numbers of the branch destinations are located. When a branch destination appears before the if statement, rewriting it as a do while loop may be effective.
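For comparison, here is a sketch of a different arrangement of branch destinations, one that does not appear in the original article: each branch ends with a GO TO past the others, so there is no fall-through. In that case the three labels map onto if, else if, and else. The legacy code is shown only as a comment.

```fortran
program arithmetic_if_demo
  implicit none
  integer :: i

  ! Legacy pattern being rewritten (shown as a comment only):
  !       IF (I-1) 30, 40, 50
  !    30 PRINT *, 'I < 1'
  !       GO TO 60
  !    40 PRINT *, 'I = 1'
  !       GO TO 60
  !    50 PRINT *, 'I > 1'
  !    60 CONTINUE
  do i = 0, 2
    if (i - 1 < 0) then              ! first label: expression is negative
      print *, 'I < 1'
    else if (i - 1 == 0) then        ! second label: expression is zero
      print *, 'I = 1'
    else                             ! third label: expression is positive
      print *, 'I > 1'
    end if
  end do
end program arithmetic_if_demo
```

The loop over i = 0, 1, 2 exercises the negative, zero, and positive branches once each.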

do while Statement

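In FORTRAN, loops can be implemented with if and goto statements, but this style is obsolete. Use do while instead. Note that the goto form tests the condition after the loop body, so the body always runs at least once; the do while rewrite below is equivalent only because the condition already holds on entry (i = 0 < 5).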

Fixed-form
      I = 0
   10 CONTINUE
      PRINT *, I
      I = I + 1
      IF (I.LT.5) GO TO 10
Free-form
  i = 0
  do while (i<5)
    print *, i
    i = i + 1
  end do
Output
           0
           1
           2
           3
           4

I found a branch using an arithmetic IF statement on line 568 of Szabo's appendix, so I'll include it as an example. Since IF (I-5) 20, 20, 30 means "go to 20 when I<5 or I==5, and go to 30 when I>5," it simply requires replacing < with <=.

Fixed-form
      I = 0
   20 CONTINUE
      PRINT *, I
      I = I + 1
      IF (I-5) 20, 20, 30
   30 CONTINUE
Free-form
  i = 0
  do while (i<=5)
    print *, i
    i = i + 1
  end do
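One caveat with these rewrites is that the goto loop tests its condition at the bottom, while do while tests it at the top. The two agree in the examples above because the condition holds on entry. The sketch below uses a starting value of 7, chosen by me to make the difference visible, and counts how often each form executes its body. A plain do with an exit at the bottom is one way to keep the original behavior.

```fortran
program post_test_loop
  implicit none
  integer :: i, n_goto, n_while, n_exit

  ! Legacy form: the body runs before the test, so it always runs once.
  i = 7
  n_goto = 0
10 continue
  n_goto = n_goto + 1
  i = i + 1
  if (i < 5) go to 10

  ! do while tests first, so the body is skipped when i starts at 7.
  i = 7
  n_while = 0
  do while (i < 5)
    n_while = n_while + 1
    i = i + 1
  end do

  ! do / exit keeps the test at the bottom and matches the legacy form.
  i = 7
  n_exit = 0
  do
    n_exit = n_exit + 1
    i = i + 1
    if (.not. (i < 5)) exit
  end do

  print *, n_goto, n_while, n_exit
end program post_test_loop
```

The counts are 1, 0, and 1: only the do / exit form reproduces the goto loop when the condition is false on entry.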

DIMENSION and DATA Statements

Added on 2022/11/14. We will assign values to variables and arrays. For array constructors, please refer to this and this.

Fixed-form
      IMPLICIT INTEGER (A-Z)
      
      DIMENSION B(3)
      DIMENSION C(3)
      DIMENSION D(3)
      DIMENSION E(5)
      DIMENSION F(3)
      DIMENSION G(3)
      DIMENSION H(2,3)
      
      DATA A/1/
      DATA B/1,2,3/
      DATA C/3,6,9/
      DATA D/3*4/
      DATA E/3*4,5,6/
      DATA F,G/1,2,3,4,5,6/
      DATA H/1,2,3,4,5,6/
Free-form
implicit none

integer :: i  ! index of the implied do loops below (required by implicit none)
integer :: A = 1
integer :: B(3) = [1,2,3]
integer :: C(3) = [(3*i, i=1,3)]
integer :: D(3) = [(4, i=1,3)]
integer :: E(5) = [(4, i=1,3), 5, 6]
integer :: F(3) = [1,2,3], G(3) = [4,5,6]
integer :: H(2,3) = reshape([1,2,3,4,5,6], [2,3])
Output
 PRINT *, A      ->           1
 PRINT *, B      ->           1           2           3
 PRINT *, C      ->           3           6           9
 PRINT *, D      ->           4           4           4
 PRINT *, E      ->           4           4           4           5           6
 PRINT *, F      ->           1           2           3
 PRINT *, G      ->           4           5           6
 PRINT *, H(1,:) ->           1           3           5
 PRINT *, H(2,:) ->           2           4           6

In the rewrite, the DIMENSION statements are gone and the DATA statements have been replaced by initializers in the type declarations. (Both statements are still valid Fortran, but declaring type, shape, and initial value in one place is easier to read.)

  • A: the type comes from IMPLICIT INTEGER (A-Z) and the value from DATA A/1/; write this explicitly as integer :: A = 1.
  • B onwards: these are also integers, but since they are arrays, their sizes are given in the DIMENSION statements.
  • C(3): DATA C/3,6,9/ follows a pattern, so it can be written with an implied do loop inside the array constructor: integer :: C(3) = [(3*i, i=1,3)].
  • D(3): DATA D/3*4/ means the same as DATA D/4,4,4/. Here 3*4 is a repeat count ("three 4s"), not a multiplication.
  • E(5): likewise, DATA E/3*4,5,6/ means the same as DATA E/4,4,4,5,6/.
  • F(3) and G(3): DATA F,G/1,2,3,4,5,6/ combines DATA F/1,2,3/ and DATA G/4,5,6/, assigning values to two arrays at once. Combined with repeat counts as in E(5), this is hard to read at first glance: DATA F,G/1,4*2,3/ means DATA F,G/1,2,2,2,2,3/, which decomposes into DATA F/1,2,2/ and DATA G/2,2,3/.
  • H(2,3): the 2D array holds the following matrix:

H = \begin{pmatrix} 1 & 3 & 5\\ 2 & 4 & 6\\ \end{pmatrix}

DATA statements fill an array in array element order, which in Fortran is column-major (the first index varies fastest), so DATA H/1,2,3,4,5,6/ is equivalent to each of the following:

! (1) element by element
H(1,1) = 1
H(2,1) = 2
H(1,2) = 3
H(2,2) = 4
H(1,3) = 5
H(2,3) = 6

! (2) with nested do loops
do j = 1,3
  do i = 1,2
    H(i,j) = i + 2*(j-1)
  end do
end do

! (3) column by column
H(:,1) = [1,2]
H(:,2) = [3,4]
H(:,3) = [5,6]

You can also specify the initial values in a single line using reshape(source, shape), as in the free-form example above.
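The claims about repeat counts and storage order can be checked directly. The following sketch is my own small test program, not part of the original repository. It keeps the DATA statements and compares the resulting arrays against array constructors.

```fortran
program data_check
  implicit none
  integer :: f(3), g(3), h(2,3)

  data f, g /1, 4*2, 3/              ! same as data f, g /1,2,2,2,2,3/
  data h /1, 2, 3, 4, 5, 6/          ! filled in column-major order

  if (any(f /= [1, 2, 2])) error stop "unexpected f"
  if (any(g /= [2, 2, 3])) error stop "unexpected g"
  if (any(h /= reshape([1, 2, 3, 4, 5, 6], [2, 3]))) error stop "unexpected h"
  if (any(h(1,:) /= [1, 3, 5])) error stop "unexpected first row"

  print *, h(1,:)
  print *, h(2,:)
end program data_check
```

If all checks pass, the program prints the two rows of H, which should match the matrix shown above.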

Conclusion

Regarding debugging methods:

  1. Comparison of output
  2. Referring to this for disassembly
  3. Step-by-step execution with GDB to check line by line

I've considered these, but I'm looking for an automated and reliable method (though I suppose if it existed, I wouldn't be struggling...). Since I found that disassembly might not be as useful as expected in rewriting FORTRAN to Fortran, perhaps a better way would be to record GDB step-by-step execution and compare it before and after the rewrite.

In short, I just need a convenient way to guarantee that the flowchart is the same before and after the rewrite. It seems that an LLVM Fortran compiler has been released, and since it can also output a control flow graph, it might be usable. Alternatively, a script that "embeds a print statement after every line, builds, and executes" might help determine whether goto statements were eliminated correctly. (I'd also like to automate white-box testing.)
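As a rough sketch of the "embed print statements" idea, the checkpoints can be recorded in memory instead of printed, so that the order of execution before and after a rewrite can be compared inside a single program. The module `trace_mod`, the type `trace_t`, and the subroutine `checkpoint` are hypothetical names introduced here for illustration; they do not come from the original article or from any existing tool.

```fortran
module trace_mod
  implicit none
  integer, parameter :: max_steps = 100

  ! Hypothetical record of the checkpoints visited, in order.
  type :: trace_t
    integer :: n = 0
    integer :: ids(max_steps) = 0
  end type trace_t

contains

  ! Append one checkpoint id to the trace.
  subroutine checkpoint(tr, id)
    type(trace_t), intent(inout) :: tr
    integer, intent(in) :: id
    if (tr%n >= max_steps) error stop "trace is full"
    tr%n = tr%n + 1
    tr%ids(tr%n) = id
  end subroutine checkpoint

end module trace_mod

program trace_demo
  use trace_mod
  implicit none
  type(trace_t) :: before, after
  integer :: i

  ! Before: goto loop equivalent to the arithmetic IF example.
  i = 0
20 continue
  call checkpoint(before, i)
  i = i + 1
  if (i <= 5) go to 20

  ! After: do while rewrite.
  i = 0
  do while (i <= 5)
    call checkpoint(after, i)
    i = i + 1
  end do

  if (before%n /= after%n) error stop "different number of steps"
  if (any(before%ids(1:before%n) /= after%ids(1:after%n))) error stop "different order"
  print *, "identical trace of", before%n, "checkpoints"
end program trace_demo
```

This only checks the inputs that are actually run, so it complements rather than replaces a comparison of control flow graphs.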

Automatic refactoring is our long-cherished wish. I found several papers like this one and this one, so I plan to try them out in another article.

Julia has recently emerged as an option alongside the "Big Three" of HPC (C, C++, Fortran). While it is possible to call C or Fortran from Julia using ccall(), that only works for "elegant" code; it cannot be used for programs where global variables run rampant and functionality is not separated into subroutines or functions. Let's brace ourselves and first rewrite FORTRAN to Fortran, and then Fortran to Julia. If you absolutely want to call the code without changing it, you can call it from Julia using the method explained here. Since a native Julia implementation lets you benefit from the package manager and the BigFloat type, I'd also like to cover porting to Julia, using Julia for Fortran Users or scripts that assist the conversion, in another article.

If it becomes possible to perform Fortran compilation/decompilation within Julia, I feel that Julia-based refactoring tools will also become more substantial.

Other potentially useful information

https://www.nag-j.co.jp/fortran/FI_17.html
https://qiita.com/cure_honey/items/e06b89e238c3df3df693
https://qiita.com/implicit_none/items/55c47407aa376277a531
https://qiita.com/implicit_none/items/bc113fe438cfb07e0a44#gotoの飛び先多重ループからの脱出-1

The programs introduced in this article have been verified to produce the same output before and after rewriting. The programs, together with all scripts for building, comparing output, and disassembling, are available in this repository:
https://github.com/ohno/F77f90
