Problem in Fortran Matrix Offloading: Transpose Example
Fortran stores matrix in Column-Major ordering, while normal backends stores matrix in Row-Major ordering, which will brings trouble. I will use matrix transpose as an example to explain.
1 Problem
Consider following example:
- In Fortran, we create a matrix $A$, print it, stores in memory and returns a pointer $p_F$.
- Pass $p_{F}$ to a different backend, suppose C++, do a transpose and stores the result in $p_{c}$
- Back to Fortran, recover data from $p_{C}$, print it.
Question: will step 3 print the correct result as we expected?
2 Case 1: $N \times N$ Matrix
Suppose the original matrix is $3 \times 3$ matrix:
$$ A_{Fortran} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \tag{Printed in Step 1} $$When Fortran stores it in memory, it will become something like:
$$ [3, 4, 7, 2, 5, 8, 3, 6, 9] \tag{Memory Stored from Fortran} $$Pass to C++ backend, when recovering, C++ would think it is a matrix like:
$$ A_{C++} = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} $$Do the transpose, it becomes:
$$ (A_{C++})^T = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} $$Save to $p_{C}$:
$$ [1, 2, 3, 4, 5, 6, 7, 8, 9] \tag{Memory Stored from Cpp} $$When recovering from it, Fortran again scans it thinking is stores in columns:
$$ (A_{Fortran})^T = \begin{bmatrix} 1 & 4 & 7\\ 2 & 5 & 8\\ 3 & 6 & 9 \end{bmatrix} \tag{Printed in Step 3} $$Notice that the result is correct by coincident! Because we scan the memory in a wrong way TWICE.
3 Case 2: $N \times M$ Matrix ($N \neq M$)
We’re not this lucky if we’re going to transpose matrix with different length in size.
Suppose the original matrix is $2 \times 3$ matrix:
$$ B_{Fortran} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \tag{Printed in Step 1} $$When Fortran stores it in memory, it will become something like:
$$ [1, 4, 2, 5, 3, 6] \tag{Memory Stored from Fortran} $$Pass to C++ backend, when recovering, C++ would think it is a matrix like:
$$ B_{C++} = \begin{bmatrix} 1 & 4 & 2 \\ 5 & 3 & 6 \end{bmatrix} $$Do the transpose, it becomes:
$$ (B_{C++})^T = \begin{bmatrix} 1 & 5 \\ 4 & 3 \\ 2 & 6 \end{bmatrix} $$Save to $p_{C}$:
$$ [1, 5, 4, 3, 2, 6] \tag{Memory Stored from Cpp} $$When recovering from it, Fortran again scans it thinking is stores in columns:
$$ (B_{Fortran})^T = \begin{bmatrix} 1 & 3\\ 5 & 2\\ 4 & 6 \end{bmatrix} \tag{Printed in Step 3} $$Notice that this is NOT the correct transpose of step 1.
If you have any feedback to this article, feel free to comment here or send an email to me