Why algorithms (in particular, I/O algorithms) are important

Suppose we have a two-dimensional array, which is a map with given heights at different points. Let's calculate the average elevation. To do this, you can go through either each row:
sum = 0
for i = 0 to m - 1 do
  for j = 0 to m - 1 do
    sum = sum + A[i, j]
avg = sum / m^2
or each column:
sum = 0
for j = 0 to m - 1 do
  for i = 0 to m - 1 do
    sum = sum + A[i, j]
avg = sum / m^2
Then summarize all heights and divide by the number of points on the map. Do you think there is any difference between these methods? It would seem not, but in fact, there is. The first method (traversing all rows) turns out to be more productive than the second method (traversing the columns). The whole thing is related to the organization of the array in memory and access to each cell. I tried to compare on Java. That's what I did.

// Computing the average elevation
class Avg {
  public static void main(String[] argv) {
    int n = 1000; // 1000 * 1000 = 10^6
    int[][] matrix = new int[n][n];

    fillMatrix(matrix, n, 1);

    long start = System.nanoTime();
    int avg = avgRowByRow(matrix);
    System.out.println(String.format(
      "avg (Row By Row) = %s,
      it takes %s nano-seconds",
      avg,
      System.nanoTime() - start
    ));

    start = System.nanoTime();
    avg = avgColumnByColumn(matrix);
    System.out.println(String.format(
      "avg (Column By Column) = %s
      it takes %s nano-seconds",
      avg,
      System.nanoTime() - start
    ));
  }

  private static void fillMatrix(int[][] matrix, int n, int default_value) {
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        matrix[i][j] = default_value;
      }
    }
  }

  private static int avgRowByRow(int[][] matrix) {
    int sum = 0, m = matrix.length;

    for (int i = 0; i < m; i++) {
      for (int j = 0; j < m; j++) {
        sum = sum + matrix[i][j];
      }
    }

    return sum / m / m; // sum / (m * m)
  }

  private static int avgColumnByColumn(int[][] matrix) {
    int sum = 0, m = matrix.length;

    for (int j = 0; j < m; j++) {
      for (int i = 0; i < m; i++) {
        sum = sum + matrix[i][j];
      }
    }

    return sum / m / m; // sum / (m * m)
  }
}
Then I tried to run it a couple of times and got the following result. The result can be said to be obvious:
avg (Row By Row)       = 1, it takes 3822568 nano-seconds
avg (Column By Column) = 1, it takes 5646746 nano-seconds

avg (Row By Row)       = 1, it takes 3902352 nano-seconds
avg (Column By Column) = 1, it takes 5467638 nano-seconds

avg (Row By Row)       = 1, it takes 3939734 nano-seconds
avg (Column By Column) = 1, it takes 5077092 nano-seconds

avg (Row By Row)       = 1, it takes 3736544 nano-seconds
avg (Column By Column) = 1, it takes 4950847 nano-seconds

avg (Row By Row)       = 1, it takes 3900019 nano-seconds
avg (Column By Column) = 1, it takes 5009578 nano-seconds

avg (Row By Row)       = 1, it takes 3954720 nano-seconds
avg (Column By Column) = 1, it takes 5109380 nano-seconds

avg (Row By Row)       = 1, it takes 3824056 nano-seconds
avg (Column By Column) = 1, it takes 5276105 nano-seconds