in Learning, MATLAB

MATLAB Vectorization Demystified – Part 3: Preallocation

“MATLAB Vectorization Demystified”: In this series, I try to explain vectorization operation in MATLAB in details step-by-step through several examples. This is part 3 in which “preallocation” of variables/parameters are explained.

Preallocation

You probably have seen this MATLAB warning when coding for or while loops:

Preallocation warning in MATLAB

Preallocation warning in MATLAB

This is the fact that for and while loops that incrementally increase the size of a variable each time through the loop can adversely affect performance and memory usage/management. The reason is that repeatedly resizing arrays often requires MATLAB to spend extra time looking for larger contiguous blocks of memory, and then moving the array into those blocks. On the kth iteration through a loop, MATLAB has to make a copy of k elements into newly allocated memory. The time required to copy k elements is proportional to k. The time required to execute the entire loop is therefore proportional to the sum of the integers from 1 to n, which is \dfrac{n\times (n+1)}{2}. In conclusion, memory copying during the assignment statement causes the time required to execute the loop to be proportional to n^2.

This is a MATLAB Documentation example which shows the performance degradation due to ignoring preallocation:

% warning: depending on your machine, it might take couple of minutes or so
tic, clear; for k = 1:2000000, y(k) = rand; end, toc

Elapsed time is 0.777868 seconds.

Often, you can improve code execution time by preallocating the maximum amount of space required for the variable. What happens if we preallocate variable before the loop in the above example?

% warning: depending on your machine, it might take couple of minutes or so
tic, clear; x=zeros(1,2000000); for k = 1:2000000, x(k)=rand; end, toc

Elapsed time is 0.114744 seconds.

As you can see, the same code is performed about 7 times faster.
NOTE: In MATLAB 2011a and beyond, repeated array growth performance in a loop has been improved significantly. That’s why we don’t see a huge difference when we run the example. The big change in MATLAB R2011a is that MATLAB uses a better strategy for reallocating memory as the array grows when it detects this growth pattern. As a result, the time required to execute the loop grows only linearly with n. For more details, please check this blog post on Mathworks website.

Array of all zeros: preallocation

zeros() function allows you to create a variable filled with zeros and a specific size for prealliocation or other uses. The variable can be of any size and dimensions.

sampleMatrix1 = zeros(5 , 4)

sampleMatrix1 =
      0    0    0    0
      0    0    0    0
      0    0    0    0
      0    0    0    0
      0    0    0    0

We can also specify the type of numeric data for the matrix:

sampleMatrix1 = zeros(5 , 4 , 'int8')
% 'double', 'single', 'logical','int8', 'uint8', 'int16', 
% 'uint16', 'int32', 'uint32', 'int64', 'uint64'

sampleMatrix1 =
      5×4 int8 matrix
      0    0    0    0
      0    0    0    0
      0    0    0    0
      0    0    0    0
      0    0    0    0

It also allows you to specify another array as the prototype. The type of the array will be the same as the given prototype array in this case with any size requested.

sampleMatrix2 = randi(100 , 5 , 4)

sampleMatrix2 =
      58    50    56    21
      39    52    61    58
      28    20    72    42
      38    50    69    81
      24    74    76    35

sampleMatrix1 = zeros(3 , 6 , 'like' , sampleMatrix2)

sampleMatrix1 =
      0    0    0    0    0    0
      0    0    0    0    0    0
      0    0    0    0    0    0

Array of all ones: preallocation

ones() function allows you to create a variable filled with 1 while having a specific size and type, similar to zeros() function.

sampleMatrix3 = ones(3 , 5 , 'uint16')

sampleMatrix3 =
      3×5 uint16 matrix
      1    1    1    1    1
      1    1    1    1    1
      1    1    1    1    1

We can also preallocate a variable with numbers other than 1 with this function. For instance:

sampleMatrix3 = ones(3 , 5) * -1.23

sampleMatrix3 =
      -1.2300    -1.2300    -1.2300    -1.2300    -1.2300
      -1.2300    -1.2300    -1.2300    -1.2300    -1.2300
      -1.2300    -1.2300    -1.2300    -1.2300    -1.2300

What happens if we use unit16 as the type of our variable sampleMatrix3?

sampleMatrix3 = ones(3 , 5 , 'uint16') * -1.23

sampleMatrix3 =
      3×5 uint16 matrix
      0    0    0    0    0
      0    0    0    0    0
      0    0    0    0    0

Since the type of the variable is specified as uint16, which simply means an unsigned number, it cannot accept negative values. In addition, it can only accept integers, so it will round the numbers to the nearest integer. Look at the following example:

sampleMatrix3 = ones(3 , 5 , 'uint16') * 1.93

sampleMatrix3 =
      3×5 uint16 matrix
      2    2    2    2    2
      2    2    2    2    2
      2    2    2    2    2

We had to initialize the same variable with a for loop without preallocation if it was not because of ones() function in MATLAB:

for i = 1:3 % for rows
   for j = 1:5 % for columns
     sampleMatrix3(i , j) = -1.23;
   end
end
sampleMatrix3

sampleMatrix3 =
      -1.2300    -1.2300    -1.2300    -1.2300    -1.2300
      -1.2300    -1.2300    -1.2300    -1.2300    -1.2300
      -1.2300    -1.2300    -1.2300    -1.2300    -1.2300

NOTE: This kind of initialization (preallocation) is very useful when only a part of the initialized values are going to be changed within the code. That will save time during computation as you don’t need to repeat the repetitive operation. This could be a vector showing the status of an alarm over a finite time, where we expect that the alarm is silent for most of the time. This vector could be preallocated by zeros() where 0 shows the alarm in off mode. We only need to change the elements when the alarm goes on, which rarely happens.

Array of all NaN: preallocation

Similar to zeros() and ones(), we nan() function which allows us to preallocate a variable of NaN in each element of the vector/matrix of any size and dimension. It could be used as an error placeholder too. For instance, if you have a vector whose size isn’t known in advance, it’s a good way to keep track of how many of the values have been used.

sampleMatrix4 = nan(2 , 3 , 'like' , sampleMatrix3) 

sampleMatrix4 =
      NaN    NaN    NaN
      NaN    NaN    NaN
      NaN    NaN    NaN

NOTE: NaN acts like an error placeholder which can have a specific numeric data type. Sometimes we end up making arrays whose dimensions cannot be known beforehand. In these cases, it is much better to initialize a larger array than needed using NaN and then trim off the extra entries later.

m = 5; n = 4; M = randi(m, 1, n);
sampleMatrix4 = nan(1, m*n);
j = 0;
for i = 1:n
   sampleMatrix4(j+(1:M(i))) = i;
   j = j + M(i);
end
sampleMatrix4(isnan(sampleMatrix4)) = [];

sampleMatrix4 =
      1    1    1    1    1    2    2    3    3    3    3    3    4    4    4

Different from other numeric values, two NaN are not the same because both NaNs are undefined and their equivalence does not mean anything:

NaN == NaN

ans =
      logical
      0

Counterintuitively, the two NaNs are not the same. Another example:

x = NaN; y = NaN;
x == y

ans =
      logical
      0

If you want to count the number of existing NaN within a vector/matrix, you should use isnan() function:

sampleVector1 = [2 1.02 NaN -0.36 NaN]

sampleVector1 =
      2.0000    1.0200    NaN    -0.3600    NaN

Let’s do it in the wrong way first, which is assuming that NaN can be treated like other numerical values:

sum(sampleVector1 == NaN)     

ans =
      0

Now, let’s use isnan() function:

sum(isnan(sampleVector1))     

ans =
      2

Please note that isnan() returns a logical vector of the size of the original vector. Using sum(), we calculated the total number of NaN in the vector.

Array of all inf: preallocation

Similar to the previous three functions, inf() can be used to generate a vector/matrix of any size, dimension, and numeric data type that is filled with inf:

sampleMatrix5 = inf(3 , 5 , 'like' , sampleMatrix3)   

sampleMatrix5 =
      Inf    Inf    Inf    Inf    Inf
      Inf    Inf    Inf    Inf    Inf
      Inf    Inf    Inf    Inf    Inf

Despite NaN, infs can be compared where two infs are the same:

inf == inf

ans =
      logical
      1

To count the number of infs in a vector/matrix, we can use isinf() function or logical indexing:

sampleVector2 = [inf 2 0.23 -inf 1]

sampleVector2 =
      Inf    2.0000    0.2300    -Inf    1.0000

Let’s see how isinf() works:

sum(isinf(sampleVector2))

sampleVector2 =
      2

Now, we try the ordinary logical indexing to do the same task:

sum(sampleVector2 == inf) 

sampleVector2 =
      1

What happened? Why the total number of existing inf is not 2 in the second case? The reason is rather simple: inf and -inf are two different numbers in MATLAB as -inf is referring to infinity on the negative side. In such cases, only isnan() works properly by considering both -inf the same values. Check out the codes below:

inf == -inf

ans =
      logical
      0

inf > -inf

ans =
      logical
      1

NaN > -NaN

ans =
      logical
      0

Identity matrix: preallocation

Identity matrix has 1 for the diagonal elements while other elemets are 0. Similar to other numeric preallocation functions, it accept a numeric type. However, it can only generate 2D identity matrix of any size.

sampleMatrix7 = eye(3 , 5 , 'int8')

sampleVector2 =
      3×5 int8 matrix
      1    0    0    0    0
      0    1    0    0    0
      0    0    1    0    0

Sometimes, we need to create a matrix or retrieve diagonal elements of a given matrix. In such cases, we can use diag() function in MATLAB. Let’s say we want to create a 3-by-3 identity-like matrix with different diagonal numbers:

sampleMatrix7 = diag([30 60 -90])

sampleMatrix7 =
      30    0     0
      0     60    0
      0     0     -90

We also can return the diagonal elements of an array using diag() function:

diag(sampleMatrix7)

ans =
      30
      60
      -90

Linearly-spaced vector: preallocation

Sometimes, we need to preallocate a variable with equidistant numbers within a given range. In this case, MATLAB has linspace() function which does the job. It always returns a row vector. The spacing between the points is \dfrac{x_2-x_1}{n-1}, where x_1 and x_2 are the beginning and end of the given range, and n is the number of points requested to be generated.

sampleVector3 = linspace(0.3 , -1.2 , 10)

sampleVector3 =
      0.3000    0.1333    -0.0333    -0.2000    -0.3667    -0.5333    -0.7000    -0.8667    -1.0333    -1.2000

It is similar to colon : operation but it provides better control over the number of points and always include the endpoint. The two lines of codes below return the same results, as expected:

sampleVector3 = linspace(0 , 20 , 10)
sampleVector3 = 0:(20 - 0)/(10 - 1):20

sampleVector3 =
      0    2.2222    4.4444    6.6667    8.8889    11.1111    13.3333    15.5556    17.7778    20.0000

Therefore, the colon : operation can also be used to preallocate an equidistant vector. This might be more useful when we want to initialise a matrix.

2-D and 3-D rectangular coordinates using meshgrid() and ndgrid()

Sometimes, we need a 2-D or 3-D coordinates from a set of given ranges. In this case, we can use meshgrid(). However, to understand how meshgrid() works and the concept behind it, let’s start with a function that has only one variable, e.g., f(x)=x^2. To show the function values from 0 to 5, we create a vector of x within the range and evaluate f(x) for each point in x:

x = linspace(0 , 5 , 20);
f_x = x.^2

x =
      0    0.0693    0.2770    0.6233    1.1080    1.7313    2.4931    3.3934    4.4321    5.6094    6.9252    8.3795    9.9723    11.7036    13.5734    15.5817    17.7285    20.0139    22.4377    25.0000

That was fairly simple. Now, assume that we have function g(x,y), which is a function of both x and y, and is defined as g(x,y)=x^2+y^2. If we want to evaluate function g(x,y) for 0\leq x \leq 5 and 0\leq y \leq 7:

x = linspace(0 , 5 , 20);
y = linspace(0 , 7 , 20);
g_x_y = x.^2+y.^2

x =
      0    0.2050    0.8199    1.8449    3.2798    5.1247    7.3795    10.0443    13.1191    16.6039    20.4986    24.8033    29.5180    34.6427    40.1773    46.1219    52.4765    59.2410    66.4155    74.0000

There are two issues with the example above:

  1. We intentionally selected the same size of x and y, which is not the case in many examples. When the sizes of x and y are different, then we will get an error.
  2. As we all know, g is a function of two variables. Therefore, when a plot is requested, it should be a 3-D plot. However, it is not possible with our calculations in the previous example.

Let’s plot g(x,y) using surf():

surf(x , y , g_x_y)

Error using surf (line 71)
Z must be a matrix, not a scalar or vector.

Per the error, g_x_y should be a matrix, which in this case is a vector. To solve this problem, let’s look at a schematic of a 3-D plot and the relations between variables and function:

Meshgrid and 3-D plot (a)

Meshgrid and 3-D plot (a)


Meshgrid and 3-D plot (b)

Meshgrid and 3-D plot (b)


Meshgrid and 3-D plot (c)

Meshgrid and 3-D plot (c)

Then, what we really need to get it to work is a matrix of x and y where all possible combinations of the two vectors are there. That’s what meshgrid() does for us:

[x , y] = meshgrid(0:5 , 0:7);
g_x_y = x.^2 + y.^2;
surf(x , y , g_x_y)
3-D surface with meshgrid()

3-D surface with meshgrid()

ndgrid() does the same conceptually but it can create a rectanguler grid for N-D spaces while meshgrid() can only handle 2-D and 3-D spaces.

[x , y] = ndgrid(0:5 , 0:7);
g_x_y = x.^2 + y.^2;
surf(x , y , g_x_y)
3-D surface with meshgrid()

3-D surface with meshgrid()

Log-spaced vector: preallocation

If we need k logarithmically spaced points between 10^a and 10^b, we can use logspace() function:

sampleVector4 = logspace(1 , 2 , 10)

sampleVector4 =
      10.0000    12.9155    16.6810    21.5443    27.8256    35.9381    46.4159    59.9484    77.4264    100.0000

A common misunderstanding about preallocation

MATLAB users have been told about preallocation that sometimes we do it when it is unnecessary. This not only complicates the code but can actually cause the very issues that preallocation is meant to alleviate i.e., runtime performance and peak memory usage. For more information, please check out this blog post. The unnecessary preallocation often looks something like this:

Ecercises

Exercise 1: In a single line of code, create a 5-by-6 preallocated matrix with 0 on the main diagonal elements and -10 on other elements:

  % your code
  
Answer:

  sampleMatrixE1 = ~eye(5 , 6) * -10
  

sampleMatrixE1 =
      0      -10    -10    -10    -10    -10
      -10    0      -10    -10    -10    -10
      -10    -10    0      -10    -10    -10
      -10    -10    -10    0      -10    -10
      -10    -10    -10    -10    0      -10

Exercise 2: Create a 5-by-5 preallocation matrix in which the elements associated with sampleMatrixE2_2 bigger than 200 or smaller than 800 are represented by inf and the rest of the elements are 10. Don’t use an intermediate variable, no more than two lines, and don’t change anything in sampleMatrixE2_2.

  % to get similar set of random numbers everytime you run the code
  rng(1,'twister'); 
  sampleMatrixE2_2 = randi(1000 , 5 , 5)
  

sampleMatrixE2_2 =
      418    93     420    671    801
      721    187    686    418    969
      1      346    205    559    314
      303    397    879    141    693
      147    539    28     199    877

you probably want to use double() function to convert logical to double type.

Answer:

  sampleMatrixE2 = double(sampleMatrixE2_2 > 200 & ...
            sampleMatrixE2_2 < 800) + ones(5 , 5) * 10
  sampleMatrixE2(sampleMatrixE2 == 11) = inf
  

sampleMatrixE2 =
      Inf    10     Inf    Inf    10
      Inf    10     Inf    Inf    10
      10     Inf    Inf    Inf    Inf
      Inf    Inf    10     10     Inf
      10     Inf    10     10     10

Exercise 3: Preallocate a 4-by-4 matrix which looks like the figure below in a single line without intermediate variable:

Exercise 3

Exercise 3

Answer:

  sampleMatrixE3 = [nan(2 , 2) -eye(2 , 2) ; ~eye(2 , 2) inf(2 , 2)]
  

sampleMatrixE3 =
      NaN    NaN    -1     0
      NaN    NaN    0      -1
      0      1      Inf    Inf
      1      0      Inf    Inf

Exercise 4: Preallocate sampleMatrixE4 with the same number of rows in sampleMatrixE4_1 and the number of columns in which sampleMatrixE4_1 has at least 3 or more numbers between 100 and 900 and are odd numbers in a sinle line code.

  % to get similar set of random numbers everytime you run the code
  rng(1,'twister'); 
  sampleMatrixE4_1 = randi(1000 , 4 , 8)
  

sampleMatrixE4_1 =
      418    147    397    205    418    801    877    170
      721    93     539    879    559    969    895    879
      1      187    420    28     141    314    86     99
      303    346    686    671    199    693    40     422

Answer:

  sampleMatrixE4 = ones(size(sampleMatrixE4_1 , 1) , ...
       length(find(sum(sampleMatrixE4_1 > 100 & ...
       sampleMatrixE4_1 < 900 & rem(sampleMatrixE4_1 , 2) ~= 0 , 1) >= 3)))
  

sampleMatrixE4 =
      1    1
      1    1
      1    1
      1    1

Exercise 5: Create a 2-D log-spaced rectangular coordinate paris for x and y variables where x has 5 points in 10\leq x\leq 100 and y has 7 points in 100\leq y \leq 1000.

  % your code
  
Answer:

  [x , y] = meshgrid(logspace(1 , 2 , 10) , ...
                     logspace(2 , 3 , 15))
  

x =
      10.0000    17.7828    31.6228    56.2341    100.0000
      10.0000    17.7828    31.6228    56.2341    100.0000
      10.0000    17.7828    31.6228    56.2341    100.0000
      10.0000    17.7828    31.6228    56.2341    100.0000
      10.0000    17.7828    31.6228    56.2341    100.0000
      10.0000    17.7828    31.6228    56.2341    100.0000
      10.0000    17.7828    31.6228    56.2341    100.0000
  
y =
   1.0e+03 *
      0.1000    0.1000    0.1000    0.1000    0.1000
      0.1468    0.1468    0.1468    0.1468    0.1468
      0.2154    0.2154    0.2154    0.2154    0.2154
      0.3162    0.3162    0.3162    0.3162    0.3162
      0.4642    0.4642    0.4642    0.4642    0.4642
      0.6813    0.6813    0.6813    0.6813    0.6813
      1.0000    1.0000    1.0000    1.0000    1.0000

Comments

Comment