Numpy and Arrays

One major data type that we have not yet introduced is the array, a close sibling of the list. We've isolated it here because it is not part of default Python; instead, it comes from a package that we can import. The array offers important advantages over the list when it comes to vectorized operations (which we'll define below).

By the end of the day, you’ll be able to

  • Import the numpy package
  • Construct and manipulate numpy arrays
  • Compare and contrast numpy arrays and lists
  • Apply numpy functions for simple but important calculations

In tandem with this notebook, I recommend creating a new script that you can use as a playground. As we move to more sophisticated programs, it's best not to use Python interactively in the shell.

Importing Numpy (and other Packages)

As part of your conda environment for this series, you downloaded a number of packages that are ubiquitously used throughout Python programming in any context. One of those packages is called numpy, which is short for numerical Python. You can import this package via the line

import numpy as np

where the “as np” part is optional and just defines an alias (short name) that can save you a bit of typing later on. After you’ve done this, you now have access to a large number of useful data structures and efficient, practical functions available as part of the package!
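As a small sketch of what the alias does (the names below are only for illustration), both spellings refer to the same package, so anything reachable as numpy.something is also reachable as np.something:

```python
import numpy
import numpy as np

# The alias is just a second, shorter name for the same package
print(np is numpy)

# Both names give access to the same constants and functions
print(np.pi)
print(numpy.pi)
```

In practice nearly everyone uses the np alias, so code you find online will usually assume it.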

Import statements only need to be run once, at the start of the program (whether that is a script or a Python shell session). Generally, we like to include them in a block at the top of the code, e.g.,

import numpy as np
import scipy
import astropy

We won’t go into any more detail about packages for the day, but we will dig into the numpy package specifically since it complements the rest of the material we were discussing about data types.

Motivating numpy: The Problem with Lists and the need for Arrays

The main purpose of numpy is to allow for efficient data handling and computation, including for multidimensional collections of objects. Numpy achieves this by improving on the list data type through a new data type called an “array”. To understand why the array is different (and better), let’s define one of each:

# define a list using default Python
list1 = [1,3,5]

# np.array() is a function that takes a list as an argument and converts it into an array
array1 = np.array([1,3,5]) 

print(list1)
print(array1)

Let’s now say we want to add 10 to each element in list1 and array1. You might think of trying:

print(list1 + 10)

but that raises an error (a TypeError). Default Python won't let you add a single number to a list, because they are incompatible data types. Contrast that with numpy:

print(array1 + 10)

which has the expected effect of adding 10 to each element.
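For comparison, here is a minimal sketch of what the same operation takes with a plain list. The variable names list_plus_10 and array_plus_10 are made up for this example:

```python
import numpy as np

list1 = [1, 3, 5]
array1 = np.array([1, 3, 5])

# With a list, we have to visit each element ourselves
list_plus_10 = [x + 10 for x in list1]

# With an array, the scalar is applied to every element at once
array_plus_10 = array1 + 10

print(list_plus_10)   # [11, 13, 15]
print(array_plus_10)  # [11 13 15]
```

Note that printed arrays separate their elements with spaces rather than commas, which is a quick way to tell the two apart in your output.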

A related behavior of lists shows up if you try adding two of them together:

print([1,3,5] + [2,4,6])

versus what happens if you try the same with arrays,

a = np.array([1,3,5])
b = np.array([2,4,6])
print(a+b)

In short, numpy usually does what you expect to see from vectors: we call these element-wise operations.
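To make the contrast concrete, here is a short sketch that runs both cases side by side. The multiplication line is an extra illustration, not part of the example above:

```python
import numpy as np

# Adding two lists joins them end to end (concatenation)
list_sum = [1, 3, 5] + [2, 4, 6]
print(list_sum)  # [1, 3, 5, 2, 4, 6]

# Adding two arrays of the same length combines matching elements
a = np.array([1, 3, 5])
b = np.array([2, 4, 6])
array_sum = a + b
print(array_sum)  # [ 3  7 11]

# Other arithmetic operators are element-wise too
array_product = a * b
print(array_product)  # [ 2 12 30]
```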

The numpy advantage, in short, is being able to do vector operations. This works much faster than looping through each element (which is generally a slow thing to do in Python).
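If you'd like to see the speed difference yourself, the following is a rough sketch of a timing comparison. The array size and variable names are arbitrary choices for illustration, and the exact timings will depend on your machine:

```python
import time
import numpy as np

n = 200_000
values_list = list(range(n))
values_array = np.arange(n)

# Double every element by looping over the list
start = time.perf_counter()
looped = [v * 2 for v in values_list]
loop_seconds = time.perf_counter() - start

# Double every element with a single vectorized operation
start = time.perf_counter()
vectorized = values_array * 2
vector_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.5f} s, vectorized: {vector_seconds:.5f} s")
```

Both approaches produce the same numbers; only the time taken differs. For careful benchmarking, the timeit module in the standard library is a better tool than a single stopwatch measurement like this one.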

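While we're here, one disadvantage of numpy arrays is that unlike lists, all elements in a numpy array must be the same data type. Thus, you can't have an array that stores ["cat", 6] as a string and an integer, while you could for a list. If you try, numpy will convert the elements to a common type (here, the 6 becomes the string "6").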
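A quick sketch of what happens if you try mixing types anyway (the variable names are just for illustration): the list keeps each element's own type, while numpy picks a single type for the whole array.

```python
import numpy as np

# A list happily stores a string next to an integer
mixed_list = ["cat", 6]
print(type(mixed_list[0]), type(mixed_list[1]))

# numpy converts everything to one common type; here the 6 becomes a string
mixed_array = np.array(["cat", 6])
print(mixed_array)
print(mixed_array.dtype)

# Mixing integers and floats gives an array of floats
int_float = np.array([1, 2.5])
print(int_float.dtype)
```

Checking the dtype attribute is a handy habit whenever an array doesn't behave the way you expect.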

More reasons to use numpy: A myriad of useful functions

Even if you don’t care about arrays, numpy offers many useful convenience functions that are not offered in default Python.

For example, you might want to try to compute the cosine of an angle:

print(np.cos(10)) ## cosine function (the angle is interpreted in radians)

or generate N evenly-spaced numbers between two endpoints of an interval,

low = 0
high = 100
N = 101
arr = np.linspace(low,high,N) ## generate N = 101 evenly spaced numbers between 0 and 100 (both endpoints included)
print(arr)

or get the mean and median of a list/array quickly:

vals = [1,3,5,7,9]
print(np.mean(vals), np.median(vals))

Note that in the above example, vals is actually a list - not an array. That’s fine - numpy will do the conversion for you under the hood.
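A few details of these functions are easy to trip over, so here is a short sketch that checks them. The variable names are made up for this example: np.cos works in radians, np.linspace includes both endpoints, and np.asarray shows the kind of list-to-array conversion that numpy functions perform on their inputs.

```python
import numpy as np

# np.cos expects radians, so convert an angle given in degrees first
angle_deg = 60
cos_60 = np.cos(np.deg2rad(angle_deg))
print(cos_60)  # approximately 0.5

# linspace includes both endpoints; retstep=True also returns the spacing
grid, step = np.linspace(0, 100, 101, retstep=True)
print(len(grid), grid[0], grid[-1], step)

# A list can be converted to an array explicitly with np.asarray
vals = [1, 3, 5, 7, 9]
vals_array = np.asarray(vals)
print(type(vals_array), np.mean(vals), np.median(vals))
```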

This might not seem that impressive yet, but it becomes really useful when dealing with multi-dimensional data. If we wanted, for example, we could define a numpy array of zeros of shape 50x50 as follows:

blank_image = np.zeros((50,50))
print(blank_image)

We could then add or multiply by a scalar value easily, or take the mean, or do all of these at once:

print(np.mean(5*blank_image + 2))

and it just acts like a giant vector. You can see where this might be going when it comes to manipulating astronomical images!
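As a small sketch of where the two dimensions start to matter, the example below also takes the mean along one axis at a time. The names scaled, row_means, and column_means are only for illustration:

```python
import numpy as np

blank_image = np.zeros((50, 50))
print(blank_image.shape)  # (50, 50)

# Scalar arithmetic applies to every pixel
scaled = 5 * blank_image + 2
print(np.mean(scaled))  # 2.0

# axis=1 averages across each row; axis=0 averages down each column
row_means = np.mean(scaled, axis=1)
column_means = np.mean(scaled, axis=0)
print(row_means.shape, column_means.shape)  # (50,) (50,)
```

Without the axis argument, np.mean collapses the whole array to a single number; with it, you get one value per row or per column.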