Functions in Python

Functions in programming languages work the say way as in mathematics. For example, you will be familiar with trigonometric functions:

f(x) = sin(x)

(1)

In this case, we have some function, $f$ , which has parenthesis defining its arguments (in this case, x). In this case, we are returning $sin(x)$ , though we could have constructed an arbitrary function of other functions, e.g., $f(x) = sin(x)cos^2(x)$ .

Now let’s define a Python function. Let’s make a function that returns the combination of sin and cos squared above.

import numpy as np
def f(x):
    answer = np.sin(x)*np.cos(x)**2 
    return answer

Here we are using the special Python reserved word def to define the name of the function (here f), and in parenthesis, we add the arguments to the function (here x). We end the line in a colon (much like a loop).

“Inside” the function, we indent our code (four spaces or a tab) and then place any computations we want to carry out. Here, we compute answer to be the multiplication of our two trig functions.

Finally, we use the reserved word return to “return” this answer from the inside of the function back out into the rest of our code.

To use our function, we simply call it:

returned_answer = f(0.5)
print(returned_answer)

0.3692301313020644

As we can see, the answer from the function gets stored in the variable returned_answer as the output of the function. This should look familiar to how you call any Python function (including those imported from libraries, though they may be prepended with the library name, e.g., np.function()).

Scope¶

Functions are useful for many reasons. To list some obvious ones, they allow a calculation to be encapsulated and the code reused. If we need the output of that multiplication many times in our code, we can call f(x) instead of writing out np.sin(x)*np.cos(x)**2. Now in this case, that’s not so bad --- but we can write functions with many more steps (lines of code).

Beyond that, functions are useful due to their local scope. What does that mean?

So far, when we have written scripts in python, we define variables directly on lines of the file (or in the interpreter). At any later time, we can access those variables via their names.

You should find that we can’t!

print(answer)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 1
----> 1 print(answer)

NameError: name 'answer' is not defined

The reason answer is not defined is because any variable defined inside a function is local only to that function. You can think of it like an automotive assembly line at a factory. Raw materials (the arguments) enter at one end, the factory uses a set of screws, bolts, etc., to build the car, and the final car is returned. We are not concerned with the steps that happened inside the factory, and those variables (like tools) are reused every time the function is called.

More formally, Python defines several levels of scope. Scope defines where a variable is allocated and accessible. Variables made in a Python script are in the scope of that module. When you run that script, your interpreter can access any variable you made in the script.

a = 10 
b = 20

print(a)
print(b)

10
20

Functions define their own local scope, which is limited just to the function. Variables made there only “exist” inside the function. That said, a function can retrieve variables from a larger scope:

a = 10 
def f(x):
    return a + x

print(f(10))

Type Annotations and Docstrings¶

So far, we’ve defined our functions minimally. But there are two steps we can take to make our functions better documented. Documentation is extremely important in code; we’ve covered comments, and this is the next step for making clear the behavior of our code. Let’s start with a new, slightly more complex function:

def kepler_period(sma_au):
    return np.sqrt(sma_au**3/7.496e-6)

kepler_period(1) #should give earth's year in days!

365.2457835357012

Technically, this function is already more documented than our previous functions, because its name and arguments are clearer in defining what they should be. But we can do more. First, we will annotate the arguments:

def kepler_period(sma_au:float)->float: 
    return np.sqrt(sma_au**3/7.496e-6)

We add a colon to each argument and specify the type (for example, float, int, str, dict), and at the end, we use an arrow (->) to indicate the return type of the function, before the final colon.

We’ll return to deeper questions about type annotations later (e.g., what if multiple types are acceptable?)

For now, let’s add a docstring --- this is a specialized string within our function which can be used to verbose-ly define its arguments and returns.

def kepler_period(sma_au:float)->float: 
    """
    Returns the orbital period (in days) for solar system bodies given their semimajor axis length (in AU).

    Parameters
    ----------
    sma_au: float
        body's semimajor axis (sma) in AU units 
    
    Returns
    -------
    period: float
        the orbital period (in days). 
    """
    return np.sqrt(distance_au**3/7.496e-6)

The docstring, defined with triple quotes (which allows multiline strings) now gives a sentence-long description of the function, defines the arguments and returns, and their data types, as well as notating optional arguments and indicating their default values. For a one line function, this is a bit overkill, but for functions you write for your research, it can be very useful to have good documentation. We can see the documentation for any Python function via the help() function:

help(kepler_period)

Help on function kepler_period in module __main__:

kepler_period(sma_au: float) -> float
    Returns the orbital period (in days) for solar system bodies given their semimajor axis length (in AU).
    
    Parameters
    ----------
    sma_au: float
        body's semimajor axis (sma) in AU units 
    
    Returns
    -------
    period: float
        the orbital period (in days).

There’s even more we can add. For example, the docstring for any numpy function will be very verbose:

import numpy as np
help(np.argmax)

Help on _ArrayFunctionDispatcher in module numpy:

argmax(a, axis=None, out=None, *, keepdims=<no value>)
    Returns the indices of the maximum values along an axis.
    
    Parameters
    ----------
    a : array_like
        Input array.
    axis : int, optional
        By default, the index is into the flattened array, otherwise
        along the specified axis.
    out : array, optional
        If provided, the result will be inserted into this array. It should
        be of the appropriate shape and dtype.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the array.
    
        .. versionadded:: 1.22.0
    
    Returns
    -------
    index_array : ndarray of ints
        Array of indices into the array. It has the same shape as `a.shape`
        with the dimension along `axis` removed. If `keepdims` is set to True,
        then the size of `axis` will be 1 with the resulting array having same
        shape as `a.shape`.
    
    See Also
    --------
    ndarray.argmax, argmin
    amax : The maximum value along a given axis.
    unravel_index : Convert a flat index into an index tuple.
    take_along_axis : Apply ``np.expand_dims(index_array, axis)``
                      from argmax to an array as if by calling max.
    
    Notes
    -----
    In case of multiple occurrences of the maximum values, the indices
    corresponding to the first occurrence are returned.
    
    Examples
    --------
    >>> a = np.arange(6).reshape(2,3) + 10
    >>> a
    array([[10, 11, 12],
           [13, 14, 15]])
    >>> np.argmax(a)
    5
    >>> np.argmax(a, axis=0)
    array([1, 1, 1])
    >>> np.argmax(a, axis=1)
    array([2, 2])
    
    Indexes of the maximal elements of a N-dimensional array:
    
    >>> ind = np.unravel_index(np.argmax(a, axis=None), a.shape)
    >>> ind
    (1, 2)
    >>> a[ind]
    15
    
    >>> b = np.arange(6)
    >>> b[1] = 5
    >>> b
    array([0, 5, 2, 3, 4, 5])
    >>> np.argmax(b)  # Only the first occurrence is returned.
    1
    
    >>> x = np.array([[4,2,3], [1,0,3]])
    >>> index_array = np.argmax(x, axis=-1)
    >>> # Same as np.amax(x, axis=-1, keepdims=True)
    >>> np.take_along_axis(x, np.expand_dims(index_array, axis=-1), axis=-1)
    array([[4],
           [3]])
    >>> # Same as np.amax(x, axis=-1)
    >>> np.take_along_axis(x, np.expand_dims(index_array, axis=-1), axis=-1).squeeze(axis=-1)
    array([4, 3])
    
    Setting `keepdims` to `True`,
    
    >>> x = np.arange(24).reshape((2, 3, 4))
    >>> res = np.argmax(x, axis=1, keepdims=True)
    >>> res.shape
    (2, 1, 4)

You can see the formatting is the same, but there are extra sections. You are welcome to add these to your own functions, but that is likely not neccesary until you begin distributing your code to others to use.

Optional Arguments¶

Thus far, our kepler function has only one argument: the semimajor axis. This is possible because we use the constant slope derived for kepler’s laws that is specific to our solar system (depends on the Sun’s mass) and our universe (depends on the magnitude of the gravitational constant $G$ ).

The formula itself for Kepler’s third law is

\frac{a^3}{P^2} = \frac{G(M+m)}{4\pi^2} \approx \frac{GM}{4\pi^2}

(2)

Let’s implement this in our function, but not require the user to supply $M$ and $G$ by default:

def kepler_period(sma_au:float,
                  star_mass: float = 1.989e+30,
                  G: float = 6.674e-11)->float: 
    """
    Returns the orbital period (in days) for solar system bodies given their semimajor axis length (in AU).

    Parameters
    ----------
    sma_au: float
        body's semimajor axis (sma) in AU units 
    star_mass: float, optional
        mass of the stellar (dominating) body, in kg, default 1.989e30 (sun's mass)
    G: float, optional
        gravitational constant in SI units, default 6.674e-11
    Returns
    -------
    period: float
        the orbital period (in days). 
    """
    sma_meters = 1.496e+11 * sma_au 
    P_squared = (4*np.pi**2*sma_meters**3)/(G*star_mass)
    return np.sqrt(P_squared)*1.15741e-5

Let’s test that our function can still work with just a sma_au provided (and of course that it is still correct!)

kepler_period(1)

365.21953254318385

With a bit of extra unit massaging (putting everything in SI units, then converting the end result from seconds to days), we see we get earth’s period out accurately. However, if we wished, we could now start playing with solar system parameters:

kepler_period(1,star_mass=5e30)

230.34902887820553

So if our sun were $5\times10^{30}\; \rm kg$ rather than $2\times10^{30}\; \rm kg$ , the earth’s year would only be 230 days long. Notice that because star_mass is a non-positional argument, we had to give its name explicitly when adding it in. This is why optional arguments are also called keyword arguments --- we can give some, none, or all, and in any order (after the positional arguments).

Warning

When defining a function that has both positional (required) and optional arguments, the positional arguments must come first, Thus,

def func(a,b,c=1,d=3):
    ...

is legal, but

def func(a=2,b,c,d):
    ...

is not.

As a (quick) preview of what is to come, here’s that same function again, but I’ll allow the astropy package to handle all the unit conversions for us:

import astropy.units as u 
import astropy.constants as ac
def kepler_period(sma:u.Quantity,
                  star_mass: u.Quantity = 1*u.Msun,
                  G: u.Quantity = ac.G)->u.Quantity: 
    """
    Returns the orbital period (in days) for solar system bodies given their semimajor axis length in any length units.

    Parameters
    ----------
    sma: u.Quantity
        body's semimajor axis (sma) in any length units
    star_mass: u.Quantity, optional
        mass of the stellar (dominating) body, in any mass units
    G: u.Quantity, optional
        gravitational constant in any appropriate units.
    Returns
    -------
    period: u.Quantity
        the orbital period (in days). 
    """
    P_squared = (4*np.pi**2*sma**3)/(G*star_mass)
    return np.sqrt(P_squared).to(u.day)

print(kepler_period(sma=2.1*u.AU, star_mass=2*u.Msun))
print(kepler_period(sma=1*u.AU))

785.9815854803609 d
365.2568983840419 d

Extra arguments and keyword arguments, Packing, and Unpacking¶

In some cases, it is useful to define a function which takes in an arbitrary set of arguments. For example, if the output of a previous function produces an unknown number of items, we may wish to admit all of them to the next function we call.

We will handle this using a Python technique known as packing (and unpacking). This is a special operation that allows iterables (e.g. lists or dictionaries) to be created from function argument lists, or to be turned into inputs to a function.

Let’s see an example:

args = [1,2,3]
magnitude(*args)

3.3010299956639813

What happened here? Our magnitude() function, recall, takes 3 inputs. But args is a single list. Under the hood, the asterisk in the function call tells Python to unpack the list and feed each value as a separate argument to the function, in order.

We can play a similar game with dictionaries --- here, we will need a double-asterisk:

kwargs = {'flux_1':1,'flux_2':2,'zeropoint':3}
magnitude(**kwargs)

3.3010299956639813

Essentially, **kwargs in the function call gets translated to flux_1=1,flux_2=2,zeropoint=3, as we would normally provide the function.

These are examples of unpacking a single object into parts to feed into a function. What about the reverse? Let’s start with an example:

def dist_from_parallax(*args):
    """Simple function to return distance (pc) from a parallax (in ''). 
    Takes any number of input parallaxes and returns a list of distances.
    """
    return_list = []
    for parallax in args:
        return_list.append(1/parallax)
    return return_list

When we use the asterisk operator in the definition of a functions arguments, we do the reverse of the unpacking above. Here, we take any inputs to the function (an arbitrary number of inputs), and pack them into a list we are calling args. That list args is available inside the function, where we loop over it to calculate distances from the inputs (assuming they are parallaxes).

dist_from_parallax(1)

[1.0]

dist_from_parallax(1.6,4.5,3.2,7.6,10.1)

[0.625, 0.2222222222222222, 0.3125, 0.13157894736842105, 0.09900990099009901]

We can play the same game with keyword arguments (i.e., pack extra arguments given with names (like star_mass=10*u.Msun) into a dictionary accessible in the function. Here I’ll use an example more typical --- adding kwargs to a function that already has some arguments.

def get_solarsystem_periods(semi_major_axes:list,**kwargs)->list:
    periods = []
    for i in semi_major_axes:
        periods.append(kepler_period(i,**kwargs))
    return periods

Here, I’ve made what a coder would call a wrapper. This function’s goal is to get the periods for a list of input semimajor axes (i.e., feed in the sma for Mercury, Venus, Earth,...), then spit out a list with all the periods.

We know that our kepler_period() function needs that one input (here, fed in from the loop as i), but that it can take extra parameters for the star mass and gravitational constant. Instead of making those explicit arguments of our wrapper, we can use **kwargs to tell it to take any extra arguments and make a dictionary out of them.

Inside our function, there is now a variable called kwargs which is a dictionary. We could do stuff with it here, if we desired. But I simply pass it into our internal call to kepler_period(). Note that I use the ** operator again, now to unpack that dictionary back into functional arguments.

Summary¶

Functions allow us to encapsulate short (ish), general tasks into a clean, easy to debug framework
Because of their local scope, we can be sure all variables passed through are treated the same way
Functions take arguments, including required (positional) and optional (keyword) ones.
One can use *args and **kwargs to allow arbitrary inputs, or to unpack lists and dicts into function calls