Functions in programming languages work the say way as in mathematics. For example, you will be familiar with trigonometric functions:
In this case, we have some function
, , which has parenthesis defining its arguments (in this case, x
). In this case, we are returning , though we could have constructed an arbitrary function of other functions, e.g., .
Now let’s define a Python function. Let’s make a function that returns the combination of sin
and cos
squared above.
import numpy as np
def f(x):
answer = np.sin(x)*np.cos(x)**2
return answer
Here we are using the special Python reserved
word def
to define the name of the function (here f
), and in parenthesis, we add the arguments
to the function (here x
). We end the line in a colon (much like a loop).
“Inside” the function, we indent our code (four spaces or a tab) and then place any computations we want to carry out. Here, we compute answer
to be the multiplication of our two trig functions.
Finally, we use the reserved word return
to “return” this answer from the inside of the function back out into the rest of our code.
To use our function, we simply call it:
returned_answer = f(0.5)
print(returned_answer)
0.3692301313020644
As we can see, the answer
from the function gets stored in the variable returned_answer
as the output of the function. This should look familiar to how you call any Python function (including those imported from libraries, though they may be prepended with the library name, e.g., np.function()
).
Scope¶
Functions are useful for many reasons. To list some obvious ones, they allow a calculation to be encapsulated and the code reused. If we need the output of that multiplication many times in our code, we can call f(x)
instead of writing out np.sin(x)*np.cos(x)**2
. Now in this case, that’s not so bad --- but we can write functions with many more steps (lines of code).
Beyond that, functions are useful due to their local scope
. What does that mean?
So far, when we have written scripts in python, we define variables
directly on lines of the file (or in the interpreter). At any later time, we can access those variables via their names.
You should find that we can’t!
print(answer)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[2], line 1
----> 1 print(answer)
NameError: name 'answer' is not defined
The reason answer
is not defined is because any variable defined inside a function is local only to that function. You can think of it like an automotive assembly line at a factory. Raw materials (the arguments
) enter at one end, the factory uses a set of screws, bolts, etc., to build the car, and the final car is returned
. We are not concerned with the steps that happened inside the factory, and those variables (like tools) are reused every time the function is called.
More formally, Python defines several levels of scope
. Scope defines where a variable is allocated and accessible. Variables made in a Python script are in the scope of that module. When you run
that script, your interpreter can access any variable you made in the script.
a = 10
b = 20
print(a)
print(b)
10
20
Functions define their own local scope
, which is limited just to the function. Variables made there only “exist” inside the function. That said, a function can retrieve variables from a larger scope:
a = 10
def f(x):
return a + x
print(f(10))
20
Type Annotations and Docstrings¶
So far, we’ve defined our functions minimally. But there are two steps we can take to make our functions better documented. Documentation is extremely important in code; we’ve covered comments, and this is the next step for making clear the behavior of our code. Let’s start with a new, slightly more complex function:
def kepler_period(sma_au):
return np.sqrt(sma_au**3/7.496e-6)
kepler_period(1) #should give earth's year in days!
365.2457835357012
Technically, this function is already more documented than our previous functions, because its name and arguments are clearer in defining what they should be. But we can do more. First, we will annotate
the arguments:
def kepler_period(sma_au:float)->float:
return np.sqrt(sma_au**3/7.496e-6)
We add a colon to each argument and specify the type
(for example, float
, int
, str
, dict
), and at the end, we use an arrow (->) to indicate the return type of the function, before the final colon.
We’ll return to deeper questions about type annotations later (e.g., what if multiple types are acceptable?)
For now, let’s add a docstring
--- this is a specialized string within our function which can be used to verbose-ly define its arguments and returns.
def kepler_period(sma_au:float)->float:
"""
Returns the orbital period (in days) for solar system bodies given their semimajor axis length (in AU).
Parameters
----------
sma_au: float
body's semimajor axis (sma) in AU units
Returns
-------
period: float
the orbital period (in days).
"""
return np.sqrt(distance_au**3/7.496e-6)
The docstring, defined with triple quotes (which allows multiline strings) now gives a sentence-long description of the function, defines the arguments and returns, and their data types, as well as notating optional arguments and indicating their default values. For a one line function, this is a bit overkill, but for functions you write for your research, it can be very useful to have good documentation. We can see the documentation for any Python function via the help()
function:
help(kepler_period)
Help on function kepler_period in module __main__:
kepler_period(sma_au: float) -> float
Returns the orbital period (in days) for solar system bodies given their semimajor axis length (in AU).
Parameters
----------
sma_au: float
body's semimajor axis (sma) in AU units
Returns
-------
period: float
the orbital period (in days).
There’s even more we can add. For example, the docstring for any numpy
function will be very verbose:
import numpy as np
help(np.argmax)
Help on _ArrayFunctionDispatcher in module numpy:
argmax(a, axis=None, out=None, *, keepdims=<no value>)
Returns the indices of the maximum values along an axis.
Parameters
----------
a : array_like
Input array.
axis : int, optional
By default, the index is into the flattened array, otherwise
along the specified axis.
out : array, optional
If provided, the result will be inserted into this array. It should
be of the appropriate shape and dtype.
keepdims : bool, optional
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the array.
.. versionadded:: 1.22.0
Returns
-------
index_array : ndarray of ints
Array of indices into the array. It has the same shape as `a.shape`
with the dimension along `axis` removed. If `keepdims` is set to True,
then the size of `axis` will be 1 with the resulting array having same
shape as `a.shape`.
See Also
--------
ndarray.argmax, argmin
amax : The maximum value along a given axis.
unravel_index : Convert a flat index into an index tuple.
take_along_axis : Apply ``np.expand_dims(index_array, axis)``
from argmax to an array as if by calling max.
Notes
-----
In case of multiple occurrences of the maximum values, the indices
corresponding to the first occurrence are returned.
Examples
--------
>>> a = np.arange(6).reshape(2,3) + 10
>>> a
array([[10, 11, 12],
[13, 14, 15]])
>>> np.argmax(a)
5
>>> np.argmax(a, axis=0)
array([1, 1, 1])
>>> np.argmax(a, axis=1)
array([2, 2])
Indexes of the maximal elements of a N-dimensional array:
>>> ind = np.unravel_index(np.argmax(a, axis=None), a.shape)
>>> ind
(1, 2)
>>> a[ind]
15
>>> b = np.arange(6)
>>> b[1] = 5
>>> b
array([0, 5, 2, 3, 4, 5])
>>> np.argmax(b) # Only the first occurrence is returned.
1
>>> x = np.array([[4,2,3], [1,0,3]])
>>> index_array = np.argmax(x, axis=-1)
>>> # Same as np.amax(x, axis=-1, keepdims=True)
>>> np.take_along_axis(x, np.expand_dims(index_array, axis=-1), axis=-1)
array([[4],
[3]])
>>> # Same as np.amax(x, axis=-1)
>>> np.take_along_axis(x, np.expand_dims(index_array, axis=-1), axis=-1).squeeze(axis=-1)
array([4, 3])
Setting `keepdims` to `True`,
>>> x = np.arange(24).reshape((2, 3, 4))
>>> res = np.argmax(x, axis=1, keepdims=True)
>>> res.shape
(2, 1, 4)
You can see the formatting is the same, but there are extra sections. You are welcome to add these to your own functions, but that is likely not neccesary until you begin distributing your code to others to use.
Optional Arguments¶
Thus far, our kepler function has only one argument: the semimajor axis. This is possible because we use the constant slope derived for kepler’s laws that is specific to our solar system (depends on the Sun’s mass) and our universe (depends on the magnitude of the gravitational constant ).
The formula itself for Kepler’s third law is
Let’s implement this in our function, but not require the user to supply and by default:
def kepler_period(sma_au:float,
star_mass: float = 1.989e+30,
G: float = 6.674e-11)->float:
"""
Returns the orbital period (in days) for solar system bodies given their semimajor axis length (in AU).
Parameters
----------
sma_au: float
body's semimajor axis (sma) in AU units
star_mass: float, optional
mass of the stellar (dominating) body, in kg, default 1.989e30 (sun's mass)
G: float, optional
gravitational constant in SI units, default 6.674e-11
Returns
-------
period: float
the orbital period (in days).
"""
sma_meters = 1.496e+11 * sma_au
P_squared = (4*np.pi**2*sma_meters**3)/(G*star_mass)
return np.sqrt(P_squared)*1.15741e-5
Let’s test that our function can still work with just a sma_au
provided (and of course that it is still correct!)
kepler_period(1)
365.21953254318385
With a bit of extra unit massaging (putting everything in SI units, then converting the end result from seconds to days), we see we get earth’s period out accurately. However, if we wished, we could now start playing with solar system parameters:
kepler_period(1,star_mass=5e30)
230.34902887820553
So if our sun were rather than , the earth’s year would only be 230 days long.
Notice that because star_mass
is a non-positional argument, we had to give its name explicitly when adding it in. This is why optional arguments are also called keyword
arguments --- we can give some, none, or all, and in any order (after the positional arguments).
As a (quick) preview of what is to come, here’s that same function again, but I’ll allow the astropy
package to handle all the unit conversions for us:
import astropy.units as u
import astropy.constants as ac
def kepler_period(sma:u.Quantity,
star_mass: u.Quantity = 1*u.Msun,
G: u.Quantity = ac.G)->u.Quantity:
"""
Returns the orbital period (in days) for solar system bodies given their semimajor axis length in any length units.
Parameters
----------
sma: u.Quantity
body's semimajor axis (sma) in any length units
star_mass: u.Quantity, optional
mass of the stellar (dominating) body, in any mass units
G: u.Quantity, optional
gravitational constant in any appropriate units.
Returns
-------
period: u.Quantity
the orbital period (in days).
"""
P_squared = (4*np.pi**2*sma**3)/(G*star_mass)
return np.sqrt(P_squared).to(u.day)
print(kepler_period(sma=2.1*u.AU, star_mass=2*u.Msun))
print(kepler_period(sma=1*u.AU))
785.9815854803609 d
365.2568983840419 d
Extra arguments and keyword arguments, Packing, and Unpacking¶
In some cases, it is useful to define a function which takes in an arbitrary set of arguments. For example, if the output of a previous function produces an unknown number of items, we may wish to admit all of them to the next function we call.
We will handle this using a Python technique known as packing (and unpacking).
This is a special operation that allows iterables (e.g. lists
or dictionaries
) to be created from function argument lists, or to be turned into inputs to a function.
Let’s see an example:
args = [1,2,3]
magnitude(*args)
3.3010299956639813
What happened here? Our magnitude()
function, recall, takes 3 inputs. But args
is a single list.
Under the hood, the asterisk in the function call tells Python to unpack the list and feed each value as a separate argument to the function, in order.
We can play a similar game with dictionaries
--- here, we will need a double-asterisk:
kwargs = {'flux_1':1,'flux_2':2,'zeropoint':3}
magnitude(**kwargs)
3.3010299956639813
Essentially, **kwargs
in the function call gets translated to flux_1=1,flux_2=2,zeropoint=3
, as we would normally provide the function.
These are examples of unpacking a single object into parts to feed into a function. What about the reverse? Let’s start with an example:
def dist_from_parallax(*args):
"""Simple function to return distance (pc) from a parallax (in '').
Takes any number of input parallaxes and returns a list of distances.
"""
return_list = []
for parallax in args:
return_list.append(1/parallax)
return return_list
When we use the asterisk operator in the definition of a functions arguments, we do the reverse of the unpacking above.
Here, we take any inputs to the function (an arbitrary number of inputs), and pack them into a list we are calling args
.
That list args
is available inside the function, where we loop over it to calculate distances from the inputs (assuming they are parallaxes).
dist_from_parallax(1)
[1.0]
dist_from_parallax(1.6,4.5,3.2,7.6,10.1)
[0.625, 0.2222222222222222, 0.3125, 0.13157894736842105, 0.09900990099009901]
We can play the same game with keyword arguments (i.e., pack extra arguments given with names (like star_mass=10*u.Msun
) into a dictionary accessible in the function.
Here I’ll use an example more typical --- adding kwargs
to a function that already has some arguments.
def get_solarsystem_periods(semi_major_axes:list,**kwargs)->list:
periods = []
for i in semi_major_axes:
periods.append(kepler_period(i,**kwargs))
return periods
Here, I’ve made what a coder would call a wrapper
. This function’s goal is to get the periods for a list of input semimajor axes (i.e., feed in the sma for Mercury, Venus, Earth,...), then spit out a list with all the periods.
We know that our kepler_period()
function needs that one input (here, fed in from the loop as i
), but that it can take extra parameters for the star mass and gravitational constant. Instead of making those explicit arguments of our wrapper, we can use **kwargs
to tell it to take any extra arguments and make a dictionary out of them.
Inside our function, there is now a variable called kwargs
which is a dictionary. We could do stuff with it here, if we desired. But I simply pass it into our internal call to kepler_period()
. Note that I use the **
operator again, now to unpack that dictionary back into functional arguments.
See Answer
Because kepler_period()
does not accept arbitrary kwargs, if we ran get_solar_system_periods([1*u.AU,2*u.AU],star_mass=10*u.Msun,bad_arg=10)
, we would get an error because bad_arg
would be passed to kepler_period()
which does not have that argument defined.
We could stay safe from this by adding **kwargs
as an argument to kepler_period()
, then any proper (defined) kwargs would be caught, and extra ones would get shoved into a dictionary that could be safely ignored.
Summary¶
Summary¶
Functions
allow us to encapsulate short (ish), general tasks into a clean, easy to debug framework- Because of their local scope, we can be sure all variables passed through are treated the same way
- Functions take arguments, including required (positional) and optional (keyword) ones.
- One can use
*args
and**kwargs
to allow arbitrary inputs, or to unpacklists
anddicts
into function calls