Last updated November 15, 2022

NumPy is a Python library for applying functions to multi-dimensional arrays.

The package includes operations for mathematics, statistics, logic, and shape manipulation, as well as for sorting, selecting, and random simulation.

NumPy processes arrays quickly thanks to vectorization, which means applying a single operation across an entire array of data at once. In ‘single instruction, multiple data’ (SIMD) processing, a single command is executed on many data elements simultaneously, in parallel. This is much faster than writing a Python loop, which handles only one element at a time.
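As a rough illustration of the difference, the sketch below doubles every value in an array twice: once with an explicit Python loop and once with a single vectorized expression. The array contents and variable names are made up for this example.

```python
import numpy as np

# Hypothetical sample data: the integers 0 through 9
values = np.arange(10)

# Loop version: Python handles one element per iteration
doubled_loop = np.empty_like(values)
for i in range(len(values)):
    doubled_loop[i] = values[i] * 2

# Vectorized version: one expression applied to the whole array
doubled_vectorized = values * 2

# Both approaches produce the same result
print(np.array_equal(doubled_loop, doubled_vectorized))  # True
```

On an array this small the speed difference is negligible; it tends to show up as arrays grow to thousands or millions of elements.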

Assuming you already have Python installed on your computer, install the NumPy package from the command line if you haven’t already (the example is from a Windows machine):

C:\> python -m pip install numpy

To use NumPy, first import the library into your program:

import numpy as np

**LOADING DATA**

A standard Python list looks like this:

a_list = ["a", "b", "c", "d"]

To use a list with NumPy, convert it into a NumPy array with np.array():

an_array = np.array(a_list)

In the line above, ‘an_array’ is the variable that holds the new array.

You can also create the list inline:

an_array = np.array(["a", "b", "c", "d"])

Multi-dimensional arrays, or nested arrays, can be captured as well, with each array separated by a comma, and a set of outer brackets to capture the array as a whole.

ratings = np.array([[7, 10, 8], [6, 6, 7], [5, 3, 1]])

You can also import data from a comma-separated values (.csv) file, by way of np.genfromtxt:

vehicles = np.genfromtxt('vehicles.csv', delimiter=',')
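Here is a minimal sketch of np.genfromtxt at work. To keep it self-contained, an in-memory text buffer stands in for a file on disk. The column names and values are invented for illustration, and skip_header=1 is used on the assumption that the first row holds column titles.

```python
import io
import numpy as np

# Hypothetical CSV contents; io.StringIO stands in for a real file on disk
csv_text = "doors,wheels,seats\n4,4,5\n2,4,2\n0,2,1\n"

# skip_header=1 ignores the row of column names so only the numbers are parsed
vehicles = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

print(vehicles.shape)  # (3, 3)
print(vehicles.dtype)  # float64
```

By default np.genfromtxt reads values as floating-point numbers, which is why the whole numbers in the sample come back as floats.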

**OPERATIONS**

Once your data is loaded into NumPy, a wide variety of operations is available. The examples below use IDLE, the Python IDE from the Python Software Foundation. The basic information here is drawn from Codecademy's NumPy series of lessons, as well as from NumPy's own introduction.

Individual elements of an array can be queried and operated on.

The first element of an_array (above) sits at index 0 and is written an_array[0].
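A short sketch of element access, reusing the an_array example from earlier. The names first and last are chosen only for this illustration.

```python
import numpy as np

an_array = np.array(["a", "b", "c", "d"])

first = an_array[0]   # indexing starts at 0
last = an_array[-1]   # negative indices count back from the end

print(first)  # a
print(last)   # d
```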

To select a range of elements in an array, define the start and end points with Python's slice syntax, *my_array[start:end]*. The element at *start* is included and the element at *end* is excluded.
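For instance, slicing the an_array example might look like the sketch below (the variable names middle and first_two are illustrative only):

```python
import numpy as np

an_array = np.array(["a", "b", "c", "d"])

middle = an_array[1:3]     # elements at index 1 and 2; index 3 is excluded
first_two = an_array[:2]   # omitting the start means "from the beginning"

print(middle)     # ['b' 'c']
print(first_two)  # ['a' 'b']
```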

To select elements in a two-dimensional array, use the syntax *my_array[row, column]*, where the first value specifies the row and the second specifies the column. A bare *:* in either position means "select all" along that dimension, so *my_array[:, 0]* returns every row of the first column.
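The sketch below applies two-dimensional indexing to the ratings array defined earlier. The variable names on the left are made up for this example.

```python
import numpy as np

ratings = np.array([[7, 10, 8], [6, 6, 7], [5, 3, 1]])

single = ratings[1, 2]         # row 1, column 2: the value 7
first_row = ratings[0, :]      # every column of row 0: 7, 10, 8
second_col = ratings[:, 1]     # every row of column 1: 10, 6, 3
top_left = ratings[0:2, 0:2]   # rows 0-1 and columns 0-1: a 2x2 block

print(single)
print(first_row)
print(second_col)
print(top_left)
```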

A subset of an array can be defined as a variable, which can be operated on mathematically.
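As a sketch of that idea, the example below stores the first row of ratings in its own variable and then does arithmetic on it. The call to .copy() is a choice made for this illustration: a plain slice is a view of the original array, so copying keeps the subset independent.

```python
import numpy as np

ratings = np.array([[7, 10, 8], [6, 6, 7], [5, 3, 1]])

# Copy the first row so later changes cannot affect ratings
first_row = ratings[0, :].copy()

boosted = first_row + 1   # adds 1 to each element: 8, 11, 9
halved = first_row / 2    # divides each element: 3.5, 5.0, 4.0
total = first_row.sum()   # 7 + 10 + 8 = 25

print(boosted)
print(halved)
print(total)
```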

In NumPy, each dimension of an array is called an axis. On a two-dimensional array, axis 0 runs down the rows and axis 1 runs across the columns. Passing axis=0 to a function therefore produces one result per column, while axis=1 produces one result per row.
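A quick way to see the axis convention is to sum the ratings array along each axis, as in this sketch (np.sum is used here only as a convenient example function):

```python
import numpy as np

ratings = np.array([[7, 10, 8], [6, 6, 7], [5, 3, 1]])

column_sums = np.sum(ratings, axis=0)  # one total per column: 18, 19, 16
row_sums = np.sum(ratings, axis=1)     # one total per row: 25, 19, 9

print(column_sums)
print(row_sums)
```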

**STATISTICAL REASONING**

NumPy provides many built-in functions to operate on arrays directly.

With this one-dimensional array...

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

...we can determine the mean, the median, and the standard deviation with np.mean(), np.median(), and np.std(). np.sort() returns a sorted copy of the array. Given a percentile, np.percentile() returns the corresponding value.
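These functions can be tried on the collection array as sketched below. The variable names are illustrative, and the 50th percentile is picked arbitrarily as the example percentile.

```python
import numpy as np

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

mean_value = np.mean(collection)      # 53 / 9, roughly 5.89
median_value = np.median(collection)  # 6.0
std_value = np.std(collection)        # roughly 2.51 (population standard deviation)
sorted_values = np.sort(collection)   # sorted copy; collection itself is unchanged
middle_value = np.percentile(collection, 50)  # 6.0, the same as the median

print(mean_value, median_value, std_value)
print(sorted_values)
print(middle_value)
```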

**A bit more on percentiles**: A percentile is a "comparison score between a particular score and the scores of the rest of a group. It shows the percentage of scores that a particular score surpassed." A score in the "90th percentile" means that score did better than 90% of all the other scores in that group.

An "interquartile range" (IQR) measures the spread of the middle 50% of the values in an array. It is found by calculating the 25th and the 75th percentiles (the "first" and "third" quartiles, respectively), then subtracting the first from the third.
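Using the collection array from above, the calculation might be sketched like this (the variable names are chosen for illustration, and np.percentile is left on its default interpolation method):

```python
import numpy as np

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

first_quartile = np.percentile(collection, 25)  # 5.0
third_quartile = np.percentile(collection, 75)  # 7.0
iqr = third_quartile - first_quartile           # 7.0 - 5.0 = 2.0

print(iqr)  # 2.0
```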