Python

Python NumPy for Windows

Last updated November 15, 2022

Donald Judd, Marfa

NumPy is a Python library for applying functions to multi-dimensional arrays.

The package includes routines for mathematical, statistical, and logical operations, shape manipulation, sorting, selecting, and random simulation.

NumPy processes arrays quickly thanks to vectorization: a single operation is applied across a whole array in optimized, compiled code instead of one element at a time in a Python loop. On processors that support ‘single instruction, multiple data’ (SIMD) processing, a single instruction can act on several values at once. This is usually much faster than writing a Python loop, which handles one element per iteration.
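As a rough illustration of the difference, the sketch below doubles every value in an array twice: once with an explicit Python loop and once with a single vectorized expression. The array size and the variable names are arbitrary choices made for this example.

```python
import numpy as np

# An arbitrary array of 100,000 integers (size chosen only for illustration).
values = np.arange(100_000)

# Loop version: Python visits one element per iteration.
doubled_loop = [v * 2 for v in values]

# Vectorized version: one expression is applied to the whole array.
doubled_vec = values * 2

# Both approaches produce the same numbers.
print(np.array_equal(doubled_loop, doubled_vec))
```

Timing the two versions (for example with the standard timeit module) should show the vectorized form running considerably faster, though the exact ratio depends on the machine.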

Assuming you already have Python installed on your computer, install the NumPy package from the command line if you haven’t already (the example is on a Windows machine):

C:\> python -m pip install numpy

To use NumPy, first import the library into your program:

import numpy as np

LOADING DATA

A standard Python list looks like this:

a_list = ["a", "b", "c", "d"]

To use a list with NumPy, convert it into a NumPy array with np.array():

an_array = np.array(a_list)

In the line above, ‘an_array’ is a variable that now holds the NumPy array.

You can also create the list inline:

an_array = np.array(["a", "b", "c", "d"])
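To confirm what np.array() hands back, a short sketch like the following can be run in any Python session. It only inspects the array created above; the attributes used (shape and dtype) are standard properties of a NumPy array.

```python
import numpy as np

an_array = np.array(["a", "b", "c", "d"])

print(type(an_array))   # the NumPy array type, numpy.ndarray
print(an_array.shape)   # (4,) -> one dimension holding four elements
print(an_array.dtype)   # a Unicode string type, since the elements are text
```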

Multi-dimensional (nested) arrays can be created as well, with each inner array separated by a comma and a set of outer brackets enclosing the array as a whole.

ratings = np.array([[7, 10, 8],
                    [6, 6, 7],
                    [5, 3, 1]])

…or import data from a comma-separated values (.csv) file, by way of np.genfromtxt:

vehicles = np.genfromtxt('vehicles.csv', delimiter=',')
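The sketch below shows one way np.genfromtxt might be used when the file has a header row. The column names and values are invented for illustration, and an in-memory io.StringIO object stands in for a real vehicles.csv file so the example runs on its own.

```python
import io
import numpy as np

# Hypothetical file contents; a real program would pass a path
# such as 'vehicles.csv' instead of this in-memory stand-in.
csv_text = "year,mpg,cylinders\n2019,31.5,4\n2020,24.0,6\n"

# skip_header=1 ignores the row of column names.
vehicles = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(vehicles.shape)  # (2, 3) -> two rows, three columns
print(vehicles[0])     # the first data row
```

By default np.genfromtxt reads every cell as a floating-point number, so cells it cannot parse as numbers come back as nan.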

OPERATIONS

Once your data is loaded into NumPy, you can apply a wide variety of operations to it. The examples below use IDLE, the Python IDE from the Python Software Foundation. The basic information here has been adapted from Codecademy's NumPy series of lessons, as well as from NumPy's own introduction.

Individual elements from an array can be queried, and operated on.

The first element of an_array (above) can be accessed by its index, which starts at 0:

[Screenshot: printing a single element of a NumPy array]
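A minimal sketch of this lookup, reusing the an_array values from earlier (the variable names first and last are just for this example):

```python
import numpy as np

an_array = np.array(["a", "b", "c", "d"])

first = an_array[0]    # indexing starts at 0
last = an_array[-1]    # negative indexes count back from the end

print(first)  # a
print(last)   # d
```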

To select a range of elements in an array, define the start and end points using the Python slice syntax my_array[start:end]; the element at the end index is not included:

[Screenshot: printing a slice of a NumPy array]
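As a small sketch of slicing, again assuming the an_array values used earlier:

```python
import numpy as np

an_array = np.array(["a", "b", "c", "d"])

middle = an_array[1:3]     # indexes 1 and 2; index 3 is excluded
first_two = an_array[:2]   # omitting the start means "from the beginning"

print(middle)     # ['b' 'c']
print(first_two)  # ['a' 'b']
```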

To select elements in a two-dimensional array, use the syntax my_array[row, column], where the first index specifies the row of the array and the second specifies the column (a bare : means "select everything along that dimension").

[Screenshot: nested array selection]
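The following sketch applies this syntax to the ratings array defined earlier. The variable names on the left are illustrative only.

```python
import numpy as np

ratings = np.array([[7, 10, 8],
                    [6, 6, 7],
                    [5, 3, 1]])

single_value = ratings[1, 2]    # row 1, column 2 -> 7
first_row = ratings[0, :]       # every column of row 0 -> 7, 10, 8
second_column = ratings[:, 1]   # every row of column 1 -> 10, 6, 3
top_left = ratings[0:2, 0:2]    # rows 0-1 and columns 0-1

print(single_value)
print(first_row)
print(second_column)
print(top_left)
```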

A subset of an array can be defined as a variable, which can be operated on mathematically.

[Screenshot: arrays can be operated on as variables]
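For instance, a row of the ratings array from earlier can be stored in a variable and then used in arithmetic. This is only a sketch; the names scaled and combined are made up for the example.

```python
import numpy as np

ratings = np.array([[7, 10, 8],
                    [6, 6, 7],
                    [5, 3, 1]])

first_row = ratings[0, :]             # 7, 10, 8

scaled = first_row * 10               # multiply every element: 70, 100, 80
combined = first_row + ratings[1, :]  # element-wise sum of two rows: 13, 16, 15

print(scaled)
print(combined)
```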

In NumPy, each dimension of an array is called an axis. In a two-dimensional array, axis 0 runs down the rows and axis 1 runs across the columns. To get one result per column, pass axis=0 (the operation collapses the rows); to get one result per row, pass axis=1 (the operation collapses the columns).

[Screenshot: choosing rows or columns with the axis argument]
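A short sketch of the axis argument, using sums on the ratings array from earlier so the results are easy to check by hand:

```python
import numpy as np

ratings = np.array([[7, 10, 8],
                    [6, 6, 7],
                    [5, 3, 1]])

column_totals = ratings.sum(axis=0)  # one total per column: 18, 19, 16
row_totals = ratings.sum(axis=1)     # one total per row: 25, 19, 9

print(column_totals)
print(row_totals)
```

The same axis argument is accepted by many other NumPy functions, such as mean, min, and max.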

STATISTICAL REASONING

NumPy provides many built-in functions to operate on arrays directly.

With this one-dimensional array....

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

...we can determine the mean, the median, and the standard deviation. NumPy can also sort the array. If you provide a percentile, it returns the corresponding value:

[Screenshot: array mean, median, standard deviation, percentile, and sorting]
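These calculations might look like the following sketch, which uses the collection array from above. The result variable names are chosen for this example.

```python
import numpy as np

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

mean_value = np.mean(collection)                  # arithmetic average, 53 / 9
median_value = np.median(collection)              # middle value once sorted
std_value = np.std(collection)                    # standard deviation
fiftieth = np.percentile(collection, 50)          # value at the 50th percentile
sorted_values = np.sort(collection)               # ascending copy of the array

print(mean_value)
print(median_value)
print(std_value)
print(fiftieth)
print(sorted_values)
```

By default np.std computes the population standard deviation; passing ddof=1 gives the sample version instead. np.percentile interpolates linearly between neighboring values when the requested percentile falls between two elements.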

A bit more on percentiles: A percentile is a "comparison score between a particular score and the scores of the rest of a group. It shows the percentage of scores that a particular score surpassed." A score in the "90th percentile" means that score did better than 90% of all the other scores in that group.

The "interquartile range" (IQR) measures the spread of the middle 50% of the values in an array. It is calculated by finding the 25th and the 75th percentiles (the "first" and "third" quartiles respectively), then subtracting the first from the third:

[Screenshot: IQR]
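A minimal sketch of the IQR calculation, assuming the same collection array used in the statistics examples:

```python
import numpy as np

collection = np.array([7, 10, 8, 6, 6, 7, 5, 3, 1])

first_quartile = np.percentile(collection, 25)  # 25th percentile -> 5.0
third_quartile = np.percentile(collection, 75)  # 75th percentile -> 7.0

iqr = third_quartile - first_quartile
print(iqr)  # 2.0
```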
