A Quick Introduction to the NumPy Library
The NumPy library is a great resource for working with multidimensional arrays. Let us take a closer look at NumPy and the operations that can be performed with it.
Understanding the NumPy library is important when it comes to handling data and working with multidimensional arrays. NumPy grew out of an earlier library called Numeric, which was typically used for performing various mathematical functions. Today NumPy is a great tool for performing high-level mathematical and statistical operations.
In this article we're going to take a look at how to use ndarrays with NumPy.
What are ndarrays in NumPy?
NumPy's signature ndarray allows quick and efficient handling of large and high-dimensional matrices. ndarrays are the essence of NumPy. Unlike Python's built-in lists, which hold references to separate objects, an ndarray stores its items in a single block of memory and provides a direct view of that memory (if you're familiar with Java, the int[] array type offers a similar model).
An ndarray is a collection of items arranged in one or more dimensions. The items are all of the same type and size, and the array is usually of fixed size. The number of dimensions and items in an array is defined by its shape, a tuple of N non-negative integers that specify the size of each dimension. The data-type object (dtype) of the array indicates the type of its items.
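As a minimal sketch of these attributes, the snippet below builds a small array and inspects it. The array name and its values are made up for illustration.

```python
import numpy as np

# Hypothetical 2 x 3 array of package weights (illustrative values)
weights = np.array([[1.5, 2.0, 3.25],
                    [4.0, 0.5, 2.75]])

print(weights.ndim)   # number of dimensions
print(weights.shape)  # size of each dimension, as a tuple
print(weights.size)   # total number of items
print(weights.dtype)  # data type shared by every item
```

Here the shape is (2, 3), so the array has two dimensions and six items, all stored as float64.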
Because the arrays are homogeneously typed, all the elements are of the same type and are laid out with a consistent stride, which results in less wasted memory and better access times.
The stride is the distance in memory between the beginnings of two adjacent elements along a dimension of an array. It can be measured in bytes or in units of the element size; NumPy reports strides in bytes. In an ordinary contiguous array, a stride is larger than or equal to the size of the element but not smaller, as a smaller stride would overlap the memory location of the next element. It is important to remember that NumPy arrays have a defined data type, which means you cannot store arbitrary strings in an integer-type array.
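The following sketch shows how strides and the fixed data type look in practice. The array `grid` and the string value are hypothetical, and the stride values shown assume NumPy's default row-major layout.

```python
import numpy as np

# Hypothetical 3 x 4 integer array in the default row-major layout
grid = np.zeros((3, 4), dtype=np.int64)

print(grid.itemsize)  # 8 bytes per int64 item
print(grid.strides)   # (32, 8): bytes to the next row, bytes to the next column

# Assigning a non-numeric string to an integer array raises an error
try:
    grid[0, 0] = "heavy"
    assignment_failed = False
except ValueError:
    assignment_failed = True

print(assignment_failed)  # True
```

Moving one column to the right skips one 8-byte item, while moving one row down skips four of them, which is where the 32 comes from.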
NumPy uses double-precision (float64) values by default for floating-point data, and it ships with built-in functions for the most common mathematical operations. Here are some of them that we can use for the various exercises and activities.
Mean:
NumPy provides implementations of the common mathematical operations, and the mean is a good place to start. Finding the mean is a useful way of working out an average weight. For example, imagine you work for an air freight company and need to work out the average package weight across thousands of packages; NumPy and an ndarray can help you solve this quickly.
Here is how you can find the Mean with the help of NumPy:
import numpy as np
# dataset is assumed to be a 2-D ndarray with at least 2 rows
# mean value for the whole dataset
np.mean(dataset)
# mean value of the first row
np.mean(dataset[0])
# mean value of the whole first column
np.mean(dataset[:, 0])
# mean value of the first 10 elements of the second row
np.mean(dataset[1, 0:10])
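To make the calls above concrete, here is a small runnable sketch. The `dataset` values are invented package weights, not real data.

```python
import numpy as np

# Hypothetical weights in kg: 2 shipments (rows) x 4 packages (columns)
dataset = np.array([[2.0, 4.0, 6.0, 8.0],
                    [1.0, 3.0, 5.0, 7.0]])

overall_mean = np.mean(dataset)          # all 8 values
first_row_mean = np.mean(dataset[0])     # first shipment only
first_col_mean = np.mean(dataset[:, 0])  # first package of every shipment

print(overall_mean)    # 4.5
print(first_row_mean)  # 5.0
print(first_col_mean)  # 1.5
```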
Median:
The vast majority of NumPy's mathematical operations share the same interface, which makes it easy to swap one for another if necessary. The median, var, and std functions are called in the same way as mean.
A common everyday example is calculating the median salary for a particular city or country. The median represents the middle value of a group, so it gives a picture of typical income that is less affected by a few very high earners than the mean is.
Here is how you can find the Median with the help of NumPy:
# median value for the whole dataset
np.median(dataset)
# median value of the last row using reverse indexing
np.median(dataset[-1])
# median value of the first column for rows from index 5 onward
np.median(dataset[5:, 0])
It is possible to index elements from the end of a dataset by using negative (reverse) indexing, which is a simple way of getting the last few elements of an array, just as with a Python list.
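Here is a small sketch of the median together with negative indexing. The `salaries` array is hypothetical, with one row per city and one deliberately extreme value.

```python
import numpy as np

# Hypothetical monthly salaries, one row per city
salaries = np.array([[2100, 2500, 3200],
                     [1800, 2600, 9000],
                     [3000, 3100, 3300]])

last_row_median = np.median(salaries[-1])  # -1 selects the last row
last_two_rows = salaries[-2:]              # the last two rows

# The 9000 outlier pulls the mean up but leaves the median alone
skewed_median = np.median(salaries[1])
skewed_mean = np.mean(salaries[1])

print(last_row_median)  # 3100.0
print(skewed_median)    # 2600.0
```

The second row shows why the median is often preferred for salaries: one very high value moves the mean well above 4000 while the median stays at 2600.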
Variance:
Variance describes how far a set of numbers is spread out from its mean. We can calculate it by using NumPy's var function.
An example of variance comes from the stock market and other investment returns. Suppose the stock market returns 7% per year on average. This does not mean that you get a 7% return every year; some years are higher and some are lower. This variability (called volatility in stock terms) is an example of variance.
Here is how you can find the Variance with the help of NumPy:
# variance value for the whole dataset
np.var(dataset)
# axis used to get variance per column
np.var(dataset, axis=0)
# axis used to get variance per row
np.var(dataset, axis=1)
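The axis argument is easier to follow with numbers. The sketch below uses made-up yearly returns; note that np.var computes the population variance by default, and passing ddof=1 gives the sample variance instead.

```python
import numpy as np

# Hypothetical yearly returns in percent: 2 funds (rows) x 3 years (columns)
returns = np.array([[5.0, 7.0, 9.0],
                    [1.0, 7.0, 13.0]])

per_column = np.var(returns, axis=0)  # one value per year, shape (3,)
per_row = np.var(returns, axis=1)     # one value per fund, shape (2,)

# Sample variance of the second fund (divides by n - 1 instead of n)
sample_var = np.var(returns[1], ddof=1)

print(per_column)  # [4. 0. 4.]
print(sample_var)  # 36.0
```

Both funds average 7%, but the second fund's per-row variance is much larger, which is the volatility described above.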
Standard Deviation:
Std stands for standard deviation, which describes how scattered the data is around its mean. It is the square root of the variance, which means that the deviation has the same unit as the data itself.
An example of how standard deviation applies in everyday life: you drive to work every day and take the same route. There is variation both in the time it takes you to get to work (traffic, stop light timing, etc.) and in the amount of gas you use (the same factors of traffic and stop light timing affect this). All of this variability can be measured with standard deviation.
Here is an example of finding the Standard Deviation by using NumPy.
# standard deviation for the whole dataset
np.std(dataset)
# std value of the values in the first 2 rows and columns
np.std(dataset[:2, :2])
# axis used to get standard deviation per row
np.std(dataset, axis=1)
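As a quick check of how standard deviation relates to variance, the sketch below uses invented commute times. The standard deviation comes out in minutes, while the variance is in minutes squared.

```python
import numpy as np

# Hypothetical commute times in minutes over five days
commute = np.array([22.0, 25.0, 31.0, 24.0, 28.0])

variance = np.var(commute)  # in minutes squared
std_dev = np.std(commute)   # in minutes, same unit as the data

# The standard deviation is the square root of the variance
matches = np.isclose(std_dev, np.sqrt(variance))
print(matches)  # True
```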
As you can see, there are countless ways NumPy's functions can be used in various business scenarios. We hope this gave you a quick overview of some of the most useful ones and how you can go ahead and use them. From the stock market to calculating salaries and incomes in countries, the NumPy library is a great tool for calculating important statistics such as standard deviation, variance, median, and mean.