Introduction to NumPy

Installation and Import

  1. pip install numpy
  2. import numpy as np

Usage Demonstration (Including the random Module)

  • random.uniform(parameter1, parameter2) generates a random floating-point number between parameter1 and parameter2. For example, random.uniform(100.0, 200.0) generates a random floating-point number between 100 and 200.
  • random.randint(parameter1, parameter2) generates a random integer between parameter1 and parameter2, both endpoints included.
  • np.array(list) converts the list to a NumPy array.
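A minimal sketch of the calls above, assuming `random` refers to Python's standard-library module; the variable names are illustrative only.

```python
import random
import numpy as np

x = random.uniform(100.0, 200.0)   # float somewhere between 100 and 200
n = random.randint(1, 6)           # integer from 1 to 6, both endpoints included
values = [random.uniform(0.0, 1.0) for _ in range(5)]
arr = np.array(values)             # convert the Python list to a NumPy array

print(x, n)
print(type(arr), arr.shape)
```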

ndarray Multidimensional Data Object

  1. Create an ndarray: np.array(array_like)
  2. object.dtype: the data type of the elements (an attribute, not a method)
  3. np.array([[1, 2, 3], [4, 5, 6]]) creates a multidimensional array
  4. object.size: the total number of elements
  5. object.shape: the size along each dimension as a tuple; for example, (1, 2) means 1 row and 2 columns
  6. object.T: the transpose of the array (also works for higher-dimensional arrays)
  7. object.ndim: the number of dimensions
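The attributes listed above can be inspected on a small array. This is only an illustrative sketch; the array contents are arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.dtype)    # integer dtype; the exact width depends on the platform
print(a.size)     # 6
print(a.shape)    # (2, 3)
print(a.ndim)     # 2
print(a.T.shape)  # (3, 2)
```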

ndarray - Creation

  1. array() converts a list to an array; a dtype can optionally be specified.
  2. arange() is the NumPy version of range and also supports floating-point numbers.
  3. ```python
np.arange(100)       # 0 to 99
np.arange(0, 10)     # 0 to 9
np.arange(2, 10, 3)  # (start value, end value, step size): 2, 5, 8; the end value is excluded
```
     The step size can be a decimal.
  4. reshape(rows, columns) arranges the data as a two-dimensional array with the given number of rows and columns.
  5. ```python
a = np.arange(20).reshape(4, 5)  # 2D array with 4 rows and 5 columns, holding 0 to 19
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]]
```
  6. linspace() is similar to arange(), but the third parameter is the length of the array.
  7. ```python
# linspace(start value, end value, length)
np.linspace(0, 10, 100)  # 100 evenly spaced values from 0 to 10, both ends included
```
  8. zeros() creates an array of all zeros with the specified shape and dtype.
  9. ```python
a = np.zeros(10, dtype='float64')  # array of 10 zeros; put the desired data type in dtype
```
  10. ones() creates an array of all ones with the specified shape and dtype.
  11. empty() creates an uninitialized array with the specified shape and dtype; its values are arbitrary.
  12. eye() creates an identity matrix with the specified side length and dtype.
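The creation functions above can be compared side by side. The following is a small sketch with arbitrary sizes, not a complete tour of their parameters.

```python
import numpy as np

r = np.arange(2, 10, 3)               # [2 5 8]; the end value 10 is excluded
f = np.arange(0, 1, 0.25)             # decimal step: 0, 0.25, 0.5, 0.75
m = np.arange(20).reshape(4, 5)       # 4 rows, 5 columns
s = np.linspace(0, 10, 5)             # 5 values from 0 to 10, end value included
z = np.zeros((2, 3), dtype='int32')   # 2x3 array of integer zeros
o = np.ones(4)                        # four ones (float64 by default)
i = np.eye(3)                         # 3x3 identity matrix

print(r, f, s)
print(m.shape, z.dtype, o, i, sep='\n')
```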

ndarray - Data Types

  1. Boolean: bool_
  2. Integer: int_, int8, int16, int32, int64
  3. Unsigned integer: uint8, uint16, uint32, uint64
  4. Floating-point: float_, float16, float32, float64 (the float_ alias was removed in NumPy 2.0; float64 is the equivalent)
  5. Complex: complex64, complex128
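A short sketch of how these types are requested in practice. Note that `astype` is not mentioned in the notes above; it is included here as the usual NumPy way to convert an existing array to another dtype.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int8)       # request an 8-bit integer array
b = a.astype(np.float32)                     # astype returns a converted copy
c = np.array([1.5, 2.5], dtype='complex64')  # dtype given as a string

print(a.dtype, b.dtype, c.dtype)             # int8 float32 complex64
print(a.itemsize, b.itemsize)                # bytes per element: 1 4
```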

Differences between Arrays and Lists

All elements of an array must have the same type, and the size of an array is fixed once it is created. A Python list, by contrast, can hold mixed types and can grow or shrink.

ndarray - Batch Operations

Operations between arrays and scalars

  • a + 1, a * 3, 1 // a, a ** 0.5, a > 5

Operations between arrays of the same size

  • a + b, a / b, a ** b, a % b, a == b
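The batch operations listed above apply element by element, without an explicit loop. A small sketch with arbitrary values:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])

print(a + 1)      # [2 3 4 5]
print(a * 3)      # [ 3  6  9 12]
print(12 // a)    # [12  6  4  3]
print(a ** 0.5)   # element-wise square root, as floats
print(a > 2)      # [False False  True  True]
print(a + b)      # [5 5 5 5]
print(a % b)      # [1 2 1 0]
print(a == b)     # [False False False False]
```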

Indexing and Slicing

NumPy arrays are indexed as [row, column], whereas chained indexing on a pandas DataFrame selects the column first and then the row.

  1. Indexing a one-dimensional array: a[5]
  2. Indexing a multidimensional array:
     • List-style notation: a[2][3]
     • New-style notation: a[2, 3]
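Both notations reach the same element, as this small sketch on an arbitrary 4x5 array suggests:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

print(a[2][3])   # 13: index row 2 first, then element 3 of that row
print(a[2, 3])   # 13: the same element, selected in a single indexing step
print(a[2])      # [10 11 12 13 14]: a single index selects a whole row
```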

Slicing

  1. a[start:stop] selects the elements from index start up to, but not including, index stop.
  2. ```python
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
a[0:4]   # the resulting values are 1 2 3 4
a[4:]    # everything from index 4 onward

b = np.arange(15).reshape(3, 5)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]]
b[0:2, 0:2]   # rows 0 to 2 and columns 0 to 2, end index excluded
```
  3. Indices are zero-based: a slice that starts at the first row starts at 0, and one that starts at the second row starts at 1.
  4. Slicing one-dimensional arrays: a[5:8] takes the elements at indices 5 to 7, a[4:] takes all elements from index 4 onward, and a[2:10] = 1 sets the elements at indices 2 to 9 to 1.
  5. Slicing multidimensional arrays: a[1:2, 3:4], a[:, 3:5], a[:, 1]
  6. Difference between array slicing and list slicing: slicing an array does not copy the data (it creates a view), so modifying the slice also modifies the original array.
  7. The copy() method creates an independent copy of the array.

Boolean Indexing

  1. a[a > 5] returns an array containing the elements of a that are greater than 5.
  2. Filters can be applied in sequence: print(a[a > 3]) prints the elements greater than 3, and after b = a[a > 3], print(b[b % 2 == 0]) prints the even numbers among them.
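The view behaviour of slices and the chaining of boolean masks can be checked with a short sketch (the values are arbitrary):

```python
import numpy as np

a = np.arange(10)
view = a[2:5]          # a slice is a view onto the same data
view[0] = 100          # ...so this also changes a[2]
safe = a[5:8].copy()   # copy() gives independent data
safe[0] = -1           # a[5] is left untouched

print(a)               # [  0   1 100   3   4   5   6   7   8   9]

b = a[a > 3]           # boolean mask: keep elements greater than 3
evens = b[b % 2 == 0]  # then keep only the even ones
print(evens)           # [100   4   6   8]
```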

ndarray - Fancy Indexing

```python
import numpy as np

a = np.arange(10)            # [0 1 2 3 4 5 6 7 8 9]
# How can the elements 1, 3, 4, 6, 7 be selected when they follow no pattern?
# Fancy indexing: pass a list of indices
b = a[[1, 3, 4, 6, 7]]       # [1 3 4 6 7]

a = np.arange(20).reshape(4, 5)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]]
a[0, 2:4]                    # row 0, columns 2 to 3: [2 3]
# Take 6, 8, 16, 18: first take rows 1 and 3 with all columns,
# then take columns 1 and 3 of all remaining rows
print(a[[1, 3], :][:, [1, 3]])
# [[ 6  8]
#  [16 18]]
# The index order is a[row, column]
```
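One detail worth checking when selecting a block with two index lists: passing both lists in a single bracket pairs them up element by element instead of crossing them. The sketch below contrasts the two; `np.ix_` is not part of the notes above and is shown only as one NumPy helper that builds the crossed selection in a single step.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

pairs = a[[1, 3], [1, 3]]          # elements (1, 1) and (3, 3)
block = a[np.ix_([1, 3], [1, 3])]  # rows 1, 3 crossed with columns 1, 3

print(pairs)   # [ 6 18]
print(block)   # 2x2 block containing 6, 8, 16, 18
```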
## Universal Functions for ndarrays
1. np.abs() returns the absolute value
2. The `numpy.sqrt(array[, out])` function computes the element-wise positive square root of an array.
3. np.round() rounds to the nearest integer
4. np.floor() rounds down to the nearest integer
5. np.ceil() rounds up to the nearest integer
6. np.maximum(a, b) compares two arrays element by element and returns the larger value at each position
7. ```python
a = np.array([2, 5, 3, 4])
b = np.array([5, 2, 1, 6])
print(np.maximum(a, b))  # [5 5 3 6]
```
8. np.minimum(a, b) returns the element-wise minimum
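A brief sketch of the rounding functions on the same input makes their differences easier to see. The input values are arbitrary; note that `np.round` rounds exact halves to the nearest even number.

```python
import numpy as np

x = np.array([-1.5, 0.5, 2.25, 4.0])

print(np.abs(x))            # absolute values: 1.5, 0.5, 2.25, 4.0
print(np.sqrt(np.abs(x)))   # element-wise square root
print(np.round(x))          # [-2.  0.  2.  4.]
print(np.floor(x))          # [-2.  0.  2.  4.]
print(np.ceil(x))           # [-1.  1.  3.  4.]

a = np.array([2, 5, 3, 4])
b = np.array([5, 2, 1, 6])
lo = np.minimum(a, b)       # smaller value at each position
print(lo)                   # [2 2 1 4]
```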
## NumPy Statistics
1. sum() computes the sum
2. mean() computes the average
3. min() finds the minimum value
4. max() finds the maximum value
5. var() computes the variance
6. std() computes the standard deviation, the square root of the variance
7. argmax() returns the index of the maximum value
8. argmin() returns the index of the minimum value
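The statistics methods above can be tried on a small data set. This sketch uses values chosen so that the results come out as round numbers; by default `var()` and `std()` divide by the number of elements.

```python
import numpy as np

a = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(a.sum())     # 40
print(a.mean())    # 5.0
print(a.min())     # 2
print(a.max())     # 9
print(a.var())     # 4.0
print(a.std())     # 2.0
print(a.argmax())  # 7 (index of the value 9)
print(a.argmin())  # 0 (index of the value 2)
```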
## NumPy - Random Number Generation
1. `np.random.rand` generates a random array of the given shape with values in [0, 1).
2. np.random.randint(start, stop, size) generates random integers in [start, stop) with the given shape
3. ```python
np.random.randint(start_value, end_value, (3, 5))  # 2D array with 3 rows and 5 columns
```
4. np.random.choice draws random elements from a given array, returning the given shape
5. np.random.shuffle works like Python's random.shuffle: it shuffles an array in place.
6. np.random.uniform generates a random array of the given shape, drawn uniformly from a given range
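The following sketch exercises each of the functions above. The seed, shapes, and ranges are arbitrary choices made only so that the example is repeatable.

```python
import numpy as np

np.random.seed(0)                         # fix the seed so runs are repeatable

r = np.random.rand(2, 3)                  # floats in [0, 1), shape (2, 3)
i = np.random.randint(1, 10, (3, 5))      # integers in [1, 10), shape (3, 5)
c = np.random.choice([10, 20, 30], 4)     # 4 draws (with replacement) from the list
u = np.random.uniform(100.0, 200.0, 5)    # 5 floats in [100, 200)

arr = np.arange(5)
np.random.shuffle(arr)                    # shuffles in place and returns None

print(r.shape, i.shape, c.shape, u.shape)
print(arr)
```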
## histogram() function
Counts how many values fall into each interval (bin).
For example: a1 = [1, 2, 4, 1, 8, 9]
np.linspace(1, 10, 4) divides the interval from 1 to 10 into 3 segments with edges 1, 4, 7, 10.
np.histogram(a1, np.linspace(1, 10, 4)) returns the counts [3, 1, 2] together with the bin edges.
In 1-4 there are 3 values: [1, 2, 1].
In 4-7 there is 1 value: [4].
In 7-10 there are 2 values: [8, 9].
Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.
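The worked example above can be reproduced directly; the variable names `edges`, `counts`, and `bin_edges` are illustrative.

```python
import numpy as np

a1 = [1, 2, 4, 1, 8, 9]
edges = np.linspace(1, 10, 4)                # bin edges 1, 4, 7, 10
counts, bin_edges = np.histogram(a1, edges)  # counts per bin, plus the edges used

print(counts)      # [3 1 2]
print(bin_edges)   # the same edges that were passed in
```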
# Introduction to Series
## Series
1. A Series is an object similar to a one-dimensional array, consisting of a set of data and a set of associated data labels (indices).
2. Creation methods:
3. ```python
a = pd.Series([4, 7, -5, 3])
# 0    4
# 1    7
# 2   -5
# 3    3
b = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])
# a    4
# b    7
# c   -5
# d    3
```
4. Get the value array and the index array with the values and index attributes
5. A Series behaves like a combination of a list (array) and a dictionary.
6. Supports arithmetic with scalars (for example, a + 2) and with another Series of the same size.
7. Slicing also works: a[0:2] follows a[start:stop] and returns the elements at positions 0 and 1.
8. Boolean filtering: a[a > 4]
9. Universal functions: np.abs(a)
10. ```python
# A Series can be created from a dictionary
sr = pd.Series({'a': 1, 'b': 2})
print(sr)
# a    1
# b    2
# Values can be obtained through dictionary-style keys
print(sr['a'])    # 1
# Check whether a key is in the Series
print('a' in sr)  # True
```
11. index: the index attribute; values: the values attribute
12. sr['a':'c'] slices by label
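The list-like and dictionary-like behaviours described above can be seen together in one short sketch (the data is arbitrary):

```python
import pandas as pd

sr = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

print(sr.values)        # the underlying NumPy array
print(list(sr.index))   # ['a', 'b', 'c', 'd']
print(sr['b'])          # 7: dictionary-style access by label
print('a' in sr)        # True: membership is tested against the index
print(sr[sr > 3])       # boolean filtering keeps labels a and b
print(sr['a':'c'])      # label slices include the end label: a, b, c
print((sr + 2)['c'])    # -3: scalar arithmetic applies to every element
```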

Series - Integer Indexing

```python
sr = pd.Series(np.arange(20))
print(sr)
sr2 = sr[10:].copy()
print(sr2)
# sr2[10] is interpreted as a label, not a position, which is easy to confuse
print(sr2[10])      # 10
# Use loc to select by label
print(sr2.loc[10])  # 10
# Use iloc to select by position
print(sr2.iloc[0])  # 10
```

Series - Data Alignment

When an operation is performed on two Series objects, pandas aligns them by index label before calculating, so 'c' is added to 'c', 'a' to 'a', and 'd' to 'd', regardless of their order. If a label appears in only one of sr1 and sr2, the result for that label is NaN.

To avoid this, use sr1.add(sr2, fill_value=0): a value missing from one Series is treated as 0, so the result keeps the value that is available.

  • sr.isnull() returns a boolean Series that is True where a value is missing and False elsewhere.
  • sr.notnull() is the opposite of isnull().
  • sr.dropna() drops missing values.
  • sr.fillna(0) replaces all NaN values with 0.
  • sr.fillna(sr.mean()) replaces NaN with the mean.
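A small sketch of alignment and the missing-value helpers above; `sr1` and `sr2` hold arbitrary data with partly overlapping labels.

```python
import numpy as np
import pandas as pd

sr1 = pd.Series([12, 23, 34], index=['c', 'a', 'd'])
sr2 = pd.Series([11, 20, 10], index=['d', 'c', 'b'])

total = sr1 + sr2                     # 'a' and 'b' exist on one side only, so they become NaN
filled = sr1.add(sr2, fill_value=0)   # the missing side is treated as 0

print(total)
print(filled)
print(total.isnull())                 # True for 'a' and 'b'
print(total.dropna())                 # keeps only 'c' and 'd'
print(total.fillna(total.mean()))     # NaN replaced by the mean of 32 and 45
```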

Series - Summary

  1. Array + dictionary
  2. Integer indexing: loc (interpreted as labels) and iloc (interpreted as positions)
  3. Data alignment: NaN
  4. Missing data handling: dropna, fillna

DataFrame Object

Common Properties of DataFrame Objects

  1. index: get the row index
  2. T: the transpose, which swaps rows and columns
```
   one  two
0    1    4
1    2    5
2    3    6

     0  1  2
one  1  2  3
two  4  5  6
```
  3. columns: get the column index
  4. values: get the value array (a two-dimensional array)
  5. describe(): get quick summary statistics
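These attributes can be checked on the same small table. A minimal sketch, assuming the frame is built from a dictionary of columns:

```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]})

print(list(frame.index))    # [0, 1, 2]
print(list(frame.columns))  # ['one', 'two']
print(frame.values)         # 2-D NumPy array with 3 rows and 2 columns
print(frame.T)              # rows and columns swapped
print(frame.describe())     # count, mean, std, min, quartiles, max per column
```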
## Getting Values from a DataFrame (Select Column First, Then Row)
```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]})
#    one  two
# 0    1    4
# 1    2    5
# 2    3    6
frame['one'][0]             # chained indexing: column 'one', then row label 0
# Recommended: use loc and iloc, which take rows first and then columns
print(frame.loc[0, 'one'])  # 1
# Get all values of the row at position 0
print(frame.iloc[0, :])
# one    1
# two    4
```
## DataFrame - Indexing and Slicing
![DataFrame Object Indexing and Slicing](/media/blog/7e7d263b04ee4d7e/DataFrame对象索引和切片.png)
Fancy indexing is equally applicable.
```python
# Get the values of column 'two' from rows 0 and 2
print(frame.loc[[0, 2], 'two'])
# 0    4
# 2    6
```
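The sketch below collects a few common selection forms on the same small frame. One difference to keep in mind is that `loc` slices include the end label, while `iloc` slices exclude the end position. The variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]})

by_label = frame.loc[0:1, 'one']      # label slice: rows 0 and 1 (end included)
by_pos = frame.iloc[0:1, 0]           # position slice: row 0 only (end excluded)
picked = frame.loc[[0, 2], ['two']]   # fancy indexing on rows and columns
masked = frame[frame['one'] > 1]      # boolean filtering on a column

print(by_label, by_pos, picked, masked, sep='\n\n')
```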
## DataFrame - Data Alignment and Missing Value Handling
Handling mismatched indexes in DataFrame operations
```python
# Rows are aligned by index; a label missing from either frame gives NaN
frame = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=[0, 1, 2])
frame2 = pd.DataFrame({'one': [1, 2, 3, 5], 'two': [4, 5, 6, 7]}, index=[1, 0, 2, 3])
print(frame)
#    one  two
# 0    1    4
# 1    2    5
# 2    3    6
print(frame2)
#    one  two
# 1    1    4
# 0    2    5
# 2    3    6
# 3    5    7
print(frame + frame2)
#    one   two
# 0  3.0   9.0
# 1  3.0   9.0
# 2  6.0  12.0
# 3  NaN   NaN
```
1. `fillna(0)` handles missing values in the same way as for Series objects.
2. drop() removes rows or columns by label
3. dropna() deletes every row that contains at least one missing value.
4. dropna(how='all') deletes a row only if all of its values are NaN.
5. The default value for how is 'any': if a row contains any NaN, the entire row is deleted.
6. dropna(axis=0), the default, drops rows; axis=1 drops columns.
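The difference between the `how` options is easiest to see on a frame that has both a partly missing row and a fully missing row. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [1.0, np.nan, 3.0, np.nan],
                   'two': [4.0, 5.0, 6.0, np.nan]})

any_dropped = df.dropna()              # drops rows 1 and 3 (any NaN)
all_dropped = df.dropna(how='all')     # drops only row 3 (all NaN)
zero_filled = df.fillna(0)             # every NaN becomes 0
without_row = df.drop(0)               # remove the row labelled 0
without_col = df.drop('two', axis=1)   # remove the column 'two'

print(any_dropped, all_dropped, zero_filled, sep='\n\n')
```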
# Commonly Used Methods in Pandas
1. mean(axis=0, skipna=True) calculates the mean of each column (axis=0) or each row (axis=1); skipna=True, the default, ignores NaN values.
2. axis=0 computes down the rows and gives one result per column; axis=1 computes across the columns and gives one result per row.
3. sum(axis=0) sums each column (axis=0) or each row (axis=1).
4. sort_values(by='column_name', ascending=True/False, axis=0/1) sorts by the values of the given column in ascending order; pass ascending=False for descending order.
5. sort_index(axis=..., ascending=True/False) sorts by the row index (axis=0) or the column index (axis=1).
6. drop and dropna default to axis=0, which removes rows; pass axis=1 to remove columns. This differs from reductions such as mean, where axis=0 produces one result per column.
7. Universal functions of NumPy are also applicable to pandas objects.
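Because the meaning of `axis` is easy to mix up, a short sketch may help. The frame below is arbitrary and deliberately has an unsorted index.

```python
import pandas as pd

frame = pd.DataFrame({'one': [3, 1, 2], 'two': [4, 6, 8]}, index=[2, 0, 1])

col_means = frame.mean(axis=0)   # one value per column: one=2.0, two=6.0
row_sums = frame.sum(axis=1)     # one value per row, keyed by the row label
by_value = frame.sort_values(by='one', ascending=False)  # largest 'one' first
by_index = frame.sort_index()    # rows reordered by label: 0, 1, 2

print(col_means, row_sums, by_value, by_index, sep='\n\n')
```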
# Handling Pandas Time Objects
1. Time object library: datetime
2. Flexible parsing of date strings: `dateutil.parser.parse()`
3. Batch conversion of time objects: `pd.to_datetime()`
4. `datetime.datetime.strptime('2020-01-01', '%Y-%m-%d')` converts the string to `datetime.datetime(2020, 1, 1, 0, 0)`
5. `dateutil.parser.parse('2020-01-01')` converts directly without a format string, also giving `datetime.datetime(2020, 1, 1, 0, 0)`
6. `pandas.to_datetime(['2020-01-01', '2021-01-01'])` converts to `DatetimeIndex(['2020-01-01', '2021-01-01'])`
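The three parsing routes above can be compared in a few lines. This is a sketch; `dateutil` is assumed to be available, which is normally the case wherever pandas is installed.

```python
import datetime
import dateutil.parser
import pandas as pd

d1 = datetime.datetime.strptime('2020-01-01', '%Y-%m-%d')  # explicit format string
d2 = dateutil.parser.parse('2020-01-01')                   # format is inferred
idx = pd.to_datetime(['2020-01-01', '2021-01-01'])         # whole list at once

print(d1, d2)
print(idx)
```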
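## Generating Time Ranges (date_range)
pd.date_range accepts the following parameters:

  • start: start time
  • end: end time
  • periods: number of timestamps to generate
  • freq: time frequency, default 'D' (day); other options include H (hour), W (week), B (business day), SM (semi-month), M (month), T (minute), S (second), A (year). Recent pandas versions deprecate some of these aliases in favor of h, min, s, ME, and YE.

1. pd.date_range('Date1', 'Date2') generates the timestamps from Date1 to Date2, one per day by default.
2. pd.date_range('date', periods=60) generates 60 consecutive days starting from the given date.
3. pd.date_range('date', periods=60, freq='H') generates 60 hourly timestamps
4. freq='W' weekly, 'W-MON' weekly on Monday
5. freq='B' business days (Monday to Friday)
6. object.to_pydatetime() converts the pandas time object to Python datetime objects.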
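A short sketch of these calls follows. The dates are arbitrary, and only frequency aliases that are stable across pandas versions are used.

```python
import pandas as pd

days = pd.date_range('2020-01-01', '2020-01-10')                # 10 daily timestamps
sixty = pd.date_range('2020-01-01', periods=60)                 # 60 consecutive days
mondays = pd.date_range('2020-01-01', periods=4, freq='W-MON')  # the next four Mondays
workdays = pd.date_range('2020-01-01', periods=5, freq='B')     # weekends are skipped
py_dates = days.to_pydatetime()                                 # array of datetime objects

print(len(days), sixty[-1], mondays[0], workdays[-1])
print(type(py_dates[0]))
```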
# Pandas File Processing
1. `pd.read_csv('path')` reads a file using a comma as the default delimiter.
2. `pd.read_table('path')` reads a file using a tab as the default delimiter.
3. Specify a particular column as the row index: `index_col="column_name"` or `index_col=0`
4. parse_dates=True tries to parse the index as dates; parse_dates=['column_name'] specifies the columns to convert to datetime objects.
5. header=None means the file has no header row, so column names are generated automatically; names=['Content'] specifies the column names to use.
6. na_values=['None'] specifies which strings represent missing values
7. sep specifies the file delimiter
8. na_rep (when writing with to_csv) specifies the string written for missing values; the default is an empty string.
9. header=False (to_csv) does not write the column name row; index=False (to_csv) does not write the row index column.
10. columns=[0, 1, 2, 3] (to_csv) specifies the columns to write; pass in a list.
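The reading and writing options above can be tried without touching the file system by using in-memory text buffers. In this sketch the CSV content and column names are made up for illustration.

```python
import io
import pandas as pd

csv_text = "date,price,volume\n2020-01-01,10.5,None\n2020-01-02,11.0,200\n"

# Read: use 'date' as the row index, parse it as dates, treat 'None' as missing
df = pd.read_csv(io.StringIO(csv_text), index_col='date',
                 parse_dates=True, na_values=['None'])
print(df)

# Write: no header row, no index column, missing values written as NULL
out = io.StringIO()
df.to_csv(out, na_rep='NULL', header=False, index=False,
          columns=['price', 'volume'])
print(out.getvalue())
```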