2/10/22, 12:08 PM
Pandas Practice Code - Set 5 - Jupyter Notebook
localhost:8888/notebooks/Pandas Practice Code - Set 5.ipynb
In [1]:
In [2]:
Out[2]:
survived pclass sex age sibsp parch fare embarked class who adult_male
0 0 3 male 22.0 1 0 7.2500 S Third man True
1 1 1 female 38.0 1 0 71.2833 C First woman False
2 1 3 female 26.0 0 0 7.9250 S Third woman False
3 1 1 female 35.0 1 0 53.1000 S First woman False
4 0 3 male 35.0 0 0 8.0500 S Third man True
# Once you have loaded data into a Pandas dataframe, you
# might need to manipulate it further: filtering certain
# columns, dropping others, selecting a subset of rows or
# columns, sorting the data, finding unique values, and so on.
# Indexing refers to fetching data using the index (row) or
# column information of a Pandas dataframe. Slicing, on the other
# hand, refers to selecting a contiguous range of rows or columns
# with those same indexing techniques.
import matplotlib.pyplot as plt
import seaborn as sns
# sets the default style for plotting
sns.set_style("darkgrid")
titanic_data = sns.load_dataset('titanic')
titanic_data.head()
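The following minimal sketch makes the distinction between indexing and slicing concrete. It assumes a small hand-built dataframe named `passengers` (hypothetical, mimicking a few Titanic columns) so it runs without downloading the dataset through `sns.load_dataset`.

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic rows, so the example runs offline
passengers = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0],
    "sex": ["male", "female", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
})

# Indexing: fetch data by column name (returns a Series)
ages = passengers["age"]

# Slicing: take a contiguous block of rows by position (returns a DataFrame)
middle_rows = passengers[1:3]

print(ages.tolist())               # [22.0, 38.0, 26.0, 35.0, 35.0]
print(middle_rows.index.tolist())  # [1, 2]
```

The slice keeps the original index labels (1 and 2), which matters later when filtering by label.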
In [3]:
0 Third
1 First
2 Third
3 First
4 Third
...
886 Second
887 First
888 Third
889 First
890 Third
Name: class, Length: 891, dtype: category
Categories (3, object): ['First', 'Second', 'Third']
Out[3]:
pandas.core.series.Series
# One of the simplest ways to select data from a column is to
# use square brackets. To get a column's data as a Series,
# pass the column name inside the square brackets that follow
# the dataframe name.
# The following script selects the values in the class column
# of the Titanic dataset.
print(titanic_data["class"])
type(titanic_data["class"])
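As a small sketch of what the single-bracket selection returns, the example below uses a hypothetical three-row dataframe whose `class` column is categorical, like the Titanic column shown above. The resulting Series keeps the column name, the dataframe's index, and the column's dtype.

```python
import pandas as pd

# Hypothetical miniature of the Titanic "class" and "sex" columns
passengers = pd.DataFrame({
    "class": pd.Categorical(["Third", "First", "Third"],
                            categories=["First", "Second", "Third"]),
    "sex": ["male", "female", "female"],
})

# Square brackets with a single column name return a Series
class_col = passengers["class"]

print(type(class_col).__name__)  # Series
print(class_col.name)            # class
print(class_col.dtype)           # category
```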
In [4]:
<class 'pandas.core.frame.DataFrame'>
Out[4]:
class sex age
0 Third male 22.0
1 First female 38.0
2 Third female 26.0
3 First female 35.0
4 Third male 35.0
... ... ... ...
886 Second male 27.0
887 First female 19.0
888 Third female NaN
889 First male 26.0
890 Third male 32.0
891 rows × 3 columns
# You can select multiple columns by passing a list of column
# names inside the square brackets (hence the double brackets:
# the outer pair indexes the dataframe, the inner pair builds the
# list). You will then get a Pandas dataframe containing only the
# specified columns, as shown below.
print(type(titanic_data[["class", "sex", "age"]]))
titanic_data[["class", "sex", "age"]]
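Here is a minimal sketch, on a hypothetical two-row dataframe, of two details worth knowing about list selection: the columns come back in the order the list gives them, and even a one-element list returns a DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical two-row sample with columns in a different order
passengers = pd.DataFrame({
    "sex": ["male", "female"],
    "age": [22.0, 38.0],
    "class": ["Third", "First"],
})

# A list of names returns a DataFrame, columns ordered as in the list
subset = passengers[["class", "sex", "age"]]
print(list(subset.columns))    # ['class', 'sex', 'age']

# A one-element list still yields a DataFrame, not a Series
one_col = passengers[["age"]]
print(type(one_col).__name__)  # DataFrame
```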
In [5]:
In [6]:
Out[5]:
survived pclass sex age sibsp parch fare embarked class who adult_male
0 0 3 male 22.0 1 0 7.2500 S Third man True
4 0 3 male 35.0 0 0 8.0500 S Third man True
5 0 3 male NaN 0 0 8.4583 Q Third man True
6 0 1 male 54.0 0 0 51.8625 S First man True
7 0 3 male 2.0 3 1 21.0750 S Third child False
Out[6]:
survived pclass sex age sibsp parch fare embarked class who adult_male
6 0 1 male 54.0 0 0 51.8625 S First man True
23 1 1 male 28.0 0 0 35.5000 S First man True
27 0 1 male 19.0 3 2 263.0000 S First man True
30 0 1 male 40.0 0 0 27.7208 C First man True
34 0 1 male 28.0 1 0 82.1708 C First man True
# You can also filter rows based on column values. To do this,
# pass a condition inside the square brackets; Pandas keeps the
# rows where the condition is True. For instance, the script
# below returns all records from the Titanic dataset where the
# sex column contains the value “male.”
my_df = titanic_data[titanic_data["sex"] == "male"]
my_df.head()
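To show what the condition inside the brackets actually is, this sketch (on a hypothetical four-row dataframe) stores the comparison in a variable first. The comparison yields a boolean Series with one True/False per row, and indexing with it keeps only the True rows.

```python
import pandas as pd

# Hypothetical four-row sample
passengers = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0],
})

# The comparison produces a boolean mask aligned with the rows
is_male = passengers["sex"] == "male"
print(is_male.tolist())      # [True, False, False, True]

# Passing the mask in square brackets keeps only the True rows
males = passengers[is_male]
print(males.index.tolist())  # [0, 3]
```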
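# You can specify multiple conditions inside the square
# brackets. Combine them with & (and) or | (or), and wrap each
# condition in parentheses, because & and | bind more tightly
# than ==. The following script returns the records where the
# sex column contains the string “male” and the class column
# contains the string “First.”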
my_df = titanic_data[(titanic_data["sex"] == "male") &
                     (titanic_data["class"] == "First")]
my_df.head()
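The sketch below, on a hypothetical four-row dataframe, compares the three element-wise operators: `&` (both), `|` (either), and `~` (negation). It also shows why Python's own `and` cannot be used here: it asks for the truth value of a whole Series, which Pandas refuses with a `ValueError`.

```python
import pandas as pd

# Hypothetical four-row sample
passengers = pd.DataFrame({
    "sex": ["male", "male", "female", "male"],
    "class": ["First", "Third", "First", "Second"],
})

is_male = passengers["sex"] == "male"
is_first = passengers["class"] == "First"

both = passengers[is_male & is_first]    # male AND First
either = passengers[is_male | is_first]  # male OR First
not_male = passengers[~is_male]          # NOT male

# Written inline, each comparison needs its own parentheses
inline = passengers[(passengers["sex"] == "male") &
                    (passengers["class"] == "First")]

print(both.index.tolist())      # [0]
print(either.index.tolist())    # [0, 1, 2, 3]
print(not_male.index.tolist())  # [2]

# Python's `and` tries bool(Series), which is ambiguous
try:
    passengers[is_male and is_first]
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)               # True
```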
In [7]:
In [8]:
Out[7]:
survived pclass sex age sibsp parch fare embarked class who adult_male
0 0 3 male 22.0 1 0 7.25 S Third man True
12 0 3 male 20.0 0 0 8.05 S Third man True
37 0 3 male 21.0 0 0 8.05 S Third man True
51 0 3 male 21.0 0 0 7.80 S Third man True
56 1 2 female 21.0 0 0 10.50 S Second woman False
Out[8]:
Subject Score Grade Remarks
0 Mathematics 85 B Good
1 History 98 A Excellent
2 English 76 C Fair
3 Science 72 C Fair
4 Arts 95 A Excellent
# You can also use the isin() method to filter records against
# a list of allowed values. For instance, the script below keeps
# the records where the age column contains 20, 21, or 22.
ages = [20,21,22]
age_dataset = titanic_data[titanic_data["age"].isin(ages)]
age_dataset.head()
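A small sketch of `isin()` together with its negation follows, using a hypothetical age column that includes a missing value. A missing value (NaN) never matches any list entry, so that row ends up in the negated result. Integer list entries match float ages such as 22.0, as the Titanic output above also shows.

```python
import numpy as np
import pandas as pd

# Hypothetical age column with one missing value
passengers = pd.DataFrame({"age": [22.0, 38.0, 20.0, np.nan, 21.0]})

ages = [20, 21, 22]

# isin() tests each value for membership in the list
in_list = passengers[passengers["age"].isin(ages)]
print(in_list.index.tolist())      # [0, 2, 4]

# ~ inverts the mask; the NaN row is "not in" the list
not_in_list = passengers[~passengers["age"].isin(ages)]
print(not_in_list.index.tolist())  # [1, 3]
```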
# Indexing and Slicing Using the loc Indexer
# The loc indexer of a Pandas dataframe can also be used to
# filter records. It selects rows and columns by their labels.
# To create a dummy dataframe for the examples in this
# section, run the following script:
import pandas as pd
scores = [
    {'Subject': 'Mathematics', 'Score': 85, 'Grade': 'B', 'Remarks': 'Good'},
    {'Subject': 'History', 'Score': 98, 'Grade': 'A', 'Remarks': 'Excellent'},
    {'Subject': 'English', 'Score': 76, 'Grade': 'C', 'Remarks': 'Fair'},
    {'Subject': 'Science', 'Score': 72, 'Grade': 'C', 'Remarks': 'Fair'},
    {'Subject': 'Arts', 'Score': 95, 'Grade': 'A', 'Remarks': 'Excellent'},
]
my_df = pd.DataFrame(scores)
my_df.head()
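Because loc works with labels, the following sketch shows that the labels need not be integers. It uses a shortened copy of the scores data and `set_index("Subject")`, so rows can be looked up by subject name. Using `Subject` as the index is an illustrative choice, not something the examples in this section rely on.

```python
import pandas as pd

scores = [
    {"Subject": "Mathematics", "Score": 85, "Grade": "B", "Remarks": "Good"},
    {"Subject": "History", "Score": 98, "Grade": "A", "Remarks": "Excellent"},
    {"Subject": "English", "Score": 76, "Grade": "C", "Remarks": "Fair"},
]
my_df = pd.DataFrame(scores)

# With the default index the labels happen to be 0, 1, 2 ...
print(my_df.index.tolist())                # [0, 1, 2]

# ... but loc accepts whatever labels the index holds
by_subject = my_df.set_index("Subject")
print(by_subject.loc["History", "Score"])  # 98
```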
In [9]:
In [10]:
In [11]:
Subject English
Score 76
Grade C
Remarks Fair
Name: 2, dtype: object
Out[9]:
pandas.core.series.Series
Out[10]:
Subject Score Grade Remarks
2 English 76 C Fair
3 Science 72 C Fair
4 Arts 95 A Excellent
Out[11]:
Grade Score
2 C 76
3 C 72
4 A 95
# Let’s now see how to filter records. To filter the row with
# index label 2 in the my_df dataframe, pass 2 inside the square
# brackets that follow loc. Here is an example:
print(my_df.loc[2])
type(my_df.loc[2])
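The shape of what loc returns depends on how you pass the label. This sketch uses a trimmed, hypothetical copy of `my_df`. A single label gives a Series, whose index is the column names. A list containing that label gives a one-row DataFrame.

```python
import pandas as pd

# Trimmed copy of the scores data (two columns only)
my_df = pd.DataFrame({
    "Subject": ["Mathematics", "History", "English"],
    "Score": [85, 98, 76],
})

row = my_df.loc[2]     # scalar label -> Series
rows = my_df.loc[[2]]  # list of labels -> DataFrame

print(type(row).__name__, row["Subject"])  # Series English
print(type(rows).__name__, rows.shape)     # DataFrame (1, 2)
```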
# The output shows the row with index label 2 (the third row)
# in the form of a Series.
# You can also pass a range of index labels to loc. For
# instance, the following script filters the records from index
# 2 to 4. Unlike ordinary Python slicing, a loc slice includes
# the end label, so rows 2, 3, and 4 are all returned.
my_df.loc[2:4]
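The inclusive end of a loc slice is easy to miss. This sketch compares it with plain positional slicing through square brackets, which behaves like Python list slicing and stops before the end position.

```python
import pandas as pd

my_df = pd.DataFrame({
    "Subject": ["Mathematics", "History", "English", "Science", "Arts"],
    "Score": [85, 98, 76, 72, 95],
})

# loc slices by label and includes the end label
print(my_df.loc[2:4].index.tolist())  # [2, 3, 4]

# Positional slicing excludes the end, like a Python list
print(my_df[2:4].index.tolist())      # [2, 3]
```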
# Along with filtering rows, you can also specify which columns
# loc should return. The following script returns the values in
# the Grade and Score columns for the rows from index 2 to 4.
my_df.loc[2:4, ["Grade", "Score"]]
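The row part of a loc call does not have to be a slice. The sketch below, built on a hypothetical copy of the scores data, combines the boolean filtering shown earlier for the Titanic data with a column list. It returns the subject and grade of every row scoring above 80, with the columns in list order.

```python
import pandas as pd

my_df = pd.DataFrame({
    "Subject": ["Mathematics", "History", "English", "Science", "Arts"],
    "Score": [85, 98, 76, 72, 95],
    "Grade": ["B", "A", "C", "C", "A"],
})

# Rows: a boolean mask; columns: a list of names
high = my_df.loc[my_df["Score"] > 80, ["Subject", "Grade"]]
print(high.index.tolist())  # [0, 1, 4]
print(list(high.columns))   # ['Subject', 'Grade']
```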