Data Analysis Note - Pandas Part 2 - Series

When you select a single column from a pandas DataFrame, you get a new data type called a Series. A Series is a one-dimensional labeled array of data. It can hold data of any type: strings, integers, floats, dictionaries, lists, booleans, and more.
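
As a quick illustration, the sketch below builds a tiny DataFrame and shows that selecting one column returns a Series. The `books` data and its column names are made up for this example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
books = pd.DataFrame({
    "title": ["Dune", "Sapiens", "Educated"],
    "price": [9.99, 14.50, 12.00],
})

price = books["price"]        # bracket selection; books.price also works
print(type(price).__name__)   # the class name of the selected column
print(price.ndim)             # number of dimensions of the selected column
```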

The most commonly used methods for working with Series

1. Basic info checking:


# only display the summary statistics of this selected column:
>> df.column_name.describe()

# Return a Series containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.
>> df.column_name.value_counts()

# Count the unique combinations of values in ['column_1','column_2']. Returns a Series with a two-level MultiIndex, sorted in descending order so that the first element is the most frequently occurring combination. Rows containing NA values are excluded by default.
df[['column_1','column_2']].value_counts()
# for instance, count how many books fall into each genre and year combination:
books[['Genre','Year']].value_counts()
>> result ->
Genre        Year
Non Fiction  2015    33
             2016    31
             2019    30
             2010    30
             2018    29
Fiction      2014    29
Non Fiction  2012    29
             2011    29
             2017    26
             2013    26
             2009    26
Fiction      2017    24
             2013    24
             2009    24
             2018    21
Non Fiction  2014    21
Fiction      2012    21
             2011    21
             2010    20
             2019    20
             2016    19
             2015    17
Name: count, dtype: int64

# return an array of the unique values (NaN is included by default)
>> df.column_name.unique()

# return the number of unique values (NaN is NOT counted by default)
>> df.column_name.nunique()
# including 'NaN' value
>> df.column_name.nunique(dropna=False)

# return the n largest values (here n = 5), sorted in descending order
df.column_name.nlargest(5)

# return the n smallest values (here n = 5), sorted in ascending order
df.column_name.nsmallest(5)
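
The methods above can be tried out on a small example. This is only a sketch: the DataFrame, its column names, and its values are made up, and the comments describe the pandas defaults for missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "genre": ["Fiction", "Non Fiction", "Fiction", None, "Fiction"],
    "rating": [4.5, 4.7, 3.9, 4.7, np.nan],
})

summary = df["rating"].describe()             # count, mean, std, min, quartiles, max
genre_counts = df["genre"].value_counts()     # the missing genre is skipped
unique_genres = df["genre"].unique()          # the missing genre is included
n_genres = df["genre"].nunique()              # the missing genre is not counted
n_genres_with_na = df["genre"].nunique(dropna=False)

top_two = df["rating"].nlargest(2)            # ties keep their original order
bottom_one = df["rating"].nsmallest(1)

# On a DataFrame, value_counts() counts unique row combinations and
# drops rows that contain a missing value by default.
pair_counts = df[["genre", "rating"]].value_counts()

print(genre_counts)
print(pair_counts)
```

Note that `unique()` and `nunique()` disagree about missing values by default, which is easy to overlook when comparing their results.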

2. Filter / select from a series:


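# filter/select values with Series.where(): values where cond is True are kept, the others are replaced with `other` (NaN by default), so the result keeps the original length
Series.where(cond, other=nan, *, inplace=False, axis=None, level=None)
# (pandas versions before 2.0 also accepted errors= and try_cast=)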

# example: extract the title column from the dataframe, then keep only the titles that contain the string 'you'; all other titles become NaN.
title = df['title']
title.where(lambda x: x.str.contains('you', na=False))  # na=False treats missing titles as non-matches


# a DataFrame also has a .where() method:
df.where(cond, other=nan, *, inplace=False, axis=None, level=None)
df.where(df<90, "A+") # keep values where df < 90 and replace the rest with "A+" (the replacement defaults to NaN)
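
Because `.where()` masks values rather than dropping rows, it helps to compare it with boolean indexing. The sketch below uses made-up titles and scores; the variable names are chosen only for this example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
title = pd.Series(["Thank you", "Dune", "Are you there?"])
has_you = title.str.contains("you", na=False)

masked = title.where(has_you)   # same length; non-matching titles become NaN
filtered = title[has_you]       # boolean indexing; only matching rows remain

scores = pd.DataFrame({"math": [85, 95], "art": [92, 70]})
graded = scores.where(scores < 90, "A+")   # values of 90 or more are replaced

print(masked)
print(filtered)
print(graded)
```

If the goal is to drop the non-matching rows, boolean indexing (or `masked.dropna()`) is usually the more direct choice.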

3. Converting a Series to a DataFrame:

#Convert Series to DataFrame.
Series.to_frame(name=None)
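
A common use is turning the result of `value_counts()` back into a DataFrame. The sketch below assumes a made-up `genres` Series, and the column name `n_books` is chosen only for this example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
genres = pd.Series(["Fiction", "Non Fiction", "Fiction"], name="genre")

counts = genres.value_counts()                # a Series indexed by genre
counts_df = counts.to_frame(name="n_books")   # a one-column DataFrame
tidy = counts_df.reset_index()                # move the index into a regular column

print(counts_df)
print(tidy)
```

Passing `name` sets the column label of the resulting DataFrame; without it, the Series' own name is used.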