4.5. Getting subset of time series + Scatter plot

Here you can learn how to:

- plot time series in points (scatter plot)
- Get subset of time series data and plot them

First, some packages:

[3]:
# These packages are necessary later on. Load all the packages in one place for consistency
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path
import datetime

Let’s load the data:

[5]:
#The path of the directory where all AMF data are
path_dir = Path.cwd()/'data'/'1'
name_of_site = 'CA-Obs_clean.csv.gz'
path_data = path_dir/name_of_site
path_data.resolve()
df_data = pd.read_csv(path_data, index_col='time',parse_dates=['time'])
df_data.head()
[5]:
WS RH TA PA WD P SWIN LWIN SWOUT LWOUT NETRAD H LE
time
1997-01-01 00:30:00 2.988 73.036 -24.570 93.942 102.84 NaN -0.14 213.39 0.01 214.83 -1.59 NaN NaN
1997-01-01 01:00:00 2.671 73.146 -24.562 93.887 96.09 NaN -0.03 216.73 0.03 215.03 1.64 NaN NaN
1997-01-01 01:30:00 2.303 73.151 -24.431 93.934 112.34 NaN 0.24 223.46 0.02 215.91 7.77 NaN NaN
1997-01-01 02:00:00 2.789 73.093 -24.379 93.917 109.16 NaN -0.04 218.32 0.00 215.72 2.56 NaN NaN
1997-01-01 02:30:00 2.274 73.140 -24.284 93.977 115.74 NaN 0.14 217.89 0.01 215.99 2.02 NaN NaN

Let’s have a plot of Net radiation:

[23]:
df_data['NETRAD'].plot(figsize=(12,5))
[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x126df3400>
../_images/tutorials_tutorial-5-Scatter_plot_datetime_7_1.png

If you like to focus in particular part of the data, you use the following feature of pandas package (DataFrame.loc(index1:index2)):

[27]:
df_sub=df_data.loc['2002 07 01':'2002 07 30']
df_sub.head()
[27]:
WS RH TA PA WD P SWIN LWIN SWOUT LWOUT NETRAD H LE
time
2002-07-01 00:00:00 3.508 51.637 14.268 93.494 266.48 0.0 0.13 286.54 -0.20 361.24 -74.37 -3.451 0.175
2002-07-01 00:30:00 3.183 53.221 13.799 93.509 265.86 0.0 0.07 282.77 -0.23 358.10 -75.02 NaN NaN
2002-07-01 01:00:00 3.759 55.664 13.182 93.506 266.07 0.0 -0.11 281.14 -0.14 356.13 -74.96 -14.770 1.628
2002-07-01 01:30:00 4.518 52.553 13.217 93.500 276.19 0.0 0.02 282.81 0.15 362.15 -79.47 -69.300 12.360
2002-07-01 02:00:00 3.940 52.343 13.150 93.484 275.58 0.0 -0.35 282.55 0.15 362.48 -80.43 -35.270 3.713
[28]:
df_sub.index
[28]:
DatetimeIndex(['2002-07-01 00:00:00', '2002-07-01 00:30:00',
               '2002-07-01 01:00:00', '2002-07-01 01:30:00',
               '2002-07-01 02:00:00', '2002-07-01 02:30:00',
               '2002-07-01 03:00:00', '2002-07-01 03:30:00',
               '2002-07-01 04:00:00', '2002-07-01 04:30:00',
               ...
               '2002-07-30 19:00:00', '2002-07-30 19:30:00',
               '2002-07-30 20:00:00', '2002-07-30 20:30:00',
               '2002-07-30 21:00:00', '2002-07-30 21:30:00',
               '2002-07-30 22:00:00', '2002-07-30 22:30:00',
               '2002-07-30 23:00:00', '2002-07-30 23:30:00'],
              dtype='datetime64[ns]', name='time', length=1440, freq=None)

as you can see, we have a subset of df_data which contains the data from July 2002. Now let’s plot it:

[75]:
df_sub['NETRAD'].plot(figsize=(15,5))
plt.ylabel('Temp ($\degree$C)')
[75]:
Text(0, 0.5, 'Temp ($\\degree$C)')
../_images/tutorials_tutorial-5-Scatter_plot_datetime_12_1.png

You can even specify the date and time of interest in the subset:

[74]:
df_sub.loc['2002 07 12 6:00:00':'2002 07 12 22:00:00']['TA'].plot(figsize=(15,5))
plt.ylabel('Temp ($\degree$C)')
[74]:
Text(0, 0.5, 'Temp ($\\degree$C)')
../_images/tutorials_tutorial-5-Scatter_plot_datetime_14_1.png

Sometimes we want to plot points instead of lines, specially for time series when you have a lot of missing points. You can do this by using plt.scatter as follows:

[76]:
Y=df_sub['TA']
X=df_sub.index

fig,ax=plt.subplots(1,1,figsize=(15,5))
plt.scatter(X,Y)
plt.xlim([df_sub.index.date.min(),df_sub.index.date.max()])
plt.ylabel('Temp ($\degree$C)')
[76]:
Text(0, 0.5, 'Temp ($\\degree$C)')
../_images/tutorials_tutorial-5-Scatter_plot_datetime_16_1.png

You can change the marker color and style (For different types of markers, please refer here):

[77]:
Y=df_sub['TA']
X=df_sub.index

fig,ax=plt.subplots(1,1,figsize=(15,5))
plt.scatter(X,Y,color='r',marker='+')
plt.xlim([df_sub.index.date.min(),df_sub.index.date.max()])
plt.ylabel('Temp ($\degree$C)')
[77]:
Text(0, 0.5, 'Temp ($\\degree$C)')
../_images/tutorials_tutorial-5-Scatter_plot_datetime_18_1.png