I just want to point out that as of pandas 2.1.0, query()
can be used even more terse. Basically, instead of chaining two conditions using &
, you can reduce it into a single expression:
df = pd.DataFrame({'date': pd.date_range('2000-1-1', None, 200), 'col': range(200)})
# rows between 2000-6-1 and 2000-6-10
df1 = df.query("'2000-6-1' <= date <= '2000-6-10'")
# using local variable
start, end = '2000-6-1', '2000-6-10' # using string
df1 = df.query("@start <= date <= @end")
from datetime import datetime
start, end = datetime(2000,6,1), datetime(2000,6,10) # using datetime/pd.Timestamp
df1 = df.query("@start <= index <= @end")
If the index is datetime, then we can use
df = pd.DataFrame({'col': range(200)}, index=pd.date_range('2000-1-1', None, 200))
df1 = df.query("'2000-6-1' <= index <= '2000-6-10'")
Then again, in that case, loc
slicing is probably more readable:
df1 = df.loc['2000-6-1':'2000-6-10']
df1 = df.loc[datetime(2000,6,1): datetime(2000,6,10)]
One important note is that these methods (in fact, all methods on this page) work only if the datetime column or the datetime index is datetime64[ns] dtype. If it's something else such as datetime.date
or object
etc, you must first convert to datetime64[ns]
using pd.to_datetime()
:
df['date'] = pd.to_datetime(df['date'])
# if it's the index:
df.index = pd.to_datetime(df.index)
If the data is read from a CSV file as in the OP, you can pass the parse_dates
kwarg. For example, if the datetime column should be the date
column in the CSV, you can use:
df = pd.read_csv("my_data.csv", parse_dates=['date'])
If you want to read the datetimes as index from a CSV, then use
df = pd.read_csv("my_data.csv", parse_dates=['date'], index_col=['date'])