pandas cheatsheet

Home

1 Pandas Overview

Will probably make you leave MS Excel, once you get good at it.

But there is a read.excel function (as many other types of read capabilities)

By convention, pandas is imported as 'pd', so you will see the following:

import pandas as pd

Likewise the convention is to import numpy as 'np' import numpy as np

And finally you will need mapplotlib.pyplot, which by convention is imported as plt import matplotlib.pyplot as plt

1.1 Reading in .csv or excel files. n

You can tell pandas to read various file types: excel, csv, pdf, numbers etc.. You read those files into what pandas calls a "dataframe"

covid_df = pd.read_excel('covid-stats.xls', 'covidbydate', index_col=None, na_values=['NA'])
covid_df.head()
covid_df.describe()

Could also pd.read_csv('file.tsf', sep='\t') where we override the default ',' separator with a tab separator \t

sep='\t' is the same as delimiter='\t'

2 dataframes

A pandas dataframe consists of three components.

the index can be just a whole number, but can also be one of the columns as long as the index is unique.
columns (appear at the top, and name each column)
rows are the actual data values.

2.1 df.columns

This just selects the columns

2.2 df.values

This is just the rows of data

2.3 df.index

This is just the index.

The last three are each one of the three compoenents that every dataframe has.

Some of these examples assume you are looking at the titanic database, a common public database Note to self, change this to covid examples.

Example: data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 'year': [2012, 2012, 2013, 2014, 2014], 'reports': [4, 24, 31, 2, 3]} df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma']) df name reports year Cochice Jason 4 2012 Pima Molly 24 2012 Santa Cruz Tina 31 2013 Maricopa Jake 2 2014 Yuma Amy 3 2014

2.4 df.describe

covid_df.describe()

You can also just enter a method, without any arguments, to see what the method will do. For instance. pd.read_csv() with no other arguments will describe the read_csv pandas function.

2.5 df.shape

covid_df.shape Returns a count of the rows, and columns, so 1000,3 is 1000 rows of 3 columns. Note that this is an attribute, not a method, so covid_df.shape() will give an error

2.6 df.info()

Shows more than just the row and column counts.

In [19]: russ2.info()                                                                                                                                                        
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 687 entries, 0 to 686
Data columns (total 12 columns):
#   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
0   zone         687 non-null    object 
1   date         687 non-null    object 
2   cases        687 non-null    int64  
3   todayCases   687 non-null    int64  
4   deaths       687 non-null    int64  
5   todayDeaths  687 non-null    int64  
6   recovered    687 non-null    int64  
7   active       687 non-null    int64  
8   cases/1M     687 non-null    object 
9   deaths/1M    687 non-null    float64
10  tests        687 non-null    int64  
11  tests/1M     687 non-null    int64  
dtypes: float64(1), int64(8), object(3)
memory usage: 64.5+ KB

In [20]:

In the above, it is typical that 'object' types are strings when importing external datafiles, but they can be any pythyon object, including another dataframe.

2.7 df.drop

2.7.1 dropping columns

df.drop(['recovered', 'deaths']) Inside the () is a list, so square brackets. This will drop both columns, recovered, and deaths.

Drop a variable (column) Note: axis=1 denotes that we are referring to a column, not a row. As of pandas 0.23, the axis number can be 'index' or 'column' so axis = 0 same as axis='index' & axis = 1 same as axis='columns'

default is axis=0, so we are dropping rows by default.

df.drop(['', ''], axis='columns') pvactive.drop(['world', 'USA'], axis='columns')

df.drop('reports', axis='columns') Name year Cochice Jason 2012 Pima Molly 2012 Santa Cruz Tina 2013 Maricopa Jake 2014 Yuma Amy 2014

2.7.2 dropping rows by name

df.drop(['Cochice', 'Pima']) # axis=index is the default so left off.

name reports year Santa Cruz Tina 31 2013 Maricopa Jake 2 2014 Yuma Amy 3 2014

2.7.3 dropping rows by number

df.drop(df.index[2]) # drops row 3

name reports year Santa Cruz Tina 31 2013 Maricopa Jake 2 2014 Yuma Amy 3 2014

2.7.4 conditionally dropping rows

Conditionally drop a row based on a value Specifically: Create a new dataframe called df that includes all rows where the value of a cell in the name column does not equal “Tina”

df[df.name != 'Tina'] name reports year Cochice Jason 4 2012 Pima Molly 24 2012 Maricopa Jake 2 2014 Yuma Amy 3 2014

2.7.5 dropping rows from the end

df.drop(df.index[-2]) # drops the second last row.

2.7.6 keeping top 3

df[:3]

2.7.7 drop bottom 3

df[:-3]

2.8 df.head

covid_df.head() shows the first 10 entries in the data frame

2.9 df.tail

covid_df.tail()

2.10 df.columns

This just selects the columns

2.11 df.index

This is just the index.

The last three are each one of the three compoenents that every dataframe has.

2.12 df.values

This is just the rows of data. Each row is actually a numpy array. Remember that pandas was developed from numpy, and numpy arrarys.

When you are pasdsing dataframes to an external program that is older and does not understand panda dataframe formats, you can use df.values as that passes numpy arrays, a much older form.

So typically df.values or columns.values

.values is usually only used when handing off data to external programs. Internally, pandas will always understand

2.13 df.dtypes

reminds you of the types for each column. Good to check if you have dirty data.

3 Dataframe subsetting

Subsetting is selecting a subset of the dataframe. You can subset by row or by column.

3.1 selecting columns (a.k.a. subsetting our dataframes)

Can be done in three ways:

1) covid_df.recovered

using dot notation (subsetting by column)
2) covid_df['recovered']

square [] with columns as string (subsetting by column)

or to save it: recovered = covid_df['recovered']

type(recovered) –> pandas.core.series.Series which is an extension of a numpy array.

3.2 pandas.core.series.Series

To get a dataframe back from a column rather than this series, pandas.core.series.Series, we use double square brakets.

The first (outer) brackets tell pandas that we will be doing

subsetting,

The second (inner) brackets are a list, of one or more columns.

So recovered=covid_df[ ['recovered'] ] will give back a datatype pandas.dataframe

3) selecting rows using loc or iloc (see next section)

3.3 selecting rows .loc[] and .iloc[]

df.loc[0] is the first row

Can also pass a list so: df.loc[[0,1] ] is the first two rows.

If you subset a single row, you get back a numpy array, or a pandas series. If you subset multiple rows, you get back a dataframe. Just like for columns.

loc[rows, columns] (rows and columns are their labels not their index location.)
```
means "all"
```
loc is for selecting by row and column.

For instance ufo.loc[0, :] selects row 0, all columns. ufo.loc[0, 5] selects row 0, column 5. ufo.loc[0, [5:7]] selects row 0, columns 5 to 7 inclusive, i.e. 5,6,& 7 ufo.loc[0, :] selects row 0, all columns.

ufo.loc[[0,1,2], 4] selects rows 0, 1 and 2 and only column 4 ufo.loc[[0;2], 4] selects rows 0, 1 and 2 and only column 4 i.e. same thing. ufo.loc[[0:2], :] selects rows 0, 1 and 2 and all columns

ufo.loc['can', 'bc'] select row 'can', and columns 'bc'
ufo.loc[0:2] is bad form. It is equivalent to ufo.loc[0:2, :]
ufo.loc[:, 'City']

This returns the series from the City columns only, but ALL rows.

ufo.loc[:, ["City", "Province"] ] selects all rows, but only two columns, namely 'City' and 'Province'

ufo.loc[:, ["City":"Province"] ] selects all rows, and all columns between 'City' and 'Province'

ufo.loc[:, -1 ] This does NOT work. as there is no columns labelled '-1'

ufo[ufo.City='Toronto'] is the same as ufo.loc[ufo.City=='Toronto', :]

You could also select two or more columns like this: ufo.loc[ufo.City=='Oakland', 'State'] That is also equivalent to ufo[ufo.City=='Oakland'].State but it is considered bad form and may cause problems. Preferred approach is

In [86]: russ2      

Out[86]: 
zone      date   cases  todayCases  deaths  todayDeaths  recovered  active cases/1M  deaths/1M   tests  tests/1M
0             A         0       0           0       0            0          0       0        0        0.0       0         0
1         world  mar30-13  785777        1062   37815           32     165607  582355   100.82        0.0       0         0
2           USA  mar30-13  164253         409    3165            9       5506  155582      496       10.0       0         0
3         Italy  mar30-13  101739           0   11591            0      14620   75528     1683      192.0       0         0
4         Spain  mar30-13   87956           0    7716            0      16780   63460     1881      165.0       0         0
..          ...       ...     ...         ...     ...          ...        ...     ...      ...        ...     ...       ...
682    Michigan  apr25-11   36641           0    3085            0          0   30284        0        0.0  140130     14073
683     Florida  apr25-11   30839         306    1055            9          0   29098        0        0.0  334974     16262
684   Louisiana  apr25-11   26140           0    1660            0          0    9553        0        0.0  142056     30460
685       Texas  apr25-11   23170           0     601            0          0   14544        0        0.0  242547      8698
686  Washington  apr25-11   13176           0     723            0          0   10646        0        0.0  160324     21979

[687 rows x 12 columns]
==================================

         russ2.loc[russ2['zone'] == 'Canada', ['zone', 'date', 'cases', 'active']]       
In [85]: russ2.loc[russ2['zone'] == 'Canada', ['zone', 'date', 'cases', 'active']]                                                                                           
Out[85]: 
       zone      date  cases  active
13   Canada  mar30-13   7474    6268
33   Canada  apr15-21  28205   18262
51   Canada  apr16-21  28379   18390
68   Canada  apr17-09  30106   19182
86   Canada  apr17-09  31927   20074
120  Canada  apr17-18  31642   20004
154  Canada  apr18-20  33383   20706
188  Canada  apr19-13  33951   21235
222  Canada  apr19-18  34813   21423
256  Canada  apr20-11  35056   21626
290  Canada  apr20-22  36829   22553
324  Canada  apr21-10  36831   22555
358  Canada  apr21-23  38422   23400
392  Canada  apr22-07  38422   23400
426  Canada  apr22-13  38967   23447
460  Canada  apr23-10  40190   24230
494  Canada  apr24-09  42110   25202
528  Canada  apr24-10  42110   25202
562  Canada  apr24-20  43888   26117
596  Canada  apr25-11  43888   26117
630  Canada  apr25-11  43888   26117
664  Canada  apr25-11  43888   26117

** NOTE loc[0] is actually comparing the index to the string '0' NOT an index. That is why loc[-1] for the last row will NOT work, as a key '-1' does not exist.

That is why we have iloc, because that IS in fact the index position, NOT an index label numeric indexing and the iloc selector (subsetting by row) covid_df.iloc[:,6] # for the seventh row covid_dv.iloc[-1] is the last row.

Result is a panda series. Panda series are 1-dimensional arrays. Panda series can be summed, averaged, counted, medianed, and have missing values replaced!

Assume x is the series, then, This is done with the x.sum(), x.mean(), x.count(), x.median() is the syntax.

Here is a real example: covid_df.['recovered'].mean or covid_df.['cases'].sum

3.4 combining subsetting

The sytanx for subsetting rows and subsetting columsn can be combined. subst = df.loc[, ] to the left of the column is how to specifiy rows to the right of the comma, is how to specify columns.

So subset = df.loc[:, ['date', 'zone']]

so before the comma we have a : which means everything after the comma we have a list so we subset columns with heading "date" and "zone"

Note: again if you use iloc in place of loc, you will get an error, because 'date' and 'zone' are NOT valid index integers.

3.5 combining with boolean conditions.

df.loc[df['country'] == 'Canada']

== is an equality check (i.e. comparing)

So, this says, pick out rows from your dataset that have Canada as the value for the country column.

3.6 iloc[rows, columns] Where now rows and columns are the interger number.

With iloc, it is exclusive of the second number, inclusive of the first number ufo.iloc[:, 0:4] is for all rows, and columsn 0,1,2,&3 (NOT 4)

ufo.iloc[0:3, : is for rows 0, 1, and 2, and all columns.

3.7 shortcuts that are bad form (but good to know if you see it online)

ufo[['City, 'Province' ] ] don't do this, what is better is: ufo.loc[:, ['City', 'Province'] ] i.e. be explicit is the pythonic way.

ufo[0:2] is also bad form. It is rows, by integer position, but just avoid this. ufo.iloc[0:2, :] is the better form.

3.8 ilex

Allows you to mix integer references with label references.

drinks = pd.read_csv('http://bit.ly/drinksbycountry', index_col='country')

drinks.ix['Albania', 0] row matches index 'Albania', and first column (i.e. column 0)

drinks.ix[1, 'beer_servings']

4 appending

You can join two dataframes even if they have different columns. First the looking at the case when the two dataframes have the SAME columns:

zint = pd.DataFrame.from_dict({'A': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}, orient='index', columns=labels) Results in:

      date  cases  todayCases  deaths  todayDeaths  recovered  active  cases/1M  deaths/1M  tests  tests/1M
A     0      0           0       0            0          0       0         0          0      0         0

Then if russ was a dataframe like this:

                   date    cases  todayCases  deaths  todayDeaths  recovered   active cases/1M  deaths/1M    tests  tests/1M
world          apr24-10  2752899       29855  192267         1348     760900  1799138   353.21          0        0         0
USA            apr24-10   887622        1180   50283           47      89141   748198     2682        152  4785759     14458
Spain          apr24-10   219764        6740   22524          367      92355   104885     4700        482   930230     19896
Italy          apr24-10   189973           0   25549            0      57576   106848     3142        423  1579909     26131

Then, zint = zint.append(russ) would result in, zint being this:

                   date    cases  todayCases  deaths  todayDeaths  recovered   active cases/1M  deaths/1M    tests  tests/1M
A                     0        0           0       0            0          0        0        0          0        0         0
world          apr24-10  2752899       29855  192267         1348     760900  1799138   353.21          0        0         0
USA            apr24-10   887622        1180   50283           47      89141   748198     2682        152  4785759     14458
Spain          apr24-10   219764        6740   22524          367      92355   104885     4700        482   930230     19896
Italy          apr24-10   189973           0   25549            0      57576   106848     3142        423  1579909     26131

4.1 .groupby

titanic_df.groupby('sex').mean() Will group into men and women, and take the mean ages.

Can say titanic_df.groupby(['sex', 'age']) When you group by a field, you specify the field "title". That "title" name comes from __ ??

4.2 conditions

titanic_df[titanic_df['age']<18].groupby(['sex','pclass']).mean Will only consider data where age is < 18, then group them into boys and girls and class, and take the mean survival

5 Running panda commands on the dataframes:

5.1 value_counts

pd.value_counts(titanic_df['survived']).plot.bar() This works will if data has a boolean, (yes/no, true/false) field and you want to count who many of each.

6 Pandas as time series

Read the docs for version 25.3 at:version 0.25.3 as well as pydata.org home. panda.Series( data, index, dtype, copy)

data takes various forms, like ndarray, list, constans
index values must be unique and hashable, same length as data. default np.arrange(n) if no index is passwd
dtype is for data type. If None, data type will be inferred
copy data, Default is false.

6.1 For example, looking at the stock prices.

ms is the dataframe name, then: ms['Adj. Close'].plot() 3

7 Pandas indexing

Without specifying an index, pandas will generate a list of whole numbers as the index. You can reset and index and Pandas will add the whole numbers as a column that becomes the index. df.reset_index()

For example, I had my covid stats dataframe, called 'zint' as this:

In [61]: zint                                                                                                                                        
Out[61]: 
               date    cases  todayCases  deaths  todayDeaths  recovered   active cases/1M  deaths/1M    tests  tests/1M
A                 0        0           0       0            0          0        0        0        0.0        0         0
world      apr20-22  2481287         784  170436           39     646260  1663997   318.36        0.0        0         0
USA        apr20-22   792759           0   42514            0      72389   677856     2395      128.0  4026360     12164
Spain      apr20-22   200210           0   20852            0      80587    98771     4282      446.0   930230     19896
Italy      apr20-22   181228           0   24114            0      48877   108237     2997      399.0  1398024     23122
...             ...      ...         ...     ...          ...        ...      ...      ...        ...      ...       ...
Sweden     apr16-21    12540         613    1333          130        381    10826     1242      132.0    74600      7387
Finland    apr16-21     3369         132      75            3        300     2994      608       14.0    48828      8813
Estonia    apr16-21     1434          34      36            1        133     1265     1081       27.0    36024     27156
Lithuania  apr16-21     1128          37      30            0        178      920      414       11.0    48537     17829
Latvia     apr16-21      675           9       5            0         57      613      358        3.0    31302     16595

[551 rows x 11 columns]

The command: zint.reset_index() resulted in zint being this:

In [62]: russ2=zint.reset_index()                                                                                                                          
Out[62]: 
         index      date    cases  todayCases  deaths  todayDeaths  recovered   active cases/1M  deaths/1M    tests  tests/1M
0            A         0        0           0       0            0          0        0        0        0.0        0         0
1        world  apr20-22  2481287         784  170436           39     646260  1663997   318.36        0.0        0         0
2          USA  apr20-22   792759           0   42514            0      72389   677856     2395      128.0  4026360     12164
3        Spain  apr20-22   200210           0   20852            0      80587    98771     4282      446.0   930230     19896
4        Italy  apr20-22   181228           0   24114            0      48877   108237     2997      399.0  1398024     23122
..         ...       ...      ...         ...     ...          ...        ...      ...      ...        ...      ...       ...
546     Sweden  apr16-21    12540         613    1333          130        381    10826     1242      132.0    74600      7387
547    Finland  apr16-21     3369         132      75            3        300     2994      608       14.0    48828      8813
548    Estonia  apr16-21     1434          34      36            1        133     1265     1081       27.0    36024     27156
549  Lithuania  apr16-21     1128          37      30            0        178      920      414       11.0    48537     17829
550     Latvia  apr16-21      675           9       5            0         57      613      358        3.0    31302     16595

[551 rows x 12 columns]

7.1 renaming a column

The simplest way to rename a column is by editing the .column attribute. df.columns = [list that has as many entries as columns]

   In [71]: labels_12 = ['zone', 'date', 'cases', 'todayCases', 'deaths', 'todayDeaths', 'recovered', 'active', 
    ...:           'cases/1M', 'deaths/1M', 'tests', 'tests/1M']                                                                                     

   In [72]: russ2.columns = labels_12                                                                                                                   
   In [73]: russ2.head()                                                                                                                                
Out[73]: 
    zone      date    cases  todayCases  deaths  todayDeaths  recovered   active cases/1M  deaths/1M    tests  tests/1M
0      A         0        0           0       0            0          0        0        0        0.0        0         0
1  world  apr20-22  2481287         784  170436           39     646260  1663997   318.36        0.0        0         0
2    USA  apr20-22   792759           0   42514            0      72389   677856     2395      128.0  4026360     12164
3  Spain  apr20-22   200210           0   20852            0      80587    98771     4282      446.0   930230     19896
4  Italy  apr20-22   181228           0   24114            0      48877   108237     2997      399.0  1398024     23122

In [74]:

7.2 multilevel indexing:

From geeksforgeeks: In this example, 2 columns(First name and Gender) are added to the index column and later one level is removed by using reset_index() method.

import pandas as pd

empdatafr = pd.read_csv("employees.csv")

empdatafr.set_index(["First Name", "Gender"], inplace = True, append = True, drop= True)

empdatafr.reset_index(level = 2, inplace = True, col_level = 1)

data.head()

Output: The gender column in the index column was replaced as it’s level was 2.

Before reset:

                        Salary    Bonus   Management       Dept
First Name   Gender

0  Douglas    male      50000     20000   False         Electrical
1  Peter      male      51000     20000   False         Electrical
2  Susan      female    50000     20000   False         Electrical
3  Jack       male      65000     22000   True          Marketting
4  Ramesh     male      48000     20000   False         Marketting
5  Gertrude   female    94000     11000   True          Electrical

After reset:

             Gender     Salary    Bonus   Management       Dept
First Name   

0  Douglas    male      50000     20000   False         Electrical
1  Peter      male      51000     20000   False         Electrical
2  Susan      female    50000     20000   False         Electrical
3  Jack       male      65000     22000   True          Marketting
4  Ramesh     male      48000     20000   False         Marketting
5  Gertrude   female    94000     11000   True          Electrical

8 Panda reading in Dictionaries

A python dict can be passed as input, and if no index is specified, then the dictionary keys are taken in a sorted order to construct the pandas index. If index is passed, the values in data corresponding to the labels in the index will be pulled out.

9 plotting

9.1 .plot.bar()

For bar plot covid_df.plot.bar() covid_df.drop(['world','USA']).plot(kind='bar', y='deathsPerOneMillion', color='blue')

covid_df.drop(['world']).plot(kind='bar', y='deaths/1M', color='blue')

9.2 .plot.mean()

covid_df.plot.mean()

Old way of doing it as to import pylab, or from pylab import plot then running pylab.show() (after some plot object has been created with a message like: "<matplotlib.axes._{subplots.AxesSubplot} at 0x124e32610>"

pylab.show()

This would open up a new window showing the plot graphically

But, since pylab has been depracated, the proper way is to use matplotlib directly as so:

import matplotlib.pyplot as plt
plt.plot([1,2,3])
plt.show()

So basically plt.show() in place of pylab.show()

10 .plot

10.1 Simple plotting

df.plot(kind='scatter',x='column2',y='column3') plt.show()

df.plot(kind='scatter',x='num_children',y='num_pets',color='red') plt.show()

df.plot(kind='bar', y='deaths', color='red')
covid_df.drop(['world','USA']).plot(kind='bar', y='deathsPerOneMillion', color='blue')
russ.drop(['world']).plot(kind='bar', y='deaths/1M', color='blue')
russ.drop(['world']).plot(kind='line',

11 df.pivot

if your data frame is:

0 c1 c2 c3 1 c1.1 c2.1 c3.1 2 c1.2 c2.2 c3.2 3 c1.3 c2.3 c3.3 4 c1.4 c2.4 c3.4 5 c1.5 c2.5 c3.5 6 c1.6 c2.6 c3.6 7 c1.7 c2.7 c3.7 8 c1.8 c2.8 c3.8 9 c1.9 c2.9 c3.9

11.1 Examples from covid



df2 = toplot.pivot_table(values='cases', index='date', columns='zone')

In [84]: df2.drop('world', axis=1)                                                                                                                                                                                         
Out[84]: 
zone      Austria  Belgium   Brazil  California   Canada    China  Estonia  Finland  Florida    France  ...  New York  Pennsylvania  Portugal   Russia     Spain   Sweden  Switzerland    Texas       USA  Washington
date                                                                                                    ...                                                                                                          
apr15-21      NaN  33573.0  28320.0         NaN  28205.0  82295.0   1400.0   3237.0      NaN  147863.0  ...       NaN           NaN       NaN  24490.0  177633.0  11927.0          NaN      NaN  635843.0         NaN
apr16-21      NaN  34809.0  29015.0         NaN  28379.0  82341.0   1434.0   3369.0      NaN  147863.0  ...       NaN           NaN       NaN  27938.0  182816.0  12540.0          NaN      NaN  644806.0         NaN
apr17-09  14595.0  36138.0  32556.0     29175.0  31016.5  82705.5   1459.0   3489.0  24753.0  156498.0  ...  233951.0       29921.0   19022.0  32008.0  187893.5  12878.0      27078.0  17760.0  694115.5     11445.0
apr17-18  14595.0  36138.0  33682.0     28899.0  31642.0  82692.0   1459.0   3489.0  24753.0  147969.0  ...  226593.0       29441.0   19022.0  32008.0  188167.0  13216.0      27078.0  17371.0  700234.0     11152.0
apr18-20  14671.0  37183.0  36722.0     30718.0  33383.0  82719.0   1512.0   3681.0  25492.0  151793.0  ...  241041.0       31731.0   19685.0  36793.0  194416.0  13822.0      27404.0  18679.0  738792.0     11802.0
apr19-13  14749.0  38496.0  36925.0     30831.0  33951.0  82735.0   1528.0   3783.0  25996.0  152578.0  ...  247215.0       32734.0   20206.0  42853.0  195944.0  14385.0      27740.0  18923.0  754263.0     11802.0
apr19-18  14749.0  38496.0  38654.0     30965.0  34813.0  82735.0   1528.0   3783.0  26314.0  152894.0  ...  247215.0       32734.0   20206.0  42853.0  198674.0  14385.0      27740.0  18923.0  762896.0     11802.0
apr20-11  14795.0

In [48]: current                                                                                                                                                             
Out[48]: 
           zone      date  active
1         world  mar30-13  582355
2           USA  mar30-13  155582
3         Italy  mar30-13   75528
4         Spain  mar30-13   63460
5         China  mar30-13    2161
..          ...       ...     ...
682    Michigan  apr25-11   30284
683     Florida  apr25-11   29098
684   Louisiana  apr25-11    9553
685       Texas  apr25-11   14544
686  Washington  apr25-11   10646

[686 rows x 3 columns]

In [49]:                                    
# active-pivot = current.pivot_table(values='', index='', columns='')