Get list from pandas DataFrame column headers [python, pandas, dataframe]

726

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would like to get a list like the one below:

>>> header_list
['y', 'gdp', 'cap']

2013-10-20 21:18
by natsuki_2002

my_dataframe.columns.tolist() - tagoma 2017-07-26 19:47

df.columns works in the latest pandas version - Nick 2018-12-10 14:14

1256

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use:

list(my_dataframe)
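A minimal, self-contained sketch of the two calls above. The data literal is only an illustration that rebuilds the first three rows of the question's DataFrame, and the names from_values and from_frame are made up for this example:

```python
import pandas as pd

# First three rows of the example DataFrame from the question.
my_dataframe = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# Both calls produce a plain Python list, in the original column order.
from_values = list(my_dataframe.columns.values)
from_frame = list(my_dataframe)

print(from_values)
print(from_frame)
```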

2013-10-20 21:23
by Simeon Visser

Why does this doc not have columns as an attribute? - Tjorriemorrie 2014-11-21 08:30

@Tjorriemorrie: I'm not sure, it may have to do with the way they automatically generate their documentation. It is mentioned in other places though: http://pandas.pydata.org/pandas-docs/stable/basics.html#attributes-and-the-raw-ndarray- - Simeon Visser 2014-11-21 10:18

I would have expected something like df.column_names(). Is this answer still right, or is it outdated? - alvas 2016-01-13 06:48

@alvas there are various other ways to do it (see other answers on this page) but as far as I know there isn't a method on the dataframe directly to produce the list - Simeon Visser 2016-01-13 09:30

Importantly, this preserves the column order - WindChimes 2016-01-25 13:07

I tried using this with unittest assertListEqual to check that the headers in a df matched an expected list, and it tells me it's not a list but a sequence. It looks like array(['colBoolean','colTinyint', 'colSmallnt', ...], dtype=object) - Davos 2018-05-02 07:20
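A hedged sketch of the situation in the comment above, assuming the array was passed to assertListEqual directly. Converting with tolist() first gives the assertion a real list. The class name HeaderTest and the one-row frame are hypothetical:

```python
import unittest
import pandas as pd

df = pd.DataFrame({"colBoolean": [True], "colTinyint": [1]})


class HeaderTest(unittest.TestCase):
    def test_headers(self):
        expected = ["colBoolean", "colTinyint"]
        # df.columns.values is an array, not a list, so convert before comparing.
        self.assertListEqual(df.columns.tolist(), expected)


# Run the single test case without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(HeaderTest)
result = unittest.TextTestRunner().run(suite)
```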

df.keys().tolist() is more universal, because it also works for pandas versions older than 0.16. - StefanK 2018-05-09 08:22

The solution provided above is nice, but I would also expect something like frame.column_names() to be a function in pandas. Since it is not, maybe it would be nice to use the following syntax. It preserves the feeling that you are using pandas in a proper way by calling the "tolist" function:

frame.columns.tolist() - Igor Jakovljevic 2018-11-23 09:53

304

There is a built in method which is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array and this has a helper function to return a list.
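The chain described above can be inspected step by step. This is only a sketch on a small made-up frame, and the exact array type printed in the middle may differ between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

index_obj = df.columns           # a pandas Index
array_obj = df.columns.values    # the array backing that Index
labels = array_obj.tolist()      # a plain Python list

print(type(index_obj))
print(type(array_obj))
print(type(labels))
```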

EDIT

For those who hate typing this is probably the shortest method:

list(df)

2013-10-20 22:25
by EdChum

Downvoter, care to explain? - EdChum 2016-01-08 08:48

Did not downvote, but want to explain: do not rely on implementation details, use the "public interface" of DataFrame. Think about the beauty of df.keys() - Sascha Gottfried 2018-05-08 09:19

@SaschaGottfried the implementation of the DataFrame iterable has not changed since day one: http://pandas.pydata.org/pandas-docs/stable/basics.html#iteration. The iterable returned from a DataFrame has always been the columns, so doing for col in df: should always behave the same unless the developers have a meltdown, so list(df) is and should still be a valid method. Note that df.keys() is calling into the internal implementation of the dict-like structure, returning the keys, which are the columns. Inexplicable downvotes are the collateral damage to be expected on SO, so don't worry - EdChum 2018-05-08 09:27

I was referring to the implementation details of the columns attribute. An hour ago I read about the Law of Demeter, which promotes that the caller should not depend on navigating the internal object model.

list(df) does explicit type conversion. Notable side effect: execution time and memory consumption increase with dataframe size

df.keys() method is part of the dict-like nature of a DataFrame. Notable fact: execution time for df.keys() is rather constant regardless of dataframe size - part of responsibility of pandas developers - Sascha Gottfried 2018-05-08 11:25

@SaschaGottfried I can add this to my answer and credit you, seeing as no one else has included this - EdChum 2018-05-08 12:16

I can see value in given answer as well as in comments - no need to change anything - Sascha Gottfried 2018-05-08 12:54

Did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like the list(dataframe) though, so thanks EdChum!)
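To repeat this kind of comparison outside IPython, here is a rough sketch using the standard timeit module on a small made-up frame. The candidates and timings names are illustrative, and absolute numbers will vary with machine, Python version, and pandas version:

```python
import timeit
import pandas as pd

df = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# The four expressions timed in the answer above, wrapped as callables.
candidates = {
    "[column for column in df]": lambda: [column for column in df],
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "list(df)": lambda: list(df),
    "list(df.columns.values)": lambda: list(df.columns.values),
}

timings = {}
for label, func in candidates.items():
    # Total seconds for 10,000 calls of each expression.
    timings[label] = timeit.timeit(func, number=10_000)
    print(f"{label:30s} {timings[label]:.4f} s")
```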

2014-12-01 20:31
by tegan

It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.

2015-04-07 14:50
by fixxxer

>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']
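One caveat worth sketching: sorted relies on comparing the labels with each other, so in Python 3 it fails when the labels mix types such as int and str. The frame below is hypothetical, and sorting by key=str is just one possible workaround:

```python
import pandas as pd

# Hypothetical frame with one integer label and one string label.
mixed = pd.DataFrame([[1, 2]], columns=[1, "a"])

try:
    sorted(mixed)
    mixed_sort_failed = False
except TypeError:
    # Python 3 cannot order an int against a str.
    mixed_sort_failed = True

# Sorting on the string form of each label avoids the comparison error.
by_text = sorted(mixed, key=str)
print(mixed_sort_failed, by_text)
```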

2015-05-28 15:58
by Alexander

Would that list(df) work only with autoincrement dataframes? Or does it work for all dataframes? - alvas 2016-01-13 06:49

Should work for all. When you are in the debugger, however, you need to use a list comprehension [c for c in df] - Alexander 2016-01-13 07:28

Thanks, yep it works for all - alvas 2016-01-13 07:37

That's available as my_dataframe.columns.

2013-10-20 21:20
by BrenBarn

And explicitly as a list by header_list = list(my_dataframe.columns) - yeliabsalohcin 2017-09-05 12:59

It's interesting, but df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), although I thought they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop

2015-12-04 21:41
by Anton Protopopov

A DataFrame follows the dict-like convention of iterating over the “keys” of the objects.

my_dataframe.keys()

Create a list of keys/columns, either with the object method to_list() or the Pythonic way:

my_dataframe.keys().to_list()
list(my_dataframe.keys())

Basic iteration on a DataFrame returns column labels

[column for column in my_dataframe]

Do not convert a DataFrame into a list, just to get the column labels. Do not stop thinking while looking for convenient code samples.

import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000, 10000))
list(xlarge)  # compute time and memory consumption depend on dataframe size - O(N)
list(xlarge.keys())  # constant time operation - O(1)
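A small sketch for checking on your own pandas version how keys(), columns, and plain iteration relate. The frame is made up and deliberately tiny, so this compares results only and says nothing about performance:

```python
import pandas as pd

df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

keys = df.keys()
print(type(keys))               # an Index object rather than a list
print(keys.equals(df.columns))  # whether keys() carries the same labels as columns

# Compare the labels produced by iterating the frame, its keys, and its columns.
same_labels = list(df) == list(df.keys()) == df.columns.tolist()
print(same_labels)
```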

2014-01-23 17:23
by Sascha Gottfried

In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

Which will produce an easy to read alphabetically ordered list.

In a code repository

In code I find it more explicit to do

df.columns

Because it tells others reading your code what you are doing.

2016-03-30 07:19
by firelynx

As answered by Simeon Visser, you could do:

list(my_dataframe.columns.values)

list(my_dataframe) # for less typing.

But I think the sweet spot is:

list(my_dataframe.columns)

It is explicit, at the same time not unnecessarily long.

2018-02-16 18:36
by Vivek

This gives us the names of columns in a list:

list(my_dataframe.columns)

Another function called tolist() can be used too:

my_dataframe.columns.tolist()

2018-08-22 20:23
by Harikrishna

n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)

2013-10-20 21:43
by user21988

please replace it with a list comprehension - Sascha Gottfried 2014-01-23 16:22

change your first 3 lines to [n for n in dataframe.columns] - Anton Protopopov 2015-12-04 21:31

I feel the question deserves additional explanation.

As @fixxxer noted, the answer depends on the pandas version you are using in your project, which you can check with pd.__version__.

If, like me, you are for some reason using a pandas version older than 0.16.0 (on Debian jessie I use 0.14.1), then you need to use:

df.keys().tolist() because there is no df.columns method implemented yet.

The advantage of the keys method is that it also works in newer versions of pandas, so it is more universal.
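A brief sketch of the two steps mentioned above: printing the installed version, then building the list through keys(). The frame and the headers name are illustrative only:

```python
import pandas as pd

# The installed pandas version, as a string.
print(pd.__version__)

df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# keys() returns the column labels; tolist() turns them into a plain list.
headers = df.keys().tolist()
print(headers)
```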

2017-12-13 14:47
by StefanK

For a quick, neat, visual check, try this:

for col in df.columns:
    print(col)

2018-08-22 16:17
by Joseph True

This solution lists all the columns of your object my_dataframe:

print(list(my_dataframe))

2018-06-11 06:26
by Sunitha G

frame.columns.tolist()

2019-02-14 10:58
by Igor Jakovljevic

-2

You can use the index attributes:

df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
                 index=['a', 'b', 'c'])
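The snippet above stops after building the frame. Assuming the intent was to read the labels back from the frame's index attributes, a possible completion might look like this (column_labels and row_labels are names invented for the sketch):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.random.randn(3), 'col2': np.random.randn(3)},
                  index=['a', 'b', 'c'])

# Both axes are Index objects, and each can be converted to a list.
column_labels = df.columns.tolist()
row_labels = df.index.tolist()

print(column_labels)
print(row_labels)
```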

2016-12-01 14:21
by Anirudh k v