I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.
For example, if I'm given a DataFrame like this:
>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7
I would like to get a list like the one below:
>>> header_list
['y', 'gdp', 'cap']
You can get the values as a list by doing:
list(my_dataframe.columns.values)
Also you can simply use:
list(my_dataframe)
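As a minimal, self-contained sketch of the two calls above: the snippet below rebuilds the example DataFrame from the question and checks that both approaches give the same labels. The construction code is an assumption, since the question only shows the printed frame.

```python
import pandas as pd

# Rebuild the example frame from the question (the construction is assumed).
my_dataframe = pd.DataFrame({
    "y":   [1, 2, 8, 3, 6, 4, 8, 9, 6, 10],
    "gdp": [2, 3, 7, 4, 7, 8, 2, 9, 6, 10],
    "cap": [5, 9, 2, 7, 7, 3, 8, 10, 4, 7],
})

# Both calls return the column labels in their original order.
from_values = list(my_dataframe.columns.values)
from_frame = list(my_dataframe)

print(from_values)
print(from_frame)
```

Neither call needs to know the number or names of the columns in advance, which is what the question asks for.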
df.column_names(). Is this answer still right, or is it outdated? - alvas 2016-01-13 06:48
array(['colBoolean','colTinyint', 'colSmallnt', ...], dtype=object)
Davos 2018-05-02 07:20
df.keys().tolist() is more universal, because it also works for versions of pandas older than 0.16. - StefanK 2018-05-09 08:22
frame.columns.tolist() - Igor Jakovljevic 2018-11-23 09:53
There is a built-in method, which is also the most performant:
my_dataframe.columns.values.tolist()
.columns returns an Index, .columns.values returns an array, and the array has a helper method, tolist(), that returns a list.
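To make the chain above concrete, here is a small sketch that prints the type of each intermediate object. The sample frame is hypothetical, and the exact array type returned by .values may depend on the pandas version and the dtype of the labels.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

index_obj = df.columns                  # a pandas Index
array_obj = df.columns.values           # the underlying array of labels (typically a NumPy array)
list_obj = df.columns.values.tolist()   # a plain Python list

for obj in (index_obj, array_obj, list_obj):
    print(type(obj).__name__)
```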
EDIT
For those who hate typing, this is probably the shortest method:
list(df)
DataFrame iterable has not changed since day one: http://pandas.pydata.org/pandas-docs/stable/basics.html#iteration. The iterable returned from a DataFrame has always been the columns, so doing for col in df: should always behave the same unless the developers have a meltdown, so list(df) is and should still be a valid method. Note that df.keys() is calling into the internal implementation of the dict-like structure, returning the keys, which are the columns. Inexplicable downvotes are the collateral damage to be expected on SO, so don't worr - EdChum 2018-05-08 09:27
columns attribute. An hour ago I read about the Law of Demeter, which promotes that the caller should not depend on navigating the internal object model. list(df) does explicit type conversion. Notable side effect: execution time and memory consumption increase with dataframe size. The df.keys() method is part of the dict-like nature of a DataFrame. Notable fact: execution time for df.keys() is rather constant regardless of dataframe size - part of responsibility of pandas developers - Sascha Gottfried 2018-05-08 11:25
I did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:
In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop
In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop
In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop
In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop
(I still really like list(dataframe) though, so thanks EdChum!)
It gets even simpler (as of pandas 0.16.0):
df.columns.tolist()
will give you the column names in a nice list.
>>> list(my_dataframe)
['y', 'gdp', 'cap']
To list the columns of a dataframe while in debugger mode, use a list comprehension:
>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']
By the way, you can get a sorted list simply by using sorted:
>>> sorted(my_dataframe)
['cap', 'gdp', 'y']
Does list(df) work only with autoincrement dataframes? Or does it work for all dataframes? - alvas 2016-01-13 06:49
[c for c in df]
- Alexander 2016-01-13 07:28
That's available as my_dataframe.columns.
header_list = list(my_dataframe.columns)
yeliabsalohcin 2017-09-05 12:59
It's interesting, but df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), although I thought they were the same:
In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 µs per loop
In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 µs per loop
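For anyone who wants to repeat this comparison outside IPython, the following is a rough sketch using the standard timeit module. The sample frame and the call count are assumptions, and the absolute numbers will vary by machine and pandas version, so the ratio reported above may not reproduce exactly.

```python
import timeit

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# Time 10,000 calls of each variant.
via_values = timeit.timeit(lambda: df.columns.values.tolist(), number=10_000)
via_index = timeit.timeit(lambda: df.columns.tolist(), number=10_000)

print(f"columns.values.tolist(): {via_values:.4f} s for 10,000 calls")
print(f"columns.tolist():        {via_index:.4f} s for 10,000 calls")

# Whatever the timings, both variants return the same labels.
same_result = df.columns.values.tolist() == df.columns.tolist()
print(same_result)
```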
A DataFrame follows the dict-like convention of iterating over the “keys” of the objects.
my_dataframe.keys()
Create a list of keys/columns, either with the object method to_list() or the Pythonic way:
my_dataframe.keys().to_list()
list(my_dataframe.keys())
Basic iteration on a DataFrame returns column labels
[column for column in my_dataframe]
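The variants above can be checked against each other with a short sketch like this one. The sample frame is hypothetical, and to_list() is assumed to be available, which requires a reasonably recent pandas release.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

keys_index = my_dataframe.keys()               # an Index holding the column labels
via_to_list = my_dataframe.keys().to_list()    # object method
via_list = list(my_dataframe.keys())           # built-in list()
via_iteration = [column for column in my_dataframe]  # basic iteration

# keys() exposes the same labels as the columns attribute.
print(keys_index.equals(my_dataframe.columns))
print(via_to_list == via_list == via_iteration)
```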
Do not convert a DataFrame into a list, just to get the column labels. Do not stop thinking while looking for convenient code samples.
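import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000, 10000))
list(xlarge)         # iterates over the column labels; cost grows with the number of columns
list(xlarge.keys())  # keys() itself returns the existing Index in constant time; list() still iterates over it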
For data exploration in the IPython notebook, my preferred way is this:
sorted(df)
This produces an easy-to-read, alphabetically ordered list.
In code, I find it more explicit to use
df.columns
because it tells others reading your code what you are doing.
As answered by Simeon Visser, you could do
list(my_dataframe.columns.values)
or
list(my_dataframe) # for less typing.
But I think the sweet spot is:
list(my_dataframe.columns)
It is explicit, at the same time not unnecessarily long.
This gives us the names of columns in a list:
list(my_dataframe.columns)
Another function called tolist() can be used too:
my_dataframe.columns.tolist()
n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)
[n for n in dataframe.columns]
Anton Protopopov 2015-12-04 21:31
I feel this question deserves additional explanation.
As @fixxxer noted, the answer depends on the pandas version you are using in your project.
You can check it with pd.__version__.
If, like me, you are for some reason using a version of pandas older than 0.16.0 (on Debian jessie I use 0.14.1), then you need to use:
df.keys().tolist()
because there is no df.columns
method implemented yet.
The advantage of this keys method is that it works even in newer versions of pandas, so it's more universal.
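As a small sketch of the two steps described here, the snippet below prints the installed pandas version and then builds the header list through keys(). The sample frame is hypothetical.

```python
import pandas as pd

# pd.__version__ is a string attribute holding the installed pandas version.
print(pd.__version__)

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

header_list = df.keys().tolist()
print(header_list)
```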
For a quick, neat, visual check, try this:
for col in df.columns:
    print(col)
This solution lists all the columns of your object my_dataframe:
print(list(my_dataframe))
Even though the solution provided above is nice, I would also expect something like frame.column_names() to be a function in pandas. Since it is not, it may be nice to use the following syntax, which preserves the feeling that you are using pandas in a proper way by calling the tolist function:
frame.columns.tolist()
You can use index attributes:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': np.random.randn(3), 'col2': np.random.randn(3)},
                  index=['a', 'b', 'c'])