Let There Be Data (Frames)

In the previous tutorials we showed how the Pandas class objects (Series and Data Frames) are constructed from Numpy objects (arrays) and other attributes.

We focused on the maxims:

“a Pandas Series is a numpy array, plus a name attribute and an array-like index”

…and…

“a Pandas DataFrame is just a dictionary-like collection of Series”.

This page will look at several different ways of constructing Data Frames. All of these use the pd.DataFrame() constructor but supply it with different “ingredients”. This influences the specific collection of attributes that the resultant Data Frame will have.

# Import libraries
import numpy as np
import pandas as pd

Reading in data from a file#

The simplest, probably most common, and easiest way to create a Data Frame is to use a pd.read_* function to import data from a file.

.csv files are a common way of storing data, and (as we have seen) can be imported using the creatively named pd.read_csv() function:

# Read in data the boring way
df_from_file = pd.read_csv('data/airline_passengers.csv')
df_from_file

	Month	Thousands of Passengers
0	1949-01	112
1	1949-02	118
2	1949-03	132
3	1949-04	129
4	1949-05	121
...	...	...
139	1960-08	606
140	1960-09	508
141	1960-10	461
142	1960-11	390
143	1960-12	432

144 rows × 2 columns
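If the file is not to hand, pd.read_csv() can also read from a file-like object. The sketch below assumes a few rows copied from the table above, held as text and wrapped in io.StringIO; the names csv_text and df_from_text are only for illustration, and the header line is an assumption about how the real file is laid out.

```python
import io

import pandas as pd

# A few rows copied from the output above, as CSV text (illustrative only).
csv_text = """Month,Thousands of Passengers
1949-01,112
1949-02,118
1949-03,132
"""

# read_csv accepts a file-like object as well as a file path.
df_from_text = pd.read_csv(io.StringIO(csv_text))
print(df_from_text)
```

As with the file on disk, Pandas supplies a default RangeIndex for the rows, because the text gives no row labels.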

Pandas, as a major Python data science library, has a large array of read_* functions, for importing data stored in different formats.

# Names in Pandas module starting with "read_"
[k for k in dir(pd) if k.startswith('read_')]

['read_clipboard',
 'read_csv',
 'read_excel',
 'read_feather',
 'read_fwf',
 'read_hdf',
 'read_html',
 'read_iceberg',
 'read_json',
 'read_orc',
 'read_parquet',
 'read_pickle',
 'read_sas',
 'read_spss',
 'read_sql',
 'read_sql_query',
 'read_sql_table',
 'read_stata',
 'read_table',
 'read_xml']

In other situations, and to deepen our understanding of Data Frame construction, let’s look at more elaborate, artisanal ways of creating Data Frames…

Creating a blank Data Frame#

Another very simple way to create a Data Frame is by using the pd.DataFrame() constructor with no arguments:

# Calling the constructor with no arguments
blank_df = pd.DataFrame()
blank_df

Perhaps unsurprisingly, this returns a strange, blank output.

Again, unsurprisingly, many of the attributes of the Data Frame are also blank.

For instance, the index:

# Show the blank index
blank_df.index

RangeIndex(start=0, stop=0, step=1)

Ditto for the columns attribute:

# Show the blank columns.
blank_df.columns

RangeIndex(start=0, stop=0, step=1)
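A quick way to confirm that a Data Frame really is blank is to look at its .shape and .empty attributes. This is a minimal sketch; empty_df is just an illustrative name for a fresh blank Data Frame.

```python
import pandas as pd

empty_df = pd.DataFrame()

# .shape is (number of rows, number of columns).
print(empty_df.shape)
# .empty is True when the Data Frame has no elements.
print(empty_df.empty)
# Both the row index and the column index have length zero.
print(len(empty_df.index), len(empty_df.columns))
```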

We can add new columns (e.g. new Pandas Series) into this blank Data Frame by using direct indexing on the left hand side (LHS). E.g.

# Create a new column in the Data Frame.
blank_df['new_column'] = np.array([1, 2, 3])
blank_df

	new_column
0	1
1	2
2	3

We used a Numpy array to construct this new column. However, as we know, Data Frames are a dictionary-like collection of Series, so Pandas stores the column as a Pandas Series:

# Show the type of df['new_column'].
new_col = blank_df['new_column']
type(new_col)

pandas.Series

The string which we used as the column name (new_column) has become the name attribute of this new Series:

# Show the `name` of the column.
new_col.name

'new_column'

…and the numpy array we supplied has become the .values of the Series:

# Show the `values` in the column.
new_col.values

array([1, 2, 3])

Pandas has also automatically created a default RangeIndex for the Data Frame, because we did not specify what it should use as an index:

blank_df.index

RangeIndex(start=0, stop=3, step=1)

As you saw in The Pandas from Numpy page, Series extracted from Data Frames inherit the .index of the Data Frame:

new_col.index

RangeIndex(start=0, stop=3, step=1)

If we construct Data Frames using this method (“create a blank Data Frame, add the data later”), then any new columns we add must have equal numbers of elements. This must be so, in order that the new column can share an index with the old.

# Add another new column with correct number of elements.
blank_df['another_new_column'] = np.array(['A', 'B', 'C'])
blank_df

	new_column	another_new_column
0	1	A
1	2	B
2	3	C

If the number of elements differs, then Pandas will throw an error:

# ValueError from wrong number of elements on RHS.
blank_df['a_further_new_column'] = np.array([4, 5 , 6, 7])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 2
      1 # ValueError from wrong number of elements on RHS.
----> 2 blank_df['a_further_new_column'] = np.array([4, 5 , 6, 7])

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/frame.py:4672, in DataFrame.__setitem__(self, key, value)
   4669     self._setitem_array([key], value)
   4670 else:
   4671     # set column
-> 4672     self._set_item(key, value)

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/frame.py:4872, in DataFrame._set_item(self, key, value)
   4862 def _set_item(self, key, value) -> None:
   4863     """
   4864     Add series to DataFrame in specified column.
   4865 
   (...)   4870     ensure homogeneity.
   4871     """
-> 4872     value, refs = self._sanitize_column(value)
   4874     if (
   4875         key in self.columns
   4876         and value.ndim == 1
   4877         and not isinstance(value.dtype, ExtensionDtype)
   4878     ):
   4879         # broadcast across multiple columns if necessary
   4880         if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/frame.py:5742, in DataFrame._sanitize_column(self, value)
   5739     return _reindex_for_setitem(value, self.index)
   5741 if is_list_like(value):
-> 5742     com.require_length_match(value, self.index)
   5743 return sanitize_array(value, self.index, copy=True, allow_2d=True), None

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/common.py:601, in require_length_match(data, index)
    597 """
    598 Check the length of data matches the length of the index.
    599 """
    600 if len(data) != len(index):
--> 601     raise ValueError(
    602         "Length of values "
    603         f"({len(data)}) "
    604         "does not match length of index "
    605         f"({len(index)})"
    606     )

ValueError: Length of values (4) does not match length of index (3)

Notice the text of this error: ValueError: Length of values (4) does not match length of index (3). The error is caused because all columns must share an index, to facilitate the label-based indexing (via .loc) that we have seen on previous pages.
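The sketch below shows one way to catch this error, and contrasts it with what appears to happen when the right hand side is a Series rather than an array. In that case Pandas matches the values to the rows by index label instead of checking the length (the `_reindex_for_setitem` call in the traceback above hints at this). The names df, too_long and from_series are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'new_column': np.array([1, 2, 3])})

# An array of the wrong length raises a ValueError, which we can catch.
try:
    df['too_long'] = np.array([4, 5, 6, 7])
except ValueError as err:
    message = str(err)
    print('Caught:', message)

# A Series is matched to the Data Frame by index label instead.
# The label 3 has no matching row in df, so its value (7) is not used.
df['from_series'] = pd.Series([4, 5, 6, 7])
print(df)
```

The failed assignment leaves the Data Frame unchanged, so too_long never appears as a column.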

We want to avoid the pitfalls of integer indices, such as RangeIndex (e.g. misalignment between the integer location of data, and the numerical index label of that data). To do this, we can specify non-integer values for the index, after we have created the Data Frame.

# Set the index
blank_df.index = ['Person_1', 'Person_2', 'Person_3']
blank_df

	new_column	another_new_column
Person_1	1	A
Person_2	2	B
Person_3	3	C
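With string labels in place, selecting by label (.loc) and selecting by position (.iloc) can no longer be confused with one another. A small sketch, rebuilding the same toy data under the illustrative name people_df:

```python
import numpy as np
import pandas as pd

people_df = pd.DataFrame({'new_column': np.array([1, 2, 3]),
                          'another_new_column': np.array(['A', 'B', 'C'])})
people_df.index = ['Person_1', 'Person_2', 'Person_3']

# Select by row label and column label with .loc.
value = people_df.loc['Person_2', 'new_column']
# Select by integer position with .iloc (second row, first column).
same_value = people_df.iloc[1, 0]
print(value, same_value)
```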

We can also specify the index directly when we make the “blank” Data Frame:

df_again = pd.DataFrame(index=['Person_1', 'Person_2', 'Person_3'])
df_again


Person_1
Person_2
Person_3

This creates a Data Frame with only an index, to which data can then be added:

df_again['new_column'] = np.array([1, 2, 3])
df_again

	new_column
Person_1	1
Person_2	2
Person_3	3

Because all Series/columns in the Data Frame must share an index, Pandas will predictably throw an error if we try to use something that is the wrong length/shape to be a valid index:

# ValueError because we have specified the wrong number of index elements.
blank_df.index = ['Person_1', 'Person_2', 'Person_3', 'Person_4']

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[18], line 2
      1 # ValueError because we have specified the wrong number of index elements.
----> 2 blank_df.index = ['Person_1', 'Person_2', 'Person_3', 'Person_4']

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/generic.py:6220, in NDFrame.__setattr__(self, name, value)
   6218 try:
   6219     object.__getattribute__(self, name)
-> 6220     return object.__setattr__(self, name, value)
   6221 except AttributeError:
   6222     pass

File pandas/_libs/properties.pyx:69, in pandas._libs.properties.AxisProperty.__set__()

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/generic.py:766, in NDFrame._set_axis(self, axis, labels)
    761 """
    762 This is called from the cython code when we set the `index` attribute
    763 directly, e.g. `series.index = [1, 2, 3]`.
    764 """
    765 labels = ensure_index(labels)
--> 766 self._mgr.set_axis(axis, labels)

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/internals/managers.py:273, in BaseBlockManager.set_axis(self, axis, new_labels)
    271 def set_axis(self, axis: AxisInt, new_labels: Index) -> None:
    272     # Caller is responsible for ensuring we have an Index object.
--> 273     self._validate_set_axis(axis, new_labels)
    274     self.axes[axis] = new_labels

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/internals/managers.py:288, in BaseBlockManager._validate_set_axis(self, axis, new_labels)
    285     pass
    287 elif new_len != old_len:
--> 288     raise ValueError(
    289         f"Length mismatch: Expected axis has {old_len} elements, new "
    290         f"values have {new_len} elements"
    291     )

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements

Again, the error that Pandas gives us here is informative: ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements. (Unfortunately, not all Pandas errors are as obvious as this one).

Constructing a Data Frame from an array#

Remember (from .loc and .iloc with Data Frames) that a Pandas Data Frame can be considered a view onto a two-dimensional array.

For example, the .values attribute of a Data Frame returns a two-dimensional Numpy array with a copy of the underlying data:

# Select the first 10 rows of the loaded Data Frame for brevity
early_passengers_df = df_from_file.head(10)
# Show this as a 2D array.
early_passengers_df.values

array([['1949-01', 112],
       ['1949-02', 118],
       ['1949-03', 132],
       ['1949-04', 129],
       ['1949-05', 121],
       ['1949-06', 135],
       ['1949-07', 148],
       ['1949-08', 148],
       ['1949-09', 136],
       ['1949-10', 119]], dtype=object)
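The dtype=object in that output arises because a Numpy array has a single dtype, and the Data Frame mixes text and numbers. The sketch below, using two rows of made-up-to-match data under the illustrative name mixed_df, compares this with selecting only the numeric column.

```python
import numpy as np
import pandas as pd

mixed_df = pd.DataFrame({'Month': ['1949-01', '1949-02'],
                         'Thousands of Passengers': [112, 118]})

# Mixed text and integers: the 2D array falls back to dtype object.
mixed_values = mixed_df.values
print(mixed_values.dtype)

# One numeric column only: the 2D array keeps an integer dtype.
numeric_values = mixed_df[['Thousands of Passengers']].values
print(numeric_values.dtype, numeric_values.shape)
```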

In a similar way, if you pass a Numpy array as the first argument to the Data Frame constructor, Pandas will assume you are passing this underlying 2D data array.

two_d_arr = np.array([[1, 2, 3], [11, 21, 31], [101, 102, 103]])
two_d_arr

array([[  1,   2,   3],
       [ 11,  21,  31],
       [101, 102, 103]])

# Construct Data Frame from data in two dimensional array.
default_df = pd.DataFrame(two_d_arr)
default_df

	0	1	2
0	1	2	3
1	11	21	31
2	101	102	103

Notice that Pandas constructed a default Index (integer row labels), because we did not pass one, and a default and corresponding set of column labels. In fact these default column labels are also integers, of which more soon. For now, let us make this Data Frame more standard by giving string column labels using the columns= argument to the constructor:

# Naming the columns when constructing from 2D array.
pd.DataFrame(two_d_arr, columns=['First', 'Second', 'Third'])

	First	Second	Third
0	1	2	3
1	11	21	31
2	101	102	103

Better still, we can add meaningful row labels by using the index= argument:

# Naming the columns and rows when constructing from 2D array.
pd.DataFrame(two_d_arr,
             columns=['First', 'Second', 'Third'],
             index=['Row 1', 'Row 2', 'Row 3'])

	First	Second	Third
Row 1	1	2	3
Row 2	11	21	31
Row 3	101	102	103
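The labels passed to columns= and index= must match the shape of the array. As a hedged sketch of what happens otherwise, the code below passes two column labels for three columns of data and catches the resulting ValueError; the flag name raised is illustrative.

```python
import numpy as np
import pandas as pd

two_d_arr = np.array([[1, 2, 3], [11, 21, 31], [101, 102, 103]])

# Three columns of data, but only two labels: Pandas raises a ValueError.
try:
    pd.DataFrame(two_d_arr, columns=['First', 'Second'])
    raised = False
except ValueError as err:
    raised = True
    print('Caught:', err)
```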

If you pass a 1D array (or, as here, a plain list) to the constructor, it treats this as one column of a 2D array:

pd.DataFrame([10, 20, 20])

	0
0	10
1	20
2	20

Constructing a Data Frame from a dictionary of Numpy arrays#

Another common way to construct Data Frames is to use a dictionary.

When we do this, the keys of the dictionary become the column names (and therefore the name attribute of the Series that constitutes a given column); and the values of the dictionary become the values attribute of a given column.

First, let’s make a dictionary:

# Make a dictionary, using the keys "A" and "B" and two Numpy arrays for the values
dictionary = {'A': np.array([1, 2, 3, 4]),
              'B': np.array([5, 6, 7, 8])}
dictionary

{'A': array([1, 2, 3, 4]), 'B': array([5, 6, 7, 8])}

Here are the keys and values of the dictionary, containing this toy data:

# Show the keys of the dictionary
dictionary.keys()

dict_keys(['A', 'B'])

# Show the values of the dictionary
dictionary.values()

dict_values([array([1, 2, 3, 4]), array([5, 6, 7, 8])])

We can pass this dictionary to the pd.DataFrame() constructor. As noted above, the keys will become the name attribute of each column (where each column is a Pandas Series). The values will become the .values attribute of each column:

# Construction from a dictionary
df3 = pd.DataFrame(dictionary)
df3

	A	B
0	1	5
1	2	6
2	3	7
3	4	8

As we know, the Data Frame itself is just a dictionary-like collection of Series:

# Show one column/Series
df3['A']

0    1
1    2
2    3
3    4
Name: A, dtype: int64

Each Series inherits its name attribute from its key in the original dictionary:

df3['A'].name

'A'

…and its .values attribute from the values in the original dictionary:

df3['A'].values

array([1, 2, 3, 4])
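Two variations on this method may be worth trying. The sketch below first passes row labels through the index= argument alongside the dictionary, and then shows that arrays of unequal length are rejected, because the columns could not share an index. The names labelled_df and unequal_ok, and the labels 'w' to 'z', are illustrative.

```python
import numpy as np
import pandas as pd

# Arrays of equal length, with row labels supplied through index=.
labelled_df = pd.DataFrame({'A': np.array([1, 2, 3, 4]),
                            'B': np.array([5, 6, 7, 8])},
                           index=['w', 'x', 'y', 'z'])
print(labelled_df)

# Arrays of unequal length cannot share an index, so this raises.
try:
    pd.DataFrame({'A': np.array([1, 2, 3, 4]),
                  'B': np.array([5, 6, 7])})
    unequal_ok = True
except ValueError as err:
    unequal_ok = False
    print('Caught:', err)
```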

Constructing a Data Frame from a dictionary of Pandas series#

We can also use Pandas Series as the values in a dictionary (rather than Numpy arrays), in order to build a Data Frame. Because Pandas Series contain a Numpy array plus additional attributes, like an index, we need to be aware of this when using them to create Data Frames, as conflicts between the indexes of different Series can lead to errors.

Let’s build Series from the familiar three-letter country codes, the country names, and the HDI data. We start with arrays for the codes and the names:

# Make an array containing the country codes
country_codes_array = np.array(['AUS', 'BRA', 'CAN',
                                'CHN', 'DEU', 'ESP',
                                'FRA', 'GBR', 'IND',
                                'ITA', 'JPN', 'KOR',
                                'MEX', 'RUS', 'USA'])

# Make an array containing the country names
country_names_array = np.array(['Australia', 'Brazil', 'Canada',
                                'China', 'Germany', 'Spain',
                                'France', 'United Kingdom', 'India',
                                'Italy', 'Japan', 'South Korea',
                                'Mexico', 'Russia', 'United States'])

As previously, we will use the country codes as an index:

# Build a Series of the country names
country_names_series = pd.Series(country_names_array,
                                index=country_codes_array)
country_names_series

AUS         Australia
BRA            Brazil
CAN            Canada
CHN             China
DEU           Germany
ESP             Spain
FRA            France
GBR    United Kingdom
IND             India
ITA             Italy
JPN             Japan
KOR       South Korea
MEX            Mexico
RUS            Russia
USA     United States
dtype: str

Now, let’s do the same for the HDI scores:

# Human Development Index Scores for each country
hdis_array = np.array([0.896, 0.668, 0.89 , 0.586,
                       0.844, 0.89 , 0.49 , 0.842,
                       0.883, 0.709, 0.733, 0.824,
                       0.828, 0.863, 0.894])

Here also we will use the country codes as the index:

hdi_series = pd.Series(hdis_array, index=country_codes_array)
hdi_series

AUS    0.896
BRA    0.668
CAN    0.890
CHN    0.586
DEU    0.844
ESP    0.890
FRA    0.490
GBR    0.842
IND    0.883
ITA    0.709
JPN    0.733
KOR    0.824
MEX    0.828
RUS    0.863
USA    0.894
dtype: float64

We can then create the Data Frame by using the Series as values in a dictionary, and passing that dictionary to the pd.DataFrame() constructor:

df4 = pd.DataFrame({'country_names': country_names_series,
                    'HDI': hdi_series})
df4

	country_names	HDI
AUS	Australia	0.896
BRA	Brazil	0.668
CAN	Canada	0.890
CHN	China	0.586
DEU	Germany	0.844
ESP	Spain	0.890
FRA	France	0.490
GBR	United Kingdom	0.842
IND	India	0.883
ITA	Italy	0.709
JPN	Japan	0.733
KOR	South Korea	0.824
MEX	Mexico	0.828
RUS	Russia	0.863
USA	United States	0.894

However, it is very important when using this method to ensure that all the Series share an index.

Strange things can happen if they do not.

Let’s adjust the hdi_series to give it a numerical index:

# Adjust the `hdi_series` to have a numerical index
# Copy the Series with the Series `.copy` method.
hdi_with_int_index = hdi_series.copy()
hdi_with_int_index.index = np.arange(len(hdi_series))
hdi_with_int_index

0     0.896
1     0.668
2     0.890
3     0.586
4     0.844
5     0.890
6     0.490
7     0.842
8     0.883
9     0.709
10    0.733
11    0.824
12    0.828
13    0.863
14    0.894
dtype: float64

For the latest Pandas (2.2.3 at time of writing), Pandas will give an error if we try to construct a Data Frame from a dictionary with these two Series as the values:

# TypeError if we construct a Data Frame using Series without matching indexes
df5 = pd.DataFrame({'country_names': country_names_series,
                    'HDI': hdi_with_int_index})

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[39], line 2
      1 # TypeError if we construct a Data Frame using Series without matching indexes
----> 2 df5 = pd.DataFrame({'country_names': country_names_series,
      3                     'HDI': hdi_with_int_index})

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/frame.py:769, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    763     mgr = self._init_mgr(
    764         data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
    765     )
    767 elif isinstance(data, dict):
    768     # GH#38939 de facto copy defaults to False only in non-dict cases
--> 769     mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy)
    770 elif isinstance(data, ma.MaskedArray):
    771     from numpy.ma import mrecords

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/internals/construction.py:447, in dict_to_mgr(data, index, columns, dtype, copy)
    428 if copy:
    429     # We only need to copy arrays that will not get consolidated, i.e.
    430     #  only EA arrays
    431     arrays = [
    432         (
    433             x.copy()
   (...)    444         for x in arrays
    445     ]
--> 447 return arrays_to_mgr(arrays, columns, index, dtype=dtype, consolidate=copy)

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/internals/construction.py:112, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, consolidate)
    109 if verify_integrity:
    110     # figure out the index, if necessary
    111     if index is None:
--> 112         index = _extract_index(arrays)
    113     else:
    114         index = ensure_index(index)

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/internals/construction.py:614, in _extract_index(data)
    611     raise ValueError("If using all scalar values, you must pass an index")
    613 if have_series:
--> 614     index = union_indexes(indexes)
    615 elif have_dicts:
    616     index = union_indexes(indexes, sort=False)

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/indexes/api.py:261, in union_indexes(indexes, sort)
    259         index = index.append(diff.unique())
    260     if sort:
--> 261         index = index.sort_values()
    262 else:
    263     index = indexes[0]

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/indexes/base.py:5974, in Index.sort_values(self, return_indexer, ascending, na_position, key)
   5971 # GH 35584. Sort missing values according to na_position kwarg
   5972 # ignore na_position for MultiIndex
   5973 if not isinstance(self, ABCMultiIndex):
-> 5974     _as = nargsort(
   5975         items=self, ascending=ascending, na_position=na_position, key=key
   5976     )
   5977 else:
   5978     idx = cast(Index, ensure_key_mapped(self, key))

File /opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/pandas/core/sorting.py:442, in nargsort(items, kind, ascending, na_position, key, mask)
    440     non_nans = non_nans[::-1]
    441     non_nan_idx = non_nan_idx[::-1]
--> 442 indexer = non_nan_idx[non_nans.argsort(kind=kind)]
    443 if not ascending:
    444     indexer = indexer[::-1]

TypeError: '<' not supported between instances of 'int' and 'str'

Exercise 10

In the cell above, at least at time of writing, you get the following error:

TypeError: '<' not supported between instances of 'int' and 'str'

This occurred when we passed one Series with int-type Index values, and another with str-type Index values.

Reflect back on the first exercise in the Pandas from Numpy page. Why do you think Pandas is comparing ints to strs as it creates the Data Frame?

Solution to Exercise 10

Working through the indices exercise should have revealed that Pandas follows something like the following algorithm, when dealing with the .index of different Series intended for a Data Frame:

First check if the Series Indices are the same. If so, use the Index of any Series.
If they are not the same, first sort all Series by their Index values, and use the resulting sorted Index.

The second of these two steps involves comparing int to str in order to sort the labels, hence the error (the traceback above ends in a call to sort_values).

Remember that each index label is an identifier for a row of the Data Frame. Pandas is trying to compare the indices of the two Series in order to match corresponding rows, and failing, because it cannot compare the string index of country_names_series to the (newly set) integer index of hdi_with_int_index.

Later on we will see further signs that Pandas is trying to match rows between series by using the index.
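As a small illustration of this row matching, the sketch below builds two short Series whose labels are all strings (so they can be compared), but which are in different orders and do not fully overlap. The names names, hdis and aligned_df are illustrative; the values are a subset of those used above.

```python
import pandas as pd

# Two Series with string labels, in different orders, and with one
# label each that the other Series lacks.
names = pd.Series(['Australia', 'Brazil', 'Canada'],
                  index=['AUS', 'BRA', 'CAN'])
hdis = pd.Series([0.586, 0.890, 0.668],
                 index=['CHN', 'CAN', 'BRA'])

# Pandas matches values to rows by label, not by position.
aligned_df = pd.DataFrame({'country_names': names, 'HDI': hdis})
print(aligned_df)
```

Each value ends up on the row for its own label, and any label missing from one of the Series gets a missing value in that column.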

Constructing a Data Frame from a single Pandas series#

pd.DataFrame has a special case in which you pass a single Series as the data argument.

df_single = pd.DataFrame(hdi_series)
df_single

	0
AUS	0.896
BRA	0.668
CAN	0.890
CHN	0.586
DEU	0.844
ESP	0.890
FRA	0.490
GBR	0.842
IND	0.883
ITA	0.709
JPN	0.733
KOR	0.824
MEX	0.828
RUS	0.863
USA	0.894

Be careful - as you will see below, if you pass a sequence of Series, then the Series become the rows. Here, the single Series becomes a single column in the Data Frame.

The column name comes from the Series name:

hdi_series.name

As you remember, Series have an optional .name (for which the default is None). For example:

hdi_series_no_name = pd.Series(hdis_array, index=country_codes_array)
hdi_series_no_name.name is None

True

If you pass a Series with no .name (.name is None) then Pandas must make a default column name. It uses the same default for column names as it does for row names, that is, a RangeIndex containing integers, where, in this case, it only contains the integer value 0:

df_single_no_name = pd.DataFrame(hdi_series_no_name)
df_single_no_name

	0
AUS	0.896
BRA	0.668
CAN	0.890
CHN	0.586
DEU	0.844
ESP	0.890
FRA	0.490
GBR	0.842
IND	0.883
ITA	0.709
JPN	0.733
KOR	0.824
MEX	0.828
RUS	0.863
USA	0.894

df_single_no_name.columns

RangeIndex(start=0, stop=1, step=1)

Indexing for this column, with an integer label, is likely to become confusing:

# Getting the column by label.
df_single_no_name.loc[:, 0]

AUS    0.896
BRA    0.668
CAN    0.890
CHN    0.586
DEU    0.844
ESP    0.890
FRA    0.490
GBR    0.842
IND    0.883
ITA    0.709
JPN    0.733
KOR    0.824
MEX    0.828
RUS    0.863
USA    0.894
Name: 0, dtype: float64

Or even this (which is very confusing - direct indexing with column name):

# Direct indexing using column name, where name is integer 0
df_single_no_name[0]

AUS    0.896
BRA    0.668
CAN    0.890
CHN    0.586
DEU    0.844
ESP    0.890
FRA    0.490
GBR    0.842
IND    0.883
ITA    0.709
JPN    0.733
KOR    0.824
MEX    0.828
RUS    0.863
USA    0.894
Name: 0, dtype: float64

It’s usually advisable to either:

set the Series name when constructing the Series, or later, with (e.g.) hdi_series.name = 'Human Development Index', or
pass the column name explicitly to pd.DataFrame using the columns= argument:

# Setting the column name or names on constructing the Data Frame.
df_single_now_named = pd.DataFrame(hdi_series_no_name,
                                   columns=['My HDI'])
df_single_now_named

	My HDI
AUS	0.896
BRA	0.668
CAN	0.890
CHN	0.586
DEU	0.844
ESP	0.890
FRA	0.490
GBR	0.842
IND	0.883
ITA	0.709
JPN	0.733
KOR	0.824
MEX	0.828
RUS	0.863
USA	0.894
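For comparison, here is a minimal sketch of the other option, where the Series carries its own name and the constructor picks that name up as the column label. It uses the first three HDI values from above; named_series and df_from_named are illustrative names.

```python
import numpy as np
import pandas as pd

# Give the Series a name when constructing it.
named_series = pd.Series(np.array([0.896, 0.668, 0.89]),
                         index=['AUS', 'BRA', 'CAN'],
                         name='Human Development Index')

# The Series name becomes the column label.
df_from_named = pd.DataFrame(named_series)
print(df_from_named.columns)
```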

Constructing a Data Frame from a sequence of Pandas series#

Series have an optional .name (for which the default is None).

If we specify a .name for each Series, then we can pass a sequence of these named Series to pd.DataFrame; Pandas interprets these Series as rows in the Data Frame. For example:

# Set not-default names for the Series.
country_names_series.name = 'country_names'
hdi_series.name = 'HDI'
df5 = pd.DataFrame([country_names_series, hdi_series])
df5

	AUS	BRA	CAN	CHN	DEU	ESP	FRA	GBR	IND	ITA	JPN	KOR	MEX	RUS	USA
country_names	Australia	Brazil	Canada	China	Germany	Spain	France	United Kingdom	India	Italy	Japan	South Korea	Mexico	Russia	United States
HDI	0.896	0.668	0.89	0.586	0.844	0.89	0.49	0.842	0.883	0.709	0.733	0.824	0.828	0.863	0.894

Notice the .names of the Series become the .index values of the Data Frame (the row labels). The .index of the two Series become the column labels. To get the same effect as we have had, up until now, we can transpose the Data Frame, so that the rows become columns, and the columns become the rows:

# .T is the transpose attribute of the Data Frame.  It returns a new, transposed Data Frame.
df6 = df5.T
df6

	country_names	HDI
AUS	Australia	0.896
BRA	Brazil	0.668
CAN	Canada	0.89
CHN	China	0.586
DEU	Germany	0.844
ESP	Spain	0.89
FRA	France	0.49
GBR	United Kingdom	0.842
IND	India	0.883
ITA	Italy	0.709
JPN	Japan	0.733
KOR	South Korea	0.824
MEX	Mexico	0.828
RUS	Russia	0.863
USA	United States	0.894
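One thing to watch with this route: the HDI values above display as 0.89 rather than 0.890, which suggests the column is no longer stored with a floating point dtype. That would follow from each column of the row-wise Data Frame having mixed text and numbers before the transpose. The sketch below, on a two-country subset with the illustrative name transposed_df, prints the dtypes and then converts the HDI column back with .astype(float).

```python
import pandas as pd

names = pd.Series(['Australia', 'Brazil'],
                  index=['AUS', 'BRA'], name='country_names')
hdis = pd.Series([0.896, 0.668],
                 index=['AUS', 'BRA'], name='HDI')

# Rows-then-transpose: each original column mixed text and numbers.
transposed_df = pd.DataFrame([names, hdis]).T
print(transposed_df.dtypes)

# Convert the HDI column back to floating point numbers.
transposed_df['HDI'] = transposed_df['HDI'].astype(float)
print(transposed_df.dtypes)
```

Building the Data Frame from a dictionary of Series, as in the earlier section, avoids the need for this conversion.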

Summary#

This page has looked at different methods of constructing Data Frames, and how these affect different attributes of the Pandas Series that constitute each Data Frame.