
Trying out pandas and plotly

Kumamoto-Hamachi

DataFrame Series column row

Import the package with import pandas as pd
A table of data is stored as a pandas DataFrame
Each column in a DataFrame is a Series
You can do things by applying a method to a DataFrame or Series
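
As a minimal sketch of the last point, the snippet below applies one method to a single column and one to the whole table. The sample values and the names oldest and column_means are made up for illustration and are not part of the tutorial.

```python
import pandas as pd

# Hypothetical sample data, not the tutorial's dataset
df = pd.DataFrame({"Age": [22, 35, 58], "Fare": [7.25, 53.1, 8.05]})

# A method applied to a Series (one column) reduces it to a single value here
oldest = df["Age"].max()

# The same kind of method applied to the DataFrame works column by column
column_means = df.mean()

print(oldest)
print(column_means)
```

Here df.mean() returns a Series indexed by column name, so the result of a DataFrame method is often itself a Series.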

When using a Python dictionary of lists, the dictionary keys will be used as column headers and the values in each list as columns of the DataFrame.

https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html

Each column in a DataFrame is a Series

When selecting a single column of a pandas DataFrame, the result is a pandas Series.

import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

print(df["Age"])  # selecting a single column returns a Series

0    22
1    35
2    58
Name: Age, dtype: int64

A pandas Series has no column labels, as it is just a single column of a DataFrame. A Series does have row labels.
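
To make the difference concrete, here is a small sketch that inspects a standalone Series. The variable name ages is only illustrative.

```python
import pandas as pd

# A Series built directly, with the name a DataFrame column would carry
ages = pd.Series([22, 35, 58], name="Age")

print(ages.name)                  # the name shown as "Name: Age" when printed
print(list(ages.index))           # row labels; the default index counts from 0
print(hasattr(ages, "columns"))   # a Series has no columns attribute
```

The name of a Series is what becomes the column header once the Series sits inside a DataFrame.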


How do I read and write tabular data?

Getting data into pandas from many different file formats or data sources is supported by read_* functions.
Exporting data out of pandas is provided by different to_* methods.
The head/tail/info methods and the dtypes attribute are convenient for a first check.

Whereas read_* functions are used to read data to pandas, the to_* methods are used to store data.
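
A rough sketch of that round trip, using CSV as one example format. To avoid touching the file system it writes to a string and reads it back through io.StringIO; a file path would be used the same way. The data and variable names are made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Braund", "Allen"], "Age": [22, 35]})

# to_csv with no path returns the CSV text as a string
csv_text = df.to_csv(index=False)

# read_csv accepts a file path or a file-like object
restored = pd.read_csv(io.StringIO(csv_text))

# First checks after loading: the top rows and the column types
print(restored.head())
print(restored.dtypes)
```

Passing index=False keeps the row labels out of the file, so the data read back has the same two columns as the original.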


How do I select a subset of a DataFrame?

Selecting titanic[["Age", "Sex"]] returned a DataFrame with 891 rows and 2 columns. Remember, a DataFrame is 2-dimensional with both a row and column dimension.

In [11]: titanic[["Age", "Sex"]].shape
Out[11]: (891, 2)
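
The titanic data is not included in these notes, so the sketch below uses a hypothetical three-row stand-in named people to show the same selection. It also shows that a single column name gives a 1-dimensional Series instead.

```python
import pandas as pd

# Hypothetical stand-in for the titanic DataFrame
people = pd.DataFrame(
    {
        "Name": ["Braund", "Allen", "Bonnell"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

age_sex = people[["Age", "Sex"]]  # a list of names returns a DataFrame
ages = people["Age"]              # a single name returns a Series

print(type(age_sex).__name__, age_sex.shape)
print(type(ages).__name__, ages.shape)
```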

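titanic["Age"] > 35 checks for which rows the Age column has a value larger than 35. The result is a Series of boolean values (True or False) with the same number of rows as the original DataFrame.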
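
A minimal sketch of that check and of using it to filter rows, again on a made-up stand-in for the titanic data. The names people, mask, and above_35 are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the titanic DataFrame
people = pd.DataFrame(
    {"Name": ["Braund", "Allen", "Bonnell"], "Age": [22, 35, 58]}
)

# The comparison is evaluated row by row and yields a boolean Series
mask = people["Age"] > 35

# Passing the boolean Series inside [] keeps only the rows where it is True
above_35 = people[mask]

print(mask)
print(above_35)
```

Note that the comparison is strict, so a row with Age exactly 35 is not selected.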
