In this article, we will look at different ways to add a new column to an existing DataFrame in Pandas.
Let us create a simple DataFrame that we will use as a reference throughout this article to demonstrate adding new columns to a Pandas DataFrame.

    # import pandas library
    import pandas as pd

    # create pandas DataFrame
    df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                       'points': [10, 8, 3, 5],
                       'runrate': [0.5, 1.4, 2, -0.6],
                       'wins': [5, 4, 2, 2]})

    # print the DataFrame
    print(df)

Output

               team  points  runrate  wins
    0         India      10      0.5     5
    1  South Africa       8      1.4     4
    2   New Zealand       3      2.0     2
    3       England       5     -0.6     2

Now that we have created a DataFrame, let’s assume that we need to add a new column called “lost”, which holds the total number of matches each team has lost.

Method 1: Declare and assign a new list as a column

The simplest way is to create a new list and assign it to the new DataFrame column. The list must have the same length as the DataFrame. Let us see how we can achieve this with an example.

    # import pandas library
    import pandas as pd

    # create pandas DataFrame
    df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                       'points': [10, 8, 3, 5],
                       'runrate': [0.5, 1.4, 2, -0.6],
                       'wins': [5, 4, 2, 2]})

    # print the DataFrame
    print(df)

    # declare a new list and add the values into the list
    match_lost = [2, 1, 3, 4]

    # assign the list to the new DataFrame column
    df["lost"] = match_lost

    # print the new DataFrame
    print(df)

Output

               team  points  runrate  wins  lost
    0         India      10      0.5     5     2
    1  South Africa       8      1.4     4     1
    2   New Zealand       3      2.0     2     3
    3       England       5     -0.6     2     4

Method 2: Using the DataFrame.insert() method

A limitation of the above approach is that we cannot choose the position of the new column: it is always appended at the end, making it the last column. We can overcome this with the pandas.DataFrame.insert() method, which is useful when you need to insert a new column at a specific position or index.

In the example below, let us insert the new column “lost” before the “wins” column. Column positions are counted from 0, so “wins” currently sits at index 3, and inserting the new column at index 3 places it directly before “wins”.
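Hard-coding the position works for a fixed layout. As a minimal sketch of an alternative, the position of an existing column can be looked up with `df.columns.get_loc()` and passed to `insert()`. The variable name `wins_position` is introduced here only for illustration and is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# Look up where "wins" currently sits instead of hard-coding the index
# (wins_position is an illustrative name; it is 3 for this DataFrame)
wins_position = df.columns.get_loc("wins")

# insert() modifies df in place and returns None
df.insert(wins_position, "lost", [2, 1, 3, 4])

print(list(df.columns))
```

With this approach the new column stays directly before “wins” even if other columns are added or reordered earlier in the script.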
    # import pandas library
    import pandas as pd

    # create pandas DataFrame
    df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                       'points': [10, 8, 3, 5],
                       'runrate': [0.5, 1.4, 2, -0.6],
                       'wins': [5, 4, 2, 2]})

    # print the DataFrame
    print(df)

    # insert the new column at the specific position
    # (the last argument is allow_duplicates)
    df.insert(3, "lost", [2, 1, 3, 4], True)

    # print the new DataFrame
    print(df)

Output

               team  points  runrate  lost  wins
    0         India      10      0.5     2     5
    1  South Africa       8      1.4     1     4
    2   New Zealand       3      2.0     3     2
    3       England       5     -0.6     4     2

Method 3: Using the DataFrame.assign() method

The pandas.DataFrame.assign() method is useful when we need to create multiple new columns in a DataFrame. It returns a new object with all the original columns in addition to the new ones, and leaves the original DataFrame unchanged. Any existing column that is re-assigned is overwritten in the returned object.

In the example below, we add multiple columns to a Pandas DataFrame.

    # import pandas library
    import pandas as pd

    # create pandas DataFrame
    df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                       'points': [10, 8, 3, 5],
                       'runrate': [0.5, 1.4, 2, -0.6],
                       'wins': [5, 4, 2, 2]})

    # print the DataFrame
    print(df)

    # append multiple columns to Pandas DataFrame
    df2 = df.assign(lost=[2, 1, 3, 4], matches_remaining=[2, 3, 1, 1])

    # print the new DataFrame
    print(df2)

Output

               team  points  runrate  wins  lost  matches_remaining
    0         India      10      0.5     5     2                  2
    1  South Africa       8      1.4     4     1                  3
    2   New Zealand       3      2.0     2     3                  1
    3       England       5     -0.6     2     4                  1

Method 4: Using the pandas.concat() method

We can also use the pandas.concat() method to concatenate new columns to a DataFrame by passing axis=1 as an argument. This method returns a new DataFrame containing the concatenated columns.
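One caveat with column-wise concatenation is that `pd.concat(..., axis=1)` lines rows up by index label, not by position. The following is a minimal sketch of that behaviour under the assumption that the new data arrives with a different index; the `extra` DataFrame and its labels 10 and 11 are hypothetical and chosen only to make the mismatch visible.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa'], 'wins': [5, 4]})

# Hypothetical new column whose index labels (10, 11) do not match df's (0, 1)
extra = pd.DataFrame({'lost': [2, 1]}, index=[10, 11])

# Rows are matched by label, so nothing lines up: the result has the
# union of both indexes and NaN wherever a label is missing on one side
misaligned = pd.concat([df, extra], axis=1)
print(misaligned.shape)  # (4, 3)

# Resetting the index first makes the rows match by position
aligned = pd.concat([df, extra.reset_index(drop=True)], axis=1)
print(aligned.shape)  # (2, 3)
```

When both DataFrames use the default 0-based index and have the same number of rows, no reset is needed.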
    # import pandas library
    import pandas as pd

    # create pandas DataFrame
    df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                       'points': [10, 8, 3, 5],
                       'runrate': [0.5, 1.4, 2, -0.6],
                       'wins': [5, 4, 2, 2]})

    # print the DataFrame
    print(df)

    # create a new DataFrame
    df2 = pd.DataFrame([[1, 2], [2, 1], [3, 4], [0, 3]],
                       columns=['matches_left', 'lost'])

    # concat and print the new DataFrame
    print(pd.concat([df, df2], axis=1))

Output

               team  points  runrate  wins  matches_left  lost
    0         India      10      0.5     5             1     2
    1  South Africa       8      1.4     4             2     1
    2   New Zealand       3      2.0     2             3     4
    3       England       5     -0.6     2             0     3

Method 5: Using a dictionary

Another trick is to use a dictionary to add a new column to a Pandas DataFrame. In the example below, the dictionary keys hold the values of the new column and the dictionary values are the corresponding team names. Note that the team names are not matched against the “team” column, so the entries must be written in the same order as the rows. How pandas handles a dictionary assigned to a column has also differed between versions, so the output shown here may not be reproduced on every pandas version.

    # import pandas library
    import pandas as pd

    # create pandas DataFrame
    df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                       'points': [10, 8, 3, 5],
                       'runrate': [0.5, 1.4, 2, -0.6],
                       'wins': [5, 4, 2, 2]})

    # print the DataFrame
    print(df)

    # create a new dictionary: the keys are the values of the new column,
    # the values are the existing team names
    match_lost = {2: 'India', 1: 'South Africa', 3: 'New Zealand', 0: 'England'}

    # assign the dictionary to the DataFrame column
    df['lost'] = match_lost

    # print the DataFrame
    print(df)

Output

               team  points  runrate  wins  lost
    0         India      10      0.5     5     2
    1  South Africa       8      1.4     4     1
    2   New Zealand       3      2.0     2     3
    3       England       5     -0.6     2     0

Conclusion

In this article, we saw five approaches for inserting new columns into a Pandas DataFrame or overwriting existing ones: assigning a list, insert(), assign(), concat(), and a dictionary. Depending on your requirements, you can choose whichever of these methods is most suitable.
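As a supplement to the dictionary approach in Method 5, here is a minimal sketch of a variant that does not depend on row order. It assumes the dictionary is keyed by team name (the reverse of the original example) and uses `Series.map()` to look up each row's value. The name `lost_by_team` is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'team': ['India', 'South Africa', 'New Zealand', 'England'],
                   'points': [10, 8, 3, 5],
                   'runrate': [0.5, 1.4, 2, -0.6],
                   'wins': [5, 4, 2, 2]})

# Illustrative dictionary keyed by team name (assumption: one entry per team)
lost_by_team = {'India': 2, 'South Africa': 1, 'New Zealand': 3, 'England': 0}

# map() looks up each team name in the dictionary, so the order of the
# dictionary entries does not need to match the order of the rows
df['lost'] = df['team'].map(lost_by_team)

print(df)
```

A team that is missing from the dictionary would get NaN in the new column, which makes gaps in the lookup data easy to spot.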
Our quick data wrangling recipe today covers the topic of adding Python lists to Pandas DataFrames as columns and rows.

Creating the data

We’ll start with a simple dataset that we’ll use throughout this tutorial’s examples. Go ahead and copy this code into your data analysis Python environment.

    # Python3
    # Import Pandas
    import pandas as pd

    # Now, let's create the dataframe
    sales = pd.DataFrame({
        "person": ["Debbie", "Kim", "Dorothy", "Tim"],
        "budget": [20000, 30000, 35000, 17000]})
    sales.head()

Let’s also define a simple list that we will later insert into the DataFrame:

    actuals_list = [25000, 45000, 72000, 85000]

List as new column in Pandas DataFrame

The simplest way to insert the list is to assign it to a new column:

    sales['actuals_1'] = actuals_list

Note: Trying to assign a list that doesn’t match the length of the DataFrame will result in the following error:

    ValueError: Length of values does not match length of index

List to DataFrame Series

An alternative method is to first convert our list into a Pandas Series and then assign the values to a column.

    # Convert to Series
    actuals_s = pd.Series(actuals_list)

    # Then assign to the df
    sales['actuals_2'] = actuals_s

Inserting the list into specific locations in your DataFrame

So far, the new columns were appended as the rightmost column of the DataFrame. That said, we can use the insert method to place the new data at an arbitrary column index position. Example:

    # insert at index=3
    sales.insert(loc=3, column='actuals_3', value=actuals_s)
    sales.head()

Here’s our current data: the columns are now person, budget, actuals_1, actuals_3 and actuals_2, and each of the three actuals columns holds the same four values from actuals_list.
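The difference between assigning a list and assigning a Series is easiest to see when the lengths do not match. The following is a minimal sketch, reusing the sales data from above; the column names `too_short` and `partial` are hypothetical and exist only for this illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "person": ["Debbie", "Kim", "Dorothy", "Tim"],
    "budget": [20000, 30000, 35000, 17000]})
actuals_list = [25000, 45000, 72000, 85000]

# A plain list must have exactly one value per row, otherwise pandas raises
try:
    sales["too_short"] = actuals_list[:3]
except ValueError as err:
    print(f"ValueError: {err}")

# A Series is aligned by index label instead: rows without a matching
# label (here the last row, label 3) are filled with NaN
sales["partial"] = pd.Series(actuals_list[:3])

print(sales)
```

Because of this alignment, a Series whose index does not match the DataFrame's index can silently produce NaN values, so it is worth checking the index before assigning.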
Hopefully this was useful. Feel free to leave me a comment if you have questions or remarks.