Categories

See More
Popular Forum

MBA (4887) B.Tech (1769) Engineering (1486) Class 12 (1030) Study Abroad (1004) Computer Science and Engineering (988) Business Management Studies (865) BBA (846) Diploma (746) CAT (651) B.Com (648) B.Sc (643) JEE Mains (618) Mechanical Engineering (574) Exam (525) India (462) Career (452) All Time Q&A (439) Mass Communication (427) BCA (417) Science (384) Computers & IT (Non-Engg) (383) Medicine & Health Sciences (381) Hotel Management (373) Civil Engineering (353) MCA (349) Tuteehub Top Questions (348) Distance (340) Colleges in India (334)
See More
( 4 months ago )

Fill nan values with random value from another DataFrame pandas

General Tech Technology & Software
Max. 2000 characters
Replies

usr_profile.png
Dipti Singh

User

( 4 months ago )

 

I come up with this solution by using random.choice

import random

s=df1.groupby('Area').Company.apply(list).reindex(df.Area).apply(lambda x :random.choice(x) )
s.index=df.index

df.Company=df.Company.fillna(s)

df
Out[200]: 
    index   Company        Area
0       0    Google  Technology
1       1  CocaCola      Drinks
2       2  CocaCola      Drinks
3       3     Apple  Technology
4       4    Google  Technology
5       5  Gatorade      Drinks
6       6      Dell  Technology
7       7     Apple  Technology
8       8  CocaCola      Drinks
9       9     Pepsi      Drinks
10     10    Google  Technology

usr_profile.png
Sophia Chang

User

( 4 months ago )

 

I have a DataFrame with millon of rows and a lot of NaN values. Some example:

index     Company        Area
    0     Google         Technology
    1     Coca Cola      Drinks
    2     NaN            Drinks
    3     Apple          Technology
    4     NaN            Technology
    5     Gatorade       Drinks
    6     Dell           Technology
    7     Apple          Technology
    8     Coca Cola      Drinks
    9     NaN            Drinks
    10    Google         Technology

My idea is to fill Companies NaN values with one of the 2 most common values for its Area.

From example: If the most frequent Companies in Technology area are Apple and Google, I Would like to fill the "df['Area'] == 'Technology'" NaN values with one of that values (randomly)

I've already created a Group By DataFrame with the most common values, it is something like this:

Area          Company
Technology    Google
Technology    Apple
Drinks        Coca Cola
Drinks        Pepsi

The result should be something like this:

index     Company        Area
    0     Google         Technology
    1     Coca Cola      Drinks
    2     Pepsi          Drinks
    3     Apple          Technology
    4     Google         Technology
    5     Gatorade       Drinks
    6     Dell           Technology
    7     Apple          Technology
    8     Coca Cola      Drinks
    9     Pepsi          Drinks
    10    Google         Technology

I hope you can help me.

Thanks!!!

what's your interest


forum_ban8_5d8c5fd7cf6f7.gif