python - Creating a pandas DataFrame column from selected values of another column


Say I have a DataFrame like this:

import pandas as pd
df = pd.DataFrame({'a': list('abcdefghij'), 'b': 5*[2] + 5*[3]})

and I want to create a column that contains the values of column 'a' at the positions given in column 'b' (i.e. 5 times 'c', then 5 times 'd'). It seems natural to me to do this:

df['c'] = df['a'].iloc[df['b']] 

but this produces an error:

cannot reindex from a duplicate axis
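The failure can be reproduced in a few lines. This is a minimal sketch assuming a reasonably recent pandas; the exact wording of the error message differs between versions, so the script prints it rather than asserting on it. The name `picked` is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({'a': list('abcdefghij'), 'b': 5 * [2] + 5 * [3]})

# Positional lookup: the row positions come from column 'b'.
picked = df['a'].iloc[df['b']]

# The result keeps the *labels* of the rows it picked, so its index
# is [2, 2, 2, 2, 2, 3, 3, 3, 3, 3] instead of df's 0..9.
print(picked.index.tolist())

try:
    # pandas tries to reindex `picked` onto df.index before inserting,
    # which is impossible when `picked` has duplicate labels.
    df['c'] = picked
except ValueError as err:
    # Message text depends on the pandas version.
    print("ValueError:", err)
```

Because the assignment raises before anything is inserted, `df` is left without a `'c'` column.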

My questions are:

a) How can I do that?

b) Where can I learn the actual mechanics of pandas indices, as opposed to relying on intuition?

If I understand correctly, you want this:

In [219]: df['c'] = df.loc[df['b'],'a'].values
          df
Out[219]:
   a  b  c
0  a  2  c
1  b  2  c
2  c  2  c
3  d  2  c
4  e  2  c
5  f  3  d
6  g  3  d
7  h  3  d
8  i  3  d
9  j  3  d
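A self-contained version of the fix, for reference. It relies on `df` having the default RangeIndex, so that the labels used by `.loc` coincide with row positions. The second assignment is a positional variant closer to the original `.iloc` attempt, using `.to_numpy()` (available since pandas 0.24) to strip the duplicate index; `c_pos` is a hypothetical column name added only for comparison.

```python
import pandas as pd

df = pd.DataFrame({'a': list('abcdefghij'), 'b': 5 * [2] + 5 * [3]})

# Label-based lookup, as in the answer. With the default RangeIndex,
# labels 0..9 are the same as positions 0..9.
df['c'] = df.loc[df['b'], 'a'].values

# Positional lookup; converting to a plain array discards the
# duplicated index labels, so no alignment is attempted.
df['c_pos'] = df['a'].iloc[df['b']].to_numpy()

print(df)
```

If `df` had a non-default index, only the `.iloc` variant would still pick rows by position.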

As for why you get the 'cannot reindex' error, look at what this expression returns:

In [220]: df.loc[df['b'],'a']
Out[220]:
2    c
2    c
2    c
2    c
2    c
3    d
3    d
3    d
3    d
3    d
Name: a, dtype: object

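Then it should be clear why it complains: the index values are repeated, and pandas is trying to align that index against the index of the original df (0 to 9). To get around this, you can take the raw values as a NumPy array by calling the .values attribute: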

In [221]: df.loc[df['b'],'a'].values
Out[221]: array(['c', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'd'], dtype=object)
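On question (b), the mechanism at work is index alignment: when you assign a Series to a DataFrame column, pandas matches rows by index label, not by position, whereas a plain NumPy array is taken in order. The small sketch below illustrates this; the column names `x`, `aligned`, `by_position` and `partial` are hypothetical and used only for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30]})  # default index 0, 1, 2

# A Series whose labels are listed in reverse order.
s = pd.Series([300, 200, 100], index=[2, 1, 0])

df['aligned'] = s             # matched by label: row 0 gets 100, row 2 gets 300
df['by_position'] = s.values  # plain array: taken in order, no alignment

# Labels of df that are missing from the Series become NaN after alignment.
df['partial'] = pd.Series([1.5], index=[1])

print(df)
```

Alignment needs each label to identify exactly one row, which is why a Series with repeated labels such as 2 and 3 cannot be aligned onto `df`.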
