python - Creating pandas DataFrame column from selected values from another column
Say I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'a' : list('abcdefghij'), 'b' : (5*[2] + 5*[3])})
and I want to create a column that contains the values of column 'a' at the positions given in column 'b' (5 times 'c', 5 times 'd'). It would then seem natural to me to do this:
df['c'] = df['a'].iloc[df['b']]
but this produces the error:
cannot reindex from a duplicate axis
My questions are:
a) How can I do that?
b) How can I learn the actual mechanics of pandas indices, as opposed to relying on intuition?
If I understand correctly, you want this:
In [219]: df['c'] = df.loc[df['b'],'a'].values
     ...: df
Out[219]:
   a  b  c
0  a  2  c
1  b  2  c
2  c  2  c
3  d  2  c
4  e  2  c
5  f  3  d
6  g  3  d
7  h  3  d
8  i  3  d
9  j  3  d
As for why you get 'cannot reindex from a duplicate axis', observe what the lookup returns:
In [220]: df.loc[df['b'],'a']
Out[220]:
2    c
2    c
2    c
2    c
2    c
3    d
3    d
3    d
3    d
3    d
Name: a, dtype: object
Then it should be clear why it moans: the index values are repeating, and pandas is trying to align that index against the index of the original df. To get around this, you can take the raw values as a NumPy array by calling the .values attribute:
In [221]: df.loc[df['b'],'a'].values
Out[221]: array(['c', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'd'], dtype=object)
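As for question b), the alignment step described above can be watched directly. The following is only a minimal sketch, reusing the df from the question. The names picked and mapped are made up for illustration, and the Series.map variant at the end is an alternative that does not appear in the answer above. The exact wording of the error message differs between pandas versions, so the sketch just prints whatever is raised.

```python
import pandas as pd

df = pd.DataFrame({'a': list('abcdefghij'), 'b': 5 * [2] + 5 * [3]})

# Label-based lookup: the result carries the looked-up labels as its index,
# so the labels 2 and 3 each appear five times.
picked = df.loc[df['b'], 'a']
print(picked.index.is_unique)

# Assigning a Series makes pandas align it to df.index by label.
# With repeated labels that alignment is ambiguous, so pandas raises.
try:
    df['c'] = picked
except ValueError as exc:
    print('alignment failed:', exc)

# Assigning a plain array skips alignment: values are placed by position.
# to_numpy() is the newer spelling of the .values idea used above.
df['c'] = picked.to_numpy()
print(df['c'].tolist())

# Alternative (not from the answer above): map each value of 'b' through
# column 'a'. The result keeps df's own unique index, so it aligns cleanly.
mapped = df['b'].map(df['a'])
print(mapped.tolist())
```

The general rule this illustrates: whenever the right-hand side of an assignment is a Series, pandas matches rows by index label rather than by position, whereas arrays and lists are assigned positionally.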