python - Pandas: Difference between pivot and pivot_table. Why is only pivot_table working? -


i have following dataframe.

df.head(30)       struct_id  resnum score_type_name  score_value 0   4294967297       1           omega     0.064840 1   4294967297       1          fa_dun     2.185618 2   4294967297       1      fa_dun_dev     0.000027 3   4294967297       1     fa_dun_semi     2.185591 4   4294967297       1             ref    -1.191180 5   4294967297       2            rama    -0.795161 6   4294967297       2           omega     0.222345 7   4294967297       2          fa_dun     1.378923 8   4294967297       2      fa_dun_dev     0.028560 9   4294967297       2      fa_dun_rot     1.350362 10  4294967297       2         p_aa_pp    -0.442467 11  4294967297       2             ref     0.249477 12  4294967297       3            rama     0.267443 13  4294967297       3           omega     0.005106 14  4294967297       3          fa_dun     0.020352 15  4294967297       3      fa_dun_dev     0.025507 16  4294967297       3      fa_dun_rot    -0.005156 17  4294967297       3         p_aa_pp    -0.096847 18  4294967297       3             ref     0.979644 19  4294967297       4            rama    -1.403292 20  4294967297       4           omega     0.212160 21  4294967297       4          fa_dun     4.218029 22  4294967297       4      fa_dun_dev     0.003712 23  4294967297       4     fa_dun_semi     4.214317 24  4294967297       4         p_aa_pp    -0.462765 25  4294967297       4             ref    -1.960940 26  4294967297       5            rama    -0.600053 27  4294967297       5           omega     0.061867 28  4294967297       5          fa_dun     3.663050 29  4294967297       5      fa_dun_dev     0.004953 

according pivot documentation, should able reshape on score_type_name using pivot function.

df.pivot(columns='score_type_name',values='score_value',index=['struct_id','resnum']) 

but, following.

enter image description here

however, pivot_table function seems work:

pivoted = df.pivot_table(columns='score_type_name',                          values='score_value',                          index=['struct_id','resnum']) 

enter image description here

but not lend itself, me atleast, further analysis. want have struct_id, resnum, , score_type_name columns instead of stacking score_type_name on top of other columns. additionally, want struct_id every row, , not aggregate in joined row table.

so can tell me how can nice dataframe want using pivot? additionally, documentation, can't tell why pivot_table works , pivot doesn't. if @ first example of pivot, looks need.

p.s. did post question in reference problem, did such poor job of demonstrating output, deleted , tried again using ipython notebook. apologize in advance if seeing twice.

here notebook full reference

edit - desired results (made in excel):

structid    resnum  pdb_residue_number  chain_id    name3   fa_dun  fa_dun_dev  fa_dun_rot  fa_dun_semi omega   p_aa_pp rama    ref 4294967297  1   99    asn 2.1856  0.0000      2.1856  0.0648          -1.1912 4294967297  2   100   met 1.3789  0.0286  1.3504      0.2223  -0.4425 -0.7952 0.2495 4294967297  3   101   val 0.0204  0.0255  -0.0052     0.0051  -0.0968 0.2674  0.9796 4294967297  4   102   glu 4.2180  0.0037      4.2143  0.2122  -0.4628 -1.4033 -1.9609 4294967297  5   103   gln 3.6630  0.0050      3.6581  0.0619  -0.2759 -0.6001 -1.5172 4294967297  6   104   met 1.5175  0.2206  1.2968      0.0504  -0.3758 -0.7419 0.2495 4294967297  7   105   3.6987  0.0184      3.6804  0.0547  0.4019  -0.1489 0.3883 4294967297  8   106   thr 0.1048  0.0134  0.0914      0.0003  -0.7963 -0.4033 0.2013 4294967297  9   107   asp 2.3626  0.0005      2.3620  0.0521  0.1955  -0.3499 -1.6300 4294967297  10  108   ile 1.8447  0.0270  1.8176      0.0971  0.1676  -0.4071 1.0806 4294967297  11  109   ile 0.1276  0.0092  0.1183      0.0208  -0.4026 -0.0075 1.0806 4294967297  12  110   ser 0.2921  0.0342  0.2578      0.0342  -0.2426 -1.3930 0.1654 4294967297  13  111   leu 0.6483  0.0019  0.6464      0.0845  -0.3565 -0.2356 0.7611 4294967297  14  112   trp 2.5965  0.1507      2.4457  0.5143  -0.1370 -0.5373 1.2341 4294967297  15  113   asp 2.6448  0.1593          0.0510      -0.5011  

i'm not sure understand, i'll give try. use stack/unstack instead of pivot, closer want?

df.set_index(['struct_id','resnum','score_type_name']).unstack()                    score_value                                               score_type_name        fa_dun fa_dun_dev fa_dun_rot fa_dun_semi     omega    struct_id  resnum                                                            4294967297 1         2.185618   0.000027        nan    2.185591  0.064840               2         1.378923   0.028560   1.350362         nan  0.222345               3         0.020352   0.025507  -0.005156         nan  0.005106               4         4.218029   0.003712        nan    4.214317  0.212160               5         3.663050   0.004953        nan         nan  0.061867      score_type_name     p_aa_pp      rama       ref   struct_id  resnum                                 4294967297 1            nan       nan -1.191180              2      -0.442467 -0.795161  0.249477              3      -0.096847  0.267443  0.979644              4      -0.462765 -1.403292 -1.960940              5            nan -0.600053       nan   

i'm not sure why pivot isn't working (kinda seems me should, wrong), seem work (or @ least not give error) if leave off 'struct_id'. of course, that's not useful solution full dataset have more 1 different values 'struct_id'.

df.pivot(columns='score_type_name',values='score_value',index='resnum')  score_type_name    fa_dun  fa_dun_dev  fa_dun_rot  fa_dun_semi     omega   resnum                                                                      1                2.185618    0.000027         nan     2.185591  0.064840    2                1.378923    0.028560    1.350362          nan  0.222345    3                0.020352    0.025507   -0.005156          nan  0.005106    4                4.218029    0.003712         nan     4.214317  0.212160    5                3.663050    0.004953         nan          nan  0.061867     score_type_name   p_aa_pp      rama       ref   resnum                                          1                     nan       nan -1.191180   2               -0.442467 -0.795161  0.249477   3               -0.096847  0.267443  0.979644   4               -0.462765 -1.403292 -1.960940   5                     nan -0.600053       nan   

edit add: reset_index() convert multi-index (hierarchical) flatter style. there still hierarchy in column names, easiest way rid of df.columns=['var1','var2',...] although there more sophisticated ways if searching.

df.set_index(['struct_id','resnum','score_type_name']).unstack().reset_index()

                  struct_id resnum score_value                             score_type_name                         fa_dun fa_dun_dev fa_dun_rot    0                4294967297      1    2.185618   0.000027        nan    1                4294967297      2    1.378923   0.028560   1.350362    2                4294967297      3    0.020352   0.025507  -0.005156    3                4294967297      4    4.218029   0.003712        nan    4                4294967297      5    3.663050   0.004953        nan    

Comments

Popular posts from this blog

c# - Where does the .ToList() go in LINQ query result -

Listeners to visualise results of load test in JMeter -

apple push notifications - Confusion about IOS certificates, APNS mainly -