python - Pandas: Conditionally generate descriptions from column content


I am trying to iron out issues with a function that uses pandas regex via str.extract on each row of the column "name" to generate a new column "description". I am using regex rather than split, since the code must be able to handle a variety of formattings.

The function must be modified to acknowledge various conditions.

DataFrame:

    import pandas as pd
    import re

    df = pd.DataFrame(["long axp un x3 von", "short bidu un 5x von", "short goog von", "long goog von"], columns=["name"])

Input:

    name
    "long axp un x3 von"
    "short bidu un 5x von"
    "short goog von"
    "long goog von"

Current code:

    description_map = {"axp": "american express", "bidu": "baidu"}
    sign_map = {"long": "", "short": "-"}

    def f(strseries):
        # expand=False makes str.extract return a Series, so .map() and "+" work
        stock = strseries.str.extract(r"\s(\S+)\s", expand=False).map(description_map)
        leverage = strseries.str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False)
        sign = strseries.str.extract(r"(\S+)\s", expand=False).map(sign_map)
        return "tracks " + stock + " " + sign + leverage + " leverage"

    df["description"] = f(df["name"])

Current output:

    name                        description
    "long axp un x3 von"        "tracks american express x3 leverage"
    "short bidu un 5x von"      "tracks baidu -5x leverage"
    "short goog von"            ""
    "long goog von"             ""

Desired output:

    name                        description
    "long axp un x3 von"        "tracks american express 3x leverage"
    "short bidu un 5x von"      "tracks baidu inversely -5x leverage"
    "short goog von"            "tracks inversely"
    "long goog von"             "tracks"

Implications:

  • If sign is "-", how can I make it add direction = "inversely" to the string?
  • If no stock in the name is matched by the dictionary description_map: set stock = "" and still return the string.
  • If no leverage is found in the name: ignore the part "with" + sign + leverage + " leverage".
  • Split and reorder sign + leverage so that it always displays in the order "-5x", regardless of whether it was inputted as "short x5".
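The reordering rule in the last bullet can be sketched on its own. This is only an illustration, not code from the original post: the helper name `normalize_leverage` and the pattern `LEVERAGE_RE` are hypothetical, and the sketch assumes the leverage token is a standalone word such as "x3" or "5x".

```python
import re

# Hypothetical pattern (not from the original post): matches "x3", "3x",
# "X10" or "10X" as a whole word and captures only the digits.
LEVERAGE_RE = re.compile(r"\b(?:x(\d+)|(\d+)x)\b", flags=re.IGNORECASE)

def normalize_leverage(name, sign=""):
    """Return the leverage as '<sign><digits>x', or '' if no token is found."""
    match = LEVERAGE_RE.search(name)
    if match is None:
        return ""
    # Exactly one of the two groups is set, depending on which side the "x" is.
    digits = match.group(1) or match.group(2)
    return "{}{}x".format(sign, digits)

print(normalize_leverage("long axp un x3 von"))         # 3x
print(normalize_leverage("short bidu un 5x von", "-"))  # -5x
print(repr(normalize_leverage("short goog von", "-")))  # ''
```

Capturing the digits rather than the whole token means the output order no longer depends on the input order, and the word boundaries keep the "x" inside "axp" from being picked up.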

I spent some time writing this function:

    description_map = {"axp": "american express", "bidu": "baidu"}
    sign_map = {"long": "", "short": "-"}

    stock_match = re.compile(r"\s(\S+)\s")
    leverage_match = re.compile("[0-9]x|x[0-9]|X[0-9]|[0-9]X")

    def f(value):
        # Map the ticker (second token) to its description, or '' if it is unknown
        f1 = lambda x: description_map[stock_match.findall(x)[0]] if stock_match.findall(x)[0] in description_map else ''
        # First leverage token such as "x3" or "5x", or '' if there is none
        f2 = lambda x: leverage_match.findall(x)[0] if len(leverage_match.findall(x)) > 0 else ''
        f3 = lambda x: '-' if 'short' in x else ''

        stock = f1(value)
        leverage = f2(value)
        sign = f3(value)

        statement = "tracks " + stock

        if stock == "":
            if sign == '-':
                return statement + "{}".format('inversely')
            else:
                return "tracks"

        # Reorder "x3"/"X3" to "3x"; the emptiness check avoids an IndexError
        if leverage != '' and leverage[0].replace('X', 'x') == 'x':
            leverage = leverage[1] + leverage[0].replace('X', 'x')

        if leverage != '' and sign == '-':
            statement += " {} {}{} leverage".format('inversely', sign, leverage)
        elif leverage != '' and sign == '':
            statement += " {} leverage".format(leverage)
        else:
            if sign == '-':
                statement += " {} ".format('inversely')

        return statement

    df["description"] = df["name"].map(lambda x: f(x))

Output:

    In [97]: %paste
    import pandas as pd
    import re

    df = pd.DataFrame(["long axp un x3 von", "short bidu un 5x von", "short goog von", "long goog von"], columns=["name"])

    ## -- End pasted text --

    In [98]: df
    Out[98]:
                       name
    0    long axp un x3 von
    1  short bidu un 5x von
    2        short goog von
    3         long goog von

    In [99]: df["description"] = df["name"].map(lambda x:f(x))

    In [100]: df
    Out[100]:
                       name                          description
    0    long axp un x3 von  tracks american express 3x leverage
    1  short bidu un 5x von  tracks baidu inversely -5x leverage
    2        short goog von                     tracks inversely
    3         long goog von                               tracks
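For comparison, the same description can probably also be assembled with vectorized string methods instead of a row-wise map. The following is a minimal sketch under stated assumptions, not the approach from the post: the function name `describe` is hypothetical, the ticker is assumed to be the second whitespace-delimited token, and, unlike the function above, the leverage part is appended even when the ticker is unknown.

```python
import re
import pandas as pd

description_map = {"axp": "american express", "bidu": "baidu"}

def describe(names):
    """Build one description per name (hypothetical helper, a sketch only)."""
    # Assumption: the ticker is the second token; unknown tickers become "".
    stock = names.str.extract(r"^\S+\s+(\S+)", expand=False).map(description_map).fillna("")
    stock_part = (" " + stock).where(stock != "", "")

    # Two capture groups give a two-column frame; take whichever one matched.
    lev = names.str.extract(r"\b(?:x(\d+)|(\d+)x)\b", flags=re.IGNORECASE, expand=True)
    digits = lev[0].fillna(lev[1]).fillna("")

    # Names starting with "short" get the "inversely" wording and a minus sign.
    short = names.str.match(r"short\b", case=False)
    direction_part = short.map({True: " inversely", False: ""})
    sign = short.map({True: "-", False: ""})

    leverage_part = (" " + sign + digits + "x leverage").where(digits != "", "")
    return "tracks" + stock_part + direction_part + leverage_part

df = pd.DataFrame(["long axp un x3 von", "short bidu un 5x von",
                   "short goog von", "long goog von"], columns=["name"])
df["description"] = describe(df["name"])
print(df)
```

For the four sample rows this should yield the same strings as listed under the desired output. Filling missing values with empty strings before concatenating keeps every step a plain string operation, so a missing ticker or leverage never turns the whole description into NaN.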
