python - Pandas str.extract: AttributeError: 'str' object has no attribute 'str'
I'm trying to repurpose a function that uses `split` so that it uses `str.extract` (regex) instead:
```python
def bull_lev(x):
    # second-to-last whitespace token, e.g. 'x3' -> '3'
    spl = x.rsplit(None, 2)[-2].strip("xX")
    if spl.isdigit():
        return "+" + spl + "00"
    return "+100"

def bear_lev(x):
    spl = x.rsplit(None, 2)[-2].strip("xX")
    if spl.isdigit():
        return "-" + spl + "00"
    return "-100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x) if "bull" in x
                                else bear_lev(x) if "bear" in x
                                else "+100")
```
I'm using a pandas DataFrame to hold the data:
```python
import pandas as pd

df = pd.DataFrame(["bull axp un x3 von", "bear estox 12x s"], columns=["name"])
```
Desired output:

```
                 name leverage
0  bull axp un x3 von     +300
1    bear estox 12x s    -1200
```
My faulty regex attempt for the "bull" case:
```python
import re

def bull_lev(x):
    #spl = x.rsplit(None, 2)[-2].strip("xX")
    spl = x.str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE).strip("x")
    if spl.str.isdigit():
        return "+" + spl + "00"
    return "+100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x) if "bull" in x
                                else bear_lev(x) if "bear" in x
                                else "+100")
```
This produces the error:
```
Traceback (most recent call last):
  File "toolkit.py", line 128, in <module>
    df["leverage"] = df["name"].map(lambda x: bull_lev(x)
  File "/python/virtual/py2710/lib/python2.7/site-packages/pandas/core/series.py", line 2016, in map
    mapped = map_f(values, arg)
  File "pandas/src/inference.pyx", line 1061, in pandas.lib.map_infer (pandas/lib.c:58435)
  File "toolkit.py", line 129, in <lambda>
    if "bull" in x else bear_lev(x) if "bear" in x else "+100")
  File "toolkit.py", line 123, in bear_lev
    spl = x.str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE).strip("x")
AttributeError: 'str' object has no attribute 'str'
```
I'm assuming this is because `str.extract` captures a list, while `split` works directly on the string?
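The error happens because `Series.map` calls your function once per element and passes it a plain Python `str`; the `.str` accessor exists only on a pandas Series (or Index), not on individual strings. Instead of mapping, you can work on the whole column with vectorised string methods. You can handle the positive case using the following: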
```python
In [150]:
import re
# expand=False makes extract return a Series, so .str can be chained on it
df['fundleverage'] = '+' + df['name'].str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False).str.strip('x') + '00'
df

Out[150]:
                 name fundleverage
0  bull axp un x3 von         +300
1    bull estox x12 s        +1200
```
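If you would rather keep the per-element functions from the question, the fix is to avoid `.str` inside them: `map` hands each value over as a plain Python string, so the standard-library `re.search` can do the matching instead. This is a minimal sketch, not the approach used in this answer; `LEV_RE` and `leverage` are hypothetical names, and one helper with a `sign` argument stands in for `bull_lev`/`bear_lev`:

```python
import re

import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(["bull axp un x3 von", "bear estox 12x s"], columns=["name"])

# Hypothetical compiled pattern: a leverage token such as 'x3' or '12x'
# followed by whitespace, matched case-insensitively.
LEV_RE = re.compile(r"(x\d+|\d+x)\s", flags=re.IGNORECASE)


def leverage(x, sign):
    """Hypothetical helper: sign + digits + '00', or sign + '100' if no token."""
    m = LEV_RE.search(x)  # x is a plain str here, so use re, not .str
    if m:
        spl = m.group(1).strip("xX")
        if spl.isdigit():
            return sign + spl + "00"
    return sign + "100"


df["leverage"] = df["name"].map(
    lambda x: leverage(x, "+") if "bull" in x
    else leverage(x, "-") if "bear" in x
    else "+100"
)
print(df)
```

This keeps the row-by-row logic of the original functions, at the cost of calling Python code once per row rather than using pandas' vectorised string methods.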
You can use `np.where` to handle both cases in a one-liner:
```python
In [151]:
import numpy as np
df['fundleverage'] = np.where(df['name'].str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False).str.strip('x').str.isdigit(),
                              '+' + df['name'].str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False).str.strip('x') + '00',
                              '+100')
df

Out[151]:
                 name fundleverage
0  bull axp un x3 von         +300
1    bull estox x12 s        +1200
```
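One caveat with the one-liner, sketched here with a hypothetical third row that has no leverage token: for such a row `extract` returns NaN, and depending on the pandas version `.str.isdigit()` can pass that NaN on to `np.where`, where NaN counts as true, so the row may end up NaN instead of `'+100'`. Because the pattern only ever captures digits plus an `x`, testing `notna()` on the stripped result is one way to make the fallback explicit:

```python
import re

import numpy as np
import pandas as pd

# The third row is hypothetical: it has no 'xN' / 'Nx' token, so extract gives NaN.
df = pd.DataFrame(["bull axp un x3 von", "bull estox x12 s", "bull plain tracker"],
                  columns=["name"])

# Extract once and reuse; strip both cases since the regex is case-insensitive.
lev = (df["name"]
       .str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False)
       .str.strip("xX"))

# notna() is a clean boolean mask: unmatched rows fall through to '+100'.
df["fundleverage"] = np.where(lev.notna(), "+" + lev + "00", "+100")
print(df)
```

Computing `lev` once also avoids running the same `extract` twice per row.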
So the above uses the vectorised `str` methods `strip`, `extract` and `isdigit` to achieve what you want.
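A note on `extract` itself: with a single capture group it returns a one-column DataFrame when `expand=True`, which is the default in recent pandas versions, and a DataFrame has no `.str` accessor. Passing `expand=False` gives back a Series, so `strip` and `isdigit` can be chained. A small sketch on the question's sample strings:

```python
import re

import pandas as pd

s = pd.Series(["bull axp un x3 von", "bear estox 12x s"])
pattern = r"(x\d+|\d+x)\s"

# expand=True (default): even one capture group yields a one-column DataFrame.
as_frame = s.str.extract(pattern, flags=re.IGNORECASE)
# expand=False: one capture group yields a Series, so .str methods can follow.
as_series = s.str.extract(pattern, flags=re.IGNORECASE, expand=False)

digits = as_series.str.strip("xX")  # 'x3' -> '3', '12x' -> '12'
print(type(as_frame).__name__, type(as_series).__name__)
print(digits.tolist(), digits.str.isdigit().tolist())
```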
Update

After your changed requirements (which, for future reference, you should not do), you can mask the df for the bull and bear cases:
```python
In [189]:
import re
import numpy as np
import pandas as pd

df = pd.DataFrame(["bull axp un x3 von", "bear estox 12x s"], columns=["name"])
bull_mask_name = df.loc[df['name'].str.contains('bull', case=False), 'name']
bear_mask_name = df.loc[df['name'].str.contains('bear', case=False), 'name']
df.loc[df['name'].str.contains('bull', case=False), 'fundleverage'] = np.where(
    bull_mask_name.str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False).str.strip('x').str.isdigit(),
    '+' + bull_mask_name.str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False).str.strip('x') + '00',
    '+100')
df.loc[df['name'].str.contains('bear', case=False), 'fundleverage'] = np.where(
    bear_mask_name.str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False).str.strip('x').str.isdigit(),
    '-' + bear_mask_name.str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False).str.strip('x') + '00',
    '-100')
df

Out[189]:
                 name fundleverage
0  bull axp un x3 von         +300
1    bear estox 12x s        -1200
```
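As an alternative to building two masks, `np.select` can pick the bull or bear string per row in one pass, falling back to `'+100'` like the question's lambda (and, like the lambda, checking bull before bear). This is a sketch, not the answer's code: the last two sample rows are hypothetical, and filling a missing leverage with `'1'` is an assumption chosen so that a missing token becomes `+100` or `-100`:

```python
import re

import numpy as np
import pandas as pd

# First two rows from the question; the last two are hypothetical edge cases.
df = pd.DataFrame(["bull axp un x3 von", "bear estox 12x s", "bear plain s", "etf tracker"],
                  columns=["name"])

name = df["name"]
# Leverage digits per row; '1' (an assumption) when no token is found.
lev = (name.str.extract(r"(x\d+|\d+x)\s", flags=re.IGNORECASE, expand=False)
           .str.strip("xX")
           .fillna("1"))

is_bull = name.str.contains("bull", case=False)
is_bear = name.str.contains("bear", case=False)

# np.select takes the first condition that is True; rows matching neither get '+100'.
df["fundleverage"] = np.select(
    [is_bull, is_bear],
    ["+" + lev + "00", "-" + lev + "00"],
    default="+100",
)
print(df)
```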