regex - How to use separate() properly?
I am having difficulty extracting an ID of the form:
27da12ce-85fe-3f28-92f9-e5235a5cf6ac
from the following data:
a<-c("name_27da12ce-85fe-3f28-92f9-e5235a5cf6ac_thomas_myr", "name_94773a8c-b71d-3be6-b57e-db9d8740bb98_thimo", "name_1ed571b4-1aef-3fe2-8f85-b757da2436ee_alex", "name_9fbeda37-0e4f-37aa-86ef-11f907812397_john_tya", "name_83ef784f-3128-35a1-8ff9-daab1c5f944b_bishop", "name_39de28ca-5eca-3e6c-b5ea-5b82784cc6f4_due_to", "name_0a52a024-9305-3bf1-a0a6-84b009cc5af4_wis_michal", "name_2520ebbb-7900-32c9-9f2d-178cf04f7efc_sarah_lu_van_gar/thomas")
Basically, I want the part between the first and second underscore.
I usually approach this with:
library(tidyr)
df$a <- as.character(df$a)
df <- df[grep("_", df$a), ]
df <- separate(df, a, c("id", "name"), sep = "_")
df$a <- as.numeric(df$id)
However, this time there are many underscores, and this approach fails. Is there a way to extract the ID?
I think you should use extract instead of separate. You need to specify the patterns you want to capture. I'm assuming here that the id always starts with a number, so I'm capturing everything from the first number up to the next _, and then everything after it.
library(tidyr)
df <- data.frame(a)
# keep only rows containing an underscore; drop = FALSE keeps df a data frame
df <- df[grep("_", df$a), , drop = FALSE]
extract(df, a, c("id", "name"), "[A-Za-z].*?(\\d.*?)_(.*)")
#                                     id                    name
# 1 27da12ce-85fe-3f28-92f9-e5235a5cf6ac              thomas_myr
# 2 94773a8c-b71d-3be6-b57e-db9d8740bb98                   thimo
# 3 1ed571b4-1aef-3fe2-8f85-b757da2436ee                    alex
# 4 9fbeda37-0e4f-37aa-86ef-11f907812397                john_tya
# 5 83ef784f-3128-35a1-8ff9-daab1c5f944b                  bishop
# 6 39de28ca-5eca-3e6c-b5ea-5b82784cc6f4                  due_to
# 7 0a52a024-9305-3bf1-a0a6-84b009cc5af4              wis_michal
# 8 2520ebbb-7900-32c9-9f2d-178cf04f7efc sarah_lu_van_gar/thomas
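As a possible alternative, if the id is always the second underscore-delimited field (as the question describes), separate() may be enough on its own when given extra = "merge", which limits the number of splits and leaves any remaining underscores in the last column. This is only a sketch under that assumption: it uses two of the strings from the question, and the column name "prefix" is a made-up label for the leading "name" part.

```r
library(tidyr)

# Two sample strings taken from the question
df <- data.frame(
  a = c("name_27da12ce-85fe-3f28-92f9-e5235a5cf6ac_thomas_myr",
        "name_2520ebbb-7900-32c9-9f2d-178cf04f7efc_sarah_lu_van_gar/thomas"),
  stringsAsFactors = FALSE
)

# Split on "_" into at most three pieces. extra = "merge" keeps the
# remaining underscores inside the last column instead of discarding
# the extra pieces. "prefix" is a hypothetical name for the first field.
res <- separate(df, a, c("prefix", "id", "name"), sep = "_", extra = "merge")
res$id
```

Unlike the extract() pattern, this does not depend on the id starting with a digit, but it does depend on there being exactly one field before the id.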