javascript - Regex to capture sentences with quotes
I'm having trouble putting together a regex to match quotes and sentences. Here are the (simplified) specs I'm trying to meet:
A sentence is a chain of characters followed by a punctuation mark (a dot, to keep things simple) or a newline.
A quote is a chain of characters between 2 " characters.
Each sentence should be a new match.
A sentence can contain quotes, and quotes can contain sentences. The last sentence in a quote should end the capture.
So far I have come up with this: \s*((?:("[^"]*")|[^.\n])*\.+"?)\s*
Test case: regex101
As you can see, I can't separate quotes from sentences. For example:
§2: "Your lordship," Mya informed Lord Robert, "Lady Waynwood’s banners have been seen hour down road. Here soon, cousin Harry. Want greet them"
should be one full match, but my regex gives me 3 matches and captures the next paragraph.
§3: "They invited," said uncertainly, "for tourney. Don’t..."
should stop the full match there, but my regex goes on to capture "Alayne closed the book."
I can't figure out what is going wrong; any help would be appreciated.
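A minimal reproduction of the behaviour described above may help pin down the cause. The sketch below runs the attempted pattern against a short, made-up two-paragraph string (the sample text and the names `ATTEMPT_RE`, `sample` and `captured` are assumptions for illustration, not part of the original test case). Two things seem to go wrong: `[^.\n]` also accepts a `"`, so the engine can backtrack and treat an opening quote as an ordinary character, and `[^"]*` does not exclude `\n`, so a leftover closing quote can pair up with the first quote of the next paragraph.

```javascript
// The attempted pattern from the question, copied unchanged.
const ATTEMPT_RE = /\s*((?:("[^"]*")|[^.\n])*\.+"?)\s*/g;

// Hypothetical input: a quote holding two sentences, then a second paragraph.
const sample = '"One. Two"\n"Three," he said.';

// Collect capture group 1 (the "sentence") of every match.
const captured = Array.from(sample.matchAll(ATTEMPT_RE), (m) => m[1]);

console.log(captured);
// The first quote is split at its inner full stop, and the second match
// runs across the newline into the next paragraph:
// [ '"One.', 'Two"\n"Three," he said.' ]
```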
Edit: desired output
((?![.\n\s])[^.\n"]*(?:"[^\n"]*[^\n".]"[^.\n"]*)*(?:"[^"\n]+\."|\.|(?=\n)))
Splitting it up:
(?![.\n\s])
- First, check that we are starting on a valid character (not whitespace or the end of a sentence).
[^.\n"]*
- Match text that is not surrounded by quotes and does not contain a sentence terminator.
(?:"[^\n"]*[^\n".]"[^.\n"]*)
- Match (in a non-capturing group) a quote that contains at least 1 character, does not contain a newline, and does not end with a sentence terminator, followed by zero or more characters that are not in a quote and do not contain a sentence terminator.
*
- The previous non-capturing group can be repeated zero or more times (zero, so that there can be sentences without quotes).
(?:"[^"\n]+\."|\.|(?=\n))
- Finally, include either a quote that terminates with a full stop, or a full stop at the end of the sentence, or check for an ending newline.
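To see the pieces working together, here is a small sketch that applies the pattern above with the global flag. The helper name `splitSentences` and the sample text are assumptions made up for this illustration; the sample ends with a newline because the pattern only ends an unterminated sentence on `.` or a following `\n`.

```javascript
// The pattern from the breakdown above, with the global flag added.
const SENTENCE_RE =
  /((?![.\n\s])[^.\n"]*(?:"[^\n"]*[^\n".]"[^.\n"]*)*(?:"[^"\n]+\."|\.|(?=\n)))/g;

// Hypothetical helper: returns every sentence-level match as an array.
function splitSentences(text) {
  return text.match(SENTENCE_RE) || [];
}

// Made-up sample: two quoted paragraphs, then two plain sentences.
const text =
  [
    `"Your lordship," Mya said, "the banners have been seen. They will be here soon"`,
    `"They were invited," she said, "for the tourney. I don't..."`,
    `Alayne closed the book. She stood up.`,
  ].join("\n") + "\n";

const sentences = splitSentences(text);
sentences.forEach((s, i) => console.log(i + 1, s));
// 1 "Your lordship," Mya said, "the banners have been seen. They will be here soon"
// 2 "They were invited," she said, "for the tourney. I don't..."
// 3 Alayne closed the book.
// 4 She stood up.
```

Full stops inside a quote do not split the match. Each quoted paragraph stays whole, ending at the newline or at the quote's closing full stop, while the unquoted sentences are split on their own full stops.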