python - Can re.findall() return only the part of the regex in parens?
I'm looping through some data and want to capture strings of numbers that appear to be page IDs (there can be more than one per line). However, I only want to match number strings that are part of a particular URL, and I don't want to record the URL, just the number.
The URLs are relative, the digit strings are of variable length, and they are of the form
/view/123456.htm
and the data I want returned here is '123456'.
I'm using re.findall to identify the right URLs, and then re.sub to extract the number strings:
    import re
    # for each line of the data (pagelist is created before the loop):
    views = re.findall(r"/view/\d+\.htm", line)
    for view in views:
        view = re.sub(r"/view/(\d+)\.htm", r"\1", view)
        pagelist.append(view)
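A side note on patterns like these: an unescaped `.` matches any character, so a pattern ending in `.htm` also accepts strings such as `/view/456xhtm`. A minimal comparison, using a made-up sample string (not data from the question):

```python
import re

# Made-up sample: one real .htm URL and one lookalike without a dot.
sample = "/view/123.htm /view/456xhtm"

# Unescaped dot: '.' matches any character, so '456xhtm' is accepted too.
loose = re.findall(r"/view/\d+.htm", sample)

# Escaped dot: only a literal '.' before 'htm' is accepted.
strict = re.findall(r"/view/\d+\.htm", sample)

print(loose)   # ['/view/123.htm', '/view/456xhtm']
print(strict)  # ['/view/123.htm']
```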
Is there a way to do something like
    views = re.findall(r"/view/(\d*?).htm", r"\1", line)  # I know this doesn't work
so that the original findall() returns only the part of the match in parens?
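For comparison, the standard library also offers `re.finditer()`, which yields a match object per match; calling `.group(1)` on each one returns only the text of the first parenthesized group. A minimal sketch, where `line` is a made-up sample rather than the asker's data:

```python
import re

# Made-up sample line containing two page URLs.
line = 'see <a href="/view/123456.htm">this</a> and /view/42.htm'

# finditer yields one Match object per non-overlapping match;
# group(1) is the text captured by the (\d+) group.
pagelist = [m.group(1) for m in re.finditer(r"/view/(\d+)\.htm", line)]

print(pagelist)  # ['123456', '42']
```

This form is handy when you also need other details of each match, such as its position via `m.start()`.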
Can re.findall() return only the part of the regex in parens?
Not only can it, it does:
    >>> import re
    >>> re.findall(r"/view/(\d*?).htm", "/view/123.htm /view/456.htm")
    ['123', '456']
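Applied to the per-line loop from the question, a capturing group in the findall() pattern removes the need for re.sub(). A minimal sketch: `data` is a hypothetical list of lines standing in for the asker's input, and the pattern uses `\d+` and an escaped `.` so that only non-empty digit runs followed by a literal `.htm` are captured:

```python
import re

# Hypothetical input lines; the question does not show the real data.
data = [
    "/view/123456.htm and /view/789.htm on one line",
    "no page ids here, just 2024 and /edit/55.htm",
    'link: <a href="/view/31.htm">x</a>',
]

# Compile once and reuse the pattern for every line.
view_re = re.compile(r"/view/(\d+)\.htm")

pagelist = []
for line in data:
    # With exactly one group in the pattern, findall returns only the group text.
    pagelist.extend(view_re.findall(line))

print(pagelist)  # ['123456', '789', '31']
```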
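Did you not try it? The documentation describes this well: if one or more groups are present in the pattern, findall() returns a list of groups, which will be a list of tuples if the pattern has more than one group.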
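Per the `re.findall()` documentation, when a pattern contains groups the results are built from the groups rather than the whole match. A short illustration with a made-up string: two capturing groups give tuples, a non-capturing group `(?:...)` does not count as a group, and a pattern with no groups returns whole matches:

```python
import re

# Made-up sample with two URL prefixes.
text = "/view/123.htm /show/456.htm"

# Two capturing groups: each result is a tuple of both groups.
pairs = re.findall(r"/(view|show)/(\d+)\.htm", text)

# (?:...) groups without capturing, so only the digits are returned.
ids = re.findall(r"/(?:view|show)/(\d+)\.htm", text)

# No groups at all: the whole match is returned.
whole = re.findall(r"/view/\d+\.htm", text)

print(pairs)  # [('view', '123'), ('show', '456')]
print(ids)    # ['123', '456']
print(whole)  # ['/view/123.htm']
```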