regex - Tokenizing text with a regular expression in Python
I have the following code and want to tokenize the text in a file using a regular expression:
```python
def tokenize():
    infile = codecs.open('test_test.txt', 'r', encoding='utf-8')
    text = infile.read()
    infile.close()
    words = []
    with io.open('test_test.txt', 'r', encoding='utf-8') as csvfile:
        text = unicode_csv_reader(csvfile, delimiter=',', quotechar='"')
        for item in text:
            for word in item:
                words.append(word)
    tregex = re.compile(ur'[?&/\'\r\n]', re.IGNORECASE)
    newtext1 = tregex.sub(' ', text)
    newtext = re.sub(' +', ' ', newtext1)
    words = re.split(r' ', newtext)
    print words
```
But I get this error:
```
Traceback (most recent call last):
  File "d:\kksc\kksc.py", line 150, in oncheckspell
    tokenize()
  File "d:\kksc\kksc.py", line 32, in tokenize
    newtext1 = tregex.sub(' ', text)
TypeError: expected string or buffer
```
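For context, `re.sub` raises this `TypeError` whenever the thing passed to it is not a string. A minimal reproduction (sketched in Python 3, where the message reads "expected string or bytes-like object" instead of "expected string or buffer"), with a hypothetical list of rows standing in for what a CSV reader yields:

```python
import re

pat = re.compile(r"[?&/'\r\n]")
rows = [['hello', 'world'], ['foo']]  # a csv reader yields lists of strings, not one string

try:
    pat.sub(' ', rows)  # passing a list instead of a string triggers the TypeError
except TypeError as err:
    print('TypeError:', err)
```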
In

```python
newtext1 = tregex.sub(' ', text)
```

`text` is a two-dimensional array of strings (the rows yielded by `unicode_csv_reader`), while `sub` expects a string. Did you mean:

```python
newtext1 = tregex.sub(' ', word)
```
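One way to restructure the function is to apply the regex to the file contents as a single string, rather than to the CSV reader object. A minimal sketch in Python 3 (the question's code is Python 2, hence the `ur''` prefix and `print` statement there); `tokenize_text` is a hypothetical helper operating on an already-read string:

```python
import re

def tokenize_text(text):
    """Split a string into words, treating the listed punctuation as spaces."""
    # Same character class as in the question, minus the Python-2-only ur'' prefix
    tregex = re.compile(r"[?&/'\r\n]", re.IGNORECASE)
    cleaned = tregex.sub(' ', text)       # replace unwanted characters with spaces
    cleaned = re.sub(' +', ' ', cleaned)  # collapse runs of spaces
    return cleaned.strip().split(' ')

print(tokenize_text("hello?world&foo/bar"))  # -> ['hello', 'world', 'foo', 'bar']
```

To use it with a file, read the contents once (`text = infile.read()`) and pass that string in, instead of reassigning `text` to the CSV reader.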