When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?

April 15, 2014

I am using split('\n') to get the lines in a string, and found that ''.split() returns an empty list, [], while ''.split('\n') returns ['']. Is there any specific reason for such a difference?

And is there a more convenient way to count the lines in a string?

Question: I am using split('\n') to get the lines in a string, and found that ''.split() returns an empty list [], while ''.split('\n') returns [''].

The str.split() method has two algorithms. If no arguments are given, it splits on repeated runs of whitespace. However, if an argument is given, it is treated as a single delimiter with no repeated runs.

In the case of splitting an empty string, the first mode (no argument) returns an empty list because the whitespace is eaten and there are no values left to put in the result list.

In contrast, the second mode (with an argument such as \n) produces the first empty field. Consider what happens if you write '\n'.split('\n'): you get two fields (one split gives you two halves).
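The two modes can be compared side by side. This is a minimal sketch that relies only on the behavior described above; the variable names are illustrative and not part of the original answer.

```python
# Mode 1: no argument. Runs of whitespace are separators and
# empty strings are never emitted.
no_arg_empty = ''.split()           # nothing remains once whitespace is consumed
no_arg_blank = '  \n\t '.split()    # a whitespace-only string behaves the same way
no_arg_words = ' a  b '.split()     # leading, trailing and repeated spaces vanish

# Mode 2: explicit delimiter. Every delimiter separates two fields,
# so empty fields are preserved.
sep_empty = ''.split('\n')          # zero delimiters -> one (empty) field
sep_single = '\n'.split('\n')       # one delimiter   -> two (empty) fields
sep_text = 'a\n\nb'.split('\n')     # two delimiters  -> three fields

print(no_arg_empty, no_arg_blank, no_arg_words)
print(sep_empty, sep_single, sep_text)
```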

Question: Is there any specific reason for such a difference?

The first mode is useful when data is aligned in columns with variable amounts of whitespace. For example:

    >>> data = '''\
    shasta      california     14,200
    mckinley    alaska         20,300
    fuji        japan          12,400
    '''
    >>> for line in data.splitlines():
    ...     print(line.split())
    ...
    ['shasta', 'california', '14,200']
    ['mckinley', 'alaska', '20,300']
    ['fuji', 'japan', '12,400']

The second mode is useful for delimited data such as CSV, where repeated commas denote empty fields. For example:

    >>> data = '''\
    guido,bdfl,,amsterdam
    barry,flufl,,usa
    tim,,,usa
    '''
    >>> for line in data.splitlines():
    ...     print(line.split(','))
    ...
    ['guido', 'bdfl', '', 'amsterdam']
    ['barry', 'flufl', '', 'usa']
    ['tim', '', '', 'usa']

Note that the number of result fields is one greater than the number of delimiters. Think of cutting a rope. If you make no cuts, you have one piece. Making one cut gives two pieces. Making two cuts gives three pieces. And so it is with Python's str.split(delimiter) method:

    >>> ''.split(',')       # no cuts
    ['']
    >>> ','.split(',')      # one cut
    ['', '']
    >>> ',,'.split(',')     # two cuts
    ['', '', '']
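The rope rule can be checked mechanically. The helper below is a hypothetical illustration (its name is not from the original answer) and applies only to the explicit-delimiter mode.

```python
def field_count_matches(text, delimiter):
    """Return True if split(delimiter) yields one more field than delimiters."""
    return len(text.split(delimiter)) == text.count(delimiter) + 1


samples = ['', ',', ',,', 'a,b', 'guido,bdfl,,amsterdam']
results = [field_count_matches(sample, ',') for sample in samples]
print(results)

# The rule does not carry over to the no-argument mode, where runs of
# whitespace collapse and leading/trailing whitespace is ignored.
whitespace_fields = len(' a  b '.split())
space_count = ' a  b '.count(' ')
print(whitespace_fields, space_count)
```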

Question: And is there a more convenient way to count the lines in a string?

Yes, there are a couple of easy ways. One uses str.count() and the other uses str.splitlines(). Both give the same answer unless the final line is missing its \n. If the final newline is missing, the str.splitlines approach gives the accurate answer. A faster technique that is also accurate uses the count method and then corrects for the final newline:

    >>> data = '''\
    line 1
    line 2
    line 3
    line 4'''

    >>> data.count('\n')                               # inaccurate
    3
    >>> len(data.splitlines())                         # accurate, but slow
    4
    >>> data.count('\n') + (not data.endswith('\n'))   # accurate and fast
    4
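The count-based technique can be wrapped in a small function. This is a sketch under two assumptions that are not in the original answer: the function name count_lines is hypothetical, and an empty string is treated as zero lines (without that guard the expression yields 1 for '', whereas splitlines() yields 0).

```python
def count_lines(text):
    """Count '\\n'-terminated lines, allowing a missing final newline."""
    if not text:
        # Assumption: an empty string contains no lines.
        return 0
    # The boolean adds 1 when the last line lacks a trailing newline.
    return text.count('\n') + (not text.endswith('\n'))


samples = ['', 'line 1\n', 'line 1\nline 2', 'line 1\nline 2\n', '\n\n']
for sample in samples:
    print(repr(sample), count_lines(sample), len(sample.splitlines()))
```

For these samples both approaches agree. They can differ on text that contains other line boundaries, such as a bare '\r', because str.splitlines() splits on those as well while this function counts only '\n'.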

Question from @Kaz: Why the heck are two different algorithms shoe-horned into a single function?

The signature for str.split is about 20 years old, and a number of the APIs from that era are strictly pragmatic. While not perfect, the method signature isn't "terrible" either. For the most part, Guido's API design choices have stood the test of time.

The current API is not without advantages. Consider strings such as:

    ps_aux_header  = "user               pid  %cpu %mem      vsz"
    patient_header = "name,age,height,weight"

When asked to break these strings into fields, people tend to describe both using the same English word, "split". When asked to read code such as fields = line.split() or fields = line.split(','), people tend to correctly interpret the statements as "splits a line into fields".
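As a quick illustration, here are the two header strings split with the mode that suits each one. The result variable names are illustrative; the last line shows what goes wrong when the single-delimiter mode is applied to whitespace-aligned text.

```python
ps_aux_header = "user               pid  %cpu %mem      vsz"
patient_header = "name,age,height,weight"

# Whitespace-aligned columns: use the no-argument mode.
ps_fields = ps_aux_header.split()

# Comma-delimited fields: use the explicit-delimiter mode.
patient_fields = patient_header.split(',')

print(ps_fields)
print(patient_fields)

# Splitting aligned columns on a single space keeps an empty field
# for every extra space in a run.
naive_fields = ps_aux_header.split(' ')
print('' in naive_fields)
```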

Microsoft Excel's text-to-columns tool made a similar API choice and incorporates both splitting algorithms in the same tool. People seem to mentally model field-splitting as a single concept even though more than one algorithm is involved.
