The solution of @sparkandshine seems to be short and to the point for the specific sort pattern. Also the one offered by @j-f-sebastian to me looks excellent, terse and with important hints/links on internationalization and off-memory sorting strategies.
Maybe the following more explicit show case offers additional helpful info for the OP or someone with similar tasks to adapt to their needs. Please see comments in the mostly pep8 conforming code:
#! /usr/bin/env python
"""Provide a show case for hierarchical sort, that offers flexible
hierarchical lexcical, numeric column sort mixes at runtime.
Hopefully this draft solution offers ideas for helping migrate
the sort shell level operation into a pythonic solution - YMMV."""
from __future__ import print_function
from functools import partial # We use this to tailor the key function
def text_in_lines_gen(text_in_lines):
"""Mock generator simulating a line source for the data."""
for line in text_in_lines.split('\n'):
if line:
yield line.split()
def sort_hier_gen(iterable_lines, hier_sort_spec):
"""Given iterator of text lines, sort all lines based on
sort specification in hier_sort_spec.
Every entry in hier_sort_spec is expected to be a pair with first value
integer for index in columns of text blocks lines and second entry
type of sorting in ('int', 'float') numeric or any other for text
(lexical) ordering regime."""
num_codes = ('int', 'float')
converter_map = dict(zip(num_codes, (int, float)))
# Extract facts from sort spec, prepare processing:
key_ordered = tuple(k for k, _ in hier_sort_spec)
# Prepare key function: Step 1 ...
def _key_in(selected, r):
"""Inject the indexing into the key at sort time
via partial application, as key function in sort
has single argument only."""
return tuple(r[k] for k in selected)
_key = partial(_key_in, key_ordered) # ... step 2
convert_these_by = {}
for k, t in hier_sort_spec:
if t in num_codes:
convert_these_by[k] = converter_map[t]
if not convert_these_by: # early out
for row in sorted(iterable_lines, key=_key):
yield row
else:
def flow_converter(row_iter, converter_map):
"""Row based converter - Don't block the flow ;-)."""
for row in row_iter:
for k, convert in converter_map.items():
row[k] = convert(row[k])
yield row
for row in sorted(flow_converter(iterable_lines,
convert_these_by), key=_key):
yield row
def main():
"""Drive the hierarchical text-int-int sort."""
data_1 = """Scf_3R 8599253 8621866 FBgn0000014 FBgn0191744 -0.097558026153
Scf_3R 8497493 8503049 FBgn0000015 FBgn0025043 0.437973284047
Scf_3L 16209309 16236428 FBgn0000017 FBgn0184183 -1.19105585707
Scf_2L 10630469 10632308 FBgn0000018 FBgn0193617 0.073153454539
Scf_3R 12087670 12124207 FBgn0000024 FBgn0022516 -0.023946795475
Scf_X 14395665 14422243 FBgn0000028 FBgn0187465 0.00300558969397
Scf_3R 25163062 25165316 FBgn0000032 FBgn0189058 0.530118698187
Scf_3R 19757441 19808894 FBgn0000036 FBgn0189822 -0.282508464261"""
bar = []
x = 0
for a in range(3, 0, -1):
for b in range(3, 0, -1):
for c in range(3, 0, -1):
x += 1
bar.append('a_%d %d %0.1f %d' % (a, b, c * 1.1, x))
data_2 = '\n'.join(bar)
hier_sort_spec = ((0, 't'), (1, 'int'), (2, 'int'))
print("# Test data set 1 and sort spec={0}:".format(hier_sort_spec))
for sorted_row in sort_hier_gen(text_in_lines_gen(data_1), hier_sort_spec):
print(sorted_row)
hier_sort_spec = ((0, 't'), (1, None), (2, False))
print("# Test data set 1 and sort spec={0}:".format(hier_sort_spec))
for sorted_row in sort_hier_gen(text_in_lines_gen(data_1), hier_sort_spec):
print(sorted_row)
hier_sort_spec = ((0, 't'), (2, 'float'), (1, 'int'))
print("# Test data set 2 and sort spec={0}:".format(hier_sort_spec))
for sorted_row in sort_hier_gen(text_in_lines_gen(data_2), hier_sort_spec):
print(sorted_row)
if __name__ == '__main__':
main()
On my machine the three test cases (including the questions sample data) yield:
First:
# Test data set 1 and sort spec=((0, 't'), (1, 'int'), (2, 'int')):
['Scf_2L', 10630469, 10632308, 'FBgn0000018', 'FBgn0193617', '0.073153454539']
['Scf_3L', 16209309, 16236428, 'FBgn0000017', 'FBgn0184183', '-1.19105585707']
['Scf_3R', 8497493, 8503049, 'FBgn0000015', 'FBgn0025043', '0.437973284047']
['Scf_3R', 8599253, 8621866, 'FBgn0000014', 'FBgn0191744', '-0.097558026153']
['Scf_3R', 12087670, 12124207, 'FBgn0000024', 'FBgn0022516', '-0.023946795475']
['Scf_3R', 19757441, 19808894, 'FBgn0000036', 'FBgn0189822', '-0.282508464261']
['Scf_3R', 25163062, 25165316, 'FBgn0000032', 'FBgn0189058', '0.530118698187']
['Scf_X', 14395665, 14422243, 'FBgn0000028', 'FBgn0187465', '0.00300558969397']
Second:
# Test data set 1 and sort spec=((0, 't'), (1, None), (2, False)):
['Scf_2L', '10630469', '10632308', 'FBgn0000018', 'FBgn0193617', '0.073153454539']
['Scf_3L', '16209309', '16236428', 'FBgn0000017', 'FBgn0184183', '-1.19105585707']
['Scf_3R', '12087670', '12124207', 'FBgn0000024', 'FBgn0022516', '-0.023946795475']
['Scf_3R', '19757441', '19808894', 'FBgn0000036', 'FBgn0189822', '-0.282508464261']
['Scf_3R', '25163062', '25165316', 'FBgn0000032', 'FBgn0189058', '0.530118698187']
['Scf_3R', '8497493', '8503049', 'FBgn0000015', 'FBgn0025043', '0.437973284047']
['Scf_3R', '8599253', '8621866', 'FBgn0000014', 'FBgn0191744', '-0.097558026153']
['Scf_X', '14395665', '14422243', 'FBgn0000028', 'FBgn0187465', '0.00300558969397']
Third:
# Test data set 2 and sort spec=((0, 't'), (2, 'float'), (1, 'int')):
['a_1', 1, 1.1, '27']
['a_1', 2, 1.1, '24']
['a_1', 3, 1.1, '21']
['a_1', 1, 2.2, '26']
['a_1', 2, 2.2, '23']
['a_1', 3, 2.2, '20']
['a_1', 1, 3.3, '25']
['a_1', 2, 3.3, '22']
['a_1', 3, 3.3, '19']
['a_2', 1, 1.1, '18']
['a_2', 2, 1.1, '15']
['a_2', 3, 1.1, '12']
['a_2', 1, 2.2, '17']
['a_2', 2, 2.2, '14']
['a_2', 3, 2.2, '11']
['a_2', 1, 3.3, '16']
['a_2', 2, 3.3, '13']
['a_2', 3, 3.3, '10']
['a_3', 1, 1.1, '9']
['a_3', 2, 1.1, '6']
['a_3', 3, 1.1, '3']
['a_3', 1, 2.2, '8']
['a_3', 2, 2.2, '5']
['a_3', 3, 2.2, '2']
['a_3', 1, 3.3, '7']
['a_3', 2, 3.3, '4']
['a_3', 3, 3.3, '1']
Updated to use mostly generators for only having one copy of the data "around" as this is needed anyway (in-memory) for the global sort, but no need for having more copies ;-)
Added also functools.partial
as this for me is the fastest way, to adapt a key function to the flexible sort order.
One last update removed the remaining non-generator copy cruft in the case where a conversion is implemented, by defining a local generator function ofr the row based conversions.
HTH.
sort
is not a bash command. It's part of GNU coreutils. That you run it frombash
doesn't make it abash
command, you can run any program frombash
. – Islasx, y
that "compare equal" their relative order is preserved by the sorting process. This means that in order to sort by multiple keys you could simply sort by each key separately, from the last to the first. In your case:data.sort(key=lambda x: int(x[2])); data.sort(key=lambda x: int(x[1])); data.sort(key=lambda x: x[0])
. This will first the rows by the value ofx[0]
then by comparing numerically the valuesx[1]
and finally numerically thex[2]
s. – Primrosesort()
call:data.sort(key=lambda x: (x[0], int(x[1]), int(x[2])))
? – Maisel