Python Chunks

Several ways to split a list into chunks.

yield

def chunks1(input_list, n):
    for i in range(0, len(input_list), n):
        yield input_list[i:i + n]

input_list = [i for i in range(0, 15)]
print(list(chunks1(input_list, 4)))

# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14]]

One-line for loop (list comprehension)

input_list = [i for i in range(0, 15)]

n = 3
output_list = [input_list[i:i + n] for i in range(0, len(input_list), n)]
print(output_list)

# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14]]

iterable

Works with any iterable, not just lists

from itertools import islice

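def chunks2(input_iter, n):
    # Take a single iterator so every islice call continues where the last one stopped
    iterator = iter(input_iter)
    # iter(callable, sentinel) keeps calling the lambda until it returns an empty tuple
    return iter(lambda: tuple(islice(iterator, n)), ())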

input_list = [i for i in range(0, 15)]

n = 4
print(list(chunks2(input_list, n)))

# [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14)]

NumPy

import numpy as np

input_list = [i for i in range(0, 15)]
# Here 5 is the number of sections to split into, not the size of each chunk
np.array_split(input_list, 5)

# [array([0, 1, 2]),
# array([3, 4, 5]),
# array([6, 7, 8]),
# array([ 9, 10, 11]),
# array([12, 13, 14])]
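Because the second argument of np.array_split is a section count rather than a chunk size, the same number can produce very different results from the slicing approach. The following is a minimal sketch of that difference; the variable names (data, by_count, by_size) are made up for illustration.

```python
import numpy as np

data = list(range(15))

# np.array_split: 6 is the NUMBER of sections; sizes are spread as evenly as possible
by_count = [part.tolist() for part in np.array_split(data, 6)]
count_sizes = [len(part) for part in by_count]
print(count_sizes)  # [3, 3, 3, 2, 2, 2]

# Slicing: 6 is the SIZE of each chunk; only the last chunk may be shorter
by_size = [data[i:i + 6] for i in range(0, len(data), 6)]
size_sizes = [len(part) for part in by_size]
print(size_sizes)  # [6, 6, 3]
```

If a fixed chunk size is what you need, the slicing or islice versions are probably the more direct fit.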

Any of the simple approaches above gets the job done.

A few things to watch out for

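  1. What input types are supported: only a list, or other kinds of input as well?
  2. Every example uses n, but what does n control? In the slicing and islice versions it is the chunk size; in np.array_split it is the number of sections.
  3. Handling of special cases.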
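The input-type and special-case questions in the list above can be made concrete with a small sketch. The helper name chunks_checked and its size check are assumptions added for illustration, not part of the original examples.

```python
from itertools import islice

def chunks_checked(iterable, size):
    """Yield tuples of at most `size` items from any iterable (hypothetical helper)."""
    if size < 1:
        # Special case: a non-positive size is rejected up front
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = tuple(islice(iterator, size))
        if not chunk:
            return
        yield chunk

# Input type: a generator has no len() and cannot be sliced, but it works here
gen_result = list(chunks_checked((x * x for x in range(5)), 2))
print(gen_result)  # [(0, 1), (4, 9), (16,)]

# Special case: empty input simply produces no chunks
empty_result = list(chunks_checked([], 3))
print(empty_result)  # []
```

Whether a bad size should raise or silently return nothing is a design choice; raising is used here only so the mistake is visible early.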

If the input file is too large and has to be processed in batches, try pandas' read_csv() with its chunksize parameter.
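As a rough sketch of batch processing with read_csv(), the example below reads a small in-memory CSV in pieces. The CSV contents and the column names (id, value) are invented for illustration; with a real file you would pass its path instead of the StringIO object.

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for a large file (made-up data)
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

row_counts = []
total = 0
# With chunksize set, read_csv yields DataFrames of up to 4 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    row_counts.append(len(chunk))
    total += int(chunk["value"].sum())

print(row_counts)  # [4, 4, 2]
print(total)       # 450
```

Only one chunk is held in memory at a time, so running totals like the one above can be computed without loading the whole file.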
