Python threading vs multiprocessing (work in progress)

Threading vs Multiprocessing

Threading:

  • A new thread is spawned within the existing process
  • Starting a thread is faster than starting a process
  • Memory is shared between all threads
  • Mutexes often necessary to control access to shared data
  • One GIL (Global Interpreter Lock) for all threads
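
To illustrate the point about mutexes, here is a minimal sketch in which several threads update one shared counter. The names `counter`, `lock`, and `add_many` are made up for this illustration. `counter += 1` is a read-modify-write step, so the sketch wraps it in a `threading.Lock` so that only one thread performs it at a time.

```python
import threading

counter = 0               # shared by every thread in this process
lock = threading.Lock()   # mutex guarding the counter

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with lock:        # only one thread may hold the lock at a time
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: 4 threads x 10,000 increments, none lost
```

The GIL alone does not make a compound update like this safe in general, which is why the lock is still used here.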

Multiprocessing:

  • A new process is started independently of the first process
  • Starting a process is slower than starting a thread
  • Memory is not shared between processes
  • Mutexes not necessary (unless threading in the new process)
  • One GIL (Global Interpreter Lock) for each process
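
A small sketch of what "memory is not shared" means in practice. The list `items` and the function `append_item` are hypothetical names for this illustration. The child process appends to its own copy of the list, so the parent's copy should stay empty.

```python
import multiprocessing

items = []  # module-level state; each process works on its own copy

def append_item(value):
    """Runs in the child process and changes only the child's copy of `items`."""
    items.append(value)
    print(f'child sees: {items}')

if __name__ == '__main__':
    p = multiprocessing.Process(target=append_item, args=('from child',))
    p.start()
    p.join()
    # The parent's list is untouched by what happened in the child
    print(f'parent sees: {items}')
```

To actually pass data between processes you would need something explicit, such as a `multiprocessing.Queue` or the return value of a process pool task.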

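When running with threading, only one CPU core is fully used, because the GIL lets only one thread execute Python bytecode at a time; with multiprocessing, all cores are fully used. Although threads cannot run in parallel, they can run concurrently, with the interpreter switching back and forth between them in time slices.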

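Threading suits I/O-bound tasks, because those spend most of their time waiting for disk or network responses. For CPU-bound tasks, threading brings little benefit.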
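
A rough way to see the CPU-bound case for yourself is sketched below. `count_down` and `timed` are hypothetical helper names, and the job sizes are arbitrary. On a multi-core machine with a standard (GIL) CPython build, the process pool would typically finish faster than the thread pool, but the exact numbers depend on your hardware, so none are claimed here.

```python
import concurrent.futures
import time

def count_down(n):
    """CPU-bound busy loop: pure Python arithmetic, no waiting on I/O."""
    while n > 0:
        n -= 1
    return n

def timed(executor_cls, jobs):
    """Run count_down over `jobs` with the given executor class and time it."""
    start = time.perf_counter()
    with executor_cls() as executor:
        results = list(executor.map(count_down, jobs))
    return results, time.perf_counter() - start

if __name__ == '__main__':
    jobs = [2_000_000] * 4
    _, t_threads = timed(concurrent.futures.ThreadPoolExecutor, jobs)
    _, t_procs = timed(concurrent.futures.ProcessPoolExecutor, jobs)
    print(f'threads:   {t_threads:.2f} second(s)')
    print(f'processes: {t_procs:.2f} second(s)')
```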

Example code for the traditional approach:

import threading
import time

start = time.perf_counter()

def do_something(seconds):
    print(f'Sleeping {seconds} second...')
    time.sleep(seconds)
    print('Done Sleeping...')

threads = []
    
for _ in range(10):
    # Create the thread; it is not running yet
    t = threading.Thread(target=do_something, args=[1.5])
    # Start running it
    t.start()
    # Don't join inside this loop: join() would block on this thread before the loop
    # creates and starts the next one, which is effectively the same as running synchronously.
    # Instead, start all the threads in one loop, then loop over them again and call join() on each.
    # To do that, append each thread to a list.
    threads.append(t)

for thread in threads:
    thread.join()

    
finish = time.perf_counter()

print(f'Finished in {round(finish-start,2)} second(s)')

Since Python 3.2 there is ThreadPoolExecutor, which is more convenient and efficient, and also makes it easier to switch to multiple processes if you need to:

import concurrent.futures
import time

start = time.perf_counter()

def do_something(seconds):
    print(f'Sleeping {seconds} second...')
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'

with concurrent.futures.ThreadPoolExecutor() as executor:
    # submit() schedules a function to be executed and returns a Future object.
    # A Future encapsulates the execution of our function and lets you inspect it after scheduling: whether it is running or done, and its result.

    secs = [5,4,3,2,1]
    # A list comprehension is used here
    results = [executor.submit(do_something, sec) for sec in secs]
    
    for f in concurrent.futures.as_completed(results):
        print(f.result())
    
#     f1 = executor.submit(do_something, 1)
#     f2 = executor.submit(do_something, 1)
#     print(f1.result())
#     print(f2.result())
    
    
finish = time.perf_counter()

print(f'Finished in {round(finish-start,2)} second(s)')
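
To make the Future object a bit more concrete, here is a short sketch of inspecting one after it has been scheduled. It reuses `do_something` from above with a shorter sleep; the variable names `state_before` and `state_after` are made up for this illustration.

```python
import concurrent.futures
import time

def do_something(seconds):
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(do_something, 0.2)
    # Right after submit() the task is normally still sleeping, so this is usually False
    state_before = future.done()
    # result() blocks until the function has returned, then hands back its return value
    result = future.result()
    # Once result() has returned, the future is done
    state_after = future.done()

print(state_before, result, state_after)
```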

(A multiprocessing example could be added here as well, to cover the topic thoroughly.)
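
As a first sketch of a multiprocessing version, the thread pool example can be switched to processes by swapping in `concurrent.futures.ProcessPoolExecutor`. Two things here are assumptions of this sketch rather than part of the original code: the sleep times are shortened so it runs quickly, and the pool is wrapped in a `main()` function behind an `if __name__ == '__main__':` guard, which process-based code needs on platforms that start child processes by re-importing the main module.

```python
import concurrent.futures
import time

def do_something(seconds):
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'

def main():
    start = time.perf_counter()

    # Same pattern as the thread pool version; only the executor class changes
    with concurrent.futures.ProcessPoolExecutor() as executor:
        secs = [0.5, 0.4, 0.3, 0.2, 0.1]
        futures = [executor.submit(do_something, sec) for sec in secs]

        # Results are printed in the order the tasks finish
        for f in concurrent.futures.as_completed(futures):
            print(f.result())

    finish = time.perf_counter()
    print(f'Finished in {round(finish - start, 2)} second(s)')

if __name__ == '__main__':
    main()
```

Since sleeping is not CPU-bound, this version only demonstrates the API switch; processes pay off when the work itself is CPU-bound.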

Reference:
https://www.youtube.com/watch?v=ecKWiaHCEKs
https://www.youtube.com/watch?v=IEEhzQoKtQU&t=19s
https://www.liaoxuefeng.com/wiki/1016959663602400/1017628290184064
https://www.liaoxuefeng.com/wiki/1016959663602400/1017629247922688
https://timber.io/blog/multiprocessing-vs-multithreading-in-python-what-you-need-to-know/