Python threading vs. multiprocessing (work in progress)
Threading vs Multiprocessing
Threading:
- A new thread is spawned within the existing process
- Starting a thread is faster than starting a process
- Memory is shared between all threads
- Mutexes often necessary to control access to shared data
- One GIL (Global Interpreter Lock) for all threads
Multiprocessing:
- A new process is started, independent of the first process
- Starting a process is slower than starting a thread
- Memory is not shared between processes by default (data is passed explicitly, e.g. through pipes or queues)
- Mutexes usually not necessary (unless the new process uses threads, or processes explicitly share memory)
- One GIL (Global Interpreter Lock) for each process
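The "memory is shared" vs. "not shared" bullets can be seen with a single module-level variable. In the sketch below (the names `counter`, `increment`, `run_in_thread` and `run_in_process` are hypothetical, for illustration only), the same global is incremented once from a thread and once from a child process. The thread's change is visible to the main program; the child process only changes its own copy.

```python
import multiprocessing
import threading

# Illustrative names (counter, increment, run_in_*) are hypothetical
counter = 0  # module-level state owned by the main process


def increment():
    # Add 1 to the counter of whichever process runs this function
    global counter
    counter += 1


def run_in_thread():
    global counter
    counter = 0
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter  # the thread shares our memory, so its update is visible


def run_in_process():
    global counter
    counter = 0
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    return counter  # the child incremented its own copy; ours is unchanged


if __name__ == "__main__":
    # The __main__ guard matters when processes are started with "spawn"
    # (the default on Windows and macOS): the child re-imports this module.
    print("after thread: ", run_in_thread())   # → 1
    print("after process:", run_in_process())  # → 0
```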
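When running CPU-bound Python code with threading, only one CPU core is fully used, because the GIL lets only one thread execute Python bytecode at a time; with multiprocessing, all cores can be fully used. Although threads cannot run in parallel, they can run concurrently: the interpreter switches back and forth between them, giving each a slice of time.
Threading suits I/O-bound tasks, where most of the time is spent waiting for the disk or the network (a thread releases the GIL while it waits). For CPU-bound tasks, threading brings little benefit.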
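To see the CPU-bound case, the sketch below runs the same pure-Python loop (a hypothetical `count_down` function; the loop size `N` is an arbitrary choice) on four threads and then on four processes, and prints the wall-clock time of each. On a standard CPython build with the GIL, the threaded run typically takes about as long as running the loops one after another, while the processes can spread across cores; exact timings depend on the machine.

```python
import multiprocessing
import threading
import time

N = 2_000_000  # arbitrary loop size; tune so one call takes a fraction of a second


def count_down(n):
    # Hypothetical pure-Python CPU-bound loop; a thread holds the GIL while running it
    while n > 0:
        n -= 1


def run_workers(worker_cls, workers=4):
    # Start `workers` Thread or Process objects running count_down, then join them all
    start = time.perf_counter()
    jobs = [worker_cls(target=count_down, args=(N,)) for _ in range(workers)]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"threads:   {run_workers(threading.Thread):.2f} s")
    print(f"processes: {run_workers(multiprocessing.Process):.2f} s")
```

`threading.Thread` and `multiprocessing.Process` accept the same `target`/`args` arguments, which is why one helper can drive both.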
Example using the traditional approach (threading.Thread):
import threading
import time

start = time.perf_counter()


def do_something(seconds):
    print(f'Sleeping {seconds} second...')
    time.sleep(seconds)
    print('Done Sleeping...')


threads = []
for _ in range(10):
    # Create the thread; it does not run yet
    t = threading.Thread(target=do_something, args=[1.5])
    # Start running it
    t.start()
    # We can't join inside this loop: join() would wait for this thread to finish
    # before the loop creates and starts the next one, which is essentially the
    # same as running synchronously.
    # Instead, start all threads in this loop, then loop through them again and
    # call join() on each. To do that, append each thread to a list.
    threads.append(t)

for thread in threads:
    thread.join()

finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')
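Because threads share memory, concurrent updates to the same data can need a mutex, as noted in the comparison list. A minimal sketch with `threading.Lock`, using the hypothetical names `balance` and `deposit`: each thread performs many read-modify-write updates, and the `with lock:` block ensures only one thread updates `balance` at a time, so the final total is always the expected value.

```python
import threading

# Hypothetical shared data and the mutex that guards it
balance = 0
lock = threading.Lock()


def deposit(times):
    global balance
    for _ in range(times):
        # `balance += 1` reads, adds and writes back; without the lock, two threads
        # could read the same old value and one of the updates would be lost
        with lock:
            balance += 1


threads = [threading.Thread(target=deposit, args=[100_000]) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # → 400000
```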
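Since Python 3.2, the concurrent.futures module provides ThreadPoolExecutor, which is more convenient and efficient to use, and also makes it easy to switch to multiple processes (ProcessPoolExecutor) if needed: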
import concurrent.futures
import time

start = time.perf_counter()


def do_something(seconds):
    print(f'Sleeping {seconds} second...')
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'


with concurrent.futures.ThreadPoolExecutor() as executor:
    # submit() schedules a function to be executed and returns a Future object.
    # A Future encapsulates the execution of our function and lets us check on it
    # after it has been scheduled: whether it is running or done, and its result.
    secs = [5, 4, 3, 2, 1]
    # A list comprehension submits one task per duration
    results = [executor.submit(do_something, sec) for sec in secs]
    # as_completed() yields each future as soon as it finishes
    for f in concurrent.futures.as_completed(results):
        print(f.result())
    # f1 = executor.submit(do_something, 1)
    # f2 = executor.submit(do_something, 1)
    # print(f1.result())
    # print(f2.result())

finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')
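`as_completed` yields futures in the order they finish, so the loop above prints the 1-second result first. When results are needed in the order the tasks were submitted, `executor.map` is a shorter alternative: it returns results in input order. A sketch reusing a print-free variant of `do_something` with shorter, arbitrary sleep durations:

```python
import concurrent.futures
import time


def do_something(seconds):
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'


secs = [0.5, 0.4, 0.3, 0.2, 0.1]  # arbitrary short durations

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor() as executor:
    # map() submits every call up front and yields results in input order,
    # even though the 0.1-second task finishes first
    results = list(executor.map(do_something, secs))
finish = time.perf_counter()

print(results)  # → ['Done Sleeping...0.5', 'Done Sleeping...0.4', 'Done Sleeping...0.3', 'Done Sleeping...0.2', 'Done Sleeping...0.1']
print(f'Finished in {round(finish - start, 2)} second(s)')
```

Since the sleeps overlap, the total is close to the longest single sleep rather than their sum.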
(A multiprocessing example could also be added here to cover the topic thoroughly.)
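Switching the executor example to processes is mostly a matter of replacing `ThreadPoolExecutor` with `ProcessPoolExecutor`. The sketch below assumes a CPU-bound task (a hypothetical `sum_of_squares` function), since that is where processes pay off. The task function must be defined at module level so it can be pickled and sent to the worker processes, and the pool is created only under `if __name__ == "__main__":`.

```python
import concurrent.futures
import time


def sum_of_squares(n):
    # Hypothetical CPU-bound task; each call runs in a worker process
    return sum(i * i for i in range(n))


def run(numbers):
    # Same submit()/as_completed() pattern as the thread version
    results = {}
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = {executor.submit(sum_of_squares, n): n for n in numbers}
        for f in concurrent.futures.as_completed(futures):
            results[futures[f]] = f.result()
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    for n, total in run([3_000_000, 2_000_000, 1_000_000]).items():
        print(f'{n}: {total}')
    finish = time.perf_counter()
    print(f'Finished in {round(finish - start, 2)} second(s)')
```

Arguments and return values travel between processes by pickling, so this pays off when each task does substantial work relative to the data it sends.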
References:
https://www.youtube.com/watch?v=ecKWiaHCEKs
https://www.youtube.com/watch?v=IEEhzQoKtQU&t=19s
https://www.liaoxuefeng.com/wiki/1016959663602400/1017628290184064
https://www.liaoxuefeng.com/wiki/1016959663602400/1017629247922688
https://timber.io/blog/multiprocessing-vs-multithreading-in-python-what-you-need-to-know/