Effective Python Study Notes
- 1. Pythonic Thinking
- Item 1: Know Which Version of Python You’re Using
- Item 2: Follow the PEP 8 Style Guide
- Item 3: Know the Differences Between bytes, str, and unicode
- Item 4: Write Helper Functions Instead of Complex Expressions
- Item 5: Know How to Slice Sequences
- Item 6: Avoid Using start, end, and stride in a Single Slice
- Item 7: Use List Comprehensions Instead of map and filter
- Item 8: Avoid More Than Two Expressions in List Comprehensions
- Item 9: Consider Generator Expressions for Large Comprehensions
- Item 10: Prefer enumerate Over range
- Item 11: Use zip to Process Iterators in Parallel
- Item 12: Avoid else Blocks After for and while Loops
- Item 13: Take Advantage of Each Block in try/except/else/finally
- 2. Functions
- Item 14: Prefer Exceptions to Returning None
- Item 15: Know How Closures Interact with Variable Scope
- Item 16: Consider Generators Instead of Returning Lists
- Item 17: Be Defensive When Iterating Over Arguments
- Item 18: Reduce Visual Noise with Variable Positional Arguments
- Item 19: Provide Optional Behavior with Keyword Arguments
- Item 20: Use None and Docstrings to Specify Dynamic Default Arguments
- Item 21: Enforce Clarity with Keyword-Only Arguments
- 3. Classes and Inheritance
- Item 22: Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples
- Item 23: Accept Functions for Simple Interfaces Instead of Classes
- Item 24: Use @classmethod Polymorphism to Construct Objects Generically
- Item 25: Initialize Parent Classes with super
- Item 26: Use Multiple Inheritance Only for Mix-in Utility Classes
- Item 27: Prefer Public Attributes Over Private Ones
- Item 28: Inherit from collections.abc for Custom Container Types
- 4. Metaclasses and Attributes
- Item 29: Use Plain Attributes Instead of Get and Set Methods
- Item 30: Consider @property Instead of Refactoring Attributes
- Item 31: Use Descriptors for Reusable @property Methods
- Item 32: Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes
- Item 33: Validate Subclasses with Metaclasses
- Item 34: Register Class Existence with Metaclasses
- Item 35: Annotate Class Attributes with Metaclasses
- 5. Concurrency and Parallelism
- Item 36: Use subprocess to Manage Child Processes
- Item 37: Use Threads for Blocking I/O, Avoid for Parallelism
- Item 38: Use Lock to Prevent Data Races in Threads
- Item 39: Use Queue to Coordinate Work Between Threads
- Item 40: Consider Coroutines to Run Many Functions Concurrently
- Item 41: Consider concurrent.futures for True Parallelism
- 6. Built-in Modules
- Item 42: Define Function Decorators with functools.wraps
- Item 43: Consider contextlib and with Statements for Reusable try/finally Behavior
- Item 44: Make pickle Reliable with copyreg
- Item 45: Use datetime Instead of time for Local Clocks
- Item 46: Use Built-in Algorithms and Data Structures
- Item 47: Use decimal When Precision Is Paramount
- Item 48: Know Where to Find Community-Built Modules
- 7. Collaboration
- Item 49: Write Docstrings for Every Function, Class, and Module
- Item 50: Use Packages to Organize Modules and Provide Stable APIs
- Item 51: Define a Root Exception to Insulate Callers from APIs
- Item 52: Know How to Break Circular Dependencies
- Item 53: Use Virtual Environments for Isolated and Reproducible Dependencies
- 8. Production
- Item 54: Consider Module-Scoped Code to Configure Deployment Environments
- Item 55: Use repr Strings for Debugging Output
- Item 56: Test Everything with unittest
- Item 57: Consider Interactive Debugging with pdb
- Item 58: Profile Before Optimizing
- Item 59: Use tracemalloc to Understand Memory Usage and Leaks
1. Pythonic Thinking
Item 1: Know Which Version of Python You’re Using
- Run python --version to check which Python version you are using
- Python runtimes include CPython, Jython, IronPython, and PyPy
Item 2: Follow the PEP 8 Style Guide
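The guide is available online: http://www.python.org/dev/peps/pep-0008/
- Whitespace:
- Indent with four spaces; continuation lines of long expressions get four extra spaces of indentation
- Separate functions and classes with two blank lines; separate methods within a class with one blank line
- Put no spaces around list indexes, function calls, or keyword argument assignments; put exactly one space before and after a variable assignment
- Naming:
- Protected attributes use the _leading_underscore format; private attributes use the __double_leading_underscore format
- Expressions and Statements:
- Use inline negation (if a is not b) rather than negating a positive expression (if not a is b)
- Don't check for empty values ([] or '') by testing their length; an empty value already evaluates to False in an if
- Order imports as standard library modules, then third-party modules, then your own modules, and sort each group alphabetically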
Item 3: Know the Differences Between bytes, str, and unicode
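- Python 3 has two string types: bytes holds raw 8-bit values, and str holds Unicode characters
- Python 2 has two string types: str holds raw 8-bit values, and unicode holds Unicode characters
- Converting between binary data and Unicode characters requires encode and decode. Python 2 unicode and Python 3 str instances have no associated binary encoding; UTF-8 is the usual choice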
- Python 2 conversion helpers:
- to_unicode
    # Python 2
    def to_unicode(unicode_or_str):
        if isinstance(unicode_or_str, str):
            value = unicode_or_str.decode('utf-8')
        else:
            value = unicode_or_str
        return value  # Instance of unicode
- to_str
    # Python 2
    def to_str(unicode_or_str):
        if isinstance(unicode_or_str, unicode):
            value = unicode_or_str.encode('utf-8')
        else:
            value = unicode_or_str
        return value  # Instance of str
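- In Python 2, when a str contains only 7-bit ASCII characters, unicode and str instances behave as if they were the same type, so you can:
- Concatenate them with +: str + unicode
- Compare str and unicode instances
- Use unicode instances with format strings such as '%s'
- None of these rules apply in Python 3, where bytes and str instances are never equivalent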
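- File handles returned by open default to text mode in Python 3 (UTF-8 encoded str, according to the book) but to binary in Python 2, so writing bytes to a file opened with 'w' fails in Python 3; open the file with 'wb' instead:
    import os

    with open('/tmp/random.bin', 'w') as f:
        f.write(os.urandom(10))
    # >>>
    # TypeError: must be str, not bytes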
Item 4: Write Helper Functions Instead of Complex Expressions
Item 5: Know How to Slice Sequences
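- list, str, bytes, and any class that implements __getitem__ and __setitem__ support slicing
- somelist[start:end] excludes the element at end; negative indexes count from the end, so -1 is the last element
- Slicing a list produces a new list (a shallow copy); somelist[-0:] is the same as somelist[0:] and copies the whole list
- Assigning to a slice modifies the original list in place, even when the lengths differ (elements can be added, removed, or replaced)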
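The slicing rules above can be tried out with a short sketch. The list contents and variable names are only illustrative.

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

first_four = a[:4]     # the end index is excluded
last_three = a[-3:]    # negative indexes count from the end
whole = a[:]           # a new list holding the same elements

b = a                  # b is just another name for the same list
a[2:7] = [99, 22, 14]  # five elements replaced by three; the list shrinks
```

Because b refers to the same list object as a, the slice assignment is visible through both names, while whole keeps the original eight elements.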
Item 6: Avoid Using start, end, and stride in a Single Slice
- Avoid using start, end, and stride together in a single slice; split the work into two operations, one that strides and one that slices
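A small sketch of the split, using an illustrative list. Both forms select the same elements, but the two-step version states its intent more plainly.

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Dense: start, end, and stride in one slice.
dense = a[2:-2:2]

# Clearer: stride first, then slice the result.
every_other = a[::2]
clear = every_other[1:-1]
```

The extra step creates one more temporary list, so for very large sequences it may be worth striding and slicing in whichever order shrinks the data fastest.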
Item 7: Use List Comprehensions Instead of map and filter
- map and filter need a lambda function, which makes the code harder to read
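As an illustration with made-up data, here is the same computation written both ways:

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# List comprehension: filter and transform in one readable expression.
even_squares = [x**2 for x in a if x % 2 == 0]

# Equivalent map/filter version, which needs two lambdas.
alt = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, a)))
```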
Item 8: Avoid More Than Two Expressions in List Comprehensions
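- List comprehensions exist to keep code flat and readable; avoid more than two expressions (loops or conditions) in one comprehension, and note that if conditions come after the for clauses
- Prefer:
    matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    flat = [x for row in matrix for x in row]
- Not:
    squared = [[x**2 for x in row] for row in matrix]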
Item 9: Consider Generator Expressions for Large Comprehensions
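- A list comprehension builds all of its elements at once; a generator expression yields one item at a time and avoids allocating large amounts of memory
- Generator expressions can also be chained together, and the chained form executes very quickly:
    it = [100, 57, 15, 1, 12, 75, 5, 86, 89, 11]
    roots = ((x, x**0.5) for x in it)
    print(next(roots))
    # (100, 10.0)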
- The only gotcha is that the iterators returned by generator expressions are stateful, so you must be careful not to use them more than once
Item 10: Prefer enumerate Over range
- enumerate wraps any iterator with a lazy generator.
- Prefer:
    for i, flavor in enumerate(flavor_list):
        print('%d: %s' % (i + 1, flavor))
- Not:
    for i in range(len(flavor_list)):
        flavor = flavor_list[i]
        print('%d: %s' % (i + 1, flavor))
- enumerate yields the index along with each item and accepts a second argument giving the number to start counting from
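A minimal sketch of the start argument, with an illustrative flavor_list. Passing 1 removes the need for i + 1 inside the loop.

```python
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']

# The second argument to enumerate is the first number to count from.
lines = ['%d: %s' % (i, flavor) for i, flavor in enumerate(flavor_list, 1)]
```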
Item 11: Use zip to Process Iterators in Parallel
- Prefer:
    names = ['Cecilia', 'Lise', 'Marie']
    letters = [len(n) for n in names]
    longest_name = None
    max_letters = 0
    for name, count in zip(names, letters):
        if count > max_letters:
            longest_name = name
            max_letters = count
- Not:
    for i, name in enumerate(names):
        count = letters[i]
        if count > max_letters:
            longest_name = name
            max_letters = count
- In Python 3, zip wraps two or more iterators with a lazy generator. The zip generator yields tuples containing the next value from each iterator.
Item 12: Avoid else Blocks After for and while Loops
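- The behavior of an else block after a loop is confusing: it runs only when the loop finishes without hitting break, which is not what the word else suggests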
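The sketch below shows when the else block actually runs, next to a helper that gets the same answer with an early return. Both function names are made up for this example.

```python
def not_found_with_else(values, target):
    for value in values:
        if value == target:
            break
    else:
        return True    # reached only when the loop never hit break
    return False


def contains(values, target):
    # Clearer alternative: return as soon as the answer is known.
    for value in values:
        if value == target:
            return True
    return False
```

Note that the else block also runs when the loop body never executes at all, for example over an empty list.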
Item 13: Take Advantage of Each Block in try/except/else/finally
import json

UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+')   # May raise IOError
    try:
        data = handle.read()    # May raise UnicodeDecodeError
        op = json.loads(data)   # May raise ValueError
        value = (
            op['numerator'] /
            op['denominator'])  # May raise ZeroDivisionError
    except ZeroDivisionError as e:
        return UNDEFINED
    else:
        op['result'] = value
        result = json.dumps(op)
        handle.seek(0)
        handle.write(result)    # May raise IOError
        return value
    finally:
        handle.close()          # Always runs
2. Functions
Item 14: Prefer Exceptions to Returning None
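- Don't return None to signal a special condition; raise an exception instead, because None is easily confused with other values that evaluate to False (such as 0 or '')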
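A minimal sketch of the idea. divide follows the book's example, while describe is a hypothetical caller added here to show the handling.

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        # Raise instead of returning None, so a result of 0 stays unambiguous.
        raise ValueError('Invalid inputs') from e


def describe(x, y):
    try:
        result = divide(x, y)
    except ValueError:
        return 'Invalid inputs'
    return 'Result is %.1f' % result
```

With this shape, a legitimate result of zero can no longer be mistaken for an error by an `if not result` check.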
Item 15: Know How Closures Interact with Variable Scope
- The Python interpreter resolves a variable reference by searching scopes in this order:
- The current function’s scope
- Any enclosing scopes (like other containing functions)
- The scope of the module that contains the code (also called the global scope)
- The built-in scope (that contains functions like len and str)
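- Assigning to a variable that is not yet defined in the current scope creates a new variable there, hiding any variable of the same name in an enclosing scope; reading such a name before the assignment has run raises UnboundLocalError
- In Python 3, use nonlocal to assign to a variable in an enclosing scope; in Python 2, work around the missing statement with a mutable value such as a list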
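The scoping rule is easiest to see in a closure that has to set a flag in its enclosing function. The sketch follows the book's sort_priority example; the test data is illustrative.

```python
def sort_priority(numbers, group):
    found = False

    def helper(x):
        nonlocal found        # without this, the assignment below would create a new local
        if x in group:
            found = True
            return (0, x)
        return (1, x)

    numbers.sort(key=helper)
    return found


numbers = [8, 3, 1, 2, 5, 4, 7, 6]
found = sort_priority(numbers, {2, 3, 5, 7})
```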
Item 16: Consider Generators Instead of Returning Lists
Item 17: Be Defensive When Iterating Over Arguments
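- A generator cannot be reused (read_visits is the book's generator that yields one integer per line of a file):
    it = read_visits('/tmp/my_numbers.txt')
    print(list(it))
    print(list(it))  # Already exhausted
    # >>>
    # [15, 35, 80]
    # []
- for loops, the list constructor, and many other functions in the standard library expect StopIteration during normal operation and catch it, so they cannot tell an exhausted iterator from an empty one
- Copying the iterator into a list allows several passes over the data:
    def normalize_copy(numbers):
        numbers = list(numbers)  # Copy the iterator
        total = sum(numbers)
        result = []
        for value in numbers:
            percent = 100 * value / total
            result.append(percent)
        return result
- Creating a new iterator on every call avoids the memory cost of that list copy:
    def normalize_func(get_iter):
        total = sum(get_iter())   # New iterator
        result = []
        for value in get_iter():  # New iterator
            percent = 100 * value / total
            result.append(percent)
        return result
- A for loop calls the built-in iter function, which calls the object's __iter__ method; __iter__ returns an iterator object (one that implements __next__)
- Use iter to detect an iterator (calling iter on an iterator returns the iterator itself, while a container returns a new iterator each time):
    def normalize_defensive(numbers):
        if iter(numbers) is iter(numbers):  # An iterator -- bad!
            raise TypeError('Must supply a container')
        total = sum(numbers)
        result = []
        for value in numbers:
            percent = 100 * value / total
            result.append(percent)
        return result

    visits = [15, 35, 80]
    normalize_defensive(visits)  # No error
    it = iter(visits)
    normalize_defensive(it)
    # >>>
    # TypeError: Must supply a container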
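The iterator protocol described above also allows a container class that hands out a fresh iterator for every pass. This is a sketch under simplifying assumptions: the hypothetical Visits class wraps an in-memory sequence rather than reading a file, and normalize is an illustrative helper.

```python
class Visits:
    """Container whose __iter__ returns a new iterator on every call."""

    def __init__(self, values):
        self.values = values

    def __iter__(self):
        for value in self.values:
            yield value


def normalize(numbers):
    total = sum(numbers)  # first pass: sum calls __iter__
    return [100 * value / total for value in numbers]  # second pass: new iterator


visits = Visits([15, 35, 80])
percentages = normalize(visits)
```

Because each call to iter(visits) produces a distinct generator, an `iter(x) is iter(x)` check treats this class as a container rather than an iterator.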
Item 18: Reduce Visual Noise with Variable Positional Arguments
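- Two things to watch for with def function(*args):
- A generator passed with the * operator is fully consumed into a tuple before the call, which can use a lot of memory
- Adding a new positional parameter to a function that accepts *args silently breaks callers that were not updated, causing bugs that are hard to find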
Item 19: Provide Optional Behavior with Keyword Arguments
- Pass optional arguments by keyword so that several default parameters cannot be confused with each other
Item 20: Use None and Docstrings to Specify Dynamic Default Arguments
- Default argument values are evaluated only once, when the function is defined (typically at module load), which causes surprising behavior for dynamic or mutable values
- So use None as the default value and document the actual behavior in the docstring
- Prefer:
    from datetime import datetime
    from time import sleep

    def log(message, when=None):
        when = datetime.now() if when is None else when
        print('%s: %s' % (when, message))

    log('Hi there!')
    sleep(0.1)
    log('Hi again!')
    # >>>
    # 2014-11-15 21:10:10.472303: Hi there!
    # 2014-11-15 21:10:10.573395: Hi again!
- Not:
    def log(message, when=datetime.now()):
        print('%s: %s' % (when, message))

    log('Hi there!')
    sleep(0.1)
    log('Hi again!')
    # >>>
    # 2014-11-15 21:10:10.371432: Hi there!
    # 2014-11-15 21:10:10.371432: Hi again!
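The same pitfall applies to mutable defaults. In this sketch the names decode_shared and decode are illustrative; the first shares a single dictionary across calls, and the second creates a new one each time.

```python
import json


def decode_shared(data, default={}):  # the dict is created once, at definition time
    try:
        return json.loads(data)
    except ValueError:
        return default


def decode(data, default=None):
    """Load JSON data; return a new empty dict when decoding fails."""
    if default is None:
        default = {}
    try:
        return json.loads(data)
    except ValueError:
        return default


shared_a = decode_shared('bad data')
shared_b = decode_shared('also bad')
shared_a['stuff'] = 5   # also changes shared_b: both names refer to one dict

fresh_a = decode('bad data')
fresh_b = decode('also bad')
```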
Item 21: Enforce Clarity with Keyword-Only Arguments
3. Classes and Inheritance
Item 22: Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples
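- Avoid dictionaries nested inside dictionaries and long tuples
- Use namedtuple for lightweight, immutable data containers
- Break complex bookkeeping dictionaries into several layers of helper classes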
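A compact sketch of the refactoring, loosely modeled on the book's grade book example. Grade and Subject are illustrative names: a namedtuple replaces the anonymous (score, weight) tuple, and a small class replaces the inner dictionary.

```python
from collections import namedtuple

Grade = namedtuple('Grade', ('score', 'weight'))


class Subject:
    def __init__(self):
        self._grades = []

    def report_grade(self, score, weight):
        self._grades.append(Grade(score, weight))

    def average_grade(self):
        total = sum(g.score * g.weight for g in self._grades)
        total_weight = sum(g.weight for g in self._grades)
        return total / total_weight


math = Subject()
math.report_grade(80, 0.10)
math.report_grade(90, 0.90)
average = math.average_grade()
```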
Item 23: Accept Functions for Simple Interfaces Instead of Classes
- An instance can be called like a function when its class defines the __call__ method
- When a function needs to keep state, define a class with __call__ instead of a stateful closure:
- Prefer:
    class BetterCountMissing(object):
        def __init__(self):
            self.added = 0

        def __call__(self):
            self.added += 1
            return 0

    counter = BetterCountMissing()
    counter()
- Not:
    from collections import defaultdict

    def increment_with_report(current, increments):
        added_count = 0

        def missing():
            nonlocal added_count  # Stateful closure
            added_count += 1
            return 0

        result = defaultdict(missing, current)
        for key, amount in increments:
            result[key] += amount
        return result, added_count
Item 24: Use @classmethod Polymorphism to Construct Objects Generically
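- Python supports only one constructor per class, the __init__ method; use @classmethod to define alternative constructors that work polymorphically across subclasses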
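A minimal sketch of an alternative constructor. The Temperature classes are hypothetical and not taken from the book; the point is that cls refers to whichever subclass the method was called on.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is the class the method was called on, so subclasses
        # get instances of their own type without overriding this method.
        return cls((fahrenheit - 32) * 5 / 9)


class RoundedTemperature(Temperature):
    def __init__(self, celsius):
        super().__init__(round(celsius))


boiling = Temperature.from_fahrenheit(212)
body = RoundedTemperature.from_fahrenheit(100)
```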
Item 25: Initialize Parent Classes with super
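- Understand the method resolution order (MRO): depth-first, left to right, keeping only the last occurrence of a class that appears more than once (C3 linearization)
- Always use super to initialize parent classes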
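A diamond hierarchy makes the MRO visible. The sketch follows the shape of the book's example, using Python 3's argument-free super().

```python
class MyBaseClass:
    def __init__(self, value):
        self.value = value


class TimesFive(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value *= 5


class PlusTwo(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value += 2


class GoodWay(TimesFive, PlusTwo):
    def __init__(self, value):
        super().__init__(value)


mro_names = [cls.__name__ for cls in GoodWay.__mro__]
foo = GoodWay(5)
```

The initializers finish in the reverse of the MRO: MyBaseClass sets 5, PlusTwo makes it 7, and TimesFive makes it 35. The shared base class is initialized only once.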
Item 26: Use Multiple Inheritance Only for Mix-in Utility Classes
- Use multiple inheritance only for mix-in classes: http://blog.csdn.net/gzlaiyonghao/article/details/1656969
Item 27: Prefer Public Attributes Over Private Ones
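- A __private_field attribute can still be reached from outside the class as _ClassName__private_field, so private attributes offer little real protection; avoid them where possible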
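The name mangling can be observed directly. MyObject and its field are illustrative.

```python
class MyObject:
    def __init__(self):
        self.__private_field = 71   # stored as _MyObject__private_field

    def get_private_field(self):
        return self.__private_field


foo = MyObject()

try:
    foo.__private_field             # no mangling happens outside the class body
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

mangled_value = foo._MyObject__private_field
```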
Item 28: Inherit from collections.abc for Custom Container Types
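- Have custom container types inherit from the abstract base classes in collections.abc so that they implement every required interface and behavior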
4. Metaclasses and Attributes
Item 29: Use Plain Attributes Instead of Get and Set Methods
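- Use plain public attributes instead of get and set methods; reach for @property only when an attribute needs special behavior
- Keep @property methods fast; do slow or complex work in normal methods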
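A short sketch of adding validation to what callers still see as a plain attribute. Resistor follows the book's running example, and the error message is illustrative.

```python
class Resistor:
    def __init__(self, ohms):
        self._ohms = None
        self.ohms = ohms          # goes through the setter below

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        if ohms <= 0:
            raise ValueError('%f ohms must be > 0' % ohms)
        self._ohms = ohms


r = Resistor(1e3)
r.ohms += 5e3                     # still reads like a simple attribute

try:
    r.ohms = 0
    rejected = False
except ValueError:
    rejected = True
```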
Item 30: Consider @property Instead of Refactoring Attributes
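- Use @property to give existing attributes new behavior; consider refactoring the class only when the @property methods become too complicated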
Item 31: Use Descriptors for Reusable @property Methods
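- The descriptor protocol consists of two methods:
- def __get__(self, instance, instance_type)
- def __set__(self, instance, value)
- A class that would otherwise need many near-identical @property methods can use a descriptor instead
- Use WeakKeyDictionary to make sure the descriptor class does not cause memory leaks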
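The sketch below puts these pieces together, modeled on the book's Grade descriptor. The 0 to 100 range and the default of 0 are assumptions made for illustration.

```python
from weakref import WeakKeyDictionary


class Grade:
    def __init__(self):
        # Per-instance storage; entries vanish when the owner is garbage collected.
        self._values = WeakKeyDictionary()

    def __get__(self, instance, instance_type):
        if instance is None:
            return self           # accessed on the class, not an instance
        return self._values.get(instance, 0)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError('Grade must be between 0 and 100')
        self._values[instance] = value


class Exam:
    math_grade = Grade()
    writing_grade = Grade()


first = Exam()
second = Exam()
first.writing_grade = 82
second.writing_grade = 75
```

Keying the storage by instance keeps each exam's grade separate even though the descriptor object is shared at class level.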
Item 32: Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes
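- obj.name, getattr, and hasattr all call __getattribute__; when name cannot be found through the normal lookup (for example, it is not in obj.__dict__), __getattr__ is called as well, and if the class defines no __getattr__ an AttributeError is raised
- Every attribute assignment (obj.name = value or setattr) calls __setattr__, including the assignments made inside __init__ when an instance is created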
Item 33: Validate Subclasses with Metaclasses
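- Use metaclasses to validate a class at the moment it is defined
- The metaclass's __new__ method runs after the class statement's entire body has been processed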
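As a sketch of validation at class definition time, modeled on the book's polygon example. The bases check used to skip the abstract base class is a simplification chosen for this example.

```python
class ValidatePolygon(type):
    def __new__(meta, name, bases, class_dict):
        # The base class has no parents here; only validate its subclasses.
        if bases and class_dict['sides'] < 3:
            raise ValueError('Polygons need 3+ sides')
        return type.__new__(meta, name, bases, class_dict)


class Polygon(metaclass=ValidatePolygon):
    sides = None


class Triangle(Polygon):
    sides = 3


try:
    class Line(Polygon):   # rejected as soon as the class body finishes
        sides = 1
    line_rejected = False
except ValueError:
    line_rejected = True
```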
Item 34: Register Class Existence with Metaclasses
- Use metaclasses to register class information automatically, for example for serialization or an ORM
Item 35: Annotate Class Attributes with Metaclasses
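- Metaclasses can modify a class's attributes before the class is fully defined; combined with descriptors, they can fill in details such as each attribute's name so the class body does not have to repeat them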
5. Concurrency and Parallelism
Item 36: Use subprocess to Manage Child Processes
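- Use the subprocess module to run child processes and manage their input and output streams
- Child processes run in parallel with the Python interpreter, which lets you maximize CPU usage
- Use the timeout parameter of communicate to avoid deadlocks and hung child processes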
Item 37: Use Threads for Blocking I/O, Avoid for Parallelism
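- Because of the global interpreter lock (GIL), Python threads cannot run bytecode in parallel on multiple CPU cores
- Python still supports threads for two reasons: 1. they make it easy for a program to appear to do several things at once; 2. they handle blocking I/O
- Python threads can make multiple system calls in parallel, which lets blocking I/O proceed at the same time as computation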
Item 38: Use Lock to Prevent Data Races in Threads
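- Although Python threads do not execute at the same time, the interpreter can still interrupt a thread between two bytecode instructions that operate on shared data, so locks are still needed
- The Lock class in the threading module is Python's standard mutual exclusion lock (mutex)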
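A small sketch of protecting a shared counter with Lock. The class and function names follow the book's example loosely, and the thread and iteration counts are arbitrary.

```python
from threading import Lock, Thread


class LockingCounter:
    def __init__(self):
        self.lock = Lock()
        self.count = 0

    def increment(self, offset):
        with self.lock:           # only one thread at a time runs this block
            self.count += offset


def worker(how_many, counter):
    for _ in range(how_many):
        counter.increment(1)


counter = LockingCounter()
threads = [Thread(target=worker, args=(10000, counter)) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Without the lock, the read-modify-write inside `count += offset` can be interleaved between threads and some increments may be lost.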
Item 39: Use Queue to Coordinate Work Between Threads
- The Queue class has everything needed to build robust concurrent pipelines: blocking operations, buffer sizes, and joining
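The three features can be seen in a tiny producer and consumer sketch. The None sentinel and the doubling step are assumptions made for illustration.

```python
from queue import Queue
from threading import Thread


def consumer(in_queue, results):
    while True:
        item = in_queue.get()      # blocks until an item is available
        if item is None:           # sentinel: no more work
            in_queue.task_done()
            return
        results.append(item * 2)
        in_queue.task_done()


work = Queue(maxsize=2)            # put blocks while two items are waiting
results = []
thread = Thread(target=consumer, args=(work, results))
thread.start()

for n in range(5):
    work.put(n)
work.put(None)

work.join()                        # returns once every item is marked done
thread.join()
```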
Item 40: Consider Coroutines to Run Many Functions Concurrently
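- Threads have three big problems:
- They need special tools (such as Lock and Queue) to coordinate with each other safely
- Each thread requires about 8 MB of memory
- Threads are costly to start
- A coroutine has less than 1 KB of memory overhead
- A generator can receive values sent with the send method as the result of its yield expression:
    def my_coroutine():
        while True:
            received = yield
            print('Received:', received)

    it = my_coroutine()
    next(it)  # Prime the coroutine
    it.send('First')
    # >>>
    # Received: First
- Python 2 does not support yield from; loop over the nested generator with for and yield each value instead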
Item 41: Consider concurrent.futures for True Parallelism
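- Move CPU bottlenecks into C extension modules
- ProcessPoolExecutor from concurrent.futures, which is built on multiprocessing, can run some kinds of tasks in parallel on multiple CPU cores; Python 2 does not include this module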
6. Built-in Modules
Item 42: Define Function Decorators with functools.wraps
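- A decorator can wrap a function, but the wrapper replaces the function's metadata (such as its name and docstring)
- The wraps helper from functools solves this problem:
    from functools import wraps

    def trace(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # …
        return wrapper

    @trace
    def fibonacci(n):
        # …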
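A runnable version of the pattern, as a sketch: the bodies elided in the notes are filled in here with an illustrative call recorder and a simple recursive Fibonacci, neither of which is the book's exact code.

```python
from functools import wraps


def trace(func):
    calls = []

    @wraps(func)                      # copy __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        calls.append((args, kwargs, result))
        return result

    wrapper.calls = calls             # hypothetical attribute exposing the log
    return wrapper


@trace
def fibonacci(n):
    """Return the n-th Fibonacci number."""
    if n in (0, 1):
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)


value = fibonacci(3)
```

Thanks to wraps, tools such as help() and debuggers still see the name and docstring of fibonacci rather than those of wrapper.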
Item 43: Consider contextlib and with Statements for Reusable try/finally Behavior
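- Use with statements in place of try/finally blocks to make code more readable
- A function decorated with contextmanager from contextlib can be used in a with statement
- The value yielded by the context manager function is bound to the as target of the with statement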
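A minimal sketch in the spirit of the book's logging example. The logger name 'my-log' is made up for illustration.

```python
import logging
from contextlib import contextmanager


@contextmanager
def log_level(level, name):
    logger = logging.getLogger(name)
    old_level = logger.level
    logger.setLevel(level)
    try:
        yield logger                 # bound to the "as" target of the with statement
    finally:
        logger.setLevel(old_level)   # always restored, like a finally block


with log_level(logging.DEBUG, 'my-log') as logger:
    level_inside = logger.level

level_after = logging.getLogger('my-log').level
```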
Item 44: Make pickle Reliable with copyreg
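- The pickle module is only suitable for serializing and deserializing objects between programs that trust each other
- Combine pickle with copyreg to supply missing attribute values, version classes, and provide stable import paths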
Item 45: Use datetime Instead of time for Local Clocks
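- Avoid the time module for converting between time zones
- Use datetime together with pytz instead; always represent time in UTC and convert to local time only as the final step before presentation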
Item 46: Use Built-in Algorithms and Data Structures
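- Use the built-in algorithms and data structures:
- collections.deque
- collections.OrderedDict
- collections.defaultdict
- The heapq module maintains a priority queue in a list: heappush, heappop, and nsmallest
    from heapq import heappush, heappop

    a = []
    heappush(a, 5)
    heappush(a, 3)
    heappush(a, 7)
    heappush(a, 4)
    print(heappop(a), heappop(a), heappop(a), heappop(a))
    # >>>
    # 3 4 5 7
- bisect module: bisect_left performs an efficient binary search on a sorted list
- itertools module (not every function is available in Python 2):
- Linking iterators together: chain, cycle, tee, and zip_longest
- Filtering items: islice, takewhile, dropwhile, filterfalse
- Combining items from iterators: product, permutations, and combinations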
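A few of the tools listed above in one short sketch. All of the data is illustrative.

```python
from bisect import bisect_left
from collections import defaultdict, deque
from itertools import chain, islice

# deque: constant-time appends and pops at both ends (a FIFO queue here).
fifo = deque()
fifo.append(1)
fifo.append(2)
first = fifo.popleft()

# defaultdict: missing keys get a default value automatically.
counts = defaultdict(int)
for word in ['a', 'b', 'a']:
    counts[word] += 1

# bisect_left: binary search in a sorted sequence.
x = list(range(10**6))
index = bisect_left(x, 991234)

# itertools: chain links iterators together, islice takes a slice lazily.
joined = list(islice(chain([1, 2, 3], [4, 5, 6]), 4))
```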
Item 47: Use decimal When Precision Is Paramount
- Use Decimal when a calculation needs high precision or exact control over rounding
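A short sketch based on the book's phone-call billing example: binary floating point cannot represent 1.45 exactly, whereas Decimal built from strings keeps the arithmetic exact and lets you choose the rounding mode.

```python
from decimal import Decimal, ROUND_UP

rate = Decimal('1.45')        # price per minute
seconds = Decimal('222')      # 3 minutes 42 seconds

cost = rate * seconds / Decimal('60')
rounded = cost.quantize(Decimal('0.01'), rounding=ROUND_UP)
```

Constructing Decimal values from strings rather than floats matters: Decimal(1.45) would inherit the float's representation error.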
Item 48: Know Where to Find Community-Built Modules
- Look for community-built modules on https://pypi.python.org and install them with pip
7. Collaboration
Item 49: Write Docstrings for Every Function, Class, and Module
Item 50: Use Packages to Organize Modules and Provide Stable APIs
Item 51: Define a Root Exception to Insulate Callers from APIs
Item 52: Know How to Break Circular Dependencies
Item 53: Use Virtual Environments for Isolated and Reproducible Dependencies
8. Production
Item 54: Consider Module-Scoped Code to Configure Deployment Environments
Item 55: Use repr Strings for Debugging Output
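- Calling repr on a built-in type produces a printable string; passing that string to eval gives back the original value
- Define __repr__ to customize the string produced for your own classes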
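As an illustration of both points (BetterClass follows the book's naming, and the values are arbitrary):

```python
class BetterClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'BetterClass(%d, %d)' % (self.x, self.y)


a = '\x07'                   # a non-printable character
text = repr(a)               # printable, quoted representation
round_trip = eval(text)      # evaluates back to the original value

obj_text = repr(BetterClass(1, 2))
```

repr also distinguishes the string '5' from the integer 5, which printing with str hides.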
Item 56: Test Everything with unittest
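- Write tests with unittest; unit tests are not the only kind that matters, integration tests are important too
- Subclass TestCase and give every test method a name that starts with test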
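A minimal sketch of such a test case. to_str is a small Python 3 helper written for this example, and the suite is run programmatically here only so the snippet is self-contained; normally you would use python -m unittest.

```python
import io
import unittest


def to_str(data):
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode('utf-8')
    raise TypeError('Must supply str or bytes, found: %r' % data)


class UtilsTestCase(unittest.TestCase):
    def test_to_str_bytes(self):
        self.assertEqual('hello', to_str(b'hello'))

    def test_to_str_str(self):
        self.assertEqual('hello', to_str('hello'))

    def test_to_str_bad(self):
        self.assertRaises(TypeError, to_str, object())


suite = unittest.defaultTestLoader.loadTestsFromTestCase(UtilsTestCase)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```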
Item 57: Consider Interactive Debugging with pdb
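- Start the debugger by adding import pdb; pdb.set_trace() at the point of interest, then inspect the program with the debugger's interactive shell commands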
Item 58: Profile Before Optimizing
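- cProfile gives more accurate profiling information than profile
- ncalls: the number of calls to the function
- tottime: time spent in the function itself, excluding time spent in the functions it calls
- cumtime: time spent in the function including the functions it calls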
Item 59: Use tracemalloc to Understand Memory Usage and Leaks
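- The gc module can tell you which objects exist, but not how they were allocated
- tracemalloc shows where memory is being allocated, but it is available only in Python 3.4 and later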