Unnecessary Iteration
Avoid unnecessary iterations. Use vectorized solutions instead of loops.
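As a rough illustration of the recommendation above, the sketch below squares an array twice, once with a Python loop and once with a vectorized NumPy expression. The array contents and variable names are assumptions chosen for the example.

```python
import numpy as np

values = np.arange(1_000, dtype=np.float64)

# Loop version: one Python-level iteration per element.
squared_loop = np.empty_like(values)
for i in range(len(values)):
    squared_loop[i] = values[i] ** 2

# Vectorized version: a single expression evaluated in compiled code.
squared_vec = values ** 2
```

Both versions produce the same array; the vectorized one is shorter and usually much faster on large inputs.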
Be careful when using the NaN equivalence comparison in NumPy and Pandas.
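The pitfall can be seen in a short sketch: NaN never compares equal to anything, including itself, so an equality test silently finds no missing values. The Series contents here are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# NaN is not equal to itself, so this comparison is False.
nan_equals_itself = (np.nan == np.nan)

# An equality test therefore matches no rows at all.
found_by_equality = int((s == np.nan).sum())

# isna() (or np.isnan for NumPy arrays) detects the missing value.
found_by_isna = int(s.isna().sum())
```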
Avoid using chain indexing in Pandas.
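One common way to avoid chained indexing is a single .loc call that selects rows and column together. The following is a minimal sketch; the column names and the threshold of 50 are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"score": [40, 75, 90], "passed": [False, False, False]})

# Chained form to avoid: df[df["score"] >= 50]["passed"] = True
# It may assign to a temporary copy and leave df unchanged.

# Single-step form: row mask and column label in one .loc call.
df.loc[df["score"] >= 50, "passed"] = True
```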
Explicitly select columns and set their data types (dtype) in Pandas.
When a new empty column is needed in a DataFrame in Pandas, use the NaN value in NumPy instead of zeros or empty strings.
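A small sketch of why NaN is the safer placeholder: missing-value checks such as isna() recognize it, whereas a zero or an empty string would look like real data. The column name "rating" is a hypothetical example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3]})

# Placeholder column filled with NaN (hypothetical column name).
df["rating"] = np.nan

# Every entry is reported as missing until real values are filled in.
missing = int(df["rating"].isna().sum())
```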
Explicitly specify the on, how, and validate parameters of df.merge() in Pandas for better readability.
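A merge with all three parameters spelled out might look like the sketch below. The two tables and their column names are invented for the example; validate="many_to_one" makes Pandas raise an error if the join key is not unique in the right-hand table.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10.0, 5.0, 7.5]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ada", "Bo"]})

# Join key, join type, and expected key cardinality are all explicit.
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    validate="many_to_one",
)
```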
Remember to assign the result of an operation to a variable, or set the inplace parameter where the API provides one.
Use df.to_numpy() in Pandas instead of df.values to transform a DataFrame into a NumPy array.
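For illustration, a minimal conversion with df.to_numpy() is sketched below on a made-up DataFrame. Unlike the older values attribute, to_numpy() also accepts a dtype argument, so the target type can be stated explicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# Convert the whole DataFrame to a NumPy array.
arr = df.to_numpy()

# Optionally request a specific dtype during the conversion.
arr32 = df.to_numpy(dtype=np.float32)
```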
When multiplying two-dimensional matrices, use np.matmul() instead of np.dot() in NumPy for better semantics.
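A brief sketch with two small, arbitrary matrices shows the intended call. The @ operator is equivalent to np.matmul() for arrays like these.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Matrix product of two 2-D arrays; a @ b gives the same result.
product = np.matmul(a, b)
```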
Check whether feature scaling is added before scaling-sensitive operations.
Hyperparameters should be set explicitly.
Free memory promptly once it is no longer needed.
Set the deterministic algorithm option to True during development, and use the option that provides better performance in production.
Set random seed explicitly during the development process whenever a possible random procedure is involved in the application.
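As a minimal sketch of explicit seeding, the example below seeds Python's random module and a NumPy generator. The seed value 42 is arbitrary, and deep learning frameworks have their own seeding calls that are not shown here.

```python
import random

import numpy as np

SEED = 42  # arbitrary value, fixed so that runs are repeatable

random.seed(SEED)

# Two generators created with the same seed yield the same numbers.
rng = np.random.default_rng(SEED)
first_run = rng.random(3)

rng_again = np.random.default_rng(SEED)
second_run = rng_again.random(3)
```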
Add a mask for possible invalid values. For example, developers should add a mask for the input for tf.log() API.
Use the broadcasting feature in TensorFlow 2 to be more memory efficient.
Use tf.TensorArray() in TensorFlow 2 if the value of the array will change in the loop.
Set the training mode at the appropriate place in PyTorch code, so that the model is not left in evaluation mode after an inference step.
Use self.net() in PyTorch to forward the input to the network instead of self.net.forward().
Use optimizer.zero_grad(), loss_fn.backward(), optimizer.step() together in order in PyTorch. Do not forget to use optimizer.zero_grad() before loss_fn.backward() to clear gradients.
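The ordering can be sketched with a tiny training loop. This assumes that loss_fn.backward() in the text refers to calling backward() on the loss tensor returned by the loss function; the model, data, and learning rate are placeholders made up for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Linear(3, 1)          # placeholder model
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 3)       # synthetic data
targets = torch.randn(8, 1)

for _ in range(5):
    optimizer.zero_grad()                    # 1. clear old gradients
    loss = loss_fn(model(inputs), targets)   # forward pass via model(...)
    loss.backward()                          # 2. compute new gradients
    optimizer.step()                         # 3. update the parameters
```

Without the zero_grad() call, gradients from earlier iterations would accumulate into the current ones.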
Use Pipeline() API in Scikit-Learn or check data segregation carefully when using other libraries to prevent data leakage.
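A minimal sketch of the Pipeline approach follows. The synthetic dataset, the choice of StandardScaler and LogisticRegression, and the hyperparameter values are assumptions for illustration. Because the scaler sits inside the pipeline, it is fitted on the training split only, so no statistics from the test split leak into training.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scaling and classification are fitted together on the training data.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(C=1.0, max_iter=1000)),
])
model.fit(X_train, y_train)

# At prediction time the test data is transformed with training statistics.
accuracy = model.score(X_test, y_test)
```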
Use threshold-independent metrics instead of threshold-dependent ones in model evaluation.