Unnecessary Iteration

Avoid unnecessary iterations. Use vectorized solutions instead of loops.

NaN Equivalence Comparison Misused

Be careful when using the NaN equivalence comparison in NumPy and Pandas.

Columns and DataType Not Explicitly Set

Explicitly select columns and set DataType in Pandas.

Empty Column Misinitialization

When a new empty column is needed in a DataFrame in Pandas, use the NaN value in Numpy instead of using zeros or empty strings.

Merge API Parameter Not Explicitly Set

Explicitly specify on, how and validate parameter for df.merge() API in Pandas for better readability.

In-Place APIs Misused

Remember to assign the result of an operation to a variable or set the in-place parameter in the API.

No Scaling Before Scaling-sensitive Operation

Check whether feature scaling is added before scaling-sensitive operations.

Hyperparameter not Explicitly Set

Hyperparameters should be set explicitly.

Memory not Freed

Free memory in time.

Deterministic Algorithm Option Not Used

Set deterministic algorithm option to True during the development process, and use the option that provides better performance in the production.

Randomness Uncontrolled

Set random seed explicitly during the development process whenever a possible random procedure is involved in the application.

Missing the Mask of Invalid Value

Add a mask for possible invalid values. For example, developers should add a mask for the input for tf.log() API.

Broadcasting Feature Not Used

Use the broadcasting feature in TensorFlow 2 to be more memory efficient.

Training / Evaluation Mode Improper Toggling

Call the training mode in the appropriate place in PyTorch code to avoid forgetting to toggle back the training mode after the inference step.

Data Leakage

Use Pipeline() API in Scikit-Learn or check data segregation carefully when using other libraries to prevent data leakage.

Threshold-Dependent Validation

Use threshold-independent metrics instead of threshold-dependent ones in model evaluation.