Code Smells

Unnecessary Iteration

Avoid unnecessary iterations. Use vectorized solutions instead of loops.
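
As a minimal sketch of the difference, the example below computes a per-row total twice, once with a row loop and once with a column-wise operation. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Loop version: one Python-level step per row, slow on large frames.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["qty"])

# Vectorized version: a single column-wise multiplication.
totals_vec = df["price"] * df["qty"]
```

Both versions produce the same values; the vectorized one delegates the iteration to compiled code.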

NaN Equivalence Comparison Misused

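Be careful with the NaN equivalence comparison in NumPy and Pandas. NaN never compares equal to anything, including itself, so a test such as x == np.nan is always False. Use np.isnan() in NumPy or isna() in Pandas instead.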
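
A short sketch of the pitfall, using a made-up Series: comparing against np.nan finds nothing, while isna() finds the missing entry.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

# Smell: NaN != NaN, so this mask is False everywhere.
wrong_mask = s == np.nan

# Preferred: ask explicitly whether each value is missing.
right_mask = s.isna()
```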

Chain Indexing

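Avoid chain indexing in Pandas, that is, two index operations in a row such as df[mask]["col"]. An assignment through such a chain may modify a temporary copy instead of the original DataFrame. Use a single df.loc[] access instead.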
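
The sketch below shows the single-access form on a made-up DataFrame; the chained form is left as a comment because its effect depends on whether Pandas returns a view or a copy.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Smell (chained): df[df["a"] > 1]["b"] = 0
# The first [] may return a copy, so the assignment can be lost.

# Preferred: select rows and column in one .loc call.
df.loc[df["a"] > 1, "b"] = 0
```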

Columns and DataType Not Explicitly Set

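When importing data in Pandas, explicitly select the columns you need and set their data types, rather than relying on the defaults inferred from the input.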
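
One way to follow this advice with pd.read_csv() is sketched below. The CSV content, column names, and chosen dtypes are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "id,name,score,comment\n1,ann,3.5,ok\n2,bob,4.0,fine\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "name", "score"],          # drop columns that are not needed
    dtype={"id": "int32", "score": "float32"},  # do not rely on inferred types
)
```

Stating the columns and types up front makes downstream code fail early if the input schema changes.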

Empty Column Misinitialization

When a new empty column is needed in a Pandas DataFrame, initialize it with the NaN value from NumPy instead of zeros or empty strings, so that missing-value methods such as isna() and count() treat it as empty.
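
A minimal sketch with a made-up DataFrame: the new column is filled with np.nan, so Pandas counts it as having no valid values.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"name": ["ann", "bob"]})

# Smell: df["score"] = 0 or df["score"] = "" would look like real data.
# Preferred: mark the column as missing.
df["score"] = np.nan

valid_scores = df["score"].count()  # count() ignores NaN
```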

Merge API Parameter Not Explicitly Set

Explicitly specify the on, how, and validate parameters of the df.merge() API in Pandas for better readability.
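
A sketch of an explicit merge follows; the two frames and the key column are invented for the example.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2, 4], "y": [0.1, 0.2, 0.4]})

merged = left.merge(
    right,
    on="key",               # join column stated, not guessed from shared names
    how="inner",            # join type stated, even though it is the default
    validate="one_to_one",  # raises MergeError if keys are duplicated
)
```

With validate set, an unexpected duplicate key raises an error instead of silently multiplying rows.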

In-Place APIs Misused

Many operations return a new object and leave the original unchanged. Remember to assign the result of such an operation to a variable, or set the in-place parameter where the API provides one.
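
The sketch below uses df.dropna() on a made-up DataFrame to show the discarded result and the two ways of keeping it.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# Smell: the cleaned frame is returned and immediately discarded.
df.dropna()
rows_after_smell = len(df)  # df itself is unchanged

# Option 1: assign the result.
cleaned = df.dropna()

# Option 2: ask the API to modify df in place.
df.dropna(inplace=True)
```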

Dataframe Conversion API Misused

Use df.to_numpy() in Pandas instead of the df.values attribute to convert a DataFrame to a NumPy array.
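
A minimal sketch of the conversion, on a made-up DataFrame:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Smell: arr = df.values
# Preferred: the method name states what is returned.
arr = df.to_numpy()
```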

Matrix Multiplication API Misused

When multiplying two-dimensional matrices, use np.matmul() instead of np.dot() in NumPy for better semantics.
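
For two-dimensional inputs both functions return the same product, so the choice is about stating intent. A small sketch with made-up matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Smell: np.dot(a, b) works, but dot has different meanings for other shapes.
# Preferred: matmul (or the equivalent @ operator) always means matrix product.
c = np.matmul(a, b)
```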

No Scaling Before Scaling-sensitive Operation

Check whether feature scaling is added before scaling-sensitive operations.
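
As an illustration, PCA is one scaling-sensitive operation. In the sketch below, which assumes scikit-learn and uses synthetic data, one feature has a much larger scale than the other and dominates the first component until the features are standardized.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: two independent features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])

# Without scaling, the large-scale feature accounts for almost all variance.
unscaled_ratio = PCA(n_components=1).fit(X).explained_variance_ratio_[0]

# With scaling, both features contribute on equal terms.
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA(n_components=1).fit(X_scaled).explained_variance_ratio_[0]
```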

Hyperparameter Not Explicitly Set

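Hyperparameters should be set explicitly rather than left at library defaults, which may not suit the data and may change between library versions.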
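
A sketch using a scikit-learn estimator as an example; the estimator and the particular values are illustrative assumptions, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier

# Smell: RandomForestClassifier() hides every choice behind defaults.
# Preferred: state the values the experiment depends on.
clf = RandomForestClassifier(
    n_estimators=100,
    max_depth=5,
    min_samples_leaf=1,
    random_state=0,
)

params = clf.get_params()
```

Writing the values out also documents the configuration for anyone who tries to reproduce the result.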

Memory Not Freed

Free memory as soon as it is no longer needed, especially inside loops that repeatedly create models or tensors.
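
One reading of this advice in PyTorch is sketched below: when values are kept across loop iterations, store a detached copy so that each step's computation graph can be released. The model, data, and variable names are made up for the example.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
history = []

for _ in range(5):
    x, y = torch.randn(4, 3), torch.randn(4, 1)
    loss = loss_fn(model(x), y)
    # Smell: history.append(loss) would keep every step's graph reachable.
    # Preferred: detach() drops the reference to the graph.
    history.append(loss.detach())
```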

Deterministic Algorithm Option Not Used

Set the deterministic algorithm option to True during development, and switch to the option that gives better performance in production.
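
In PyTorch this option is torch.use_deterministic_algorithms(). A minimal sketch follows; the DEVELOPMENT flag is a hypothetical name standing in for however the project distinguishes its environments.

```python
import torch

# Hypothetical switch; a real project might read this from configuration.
DEVELOPMENT = True

# True: operations without a deterministic implementation raise an error.
# False: PyTorch is free to pick faster nondeterministic kernels.
torch.use_deterministic_algorithms(DEVELOPMENT)

enabled = torch.are_deterministic_algorithms_enabled()
```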

Randomness Uncontrolled

Set the random seed explicitly during development whenever the application involves a possibly random procedure.
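
A sketch of a seeding helper for the Python and NumPy generators is shown below. The function name seed_everything and the seed value are assumptions made for illustration; frameworks have their own seeding calls, noted in the comment.

```python
import random

import numpy as np

SEED = 42  # arbitrary value chosen for illustration


def seed_everything(seed):
    """Seed every random number generator the application uses."""
    random.seed(seed)
    np.random.seed(seed)
    # If the project uses them, also call torch.manual_seed(seed)
    # and tf.random.set_seed(seed).


seed_everything(SEED)
first = (random.random(), np.random.rand(3).tolist())

seed_everything(SEED)
second = (random.random(), np.random.rand(3).tolist())
```

Reseeding with the same value reproduces the same draws, which is what makes a development run repeatable.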

Missing the Mask of Invalid Value

Add a mask for possible invalid values. For example, developers should mask the input to the tf.log() API (tf.math.log() in TensorFlow 2), because the logarithm of zero is negative infinity.
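
One common form of such a mask is to clip the input to a small positive lower bound before taking the logarithm, sketched here with the TensorFlow 2 name tf.math.log(). The input values and the bound 1e-10 are assumptions for the example.

```python
import tensorflow as tf

# Hypothetical input containing a zero.
x = tf.constant([0.0, 0.5, 1.0])

# Smell: log(0) evaluates to -inf and can poison later computations.
unsafe = tf.math.log(x)

# Preferred: clip the input so the logarithm stays finite.
safe = tf.math.log(tf.clip_by_value(x, 1e-10, 1.0))
```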

Broadcasting Feature Not Used

Use the broadcasting feature in TensorFlow 2 to be more memory efficient.
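
The sketch below adds a column vector to a row vector twice: once after tiling both operands to the full shape, and once relying on broadcasting. The tensors are made up for illustration.

```python
import tensorflow as tf

a = tf.constant([[1.0], [2.0]])        # shape (2, 1)
b = tf.constant([[10.0, 20.0, 30.0]])  # shape (1, 3)

# Smell: materialize two full (2, 3) tensors before adding.
tiled = tf.tile(a, [1, 3]) + tf.tile(b, [2, 1])

# Preferred: let the addition broadcast the operands.
broadcast = a + b
```

The results are identical, but the broadcast version does not allocate the intermediate tiled tensors.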

TensorArray Not Used

Use tf.TensorArray() in TensorFlow 2 if the value of the array will change in the loop.
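
A minimal sketch of the pattern: a tf.function that fills an array inside a loop by writing to a tf.TensorArray. The function name squares is hypothetical.

```python
import tensorflow as tf


@tf.function
def squares(n):
    # A TensorArray can be written to inside a graph-mode loop.
    ta = tf.TensorArray(dtype=tf.int32, size=n)
    for i in tf.range(n):
        # write() returns the updated array, so reassign it.
        ta = ta.write(i, i * i)
    # stack() turns the collected elements into one tensor.
    return ta.stack()


result = squares(tf.constant(4))
```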

Training / Evaluation Mode Improper Toggling

Switch to training mode at the appropriate place in PyTorch code, so that the model is not left in evaluation mode after an inference step.
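
One way to arrange this is to call model.train() at the start of every epoch rather than once before the loop, as sketched below. The model, the evaluate helper, and the recorded modes list are made up for the example; the training step itself is elided.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 2), nn.Dropout(p=0.5))


def evaluate(model, x):
    # Inference: disable dropout and gradient tracking.
    model.eval()
    with torch.no_grad():
        return model(x)


modes = []
for epoch in range(2):
    # Re-enable training mode every epoch, since evaluate() turned it off.
    model.train()
    modes.append(model.training)
    # ... training step would go here ...
    _ = evaluate(model, torch.randn(3, 4))
```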

PyTorch Call Method Misused

Use self.net() in PyTorch to forward the input to the network instead of self.net.forward(). Calling the module directly also runs any registered hooks, which forward() bypasses.
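
The difference can be observed with a forward hook, as in this sketch. The layer and the calls list are made up for the example.

```python
import torch
from torch import nn

calls = []
net = nn.Linear(2, 1)
# The hook records each time it is invoked; returning None keeps the output.
net.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.ones(1, 2)

net(x)          # goes through __call__, so the hook fires
net.forward(x)  # bypasses __call__, so the hook is skipped
```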

Gradients Not Cleared before Backward Propagation

Call optimizer.zero_grad(), loss.backward(), and optimizer.step() together, in that order, in PyTorch, where loss is the tensor returned by the loss function. Do not forget to call optimizer.zero_grad() before loss.backward(); otherwise gradients accumulate across iterations.
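
A minimal training loop showing the three calls in order is sketched below; the model, data, and learning rate are made up for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Hypothetical training data.
x = torch.randn(8, 3)
y = torch.randn(8, 1)

for _ in range(3):
    optimizer.zero_grad()        # 1. clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # 2. compute fresh gradients
    optimizer.step()             # 3. update the parameters
```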

Data Leakage

Use the Pipeline() API in Scikit-Learn, or check data segregation carefully when using other libraries, to prevent data leakage.
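
In the sketch below, which uses synthetic data, the scaler sits inside the pipeline, so its statistics are computed from the training split only and the test split never influences preprocessing. The step names "scale" and "clf" are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Smell: StandardScaler().fit(X) before splitting would leak test statistics.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)  # the scaler is fitted on X_train only

test_accuracy = pipe.score(X_test, y_test)
```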

Threshold-Dependent Validation

Use threshold-independent metrics instead of threshold-dependent ones in model evaluation.
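
As an illustration with scikit-learn, F1 is computed from hard predictions and changes with the cut-off, while ROC AUC is computed from the scores directly. The labels, scores, and the two thresholds are made up for the example.

```python
from sklearn.metrics import f1_score, roc_auc_score

# Hypothetical labels and predicted scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Threshold-dependent: the result changes with the chosen cut-off.
f1_at_05 = f1_score(y_true, [int(s >= 0.5) for s in y_score])
f1_at_03 = f1_score(y_true, [int(s >= 0.3) for s in y_score])

# Threshold-independent: depends only on how the scores rank the samples.
auc = roc_auc_score(y_true, y_score)
```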