When taking a machine learning model from training to inference, it's not uncommon to run into problems, especially when switching computing environments (e.g., training on a GPU and running inference on a CPU). In my case, this showed up as unstable predictions that alternated between the two classes (0 and 1).
Here, I'll briefly summarize the issues I encountered and how I resolved them.
1. Model Loading Mistake
I built an inference endpoint on a CPU instance (ml.m5.large) and found that the model, which was trained on a GPU instance (g4dn.2xlarge), did not produce the expected inference results.
Upon checking the logs, I encountered the following warning:
2024-04-25T05:47:04,365 [WARN ] W-9000-model_1.0-stderr MODEL_LOG - Some weights of the model checkpoint at /opt/ml/model/code/pytorch_model.bin were not used when initializing BertModel...
This issue occurred because I attempted to load pytorch_model.bin, the checkpoint saved from the fine-tuned PredictionModel, directly with BertModel.from_pretrained. That method assumes the structure of the base BERT model, so it ignored the parameters of the LSTM and linear layers added in PredictionModel, silently discarding important fine-tuned weights.
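For context, here is a minimal sketch of what a model like PredictionModel might look like. The class name and the BERT + LSTM + linear structure come from this post; the layer sizes, the forward pass, and the way pretrained_model is consumed are my assumptions:

import torch.nn as nn
from transformers import BertModel

class PredictionModel(nn.Module):
    # Sketch: BERT encoder followed by an LSTM and a linear head for two classes
    def __init__(self, config, pretrained_model=None):
        super().__init__()
        if pretrained_model is not None:
            # Restores only BERT-shaped weights from the checkpoint; the fine-tuned
            # LSTM/linear parameters in it are ignored (hence the warning above)
            self.bert = BertModel.from_pretrained(pretrained_model, config=config)
        else:
            self.bert = BertModel(config)
        self.lstm = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
        self.linear = nn.Linear(config.hidden_size, 2)

    def forward(self, input_ids):
        hidden = self.bert(input_ids).last_hidden_state
        lstm_out, _ = self.lstm(hidden)
        return self.linear(lstm_out[:, -1])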
from os import path
from transformers import BertConfig

# Problematic: loaded this way, only the BERT weights in the checkpoint are restored
pretrained_config = path.join("/opt/ml/model/code/", "config.json")
pretrained_model = path.join("/opt/ml/model/code/", "pytorch_model.bin")
config = BertConfig.from_pretrained(pretrained_config)
model = PredictionModel(config=config, pretrained_model=pretrained_model)
The issue was resolved by loading the state dict from fine_tuning_model.pt instead, which contained all of the model's parameters:
import torch

# Rebuild the architecture, then restore ALL fine-tuned parameters
model = PredictionModel(config=config, pretrained_model=None)
model_path = path.join("/opt/ml/model/code/", "fine_tuning_model.pt")
model.load_state_dict(torch.load(model_path))
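For this to work, the full state dict has to be saved on the training side. The training code isn't shown in this post, but presumably something like the following ran at the end of fine-tuning:

# Training side (sketch): save every parameter, including the LSTM and linear layers
torch.save(model.state_dict(), "fine_tuning_model.pt")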
2. Device Assignment for the Model
The checkpoint had been saved from a CUDA device and the loading code assumed CUDA by default, which caused an error in an environment without CUDA support.
2024-04-28T06:59:31,905 [INFO ] W-9001-model_1.0-stdout MODEL_LOG - Exception in model fn Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False...
In a CPU-only environment, the checkpoint has to be deserialized onto the available device and the model explicitly moved there:
# map_location remaps tensors that were saved on CUDA to the CPU when CUDA is unavailable
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)
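Putting issues 1 and 2 together, the loading logic in the endpoint's model_fn (the handler name appears in the logs above; this structure is a sketch, not the original code) ends up looking roughly like this, reusing the imports from the earlier snippets:

def model_fn(model_dir):
    # SageMaker passes model_dir (/opt/ml/model here); the artifacts sit under code/
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    config = BertConfig.from_pretrained(path.join(model_dir, "code", "config.json"))
    model = PredictionModel(config=config, pretrained_model=None)
    state = torch.load(path.join(model_dir, "code", "fine_tuning_model.pt"), map_location=device)
    model.load_state_dict(state)
    model.to(device)
    model.eval()
    return model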
3. Disabling Gradient Calculations
During inference, gradient calculation should be disabled when passing input tensors to the model. While I believe this had no effect on the inference results, failing to do so meant PyTorch kept building the autograd graph on every forward pass, wasting memory and increasing computation time.
# Run the forward pass without autograd bookkeeping
with torch.no_grad():
    model_input = torch.tensor([preprocessed_data.getitem()], dtype=torch.long).to(device)
    model_output = model(model_input)
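One related point, although it wasn't part of this particular fix: torch.no_grad() only stops gradient tracking. If the model contains dropout layers, as BERT does, model.eval() is also needed, otherwise predictions can vary from call to call:

model.eval()  # switch dropout (and batch norm, if any) to inference behavior
with torch.no_grad():
    model_output = model(model_input)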
Conclusion
Successfully transitioning from training to inference means paying attention to how the model is loaded, which device its weights are mapped to, and whether gradient tracking is disabled. Getting these details right avoids silently dropped weights, device errors, and wasted compute.