How to improve ML Model Accuracy for Text Classification?

Vikram Bhagat - Aug 6 - Dev Community

Hi Experts,

We are working on a text classification problem. We have around 80K records across roughly 50 classes, and the data is highly imbalanced. The dataset has two columns: one for the description and one for the class.
So far we have tried the following models and techniques:

  1. Data preprocessing:
     a. Lowercase conversion, removal of numeric tokens and punctuation
     b. Removal of unimportant words and stop words
     c. Lemmatization
  2. TF-IDF transformation
  3. scikit-learn models:
     a. Linear SVC
     b. Linear Regression
     c. Logistic Regression
     d. Decision Trees
     e. Random Forest
  4. Hugging Face Transformers:
     a. Google BERT
     b. DistilBERT
  5. SMOTE sampling
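
For reference, steps 1 and 2 above can be sketched roughly as follows. This is only a minimal, standard-library illustration and not our actual code: the `STOP_WORDS` set, the `preprocess` and `tfidf` helpers, and the sample descriptions are all made up for the example, lemmatization is left out because it needs an external library, and the smoothed IDF formula is one common choice (as far as I know it matches the scikit-learn default, but treat that as an assumption).

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "for", "in", "on"}


def preprocess(text):
    """Lowercase, strip digits and punctuation, then drop stop words."""
    text = text.lower()
    # Replace everything except ASCII letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count times a smoothed IDF,
    ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation.
    """
    tokenised = [preprocess(d) for d in docs]
    n = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        weights = {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Made-up descriptions standing in for the real "description" column.
docs = [
    "Printer is not working on floor 3!",
    "Reset the password for account 4521",
    "Printer out of toner",
]
vectors = tfidf(docs)
```

Terms that occur in fewer descriptions get a higher weight, so in the third sample "toner" ends up weighted above "printer", which also appears in the first one.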

The best accuracy we have reached so far is 70% (Random Forest and Google BERT).
Is there any scope to improve on this?
If so, which other techniques or models could we use to improve accuracy?
