2022
Topic Modelling-Based Approach for Clustering Legal Documents
This paper proposes an approach to clustering legal documents: sentence embeddings are generated with DistilBERT, reduced in dimensionality with UMAP, and then clustered with the mini-batch k-means algorithm. In the paper's comparison, the proposed approach outperformed state-of-the-art topic modelling and clustering approaches.
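As a rough illustration of the final clustering step only, the sketch below implements a small mini-batch k-means in plain Python. It is not the paper's implementation: the DistilBERT and UMAP stages are omitted, the toy 2-D points merely stand in for reduced embeddings, and the function names, batch size, iteration count and farthest-first initialisation are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def mini_batch_kmeans(points, k, batch_size=8, iterations=200, seed=0):
    """Mini-batch k-means with a per-center learning rate of 1/count.

    Initialisation is farthest-first (an assumption for this sketch):
    the first center is a random point, and each further center is the
    point farthest from all centers chosen so far.
    """
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centers),
        )
        centers.append(list(farthest))

    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(iterations):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch first, with the centers held fixed.
        assignments = [nearest(p, centers) for p in batch]
        # Then move each assigned center a small step towards its point.
        for p, c in zip(batch, assignments):
            counts[c] += 1
            rate = 1.0 / counts[c]
            centers[c] = [(1 - rate) * cv + rate * pv
                          for cv, pv in zip(centers[c], p)]

    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy stand-in for reduced embeddings: two well-separated 2-D groups.
group_a = [(0.1 * i, 0.1 * (i % 3)) for i in range(20)]
group_b = [(10 + 0.1 * i, 10 + 0.1 * (i % 3)) for i in range(20)]
points = group_a + group_b

centers, labels = mini_batch_kmeans(points, k=2)
```

Because each update touches only a small random batch, the cost per iteration does not grow with the size of the collection, which is the usual reason for preferring the mini-batch variant on large document sets.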
2022
Insights into NoSQL databases using financial data: A comparative analysis
This paper compares types of NoSQL databases on criteria such as data model, indexing methods, atomicity and integrity, among others. It demonstrates the implementation of three NoSQL databases, namely MongoDB, Cassandra and Redis, using financial data. Experiments compared the performance of these databases on fundamental READ queries that retrieve the complete dataset and on complex READ queries that retrieve a specific section. Aggregation operations were also implemented on the data. Fundamental WRITE queries that load the entire dataset and complex WRITE queries that update particular parts of it were performed as well.
2021
Sentiment Analysis of Twitch.tv Livestream Messages using Machine Learning Methods
This paper proposes a methodology for performing sentiment analysis on livestream messages from Twitch.tv with a set of machine learning-based models. Machine learning models such as Support Vector Classifier, Logistic Regression, Decision Tree Classifier, Random Forest Classifier and Multinomial Naïve Bayes were implemented. Among these models, the Support Vector Classifier outperformed the current state-of-the-art model and displayed a 10.3% rise in accuracy, a 9.1% rise in recall and a 7.5% rise in the F1-score.
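For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, recall and the F1-score are computed from true and predicted labels. It assumes a binary positive/negative setting for simplicity; the paper's label scheme may differ, and the labels, function name and positive-class convention here are hypothetical, not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive message, 0 = negative message.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

scores = binary_metrics(y_true, y_pred)
# Here tp=3, fp=1, fn=1, tn=3, so every metric comes out to 0.75.
```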