In today’s fast-paced financial world, every second can be worth a million dollars. Traders, analysts, and investors need real-time insights to make data-driven decisions, especially because a single news story can flip an entire market. Classifying news sentiment in real time is therefore crucial for financial professionals who need to understand market trends and act on them quickly. But how do you process the news in real time and analyze its sentiment? It requires powerful, flexible technologies that can handle a constant stream of data. Imagine having the Flash deliver every trending story to your users the moment it breaks.
In this guide, I’ll walk you through the quick setup of a real-time sentiment analysis system for financial news using Redpanda, PyTorch, and a pre-tuned sentiment classifier. Whether you are unfamiliar with these technologies or are looking for a simple way to incorporate them into your workflow, by the end we will be classifying financial news as positive, negative, or neutral.
Why Should You Care About Redpanda and PyTorch?
First, let’s start with the reason why we are using these technologies:
- Redpanda: a streaming data platform that is compatible with Kafka but much easier to use. It provides low-latency, high-throughput stream processing, which makes it an excellent fit for real-time applications and event-driven architectures, especially in financial markets.
- PyTorch: a machine learning library for building and training deep learning models. In our case, it lets us classify the sentiment of news articles with a pre-trained model, so we get flexibility and power without building a model from scratch.
These two components are the core of this guide. Together they pair fast data management (Redpanda) with advanced machine learning (PyTorch), a combination well suited to financial markets.
An overview of the model: Meta’s RoBERTa
In this demo, we will use a fine-tuned version of Meta’s RoBERTa model, available from Hugging Face user mrm8488. Because the model has been fine-tuned for financial news, it can pick up on the subtleties of that domain when predicting sentiment. It classifies news into three categories (positive, negative, or neutral), which makes it useful for decision-making in finance.
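To make the model’s output more concrete, here is a minimal sketch of how a classifier like this could be loaded with the Hugging Face transformers library. The model ID is an assumption (this post only names the publisher, mrm8488), and load_classifier and top_prediction are hypothetical helper names that are not part of the demo repository.

```python
from typing import Any, Dict, List, Tuple

# Assumed model ID: the post only says the model is published by Hugging Face
# user mrm8488, so verify this against the demo repository before relying on it.
MODEL_ID = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"


def load_classifier(device: str = "cpu") -> Any:
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID, device=device)


def top_prediction(scores: List[Dict[str, Any]]) -> Tuple[str, float]:
    """Return the highest-scoring (label, score) pair, or a fallback if empty."""
    if not scores:
        return ("unlabeled", 0.0)
    best = max(scores, key=lambda item: item["score"])
    return (best["label"], float(best["score"]))
```

With transformers and PyTorch installed, top_prediction(load_classifier()("The stock market is showing strong gains today")) should return a label and a confidence score. The exact label strings depend on the model.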
Step-by-Step: Setting Up Your Environment
Let’s set up our working environment. You have two options here: install everything locally, or use Docker for a more streamlined process.
What you need:
- Python 3.12 (if you only have 3.1, please check out the Docker section)
- Git LFS (to handle large files)
- Go 1.22 or so
- Redpanda or Redpanda Serverless
- rpk
- Docker (optional, but highly recommended)
Option 1: Local Installation
Follow these steps to set everything up on your local machine:
1. Clone the project:
    git clone https://github.com/voutilad/redpanda-pytorch-demo
    cd redpanda-pytorch-demo
    git submodule update --init --recursive
2. Set up the Python environment (virtualenv and dependencies):
    python3 -m venv venv
    source venv/bin/activate
    pip install -U pip
    pip install -r requirements.txt
3. Build the Redpanda Connect component:
    CGO_ENABLED=0 go build -C rpcp
Option 2: Docker Installation
For a simpler, more portable setup, we can use Docker. This saves us the time we would otherwise spend managing Python versions and system dependencies.
1. Build the Docker image:
    docker build . -t redpanda-torch
2. Run the image:
    docker run --rm -it -p 8080:8080 redpanda-torch
Deploying the API for Sentiment Classification
Once our working environment is all set, we can deploy the HTTP API that will handle real-time news classification.
Prepare Redpanda Topics for Streaming Data
First, we need to set up Redpanda topics to handle the incoming and outgoing data. The following command creates them:
    rpk topic create \
      news positive-news negative-news neutral-news unknown-news -p 5
* Note: if you’re using Redpanda Serverless, you should use rpk auth login to create your profile.
The news topic receives the incoming articles, and each classified result is written to the positive-news, negative-news, or neutral-news topic according to its sentiment.
Running the HTTP Service
With the topics in place, the demo relies on a few environment variables for runtime configuration. The HTTP API will receive news articles, classify them, and return the sentiment in real time.
1. Set environment variables to connect to Redpanda:
    export REDPANDA_BROKERS=localhost:9092
    export REDPANDA_TOPIC=news
- REDPANDA_BROKERS: list of seed brokers (defaults to “localhost:9092”)
- REDPANDA_TOPIC: base name of the topics (defaults to “news”)
2. Start the API server:
    ./rpcp/rp-connect-python run -r python.yaml http-server.yaml
[Figure: the API server up and running]
3. Time to test it out! You can now send an HTTP request to the API to classify news:
Positive news
    curl -s -X POST \
      -d "The stock market is showing strong gains today" \
      'http://localhost:8080/sentiment' | jq
Negative news
    curl -s -X POST \
      -d "The latest recall of Happy Fun Ball has sent ACME's stock plummeting." \
      'http://localhost:8080/sentiment' | jq
Neutral news
    curl -s -X POST \
      -d "The Federal Reserve kept interest rates unchanged, citing stable economic conditions." \
      'http://localhost:8080/sentiment' | jq
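If you would rather call the API from Python than from curl, the requests above can be sketched with the standard library alone. This is a minimal sketch: build_request and classify are hypothetical helper names, and it assumes the service replies with JSON (which is what piping the curl output into jq suggests).

```python
import json
import urllib.request

# Endpoint used by the curl examples.
SENTIMENT_URL = "http://localhost:8080/sentiment"


def build_request(text: str, url: str = SENTIMENT_URL) -> urllib.request.Request:
    """Create a POST request whose body is the raw article text."""
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def classify(text: str, url: str = SENTIMENT_URL, timeout: float = 10.0) -> dict:
    """Send the text to the API and parse the JSON reply (assumed format)."""
    request = build_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling classify("The stock market is showing strong gains today") only works while the API server from step 2 is running.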
Building the Streaming Pipeline
Instead of sending each news article to the API individually, we can build a data enrichment pipeline that reads its data from an input Redpanda topic.
Follow these steps to set up the streaming pipeline:
1. Run the pipeline (similar to the steps above):
    ./rpcp/rp-connect-python run -r python.yaml enrichment.yaml
2. Produce data to the pipeline:
    echo 'The Dow closed at a record high today on news that aliens are real' \
      | rpk topic produce news
This sends the data to the news topic, where it is classified and then routed to the positive-news, negative-news, or neutral-news topic based on its sentiment.
3. Consume the output:
    rpk topic consume \
      positive-news negative-news neutral-news unknown-news \
      --offset :end
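The routing described in step 2 can be pictured as a small mapping from a sentiment label to an output topic. The real routing lives in enrichment.yaml, which this post does not show, so treat the following as a sketch inferred from the topic names: output_topic is a hypothetical helper, and sending unrecognized labels to the unknown topic is an assumption.

```python
import os
from typing import Optional

# Labels the classifier is described as producing; anything else is "unknown".
KNOWN_LABELS = {"positive", "negative", "neutral"}


def output_topic(label: str, base: Optional[str] = None) -> str:
    """Map a sentiment label to an output topic such as "positive-news"."""
    if base is None:
        # REDPANDA_TOPIC is the base topic name and defaults to "news".
        base = os.environ.get("REDPANDA_TOPIC", "news")
    normalized = label.strip().lower()
    prefix = normalized if normalized in KNOWN_LABELS else "unknown"
    return f"{prefix}-{base}"
```

Under these assumptions, an article labeled "positive" ends up in positive-news, while any label outside the three known ones falls through to unknown-news.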
How It All Works: Under the Hood
This section walks through how the system deploys a pre-tuned sentiment classifier for financial news using Redpanda Connect and PyTorch. I’ve tried to simplify the process to make it more approachable.
1. Receiving HTTP POST Requests:
The process begins when a client sends a news article to the API. The API server is configured to accept POST requests and listens for incoming articles. When a request arrives, the article is passed to the next stage of the pipeline for sentiment analysis.
2. Using Caching to Save Resources:
Before running sentiment analysis, the system checks whether the same article has already been analyzed by looking it up in a cache. This prevents the system from re-running the model on an article it has already seen. The cache keeps the results for previously analyzed articles in memory for a set period of time. Each article is hashed using the SHA-1 algorithm, and the system compares that hash against the records in the cache. If a match is found, the stored result is returned and the analysis is skipped; otherwise, the system moves on to the next step.
3. Analyzing Sentiment with PyTorch / Hugging Face
If the article cannot be found in the cache, the system runs it through a pre-trained sentiment analysis model built with PyTorch. As mentioned earlier, this is a pre-trained RoBERTa model that has been fine-tuned for financial news. The model processes the text and classifies its sentiment.
    ## Our Python processor that uses a fine-tuned sentiment model to classify
    ## financial news.
    processor_resources:
      - label: python
        python:
          script: |
            from classifier import get_pipeline
            from os import environ

            device = environ.get("DEMO_PYTORCH_DEVICE", "cpu")
            text = content().decode()
            pipeline = get_pipeline(device=device)

            root.text = text
            scores = pipeline(text)
            if scores:
              root.label = scores[0]["label"]
              root.score = scores[0]["score"]
            else:
              root.label = "unlabeled"
              root.score = 0.0
The model runs within the Redpanda pipeline using a Python script that interacts with the PyTorch framework. The script is responsible for loading the model and processing the incoming news article.
4. Updating the Cache
Once the sentiment analysis is finished, the system saves the result in the in-memory cache. If the same news article is sent again, the system can return the result directly from the cache rather than re-running the model.
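The cache lookup and update in steps 2 and 4 can be sketched in plain Python. In the demo this is handled by the Redpanda Connect configuration, so the code below is only an illustration of the idea: SentimentCache and classify_with_cache are hypothetical names, and the 300-second expiry is an assumed value rather than the demo’s setting.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class SentimentCache:
    """In-memory cache keyed by the SHA-1 hash of the article text."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds  # assumed expiry, not the demo's value
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    @staticmethod
    def key_for(text: str) -> str:
        """Hash the article so identical text always maps to the same key."""
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> Optional[dict]:
        """Return the cached result, or None if it is missing or expired."""
        key = self.key_for(text)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return result

    def put(self, text: str, result: dict) -> None:
        """Store a result together with its expiry time."""
        self._entries[self.key_for(text)] = (self._clock() + self._ttl, result)


def classify_with_cache(text: str, cache: SentimentCache,
                        classify: Callable[[str], dict]) -> dict:
    """Check the cache first and only run the model on a miss."""
    cached = cache.get(text)
    if cached is not None:
        return cached
    result = classify(text)
    cache.put(text, result)
    return result
```

Passing the classifier in as a function keeps the cache independent of the model, so the same wrapper works with a real PyTorch pipeline or with a stub in tests.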
Conclusion
This post is a complete walkthrough for quickly setting up and deploying a real-time sentiment analysis tool for financial news using Redpanda Connect and PyTorch. By combining real-time data streaming (Redpanda) with the flexibility of the PyTorch machine learning framework, you can keep up with today’s fast-paced markets. After a quick and easy setup, we can analyze news articles and classify them by sentiment, giving people across different industries useful information for making good decisions.
Resources
For more information related to this blog: https://github.com/voutilad/redpanda-pytorch-demo/tree/main
Featured image: https://www.redpanda.com/blog/data-engineering-tools-strategies
- PyTorch Documentation: https://pytorch.org/tutorials/beginner/basics/intro.html
- RoBERTa: https://huggingface.co/FacebookAI/roberta-base
- Redpanda Installation: https://cloud.redpanda.com/clusters/crjeid5md2kpp14dr8g0/overview
- Hugging Face: https://huggingface.co/
- Redpanda Guide for Beginners: https://www.redpanda.com/blog/data-streaming-with-redpanda
- Real-time predictions for ML apps with Redpanda: https://www.redpanda.com/blog/real-time-predictions-machine-learning-applications-wasm
- What is streaming data: https://www.redpanda.com/blog/streaming-data-examples-best-practices-tools#what-is-streaming-data
- Docker Documentation: https://docs.docker.com/
- Fine-tuning overview: https://www.ibm.com/topics/fine-tuning