Predicting Stock Prices with a Machine Learning Model in Python

In the world of finance, predicting stock prices is a highly sought-after task. Accurate predictions can help investors make informed decisions about when to buy and sell stocks, and can help traders develop profitable trading strategies. In this blog post, we will explore how to develop a machine-learning model to predict stock prices using Python.

Prerequisites:

Before we begin, it is important to have a basic understanding of machine learning concepts and Python programming. If you are new to machine learning, it is recommended that you first familiarize yourself with the basics before proceeding. In terms of Python, it is helpful to have a working knowledge of the following libraries:

  • NumPy: for numerical computing with Python
  • Pandas: for data manipulation and analysis
  • Matplotlib: for data visualization
  • Scikit-learn: for machine learning in Python

Data collection:

The first step in developing a machine learning model is to collect the data that we will use to train and test our model. In this case, we will be using historical stock price data for a particular company. There are a number of ways to obtain this data, including downloading it from a financial website or using an API to retrieve it programmatically.

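For this example, we will use the yfinance library, which retrieves data from Yahoo Finance, to get stock price data for Apple Inc. (AAPL). First we need to install yfinance and import it. The leading ! in the install command is notebook syntax (Jupyter or Google Colab); from a terminal, run the same command without it.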

!pip install yfinance
import yfinance as yf

Next, we will create a Ticker object for AAPL, which gives us access to the stock data.

ticker = yf.Ticker("AAPL")

To retrieve the stock data, we will use the history method of the Ticker class. This method takes a number of arguments, including the start and end dates for the data, the frequency of the data (daily, weekly, etc.), and whether to include dividends and splits.

For this example, we will retrieve daily stock data for the past 5 years. The period argument takes shorthand strings such as "1mo", "1y", and "5y".

data = ticker.history(period="5y", interval="1d")

The history method returns a Pandas dataframe containing the stock data. We can use the head method to take a look at the first few rows of the data.

data.head()

The resulting dataframe is indexed by date and contains the columns Open, High, Low, Close, Volume, Dividends, and Stock Splits.
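Before going further, it can help to check the structure of what came back. The sketch below uses a small hand-made dataframe with the same column names as a stand-in, so it runs without a network connection. The values are invented purely for illustration and sample is a hypothetical name; the same checks can be applied to data.

```python
import pandas as pd

# Stand-in for the frame returned by ticker.history(); values are made up.
sample = pd.DataFrame(
    {
        "Open": [100.0, 101.5, 102.0],
        "High": [102.0, 103.0, 104.5],
        "Low": [99.5, 100.5, 101.0],
        "Close": [101.0, 102.5, 104.0],
        "Volume": [1_000_000, 1_200_000, 900_000],
        "Dividends": [0.0, 0.0, 0.0],
        "Stock Splits": [0.0, 0.0, 0.0],
    },
    index=pd.date_range("2024-01-02", periods=3, freq="B", name="Date"),
)

# The dates live in the index, not in a regular column.
print(type(sample.index).__name__)

# Count missing values per column; gaps should be handled before modelling.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Confirm the rows are in chronological order.
is_sorted = sample.index.is_monotonic_increasing
print(is_sorted)
```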

Data preprocessing:

Before we can train a machine learning model on our stock data, we need to perform some preprocessing steps to prepare the data for modelling.

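First, we need to select the features that we will use to predict the stock price. In this case, we will use the open price, close price, high price, low price, and volume as our features. We will store these features in a separate dataframe, leaving out the most recent row because the next day’s price is not yet known for it.

features = data[["Open", "Close", "High", "Low", "Volume"]].iloc[:-1]

Next, we need to select the target variable that we will be trying to predict. Since the close price is already one of our features, predicting the same day’s close would only ask the model to copy one of its inputs. Instead, we will use the next trading day’s close as our target, which we get by shifting the Close column up by one row.

target = data["Close"].shift(-1).iloc[:-1]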
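To see how features and a target line up when the goal is to forecast the following day, here is a minimal sketch on made-up closing prices. The toy frame and its values are hypothetical; only the shift and iloc operations from Pandas are doing the work.

```python
import pandas as pd

# Made-up closing prices for five consecutive trading days.
toy = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="B", name="Date"),
)

# shift(-1) moves each value up one row, so every row sees tomorrow's close.
toy["NextClose"] = toy["Close"].shift(-1)

# The final row has no "tomorrow", so its target is NaN and it is dropped.
aligned = toy.iloc[:-1]

print(aligned)
```

Each remaining row now pairs one day's close with the close of the day after it, which is the pairing a forecasting model needs.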

It is also a good idea to normalize the data to ensure that the scale of the features does not affect the model’s performance. We can use the MinMaxScaler from Scikit-learn to normalize the data.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
features_scaled = scaler.fit_transform(features)
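Min-max scaling maps each column to the range 0 to 1 by computing (x - min) / (max - min). The sketch below, on made-up numbers, checks that a hand calculation matches what MinMaxScaler produces.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two made-up columns on very different scales (a price and a volume).
raw = np.array(
    [
        [100.0, 1_000_000.0],
        [150.0, 3_000_000.0],
        [200.0, 2_000_000.0],
    ]
)

# Hand calculation: subtract each column's minimum, divide by its range.
by_hand = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))

# The same transformation using Scikit-learn.
by_sklearn = MinMaxScaler().fit_transform(raw)

print(by_hand)
print(np.allclose(by_hand, by_sklearn))
```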

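Finally, we need to split the data into training and testing sets. We will use 80% of the data for training and 20% for testing. Because stock prices form a time series, we pass shuffle=False so that the model is trained on the earlier 80% of the days and tested on the most recent 20%, rather than on days scattered at random through the training period.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    features_scaled, target, test_size=0.2, shuffle=False
)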
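One detail worth knowing: fitting the scaler on the full dataset lets the minimum and maximum of the test period influence the training inputs. A common alternative, sketched here on synthetic data rather than the AAPL frame, is to split first and fit the scaler on the training portion only. The prices array and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic, steadily rising "prices" standing in for real features.
prices = np.arange(100, dtype=float).reshape(-1, 1)
next_day = prices.ravel() + 1.0  # toy target

# Split chronologically: the first 80% for training, the last 20% for testing.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    prices, next_day, test_size=0.2, shuffle=False
)

# Fit the scaler on the training rows only, then reuse it for the test rows.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

print(X_train.min(), X_train.max())  # training data spans exactly 0 to 1
print(X_test.min(), X_test.max())    # test data can fall outside that range
```

With this ordering, scaled test values may fall outside 0 to 1, which is expected: the scaler has only seen the training period.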

Model development:

Now that our data is prepared, we can start developing our machine-learning model. For this example, we will use a multi-layer perceptron (MLP) model, which is a type of neural network.

To create the MLP model, we will use the MLPRegressor class from Scikit-learn. We will set the number of hidden layers to 2 and the number of neurons in each layer to 100.

from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=1000)

Next, we will fit the model to the training data by calling the fit method of the MLPRegressor object.

mlp.fit(X_train, y_train)
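The same create-and-fit steps can be tried on a small synthetic problem before running them on real prices. In this sketch the data, the smaller layer sizes, and random_state=0 (added so repeated runs give the same weights) are assumptions chosen to keep the example fast, not settings from the walkthrough.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic regression problem: the target is a noisy linear mix of 3 inputs.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.01, 200)

# Two hidden layers, as in the walkthrough, but smaller to train quickly.
toy_mlp = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
toy_mlp.fit(X, y)

# One weight matrix per connection between layers: input->h1, h1->h2, h2->output.
print([w.shape for w in toy_mlp.coefs_])

# n_iter_ reports how many training iterations were actually run.
print(toy_mlp.n_iter_)
```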

Model evaluation:

Now that our model is trained, we can evaluate its performance on the test data. We will start by predicting the stock prices on the test data using the predict method of the MLPRegressor object.

y_pred = mlp.predict(X_test)

To evaluate the model’s performance, we can use a number of different metrics. One common metric is the root mean squared error (RMSE), which is the square root of the average squared difference between the predicted and actual values. It is expressed in the same units as the target, and a lower RMSE indicates a better fit.

We can calculate the RMSE using the mean_squared_error function from Scikit-learn.

import numpy as np
from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE:", rmse)
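As a quick check of what RMSE means, here is the calculation done by hand on three made-up prices and compared with Scikit-learn's result. The numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

actual = np.array([100.0, 102.0, 104.0])
predicted = np.array([101.0, 102.0, 101.0])

# Errors are 1, 0 and -3; squared they are 1, 0 and 9, with a mean of 10/3.
errors = predicted - actual
rmse_by_hand = np.sqrt(np.mean(errors ** 2))

# The same value via Scikit-learn's mean_squared_error.
rmse_sklearn = np.sqrt(mean_squared_error(actual, predicted))

print(round(rmse_by_hand, 4))  # 1.8257
```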

Another metric we can use is the coefficient of determination, or R^2, which measures the proportion of the variance in the target variable that is explained by the model. An R^2 of 1 indicates a perfect fit, while an R^2 of 0 means the model does no better than always predicting the mean of the target. R^2 can also be negative when the model does worse than that.

We can calculate the R^2 using the r2_score function from Scikit-learn.

from sklearn.metrics import r2_score

r2 = r2_score(y_test, y_pred)
print("R^2:", r2)
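R^2 can be computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below uses made-up numbers and a hypothetical helper, r2_by_hand, to show three reference points: a perfect prediction, a constant prediction of the mean, and a prediction that is worse than the mean.

```python
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([100.0, 102.0, 104.0, 106.0])

def r2_by_hand(y_true, y_hat):
    ss_res = np.sum((y_true - y_hat) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

perfect = actual.copy()
mean_only = np.full_like(actual, actual.mean())
reversed_order = actual[::-1].copy()

print(r2_by_hand(actual, perfect))         # 1.0
print(r2_by_hand(actual, mean_only))       # 0.0
print(r2_by_hand(actual, reversed_order))  # negative: worse than the mean
```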

It is also a good idea to visualize the predicted and actual stock prices to get a better understanding of how well the model is performing. We can use Matplotlib to create a line plot of the predicted and actual prices.

import matplotlib.pyplot as plt

plt.plot(y_pred, label="Predicted")
plt.plot(y_test.values, label="Actual")
plt.legend()
plt.show()

Here is the complete code for developing a machine-learning model to predict stock prices using Python:

# Install and import necessary libraries
# (the leading ! is notebook syntax; from a terminal, run the command without it)
!pip install yfinance
import yfinance as yf
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Retrieve stock data from Yahoo Finance API
ticker = yf.Ticker("AAPL")
data = ticker.history(period="5y", interval="1d")

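# Select features and target (the next trading day's close);
# the last row is dropped because it has no next day
features = data[["Open", "Close", "High", "Low", "Volume"]].iloc[:-1]
target = data["Close"].shift(-1).iloc[:-1]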

# Normalize data
scaler = MinMaxScaler()
features_scaled = scaler.fit_transform(features)

# Split data chronologically into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features_scaled, target, test_size=0.2, shuffle=False)

# Create MLP model
mlp = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=1000)

# Fit model to training data
mlp.fit(X_train, y_train)

# Predict stock prices on test data
y_pred = mlp.predict(X_test)

# Calculate RMSE and R^2
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE:", rmse)

r2 = r2_score(y_test, y_pred)
print("R^2:", r2)

# Plot predicted and actual stock prices
plt.plot(y_pred, label="Predicted")
plt.plot(y_test.values, label="Actual")
plt.legend()
plt.show()

In this blog post, we have developed a machine-learning model to predict stock prices using Python. We have collected stock data using the Yahoo Finance API, preprocessed the data, trained an MLP model on the data, and evaluated the model’s performance.

While the model we have developed in this example may not be perfect, it provides a starting point for further exploration and improvement. There are a number of ways that we could potentially improve the model’s performance, such as by fine-tuning the hyperparameters, using different features or a different model, or incorporating additional data.
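As one illustration of hyperparameter tuning, the sketch below runs a small grid search with time-ordered cross-validation on synthetic data. The parameter grid, the data, and the choice of TimeSeriesSplit are assumptions for demonstration; sensible ranges for real price data would need experimentation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neural_network import MLPRegressor

# Synthetic features and target standing in for scaled price data.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(150, 3))
y = X @ np.array([1.5, -0.5, 0.8]) + rng.normal(0.0, 0.02, 150)

# Candidate settings to compare (hypothetical values).
param_grid = {
    "hidden_layer_sizes": [(8,), (16, 16)],
    "alpha": [1e-4, 1e-2],
}

# TimeSeriesSplit always validates on rows that come after the training rows.
search = GridSearchCV(
    MLPRegressor(max_iter=1000, random_state=0),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # cross-validated RMSE of the best setting
```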

I hope this blog post has been helpful and has given you some ideas for your own machine-learning projects. Happy coding!
