How to connect Python to PostgreSQL: A Step-by-Step Guide for Data Scientists!

Easier than ever!



Data Scientists often have to access data warehouses to perform their tasks, such as analytics, EDA, ETL, etc.

One of the most popular databases in the industry is PostgreSQL.

PostgreSQL is widely used by developers of all levels and by companies of all sizes. It also has a rich ecosystem, with a large range of extensions and add-ons that build seamlessly on the core database.

Whether you’re creating a custom database for a side project or you need a production-level transactional/analytical database, Postgres is an amazing option.

I know, this sounds nice and all, but you’re only here to learn how to connect Python to your PostgreSQL database. So follow along to learn how to do that!


1. Creating a virtual environment

Many data scientists/analysts stay inside their Google Colab environment and hence never become familiar with virtual environments.

This is a neat way of making sure the correct package/library versions are used every time your code runs, which keeps your results reproducible (never forget the “science” part of data science).

Steps to do that:

1. Open the Command Prompt on Windows or the Terminal on Linux/Mac.

2. Create a new directory to store the project files. You can name it whatever you like; I’ll name it analytics:

mkdir analytics

3. Move into your new directory (cd analytics) and create a new virtual environment called .venv using the built-in venv module. The syntax is simply python3 -m venv path-to-environment:

python3 -m venv .venv

4. Finally, activate/start the virtual environment:

# On Windows
.venv\Scripts\activate

# On Mac and Linux
source .venv/bin/activate
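Once activated, most shells show the environment name in your prompt. If you want to double-check, you can ask Python which interpreter it is running (just an optional sanity check, not part of the setup); the printed path should point inside the .venv folder:

python -c "import sys; print(sys.executable)"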

2. Installing the psycopg2 module

Psycopg2 is a PostgreSQL database driver; we need it to talk to PostgreSQL from Python.

First, install the psycopg2 package using the following pip command:

pip install psycopg2
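If the installation fails because the PostgreSQL client headers aren’t available on your machine, the pre-built psycopg2-binary package is a common drop-in alternative for development setups:

pip install psycopg2-binary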

Second, create the requirements.txt file:

pip3 freeze > requirements.txt

A requirements.txt file ensures reproducibility by specifying exact package versions, enabling consistent environments and results.
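Later, you (or anyone else) can recreate the exact same environment from that file:

pip install -r requirements.txt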

3. Connecting to the PostgreSQL database from Python

Now we are ready to connect to our PostgreSQL database.

First, we need to create a configuration file called database.ini in the project directory to store database connection parameters for authentication:

[postgresql]
host=host_address
database=database_name
user=your_username
password=your_password

In the database.ini file, replace the host_address, database_name, your_username and your_password with the actual values.

Now we create a new file called config.py in the project directory:

touch config.py

Now let’s write a function to read the config data from the database.ini file:

from configparser import ConfigParser

def load_config(filename='database.ini', section='postgresql'):
    parser = ConfigParser()
    parser.read(filename)

    # get section, default to postgresql
    config = {}
    if parser.has_section(section):
        params = parser.items(section)
        for param in params:
            config[param[0]] = param[1]
    else:
        raise Exception('Section {0} not found in the {1} file'.format(section, filename))
    return config

if __name__ == '__main__':
    config = load_config()
    print(config)

The load_config() function uses the Python built-in configparser package to read data from the database.ini file.
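If you run python config.py with the placeholder values above, you should see them printed back as a plain dictionary:

{'host': 'host_address', 'database': 'database_name', 'user': 'your_username', 'password': 'your_password'}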

By using the database.ini and not directly hardcoding your credentials on Python, your PostgreSQL connection parameters are now flexible when being used in different environments (e.g. testing, prod and dev).
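For example, database.ini could hold one extra section per environment (the section names below are purely illustrative):

[postgresql_dev]
host=localhost
database=analytics_dev
user=dev_user
password=dev_password

You would then select it with load_config(section='postgresql_dev').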

However, always add your database.ini file to a .gitignore file so that you don’t expose your credentials when pushing your code to GitHub.

Simply create a file called .gitignore and add the line:

database.ini

Last but not least, let’s create a new file called connect.py that will use the load_config() function to read the database configuration and finally connect our code to the PostgreSQL server:

import psycopg2
from config import load_config

def connect(config):
    """ Connect to the PostgreSQL database server """
    try:
        with psycopg2.connect(**config) as conn:
            print('Connected to the PostgreSQL server.')
            return conn
    except (psycopg2.DatabaseError, Exception) as error:
        print(error)

if __name__ == '__main__':
    config = load_config()
    connect(config)

Since we use the config.py module, we can pass the configuration to the connect() function and unpack it using the ** operator.

The method above is better and safer than the alternative of passing in your credentials directly to the function, like this:

conn = psycopg2.connect(
    host="host_address",
    database="database_name",
    user="your_username",
    password="your_password"
)
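With the connection in hand, it’s worth running a quick smoke test. The sketch below is not part of the setup above: it assumes a hypothetical test_query.py file sitting next to config.py and connect.py, and simply asks the server for its version string:

from config import load_config
from connect import connect

if __name__ == '__main__':
    config = load_config()
    conn = connect(config)
    if conn is not None:
        # open a cursor, run a throwaway query and print the result
        with conn.cursor() as cur:
            cur.execute('SELECT version();')
            print(cur.fetchone()[0])
        conn.close()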

Closing Remarks

PostgreSQL is awesome, and even better when coupled with Python. I highly encourage you to keep your database.ini, connect.py and config.py inside a config folder for organization. If you decide to do this, you should modify the argument ‘filename’ in load_config() to “config/database.ini”.
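With that layout, the call at the bottom of connect.py would look like this:

config = load_config(filename='config/database.ini')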

If you liked this tutorial, please share it with your friends and leave a comment about what you liked the most and what I could have done better. Don’t forget to add me on LinkedIn and GitHub, and don’t hesitate to reach out if you have any questions.


