Hello pals,
While working with Django, we all write code that does the job, but some code may be performing excessive computations or operations that we are unaware of. These operations may be ineffective and/or counterproductive in practice.
Here, I am going to mention some anti-patterns in Django models.
Using len(queryset) instead of queryset.count()
Querysets in Django are lazily evaluated, which means that records aren't read from the database until we interact with the data. len(queryset) counts the records in Python, at the application level. To do so, all the records must first be fetched from the database, which is a computationally heavy operation. queryset.count(), on the other hand, calculates the count at the database level and returns only the number.
For a model Post:
from django.db import models


class Post(models.Model):
    author = models.CharField(max_length=100)
    title = models.CharField(max_length=200)
    content = models.TextField()
If we use len(queryset), the work is done with a query like SELECT * FROM post, which returns all the records. The Python interpreter then calculates the length of the evaluated queryset, much as it would for a list. Imagine the waste in downloading many records only to check the length and throw them away at the end! But if we need the records after reading the length, then len(queryset) can be valid.
If we use queryset.count(), the work is done with a query like SELECT COUNT(*) FROM post at the database level. Only a single number travels back to the application, which makes the code run faster and reduces the load on the database.
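To make the difference concrete without a Django project, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the post table layout, and the function names are assumptions made for illustration; the two queries only approximate what Django would send for len(queryset) and queryset.count().

```python
import sqlite3


def make_db(n_rows=1000):
    """Create an in-memory database with a hypothetical `post` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post ("
        "id INTEGER PRIMARY KEY, author TEXT, title TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO post (author, title, content) VALUES (?, ?, ?)",
        [("Arjun", f"Title {i}", "x" * 500) for i in range(n_rows)],
    )
    return conn


def count_in_python(conn):
    # Roughly what len(queryset) implies: fetch every row, then count in Python.
    rows = conn.execute("SELECT * FROM post").fetchall()
    return len(rows)


def count_in_database(conn):
    # Roughly what queryset.count() implies: the database returns one number.
    (total,) = conn.execute("SELECT COUNT(*) FROM post").fetchone()
    return total


conn = make_db()
print(count_in_python(conn), count_in_database(conn))
```

Both functions return the same number, but the first one moves every column of every row into Python memory before counting, while the second one transfers a single integer.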
Using queryset.count() instead of queryset.exists()
While we kept praising queryset.count() for checking the length of a queryset, it can still be heavier than necessary if we only want to check whether the queryset contains anything at all.
For the same model Post, when we want to check if there are any posts written by the author Arjun, we may do something like:
from django.db.models import QuerySet

posts_by_arjun: QuerySet = Post.objects.filter(author__iexact='Arjun')
if posts_by_arjun.count() > 0:
    print('Arjun writes posts here.')
else:
    print("Arjun doesn't write posts here.")
posts_by_arjun.count() performs an SQL operation that has to count every matching row in the table. But if we are only interested in whether Arjun writes posts here or not, more efficient code would be:
posts_by_arjun: QuerySet = Post.objects.filter(author__iexact='Arjun')
if posts_by_arjun.exists():
    print('Arjun writes posts here.')
else:
    print("Arjun doesn't write posts here.")
posts_by_arjun.exists() returns a boolean that tells us whether at least one result exists. It reads at most a single record using the simplest query it can build (for example, dropping any ordering and any user-defined select_related(), and limiting the result to one row).
Checking the truthiness of the queryset directly, like this, is also inefficient:
posts_by_arjun: QuerySet = Post.objects.filter(author__iexact='Arjun')
if posts_by_arjun:
    print('Arjun writes posts here.')
else:
    print("Arjun doesn't write posts here.")
This does a fine job of checking whether there are any posts by Arjun, but evaluating a queryset in a boolean context fetches all the matching records, which is computationally heavy for a large number of records. Hence, queryset.exists() is encouraged when you only need to know whether any record exists.
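The gap between counting and checking existence can be sketched at the SQL level with Python's built-in sqlite3 module. The table, the sample data, and the helper names below are assumptions for illustration, and the LIMIT 1 query is only similar in spirit to what queryset.exists() sends, not a copy of Django's generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO post (author, title) VALUES (?, ?)",
    [("Arjun" if i % 2 else "Sita", f"Title {i}") for i in range(1000)],
)


def has_posts_via_count(author):
    # Counts every matching row even though we only care whether one exists.
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM post WHERE author = ?", (author,)
    ).fetchone()
    return total > 0


def has_posts_via_exists(author):
    # Asks for at most one matching row, so the database can stop early.
    row = conn.execute(
        "SELECT 1 FROM post WHERE author = ? LIMIT 1", (author,)
    ).fetchone()
    return row is not None


print(has_posts_via_count("Arjun"), has_posts_via_exists("Arjun"))
```

Both helpers give the same answer; the second simply gives the database permission to stop at the first match.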
Using signals excessively
Django signals are great for triggering jobs based on events. They have valid use cases, but they shouldn't be used excessively. Before reaching for a signal, look for an alternative within your codebase and try to place the logic in your models themselves, if possible.
- Signals are not executed asynchronously. There is no background thread or worker to execute them. If you want a background worker to do the job for you, try using celery.
- As signals are often spread over separate files in a larger project, they can be harder to trace for someone who has just joined the team. django-debug-toolbar does help somewhat in tracing the triggered signals.
Let's create a scenario where we want to keep a record of Post writings in a separate model, PostWritings.
class PostWritings(models.Model):
    author = models.CharField(max_length=100, unique=True)
    posts_written = models.PositiveIntegerField(default=0)
If we want to automatically update the PostWritings record for an author whenever a record is created in the Post model, there are ways to achieve the task with or without signals.
A. With Signals
from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver
from .models import Post, PostWritings


@receiver(post_save, sender=Post)
def post_writing_handler(sender, instance, created, **kwargs):
    # Only count newly created posts, not edits to existing ones.
    if created:
        PostWritings.objects.get_or_create(author=instance.author)
        # update() is a queryset method, so filter down to the author's row first.
        PostWritings.objects.filter(author=instance.author).update(
            posts_written=F('posts_written') + 1
        )
B. Without Signals
We need to override the save() method of the Post model.
from django.db import models
from django.db.models import F


class Post(models.Model):
    author = models.CharField(max_length=100)
    title = models.CharField(max_length=200)
    content = models.TextField()

    def save(self, *args, **kwargs):
        # Overridden method. A post without an id has not been saved yet.
        created = self.id is None
        super().save(*args, **kwargs)
        if created:
            PostWritings.objects.get_or_create(author=self.author)
            # update() is a queryset method, so filter down to the author's row first.
            PostWritings.objects.filter(author=self.author).update(
                posts_written=F('posts_written') + 1
            )
As the same job can be accomplished without signals, the code is easier to trace and avoids unnecessary event triggers.
If the save() method here feels hard to read, breaking up the code always helps. Let's do that.
from django.db import models
from django.db.models import F


class Post(models.Model):
    author = models.CharField(max_length=100)
    title = models.CharField(max_length=200)
    content = models.TextField()

    def _update_post_writing(self, created=False, author=None):
        # Increment the author's counter only for newly created posts.
        if author is not None and created:
            PostWritings.objects.get_or_create(author=author)
            PostWritings.objects.filter(author=author).update(
                posts_written=F('posts_written') + 1
            )

    def save(self, *args, **kwargs):
        # Overridden method.
        author = self.author
        created = self.id is None
        super().save(*args, **kwargs)
        self._update_post_writing(created, author)
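The F('posts_written') + 1 expression in the snippets above asks the database to do the increment itself, instead of reading the value into Python, adding one, and writing it back. Here is a minimal sketch of the SQL this roughly corresponds to, using Python's built-in sqlite3 module. The postwritings table name and the helper functions are assumptions for illustration, not what Django generates verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postwritings ("
    "author TEXT UNIQUE, posts_written INTEGER NOT NULL DEFAULT 0)"
)


def record_post(author):
    # Create the counter row if it is missing (the get_or_create step).
    conn.execute(
        "INSERT OR IGNORE INTO postwritings (author) VALUES (?)", (author,)
    )
    # Increment inside the database, in the way an F() expression would.
    conn.execute(
        "UPDATE postwritings SET posts_written = posts_written + 1 "
        "WHERE author = ?",
        (author,),
    )


def posts_written(author):
    row = conn.execute(
        "SELECT posts_written FROM postwritings WHERE author = ?", (author,)
    ).fetchone()
    return row[0] if row else 0


record_post("Arjun")
record_post("Arjun")
print(posts_written("Arjun"))  # 2
```

Because the addition happens in a single UPDATE statement, two writers incrementing at the same time do not overwrite each other's result the way a read-then-write in Python could.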
Not defining an __str__ method for a model
When you define a model, it is important to give it a string representation so that its instances are displayed meaningfully. If the __str__ method is not defined, Django falls back to a generic label such as "Post object (1)", which makes records hard to tell apart in the admin, the shell, and anywhere else the object is rendered as text.
Not using the built-in ModelForm validation
Using the built-in ModelForm validation helps to ensure that the data entered into a form is valid. Calling is_valid() on a ModelForm runs the model's field validators and uniqueness checks for you, so invalid data is rejected before it is saved and forms are processed correctly.
Having complex relationships or calculations within your model
Models should not contain too much complexity as this can lead to them becoming difficult to maintain. Keeping the relationships and any calculations within the model to a minimum helps to keep things organized and makes it easier to debug any potential issues.
Not using get_or_create or bulk_create when creating large numbers of objects
When creating large numbers of objects, creating them one at a time means one query per object. bulk_create inserts many objects in a single query (or a few), and get_or_create combines the "look up, and create if missing" step into one call instead of hand-written check-then-create code. This makes the process faster and avoids unnecessary extra steps.
Ignoring the unique_together or unique_for_date options when creating model instances
It is important to pay attention to the unique_together and unique_for_date options when creating model instances, as they are used to keep data unique and consistent. unique_together is a Meta option that is enforced by the database, while unique_for_date is a field option that is checked only during model validation. Failing to take them into account can lead to errors on save or to duplicate data being written to the database.
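At the database level, unique_together ends up as a multi-column unique constraint. The sketch below shows that behaviour with Python's built-in sqlite3 module. Treating (author, title) as the unique pair is purely an assumption for illustration (the Post model in this article does not declare it), and add_post is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post ("
    "id INTEGER PRIMARY KEY, author TEXT, title TEXT, "
    "UNIQUE (author, title))"
)


def add_post(author, title):
    """Insert a post; return False if the (author, title) pair already exists."""
    try:
        conn.execute(
            "INSERT INTO post (author, title) VALUES (?, ?)", (author, title)
        )
        return True
    except sqlite3.IntegrityError:
        # The database itself rejects the duplicate pair.
        return False


print(add_post("Arjun", "Hello"))
print(add_post("Arjun", "Hello"))
```

The second insert is refused by the database, which is why code that creates instances should be prepared for an integrity error, or should validate first, when such a constraint exists.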
We have now seen how to mitigate some Django model anti-patterns. For now, thank you everyone for having me here. I'll be back with more Django-related content soon. You can also find me on GitHub. Till then, keep coding :)