Model Anti-patterns in Django

Hello pals,
While working with Django, we all write code that does the job, but some of it may perform extra computation or database work that we aren't aware of. These operations can be inefficient or even counterproductive in practice.

Here, I am going to mention some anti-patterns in Django models.

  1. Using len(queryset) instead of queryset.count()

    • Querysets in Django are lazily evaluated, which means records aren’t read from the database until we interact with the data.

    • len(queryset) counts the records in Python, at the application level. To do so, all the records must first be fetched from the database, which is a computationally heavy operation.
      In contrast, queryset.count() calculates the count at the database level and returns only the number.

For a model Post:

    from django.db import models

    class Post(models.Model):
        author = models.CharField(max_length=100)
        title = models.CharField(max_length=200)
        content = models.TextField()

If we use len(queryset), Django runs a query like SELECT * FROM post, loads every record into the queryset, and then the Python interpreter counts them much as it would count the items of a list. Imagine the waste in downloading many records only to check the length and throw them away at the end! But if we need the records after reading the length, then len(queryset) is a valid choice, because the fetched records stay cached on the queryset.

If we use queryset.count(), Django runs a query like SELECT COUNT(*) FROM post, so the counting happens at the database level. Only a single number travels back to the application, which makes the code quicker and lightens the load on the database.

  2. Using queryset.count() instead of queryset.exists()

    • While we kept praising queryset.count() for checking the length of a queryset, it can be needlessly heavy if all we want to know is whether the queryset contains anything.
      For the same model Post, when we want to check if there are any posts written by the author Arjun, we may do something like:

    from django.db.models import QuerySet

    posts_by_arjun: QuerySet = Post.objects.filter(author__iexact='Arjun')

    if posts_by_arjun.count() > 0:
        print('Arjun writes posts here.')
    else:
        print("Arjun doesn't write posts here.")

posts_by_arjun.count() makes the database count every matching row. But if we are only interested in whether Arjun writes posts here or not, more efficient code would be:

    posts_by_arjun: QuerySet = Post.objects.filter(author__iexact='Arjun')

    if posts_by_arjun.exists():
        print('Arjun writes posts here.')
    else:
        print("Arjun doesn't write posts here.")

posts_by_arjun.exists() returns a bool that tells us whether at least one result exists. Django builds a query that fetches at most one row and drops work it doesn't need for that check, such as ordering and select_related() joins.

Also, checking the existence / truthiness of a queryset like this is inefficient:

    posts_by_arjun: QuerySet = Post.objects.filter(author__iexact='Arjun')

    if posts_by_arjun:
        print('Arjun writes posts here.')
    else:
        print("Arjun doesn't write posts here.")

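This does a fine job of checking whether there are any posts by Arjun, but evaluating the queryset in a boolean context loads all matching records, which is computationally heavy for a large number of records. Hence, queryset.exists() is encouraged for checking the existence / truthiness of querysets, unless the records themselves are needed afterwards.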

  3. Using signals excessively

    • Django signals are great for triggering jobs based on events. But they have a limited set of valid use cases and shouldn’t be used excessively. Before reaching for a signal, look for an alternative within your codebase and, if possible, place the logic in the model itself.

    • They are not executed asynchronously. There is no background thread or worker to execute them; receivers run inline, as part of the call that sends the signal. If you want a background worker to do the job for you, try using Celery.

    • As signals are spread over separate files in a larger project, they can be harder to trace for someone who has just joined the company, and that’s not great. That said, django-debug-toolbar does help in tracing the triggered signals.

Let’s create a scenario where we want to keep a record of how many posts each author has written in a separate model, PostWritings.

    class PostWritings(models.Model):
        author = models.CharField(max_length=100, unique=True)
        posts_written = models.PositiveIntegerField(default=0)

If we want to automatically update the PostWritings record for an author whenever a record is created on the Post model, there are ways to achieve the task with or without signals.

A. With Signals

    from django.db.models.signals import post_save
    from django.db.models import F
    from django.dispatch import receiver
    from .models import Post, PostWritings

    @receiver(post_save, sender=Post)
    def post_writing_handler(sender, instance, created, **kwargs):
        if created:
            writing, _ = PostWritings.objects.get_or_create(author=instance.author)
            # Model instances have no update(); increment through a queryset.
            PostWritings.objects.filter(pk=writing.pk).update(
                posts_written=F('posts_written') + 1
            )

B. Without Signals

We need to override the save() method for Post model.

    from django.db import models
    from django.db.models import F

    class Post(models.Model):
        author = models.CharField(max_length=100)
        title = models.CharField(max_length=200)
        content = models.TextField()

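        def save(self, *args, **kwargs):
            # Overridden method. Check for a new record before saving,
            # because the id is assigned once the row is inserted.
            created = self.id is None
            super().save(*args, **kwargs)
            if created:
                writing, _ = PostWritings.objects.get_or_create(author=self.author)
                # Model instances have no update(); increment through a queryset.
                PostWritings.objects.filter(pk=writing.pk).update(
                    posts_written=F('posts_written') + 1
                )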

Since the same job can be accomplished without signals, the code is easier to trace and unnecessary event triggers are avoided.
If the save() method feels less readable this way, breaking up the code is always a good option. Let’s do that.

    from django.db import models
    from django.db.models import F

    class Post(models.Model):
        author = models.CharField(max_length=100)
        title = models.CharField(max_length=200)
        content = models.TextField()

        def _update_post_writing(self, created=False, author=None):
            if author is not None and created:
                writing, _ = PostWritings.objects.get_or_create(author=author)
                # Model instances have no update(); increment through a queryset.
                PostWritings.objects.filter(pk=writing.pk).update(
                    posts_written=F('posts_written') + 1
                )

        def save(self, *args, **kwargs):
            # Overridden method.
            author = self.author
            created = self.id is None
            super().save(*args, **kwargs)
            self._update_post_writing(created, author)

  4. Not defining an __str__ method for a model

    When you define a model, it is important to give it a string representation so that its instances can be identified at a glance. If the __str__ method is not defined, Django falls back to a generic label such as "Post object (1)", so records in the admin, the shell, and form dropdowns become hard to tell apart.

  5. Not using the built-in ModelForm validation

    A ModelForm builds its fields and validators from the model definition, so constraints such as max_length are checked without being restated in the form. Relying on this built-in validation, instead of hand-rolled checks, helps prevent invalid data from being saved and keeps form handling consistent with the model.

  6. Having complex relationships or calculations within your model

    Models should not contain too much complexity, as this makes them difficult to maintain. Keeping the relationships and calculations within the model to a minimum keeps things organized and makes it easier to debug potential issues.

  7. Not using get_or_create or bulk_create

    When creating large numbers of objects, it is important to use bulk_create rather than saving them one at a time, since it inserts them in as few queries as possible. Similarly, get_or_create replaces a hand-written lookup followed by a conditional create with a single call, which avoids an extra code path and reduces the chance of creating duplicates.

  8. Ignoring the unique_together or unique_for_date options

    It is important to pay attention to the unique_together and unique_for_date options when defining models and creating instances, as they are used to keep data unique and consistent. unique_together is declared in the model's Meta class, while unique_for_date is set on a field and checked during model validation. Failing to take these options into account can lead to duplicate data being written to the database.

We have now looked at how to mitigate some Django model anti-patterns. For now, thank you everyone for having me here. I'll be back with more Django-related content soon. You can also find me on GitHub. Till then, keep coding :)