03-Django ORM Optimization

Django Tools

Simple ways to get Queries details
- queryset.explain()
- query_debugger.py

Django Debug Toolbar

python -m pip install django-debug-toolbar
This will display details in browser, like:
- No of sql queries made
- template used
- CPU usage
- Signals, static files

Setup

In settings.py

if DEBUG:
	MIDDLEWARE += [
		'debug_toolbar.middleware.DebugToolbarMiddleware',
	]
	INSTALLED_APPS += [
		'debug_toolbar',
	]
	INTERNAL_IPS = ['127.0.0.1', ]

STATIC_URL = '/static/'

In urls.py
- Add at end of the file

if settings.DEBUG:
	import debug_toolbar
	from django.urls import path

	urlpatterns += [
		path('__debug__/', include(debug_toolbar.urls)),
	]

django-extensions

It adds a bunch of additional useful commands to the manage.py interface… read more
pip install django-extensions
Add to settings.py

INSTALLED_APPS += [
		'django_extensions',
	]

Commands

python manage.py graph_models -a -o myapp_models.png

Generate (and view) a graphviz graph of app models:
- graphviz should be installed
  - sudo apt-get install graphviz graphviz-dev
  - pip install pygraphviz

# Show all url patterns
python manage.py show_urls

# Check templates for rendering errors
python manage.py validate_templates

# The enhanced django shell
python manage.py shell_plus

# The enhanced django runserver
# Need to install -- Werkzeug 
python manage.py runserver_plus

# To view SQL queries that are executed
python manage.py shell_plus --print-sql

Django Query optimization

Video: select and prefetch related
Django Database Optimization Tips
Understanding Django QuerySets Evaluation and Caching
Refer: 04_optimization_eg.py

Query Optimization ways

Retrieve everything at once if you know you will need it
- select_related()
- prefetch_related()
Use ForeignKey id Directly
- print(p1.author_id) # 1 hit – Do
- print(p1.author.id) # 2 hits – Don’t
Don’t retrieve things you don’t need
- count(), exists(), defer()
- only(), values(), values_list()
Use get() Wisely
- Use get when you know there is only one object that matches your query
Don’t order results if you don’t care
- If a model has a default ordering (Meta.ordering) and you don’t need it, remove it on a QuerySet by calling order_by() with no parameters.
- p1 = Posts.objects.all().order_by()
Use bulk methods
- Use bulk methods to reduce the number of SQL statements
- bulk_create(), bulk_update()
- insert in bulk, remove in bulk

select_related

select_related works by creating an SQL join and
including the fields of the related object in the SELECT statement
Works on O2O, M2O(FK)

perfetch_related

prefetch_related made the JOIN using Python rather than in the database.
Works on O2O, M2O(FK), M2M, Rev FK
Mainly used for M2M

count()

If you only want the count

# Do, Executes more efficient SQL to determine count
Posts.objects.all().count()

# Don't, Evaluates the entire queryset
len(Posts.objects.all())

exists()

If you only want to find out if at least one result exists,

qs = Entry.objects.all()  # Don't 
qs = Entry.objects.exists()  # Do
if qs:
	pass

Select some fields only in a queryset

Customer.objects.all()[:2]
# <QuerySet [<Customer: Customer object (1)>, 
# <Customer: Customer object (3)>]>

values()

Returns dictionary QuerySet instead of model instances QuerySet

Customer.objects.all()[:2].values('f_name', 'mobile')
# <QuerySet [{'f_name': 'Aravind', 'mobile': None}, 
# {'f_name': 'nstiseo', 'mobile': '3819239212'}]>

values_list()

Returns tuple QuerySet instead of model instances QuerySet

Customer.objects.all()[:2].values_list('f_name', 'mobile')
# <QuerySet [('Aravind', None), ('nstiseo', '3819239212')]>

only()

Used when we need only certain fields

Product.objects.only('cur_category')
# <QuerySet [<Product: Product object (13)>, ...

Product.objects.only('cur_category')[0].__dict__
# {'id': 13, 'cur_category': 'slipons'}

values() vs only()

Returned Queryset
- values() –> dictionary QuerySet
- only() –> model instances QuerySet
Default fetched field
- only() –> It also fetches the id
Access
- values() –> Only fetched fields can be accessed
- only() –> Any field can be accessed with additional db hit

p1 = Product.objects.all().only('cur_category').first()
print(p1.id)  	# 1 hits
print(p1.name) 	# 2 hits

defer()

Used when we don’t need certain fields
which could contain a lot of data (for example, text fields), or
require expensive processing to convert them to Python objects.

a1 = Adhar.objects.all().defer('adhar_no').first()  
print(a1.signature) 	# 1 hit
print(a1.adhar_no) 	# 2 hits -- hits db again to get adhar_no

bulk_create()

Entry.objects.bulk_create([
	Entry(headline='This is a test'),
	Entry(headline='This is only a test'),
])

Entry.objects.create(headline='This is a test')
Entry.objects.create(headline='This is only a test')

bulk_update()

entries[0].headline = 'This is not a test'
entries[1].headline = 'This is no longer a test'
Entry.objects.bulk_update(entries, ['headline'])

entries[0].headline = 'This is not a test'
entries[0].save()
entries[1].headline = 'This is no longer a test'
entries[1].save()


# Refer: https://gitlab.com/amritpra94/18-django-sample-apps/-/tree/master/01%20dmo
# Usage and Cost of Queries

class Person(models.Model):
    name = models.CharField(max_length=255)  # Mandatory
    bio = models.TextField(max_length=500, null=True)  # Optional (don't put null=True)
    birth_date = models.DateField(null=True) # Optional (here you may add null=True)


# O2O
class Adhar(models.Model):
    person = models.OneToOneField("Person",on_delete=models.CASCADE)
    signature = models.TextField()
    adhar_no = models.TextField(max_length=100)


# M2O(FK)
class Posts(models.Model):
    title = models.CharField(max_length=100)
    content = models.TextField()
    date_posted = models.DateTimeField(default=timezone.now)    
    author = models.ForeignKey(Person, on_delete=models.CASCADE)

class Publication(models.Model):
    title = models.CharField(max_length=30)

## M2M
class Article(models.Model):
    headline = models.CharField('Article Name',max_length=100)
    publications = models.ManyToManyField(Publication, related_name='article_pub',related_query_name='arti')
    is_active = models.BooleanField(dfault=True)
'''
# Better -- M2M
class Article(models.Model):
    headline = models.CharField('Article Name',max_length=100)
    publications = models.ManyToManyField(Publication, through=PubArticle)

class PubArticle(models.Model):
    article = models.ForeignKey(Article, on_delete=models.CASCADE)
    publication = models.ForeignKey(Publication, on_delete=models.CASCADE)
'''


## O2O  -- select_related
a1 = Adhar.objects.first()
print(a1.person_id) 	# 1 hits
print(a1.person) 		# 2 hits
print(a1.person.name) 	# 2 hits

a1 = Adhar.objects.select_related('person').first()
print(a1.person.name) 	# 1 hit


a1 = Adhar.objects.all()
for i in a1:
	print(i.signature) 		# 1 hit
	print(i.person) 		# n+1 hits

a1 = Adhar.objects.select_related('person')
for i in a1:
	print(i.person) 		# 1 hit



## M2O  -- select_related
p1 = Posts.objects.first()
print(p1.author_id) 	# 1 hit
print(p1.author.id) 	# 2 hits  -- Bad -- Access id as above way
print(p1.author.name) 	# 2 hits

p1 = Posts.objects.select_related('author').first()
print(p1.author.name) 	# 1 hit


p1 = Posts.objects.all()
for i in p1:
	print(i.title)			# 1 hit
	print(i.author_id) 		# 1 hit 
	print(i.author) 		# n+1 hits

p1 = Posts.objects.select_related('author').all()
for i in p1:
	print(i.author) 		# 1 hit
	print(i.author.name) 	# 1 hit



## O2O  -- prefetch_related
a1 = Adhar.objects.prefetch_related('person').first()
print(a1.person.name)  # 2 hits

a1 = Adhar.objects.prefetch_related('person')
for i in a1:
    print(i.person)     # 2 hits


## M2O  -- prefetch_related
p1 = Posts.objects.prefetch_related('author').first()
print(p1.author.name)  # 2 hit

p1 = Posts.objects.prefetch_related('author').all()
for i in p1:
    print(i.author)         # 2 hits
    print(i.author.name)    # 2 hits


## M2M  -- prefetch_related
a1 = Article.objects.first()
print(a1.headline) 				# 1 hit
print(a1.publications.all()) 	# 2 hits
print(a1.publications.first().title) 	# 2 hits


a1 = Article.objects.prefetch_related('publications').first()
print(a1.headline) 				# 2 hits  -- Bad
print(a1.publications.all()) 	# 2hits
print(a1.publications.first().title) 	# 2 hits

# 2 hits
a1 = Article.objects.prefetch_related('publications').first()
x1 = a1.publications.all()
for i in x1:
	print(i.title)


# 2 hits
a1 = Article.objects.prefetch_related('publications').all()
for i in a1:
	print(i.publications.all())


# 2 hits
a1 = Article.objects.prefetch_related('publications').all()
for i in a1:
	x1 = i.publications.all()
	for j in x1:
		print(j.title)

Reference

Prefetch

Django Tools

Django Debug Toolbar

Setup

django-extensions

Commands

Django Query optimization

Query Optimization ways

select_related

perfetch_related

count()

exists()

Select some fields only in a queryset

values()

values_list()

only()

values() vs only()

defer()

bulk_create()

bulk_update()

More

Reference