Building a search app with Django and Haystack

Goal

The goal of this tutorial is to build a search app using Django and Haystack. You will learn how to use Django commands to initialize a database with emoji data. You will also learn how to add search to a Django project using Haystack.

Upon completion, you will have built an app that allows you to search for over a thousand emojis. This app also gives you the ability to copy any emoji to your clipboard with one click.

Before you start

Make sure you meet the following prerequisites before starting the tutorial steps:

This project depends on Pipenv. Pipenv allows you to download and install versions of packages in a virtual environment.

Another prerequisite is Elasticsearch. An Elasticsearch instance needs to run separately from the app.

Installing packages

The app depends on the following packages: Django, django-haystack, elasticsearch, and requests.

Open up a terminal prompt and create a directory called emoji-in-the-haystack:

mkdir emoji-in-the-haystack
cd emoji-in-the-haystack

Install the packages:

pipenv install django==3.0.7
pipenv install git+https://github.com/django-haystack/django-haystack.git#egg=django-haystack
pipenv install elasticsearch==5.5.3
pipenv install requests==2.24.0

You'll see a bunch of colorful output and a couple of 🐍 emojis. In this directory, you should now see the files Pipfile and Pipfile.lock.

You're ready to create a Django project.

Setting up a Django project and app

After installing the packages, the next step is to create a Django project.

Activate your virtual environment:

pipenv shell

You should now see your terminal prompt prefixed with (emoji-in-the-haystack).

Create a Django project called emoji_haystack:

django-admin startproject emoji_haystack .

The directory should now look like this:

├── Pipfile
├── Pipfile.lock
├── manage.py
└── emoji_haystack
    ├── __init__.py
    ├── asgi.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

Create a Django app called search:

python manage.py startapp search

The directory should now look like this:

├── Pipfile
├── Pipfile.lock
├── manage.py
├── emoji_haystack
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
└── search
    ├── __init__.py
    ├── admin.py
    ├── apps.py
    ├── migrations
    │   └── __init__.py
    ├── models.py
    ├── tests.py
    └── views.py

You need to enable the newly created app.

Update the INSTALLED_APPS setting in settings.py:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'search.apps.SearchConfig',
]

To test that everything is working, run the app:

python manage.py runserver

Navigate to http://127.0.0.1:8000/ and confirm that the app is working.

Note: You can run python manage.py migrate to get rid of the Django warnings when running the app.

Emoji data

The next step is to create a Django model class to represent the emoji data.

Update models.py:

from django.db import models


class Emoji(models.Model):
    name = models.CharField(
        max_length=50,
    )
    code = models.CharField(
        max_length=50,
    )

You need to store the name for each emoji. For example, "grimacing face" is the name given to 😬. You also need to store the code for an emoji. These code points are unique for every emoji. Django templates output these codes so the browser can render the emojis.

After creating the model, run a migration to apply these changes to the database:

python manage.py makemigrations --name add_emoji_model search
python manage.py migrate

The next step is to create a new directory for the Django command. Django commands are special scripts registered in Django projects.

The command in this app retrieves emoji data and saves it to the database using the Emoji model class. This command must live in the new directory.

Create the new directory:

cd search
mkdir management
cd management
mkdir commands
cd commands

Inside this commands directory, create the initemojidata command:

touch initemojidata.py

The directory should now look like this:

├── Pipfile
├── Pipfile.lock
├── db.sqlite3
├── emoji_haystack
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── manage.py
└── search
    ├── __init__.py
    ├── admin.py
    ├── apps.py
    ├── management
    │   └── commands
    │       └── initemojidata.py
    ├── migrations
    │   ├── 0001_add_emoji_model.py
    │   └── __init__.py
    ├── models.py
    ├── tests.py
    └── views.py

Here is the code to retrieve and save emoji data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
import json
import requests

from django.core.management.base import BaseCommand, CommandError

from search.models import Emoji


EMOJI_JSON_URL = 'https://raw.githubusercontent.com/iamcal/emoji-data/master/emoji.json'


class Command(BaseCommand):
    help = 'Initialize database with emoji data'

    def add_arguments(self, parser):
        parser.add_argument(
            '--dry-run',
            action='store_true',
            default=False)

    def execute(self, *args, **options):
        self.count = 0

        try:
            super().execute(*args, **options)
        except KeyboardInterrupt:
            self.stdout.write('')

        self.stdout.write(self.style.SUCCESS(
            'Emojis created: {}'.format(self.count)))

    def handle(self, *args, **options):
        self.dry_run = options['dry_run']

        emojis = self.get_emojis()

        for emoji in emojis:
            if not emoji.get('name'):
                continue

            code = self.handle_code(emoji)
            name = emoji['name'].lower()
            self.stdout.write(
                '{} - {}'.format(name, code))

            if not self.dry_run:
                emoji = Emoji(
                    name=name,
                    code=code)

                emoji.save()

            self.count += 1

    def get_emojis(self):
        response = requests.get(
            url=EMOJI_JSON_URL)

        emojis = json.loads(response.content)

        return emojis

    def handle_code(self, emoji):
        """
        U+1F1EC, U+1F1FE - > &#x1F1EC&#x1F1FE
        """
        unified = emoji.get('non_qualified') or emoji.get('unified')
        unified = unified.split('-')

        codes = []
        for code in unified:
            _code = '&#x' + code
            codes.append(_code)

        return ''.join(codes)

The syntax for Django commands may take some time getting used to. Django commands require a Command class definition that subclasses BaseCommand. This class requires a handle() method. Your logic goes in here.

I use the execute() method to define some variables to count and output the number of items updated when a command finishes running.

On line 35, the command calls its own get_emojis() method through self. The method makes a request to the URL defined on line 9. This endpoint is a JSON file hosted on GitHub.

It may not include the newest emojis but it's the best option for this app. The Emojipedia API is no longer available for public use. Typically you need to handle errors when making API requests but it's fine to leave out here.

The command retrieves the emoji data and begins to process each data item on line 37. It ignores data items with no name field. On line 41, the command calls handle_code(). This method transforms the emoji Unicode data into a string that gets stored in the database. The transformation of this Unicode data makes it possible to render emojis in HTML. More on this later.
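To make the transformation concrete, here is a standalone sketch of the same filtering and code-building logic outside of Django. The sample records are made up for illustration and only mimic the name, unified, and non_qualified keys that the command reads. html.unescape stands in for what a browser does when it renders the stored string.

```python
import html

# Hypothetical sample records shaped like entries in emoji.json.
SAMPLE_EMOJIS = [
    {'name': 'GRIMACING FACE', 'unified': '1F62C', 'non_qualified': None},
    {'name': None, 'unified': '1F1EC-1F1FE', 'non_qualified': None},
    {'name': 'FLAG SAMPLE', 'unified': '1F1EC-1F1FE', 'non_qualified': None},
]


def build_code(emoji):
    """Turn a value like '1F1EC-1F1FE' into '&#x1F1EC&#x1F1FE'."""
    unified = emoji.get('non_qualified') or emoji.get('unified')
    return ''.join('&#x' + part for part in unified.split('-'))


def build_rows(emojis):
    """Skip unnamed records and return (name, code) pairs."""
    rows = []
    for emoji in emojis:
        if not emoji.get('name'):
            continue
        rows.append((emoji['name'].lower(), build_code(emoji)))
    return rows


rows = build_rows(SAMPLE_EMOJIS)
for name, code in rows:
    # html.unescape decodes the entities, much like a browser would.
    print(name, code, html.unescape(code))
```

The record without a name is dropped, which mirrors the continue statement in the command.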

You can run this command with an optional --dry-run argument. Providing this argument means you can test your Django command logic without saving anything to the database. If you leave the argument out, the command creates an Emoji object with name and code set and saves it to the database.

Django commands are run from the root of the project.
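Django builds command options on top of Python's argparse module, which is why the --dry-run flag shows up in options under the key dry_run. A minimal sketch using argparse directly, outside of Django:

```python
import argparse

parser = argparse.ArgumentParser(
    description='Initialize database with emoji data')
parser.add_argument('--dry-run', action='store_true', default=False)

# argparse converts the dash in '--dry-run' to an underscore.
with_flag = vars(parser.parse_args(['--dry-run']))
without_flag = vars(parser.parse_args([]))

print(with_flag)     # {'dry_run': True}
print(without_flag)  # {'dry_run': False}
```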

Run the Django command (--dry-run option):

python manage.py initemojidata --dry-run

Run the Django command (no regrets option):

python manage.py initemojidata

The emoji data is now stored in the database.

Haystack setup

Haystack makes it easy to add custom search to Django apps. You write your search code once and can go back and forth between search backends as you please. You can choose to use different search backends like Elasticsearch, Solr, and others. This tutorial uses Elasticsearch.

Integrating Haystack consists of creating a search index model and updating a couple of Django settings.

The search index model corresponds to the database model defined earlier. Haystack requires this file to know what data to place in the search index.

Inside the search app directory, create a search_indexes.py file:

cd search
touch search_indexes.py

Here's what the code for that looks like:

import datetime

from haystack import indexes
from search.models import Emoji


class EmojiIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        return Emoji

When you make a search query, Haystack searches the text field. This field corresponds to the name field defined in the Emoji model.

Next, include the URLs provided by Haystack in urls.py. Django implicitly calls a custom Haystack view that handles search requests and returns responses. This response uses an HTML template that you need to create and configure. More on this later.

from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('admin/', admin.site.urls),
    path('search/', include('haystack.urls')),
]

You need to enable the Haystack app.

Update the INSTALLED_APPS setting in settings.py:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'search.apps.SearchConfig',

    'haystack',
]

Add a connection to Elasticsearch in settings.py:

# Haystack configuration
# https://haystacksearch.org

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch5_backend.Elasticsearch5SearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}

Haystack setup continued

The following steps are cumbersome but they are essential in getting Haystack to work.

In settings.py, update the TEMPLATES setting:

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

From the root of the project, create a templates directory:

mkdir templates
cd templates

Creating a single project-level templates directory is a recognized Django pattern.

In the templates directory, create a search directory and a file called search.html:

mkdir search
cd search
touch search.html

In the search directory, create an indexes directory:

mkdir indexes
cd indexes

In the indexes directory, create a search directory and a file called emoji_text.txt:

mkdir search
cd search
touch emoji_text.txt

Here's what emoji_text.txt should look like:

{{ object.name }}

Haystack uses this data template to build the document used by the search engine.

The final directory structure should look like this:

├── Pipfile
├── Pipfile.lock
├── db.sqlite3
├── emoji_haystack
│   ├── __init__.py
│   ├── asgi.py
│   ├── settings.py
│   ├── urls.py
│   └── wsgi.py
├── manage.py
├── search
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── management
│   │   └── commands
│   │       └── initemojidata.py
│   ├── migrations
│   │   ├── 0001_add_emoji_model.py
│   │   └── __init__.py
│   ├── models.py
│   ├── search_indexes.py
│   ├── tests.py
│   └── views.py
└── templates
    └── search
        ├── indexes
        │   └── search
        │       └── emoji_text.txt
        └── search.html

Search template

Now it's time to update search.html. This template contains a text field to type in a search query, a button that fires a search request, and some template variables. Use the template example found here.

Note: Remove {% extends 'base.html' %} at the top of the file.

The main differences in the template for this tutorial are the following two lines:

{% for result in page.object_list %}
   <p>{{ result.object.code|safe }}</p>
   <p>{{ result.object.name }}</p>
{% empty %}

object_list is a list of search results. For each search result, display the emoji and its name. result.object provides direct access to the Emoji model and its database fields.

Displaying the emoji requires the safe Django filter. The filter marks the stored code as not requiring further HTML escaping, so the browser receives the entity intact and renders it as an emoji.
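A rough illustration of why the filter matters, using Python's html module as a stand-in for Django's auto-escaping (an assumption made so the snippet runs without a Django project):

```python
import html

stored_code = '&#x1F62C'

# Without the safe filter, the ampersand is escaped and the browser
# shows the raw text instead of an emoji.
escaped = html.escape(stored_code)
print(escaped)  # &amp;#x1F62C

# With the safe filter, the string reaches the browser unchanged and
# decodes to the emoji character.
rendered = html.unescape(stored_code)
print(rendered)
```

Only use safe on values you trust, such as the codes generated by the command earlier in this tutorial.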

Running Elasticsearch

Navigate to the location of your Elasticsearch installation and start an instance. For example, say you downloaded Elasticsearch to your Downloads folder:

cd Downloads
cd elasticsearch-5.5.3
cd bin
./elasticsearch

Haystack ships with a set of Django commands that handle indexing the emoji data stored in the database. This tutorial uses the rebuild_index command. This command rebuilds the search index by first clearing it and then updating it. Have a look at the source code for more info.

From the root of the project, run the command:

python manage.py rebuild_index

Run the app:

python manage.py runserver

Navigate to http://127.0.0.1:8000/search and confirm that the app is working.

If you query for "cat," you get back a list of results. If you query for "flag," you get back results for flag emojis.

If you scroll to the bottom, you'll see a Previous button and Next button. Haystack returns at most 20 results per page. This out-of-the-box feature is awesome. The layout needs a little bit of work though.
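If 20 results per page does not suit your layout, Haystack reads the page size from the HAYSTACK_SEARCH_RESULTS_PER_PAGE setting. A sketch for settings.py; the value 40 is an arbitrary example:

```python
# settings.py
# Number of results Haystack's built-in search view shows per page.
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 40
```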

Bootstrap + clipboard.js

You can use Bootstrap to clean up the design. Another feature is to copy an emoji to your clipboard by clicking on it - clipboard.js can help here.

Load Bootstrap and clipboard.js from a CDN in search.html:

<script src="https://cdn.jsdelivr.net/npm/clipboard@2/dist/clipboard.min.js"></script>

<!-- Bootstrap CSS -->
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css"
integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO"
crossorigin="anonymous">

{% block content %}

A couple of Bootstrap <div> elements and some styling updates go a long way in improving the look of the app.

Including the data-clipboard-text attribute on the emoji button lets you copy emojis to your clipboard:

        {% if query %}
            <h3>Results</h3>

            <div class="container">
            <div class="row">
            {% for result in page.object_list %}
                <div class="col-sm">
                    <button type="button" class="btn" data-clipboard-text="{{ result.object.code|safe }}" style="font-size:90px;">{{ result.object.code|safe }}</button>
                    <p style="text-align: center">{{ result.object.name }}</p>
                </div>
            {% empty %}
                <p>No results found.</p>
            {% endfor %}
            </div>
            </div>

The last thing to do is to initialize clipboard.js in search.html:

{% endblock %}

<script>
    var clipboard = new ClipboardJS('.btn');

    clipboard.on('success', function(e) {
        console.log(e);
    });

    clipboard.on('error', function(e) {
        console.log(e);
    });
</script>

Run the app with these new changes:

python manage.py runserver

Navigate to http://127.0.0.1:8000/search and confirm the changes. This looks much better. The emojis are more prominent and the click-to-copy feature is the 🍒 on top.

What you've learned

Rejoice and show your friends how to find the emoji in the haystack. If you're up for the challenge, see if you can make the following app improvements:

  • Load a subset of emojis on the homepage before a user searches

  • Add a navigation bar to filter by emoji category

  • Support newer emojis

Webhook signatures for fun and profit

Webhooks - less painful than playing hooky by skipping work.

Image source: Encyclopedia SpongeBobia

What's a webhook?

Application programming interfaces (APIs) consist of client requests and server responses. Webhooks are the reverse of APIs! A third-party service (the server) sends data to one or more configured listeners (the clients). You can set up a listener to consume webhook events by following these steps:

  1. Create a new URL in your web application to listen for events (e.g. mycoolapp.com/webhooks)
  2. Create a secret token with your third-party service (e.g. GitHub repository settings)
  3. Give your application access to this secret token (e.g. environment variables)
  4. Deploy the application to listen for requests
  5. Verify the webhook signature found in each request
  6. If the signature passes this verification step, process the event data
  7. If it doesn't pass, raise an error

Webhooks allow us to get information in real time. Let's say we want to find out if a task has finished. Instead of polling an API and asking for the state of a task, webhooks automatically notify us when a task is done. All we have to do is verify the webhook signature.

Companies like Stripe and Twilio provide developers with software development kits (SDKs). These SDKs typically verify signatures for you. If not, have no fear! We can manually verify these signatures using Python.

Note: the terms "third-party" and "authorized users" will be used interchangeably from here on out.

Trust...

Let's assume our application was partly compromised. Our webhook URL is now public and out in the open. How do we differentiate authorized users from bad actors? Our application and the third-party service need some way to authenticate messages. One way to achieve this is to use a hash-based message authentication code (HMAC).

First, an authorized user sends a signature with every request to our application. Next, our application computes the expected signature by combining HMAC with our secret token. It compares both signatures and allows requests from this user if the signatures match. Bad actors would have a hard time trying to fool us without this secret token.

Now that we've covered secret tokens, let's take a look at the code to manually verify signatures.

...but verify


We define a request "object" on line 20. We use this object to represent a request that would normally be sent by an authorized user. This request has a signature, which is a bytes string. Let's assume the signature in the request is valid. The goal of our application is to calculate this signature using HMAC and our secret token.

The shared secret is hardcoded on line 7 for demonstration purposes. Remember, the secret should be stored as an environment variable on your server!

Next, we use the hmac and the hashlib Python modules to create a hashing object on line 9.

The method signature for the new() method is: hmac.new(key, msg=None, digestmod=''):
  • key is set to the secret token encoded in bytes
  • msg is set to the request body encoded in bytes
  • digestmod is set to the SHA-1 hashing algorithm

We get the expected signature on line 14 by encoding the digest of our hashing object using Base64. You might be able to skip this step. You should confirm if the data you receive is encoded using Base64.

On line 16 we compare the signature found in the request with the signature we expect. You typically use the == operator when comparing values in Python. Do not do this here! Heed the following warning found in the Python documentation:

Warning: When comparing the output of digest() to an externally-supplied digest during a verification routine, it is recommended to use the compare_digest() function instead of the == operator to reduce the vulnerability to timing attacks.

On line 25 we combine all of this together and verify the request. We display a thumbs up emoji for authorized users and a red light emoji for bad actors!
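The steps above can be sketched as follows. This is a minimal sketch under assumptions, not the original listing, so the line numbers mentioned in the text do not apply to it: the secret value, the dictionary standing in for a request, and the function names are all made up for illustration.

```python
import base64
import hashlib
import hmac

# Hardcoded for demonstration only (hypothetical value).
# In production, read the secret from an environment variable.
SECRET_TOKEN = 'my-secret-token'


def expected_signature(body: bytes, secret: str = SECRET_TOKEN) -> bytes:
    """Compute the Base64-encoded HMAC-SHA1 signature of a request body."""
    hashing_object = hmac.new(
        key=secret.encode('utf-8'),
        msg=body,
        digestmod=hashlib.sha1,
    )
    return base64.b64encode(hashing_object.digest())


def is_verified(request: dict) -> bool:
    """Compare the received and expected signatures in constant time."""
    return hmac.compare_digest(
        request['signature'],
        expected_signature(request['body']),
    )


# Stand-ins for a request from an authorized user and from a bad actor.
body = b'{"task": "done"}'
good_request = {'body': body, 'signature': expected_signature(body)}
bad_request = {'body': body, 'signature': b'not-the-right-signature'}

for request in (good_request, bad_request):
    print('👍' if is_verified(request) else '🔴')
```

Whether the Base64 step and SHA-1 apply depends on the third-party service, so check its documentation before copying either choice.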

Wrapping up

I took a cybersecurity course my last semester in college. I'd be lying if I told you I enjoyed writing C code and setting up Ubuntu virtual machines on my Windows laptop. That being said, it's awesome seeing the theories I learned in school put to practice.

Check out these links with more information on HMAC and webhook security:

My initial thoughts on Posthaven

I've been a Posthaven user for less than a week. Here's what I've gathered about the platform:

  • Having limited themes is a good thing - I can focus more on creating content and not spend 10 hours choosing a theme;
  • The editor is rough around the edges - I wish there was support for Markdown, and editing links after inserting them is broken;
  • SEO support is lacking - I'm not worried about ranking on Google but I would like the links I share on Slack to look nice;
  • Clicking "Save as Draft" is fun.

It's good enough for me. I can afford the $5 a month and the fee is a good forcing function to get me to write.

It also looks like Posthaven is still being maintained. Their Twitter account is active and one can request features. If you're reading this, you should go vote!

When Google and Stack Overflow don't pick up

Larry David knows all about phone etiquette.

Image source: NBC News

Pick up the phone, baby

I recently worked on improving some phone number validation logic at Winnie. We validate a batch of phone numbers and send them off to a third-party service. Some of the numbers we were sending were deemed invalid by the service. This was preventing us from automating some data updates we wanted to run daily. How hard could validating digits be?

Some validation boxes we already checked off included:

  • regex pattern matching to only return digits (e.g. removing non-digit characters from 281-330-8004)
  • checking if the value is 10 characters long (e.g. 2813308004 has no country code)
  • checking if the value is 11 characters long (e.g. 12813308004 has a country code)

An edge case we were not considering was 800 numbers! A code change went out to ignore these types of phone numbers. The next day we were able to send a new batch of phone numbers to the third party with no issues. Problem solved? Not quite.
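As a rough sketch, the checks listed above might look like this. The function names are made up for illustration and this is not the code we ran in production:

```python
import re


def normalize_digits(raw: str) -> str:
    """Strip every non-digit character from a phone number string."""
    return re.sub(r'\D', '', raw)


def has_valid_length(digits: str) -> bool:
    """Accept 10 digits (no country code) or 11 digits (with one)."""
    return len(digits) in (10, 11)


for raw in ('281-330-8004', '1 (281) 330-8004', '330-8004'):
    digits = normalize_digits(raw)
    print(raw, '->', digits, has_valid_length(digits))
```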

Man with a plan

We were still sending them invalid phone numbers. Perfectly valid-looking phone numbers were being deemed invalid by them. For example, 234-911-5678 is an invalid phone number. How? There are no non-digit characters and it looks like a valid phone number!

It turns out there is something called the North American Numbering Plan. Under the modern plan, a U.S. phone number must adhere to the NPA-NXX-xxxx format. Each component in this format must follow certain rules. The valid range for the first digit in the NPA component is 2-9. The valid range for each digit in the xxxx component is 0-9. 123-234-5678 is invalid because the first digit is a 1. In the example above, 234-911-5678 was invalid because it violated the following rule: the second and third digit in the NXX component can't both be 1.
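To see what hand-rolling these checks might look like, here is a sketch covering only the two rules mentioned above. The function name is made up, and the real numbering plan has more rules than this, so treat it as an illustration rather than a validator.

```python
import re


def violates_nanp_rules(number: str) -> bool:
    """Check a 10-digit number against the two rules described above."""
    digits = re.sub(r'\D', '', number)
    if len(digits) != 10:
        return True

    npa, nxx = digits[:3], digits[3:6]

    # Rule 1: the first digit of the NPA component must be 2-9.
    if npa[0] not in '23456789':
        return True

    # Rule 2: the second and third digits of NXX can't both be 1.
    if nxx[1:] == '11':
        return True

    return False


for number in ('123-234-5678', '234-911-5678', '281-330-8004'):
    print(number, 'invalid' if violates_nanp_rules(number) else 'ok')
```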

I was determined to avoid translating these rules to brittle Python code. I knew there had to be a solution we could leverage instead of reinventing the wheel.

1-800-GOOGLE-IT

What does a software engineer do when they're stuck? Turn to Google. Here are some search queries I tried:

  • "npa nxx validator"
  • "npa nxx github"
  • "npa nxx python"

No luck. The Stack Overflow results I got were not what I was looking for. Where was the accepted answer I yearned for? Finally, I Googled "django phone look up". One of the first results was a GitHub link for django-phonenumber-field. I started searching for more of the same terms in this repository: "nxx", "valid", "is_valid". On a side note, the search experience on GitHub has improved tremendously.

I finally found a promising method in the source code:

def is_valid(self):
    """
    checks whether the number supplied is actually valid
    """
    return phonenumbers.is_valid_number(self)

I searched for is_valid_number to get the method definition but got nothing. I realized that phonenumbers was an external package that the project relied on. I immediately Googled the package, skimmed the README and tested it with our invalid phone numbers. It worked! I was confident that this package was enough for our needs and soon it found a home in our requirements.txt file.

I went back and looked at the django-phonenumber-field README and saw the following:

The answer to my problem was right there! All I had to do was read the freaking docs.

Can you hear me now?

Would I have saved 5 minutes by skipping straight to the README instead of browsing the source code? Sure. But being able to read code, especially code that you didn't write, is a useful skill. Plus, GitHub has made it even easier to navigate code on their platform. Can you tell I'm a fan of GitHub?

Googling is a skill. Reading source code is a skill. Reading documentation is a skill. Combine these skills with communicating effectively and what do you get? Probably something better than a 10x engineer.