20Jan

Using django_uni_form and CSRF

Posted by Elf Sternberg as django, programming

The last update of django_uni_form that I have, a rather amazing form layout handler for Django, does not understand what to do with the CSRF token. So in case you’re wondering, after you’ve created your form, you do the following in your view:

        form = FilmForm(initial = {'title': 'History is ...'})
        form.helper.add_input(Hidden(name = 'csrfmiddlewaretoken',
                                     value = request.META['CSRF_COOKIE']))

It’s unfortunate that you have to do it in the view, but it’s the only way to have access to the request.META object, which is where the current CSRF token is stored.

One of those problems routinely experienced by Django developers is “The Admin page of ten thousand users.” There are plenty of objects in our model that are owned by a specific user, and when we want to create or edit such an object, we’re frequently confronted with an unsorted, difficult-to-search <select> box with a randomized list of users. These days, we use Chrome (or Firefox 4) on a machine with reasonable memory, so we have no need (yet) to segment that select box into sub-selects, but even with a well-provisioned desktop, finding that one user out of a list of thousands can be a chore.

As is often the case with Django, we can work backwards from a generalized display to a more specific display. Here’s the source code I’m going to explain:

from django import forms
from django.contrib import admin
from django.contrib.admin import widgets

from models import *

class UserModelChoiceField(forms.ModelChoiceField):
    def label_from_instance(self, obj):
        return "%s, %s" % (obj.last_name, obj.first_name)

class SubscriptionAdminForm(forms.ModelForm):
    user = UserModelChoiceField(
        queryset = User.objects.order_by('last_name', 'first_name'))
    book = forms.ModelChoiceField(
        queryset = Book.objects.order_by('title'))
    purchase_date = forms.SplitDateTimeField(
        widget = widgets.AdminSplitDateTime)
    class Meta:
        model = Subscription

class SubscriptionAdmin(admin.ModelAdmin):
    search_fields = ['user__last_name', 'user__first_name', 'book__title']
    form = SubscriptionAdminForm

admin.site.register(Subscription, SubscriptionAdmin)

Starting at the bottom, I register the Subscription object with a ModelAdmin to the admin system. I override the standard ModelAdmin with two options. The first specifies which fields will be searched when the user types something into the search box on the Subscription list page. This helps find a specific subscription. Once you’ve found the subscription you want to manage, you go to the Subscription detail page.

It’s rare (but possible) that the administrator wants to change which user owns a specific subscription, but it’s more common that an administrator may want to comp a subscription to a user. First you have to find that user. To do that, you can let the Admin create its own generic form, but that form will be populated with users in the default order specified in Auth (and Auth specifies no default ordering, so the order is whatever the database happens to return, which is useless to you and me).

Instead, I create a SubscriptionAdminForm, and tell the SubscriptionAdmin to use it instead. In that form, I first specify that this form is a ModelForm for model Subscription. “book” is made into a ModelChoiceField, but I manually specify an order for the queryset. “purchase_date” uses the SplitDateTimeField, but I use the Django admin’s AdminSplitDateTime widget, to maintain visual consistency with the rest of the admin. For the User, I could start with a standard ModelChoiceField, but that would make the labels come out “first_name last_name,” which is not search-ready via standard key searches. You’re likely to have a lot of “Roberts” scattered throughout the system, but only a few “Smiths,” and it would be nice if they were all near each other.

So I override the ModelChoiceField and change the label_from_instance() method to return something other than the standard representation; in this case, I want the format “last name, first name.”

And there you go: Books ordered by title, users ordered by Last Name, First Name (and displayed that way), and the whole thing displayed in a Django Admin friendly way.

Little bits of pieces of this stuff can be found scattered throughout the Django Book, the Django docs, and the source code, but this blog entry provides a concise summary of the basic features you might want in a customized Admin form. Consider this as a template for improving the experience for the poor schlubs who have to use your Admin page every day.

I wish I’d known this a long time ago.  Django’s request object includes a dictionary of key/value pairs passed into the request via POST or GET methods.  That dictionary, however, works in a counter-intuitive fashion.  If a URL reads http://foo.com?a=boo, then the expected content of request.GET['a'] would be 'boo', right?  And most of us who’ve used other URL parsers in the past would expect that, for http://foo.com?a=boo&a=hoo, the content of request.GET['a'] would be ['boo', 'hoo'].

Except it isn’t.  It’s just 'hoo'.  Digging into the source code, I learn that in Django’s MultiValueDict, __getitem__(self, key) has been redefined to return the last item of the list.  I have no idea why.  Maybe they wanted to ensure that a scalar was always returned.  The way to get the whole list (necessary when doing an ‘in’ request) is to call request.GET.getlist('a').
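To make the behavior concrete, here’s a minimal sketch using QueryDict directly (which is what request.GET is under the hood):

from django.http import QueryDict

qd = QueryDict('a=boo&a=hoo')
print qd['a']          # 'hoo' -- __getitem__ hands back only the last value
print qd.getlist('a')  # ['boo', 'hoo'] -- the whole list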

Lesson learned, an hour wasted.

It drives me nuts that we in the Django community rely on Solr or Haystack to provide us with full-text search when MySQL provides a perfectly functional full-text search feature, at least at the table level and for modest projects. I understand that not every app runs on MySQL, but mine do, and I’m sure many of you are running exactly that, and could use this technique without modification.

Well, after much digging, I found an article on MercuryTide’s website covering custom QuerySets with FULLTEXT and relevance, and built this library around it.

I used this rather than Django’s internal filter keyword search because this technique adds an additional aggregated value, the relevance of each row to the search terms. This is useful for sorting the results, something not automatically provided by the QuerySet.filter() mechanism.
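The heart of the technique, stripped of the library’s error handling and introspection, looks roughly like this. This is a simplified sketch of my own, not the library’s actual code, and the field handling is an assumption:

from django.db import models

class SearchManager(models.Manager):
    def __init__(self, fields=None):
        super(SearchManager, self).__init__()
        self.fields = fields

    def search(self, query, fields=None):
        fields = fields or self.fields
        # MATCH ... AGAINST both filters the rows and, as an extra select,
        # yields a per-row relevance score that can be sorted on.
        match = "MATCH(%s) AGAINST (%%s)" % ', '.join(fields)
        return self.get_query_set().extra(   # get_queryset() in newer Django
            select={'relevance': match},
            select_params=[query],
            where=[match],
            params=[query])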

You must create the indexes against which the search will be conducted. For performance reasons, if you’re importing a massive collection of data, it’s better to import all of the data and then create the index. More importantly, when you declare that a SearchManager is to be used by a Model, you declare it like so:

class Book(models.Model):
    ...
    objects = SearchManager()

When you do, you must add a FULLTEXT index that covers the fields you intend to search:

CREATE FULLTEXT INDEX book_text_index ON books_book (title, summary)

Notice how the contents of the index correspond with the fields the search manager will query.  Or you can automate the process with South:

    def forwards(self, orm):
        db.execute('CREATE FULLTEXT INDEX book_text_index ON books_book (title, summary)')

    def backwards(self, orm):
        db.execute('DROP INDEX book_text_index on books_book')

Using the library is fairly trivial. If there is only one index (which can encompass several columns) on the table, you call:

books = Book.objects.search('The Metamorphosis').order_by('-relevance')

If there’s more than one index, you specify the index by the list of fields:

books = Book.objects.search('The Metamorphosis', ('title', 'summary')).order_by('-relevance')

Note that that’s a tuple, and must be.

If you specify fields that are not part of a FULLTEXT index, the error message will include lists of viable indices.   It will also tell you if there are no indices.  (Getting that to work was tricky, as it involved database introspection and the decoration of methods, so I’m especially proud of it.)
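The introspection itself is nothing exotic; here is a rough sketch (assumed, not the library’s actual code) of how the available FULLTEXT indices can be discovered on MySQL:

from django.db import connection

def fulltext_indices(table):
    """Return a dict mapping index name -> list of columns it covers."""
    cursor = connection.cursor()
    cursor.execute("SHOW INDEX FROM %s" % table)
    indices = {}
    for row in cursor.fetchall():
        # In SHOW INDEX output: row[2] is Key_name, row[4] is Column_name,
        # row[10] is Index_type.
        if row[10] == 'FULLTEXT':
            indices.setdefault(row[2], []).append(row[4])
    return indices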

The library is fully available on my github account: django_mysqlfulltextsearch

13Oct

Django-Zipdistance

Posted by Elf Sternberg as Design, programming, python

Recently, I had the pleasure of attending another of those Seattle Django meet-ups.  This one was a potpourri event, just people talking about what they knew and how they knew it.  I revealed that I’d written my first aggregator, and that seemed to be an impressive statement.  Apparently Django Aggregators (database expressions that perform summarizing or filtering as part of the query) are something of a black art, much like WordPress Treewalkers were a black art I figured out in just a few hours.

Aggregators consist of two parts: The Definition and the Implementation.  Unfortunately, Django’s idea is that these are two different objects, bound together not by inheritance but by aggregation (both the definition and the implementation are assembled in a generic context, one providing access to the ORM and the other to the SQL).  The definition is used by the ORM to track the existence and name of the aggregate, and is then used to invoke the implementation, which in turn creates the raw SQL added to the query ultimately sent to the server, whose results are then parsed back out by the ORM.

I needed to use aggregation because I wanted to say, “For any two points’ latitude and longitude, give me the great circle distance between them,” and then say, “For a point (X, Y) on a map, give me every other place in the database within n miles great circle distance.”

The latter was not possible with Django’s QuerySet.extra() feature.  You can add a WHERE clause, but not a HAVING clause, and this definitely requires a HAVING clause when running on MySQL.  Using an Aggregate with a limit on its value forces the ORM to realize it needs a HAVING clause.  Besides, it was a good excuse to learn the basics of Aggregation. Ultimately, I was able to do what the task required: find the distance between any two US Zip Code regions without making third-party requests.
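For the curious, the two-part pattern under the old (pre-1.8) aggregate API looks roughly like the following. This is a sketch of my own under assumed column names (lat, lng) and class names, not the actual Django-ZipDistance code:

from django.db.models.aggregates import Aggregate
from django.db.models.sql.aggregates import Aggregate as SQLAggregate

class SQLGreatCircle(SQLAggregate):
    # The "implementation": emits raw SQL computing the great circle distance
    # in miles between a fixed point and each row's lat/lng columns.
    is_computed = True
    sql_function = ''
    sql_template = ('(3959 * ACOS(COS(RADIANS(%(lat)s)) * COS(RADIANS(lat)) '
                    '* COS(RADIANS(lng) - RADIANS(%(lng)s)) '
                    '+ SIN(RADIANS(%(lat)s)) * SIN(RADIANS(lat))))')

class GreatCircle(Aggregate):
    # The "definition": the ORM-facing object that annotate() accepts.
    name = 'GreatCircle'

    def add_to_query(self, query, alias, col, source, is_summary):
        query.aggregates[alias] = SQLGreatCircle(
            col, source=source, is_summary=is_summary, **self.extra)

Hypothetical usage; filtering on the annotation is what pushes the ORM into emitting a HAVING clause:

ZipCode.objects.annotate(
    distance=GreatCircle('latitude', lat=47.6097, lng=-122.3331)
).filter(distance__lte=25)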

I make absolutely no promises that this code is useful to anyone else.  The Aggregator is definitely not pretty: it’s virtually a raw SQL injector.  But it was fun.  Enjoy: Django-ZipDistance.

07Oct

A significant Django tool: storages

Posted by Elf Sternberg as Uncategorized

If you work with Django for any period of time, the day comes when you’ll be accepting outside data from your users: files, images, and the like.  Django provides only two places where you can store these items: in memory, or to a file.  Django storages provides for two more critical locations: BLOBs in your SQL database (which you may want to do, you never know), and most importantly, Amazon AWS S3.  For my current work with a film library and catalog, having S3 be the storage solution has always been a bit of a kludge: producers would upload films to the server, and we’d eventually get them into S3.  Now, with the S3 backend from django-storages, backed by the incredible boto library, the entire process is unbelievably easy.
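For reference, pointing Django’s default file storage at S3 is only a couple of settings once django-storages and boto are installed (the key and bucket values below are placeholders; check the django-storages documentation for your version):

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_ACCESS_KEY_ID = 'your-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-secret-key'
AWS_STORAGE_BUCKET_NAME = 'your-bucket-name'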

Storages also supports storing content in the database, CouchDB, FTP, or anything else you can imagine. And the source code makes for excellent examples.

Also, if you use django-storages, consider looking at many of the branches on bitbucket, because there are variants of it for S3 that disable the HTTPS default for Cloudfront, which was important for us at Indieflix. Not everything coming out of Cloudfront has to run through SSL.

Hot on the heels of my last entry, the next utility extends the event object to automatically produce a URL compatible with Google Calendar’s “create an event” handler. Now, I could extend the Event application myself and add a get_google_url() method to the model, but let’s try to do this The Django Way. All I want is a URL, and I need it in specific templates. That says to me: template tag. Even better, a template tag filter.

This example is problematic in that we only needed dates, not times. This does need work to extend it to handle times.

Sticking with my convention of naming extensions to existing apps project_app, the filename becomes:

<<project_events_tags.py>>=
from django import template
from django.contrib.sites.models import Site
from django.utils.http import urlquote_plus

register = template.Library()

@register.filter
def google_calendarize(event):
    st = event.start
    en = event.end and event.end or event.start
    tfmt = '%Y%m%dT000000'

    dates = '%s%s%s' % (st.strftime(tfmt), '%2F', en.strftime(tfmt))
    name = urlquote_plus(event.name)

    s = ('http://www.google.com/calendar/event?action=TEMPLATE&' +
         'text=' + name + '&' +
         'dates=' + dates + '&' +
         'sprop=website:' + urlquote_plus(Site.objects.get_current().domain))

    if event.location:
        s = s + '&location=' + urlquote_plus(event.location)

    return s + '&trp=false'

google_calendarize.is_safe = True

And this is invoked via:

<<events.html>>=
{% load project_events_tags %}
...
<a href="{{ event|google_calendarize }}">+ Add to Google Calendar</a>

It couldn’t be easier.

The Django Event Calendar is a fairly old and well-respected piece of code, useful for a variety of social networking and announcement-oriented applications.  It’s not the be-all of events calendars, but it does what it has to well enough.  I’ve used it on several projects.

The ${BOSS} asked me to use it for announcing upcoming movie showings, and she wanted me to add the capability to automatically export to Outlook or Google Calendar.  There is a python module for generating iCalendar files, and uniting the two is relatively straightforward. The only trick to making the iCalendar component work with Django is that both have a model named “Event”. Getting around this is a little awkward, but it can be managed with the magic of Django’s get_model() function.

This view handler uses the Site application for its event calendar information. I’ve named the file ics_views.py and put it in the application folder projectname_events, a Django application where my extensions to the events calendar are kept. Since I already have a views file there which imports events.Event, this was my way of keeping the two Event models apart. I’m sure my readers could come up with a more clever UUID generator than the one I’ve supplied below, but it is consistent and correct:

<ics_views.py>=
from datetime import datetime
from icalendar import Calendar, Event
from django.db.models import get_model
from django.http import HttpResponse
from django.contrib.sites.models import Site

def export(request, event_id):
    event = get_model('events', 'event').objects.get(id = event_id)

    cal = Calendar()
    site = Site.objects.get_current()

    cal.add('prodid', '-//%s Events Calendar//%s//' % (site.name, site.domain))
    cal.add('version', '2.0')

    site_token = site.domain.split('.')
    site_token.reverse()
    site_token = '.'.join(site_token)

    ical_event = Event()
    ical_event.add('summary', event.description)
    ical_event.add('dtstart', event.start)
    ical_event.add('dtend', event.end and event.end or event.start)
    ical_event.add('dtstamp', event.end and event.end or event.start)
    ical_event['uid'] = '%d.event.events.%s' % (event.id, site_token)
    cal.add_component(ical_event)

    response = HttpResponse(cal.as_string(), mimetype="text/calendar")
    response['Content-Disposition'] = 'attachment; filename=%s.ics' % event.slug
    return response

Now modify the project’s urls.py to include this object:

<urls.py>=
urlpatterns += patterns('',
    url(r'^events/(?P<event_id>\d+)/export/', 'app_events.ics_views.export', name="event_ics_export"),
)

And this can be invoked quite simply:

<events.html>=
...
<a href="{% url event_ics_export event.id %}" >Export Event</a></p>
...

And that’s all there is to it.  Now, when you click on the link above, you’ll be prompted to download an ICS file that, if your operating system appreciates such things, will automagically try and add it to your calendar.

I searched for “django gearman” on Google and Bing, and found precious little.  There isn’t much out there, so I’ve decided to put together my own example, using Gearman as a queue manager.

If you don’t know what Gearman is, it’s a “generic application framework to farm out work to other machines or processes that are better suited to do the work.”  You have Gearman clients and workers: clients dispatch jobs to a configurable table of gearman servers, which in turn dispatch the job to any idle worker processes.  Clients and workers can be on separate machines; it is the collection of gearman servers that makes decisions about which worker to accept a given task.  This can be very useful when you have a long-running process, such as audio or video processing (or, in my case, the automatic LaTeX-ification of documentation).

As an (utterly trivial and inadequate) example, I’m going to show you how to communicate between a client triggered from Django, a worker that does some work, and then how the message gets back to Django that the process has been run.  This example is trivial because it assumes both processes are on the same machine; it is inadequate because it uses no synchronization to ensure that the message passed back to the Django process isn’t accidentally destroyed by a race condition.

Let’s start with the basics:

virtualenv --no-site-packages geardemo
cd geardemo
source bin/activate
pip install gearman django
django-admin startproject geardemo
cd geardemo
./manage.py startapp testapp
mkdir workers

Here, I’ve created a virtual environment in which we’re going to run our example, installed gearman and django, started a django project, and in that project created our app and a directory for the workers.

What I want to do is create a view in my testapp that dispatches jobs to Gearman. The view will take a single argument from its web page and pass that to Gearman. I also want to pass my session key, because I’ll be using the session object to receive notice that the process is done.  In testapp/views.py:

from django.shortcuts import render_to_response
from django.template import RequestContext
from gearman import GearmanClient, Task

from django.conf import settings

try:
    import cPickle as pickle
except ImportError:
    import pickle

def run(request):
    if request.method == "POST":
        jobname = request.POST.get('name')

        if jobname.strip():
            client = GearmanClient(["127.0.0.1"])
            req = request.COOKIES.get(settings.SESSION_COOKIE_NAME, None)
            arg = (jobname, req)
            pik = pickle.dumps(arg, pickle.HIGHEST_PROTOCOL)
            res = client.dispatch_background_task("work", pik)

        status = request.session.get('worker.status', [])
        return render_to_response('view.html', { 'status': status },
            context_instance=RequestContext(request))

The tricky part, for me at least, was figuring out how to pass multiple objects to the worker process. They have to be pickled; the gearmand servers are written in C and will blindly pass any stream of bytes from the clients to the workers, so to pass multiple objects, they must be serialized into a string using pickle.dumps before handing them to gearman.

The worker process has its own mysteries.  In the client, I get the status from the session object, which means that I have to populate the session object within the worker.  This is inadequate because there’s no synchronization at work; for example, the Django process could read the status object, the worker process could read then write, then the Django process could write and erase the worker process’s changes.  To make that possible, I pass the session key to the worker in the client request above.  This is likewise inadequate because there’s no guarantee the worker and the client are on the same system, but it will work if you’re using memcached as your session store.  In fact, I recommend using memcached and a smarter synchronization process, one perhaps with atomic increments, independent of sessions, for this kind of communication.
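If you do use memcached-backed sessions, the settings (for the Django of this era; newer versions replace CACHE_BACKEND with the CACHES dict) are just:

SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'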

Here’s the file workers/worker.py:

from django.core.management import setup_environ
from django.utils.importlib import import_module
from gearman import GearmanWorker
import time
import sys
import os.path

try:
    import cPickle as pickle
except ImportError:
    import pickle

sys.path.insert(0, os.path.realpath(os.path.join(os.path.dirname(__file__), '..')))
import settings

setup_environ(settings)
from django.conf import settings
engine = import_module(settings.SESSION_ENGINE)

def dothework(gjob):
    job, session_key = pickle.loads(gjob.arg)
    for i in xrange(0, 6):
        time.sleep(1)

    if session_key:
        session = engine.SessionStore(session_key)
        status = session.get('worker.status', [])
        status.append('done with %s' % job)
        session['worker.status'] = status
        session.save()

    return True

worker = GearmanWorker(['127.0.0.1'])
worker.register_function("work", dothework
worker.work()

There’s a certain amount of rigmarole in importing the Django session object.  Using sys.path to put settings.py within our path, importing the local settings, setting up the environment, then re-importing the Django settings object gives us access to SESSION_ENGINE, and just like the client, I need pickle to get at the argument and the session key passed from the Django process.  This process merely sleeps for six seconds, then writes back to the user’s session that his process is done, with the word passed in from the client.

Note that both the client and the server use a token, “work”, to communicate which function they want run at the worker’s end.

One last file. Here’s view.html, the template for our example:



<h1>Test</h1>
<p>Current Status: </p>
{% if not status %}
<p>No recent status updates</p>
{% else %}
<ul>
{% for s in status %}
<li>{{ s }}</li>
{% endfor %}
</ul>
{% endif %}

<hr>
<form action="." method="POST">
{% csrf_token %}
<label for="name">Job Name: <input type="text" name="name"></label>
<input type="submit" value="submit">
</form>

Very simple and straightforward.  We’re not even using forms.  If there’s status, show it, and ask for another job to do.

To run this:

./manage.py syncdb
gearmand -d
cd workers
python worker.py &
cd ..
./manage.py runserver

If you did everything right, you can now browse to port 8000 on your server and get the view above. Pass it job names, and six seconds after you do, you’ll get a response. If you stack the jobs, the later ones will take longer because Gearman is running them in a serial queue, not in parallel.

And that’s it.

Full source code to the geardemo program is available.  You will still have to set up the virtualenv on your own.

11Aug

Adding ReCaptcha to Django

Posted by Elf Sternberg as django, python, web development

I learned today how to enable ReCaptcha for Django. It’s fairly trivial. I’ll show you how to enable this for account registration.

First, go and create a key pair for your site. You don’t even have to give them an email address, which is nice.

Install the recaptcha client library on your site:

pip install recaptcha-client

You’ll have to override or replace any registration templates you have, and add this somewhere in the form (usually right above the submit button):

<p>Please be a human, and not some spamming robot:</p>

<script type="text/javascript"
    src="http://www.google.com/recaptcha/api/challenge?k=YOUR_RECAPTCHA_PUBLIC_KEY{{ captcha_error }}">
</script>

<noscript>
    <iframe src="http://www.google.com/recaptcha/api/noscript?k=YOUR_RECAPTCHA_PUBLIC_KEY{{ captcha_error }}"
        height="300" width="500" frameborder="0"></iframe><br>
    <textarea name="recaptcha_challenge_field" rows="3" cols="40">
    </textarea>
    <input type="hidden" name="recaptcha_response_field"
        value="manual_challenge">
</noscript>

Add your private key to your settings.py file (as RECAPTCHA_PRIVATE_KEY), and code your accounts.views.create this way:

from django.conf import settings
from django.contrib.auth import REDIRECT_FIELD_NAME
from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response
from django.template import RequestContext
from recaptcha.client import captcha

# UserForm is the project's own registration form; import it from wherever
# you define it.

def create(request, template_name='accounts/create.html',
    redirect_field_name=REDIRECT_FIELD_NAME):

    user_form = None
    captcha_error = ""
    redirect_to = request.REQUEST.get(redirect_field_name, '')

    if request.method == "POST":
        captcha_response = captcha.submit(
            request.POST.get("recaptcha_challenge_field", None),
            request.POST.get("recaptcha_response_field", None),
            settings.RECAPTCHA_PRIVATE_KEY,
            request.META.get("REMOTE_ADDR", None))

        if not captcha_response.is_valid:
            captcha_error = "&error=%s" % captcha_response.error_code
        else:
            # perform other registration checks as needed...
            # success!
            return HttpResponseRedirect(redirect_to)

    if not user_form:
        user_form = UserForm(prefix="user")

    return render_to_response(template_name, {
        'captcha_error': captcha_error,
        'user_form': user_form},
        context_instance=RequestContext(request))

And that’s it. You have ReCaptcha enabled. I see that the python library includes an HTML generator, but it’s for recaptcha.net, and I decided to use the newer google addresses.

By the way, I’m not sure why, but I much prefer the form = None sentinel method of checking for form initialization. I think it’s a lot cleaner than a metric ton of else statements.
