Defining a wsgi app deployment standard

Thu, 09 Feb 2012

Next month at Pycon, we’ll have a web summit and I’m invited there to talk about how I deploy web applications. This is not a new topic, as it was already discussed a bit last year (see Ian Bicking’s thoughts on the topic).

My presentation at the summit will be in two parts. I want to 1/ explain how I organized our Python deployments at Mozilla (using RPMs)  2/ make an initial proposal for a deployment standard that would work for the community at large – I intend to work on this during Pycon and later on the dedicated SIG.

Here’s an overview of the deployment standard idea…

How we deploy usually

If I want to roughly summarize how people deploy their web applications these days, from my knowledge I’d say that there are two main categories.

  1. Deployments that need to be done in the context of an existing packaging system — like RPM or DPKG
  2. Deployments that are done in no particular context, where we want it to just work, like a directory containing a virtualenv and all the dependencies needed.

In both cases, preparing a deployment usually consists of fetching Python packages from PyPI and maybe compiling some of them. These steps are usually done using tools like zc.buildout or virtualenv + pip, and in the case of Mozilla Services, a custom tool that transforms all dependencies into RPMs.

In one case we end up with a directory filled with everything needed to run the application, except the system dependencies, and in the other case with a collection of RPMs that can be deployed on the target system.

But in both cases, we end up using the same thing: a complete list of Python dependencies.

The trick with using tools like zc.buildout or pip is that from an initial list of dependencies, you end up pulling indirect dependencies. For instance, the Pyramid package will pull the Mako package and so on.  A good practice is to have them listed in a single place and to pin each package to a specific version before releasing the app. Both pip and zc.buildout have tools to do this.
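
As a rough illustration of the pinning check, here is a minimal sketch (written for a current Python 3, standard library only) that lists the lines of a pip-style requirements file that are not pinned with ==. The unpinned() helper and the version numbers in the example are made up for the illustration; this is not what pip or zc.buildout do internally.

```python
import re

# A pinned requirement looks like "name==version" (simplified on purpose).
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.!+*_-]+$")


def unpinned(requirements_text):
    """Return the requirement lines that are not pinned with '=='."""
    missing = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if not PINNED.match(line):
            missing.append(line)
    return missing


# Hypothetical dependency list: versions are illustrative only.
example = """\
# direct dependency
pyramid==1.3
# indirect dependency pulled in by pyramid
Mako==0.5.0
WebOb>=1.2
"""

print(unpinned(example))
```

Running this kind of check before a release makes it obvious which indirect dependencies still float.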

Deployment practices I have seen so far:

  • a collection of RPMs, Debian packages, etc. is built using tools like bdist_rpm
  • a virtualenv-based directory is created in-place in production or as a pre-built binary release that’s archived and copied to production
  • a zc.buildout-based directory is created in-place in production or as a pre-built binary release that’s archived and copied to production

The part that’s still fuzzy for everyone who is not using RPMs or Debian packages is how to list system-level dependencies. In PEP 345 we introduced the notion of a hint, where you can define system-level dependencies whose names may not be the actual names on the target system. So if you say you need libxml-dev, which is valid under Debian, people who deploy your system will know they’ll need libxml-devel under Fedora. Yeah, no magic here, it’s a tough issue. See Requires-External.
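
To give an idea of what such hints look like once parsed, here is a small sketch that reads entries written in the PEP 345 Requires-External style (a name, optionally followed by a version in parentheses). The parse_external() helper is hypothetical, and environment markers (the part after a semicolon) are deliberately not handled.

```python
import re

# "name" or "name (version-spec)", e.g. "C" or "libpng (>=1.5)"
_REQ = re.compile(r"^\s*(?P<name>[^()]+?)\s*(?:\((?P<version>[^)]*)\))?\s*$")


def parse_external(lines):
    """Turn hint lines into (name, version-or-None) tuples."""
    deps = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = _REQ.match(line)
        if match is None:
            raise ValueError("cannot parse %r" % line)
        deps.append((match.group("name"), match.group("version")))
    return deps


hints = ["C", "libpng (>=1.5)", "libxml-dev"]
print(parse_external(hints))
```

The result is only a hint: mapping libxml-dev to the right package name on another distribution is still up to whoever deploys.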

The Standard

EDIT: Ian has a much richer standard proposal here (see the comments).

The standard I have in mind is a very lightweight standard that could be useful in all our deployment practices – it’s a thin layer on top of the WSGI standard.

A wsgi application is a directory containing:

  • a text file located in the directory at dependencies.txt, listing all dependencies – possibly reusing Pip’s requirements format
  • a text file located in the directory at external-dependencies.txt, listing all system dependencies – possibly reusing the PEP 345 format
  • a Python script located in the directory at bin/wsgiapp with an “application” variable. The shebang line of the Python script might also point to a local Python interpreter (a virtualenv version)
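
To make the last item concrete, here is a minimal sketch of what a bin/wsgiapp script could contain. The only thing the proposal asks for is the “application” name; the body of the function and the response text are placeholders.

```python
#!/usr/bin/env python
"""bin/wsgiapp: exposes the WSGI callable under the name 'application'."""


def application(environ, start_response):
    # Placeholder app: a real project would import its framework's
    # WSGI callable here instead of answering by hand.
    body = b"Hello from wsgiapp\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Any WSGI server can then be pointed at that variable; for a quick local check, wsgiref.simple_server.make_server("127.0.0.1", 8000, application).serve_forever() from the standard library is enough.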

From there we have all kinds of possible scenarios where the application can be built and/or run with the usual set of tools.

Here’s one example of a deployment from scratch :

  • The repository of the project is cloned
  • A virtualenv is created in the repository clone
  • pip, which gets installed with virtualenv, is used to install all dependencies described in dependencies.txt
  • gunicorn is used to run the app locally using “cd bin; gunicorn wsgiapp:application”
  • the directory is zipped and sent to production
  • the directory is unzipped
  • virtualenv is run again in the directory
  • the app is hooked to Apache+mod_wsgi

Another scenario I’d use in our RPM environment:

  • The repository of the project is cloned
  • an RPM is built for each package in dependencies.txt
  • if possible, external-dependencies.txt is used to feed a spec file.
  • the app is deployed using the RPM collection

That’s the idea, roughly — a light standard to point a wsgi app and a list of dependencies.


Filed under: mozilla, python

Scaling Crypto work in Python

Mon, 06 Feb 2012

We’re building a new service at Services called the Token Server – The idea is simple : give us a Browser ID assertion and a service name, and the Token Server will send you back a token that’s good for 30 minutes to use for the specific service.

That indirection makes it easier for us to manage user authentication and resource allocation for our services. A few examples:

  • when a new user wants to use Firefox Sync, we can check which server has the smallest number of allocated users, and tell the user to go there
  • we can manage a user from a central place
  • we can manage a user we’ve never heard about before without asking her to register specifically to each service — that’s the whole point of Browser ID

I won’t get into more details because that’s not the intent of this blog post. But if you are curious the full draft spec is here - https://wiki.mozilla.org/Services/Sagrada/TokenServer

What this post is really about is how to build this token server.

The server is a single web service that gets a Browser ID assertion and does the following:

  1. verify the assertion
  2. create a token, which is a simple JSON mapping
  3. encrypt and sign the token
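
Steps 2 and 3 can be sketched with the standard library alone. This is only an illustration under assumptions: the field names, the shared secret and the HMAC-SHA256 signature are made up here, the Browser ID verification (step 1) and the encryption part are left out, and the real Token Server spec linked above is the reference.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"not-a-real-secret"  # hypothetical shared secret


def create_token(user, service, ttl=30 * 60, now=None):
    """Step 2: the token is a simple JSON-serializable mapping."""
    now = time.time() if now is None else now
    return {"user": user, "service": service, "expires": int(now) + ttl}


def sign_token(token, secret=SECRET):
    """Step 3 (signing only): payload + '.' + hex HMAC-SHA256."""
    payload = json.dumps(token, sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode("ascii") + "." + signature


def verify_token(signed, secret=SECRET, now=None):
    """What a service would do when it receives the token."""
    encoded, signature = signed.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded.encode("ascii"))
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    token = json.loads(payload.decode("utf-8"))
    now = time.time() if now is None else now
    if token["expires"] < now:
        raise ValueError("token expired")
    return token
```

The 30 minute default mirrors the lifetime mentioned earlier; everything else is a stand-in for the real crypto work.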

The GIL, Gevent, greenlet and the likes

Implementing this using Cornice and a crypto lib is quite simple, but has one major issue : the crypto work is CPU intensive, and even if the libraries we can use have C code under the hood, it seems that the GIL is not released enough to let your threads really use several cores. For example, we benched M2Crypto and it was obvious that a multi-threaded app was locked by the GIL.

But we don’t use threads in our Python servers. We use Gevent workers, which are based on greenlets. While greenlets help with I/O-bound calls, they won’t help with CPU-bound work: you’re tied to a single thread in this case, and each greenlet that does some CPU work blocks the other ones.

It’s easy to demonstrate (see http://tarek.pastebin.mozilla.org/1476644). If I run it on my MacBook Air, the pure Python synchronous version is always faster (huh, the gevent version is *much* slower, not sure why...).
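
The blocking effect itself can be sketched without gevent. The example below uses asyncio from the standard library as a stand-in for greenlets (both are cooperative schedulers running in a single thread); it is not the pastebin benchmark, and the task names are made up.

```python
import asyncio

events = []


def cpu_work(n):
    # Pure CPU loop: it never yields control to the scheduler.
    total = 0
    for i in range(n):
        total += i * i
    return total


async def io_like_task():
    events.append("io start")
    await asyncio.sleep(0)  # cooperative yield, like an I/O wait
    events.append("io resumed")


async def crypto_like_task():
    events.append("cpu start")
    cpu_work(200000)  # nothing else can run during this call
    events.append("cpu done")


async def main():
    await asyncio.gather(io_like_task(), crypto_like_task())


asyncio.run(main())
print(events)
# ['io start', 'cpu start', 'cpu done', 'io resumed']
```

The I/O-style task only resumes once the CPU-bound one is completely done, which is the same thing that happens to greenlets sharing a thread.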

So the sanest option is to use separate processes and set up a messaging queue between the web service that needs some crypto work to be done and specialized crypto workers.

In that case we’re back to our beloved 100% I/O-bound model, which we know how to scale using Nginx + Gunicorn + Gevent.

For the crypto workers, we want it to be as fast as possible, so we started to look at Crypto++ which seems promising because it uses CPU-specific calls in ASM. There’s the pycryptopp binding that’s available to work with Crypto++ but we happen to need to do some tasks that are not available in that lib yet — like HKDF.

Yeah, at that point it became obvious we’d use pure C++ for that part, and drive it from Python.

Message passing

Back to our Token server: we need to send crypto work to our workers and get back the result. The first option that comes to mind is to use multiprocessing to spawn our C++ workers and to feed them with work.

The model is quite simple, but now that we have one piece in C++, it’s getting harder to use the built-in tools in multiprocessing to communicate with our workers — we need to be lower level and start to work with signals or sockets. And well, I am not sure what would be left of multiprocessing then.

This is doable but a bit of a pain to do correctly (and in a portable way). Moreover, if we want to have a robust system, we need to have things like a heartbeat, which requires more inter-process message passing. And now I need to code it in Python and C++.

Hold on — Let me summarize my requirements:

  • inter-process communication
  • something less painful than signals or sockets
  • very very very fast

I got tempted by Memory Mapped Files, but the drawbacks I’ve read here and there scared me.

ZeroMQ

It turns out zeromq is perfect for this job – there are clients in Python and C++, and defining a protocol to exchange data from the Python web server to the crypto workers is quite simple.

In fact, this can be done as a reusable library that takes care of passing messages to workers and getting back results. It has been done hundreds of times and there are many examples on the zmq website, but I have failed to find any packaged Python library that would let me push some work to workers transparently, via a simple execute() call. If you know one, tell me!

So I am building one since it’s quite short and simple –  The project is called PowerHose and is located here : https://github.com/mozilla-services/powerhose.

Here are its characteristics and limitations:

  • Powerhose is based on a single master and multiple workers protocol
  • The Master opens a socket and waits for workers to register themselves into it
  • The worker registers itself to the master, provides the path to its own socket, and waits for some work on it.
  • Workers perform the work synchronously and send back the result immediately.
  • The master load-balances on available workers, and if all are busy waits a bit before it times out.
  • The worker pings the master on a regular basis and exits if it’s unable to reach it. It attempts to reconnect several times to give the master a chance to come back.
  • Workers are language agnostic and a master could run heterogeneous workers (one in C, one in Python, etc.)
  • Powerhose is not serializing/deserializing the data – it sends plain strings. This is the responsibility of the program that uses it.
  • Powerhose is not responsible for respawning a master or a worker that dies. I plan to use daemontools for this, and maybe provide a script that runs all workers at once.
  • Powerhose does not queue work and just relies on zeromq sockets.

The library implements this protocol and gives two tools to use it:

  • A JobRunner class in Python that you can use to send some work to be done
  • A Worker class in Python and C++ that you can use as a base class to implement workers

Here’s an example of using Powerhose:

For the Token server, we’ll have:

  • A JobRunner in our Cornice application
  • A C++ worker that uses Crypto++

The first benches look fantastic, probably faster than anything I’d have implemented myself using plain sockets :)

I’ll try to package Powerhose so other projects at Mozilla can use it. I am wondering if this could be useful to more people, since I failed to find that kind of tool.  How do you scale your CPU-bound web apps ?


Filed under: mozilla, python

Mozillians, win a free pass for Pycon US – take 2

Wed, 04 Jan 2012

I am extending the contest until Feb the 1st – Mozillians, win a free pass for Pycon US


Filed under: mozilla, python

The fear of CRUD

Mon, 02 Jan 2012

Cornice is growing steadily, and we are thinking about the different ways to use it for our needs. One use case that comes up often when we build web services is the need to publish a SQL database via HTTP.

For instance, in a project I am working on, we might expose a list of servers and some information about them, all stored in a SQL DB. The goal is to allow some management scripts to interact with the DB, to set and retrieve information about the servers, like: “can I use server 12 as a node for application X?”

Interacting with CURL or a similar tool is simpler and more portable than coding yet another SQL client for this, so the idea is to see how this kind of web service can be done with the minimum pain with Cornice.

What I am thinking about building is a small CRUD interface that glues Cornice and SQLAlchemy. The latter has a way to define a database schema explicitly via mappings, meaning that it’s easy to write a generic layer that exposes the database to the web via Cornice definitions. The work consists of transforming POST & PUT requests that contain data to write to the DB into SQLAlchemy objects, and transforming select results asked for via GET requests into the proper responses.

Nothing very new, there are tons of existing systems that implement CRUD on the top of ORMs or plain SQL libraries. The only reason to build yet another one is to use it in the context of our current toolset which is composed of Cornice, Pyramid & SQLAlchemy for most projects. The whole code will probably be less than 300 lines at the end anyways.

Oh my.

Turns out this idea is really freaking out some people around me. Some coders have a strong aversion to anything that looks a bit like Active Record, in the Rails context. In other words, anything that would completely automate the serialization & deserialization layer and make it hard to tweak some code.

Another criticism is that a CRUD system would not be able to scale in the context of a big database, like Firefox Sync, that uses numerous databases to shard data.

Turns out building a CRUD on tools like SQLAlchemy or Pyramid is not really going to ruin your scalability as long as:

  • you can tweak the serialization / deserialization
  • you can override any operation in the CRUD operations when needed
  • you don’t shoot yourself in the foot by using CRUD with some code or DB that is not meant to be used that way
  • you can use the power of the underlying tools without being blocked by the lib

For the latter, Ben Bangert pointed me at SQLAlchemy’s horizontal sharding feature, which is basically what I wrote from scratch last year to make the Sync server shard across databases… At this point I sense that Firefox Sync could have been built with a CRUD lib and be as efficient as it is today, because when I compare the queries produced by the code with the ones a CRUD lib would produce, we are one or two tweaks away.
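
For readers who have not met the idea, sharding boils down to a stable function from a key to a database. The sketch below is a generic illustration with made-up DSNs and helper names; it is not how the Sync server or SQLAlchemy picks a shard.

```python
import zlib

# Hypothetical database URLs, one per shard.
SHARDS = [
    "mysql://db0.example.com/sync",
    "mysql://db1.example.com/sync",
    "mysql://db2.example.com/sync",
    "mysql://db3.example.com/sync",
]


def shard_for(user_id, shard_count):
    """Map a user id to a shard index, always the same one for that id."""
    if shard_count < 1:
        raise ValueError("need at least one shard")
    # crc32 is stable across processes, unlike the built-in hash().
    return zlib.crc32(str(user_id).encode("utf-8")) % shard_count


def dsn_for(user_id):
    return SHARDS[shard_for(user_id, len(SHARDS))]


print(dsn_for("tarek"))
```

A CRUD layer only needs a hook to call something like dsn_for() before it builds its query, which is why sharding and CRUD are not at odds.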

Anyways, here’s a first attempt at such a library.

Defining the model

In SQLAlchemy, you can define the DB model using mappings, which are simple classes containing a description of the tables.

For example, if I have a table “users” with a field “id” and a field “name”, the mapping will look like this:

class Users(_Base):                                           
    __tablename__ = 'users'                                      
    id = Column(Integer, primary_key=True)                    
    name = Column(String(256), nullable=False)

What I started to do is write a metaclass one can use in a class to publish the mapping via HTTP.

Here’s an example:

from cornicesqla import MetaDBView
from myapp import Users, DBSession

class UsersView(object):
    __metaclass__ = MetaDBView
    mapping = Users
    path = '/users/{id}'
    collection_path = '/users'
    session = DBSession

What we have here is the definition of a view for the Users mapping. The class defines a URI for the collection (collection_path) and for each user (path). The session attribute is an SQLAlchemy session object you usually define when you work with that tool.

That’s it.

The model gets published, and you can GET, PUT, POST and DELETE on /users and /users/someid.
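
To make the mapping between HTTP verbs and DB operations concrete, here is a standalone sketch that uses sqlite3 from the standard library instead of SQLAlchemy and Cornice. The UsersCrud class and its method names are invented for the illustration and are not the cornice-sqla code.

```python
import json
import sqlite3


class UsersCrud:
    """Maps create/read/update/delete calls to SQL on a 'users' table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def create(self, body):  # POST /users
        data = json.loads(body)
        cur = self.conn.execute(
            "INSERT INTO users (name) VALUES (?)", (data["name"],)
        )
        return {"id": cur.lastrowid, "name": data["name"]}

    def read(self, user_id):  # GET /users/{id}
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}

    def collection(self):  # GET /users
        rows = self.conn.execute("SELECT id, name FROM users ORDER BY id")
        return [{"id": r[0], "name": r[1]} for r in rows]

    def update(self, user_id, body):  # PUT /users/{id}
        data = json.loads(body)
        self.conn.execute(
            "UPDATE users SET name = ? WHERE id = ?", (data["name"], user_id)
        )
        return self.read(user_id)

    def delete(self, user_id):  # DELETE /users/{id}
        cur = self.conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
        return cur.rowcount == 1


crud = UsersCrud(sqlite3.connect(":memory:"))
print(crud.create('{"name": "tarek"}'))
```

A generic layer does the same thing, except that the table, the columns and the URIs come from the mapping instead of being written by hand.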

The code of the prototype is here and you can find a working example in the tests here. It’s called cornice-sqla.

Tweaking serialization & data validation

By default, cornice-sqla will serialize and deserialize using JSON but you can tweak these steps by providing a custom serializer, or deserializer (or both.)

Let’s say you want to use the Colander library to validate and serialize the data. To do this, you just have to write your serializer method into the view class:

import json

import colander

class UsersValidation(colander.MappingSchema):
    name = colander.SchemaNode(colander.String())

class UsersView(object):
    __metaclass__ = MetaDBView

    mapping = Users
    path = '/users/{id}'
    collection_path = '/users'
    session = DBSession

    def serialize(self):
        """Unserialize the data from the request, to serialize it for the DB"""
        try:
            user = json.loads(self.request.body)
        except ValueError:
            self.request.errors.add('body', 'item', 'Bad Json data!')
            # let's quit
            return

        schema = UsersValidation()
        try:
            deserialized = schema.deserialize(user)
        except colander.Invalid as e:
            # the struct is invalid: record the error and stop here
            self.request.errors.add('body', 'item', str(e))
            return

        return deserialized

Colander is used here to validate the incoming request and create a flat mapping we can push into the DB. Cornice’s error system is used here, as explained here.

You can tweak the data that gets back from the DB with unserialize(), and for the collection URI, use collection_serialize() and collection_unserialize().

Tweaking C, R, U or D

cornice-sqla is based on a fresh feature Gael added into Cornice lately: resources. A resource is a class where you can define get(), post(), delete() and put() methods for a given URI.

cornice-sqla views are based on resources, meaning that you can override any one of those methods and do whatever you want if you don’t want the CRUD part.

What’s next

I need to make sure everything you can do in Cornice (ACLs, various options, etc.) can still be done in cornice-sqla, and start to work with more complex DB schemas that include relations, etc. I also need to add basic missing features like batching and some docs.

My hope at the end is that this small library will reduce considerably the code needed in some of our projects that interact with SQL.


Filed under: mozilla, python

Pyramid @ Python 3

Sun, 25 Dec 2011

If you have been following closely the latest work done by Chris on WebOb, you know that WebOb and eventually Pyramid became Python 3 compatible.

That makes Python 3 a very tempting target for a new web project.

Paste & PasteScript still need to be ported to Python 3 and the Pyramid team has chosen not to port them. They have created their own paster replacement instead, which can be used to initiate a Pyramid project or run the app using the .ini file.

I am wondering if it would not be simpler at this point to drop Paste and use this replacer for all Python 3 frameworks that are using the Paste script and templates features.

Besides Pyramid and its libs, it turns out that most of the libs you usually need to build a classical web app already support Python 3, like SQLAlchemy and PyMySQL for MySQL access, or Pylibmc for Memcached.

Things I am still missing in Python 3:

  • gevent
  • gunicorn
  • python-ldap
  • Cornice — I will port it soon

If you want to give it a shot, get the latest Python 3.2 and grab more details at : https://github.com/Pylons/pyramid/wiki/Python-3-Porting

And if you miss one lib, add it here

Merry Christmas !


Filed under: python

Tutorial – build your web services with Cornice

Wed, 21 Dec 2011

At this stage, I think we’ve added enough helpers in Cornice to get anyone started in building web services in Python.

As a reminder, Cornice provides helpers to build & document REST-ish Web Services with Pyramid, a Python web framework. The main benefits of Cornice are:

  • automatic handling of some HTTP errors – Ask yourself: is your app properly handling 405 or 406 errors?
  • automatic web service documentation via a Sphinx extension.
  • a simple way to validate and convert request data, and return structured 400 responses.

This is a small tutorial, extracted from our documentation.

Let’s create a full working application with Cornice. We want to create a light messaging service.

You can find its whole source code at https://github.com/mozilla-services/cornice/blob/master/examples/messaging

Features:

  • users can register to the service
  • users can list all registered users
  • users can send messages
  • users can retrieve the latest messages
  • messages have three fields: sender, content, color (red or black)
  • adding a message is done through authentication

Limitations:

  • there’s a single channel for all messages.
  • if a user with the same name is already registered, he cannot register.
  • all messages and users are kept in memory.

Design

The application provides two services:

  • users, at /users: where you can list all users or register a new one
  • messages, at /: where you can read the messages or add new ones

On the server, the data is kept in memory.

We’ll provide a single CLI client in Python, using Curses.

Setting up the development environment

To create this application, we’ll use Python 2.7. Make sure you have it on your system, then install virtualenv (see http://pypi.python.org/pypi/virtualenv).

Create a new directory and a virtualenv in it:

$ mkdir messaging
$ cd messaging
$ virtualenv --no-site-packages .

Once you have it, install Cornice in it with Pip:

$ bin/pip install Cornice

Cornice provides a Paster Template you can use to create a new application:

$ bin/paster create -t cornice messaging
Selected and implied templates:
cornice#cornice  A Cornice application

Variables:
egg:      messaging
package:  messaging
project:  messaging
Enter appname (Application name) ['']: Messaging
Enter description (One-line description of the project) ['']: A simple messaging service.
Enter author (Author name) ['']: Tarek
Creating template cornice
...
Generating Application...
Running python2.7 setup.py egg_info

Once your application is generated, go there and call develop against it:

$ cd messaging
$ ../bin/python setup.py develop
...

The application can now be launched via Paster. It provides a default “Hello” service you can check:

$ ../bin/paster serve messaging.ini
Starting server in PID 7618.
serving on 0.0.0.0:5000 view at http://127.0.0.1:5000

Once the application is running, visit http://127.0.0.1:5000 in your browser or with curl and make sure you get:

{"Hello": "World"}

Defining the services

Let’s open the messaging/views.py file. It contains all the services:

from cornice import Service

hello = Service(name='hello', path='/', description="Simplest app")

@hello.get()
def get_info(request):
    """Returns Hello in JSON."""
    return {'Hello': 'World'}

Users management

We’re going to get rid of the Hello service, and change this file in order to add our first service – the users management:

_USERS = {}

users = Service(name='users', path='/users', description="Users")

@users.get(validator=valid_token)
def get_users(request):
    """Returns a list of all users."""
    return {'users': _USERS.keys()}

@users.put(validator=unique)
def create_user(request):
    """Adds a new user."""
    user = request.validated['user']
    _USERS[user['name']] = user['token']
    return {'token': '%s-%s' % (user['name'], user['token'])}

@users.delete(validator=valid_token)
def del_user(request):
    """Removes the user."""
    # valid_token stores the user name (a string) in request.validated
    name = request.validated['user']
    del _USERS[name]
    return {'Goodbye': name}

What we have here are three methods on /users:

  • GET: simply returns the list of user names – the keys of _USERS
  • PUT: adds a new user and returns a unique token
  • DELETE: removes the user.

Remarks:

  • PUT uses the unique validator to make sure that the user name is not already taken. That validator is also in charge of generating a unique token associated with the user.
  • GET uses the valid_token validator to verify that an X-Messaging-Token header is provided in the request, with a valid token. That also identifies the user.
  • DELETE also identifies the user then removes it.

Validators fill the request.validated mapping, which the service can then use.

Here’s their code:

import os
import binascii
from webob import exc

def _create_token():
    return binascii.b2a_hex(os.urandom(20))

def valid_token(request):
    header = 'X-Messaging-Token'

    token = request.headers.get(header)
    if token is None:
        raise exc.HTTPUnauthorized()

    token = token.split('-')
    if len(token) != 2:
        raise exc.HTTPUnauthorized()

    user, token = token

    valid = user in _USERS and _USERS[user] == token
    if not valid:
        raise exc.HTTPUnauthorized()

    request.validated['user'] = user

def unique(request):
    name = request.body
    if name in _USERS:
        request.errors.add('url', 'name', 'This user exists!')
    else:
        user = {'name': name, 'token': _create_token()}
        request.validated['user'] = user

When the validator finds errors, it adds them to the request.errors mapping, and that will return a 400 with the errors.
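
The flow can be pictured with a small standalone model. It is only a simplified sketch of the pattern (validators run first, then the view is called only if no error was recorded); the FakeRequest, Errors and call_service names are invented here and this is not Cornice’s actual implementation.

```python
class Errors(list):
    """Collects errors the way the validators above report them."""

    def add(self, location, name, description):
        self.append(
            {"location": location, "name": name, "description": description}
        )


class FakeRequest:
    def __init__(self, body):
        self.body = body
        self.errors = Errors()
        self.validated = {}


def call_service(request, validators, view):
    """Run the validators, then the view only if nothing went wrong."""
    for validator in validators:
        validator(request)
    if request.errors:
        return 400, {"status": "error", "errors": list(request.errors)}
    return 200, view(request)


_USERS = {"tarek": "some-token"}


def unique(request):
    name = request.body
    if name in _USERS:
        request.errors.add("url", "name", "This user exists!")
    else:
        request.validated["user"] = {"name": name}


def create_user(request):
    return {"created": request.validated["user"]["name"]}


print(call_service(FakeRequest("tarek"), [unique], create_user))
print(call_service(FakeRequest("bob"), [unique], create_user))
```

The first call ends with a 400 and the list of errors, the second one reaches the view with the validated data.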

Let’s try our application so far with CURL:

$ curl http://localhost:5000/users
{"status": "error", "errors": [{"location": "header",
                                "name": "X-Messaging-Token",
                                "description": "No token"}]}

$ curl -X PUT http://localhost:5000/users -d 'tarek'
{"token": "tarek-a15fa2ea620aac8aad3e1b97a64200ed77dc7524"}

$ curl http://localhost:5000/users -H "X-Messaging-Token:tarek-a15fa2ea620aac8aad3e1b97a64200ed77dc7524"
{"users": ["tarek"]}

$ curl -X DELETE http://localhost:5000/users -H "X-Messaging-Token:tarek-a15fa2ea620aac8aad3e1b97a64200ed77dc7524"
{"Goodbye": "tarek"}

Messages management

Now that we have users, let’s post and get messages. This is done via two very simple functions we’re adding in the views.py file:

messages = Service(name='messages', path='/', description="Messages")

_MESSAGES = []

@messages.get()
def get_messages(request):
    """Returns the 5 latest messages"""
    return _MESSAGES[:5]

@messages.post(validator=(valid_token, valid_message))
def post_message(request):
    """Adds a message"""
    _MESSAGES.insert(0, request.validated['message'])
    return {'status': 'added'}

The first one simply returns the five latest messages in a list, and the second one inserts a new message at the beginning of the list.

The POST uses two validators:

  • valid_token(): the function we used previously that makes sure the user is registered
  • valid_message(): a function that looks at the message provided in the POST body, and puts it in the validated dict.

Here’s the valid_message() function:

import json  # needed at the top of views.py for json.loads()

def valid_message(request):
    try:
        message = json.loads(request.body)
    except ValueError:
        request.errors.add('body', 'message', 'Not valid JSON')
        return

    # make sure we have the fields we want
    if 'text' not in message:
        request.errors.add('body', 'text', 'Missing text')
        return

    if 'color' in message and message['color'] not in ('red', 'black'):
        request.errors.add('body', 'color', 'only red and black supported')
    elif 'color' not in message:
        message['color'] = 'black'

    message['user'] = request.validated['user']
    request.validated['message'] = message

This function extracts the JSON body, then checks that it contains at least a text key. It uses the color that was provided or adds a default one, and reuses the user name set by the previous validator, the one that checks the token.

Generating the documentation

Now that we have a nifty web application, let’s add some doc.

Go back to the root of your project and install Sphinx:

$ bin/pip install Sphinx

Then create a Sphinx structure with sphinx-quickstart:

$ mkdir docs
$ sphinx-quickstart
Welcome to the Sphinx 1.0.7 quickstart utility.

..

Enter the root path for documentation.
> Root path for the documentation [.]: docs
...
> Separate source and build directories (y/N) [n]: y
...
> Project name: Messaging
> Author name(s): Tarek
...
> Project version: 1.0
...
> Create Makefile? (Y/n) [y]:
> Create Windows command file? (Y/n) [y]:

Once the initial structure is created, we need to declare the Cornice extension, by editing the source/conf.py file. We want to change extensions = [] into:

import cornice
extensions = ['cornice.sphinxext']

The last step is to document your services by editing the source/index.rst file like this:

Welcome to Messaging's documentation!
=====================================

.. services::
   :package: messaging

The services directive is told to look at the services in the messaging package. When the documentation is built, you will get a nice output of all the services we’ve described earlier.

The Client

A simple client to use against our service can do three things:

  1. let the user register a name
  2. poll for the latest messages
  3. let the user send a message !
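
The HTTP side of those three actions can be sketched with urllib from the standard library (Python 3 here). This is not the Curses client linked below: the helper names are made up, the base URL assumes the development server from this tutorial, and the functions only build the requests without sending them.

```python
import json
from urllib.request import Request

BASE = "http://localhost:5000"  # assumed address of the dev server


def register_request(name):
    """1. register a name: PUT /users with the name as the body."""
    return Request(BASE + "/users", data=name.encode("utf-8"), method="PUT")


def latest_messages_request():
    """2. poll for the latest messages: GET /."""
    return Request(BASE + "/", method="GET")


def send_message_request(token, text, color="black"):
    """3. send a message: POST / with a JSON body and the token header."""
    body = json.dumps({"text": text, "color": color}).encode("utf-8")
    return Request(
        BASE + "/",
        data=body,
        method="POST",
        headers={"X-Messaging-Token": token},
    )


req = send_message_request("tarek-sometoken", "hello")
print(req.get_method(), req.full_url)
```

With the server running, each request object can be passed to urllib.request.urlopen() and the JSON response decoded with json.loads().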

Without going into great detail, there’s a Python CLI for the messaging service that uses Curses.

See https://github.com/mozilla-services/cornice/blob/master/examples/messaging/messaging/client.py

Going deeper

If you want to dig deeper, here are a few links:

We’d love feedback & new contributors !


Filed under: mozilla, python

New Year’s Python Meme

Tue, 20 Dec 2011

Hey I did this in 2009, let’s try again — I am adding one extra question this year

1. What’s the coolest Python application, framework or library you have discovered in 2011 ?

GEvent & Pyramid. Not discoveries, but a daily usage. GEvent was for me a fantastic way to make the Firefox Sync Python server scale without being forced to write callback-style code. Pyramid is a very elegant framework, that takes the simplicity from Pylons and the power and experience from Repoze & the Zope world. A good sign for me is that we don’t have to deal with the ZCA ;)

2. What new programming technique did you learn in 2011 ?

Better behaviour in highly loaded server apps. During the last year, when we wrote all the pieces that make up Firefox Sync today, I’ve learned to be more careful about how my apps react when a back-end breaks or ceases to reply, when a database gets slow, when some service that’s used restarts, or when my own app restarts while still being hammered by many requests. I did a fair amount of work on this, like smart pools of connectors and better testing techniques, and made decisions on what features survive when some third-party server is down, and what features just go 503.

3. What’s the name of the open source project you contributed the most in 2011 ? What did you do ?

Mozilla. I have not contributed as much as last year in Python because my work at Mozilla takes most of my time, but the good news is that all our stuff is open source, so… The most useful thing we’ve started for the community is probably Cornice. But we’ve written, and are still writing, a myriad of apps and libs. See https://github.com/mozilla-services and http://hg.mozilla.org/services

In Python I still interact a bit with what’s going on in Packaging and hope I’ll be able to spend more time on it in 2012. But some packaging work I needed at work was also useful for the community, like pypi2rpm.

4. What was the Python blog or website you read the most in 2011 ?

Like in the past few years, Python Reddit. And I think I am not alone in that case. 90% of my blog hits come from Reddit :-)

5. What are the three top things you want to learn in 2012 ?

I’d like to learn how to program in a few new languages, just to give them a shot. Maybe Haskell. I’d also like to finish a side project I have started with Benoit, and try to launch it, promote & market it. Last, I’d like to learn more about the inner workings of Firefox, just for my own culture.

6. What are the top software, app or lib you wish someone would write in 2012 ?

  • I want to take a picture of a wine bottle and have it recognized in an online app, where I can share my thoughts about its taste.
  • I want an Android virtual ping-pong application, where you can use your phone as paddle and see the e-ball through the camera & play with a friend.

Want to do your own list ? Here’s how:

  • copy-paste the questions and answer them on your blog
  • tweet it with the #2012pythonmeme hashtag

Filed under: python

Mozilla Apps — server side

Wed, 14 Dec 2011

Yesterday we launched the Developer Preview for Apps, which you can play with at https://apps-preview.mozilla.org

The server side is composed of many pieces, and while they are subject to change since this is just a preview, I think it’s quite interesting to describe some of them already — and maybe get more contributors in the process, since everything is open source and contributors are welcome.

Here’s an overview of the system — we used:

  • Django for the App MarketPlace
  • Cornice for the Sync APIs
  • Node.js for the Sauropod APIs
  • HBase for the Sauropod DB

The App MarketPlace – Django

The App MarketPlace located at https://apps-preview.mozilla.org is where you can upload your own Apps, or install some. There’s a payment process for non-free Apps, and you can see which ones you have bought in your profile.

The development is driven by the WebDev team, and is based on Django — see https://github.com/mozilla/zamboni/tree/master/apps/webapps

I can’t really describe this part, as I did not follow it closely. But basically, the Market Place keeps track of your apps and payment receipts, for other parts of the system to interact with.

The Dashboard – HTML + JS

Once you start to install Open Web Applications, you are redirected to a Dashboard at https://myapps.mozillalabs.com. This Dashboard is an HTML application that lists your installed Apps associated to your Browser ID.

What’s pretty cool is that no matter where you’ve installed a given App (Firefox on your Desktop, your Phone), it will appear on this dashboard and be synced across devices. This is done via a Sync service called AppSync.

Code pointers for the Dashboard:

AppSync

AppSync is the part I worked on. Its design was mainly done by Ian Bicking, who then worked on the client side implementation while I was working on the server side one.

It’s quite similar to what we did for Firefox Sync, except that:

  • AppSync supports BrowserID
  • The data is stored in Sauropod

Securing this data is part of a much larger ongoing project called Sauropod. The idea is that any database access has to be done with credentials, and that Sauropod is in charge of controlling them and dealing with the storage.

In the long term, depending on how Sauropod evolves with respect to encryption, and how Firefox Sync itself evolves with respect to Browser ID and Sauropod access, we might merge both projects.

Or maybe Sauropod will one day provide APIs that are good enough for direct client interactions, turning AppSync into a simple proxy ?

Time will tell !

Anyway, here’s an overview below of the AppSync system we’ve set up for this preview.

We have the AppSync server itself, that’s built using Cornice. It provides the synchronisation APIs described in this document: https://wiki.mozilla.org/Apps/Sync/Spec

Every time a client wants to write new data, we call the Sauropod server, which is a very simple GET/SET API built with Node.js.

The flow is:

  • AppSync server asks for a new Sauropod session, using Browser ID credentials
  • Sauropod verifies the Browser ID credentials, then creates a DB token in a session
  • AppSync uses this token until it’s not valid anymore
  • Sauropod calls in turn an HBase cluster to access the data
  • Every write is mirrored in a MySQL database in AppSync. This mirroring was set up so we can turn off Sauropod if we need to and still have a working system. It will eventually go away.
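
To make the flow above more concrete, here is a rough sketch of it in Python. It is not the actual AppSync or Sauropod code: FakeSauropod, AppSyncStorage and all their method names are made up for the illustration, the BrowserID assertion is treated as an opaque string that identifies the user, and plain dicts stand in for HBase and MySQL.

```python
import time


class FakeSauropod:
    """In-memory stand-in for the Sauropod GET/SET service (hypothetical API)."""

    def __init__(self, token_ttl=60.0):
        self.token_ttl = token_ttl
        self._sessions = {}  # token -> (user, expiry timestamp)
        self._data = {}      # stands in for the HBase cluster
        self._counter = 0

    def create_session(self, assertion):
        # A real server would verify the BrowserID assertion here.
        if not assertion:
            raise PermissionError("invalid credentials")
        self._counter += 1
        token = "token-%d" % self._counter
        self._sessions[token] = (assertion, time.time() + self.token_ttl)
        return token

    def _user_for(self, token):
        session = self._sessions.get(token)
        if session is None or session[1] < time.time():
            raise PermissionError("token is not valid anymore")
        return session[0]

    def set(self, token, key, value):
        self._data[(self._user_for(token), key)] = value

    def get(self, token, key):
        return self._data.get((self._user_for(token), key))


class AppSyncStorage:
    """Writes go to Sauropod first, then to a mirror (a dict instead of MySQL)."""

    def __init__(self, sauropod, mirror):
        self.sauropod = sauropod
        self.mirror = mirror
        self._tokens = {}  # assertion -> current Sauropod token

    def _token(self, assertion, refresh=False):
        if refresh or assertion not in self._tokens:
            self._tokens[assertion] = self.sauropod.create_session(assertion)
        return self._tokens[assertion]

    def write(self, assertion, key, value):
        try:
            self.sauropod.set(self._token(assertion), key, value)
        except PermissionError:
            # The token is not valid anymore: ask for a new session, retry once.
            token = self._token(assertion, refresh=True)
            self.sauropod.set(token, key, value)
        # Mirror the write so the system keeps working without Sauropod.
        self.mirror[(assertion, key)] = value
```

The token is cached per user and only renewed when Sauropod rejects it, which matches the "use the token until it is not valid anymore" step. Mirroring only happens after the Sauropod write succeeded, which is one possible choice among others.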

Find the code here:

What’s Next

I am really excited by this project. There are a lot of people involved in the Mozilla community, and seeing all the moving pieces assembled to build an Open App environment is pretty cool.

We’re going to work in the upcoming months on consolidating the whole system, making sure it scales well, and correcting the design as feedback comes in.

If you want to get involved, you can install the preview, play with the available Apps and maybe even write your own Apps for the Market Place, or help us with the coding.

We’re hanging out in #openwebapps on Mozilla’s IRC

EDIT : Anant wrote a very nice blog post on the topic

 


Filed under: mozilla, python

QA script on web services

Mon, 12 Dec 2011

The other task Alexis and I are going to work on this week, besides Cornice, is a QA script for web services.

The goal is simple : check that a set of web services is HTTP compliant. For example, does your application send the proper 406 error when the client asks for an unsupported Accept type ?

If you document your web services properly, asking for an unsupported Accept should not occur of course, but in most projects those protocol details are often a bit vague. And someone who writes client software will inevitably make some assumptions, based on their HTTP knowledge, about how the application is supposed to behave.
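
As an illustration of the kind of check we have in mind, here is a small sketch that uses only the standard library. The WSGI app and the call() helper are invented for the example and are not WALint code; the Accept parsing is deliberately naive and ignores q-values.

```python
from wsgiref.util import setup_testing_defaults

SUPPORTED = ("application/json",)


def app(environ, start_response):
    """Tiny WSGI app that only knows how to produce JSON."""
    accept = environ.get("HTTP_ACCEPT", "*/*")
    # Naive parsing: drop parameters such as ";q=0.8".
    offered = [item.split(";")[0].strip() for item in accept.split(",")]
    wildcards = ("*/*", "application/*")
    if not any(o in wildcards or o in SUPPORTED for o in offered):
        start_response("406 Not Acceptable", [("Content-Type", "text/plain")])
        return [b"Supported types: application/json"]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b"{}"]


def call(wsgi_app, path, method="GET", headers=None):
    """Call a WSGI app in-process and return (status code, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    environ["PATH_INFO"] = path
    for name, value in (headers or {}).items():
        environ["HTTP_" + name.upper().replace("-", "_")] = value
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return int(captured["status"].split()[0]), body


status, _ = call(app, "/", headers={"Accept": "text/x-unknown"})
print("unsupported Accept ->", status)
```

A lint-style test would then simply assert that the status is 406 for a type the service does not list, and flag the service otherwise.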

Richard Newman already came up with a fair list of tests we could run against a web app, and we’ve started to summarise them and add more here: https://wiki.mozilla.org/Services/WALint

The idea of the script is to print out a report of the errors and warnings it found on a web app, exactly like a lint tool would do on some code. That’s why I called it WALint (Web App Lint). Alexis doesn’t like the name but he has not found a better one yet ;)

The way it works is that you describe in a configuration file the URIs of your web services, then WALint runs tests against them, using what we called controllers.

Each controller is in charge of trying out something on the web app, given a path and a method, using a small HTTP test client (based on WebTest). WALint will provide built-in controllers and will be extensible. We will have Mozilla-specific tests, like the maximum size of a query string, or the maximum size of the request, since those limits are specific to the stack in use.

We got bitten by this in the past in Sync: one web service failed to work properly because the client was building a super long query string that was truncated along the way in our stack.
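
Here is a rough sketch of what a controller for that query string problem could look like. Nothing here is the real WALint API: the controller signature, the run() function, the 2048 limit and the expectation of a 414 response are all assumptions made for the example, and a bare WSGI call replaces WebTest to keep it self-contained.

```python
from wsgiref.util import setup_testing_defaults

MAX_QUERY_STRING = 2048  # hypothetical limit of the stack


def request(app, path, method="GET", query=""):
    """Call a WSGI app in-process and return its status code."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(REQUEST_METHOD=method, PATH_INFO=path, QUERY_STRING=query)
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    b"".join(app(environ, start_response))  # consume the response body
    return int(statuses[0].split()[0])


def long_query_string(app, path, method):
    """Controller: an oversized query string should be rejected with a 414."""
    query = "q=" + "x" * MAX_QUERY_STRING
    status = request(app, path, method, query=query)
    if status != 414:
        message = "%s %s: expected 414 for an oversized query string, got %d"
        return [message % (method, path, status)]
    return []


def run(app, services, controllers):
    """Run every controller against every (path, method) pair."""
    report = []
    for path, methods in services.items():
        for method in methods:
            for controller in controllers:
                report.extend(controller(app, path, method))
    return report


def careless_app(environ, start_response):
    """Sample app that never looks at the size of the query string."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


for line in run(careless_app, {"/items": ["GET"]}, [long_query_string]):
    print("ERROR", line)
```

The services dict plays the role of the configuration file here: it maps each URI to the methods to try, and each controller returns the list of problems it found so the report can be printed lint-style.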

Our final goal with this tool is to add these controls to Jenkins for all our web apps, and catch more problems before they occur in production.

Since it’s also useful while you build your code, WALint will have a UnitTest integration, so you can run it as a functional test from within your test suite. In that case, it will run directly against the code.

As usual, feedback & contributions are welcome. The code is being built here: https://github.com/mozilla-services/walint


Filed under: mozilla, python

Cornice – Validators and 406s brought to you by Alexis

Wed, 07 Dec 2011

Alexis started to work with me this week from my house and we’re having a good time. More geeks from Afpy are joining us this weekend for wine drinking and work on a Python technical writing project.

While I work on AppSync with the fine folks from Mozilla Labs, I am mentoring Alexis to get him up to speed on our projects & standards at Mozilla Services.

Cornice is a perfect match for him to start working on: it’s isolated enough that he can have fun and produce features that are immediately useful, while learning our standards & processes.

So we worked on its design with the help of its principal user, Ben Bangert, and I asked Alexis to blog about the work & doc we produced since Monday.

Read up => Introducing Cornice


Filed under: mozilla, pyramid, python