Au revoir Django forms

A few months ago, I released version 0.1.0 of django-rest-formly, and I think it’s time to talk about the project. First, I’ll explain the reasons that motivated me to start the project, then I’ll present the tool itself.

1. Why django-rest-formly ?

I started using Django in 2011, and I really like it, especially because it’s a full-featured web framework with a great and active community behind it. From a design perspective, I like the fact that it emphasizes the DRY and CoC principles. Besides, Django comes with powerful batteries that make web development rapid, including forms, templates, the admin site, the ORM, etc. I like all these packages, but I think that some of them, like forms, aren’t that useful anymore. Nowadays, we talk about Single Page Applications (SPA), web 2.0 and the Internet of Things (IoT). In such a world, there is little need for the Django forms package. Yes, a lot of Django packages were developed for web 1.0, and the DSF community is aware of it and acting on this reality: in December 2015, Mozilla awarded $150,000 to rewrite Django to support WebSockets and to integrate key parts of Django REST Framework, among other things (more information here).

As a software engineer, I understand the importance of separating the backend from the frontend, and this is what I do. It’s a good practice that you should adopt if you haven’t already. One of its advantages is the possibility to use the best tool for each part. Personally, I use AngularJS on the frontend; the Angular application communicates with the backend through a RESTful interface built with Django REST framework. However, I found myself writing by hand what Django forms used to do for me, and I didn’t like that: it’s a waste of time. I spent a lot of time looking for a frontend package that solves this problem, but I didn’t find any. So, I decided to create a generic solution for this use case.

The idea was very simple: I needed an Angular module that could replace the Django forms package. In other words, this Angular module should be able to create forms from a configuration object, add validation for fields, etc. Generally, I don’t like building something from scratch, especially when something very powerful already exists. In my case, that was the angular-formly module: it offers everything I expected and more, but its configuration structure is not compatible with Django REST framework. angular-formly expects something like this:

{
    key: 'email',
    type: 'input',
    templateOptions: {
        type: 'email',
        label: 'Email address',
        placeholder: 'Enter email'
    }
}

Below is the Django REST framework metadata for the same email field:

{
  "actions": {
    "POST": {
        "email": {
            "type": "email",
            "required": false,
            "read_only": false,
            "label": "Email address"
        }
    }
  }
}

Now the idea should be clear: the django-rest-formly project mainly gives you a CLI tool that creates an angular-formly form configuration object for any Django REST endpoint.
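To make the mapping concrete: given the email metadata shown earlier, the generated angular-formly field could look roughly like this (a hand-written illustration of the idea, not necessarily the tool’s exact output):

```json
{
  "key": "email",
  "type": "input",
  "templateOptions": {
    "type": "email",
    "label": "Email address",
    "required": false
  }
}
```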

2. Installation

django-rest-formly is a CLI tool built on top of Node.js, so you can install it with npm (the Node Package Manager):

$ npm install -g django-rest-formly

3. How to use it

Once you have installed the package, the django-rest-formly command should be available. It provides two commands:

  • list: lists all available endpoints from the API root
  • form: returns the angular-formly form configuration for a given endpoint

Let’s suppose we have a REST API at http://127.0.0.1:8000/api/v1/. We can list all existing endpoints with this command:

$ django-rest-formly list --host 127.0.0.1 --port 8000 --root /api/v1
Available endpoints:
  snippets: http://127.0.0.1:8000/snippets.json
  users: http://127.0.0.1:8000/users.json

To get an idea about the users endpoint, we can use the form command:

$ django-rest-formly form --host 127.0.0.1 --port 8000 --root /api/v1 users

I think it’s better if you interact with the tool yourself, so I recommend following the steps below:

1) Clone the django-rest-framework-tutorial repository from my GitHub account. If you have git installed, just run:

$ git clone http://github.com/benzid-wael/todo

2) Now, create a Python virtual environment and install the dependencies:

$ cd /path/to/django-rest-framework-tutorial
$ virtualenv --no-site-packages env
$ source env/bin/activate
(env) $ pip install -r requirements.txt

3) Now, we can start the server:

$ python manage.py runserver

Super! To list all available endpoints, just run the below command:

$ django-rest-formly list --host 127.0.0.1 --port 8000
Available endpoints:
  tasks: http://127.0.0.1:8000/tasks.json
  users: http://127.0.0.1:8000/users.json

Now let’s have a look at the tasks endpoint; for that, we’ll use the form command:

$ django-rest-formly form --host 127.0.0.1 --port 8000 tasks

Now you can use the JSON output to generate the form corresponding to your endpoint, with form validation in place:

/* global angular */
(function() {
  'use strict';

  var app = angular.module('djangoRestFormlyExample', ['formly', 'formlyBootstrap']);

  app.controller('MainCtrl', function MainCtrl(formlyVersion) {
    var vm = this;
    // function assignment
    vm.onSubmit = onSubmit;

    // variable assignment
    vm.author = { // optionally fill in your info below 🙂
      name: 'Wael BEN ZID'
    };
    vm.exampleTitle = 'Introduction';
    vm.env = {
      angularVersion: angular.version.full,
      formlyVersion: formlyVersion
    };

    vm.model = {
      note: "",
      score: 0
    };
    vm.options = {
      formState: {
        awesomeIsForced: false
      }
    };

    // formFieldConfig is the JSON object returned by the `form` command
    vm.fields = DjangoRestFormly.toFormly(formFieldConfig);

    // function definition
    function onSubmit() {
      alert("You clicked on 'Submit' button");
    }
  });

})();

Hope you enjoyed the tutorial.


Strange Python Import Error

Recently, while installing a big Django project on macOS Sierra, I got stuck on a strange ImportError:

ImportError: No module named certs

where certs.py is a submodule of the requests package. At first glance, I thought I had installed the wrong version of the requests package. But I was wrong: the file was right there, under the expected directory on my filesystem.

That left only two options: either the virtualenv was not properly configured, or there was a problem with Python on macOS. The first option was quickly eliminated, as the shell was working fine and I was able to import the same module in a shell session. Then I checked file permissions, even though it didn’t make much sense at that point, and the user had all privileges on the whole directory.

Honestly, I couldn’t find any explanation and didn’t know where to start. But a colleague told me that another colleague had been stuck on the same issue for a week and had fixed it by increasing the maximum open files limit. Now everything started to make sense, in some way or another: as you may guess, Python was not reporting this error properly.
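You can check the open files limit your shell is running with before touching anything (this works on both macOS and Linux):

```shell
# Print the maximum number of open file descriptors allowed per process
ulimit -n
# On macOS, launchd's own limits can be queried with:
# launchctl limit maxfiles
```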

Solution

Now that you know the problem, the solution is to increase the max files limits:
* For older macOS (Lion or before)

You may add the following line to /etc/launchd.conf (owner: root:wheel, mode: 0644):

limit maxfiles 262144 524288

* For Mountain Lion:

You may add the following lines to /etc/sysctl.conf (owner: root:wheel, mode: 0644):

kern.maxfiles=524288
kern.maxfilesperproc=262144

* For Mavericks, Yosemite, El Capitan, and Sierra:

You have to create a file at /Library/LaunchDaemons/limit.maxfiles.plist (owner: root:wheel, mode: 0644), then reboot (or load it with sudo launchctl load -w) for the new limits to take effect:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
  <dict>
    <key>Label</key>
    <string>limit.maxfiles</string>
    <key>ProgramArguments</key>
    <array>
      <string>launchctl</string>
      <string>limit</string>
      <string>maxfiles</string>
      <string>262144</string>
      <string>524288</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>ServiceIPC</key>
    <false/>
  </dict>
</plist>


Testing angular 2 apps: Part 1 – Introduction

In this series, I’m going to show you some tips for writing unit tests in Angular 2 with the Jasmine framework. Right now, it’s hard to find up-to-date recipes for writing unit tests for Angular 2 (version 2.4, exactly), but I hope this series will provide you with some material to get you going.

1. Angular testing module

Angular 2 provides its testing utilities in @angular/core/testing, which also used to re-export the Jasmine functions: describe, beforeEach, it and so on.

import {
  describe,
  expect,
  beforeEach,
  it,
  inject
} from '@angular/core/testing';

But this is not something you have to memorize. In fact, if you use angular-cli to seed your project (which I recommend), you’ll always get minimal unit-test boilerplate for the unit under test (component, service, etc.). If you run the following command:

$ ng generate service user-profile

angular-cli will generate two files for you: user-profile.service.ts and user-profile.service.spec.ts. The .spec file will look like this:

import { TestBed, async, inject } from '@angular/core/testing';
import { UserProfileService } from './user-profile.service';

describe('UserProfileService', () => {
   beforeEach(() => {
     TestBed.configureTestingModule({
       providers: [UserProfileService]
     });
   });

   it('should ...', inject([UserProfileService], (service: UserProfileService) => {
     expect(service).toBeTruthy();
   }));
});

So what do we have here? We imported three testing utilities from Angular 2:

  1. TestBed: very similar to NgModule, it helps set up dependencies for the defined tests. These dependencies are passed to .configureTestingModule() and are used to resolve any dependency.
  2. async: needed when dependencies involve asynchronous processing (e.g. XHR calls)
  3. inject: allows us to inject dependencies at the TestBed level

2. Dependency Injection

As mentioned earlier, we use TestBed to declare dependencies for our test suites, and inject to get dependencies at the TestBed level.

   it('should ...', inject([Service], (service: Service) => {
     expect(service).toBeTruthy();
   }));

If you need to get the dependency at the component level, you need to use the component injector, which is a property of the fixture’s DebugElement.

describe('MyComponent', () => {
   beforeEach(() => {
     TestBed.configureTestingModule({
       declarations: [ MyComponent ],
       providers: [ MyService ]
     });
   });
   it('should ...', () => {
     let fixture = TestBed.createComponent(MyComponent);
     let service = fixture.debugElement.injector.get(MyService);
   });

});

WARNING: DON’T CREATE THE SERVICE USING THE CONSTRUCTOR. Don’t even think about it. This is neither safe nor maintainable.

Now, we can refactor the code using Jasmine’s beforeEach function so we don’t repeat this snippet for each spec.

describe('MyComponent', () => {

   let component: MyComponent,
       service: MyService,
       fixture: ComponentFixture<MyComponent>;

   beforeEach(() => {
     TestBed.configureTestingModule({
       declarations: [ MyComponent ],
       providers: [ MyService ]
     });
   });

   beforeEach(() => {
     fixture = TestBed.createComponent(MyComponent);
     component = fixture.componentInstance;
     service = fixture.debugElement.injector.get(MyService);
   });

   it('should ...', () => {
     expect(service).toBeTruthy();
   });
});

3. Testing coverage

Generally, we need to test everything in our application, including pipes, services, components and custom classes. Our tests should cover logic written in TypeScript, the generated DOM, emitted events, etc.

Fortunately, angular-cli provides a rich test command. In fact, it provides the --cov parameter, which will generate a test coverage report for your project. Furthermore, if you use a CI server, you can display a nice report using the Cobertura plugin, for example. You can see an example below:

[Screenshot: Cobertura coverage report]

Thanks for reading! Have any questions? Ping me at @benzid_wael. In the next posts, I’ll show you how to write tests for services, components, etc.

TypeScript

Personally, I think that Microsoft’s most awesome innovation on the web is the TypeScript language. TypeScript is an interesting evolution of JavaScript: it is for JavaScript much what C++ was for C back in the day. It defines a superset of the language with powerful facilities that make developers’ lives easier.

Installing TypeScript on Linux

Nothing special here: TypeScript is delivered as a Node.js module:

sudo npm install -g typescript

Why TypeScript ?

TypeScript is a strongly-typed superset of JavaScript. In other words, it improves the syntax of your JavaScript code. TypeScript also encourages the declarative programming style that we miss in JavaScript, which we’ll discuss further in the rest of this tutorial.

On the other hand, Angular 2 is fully written in TypeScript, and the language itself is developed by Microsoft. It is thus becoming a core front-end language at major tech companies, which leads me to think that it will probably be the standard for front-end development in the coming years.

1. Type annotation

TypeScript supports most of the basic types you would expect in any language: boolean, number, array, etc.

var isRunning: boolean = false;
var height: number = 6;
var status: string = "stopped";
var myList: number[] = [1, 2, 3];

A helpful addition to JavaScript’s standard set of datatypes is the enum type.

enum Color {White, Red, Green, Blue, Purple, Black};
var c: Color = Color.White;

TypeScript also adds a new type, any: as its name indicates, it describes a variable whose type you don’t know at the time of writing the code. No type-checking is performed for such variables.

var nonTyped: any = true;
nonTyped = "Maybe a string instead";

2. Interfaces

In the real world, we need to represent more complex types (aka records). For this purpose, TypeScript defines the interface keyword. Below is an example from the official intro tutorial (which you can find here):

interface Person {
 firstname: string;
 lastname: string;
}

function greeter(person : Person):string {
 return "Hello, " + person.firstname + " " + person.lastname;
}

let user = {firstname: "Jane", lastname: "User"}

document.body.innerHTML = greeter(user);

In the above code, we defined an interface Person and a function greeter, which accepts a person object that must implement the Person interface and returns a string. In other words, the person object must have firstname and lastname properties.

3. TypeScript and OOP

TypeScript comes with awesome batteries that make OOP easier for developers and manage browser compatibility for you. Below is a simple example:

class Person {
    name: string;
    constructor(name: string) {
        this.name = name;
    }
    whois() {
        return this.name;
    }
}

var person = new Person("John Doe");
console.log(person.whois())

As you can see, the syntax is easy to understand, especially if you know Java or C#.

3.1. Inheritance

In addition to defining custom classes, you can also extend them using the extends keyword. Below is an example:

class Person {
  firstName: string;
  lastName: string;

  fullName(): string {
     return this.firstName + ' ' + this.lastName;
  }
}

class Student extends Person {
  major: string;
}

class Teacher extends Person {
  salary: number;
}

In this example, the Student and Teacher classes inherit from the Person class. As a result, they have access to the firstName and lastName properties and to the fullName method.

3.2. super

super is a reserved TypeScript keyword that gives you the possibility to call a parent method from a child instance. Below is an example:

class Base {
    log() {
      console.log('hello world');
    }
}

class Child extends Base {
    log() {
      super.log();
    }
}

4. Typings

Generally, we use third-party libraries in our projects, like jQuery or angular.js, and TypeScript expects a declaration file (denoted by a .d.ts extension) where it can find the API definition of the library. At this stage you should know about the gigantic DefinitelyTyped repo, where you’ll find declarations for almost all JavaScript libraries. We won’t go into too much detail here, but if we wanted to use underscore’s types, for example, we could simply run typings install underscore --save and have them downloaded into a path defined in typings.json. After that, you could use underscore’s type definitions anywhere in your project simply by including this line:

/// <reference path="underscore/underscore.d.ts" />

5. Conclusion

TypeScript is an interesting push toward improving on JavaScript’s shortcomings by introducing a static typing system, complete with interfaces and type unions. This helps us write safer, more legible and declarative code.

Do you feel that you’ll use TypeScript in your next project? Do you think it’ll be the future of JavaScript, or do you think it’ll fail? Let me know what you think below!

Firefox caching and jQuery plugins

If you’re using the jQuery dropdown or bootstrap_dropdowns_enhancement plugin, don’t forget to set autocomplete="off" on any checkbox or radio input inside the dropdown menu. Otherwise, you’ll find that the browser displays one value but sends another one when you submit your form.
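A minimal sketch of the workaround (the wrapper class name is hypothetical and depends on your plugin):

```html
<div class="dropdown-menu">
  <label><input type="checkbox" name="option1" autocomplete="off"> Option 1</label>
  <label><input type="radio" name="choice" value="a" autocomplete="off"> Choice A</label>
</div>
```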

What’s happening ?

With a normal input, Firefox remembers the last value when you refresh the page (but not if you hit Ctrl + F5, which clears it). So your dropdown will display the default value, while Firefox will submit the last checked radio/checkbox values.

Practical introduction to web mining: data wrangling

Most of the programming work in a data analysis project is spent in the data preparation stage, because the collected data is rarely in the structure required by your data processing application. Fortunately, Twitter data is structured, so we won’t spend a lot of time in this stage.

The first thing we have to do is load the collected data. There’s nothing special here; we only need the json Python module. Below is the code:

import json

def load_tweets(path):
    tweets = []
    with open(path, 'r') as file_stream:
        for line in file_stream:
            try:
                tweet = json.loads(line)
                tweets.append(tweet)
            except ValueError:
                # skip empty or truncated lines
                pass
    return tweets

tweets_list = load_tweets("PL_tweets.txt")

1. Pandas

Next, we will create a pandas DataFrame. Pandas is an open-source Python library providing high-level data structures and tools for data analysis. Pandas has mainly two data structure types:

  • Series: a one-dimensional array of data with an associated index.
  • DataFrame: a tabular data structure containing a collection of columns, with both a row and a column index. In other words, a DataFrame is a collection of Series.
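To make these two structures concrete, here is a tiny sketch (the column names and values are invented for illustration):

```python
import pandas as pd

# A Series: one-dimensional values plus an index
scores = pd.Series([10, 25, 7], index=["a", "b", "c"])

# A DataFrame: a collection of Series sharing a row index
df = pd.DataFrame({
    "lang": ["python", "java", "ruby"],
    "tweets": [120, 80, 45],
})

print(scores["b"])       # 25
print(df.shape)          # (3, 2)
print(list(df.columns))  # ['lang', 'tweets']
```

Each column of the DataFrame (df["lang"], df["tweets"]) is itself a Series.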

Let’s first explore the tweet structure. If you don’t have an idea about the Twitter API, it’s a good idea to look at the official documentation before continuing this tutorial. Personally, I think the key attributes of a tweet are:

  • id: the tweet identifier
  • text: the text of the tweet itself
  • lang: acronym for the language (e.g. “en” for english, “fr” for french)
  • created_at: the date of creation
  • favorite_count, retweet_count: the number of favorites and retweets
  • place, coordinates, geo: geo-location information if available
  • user: the author’s full profile
  • entities: list of entities like URLs, @-mentions, hashtags and symbols
  • in_reply_to_user_id: user identifier if the tweet is a reply to a specific user
  • in_reply_to_status_id: status identifier if the tweet is a reply to a specific status

The code below creates a pandas DataFrame object containing the most useful tweet metadata, which we will use in the next posts of this series:

import pandas as pd

# create Pandas DataFrame
tweets = pd.DataFrame()

# create some columns
tweets['tweetID'] = [ tweet['id'] for tweet in tweets_list ]
tweets['tweetText'] = [ tweet['text'] for tweet in tweets_list ]
tweets['tweetLang'] = [ tweet['lang'] for tweet in tweets_list ]
tweets['tweetCreatedAt'] = [ tweet['created_at'] for tweet in tweets_list ]
tweets['tweetRetweetCount'] = [ tweet['retweet_count'] for tweet in tweets_list ]
tweets['tweetFavoriteCount'] = [ tweet['favorite_count'] for tweet in tweets_list ]
tweets['tweetGeo'] = [ tweet['geo'] for tweet in tweets_list ]
tweets['tweetCoordinates'] = [ tweet['coordinates'] for tweet in tweets_list ]
tweets['tweetPlace'] = [ tweet['place'] for tweet in tweets_list ] 

# tweeple information 
tweets['userScreenName'] = [ tweet['user']['screen_name'] for tweet in tweets_list ]
tweets['userName'] = [ tweet['user']['name'] for tweet in tweets_list ]
tweets['userLocation'] = [ tweet['user']['location'] for tweet in tweets_list ]

# tweet interaction 
tweets['tweetIsReplyToUserId'] = [ tweet['in_reply_to_user_id'] for tweet in tweets_list ]
tweets['tweetIsReplyToStatusId'] = [ tweet['in_reply_to_status_id'] for tweet in tweets_list ]

Super! We created our first data frame. A pandas data frame provides a beautiful and rich API for visualizing and interacting with it:

  • head(N): returns first N rows
  • tail(N): returns last N rows
  • iteritems(): iterator over (column name, series) pair
  • etc.

The code below will display the first 5 rows in our data frame:

>>> tweets.head(5)

2. Cleaning Data

Unfortunately, acquired data is usually dirty and has a lot of inconsistencies: duplicated entries, bad values, non-normalized values, etc. So, the cleanup process should mainly include:

  • removing duplicate entries
  • strip whitespaces
  • normalize numbers, dates, etc.

The output of this process is a clean dataset, consisting only of valid and normalized values, which will ensure that our analysis code will not crash!
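As a tiny illustration of the whitespace step on plain Python strings (a hypothetical helper using only the stdlib):

```python
def clean_text(value):
    """Strip surrounding whitespace and collapse internal runs of spaces."""
    return " ".join(value.split())

raw = "  RT @user:   Python   is great!  "
print(clean_text(raw))  # "RT @user: Python is great!"
```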

2.1 Missing data

If you followed the previous steps of this tutorial, you probably noticed, as shown in the figure below, the NaN values in some columns. NaN is a special value that denotes missing data.

[Screenshot: pandas DataFrame showing NaN values]

fig 1. Missing data

Now, we have to handle these missing values. In fact, we have mainly two options:

  • replacing all NaN values with None
  • treating each column separately: for example, replacing NaN with None for the tweetIsReplyToUserId and tweetIsReplyToStatusId columns, and replacing both None and NaN with “Unknown” for the userLocation column

Personally, I will opt for the second option, and I’ll use the fillna method, which fills NaN values with the given value:

# let's handle the userLocation column
tweets.userLocation.fillna("Unknown", inplace=True)
# Now let's replace the remaining NaN values with None
tweets = tweets.where(pd.notnull(tweets), None)

Note that I explicitly set the inplace argument of the fillna method to True; otherwise, the userLocation series would not be modified.

2.2 Bad data

If you took a look at the Twitter documentation earlier, you probably know that the values of the tweetCreatedAt column are string representations of a date and time. We have to convert these values to datetime objects.

You can use the strptime function of the datetime module, which parses a string representation of a date and/or time. But I prefer to use pandas’ to_datetime function, which parses and converts the entire series:

tweets.tweetCreatedAt = pd.to_datetime(tweets.tweetCreatedAt)
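For a single value, the stdlib equivalent with datetime.strptime looks like this (the format string below matches the layout Twitter uses for created_at):

```python
from datetime import datetime

created_at = "Mon Sep 24 03:35:21 +0000 2012"
parsed = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
print(parsed.year, parsed.month, parsed.day)  # 2012 9 24
```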

2.3 Duplicated data

Honestly, I didn’t expect duplicated entries in my dataset, but since the collection script crashed several times, I wasn’t surprised. Pandas provides some methods to deal with duplicated data. The duplicated method annotates each row with a boolean specifying whether that row is duplicated or not. By default, row identity is defined by checking all columns, but you can restrict it to specific columns. For our example, we can specify only the tweetID column, as it’s a unique identifier of the tweet.


>>> tweets.duplicated(['tweetID'],
                      keep="last")
0        False
1        False
2        False
3        False
4        False
5        False
6         True
7        False
8        False
9        False
10       False
11       False
...

You can drop duplicated rows using the drop_duplicates method, as below:


>>> tweets.drop_duplicates(['tweetID'],
                           keep="last")
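The keep-last semantics are easy to emulate in plain Python, which also makes clear what pandas is doing here (a sketch with hand-made record dicts):

```python
def drop_duplicates_keep_last(records, key):
    """Keep only the last occurrence of each key value, preserving order."""
    last = {}
    for record in records:
        last[record[key]] = record  # later records overwrite earlier ones
    return list(last.values())

sample = [
    {"tweetID": 1, "tweetText": "first"},
    {"tweetID": 2, "tweetText": "second"},
    {"tweetID": 1, "tweetText": "first, edited"},
]
deduped = drop_duplicates_keep_last(sample, "tweetID")
print(len(deduped))  # 2
```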

Conclusion

I think I covered the most important tips and steps of the data wrangling stage. But note that Twitter data is structured and fairly clean, and this is not the usual case. In fact, real-world data is dirty: you’ll have to do more work on it before you can use it.

Waiting for your comments and suggestions.

Practical introduction to web mining: collect data

Web mining is the application of natural language processing techniques to web content in order to retrieve relevant information. It has become more important these days due to the exponential increase in digital content, especially with the appearance of social media platforms such as Twitter, which constitute a rich and reliable information source.

In this series, I’ll explain how to collect Twitter data, manipulate it and extract knowledge from it. As I am a fan of Python, I’ll try to compare Python to other programming languages such as Java, Ruby and PHP, based on the information we will collect from Twitter.

In this tutorial, we will start by collecting data from Twitter, and introduce tweepy and the structure of Twitter data.

1. Create a Twitter application

First of all, you need some Twitter keys to be able to connect to the Twitter API and gather data from it. We especially need the API key, API secret, access token and access token secret. To get this information, follow the steps below:

  1. go to https://apps.twitter.com and log in with your Twitter account
  2. create a new Twitter application
  3. on the next page, precisely in the “API keys” tab, you can find both the API key and the API secret
  4. scroll down and generate your access token and token secret

[Screenshot: creating a new Twitter application]

Once you have created a new Twitter app and generated your keys, you can move to the next step and start collecting data.

2. Getting Data From Twitter

We will use the Twitter Streaming API to collect tweets related to four keywords: python, java, php and ruby. Happily, the Streaming API gives us the possibility to filter tweets by keywords. The code below will fetch tweets that contain one of the keywords mentioned earlier:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

# User credentials for Twitter API 
access_token = "ENTER YOUR ACCESS TOKEN"
access_token_secret = "ENTER YOUR ACCESS TOKEN SECRET"
consumer_key = "ENTER YOUR API KEY"
consumer_secret = "ENTER YOUR API SECRET"


class StdoutListener(StreamListener):

    def on_data(self, data):
        print data
        return True

    def on_error(self, status):
        print status


if __name__ == '__main__':
    # Twitter authentication
    listener = StdoutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, listener)

    # Filter the Twitter stream to capture tweets matching the keywords
    stream.filter(track=['python', 'java', 'php', 'ruby'])

Now if you run this command:


python get_tweets.py >>PL_tweets.txt

you’ll have tweets containing one of the keywords python, java, php or ruby accumulating in the specified text file.

3. Understand Twitter response

The data collected previously is in JSON format, so it’s easy to read and understand. But I’ll take the time here to highlight some useful information inside the Twitter response.

[Screenshot: a sample tweet object]

As you probably noticed, a tweet contains information about the tweeple (the tweet’s author), the list of hashtags and URLs appearing in the tweet, the main text of the tweet, the retweet count, the favourite count, etc.
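As a quick sanity check, here is how those attributes can be read from one line of the captured file using only the stdlib (the tweet below is a hand-made minimal sample, not real API output):

```python
import json

# One line of the captured file (a simplified, hand-written sample)
line = ('{"id": 1, "text": "I love #python", "lang": "en", '
        '"retweet_count": 3, "user": {"screen_name": "benzid_wael"}, '
        '"entities": {"hashtags": [{"text": "python"}]}}')
tweet = json.loads(line)

print(tweet["text"])                 # I love #python
print(tweet["user"]["screen_name"])  # benzid_wael
print([h["text"] for h in tweet["entities"]["hashtags"]])  # ['python']
```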

Awesome! Now you should start collecting data. The next posts of this series will be hot and exciting, and you’ll need a lot of data for them: the more data, the better the experience.

Stay tuned ….