<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Ossterdam]]></title><description><![CDATA[I am Oss, a Software Engineer and an Entrepreneur. I will be writing about technologies including Ruby(&Rails), NodeJS, AngularJS, etc..! Topics will also include distro, analytics and startup news.]]></description><link>http://ossterdam.com/</link><image><url>http://ossterdam.com/favicon.png</url><title>Ossterdam</title><link>http://ossterdam.com/</link></image><generator>Ghost 1.21</generator><lastBuildDate>Tue, 15 Apr 2025 00:45:17 GMT</lastBuildDate><atom:link href="http://ossterdam.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Qyu: A distributed task execution system for complex workflows]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Here at FindHotel we build many of our tools in-house and whenever we think that someone can use and benefit from those tools, we release them to the open-source community. 
This article is about Qyu, a distributed task execution system we built for our advertising campaign-building and marketing software.</p></div>]]></description><link>http://ossterdam.com/2018/03/21/qyu-a-ruby-distributed-task-execution-system-for-complex-workflows/</link><guid isPermaLink="false">5ab69979186c6709275b28c1</guid><category><![CDATA[workflows]]></category><category><![CDATA[ruby]]></category><category><![CDATA[programming]]></category><category><![CDATA[Open Source Software]]></category><dc:creator><![CDATA[Mohamed Osama]]></dc:creator><pubDate>Wed, 21 Mar 2018 12:26:31 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2018/03/rsz_jeshoots-com-227882-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2018/03/rsz_jeshoots-com-227882-unsplash.jpg" alt="Qyu: A distributed task execution system for complex workflows"><p>Here at FindHotel we build many of our tools in-house, and whenever we think that someone can use and benefit from those tools, we release them to the open-source community. This article is about Qyu, a distributed task execution system we built for our advertising campaign-building and marketing software, and how we use it to manage our internal workflows. About two years ago, we were using Resque as a job scheduler for our Ruby services. The application was monolithic and memory-hungry, and jobs were frequently lost in Resque's Redis queue. Resque's workers died abruptly without reliable error reporting, had to be redeployed whenever we wanted to change the number of workers on a queue, and the project was no longer supported by the open-source community. We had tens of services used at variable rates, so on top of all the previous problems, we were paying more for infrastructure than we should have. We needed a solution that is <em>reliable</em>, <em>autoscaling</em>, and that would help us break our monolith into <em>microservices</em>.</p>
<p>Qyu is built in Ruby and depends on two basic pieces: a message queue and a state store. By default, as well as for testing purposes, Qyu ships with an in-memory message queue and state store which are definitely not suitable for production purposes. For production usage, we implemented two state stores based on ActiveRecord (for relational databases) and Redis, in addition to two message queues based on Amazon Simple Queue Service (SQS) and Redis. In our production environment, we use the ActiveRecord adapter with PostgreSQL along with the Amazon SQS adapter.</p>
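<p>For illustration, wiring up such a production setup might look roughly like the following. This is a sketch only: the adapter types and option names shown here are assumptions, so consult the Qyu README for the exact configuration keys your version expects.</p>
<pre><code class="language-ruby"># Illustrative sketch -- option names are assumptions, not Qyu's documented API.
Qyu.config = Qyu::Config.new(
  queue: { type: :sqs, region: 'eu-west-1' },
  store: { type: :active_record, db_name: 'qyu', db_user: 'qyu' }
)
</code></pre>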
<h2 id="architecture">Architecture</h2>
<h3 id="statestore">State Store</h3>
<p>The state store is the persistence layer of Qyu. It has 3 main models:</p>
<ul>
<li>Workflow: Used to generate tasks under a job. It describes what a certain program flow looks like.</li>
<li>Job: A set of tasks running according to a certain workflow.</li>
<li>Task: Single building block of a job. It has its own payload and status.</li>
</ul>
<p>The following is an example of a simple workflow:</p>
<pre><code class="language-ruby">descriptor = {
  'starts' =&gt; %w(
    print_hello
  ),
  'tasks' =&gt; {
    'print_hello' =&gt; {
      'queue' =&gt; 'print-hello'
    }
  }
}
</code></pre>
<p>This workflow instructs the program to start with a task called <code>print_hello</code> and then this task is set to be enqueued in a queue called <code>print-hello</code>. A job using this workflow will have only one task enqueued in the queue.</p>
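<p>Conceptually, starting a job just walks the descriptor's <code>starts</code> list and enqueues one task per entry. The following plain-Ruby sketch (not Qyu's actual implementation) shows the idea:</p>
<pre><code class="language-ruby">descriptor = {
  'starts' => %w(print_hello),
  'tasks' => { 'print_hello' => { 'queue' => 'print-hello' } }
}

# Each queue is modelled as a plain array of messages, keyed by queue name.
queues = Hash.new { |hash, key| hash[key] = [] }
next_task_id = 0

# Create one task per entry in `starts` and enqueue its ID
# on the queue named by that task's descriptor entry.
descriptor['starts'].each do |task_name|
  task_id = (next_task_id += 1)
  queue_name = descriptor['tasks'][task_name]['queue']
  queues[queue_name].push('task_id' => task_id)
end

queues['print-hello'] # => [{"task_id"=>1}]
</code></pre>
<p>A real job also persists each task and its payload in the state store before the ID is enqueued, so that workers can look the task up later.</p>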
<h3 id="messagequeue">Message Queue</h3>
<p>The message queue component appends task IDs to their respective queues. It provides an organized and fault-tolerant method of dequeuing tasks and ensuring they run successfully before declaring them a success and removing them. An example of a message in a queue is <code>{&quot;task_id&quot;: 420}</code>. A listening worker dequeues the message, gets the payload from the state store and starts processing the task.</p>
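<p>That dequeue-then-fetch flow can be sketched in plain Ruby. The queue and state store below are simple stand-ins, not Qyu's adapters:</p>
<pre><code class="language-ruby">require 'json'

# Stand-ins: a queue holding task-ID messages, and a state store
# holding each task's payload and status.
queue = [JSON.generate('task_id' => 420)]
state_store = { 420 => { 'payload' => { 'times' => 5 }, 'status' => 'enqueued' } }

# A worker dequeues a message, looks the task up in the state store,
# processes it, and only then marks the task completed.
message = JSON.parse(queue.shift)
task = state_store[message['task_id']]
task['payload']['times'].times { |i| puts "#{i + 1}. Hello" }
task['status'] = 'completed'
</code></pre>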
<h2 id="usage">Usage</h2>
<p>Create your first workflow:</p>
<pre><code class="language-ruby">Qyu::Workflow.create(name: 'say-hello', descriptor: descriptor)
</code></pre>
<p>The above will create a workflow titled <code>say-hello</code>. Now we need to use that workflow by creating a job that follows it. We will create a job with a sample payload as follows. Once you call <code>start</code> on the job, it will create the child tasks in the state store and enqueue the task ID in Qyu's message queue.</p>
<pre><code class="language-ruby">job = Qyu::Job.create(workflow: 'say-hello', payload: { 'times' =&gt; 5 })
job.start
</code></pre>
<p>Now that we have a message in the queue, we need a worker to listen on that queue, consume the message and perform something according to the specified payload. The following worker consumes the messages in the <code>print-hello</code> queue and prints &quot;Hello&quot; the number of times specified in the payload.</p>
<pre><code class="language-ruby">class SimpleWorker
  def initialize
    @worker = Qyu::Worker.new
  end

  def run
    # Consumes messages from the print-hello queue
    @worker.work('print-hello') do |task|
      task.payload['times'].times do |i|
        puts &quot;#{i + 1}. Hello&quot;
      end
    rescue StandardError =&gt; ex
      # do something
    end
  end
end

SimpleWorker.new.run
</code></pre>
<p>This worker will pop a message from the queue and execute the block on it. The output of the above program is the following:</p>
<pre><code class="language-bash">1. Hello
2. Hello
3. Hello
4. Hello
5. Hello
</code></pre>
<p>It read a parameter from the payload and used it in the worker. This is a very simple example of how Qyu can be used, but Qyu has other great features that did not manifest themselves in the above example. One of the most interesting is <em>sync workers</em>, or sync gates.</p>
<h2 id="syncworkers">Sync Workers</h2>
<p>In some use cases, you may need some tasks to finish before starting another specific task that depends on the output of those tasks. This is where Qyu's sync workers become useful. A sync worker can be started as follows:</p>
<pre><code class="language-ruby">w = Qyu::Workers::Sync.new
w.work('sync-adgroups')
</code></pre>
<p>To demonstrate the point, I will give one of our use cases as an example of the sync worker. At FindHotel we use <em>Qyu</em> to manage our campaign-building software. We first build empty ad groups, populate those ad groups with ads and keywords, and at the end we package those ad groups into campaigns. The services that generate ads and keywords need to wait for the service that generates ad groups. In the same fashion, the service that packages those ad groups into campaigns needs to wait for all of this. The following is our workflow descriptor for this component.</p>
<pre><code class="language-ruby">descriptor = {
    'starts' =&gt; [
      'ad:group:generate'
    ],
    'tasks'  =&gt; {
      'ad:group:generate'  =&gt; {
        'queue'  =&gt; 'ad-group-generate',
        'starts' =&gt; [
          'ad:group:children:split'
        ]
      },
      'ad:group:children:split' =&gt; {
        'queue' =&gt; 'ad-group-children-split',
        'starts_manually' =&gt; ['ad:group:children:generate'],
        'starts_with_params' =&gt; {
          'ad:group:sync:gate' =&gt; {
            'nr_tasks' =&gt; {
              'count' =&gt; 'ad:group:children:generate'
            }
          }
        }
      },
      'ad:group:children:generate' =&gt; {
        'queue' =&gt; 'ad-group-children-generate'
      },
      'ad:group:sync:gate' =&gt; {
        'queue' =&gt; 'sync-adgroups',
        'waits_for' =&gt; {
          'ad:group:children:generate' =&gt; {
            'condition' =&gt; {
              'param' =&gt; 'nr_tasks',
              'function' =&gt; 'eq_completed'
            }
          }
        },
        'starts' =&gt; [
          'campaign:package'
        ]
      },
      'campaign:package' =&gt; {
        'queue' =&gt; 'campaign-package',
      }
    }
  }
</code></pre>
<p>Notice the <code>ad:group:sync:gate</code> in the <code>sync-adgroups</code> queue, which waits for all <code>ad:group:children:generate</code> tasks to finish successfully before starting the dependent <code>campaign:package</code> task. Once the number of spawned tasks equals the number of completed tasks, campaign packaging starts. So whenever there is a need to optimize runtime and flexibility, Qyu is useful for parallelizing parts of the workflow while still producing a single output in the end. You can deploy multiple workers in Docker containers and scale some up or down according to the load at any certain point in the workflow. For example, we have one Docker container for <code>campaign-package</code> since it is a simple task that should not be parallelized, but we generate ads and keywords under ad groups via multiple Docker containers.</p>
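<p>The gate's <code>eq_completed</code> condition boils down to a count comparison. The following is a plain-Ruby sketch of the idea, not Qyu's internals:</p>
<pre><code class="language-ruby"># The gate opens only when every spawned child task has completed.
def gate_open?(nr_tasks, completed_count)
  completed_count == nr_tasks
end

gate_open?(3, 2) # => false: one ad:group:children:generate task still running
gate_open?(3, 3) # => true: campaign:package may start
</code></pre>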
<p><img src="http://ossterdam.com/content/images/2018/03/qyu-digram.png" alt="Qyu: A distributed task execution system for complex workflows"></p>
<p>To conclude, Qyu is very useful to us and we love it. We released it to the open-source community to further improve it, hoping that it will help other developers build distributed systems. Feel free to submit issues or PRs at <a href="https://github.com/QyuTeam/qyu">https://github.com/QyuTeam/qyu</a>.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Gulp: A Web Developer's Secret Weapon for Maximizing Site Speed]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Many of us have to handle web based projects that are used in production, which provide various services to the public. When dealing with such projects, it is important to be able to build and deploy our code quickly. Doing something quickly often leads to errors, especially if a process</p></div>]]></description><link>http://ossterdam.com/2016/07/01/gulp-a-web-developers-secret-weapon-for-maximizing-site-speed/</link><guid isPermaLink="false">5ab69979186c6709275b28bf</guid><category><![CDATA[Web Development]]></category><category><![CDATA[maximize]]></category><category><![CDATA[website]]></category><category><![CDATA[Gulp]]></category><category><![CDATA[tutorial]]></category><category><![CDATA[Open Source Software]]></category><dc:creator><![CDATA[Irina M. Papuc]]></dc:creator><pubDate>Fri, 01 Jul 2016 10:33:42 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/07/toptal-blog-image-1466516168211-320f527c268f65855a84f5f515810ae2.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2016/07/toptal-blog-image-1466516168211-320f527c268f65855a84f5f515810ae2.jpg" alt="Gulp: A Web Developer's Secret Weapon for Maximizing Site Speed"><p>Many of us have to handle web based projects that are used in production, which provide various services to the public. When dealing with such projects, it is important to be able to build and deploy our code quickly. Doing something quickly often leads to errors, especially if a process is repetitive, therefore it’s a good practice to automate such a process as much as possible.</p>
<p>My fellow developers: There is no excuse for serving junk to your browser.</p>
<p>In this post, we will be looking at a tool that can be a part of what will allow us to achieve such automation. This tool is an npm package called Gulp.js. In order to become familiar with the basic Gulp.js terminology used in this post, please refer to “An Introduction to JavaScript Automation with Gulp” that was previously published on the blog by Antonios Minas, one of our fellow Toptal developers. We will assume basic familiarity with the npm environment, as it is used extensively throughout this post to install packages.</p>
<h2 id="servingfrontendassets">Serving Front-End Assets</h2>
<p>Before we continue, let’s take a few steps back to get an overview of the problem that Gulp.js can solve for us. Many web-based projects feature front-end JavaScript files that are served to the client in order to provide various functionalities to the web page. Usually there’s also a set of CSS stylesheets that are served to the client as well. Sometimes when looking at the source code of a website or a web application, we can see code like this:</p>
<pre><code class="language-html">&lt;link href="css/main.css" rel="stylesheet"&gt;
&lt;link href="css/custom.css" rel="stylesheet"&gt;
&lt;script src="js/jquery.min.js"&gt;&lt;/script&gt;
&lt;script src="js/site.js"&gt;&lt;/script&gt;
&lt;script src="js/module1.js"&gt;&lt;/script&gt;
&lt;script src="js/module2.js"&gt;&lt;/script&gt;
</code></pre>
<p>There are a few problems with this code. It has references to two separate CSS stylesheets and four separate JavaScript files. This means that the browser has to make a total of six requests to the server, and each request has to separately load a resource before the page will be ready. This is less of an issue with HTTP/2, because HTTP/2 introduces parallelism and header compression, but it is still an issue. It increases the total volume of traffic required to load this page, and it reduces the quality of the user experience because it takes longer to load the files. In the case of HTTP 1.1, it also hogs the network and reduces the number of request channels that are available. It would be much better to combine the CSS and JavaScript files into a single bundle for each type; that way, there would be a total of only two requests. It would also be nice to serve minified versions of these files, which are usually much smaller than the originals. Finally, our web application might break if any of the assets are cached and the client receives an outdated version.</p>
<p>One primitive approach to solving some of these problems is to manually combine each type of asset into a bundle using a text editor, and then run the result through a minifier service, such as <a href="http://jscompress.com/">http://jscompress.com/</a>. This proves to be very tedious to do continuously during the development process. A slight but questionable improvement would be to host our own minifier server, using one of the packages available on GitHub. Then we could do things that would look somewhat similar to the following:</p>
<pre><code class="language-html">&lt;script src="min/f=js/site.js,js/module1.js"&gt;&lt;/script&gt;
</code></pre>
<p>This would serve minified files to our client, but it would not solve the problem of caching. It would also cause additional load on the server, since our server would essentially have to concatenate and minify all the source files repetitively on every request.</p>
<h2 id="automatingwithgulpjs">Automating with Gulp.js</h2>
<p>Surely we can do better than either of these two approaches. What we really want is to automate bundling and include it in the build phase of our project. We want to end up with pre-built asset bundles that are already minified and are ready to serve. We also want to force the client to receive the most up to date versions of our bundled assets on every request, but we still want to leverage caching if possible. Luckily for us, Gulp.js can handle that. In the remainder of the article, we will be building a solution that will leverage the power of Gulp.js to concatenate and minify the files. We will also be using a plugin to bust the cache when there are updates.</p>
<p>We will be creating the following directory and file structure in our example:</p>
<pre><code>public/
|- build/
|  |- js/
|  |  |- bundle-{hash}.js
|  |- css/
|  |  |- stylesheet-{hash}.css
assets/
|- js/
|  |- vendor/
|  |  |- jquery.js
|  |- site.js
|  |- module1.js
|  |- module2.js
|- css/
|  |- main.css
|  |- custom.css
gulpfile.js
package.json
</code></pre>
<p>npm makes package management in Node.js projects a breeze. Gulp provides tremendous extensibility by taking advantage of npm's simple packaging approach to deliver modular and powerful plugins.</p>
<p>The gulpfile.js file is where we will define the tasks that Gulp will perform for us. The package.json is used by npm to define our application's package and track the dependencies that we will be installing. The public directory is what should be configured to face the web. The assets directory is where we will store our source files. To use Gulp in the project, we will need to install it via npm, and save it as a developer dependency for the project. We will also want to start with the concat plugin for Gulp, which will allow us to concatenate multiple files into one.</p>
<p>To install these two items, we will run the following command:</p>
<pre><code class="language-bash">npm install --save-dev gulp gulp-concat
</code></pre>
<p>Next, we will want to begin writing the content of gulpfile.js.</p>
<pre><code class="language-javascript">var gulp = require('gulp');
var concat = require('gulp-concat');

gulp.task('pack-js', function () {
  return gulp.src(['assets/js/vendor/*.js', 'assets/js/main.js', 'assets/js/module*.js'])
    .pipe(concat('bundle.js'))
    .pipe(gulp.dest('public/build/js'));
});

gulp.task('pack-css', function () {
  return gulp.src(['assets/css/main.css', 'assets/css/custom.css'])
    .pipe(concat('stylesheet.css'))
    .pipe(gulp.dest('public/build/css'));
});

gulp.task('default', ['pack-js', 'pack-css']);
</code></pre>
<p>Here, we are loading the gulp library and its concat plugin. We then define three tasks.</p>
<p>The first task (pack-js) defines a procedure to compress multiple JavaScript source files into one bundle. We list the source files, which will be globbed, read, and concatenated in the order specified. We pipe that into the concat plugin to get one final file called bundle.js. Finally, we tell gulp to write the file to public/build/js.</p>
<p>The second task (pack-css) does the same thing as above, but for the CSS stylesheets. It tells Gulp to store the concatenated output as stylesheet.css in public/build/css.</p>
<p>The third task (default) is the one Gulp runs when we invoke it with no arguments. In the second parameter, we pass the list of other tasks to execute when the default task is run.</p>
<p>Let’s paste this code into gulpfile.js using any source code editor that we normally use, and then save the file to the application root.</p>
<p>Next, we will open the command line and run:</p>
<pre><code class="language-bash">gulp
</code></pre>
<p>If we look at our files after running this command, we will find two new files: public/build/js/bundle.js and public/build/css/stylesheet.css. They are concatenations of our source files, which solves part of the original problem. However, they are not minified, and there is no cache busting yet. Let's add automated minification.</p>
<h2 id="optimizingbuiltassets">Optimizing Built Assets</h2>
<p>We will need two new plugins. To add them, we will run the following command:</p>
<pre><code class="language-bash">npm install --save-dev gulp-clean-css gulp-minify
</code></pre>
<p>The first plugin is for minifying CSS, and the second one is for minifying JavaScript. The first one uses the clean-css package, and the second one uses the UglifyJS2 package. We will load these two packages in our gulpfile.js first:</p>
<pre><code class="language-javascript">var minify = require('gulp-minify');
var cleanCss = require('gulp-clean-css');
</code></pre>
<p>We will then need to use them in our tasks just before we write the output to disk:</p>
<pre><code class="language-javascript">.pipe(minify())
.pipe(cleanCss())
</code></pre>
<p>The gulpfile.js should now look like this:</p>
<pre><code class="language-javascript">var gulp = require('gulp');
var concat = require('gulp-concat');
var minify = require('gulp-minify');
var cleanCss = require('gulp-clean-css');

gulp.task('pack-js', function () {
  return gulp.src(['assets/js/vendor/*.js', 'assets/js/main.js', 'assets/js/module*.js'])
    .pipe(concat('bundle.js'))
    .pipe(minify())
    .pipe(gulp.dest('public/build/js'));
});

gulp.task('pack-css', function () {
  return gulp.src(['assets/css/main.css', 'assets/css/custom.css'])
    .pipe(concat('stylesheet.css'))
    .pipe(cleanCss())
    .pipe(gulp.dest('public/build/css'));
});

gulp.task('default', ['pack-js', 'pack-css']);
</code></pre>
<p>Let's run gulp again. We will see that the file stylesheet.css is saved in minified format, and the file bundle.js is still saved as is. We will notice that we now also have bundle-min.js, which is minified. We want only the minified file, and we want it saved as bundle.js, so we will modify our code with additional parameters:</p>
<pre><code class="language-javascript">.pipe(minify({
  ext: {
    min: '.js'
  },
  noSource: true
}))
</code></pre>
<p>As per the gulp-minify plugin documentation (<a href="https://www.npmjs.com/package/gulp-minify">https://www.npmjs.com/package/gulp-minify</a>), this will set the desired name for the minified version, and tell the plugin not to create the version containing the original source. If we delete the content of the build directory and run gulp from the command line again, we will end up with just two minified files. We have just finished implementing the minification phase of our build process.</p>
<h2 id="cachebusting">Cache Busting</h2>
<p>Next, we will want to add cache busting, and we will need to install a plugin for that:</p>
<pre><code class="language-bash">npm install --save-dev gulp-rev
</code></pre>
<p>And require it in our gulp file:</p>
<pre><code class="language-javascript">var rev = require('gulp-rev');
</code></pre>
<p>Using the plugin is a bit tricky. We have to pipe the minified output through the plugin first. Then, we have to call the plugin again after we write the results to disk. The plugin renames the files so that they are tagged with a unique hash, and it also creates a manifest file. The manifest file is a map that can be used by our application to determine the latest filenames that we should refer to in our HTML code. After we modify the gulp file, it should end up looking like this:</p>
<pre><code class="language-javascript">var gulp = require('gulp');
var concat = require('gulp-concat');
var minify = require('gulp-minify');
var cleanCss = require('gulp-clean-css');
var rev = require('gulp-rev');

gulp.task('pack-js', function () {
  return gulp.src(['assets/js/vendor/*.js', 'assets/js/main.js', 'assets/js/module*.js'])
    .pipe(concat('bundle.js'))
    .pipe(minify({
      ext: {
        min: '.js'
      },
      noSource: true
    }))
    .pipe(rev())
    .pipe(gulp.dest('public/build/js'))
    .pipe(rev.manifest())
    .pipe(gulp.dest('public/build'));
});

gulp.task('pack-css', function () {
  return gulp.src(['assets/css/main.css', 'assets/css/custom.css'])
    .pipe(concat('stylesheet.css'))
    .pipe(cleanCss())
    .pipe(rev())
    .pipe(gulp.dest('public/build/css'))
    .pipe(rev.manifest())
    .pipe(gulp.dest('public/build'));
});

gulp.task('default', ['pack-js', 'pack-css']);
</code></pre>
<p>With proper cache busting in place, you can go nuts with a long expiry time for your JS and CSS files and still reliably replace them with newer versions whenever necessary.</p>
<p>Let's delete the contents of our build directory and run gulp again. We will find that we now have two files with hash tags affixed to each of the filenames, and a manifest.json saved to public/build. If we open the manifest file, we will see that it only has a reference to one of our minified and tagged files. What is happening is that each task writes a separate manifest file, and one of them ends up overwriting the other. We will need to modify the tasks with additional parameters that will tell them to look for the existing manifest file, and to merge the new data into it if it exists. The syntax for that is a bit complicated, so let's look at what the code should look like and then go over it:</p>
<pre><code class="language-javascript">var gulp = require('gulp');
var concat = require('gulp-concat');
var minify = require('gulp-minify');
var cleanCss = require('gulp-clean-css');
var rev = require('gulp-rev');

gulp.task('pack-js', function () {
  return gulp.src(['assets/js/vendor/*.js', 'assets/js/main.js', 'assets/js/module*.js'])
    .pipe(concat('bundle.js'))
    .pipe(minify({
      ext: {
        min: '.js'
      },
      noSource: true
    }))
    .pipe(rev())
    .pipe(gulp.dest('public/build/js'))
    .pipe(rev.manifest('public/build/rev-manifest.json', {
      merge: true
    }))
    .pipe(gulp.dest(''));
});

gulp.task('pack-css', function () {
  return gulp.src(['assets/css/main.css', 'assets/css/custom.css'])
    .pipe(concat('stylesheet.css'))
    .pipe(cleanCss())
    .pipe(rev())
    .pipe(gulp.dest('public/build/css'))
    .pipe(rev.manifest('public/build/rev-manifest.json', {
      merge: true
    }))
    .pipe(gulp.dest(''));
});

gulp.task('default', ['pack-js', 'pack-css']);
</code></pre>
<p>We pipe the output through rev() first, which creates tagged files instead of the files that we had before. We then provide the desired path of our rev-manifest.json, and tell rev.manifest() to merge into the existing file, if it exists. Finally, we tell gulp to write the manifest to the current directory, which at that point will be public/build. The path issue is due to a bug that is discussed in more detail on GitHub.</p>
<p>We now have automated minification, tagged files, and a manifest file. All of this will allow us to deliver the files more quickly to the user, and bust their cache whenever we make our modifications. There are just two remaining problems though.</p>
<p>The first problem is that if we make any modifications to our source files, we will get new tagged files, but the old ones will remain there as well. We need some way to automatically delete old minified files. Let’s solve this problem using a plugin that will allow us to delete files:</p>
<pre><code class="language-bash">npm install --save-dev del
</code></pre>
<p>We will require it in our code and define two new tasks, one for each type of source file:</p>
<pre><code class="language-javascript">var del = require('del');

gulp.task('clean-js', function () {
  return del([
    'public/build/js/*.js'
  ]);
});

gulp.task('clean-css', function () {
  return del([
    'public/build/css/*.css'
  ]);
});
</code></pre>
<p>We will then make sure that these new tasks finish running before our two main tasks:</p>
<pre><code class="language-javascript">gulp.task('pack-js', ['clean-js'], function () {
gulp.task('pack-css', ['clean-css'], function () {
</code></pre>
<p>If we run gulp again after this modification, we will have just the latest minified files.</p>
<p>The second problem is that we don’t want to keep running gulp every time we make a change. To solve this, we will need to define a watcher task:</p>
<pre><code class="language-javascript">gulp.task('watch', function () {
  gulp.watch('assets/js/**/*.js', ['pack-js']);
  gulp.watch('assets/css/**/*.css', ['pack-css']);
});
</code></pre>
<p>We will also change the definition of our default task:</p>
<pre><code class="language-javascript">gulp.task('default', ['watch']);
</code></pre>
<p>If we now run gulp from the command line, we will find that it no longer builds anything upon invocation. This is because it now calls the watcher task that will watch our source files for any changes, and build only when it detects a change. If we try changing any of our source files, and then look at our console again, we will see that the pack-js and pack-css tasks run automatically along with their dependencies.</p>
<p>Now, all we have to do is load the manifest.json file in our application and get the tagged filenames from that. How we do that depends on our particular back-end language and technology stack, and would be quite trivial to implement, so we will not go over it in detail. However, the general idea, is that we can load the manifest into an array or an object, and then define a helper function that will allow us to call versioned assets from our templates in a manner similar to the following:</p>
<pre><code>gulp('bundle.js')
</code></pre>
<p>Once we do that, we will not have to worry about changed tags in our filenames ever again, and we will be able to focus on writing high quality code.</p>
<p>The final source code for this article, along with a few sample assets, can be found in this GitHub repository.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In this article, we went over how to implement Gulp based automation for our build process. I hope that this proves helpful to you and allows you to develop more sophisticated build processes in your own applications.</p>
<p>Please keep in mind that Gulp is just one of the tools that can be used for this purpose; there are many others, such as Grunt, Browserify, and Webpack. They vary in their purposes and in the scope of problems that they can solve. Some can solve problems that Gulp cannot, such as bundling JavaScript modules with dependencies that can be loaded on demand. This is referred to as “code splitting”, and it is an improvement over the idea of serving one big file with all parts of our program on every page. These tools are quite sophisticated and might be covered in the future. In a following post, we will go over how to automate the deployment of our application.</p>
<p>This article is from <a href="https://www.toptal.com/javascript/optimize-js-and-css-with-gulp">https://www.toptal.com/javascript/optimize-js-and-css-with-gulp</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[UX Testing For The Masses: Keep It Simple And Cost Effective]]></title><description><![CDATA[<div class="kg-card-markdown"><p>User experience design (UXD or UED) is the process of enhancing user satisfaction by improving the usability, accessibility, and pleasure provided in the interaction between the user and the product.</p>
<p>This nicely encapsulates what the design part is all about, but what about the other equally important facet of UX,</p></div>]]></description><link>http://ossterdam.com/2016/06/24/ux-testing-for-the-masses-keep-it-simple-and-cost-effective/</link><guid isPermaLink="false">5ab69979186c6709275b28be</guid><category><![CDATA[UX]]></category><category><![CDATA[affordable]]></category><category><![CDATA[user research]]></category><category><![CDATA[web design]]></category><category><![CDATA[design thinking]]></category><dc:creator><![CDATA[Irina M. Papuc]]></dc:creator><pubDate>Fri, 24 Jun 2016 10:31:55 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/06/toptal-blog-image-1465213405899-30f4ca5789613b6698b67244dec567b7.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2016/06/toptal-blog-image-1465213405899-30f4ca5789613b6698b67244dec567b7.jpg" alt="UX Testing For The Masses: Keep It Simple And Cost Effective"><p>User experience design (UXD or UED) is the process of enhancing user satisfaction by improving the usability, accessibility, and pleasure provided in the interaction between the user and the product.</p>
<p>This nicely encapsulates what the design part is all about, but what about the other equally important facet of UX, the testing process? The former can be self-taught, at least to a degree. The latter can be considered as one of the more misunderstood, but ultimately necessary steps in UX design. It has to be effective and involve the most important people – your users/customers.</p>
<p>For the UX guru-in-training, testing can be a difficult and overwhelming topic to approach initially, due to its sheer scale and the diverse directions it can take. This can sometimes be confusing and misleading, depending on which area you wish to focus on and what your professional background is.</p>
<p>For the sake of this article, we’ll approach UX testing from the aspect of a web/app designer who wishes to extend their UI design skills and better understand the core User Centered Design (UCD) approach to an application that should take place before Photoshop or Axure are even powered up.</p>
<p>Understanding User Centered Design (UCD)</p>
<p>Before we proceed to testing, let’s start by explaining the basic concept behind UCD.</p>
<p>UCD places the user first in the design and development cycle of an application/website. UCD is based around an understanding of the application’s environment, tasks, and its users. It then addresses the complete user experience as a whole.</p>
<p>What this basically means is that the entire design process involves real users throughout, in order to ensure the end product meets its initial brief requirement as fully as possible.</p>
<p>To sum up the process in its most basic form (there are many variations of UCD), the phases are as follows:</p>
<p>Context of use: Identify who will use the product and what they will use it for, and under what conditions they intend to use it.<br>
Requirements: Identify any business requirements or user goals that must be met for the product to be successful.<br>
Design solutions: This part of the process may be done in stages, building from a rough concept to a complete design through a number of iterations.<br>
Evaluation of designs: Ideally through usability testing with actual users. This step is just as important for UCD as quality testing is to good software development.</p>
<p>Some of the techniques and methods used in UCD are:</p>
<p>Card Sorting</p>
<p>Card sorting can offer useful insight at the UX design stage.</p>
<p>Card sorting involves participants being given an unsorted group of cards, each card has a statement on it relating to a page or section of the website. The participants are then asked to sort the cards into groups and name them.</p>
<p>Card sorting is a simple and effective way of testing your UX designs on a range of different subjects.</p>
<p>This is usually a great way of learning what your website navigation and content structure should look like, and how they should work in a way that’s logical to your intended user base.</p>
<p>Usability Testing Session</p>
<p>A usability testing session involves collecting data from a group as they use the website/interactive prototypes. It usually comes at a relatively high cost, because it involves a lot of human interaction and legwork.</p>
<p>What does a usability testing session look like? People are invited to attend a session during which they will be asked to perform a series of tasks on the website, while you or the moderator takes notes. The user will often be asked to fill in a questionnaire at the end of the test, to ascertain how difficult it was to perform certain tasks, such as buying a product on an e-commerce site from a specific category page and proceeding to checkout.</p>
<p>This type of testing is usually reserved for high-end interactive prototypes or interactive wireframes. It is a great way of gathering data on the most common issues real-world users will encounter.</p>
<p>Focus Groups</p>
<p>Focus group testing is more or less self-explanatory. It involves focus group members (who could be site users or the intended target audience) being asked a series of questions related to the website, and being encouraged to share their thoughts and feelings on different related areas of the site design/wireframes.</p>
<p>UX tests involving user groups and questionnaires can cover a broad demographic, but both come with trade-offs.</p>
<p>It’s normally a good idea to have an experienced moderator during such a group session to ensure accurate notes are taken. Additionally, a good moderator should be able to identify the telltale signs of groupthink, and make sure that the whole process is not negatively affected by group dynamics.</p>
<p>Questionnaires</p>
<p>Questionnaires can be a great way of generating invaluable solid statistical data – providing the right questions are asked.</p>
<p>A questionnaire can be particularly useful when you want to collect a much more varied cross-section of data than could be achieved through a small focus group. It can also be argued that people tend to be more honest without the immediate pressure of being in a small user group.</p>
<p>The risk of groupthink is averted, so individuals will make their own decisions.</p>
<p>Testing on a Tight Budget or Timescale</p>
<p>Don’t worry, none of these processes are set in stone. In case you are forced to operate on a tight budget or cut corners to meet a hard deadline, there are ways of streamlining the process without sacrificing too much.</p>
<p>If you have to UX test on a tight budget or on short notice, you will have to cut corners and think outside the box.</p>
<p>For example, you could organize part of these processes differently, or merge them together and use your friends and family as test subjects if need be. What is important is that you are actively seeking involvement, feedback, and constructive criticism on the processes you design from other people.</p>
<p>If your budget and schedule won’t allow you to do everything you had in mind, you need to think outside the box and come up with new ways of obtaining usable test results. While this approach involves some tradeoffs, you should still be able to get a lot of actionable information from your test subjects.</p>
<p>UX Testing Methods for Beginners</p>
<p>So you’ve researched, planned, strategized and implemented a working website/landing page/app. But how do you know it is actually fulfilling its potential and justifying all the hours of research? All those questionnaires and card sorting, all your effort and money spent? How do we quantify the results of UX testing and research?</p>
<p>Here are a few useful services that are essential for UX testing at all levels. Some of these services are free, while others are not.</p>
<p>Either way, the following tools are invaluable for gathering real data on your website.</p>
<p>Crazy Egg</p>
<p>Crazy Egg allows you to find out how users are using your website and where they are clicking. The service is not free but offers a great 30-day free trial. If your piggy bank is empty and you’re running out of time, you can at least use the trial period to polish one project.</p>
<p>In my opinion, it is well worth signing up and trying a few websites to get great insight into what clients are doing on your website. Crazy Egg utilizes heatmaps to show you where all the action takes place on your site, allowing you to find out what works and what doesn’t.</p>
<p>Heatmaps tend to be an invaluable addition to the UX design process. In this context, we are talking about mousetracking heatmaps rather than eyetracking heatmaps, which are far more advanced and not as easy to come by. Most designers, especially freelancers and designers working for small businesses with limited budgets, are likely to be restricted to mousetracking heatmaps. While eyetracking heatmaps can give a lot more insight into how users perceive your site before they take action, they simply aren’t an option for most designers and developers.</p>
<p>Optimizely</p>
<p>Optimizely allows you to conduct effective and in-depth A/B testing, yet it’s relatively easy to use.</p>
<p>A/B testing involves comparing two versions of a page to find out which one performs better. Two pages are shown to similar visitors at the same time, and the page with the better conversion rate is the more effective page.</p>
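<p>To make the mechanics concrete, here is a minimal sketch in plain Ruby (the visitor and conversion numbers are entirely hypothetical) of how the winning variant falls out of a conversion-rate comparison:</p>

```ruby
# Hypothetical results of showing two page variants to similar visitors.
variants = {
  "A" => { visitors: 5000, conversions: 150 },
  "B" => { visitors: 5000, conversions: 195 },
}

# Conversion rate = conversions / visitors, per variant.
rates = variants.transform_values do |v|
  v[:conversions].to_f / v[:visitors]
end

winner, rate = rates.max_by { |_name, r| r }
puts format("Variant %s wins with a %.1f%% conversion rate", winner, rate * 100)
# => Variant B wins with a 3.9% conversion rate
```

<p>In practice an A/B testing tool also checks that the difference is statistically significant before declaring a winner, which is exactly the legwork services like Optimizely do for you.</p>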
<p>A/B testing can sometimes mean the difference between a campaign’s failure and success.</p>
<p>The downside? Optimizely is not free, although you can use the Starter plan free of charge. However, this free plan lacks a lot of features compared to the Enterprise plan. Pricing may be a problem for independent designers and small outfits.</p>
<p>Google Analytics</p>
<p>A service that hardly needs an introduction, Google Analytics is one of the most in-depth tools available and is currently the most widely used web analytics service in the world. In a nutshell, Google Analytics allows you to run click tests and identify drop-off points.</p>
<p>Integrated with AdWords, it allows you to track landing page conversions (from ad clicks) and view in-depth information about your website’s traffic.</p>
<p>Google is constantly improving the service and tends to add new features on a regular basis. For example, it recently introduced Data Studio 360, which is an elaborate data visualization and reporting platform designed for enterprise users. It was soon joined by a free version of Data Studio, designed for individuals and small companies.</p>
<p>Measuring Success</p>
<p>With an effective UCD approach, all results from the above tools and approaches will allow you to set realistic goals to improve your website and user experience. Acting on the feedback from drop-offs, user complaints, and A/B testing can lead to:</p>
<p>Increased website traffic.<br>
Increased sales/performance.<br>
Increased return visitors.<br>
Improved usability and ease of use of your website.<br>
Reduced future development costs.</p>
<p>Go full circle with your UX tests. Incorporate your findings in your project and re-test when possible.</p>
<p>Truly effective websites are not only concerned with how easy it is to perform tasks accurately and quickly, but also how enjoyable the user experience is. A good user experience should encourage return business.</p>
<p>With website development, you rarely (if ever) get it right the first time around. That’s why it’s vital to set aside time and resources for research and subsequent testing. While both require planning and resources, they can make a big difference in the outcome of your project, and therefore tend to be a worthwhile investment.</p>
<p>Taking Things a Step Further</p>
<p>There has been much written over the last decade or so about the importance of UX testing and usability, and it can be a difficult subject.</p>
<p>If you are just getting started, I would definitely recommend attending a seminar or course in your area (hopefully your employer will see the value and can pay for it!). Thankfully, as interest in UX is picking up, there is also an increasing number of great free resources, and many of them are geared towards the beginner UX designer.</p>
<p>UX Testing Websites and Blogs</p>
<p>Here are a few valuable online sources of information and inspiration:</p>
<p>UserTesting<br>
The Usability Post<br>
UX Booth<br>
UX Magazine<br>
UX Apprentice<br>
MeasuringU<br>
Designmodo<br>
UX Myths</p>
<p>These resources are more than sufficient to get you started, but if you prefer a more hands-on approach, UX courses and seminars are a good place to take your training to the next level. Naturally, real world projects are the ultimate crucible. They will help you polish your skills and gain a much better understanding of the process from start to finish, and allow you to streamline your UX testing to conserve time and money while obtaining useful results.</p>
<p>This article is from <a href="https://www.toptal.com/designers/ux/ux-testing-for-the-masses">https://www.toptal.com/designers/ux/ux-testing-for-the-masses</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[Clean Code and The Art of Exception Handling]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Exceptions are as old as programming itself. Back in the days when programming was done in hardware, or via low-level programming languages, exceptions were used to alter the flow of the program, and to avoid hardware failures. Today, Wikipedia defines exceptions as:</p>
<p>anomalous or exceptional conditions requiring special processing – often</p></div>]]></description><link>http://ossterdam.com/2016/05/19/clean-code-and-the-art-of-exception-handling/</link><guid isPermaLink="false">5ab69979186c6709275b28ba</guid><category><![CDATA[code]]></category><category><![CDATA[programming]]></category><category><![CDATA[programmer]]></category><category><![CDATA[exception handling]]></category><category><![CDATA[Web Development]]></category><dc:creator><![CDATA[Irina M. Papuc]]></dc:creator><pubDate>Thu, 19 May 2016 08:19:31 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/05/toptal-blog-image-1460406405672-52ec53e6624f51828dab1aee43efe75a.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2016/05/toptal-blog-image-1460406405672-52ec53e6624f51828dab1aee43efe75a.jpg" alt="Clean Code and The Art of Exception Handling"><p>Exceptions are as old as programming itself. Back in the days when programming was done in hardware, or via low-level programming languages, exceptions were used to alter the flow of the program, and to avoid hardware failures. Today, Wikipedia defines exceptions as:</p>
<p>anomalous or exceptional conditions requiring special processing – often changing the normal flow of program execution…</p>
<p>And that handling them requires:</p>
<p>specialized programming language constructs or computer hardware mechanisms.</p>
<p>So, exceptions require special treatment, and an unhandled exception may cause unexpected behavior. The results are often spectacular. In 1996, the famous Ariane 5 rocket launch failure was attributed to an unhandled overflow exception. History’s Worst Software Bugs contains some other bugs that could be attributed to unhandled or mishandled exceptions.</p>
<p>Over time, these errors, and countless others (that were, perhaps, not as dramatic, but still catastrophic for those involved) contributed to the impression that exceptions are bad.</p>
<p>The results of improperly handling exceptions have led us to believe that exceptions are always bad.</p>
<p>But exceptions are a fundamental element of modern programming; they exist to make our software better. Rather than fearing exceptions, we should embrace them and learn how to benefit from them. In this article, we will discuss how to manage exceptions elegantly, and use them to write clean code that is more maintainable.</p>
<p>Exception Handling: It’s a Good Thing</p>
<p>With the rise of object-oriented programming (OOP), exception support has become a crucial element of modern programming languages. A robust exception handling system is built into most languages, nowadays. For example, Ruby provides for the following typical pattern:</p>
<pre><code>begin
  do_something_that_might_not_work!
rescue SpecificError =&gt; e
  do_some_specific_error_clean_up
  retry if some_condition_met?
ensure
  this_will_always_be_executed
end
</code></pre>
<p>There is nothing wrong with the previous code. But overusing these patterns will cause code smells, and won’t necessarily be beneficial. Likewise, misusing them can actually do a lot of harm to your code base, making it brittle, or obfuscating the cause of errors.</p>
<p>The stigma surrounding exceptions often makes programmers feel at a loss. It’s a fact of life that exceptions can’t be avoided, but we are often taught they must be dealt with swiftly and decisively. As we will see, this is not necessarily true. Rather, we should learn the art of handling exceptions gracefully, making them harmonious with the rest of our code.</p>
<p>Following are some recommended practices that will help you embrace exceptions and make use of them and their abilities to keep your code maintainable, extensible, and readable:</p>
<p>maintainability: Allows us to easily find and fix new bugs, without the fear of breaking current functionality, introducing further bugs, or having to abandon the code altogether due to increased complexity over time.<br>
extensibility: Allows us to easily add to our code base, implementing new or changed requirements without breaking existing functionality. Extensibility provides flexibility, and enables a high level of reusability for our code base.<br>
readability: Allows us to easily read the code and discover its purpose without spending too much time digging. This is critical for efficiently discovering bugs and untested code.</p>
<p>These elements are the main factors of what we might call cleanliness or quality, which is not a direct measure itself, but instead is the combined effect of the previous points, as demonstrated in this comic:</p>
<p>&quot;WTFs/m&quot; by Thom Holwerda, OSNews</p>
<p>With that said, let’s dive into these practices and see how each of them affects those three measures.</p>
<p>Note: We will present examples from Ruby, but all of the constructs demonstrated here have equivalents in the most common OOP languages.</p>
<p>Always create your own ApplicationError hierarchy</p>
<p>Most languages come with a variety of exception classes, organized in an inheritance hierarchy, like any other OOP class. To preserve the readability, maintainability, and extensibility of our code, it’s a good idea to create our own subtree of application-specific exceptions that extend the base exception class. Investing some time in logically structuring this hierarchy can be extremely beneficial. For example:</p>
<pre><code>class ApplicationError &lt; StandardError; end

# Validation Errors
class ValidationError &lt; ApplicationError; end
class RequiredFieldError &lt; ValidationError; end
class UniqueFieldError &lt; ValidationError; end

# HTTP 4XX Response Errors
class ResponseError &lt; ApplicationError; end
class BadRequestError &lt; ResponseError; end
class UnauthorizedError &lt; ResponseError; end

# ...
</code></pre>
<p>Example of an application exception hierarchy.</p>
<p>Having an extensible, comprehensive exceptions package for our application makes handling these application-specific situations much easier. For example, we can decide which exceptions to handle in a more natural way. This not only boosts the readability of our code, but also increases the maintainability of our applications and libraries (gems).</p>
<p>From the readability perspective, it’s much easier to read:</p>
<pre><code>rescue ValidationError =&gt; e
</code></pre>
<p>Than to read:</p>
<pre><code>rescue RequiredFieldError, UniqueFieldError, ... =&gt; e
</code></pre>
From the maintainability perspective, say, for example, we are implementing a JSON API, and we have defined our own ClientError with several subtypes, to be used when a client sends a bad request. If any one of these is raised, the application should render the JSON representation of the error in its response. It will be easier to fix, or add logic, to a single block that handles ClientErrors rather than looping over each possible client error and implementing the same handler code for each. In terms of extensibility, if we later have to implement another type of client error, we can trust it will already be handled properly here.</p>
<p>Moreover, this does not prevent us from implementing additional special handling for specific client errors earlier in the call stack, or altering the same exception object along the way:</p>
<pre><code># app/controller/pseudo_controller.rb
def authenticate_user!
  fail AuthenticationError if token_invalid? || token_expired?
  User.find_by(authentication_token: token)
rescue AuthenticationError =&gt; e
  report_suspicious_activity if token_invalid?
  raise e
end

def show
  authenticate_user!
  show_private_stuff!(params[:id])
rescue ClientError =&gt; e
  render_error(e)
end
</code></pre>
<p>As you can see, raising this specific exception didn’t prevent us from being able to handle it on different levels, altering it, re-raising it, and allowing the parent class handler to resolve it.</p>
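<p>One Ruby-specific caveat worth knowing about re-raising: since Ruby 2.1, raising a new exception from inside a rescue block does not completely discard the original one, because the interpreter stores a reference to it on the new exception as Exception#cause. A minimal sketch:</p>

```ruby
# Since Ruby 2.1, an exception raised inside a rescue block carries
# a reference to the original error in Exception#cause.
begin
  begin
    raise ArgumentError, "original failure"
  rescue ArgumentError
    raise RuntimeError, "wrapped failure"
  end
rescue RuntimeError => e
  puts e.message       # => wrapped failure
  puts e.cause.message # => original failure
end
```

<p>Even so, explicitly re-raising the same object (as in the example above) keeps the intent clearer, and it is the portable habit for languages that really do lose the original.</p>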
<p>Two things to note here:</p>
<p>Not all languages support raising exceptions from within an exception handler.<br>
In most languages, raising a new exception from within a handler will cause the original exception to be lost forever, so it’s better to re-raise the same exception object (as in the above example) to avoid losing track of the original cause of the error. (Unless you are doing this intentionally.)</p>
<p>Never rescue Exception</p>
<p>That is, never try to implement a catch-all handler for the base exception type. Rescuing or catching all exceptions wholesale is never a good idea in any language, whether it’s globally on a base application level, or in a small buried method used only once. We don’t want to rescue Exception because it will obfuscate whatever really happened, damaging both maintainability and extensibility. We can waste a huge amount of time debugging what the actual problem is, when it could be as simple as a syntax error:</p>
<pre><code># main.rb
def bad_example
  i_might_raise_exception!
rescue Exception
  nah_i_will_always_be_here_for_you
end

# elsewhere.rb
def i_might_raise_exception!
  retrun do_a_lot_of_work!
end
</code></pre>
<p>You might have noticed the error in the previous example; return is mistyped. Although modern editors provide some protection against this specific type of syntax error, this example illustrates how rescue Exception does harm to our code. At no point is the actual type of the exception (in this case a NoMethodError) addressed, nor is it ever exposed to the developer, which may cause us to waste a lot of time running in circles.</p>
<p>Never rescue more exceptions than you need to</p>
<p>The previous point is a specific case of this rule: We should always be careful not to over-generalize our exception handlers. The reasons are the same; whenever we rescue more exceptions than we should, we end up hiding parts of the application logic from higher levels of the application, not to mention suppressing the developer’s ability to handle the exception themselves. This severely affects the extensibility and maintainability of the code.</p>
<p>If we do attempt to handle different exception subtypes in the same handler, we introduce fat code blocks that have too many responsibilities. For example, if we are building a library that consumes a remote API, handling a MethodNotAllowedError (HTTP 405), is usually different from handling an UnauthorizedError (HTTP 401), even though they are both ResponseErrors.</p>
<p>As we will see, often there exists a different part of the application that would be better suited to handle specific exceptions in a more DRY way.</p>
<p>So, define the single responsibility of your class or method, and handle the bare minimum of exceptions that satisfy this responsibility requirement. For example, if a method is responsible for getting stock info from a remote API, then it should handle exceptions that arise from getting that info only, and leave the handling of the other errors to a different method designed specifically for these responsibilities:</p>
<pre><code>def get_info
  begin
    response = HTTP.get(STOCKS_URL + &quot;#{@symbol}/info&quot;)

    fail AuthenticationError if response.code == 401
    fail StockNotFoundError, @symbol if response.code == 404
    return JSON.parse response.body
  rescue JSON::ParserError
    retry
  end
end
</code></pre>
Here we defined the contract for this method to only get us the info about the stock. It handles endpoint-specific errors, such as an incomplete or malformed JSON response. It doesn’t handle the case when authentication fails or expires, or if the stock doesn’t exist. These are someone else’s responsibility, and are explicitly passed up the call stack where there should be a better place to handle these errors in a DRY way.</p>
<p>Resist the urge to handle exceptions immediately</p>
<p>This is the complement to the last point. An exception can be handled at any point in the call stack, and any point in the class hierarchy, so knowing exactly where to handle it can be mystifying. To solve this conundrum, many developers opt to handle any exception as soon as it arises, but investing time in thinking this through will usually result in finding a more appropriate place to handle specific exceptions.</p>
<p>One common pattern that we see in Rails applications (especially those that expose JSON-only APIs) is the following controller method:</p>
<pre><code># app/controllers/client_controller.rb
def create
  @client = Client.new(params[:client])
  if @client.save
    render json: @client
  else
    render json: @client.errors
  end
end
</code></pre>
(Note that although this is not technically an exception handler, functionally, it serves the same purpose, since @client.save only returns false when it encounters an exception.)</p>
<p>In this case, however, repeating the same error handler in every controller action is the opposite of DRY, and damages maintainability and extensibility. Instead, we can make use of the special nature of exception propagation, and handle them only once, in the parent controller class, ApplicationController:</p>
<pre><code># app/controllers/client_controller.rb
def create
  @client = Client.create!(params[:client])
  render json: @client
end

# app/controller/application_controller.rb
rescue_from ActiveRecord::RecordInvalid, with: :render_unprocessable_entity

def render_unprocessable_entity(e)
  render json: { errors: e.record.errors },
         status: 422
end
</code></pre>
This way, we can ensure that all of the ActiveRecord::RecordInvalid errors are properly and DRY-ly handled in one place, on the base ApplicationController level. This gives us the freedom to fiddle with them if we want to handle specific cases at the lower level, or simply let them propagate gracefully.</p>
<p>Not all exceptions need handling</p>
<p>When developing a gem or a library, many developers will try to encapsulate the functionality and not allow any exception to propagate out of the library. But sometimes, it’s not obvious how to handle an exception until the specific application is implemented.</p>
<p>Let’s take ActiveRecord as an example of the ideal solution. The library provides developers with two approaches for completeness. The save method handles exceptions without propagating them, simply returning false, while save! raises an exception when it fails. This gives developers the option of handling specific error cases differently, or simply handling any failure in a general way.</p>
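<p>The same dual-API pattern is straightforward to offer in your own code. A minimal sketch, with a hypothetical Importer class standing in for a library entry point:</p>

```ruby
class ImportError < StandardError; end

# Hypothetical library class following the save / save! convention:
# the bang variant raises on failure, the plain variant returns false.
class Importer
  def import!(record)
    raise ImportError, "record is empty" if record.nil? || record.empty?
    true # stand-in for the actual import work
  end

  def import(record)
    import!(record)
  rescue ImportError
    false
  end
end

importer = Importer.new
importer.import(name: "stock") # => true
importer.import(nil)           # => false
```

<p>Note that the non-bang method is just a thin wrapper over the bang one, so the failure logic lives in exactly one place.</p>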
<p>But what if you don’t have the time or resources to provide such a complete implementation? In that case, if there is any uncertainty, it is best to expose the exception, and release it into the wild.</p>
<p>Sometimes the best way to handle an exception is to let it fly free.</p>
<p>Here’s why: We are working with moving requirements almost all the time, and making the decision that an exception will always be handled in a specific way might actually harm our implementation, damaging extensibility and maintainability, and potentially adding huge technical debt, especially when developing libraries.</p>
<p>Take the earlier example of a stock API consumer fetching stock prices. We chose to handle the incomplete and malformed response on the spot, and we chose to retry the same request again until we got a valid response. But later, the requirements might change, such that we must fall back to saved historical stock data, instead of retrying the request.</p>
<p>At this point, we will be forced to change the library itself, updating how this exception is handled, because the dependent projects won’t handle this exception. (How could they? It was never exposed to them before.) We will also have to inform the owners of projects that rely on our library. This might become a nightmare if there are many such projects, since they are likely to have been built on the assumption that this error will be handled in a specific way.</p>
<p>Now, we can see where we are heading with dependencies management. The outlook is not good. This situation happens quite often, and more often than not, it degrades the library’s usefulness, extensibility, and flexibility.</p>
<p>So here is the bottom line: if it is unclear how an exception should be handled, let it propagate gracefully. There are many cases where a clear place exists to handle the exception internally, but there are many other cases where exposing the exception is better. So before you opt into handling the exception, just give it a second thought. A good rule of thumb is to only insist on handling exceptions when you are interacting directly with the end-user.</p>
<p>Follow the convention</p>
<p>The implementation of Ruby, and, even more so, Rails, follows some naming conventions, such as distinguishing between method_names and method_names! with a “bang.” In Ruby, the bang indicates that the method will alter the object that invoked it, and in Rails, it means that the method will raise an exception if it fails to execute the expected behavior. Try to respect the same convention, especially if you are going to open-source your library.</p>
<p>If we were to write a new method! with a bang in a Rails application, we must take these conventions into account. There is nothing forcing us to raise an exception when this method fails, but by deviating from the convention, this method may mislead programmers into believing they will be given the chance to handle exceptions themselves, when, in fact, they will not.</p>
<p>Another Ruby convention, attributed to Jim Weirich, is to use fail to indicate method failure, and only to use raise if you are re-raising the exception.</p>
<p>“An aside, because I use exceptions to indicate failures, I almost always use the “fail” keyword rather than the “raise” keyword in Ruby. Fail and raise are synonyms so there is no difference except that “fail” more clearly communicates that the method has failed. The only time I use “raise” is when I am catching an exception and re-raising it, because here I’m not failing, but explicitly and purposefully raising an exception. This is a stylistic issue I follow, but I doubt many other people do.”</p>
<p>Many other language communities have adopted conventions like these around how exceptions are treated, and ignoring these conventions will damage the readability and maintainability of our code.</p>
<p>Logger.log(everything)</p>
<p>This practice doesn’t solely apply to exceptions, of course, but if there’s one thing that should always be logged, it’s an exception.</p>
<p>Logging is extremely important (important enough for Ruby to ship a logger with its standard version). It’s the diary of our applications, and even more important than keeping a record of how our applications succeed, is logging how and when they fail.</p>
<p>There is no shortage of logging libraries or log-based services and design patterns. It’s critical to keep track of our exceptions so we can review what happened and investigate if something doesn’t look right. Proper log messages can point developers directly to the cause of a problem, saving them immeasurable time.</p>
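<p>With Ruby's bundled <code>Logger</code>, recording a failure takes only a couple of lines (the error message below is invented; in a real application the logger would point at STDOUT, a file, or a log service rather than an in-memory buffer):</p>

```ruby
require 'logger'
require 'stringio'

log_output = StringIO.new          # stand-in for STDOUT, a file, or a log service
logger = Logger.new(log_output)

begin
  raise IOError, "payment gateway timed out" # hypothetical failure
rescue IOError => e
  # Record the class, message, and backtrace before deciding what to do next.
  logger.error("#{e.class}: #{e.message}")
  logger.error(e.backtrace.join("\n")) if e.backtrace
end

log_output.string # contains "IOError: payment gateway timed out"
```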
<p>That Clean Code Confidence</p>
<p>Proper exception handling allows for clean code and successful software.</p>
<p>Clean exception handling will send your code quality to the moon!</p>
<p>Exceptions are a fundamental part of every programming language. They are special and extremely powerful, and we must leverage their power to elevate the quality of our code instead of exhausting ourselves fighting with them.</p>
<p>In this article, we dived into some good practices for structuring our exception trees, and saw how a logical structure benefits readability and quality. We looked at different approaches for handling exceptions, either in one place or on multiple levels.</p>
<p>We saw that it’s bad to “catch ‘em all”, and that it’s ok to let them float around and bubble up.</p>
<p>We looked at where to handle exceptions in a DRY manner, and learned that we are not obligated to handle them when or where they first arise.</p>
<p>We discussed when exactly it is a good idea to handle them, when it’s a bad idea, and why, when in doubt, it’s a good idea to let them propagate.</p>
<p>Finally, we discussed other points that can help maximize the usefulness of exceptions, such as following conventions and logging everything.</p>
<p>With these basic guidelines, we can feel much more comfortable and confident dealing with error cases in our code, and making our exceptions truly exceptional!</p>
<p>Special thanks to Avdi Grimm and his awesome talk Exceptional Ruby, which helped a lot in the making of this article.</p>
<p>This article was written by Ahmed Abdelrazzak, a Toptal SQL developer, and can be read at <a href="https://www.toptal.com/qa/clean-code-and-the-art-of-exception-handling">https://www.toptal.com/qa/clean-code-and-the-art-of-exception-handling</a></p>
<p>More SQL resources can be found on <a href="https://www.toptal.com/sql">https://www.toptal.com/sql</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[( 0 || something ) in Javascript - "Logical Or" Bug]]></title><description><![CDATA[In JavaScript, zero evaluates to false when using logical or || operations. That means 0 || 'anything' will return 'anything'.]]></description><link>http://ossterdam.com/2016/05/16/logical-or-bug-javascript/</link><guid isPermaLink="false">5ab69979186c6709275b28b9</guid><dc:creator><![CDATA[Mohamed Osama]]></dc:creator><pubDate>Mon, 16 May 2016 01:18:00 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/05/Screen-Shot-2016-05-16-at-2-09-08-AM.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2016/05/Screen-Shot-2016-05-16-at-2-09-08-AM.png" alt="( 0 || something ) in Javascript - "Logical Or" Bug"><p>Today I remembered something that bothered me during my first days of writing JavaScript: a line of code that created a bug which took me a long time to detect, because I kept thinking that this &quot;buggy&quot; line could never be the reason for the problem.</p>
<p>Here's a simple example to demonstrate the dilemma I had.</p>
<pre><code>function incrementCounter(input, override) {
    return override || ++input || 0;
}
</code></pre>
<p>This method takes an input integer and tries to increment it, returning 0 if the result is <code>NaN</code>. If an override value is provided, then it returns that value instead.</p>
<p>Let's take a look at some outputs with different inputs.</p>
<blockquote>
<p><code>incrementCounter()</code> returns <code>0</code><br>
<code>incrementCounter(1)</code> returns <code>2</code><br>
<code>incrementCounter(1, 5)</code> returns <code>5</code></p>
</blockquote>
<p>Everything is fine and according to plan until now.</p>
<p>But...</p>
<blockquote>
<p><code>incrementCounter(1, 0)</code> returns <code>2</code></p>
</blockquote>
<p>Yes, <code>2</code>, not <code>0</code>! Why is that?</p>
<p>In JavaScript, zero evaluates to <code>false</code> when used in <em>logical or</em> <code>||</code> operations. That means <code>0 || 'anything'</code> will return <code>'anything'</code>. For those coming from a Ruby background, this makes no sense at all; in Ruby, this expression returns <code>0</code>.</p>
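<p>Since the comparison with Ruby comes up, here it is runnable: in Ruby only <code>nil</code> and <code>false</code> are falsy, so <code>0</code> short-circuits <code>||</code>:</p>

```ruby
# In Ruby, 0 is truthy, so || returns it immediately.
result = 0 || 'anything'
# result == 0

# Only nil and false fall through to the right-hand side:
fallback = nil || 'anything'
# fallback == 'anything'
```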
<p>This is a trap that developers can easily fall into, and one that takes time to debug.</p>
<p>To make the piece of code above work correctly, we need to handle the comparison to <code>0</code> separately, so that we return it when it is explicitly passed as the <code>override</code>.</p>
<pre><code>function incrementCounter(input, override) {
    if (override === 0) { return 0; } // strict equality, so false or '' are not mistaken for 0
    return override || ++input || 0;
}
</code></pre>
<p>This returns exactly the expected output.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Sequelize: ActiveRecord for NodeJS]]></title><description><![CDATA[<div class="kg-card-markdown"><p>If you are a Ruby on Rails developer, you probably love <a href="https://github.com/rails/rails/tree/master/activerecord">ActiveRecord</a> so much that it is hard for you to imagine using another ORM.</p>
<p>Unfortunately, ActiveRecord is only found in Rails. Luckily for NodeJS developers, there is an awesome alternative which is <a href="https://github.com/sequelize/sequelize">Sequelize</a>.</p>
<blockquote>
<p>Sequelize is a promise-based Node.js ORM</p></blockquote></div>]]></description><link>http://ossterdam.com/2016/05/16/sequelize-activerecord-for-nodejs-orm/</link><guid isPermaLink="false">5ab69979186c6709275b28b1</guid><category><![CDATA[database]]></category><category><![CDATA[data]]></category><category><![CDATA[orm]]></category><category><![CDATA[nodejs]]></category><category><![CDATA[Open Source Software]]></category><dc:creator><![CDATA[Mohamed Osama]]></dc:creator><pubDate>Mon, 16 May 2016 00:52:00 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/05/Screen-Shot-2016-05-16-at-2-59-36-PM.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2016/05/Screen-Shot-2016-05-16-at-2-59-36-PM.png" alt="Sequelize: ActiveRecord for NodeJS"><p>If you are a Ruby on Rails developer, you probably love <a href="https://github.com/rails/rails/tree/master/activerecord">ActiveRecord</a> so much that it is hard for you to imagine using another ORM.</p>
<p>Unfortunately, ActiveRecord is only found in Rails. Luckily for NodeJS developers, there is an awesome alternative which is <a href="https://github.com/sequelize/sequelize">Sequelize</a>.</p>
<blockquote>
<p>Sequelize is a promise-based Node.js ORM for Postgres, MySQL, MariaDB, SQLite and Microsoft SQL Server. It features solid transaction support, relations, read replication and more.</p>
</blockquote>
<p>An avid ActiveRecord user myself, I can tell that Sequelize and its <a href="https://github.com/sequelize/cli">CLI</a> are heavily inspired by ActiveRecord.</p>
<p><strong>Migrations</strong></p>
<pre><code>rake db:migrate
</code></pre>
<p>or in Rails 5</p>
<pre><code>rails db:migrate
</code></pre>
<p>Guess what! With Sequelize CLI it is pretty much the same</p>
<pre><code>sequelize db:migrate
</code></pre>
<hr>
<p><strong>Associations</strong><br>
<em>One-To-One and One-To-Many associations</em></p>
<pre><code># AR implementation
class Player &lt; ActiveRecord::Base
   belongs_to :team
   has_one :jersey
end
</code></pre>
<pre><code>// Sequelize implementation
var Player = this.sequelize.define('player', {})
  , Team  = this.sequelize.define('team', {})
  , Jersey = this.sequelize.define('jersey', {})
  , Document = this.sequelize.define('document', {});

Player.belongsTo(Team);
Player.hasOne(Jersey);
Player.hasMany(Document, {as: 'Contracts'})
</code></pre>
<hr>
<p><strong>Scopes</strong></p>
<p>Scopes are that easy in ActiveRecord.</p>
<pre><code>class Player &lt; ActiveRecord::Base
   scope :retired, -&gt; { where(retired: true) }
end
</code></pre>
<p>Well, also for Sequelize</p>
<pre><code>var Project = sequelize.define('project', {},
  {
  defaultScope: {
    where: {
      active: true
    }
  },
  scopes: {
    retired: {
      where: {
        retired: true
      }
    }
  }
});
</code></pre>
<hr>
<p><strong>Validations</strong></p>
<p>Sequelize's validations are implemented using <a href="https://github.com/chriso/validator.js">validator.js</a>. They are very extensive and useful. Here's an example of a simple <code>User</code> model with three attributes: <code>username</code>, <code>email</code>, and <code>website</code>.</p>
<pre><code>var User = sequelize.define('user', {
  username: {
    type: Sequelize.STRING,
    validate: {
      is: /^[a-z]+$/i,
      isLowercase: true,
      notNull: true
    }
  },
  email: {
    type: Sequelize.STRING,
    validate: {
      isEmail: true
    }
  },
  website: {
    type: Sequelize.STRING,
    validate: {
      isUrl: true
    }
  }
});
</code></pre>
<p>For the full list of validations, view <a href="http://docs.sequelizejs.com/en/latest/docs/models-definition/#validations">the docs</a>.</p>
<hr>
<p><strong>Querying</strong></p>
<p>You can select only certain attributes in ActiveRecord, for example.</p>
<pre><code>User.select(:name, :email)
</code></pre>
<p>In Sequelize, this is how you do it.</p>
<pre><code>User.findAll({
  attributes: ['name', 'email']
});
</code></pre>
<ul>
<li><em>Where</em></li>
</ul>
<p>This is how you query for users assigned to project with id <code>5</code> in ActiveRecord</p>
<pre><code>    User.where(project_id: 5)
</code></pre>
<p>I can barely see any difference!</p>
<pre><code>    User.findAll({
      where: {
        projectId: 5
      }
    });
</code></pre>
<p>You can also use <code>findOne</code>, which is equivalent to <code>User.find_by(project_id: 5)</code>; it returns only the first record matching the criteria.</p>
<p>Get a better insight about the whole sequelize query interface by <a href="http://docs.sequelizejs.com/en/latest/docs/querying/">browsing through querying docs</a>.</p>
<hr>
<p><strong>Conclusion</strong></p>
<p>I have been using Sequelize for a few months now and it is a robust ORM. It still has some minor issues and bugs, but in general it is quite reliable.</p>
<p>Contribute to Sequelize by submitting PRs to their <a href="https://github.com/sequelize/sequelize">GitHub repository</a> or add some awesome features to the <a href="https://github.com/sequelize/cli">CLI</a>.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Business Intelligence Platform: Tutorial Using MongoDB Aggregation Pipeline]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Using data to answer interesting questions is what researchers are busy doing in today’s data driven world. Given huge volumes of data, the challenge of processing and analyzing it is a big one; particularly for statisticians or data analysts who do not have the time to invest in learning</p></div>]]></description><link>http://ossterdam.com/2016/05/11/business-intelligence-platform-tutorial-using-mongodb-aggregation-pipeline/</link><guid isPermaLink="false">5ab69979186c6709275b28b8</guid><category><![CDATA[mongoDB]]></category><category><![CDATA[data]]></category><category><![CDATA[database]]></category><category><![CDATA[business]]></category><category><![CDATA[business intelligence]]></category><category><![CDATA[platform]]></category><dc:creator><![CDATA[Irina M. Papuc]]></dc:creator><pubDate>Wed, 11 May 2016 07:55:08 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/05/toptal-blog-image-1429782964545-784ac074627e72075cd04ca51c17129b.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2016/05/toptal-blog-image-1429782964545-784ac074627e72075cd04ca51c17129b.jpg" alt="Business Intelligence Platform: Tutorial Using MongoDB Aggregation Pipeline"><p>Using data to answer interesting questions is what researchers are busy doing in today’s data driven world. Given huge volumes of data, the challenge of processing and analyzing it is a big one; particularly for statisticians or data analysts who do not have the time to invest in learning business intelligence platforms or technologies provided by Hadoop eco-system, Spark, or NoSQL databases that would help them to analyze terabytes of data in minutes.</p>
<p>The norm today is for researchers or statisticians to build their models on subsets of data in analytics packages like R, MATLAB, or Octave, and then give the formulas and data processing steps to IT teams who then build production analytics solutions.</p>
<p>One problem with this approach is that if the researcher realizes something new after running his model on all of the data in production, the process has to be repeated all over again.</p>
<p>What if the researcher could work with a MongoDB developer and run his analysis on all of the production data and use it as his exploratory dataset, without having to learn any new technology or complex programming languages, or even SQL?</p>
<p>If we use MongoDB’s Aggregation Pipeline and MEAN effectively we can achieve this in a reasonably short time. Through this article and the code that is available here in this GitHub repository, we would like to show how easy it is to achieve this.</p>
<p>Most of the business intelligence tools on the market provide ways for researchers to import datasets from NoSQL and other Big Data technologies into the tool; the transformations and analysis are then done inside the tool. But in this business intelligence tutorial we use the power of the MongoDB Aggregation Pipeline without pulling the data out of MongoDB, and the researcher uses a simple interface to do all kinds of transformations on a production big data system.</p>
<p>MongoDB Aggregation Pipeline for Business Intelligence</p>
<p>Simply put, MongoDB’s aggregation pipeline is a framework to perform a series of data transformations on a dataset. The first stage takes the entire collection of documents as input, and from then on each subsequent stage takes the previous transformation’s result set as input and produces some transformed output.</p>
<p>There are 10 types of transformations that can be used in an aggregation pipeline:</p>
<ul>
<li><code>$geoNear</code>: outputs documents in order of nearest to farthest from a specified point</li>
<li><code>$match</code>: filters the input record set by any given expressions</li>
<li><code>$project</code>: creates a result set with a subset of input fields or computed fields</li>
<li><code>$redact</code>: restricts the contents of the documents based on information from the document</li>
<li><code>$unwind</code>: takes an array field with n elements from a document and returns n documents, with each element added to each document as a field replacing that array</li>
<li><code>$group</code>: groups by one or more columns and performs aggregations on other columns</li>
<li><code>$limit</code>: picks the first n documents from the input set (useful for percentile calculations, etc.)</li>
<li><code>$skip</code>: ignores the first n documents from the input set</li>
<li><code>$sort</code>: sorts all input documents as per the given object</li>
<li><code>$out</code>: takes all the documents returned from the previous stage and writes them to a collection</li>
</ul>
<p>Except for the first and last in the list above, there are no rules about the order in which these transformations may be applied. <code>$out</code> should be used only once, and at the end, if we want to write the result of the aggregation pipeline to a new or existing collection. <code>$geoNear</code> can be used only as the first stage of a pipeline.</p>
<p>In order to make things easier to understand, let us walk through two datasets and two questions relevant to these datasets.</p>
<p>Difference in Salaries by Designation</p>
<p>In order to explain the power of MongoDB’s aggregation pipeline, we have downloaded a dataset which has salary information of university instructional staff for the entire US. This data is available at nces.ed.gov. We have data from 7598 institutions with the following fields:</p>
<pre><code>var FacultySchema = mongoose.Schema({
  InstitutionName : String,
  AvgSalaryAll : Number,
  AVGSalaryProfessors : Number,
  AVGSalaryAssociateProfessors : Number,
  AVGSalaryAssistantProfessors : Number,
  AVGSalaryLecturers : Number,
  AVGSalaryInstructors : Number,
  StreetAddress : String,
  City : String,
  State : String,
  ZIPCode : String,
  MenStaffCount : Number,
  WomenStaffCount : Number
});
</code></pre>
<p>With this data we want to find out (on average) what the difference is between the salaries of associate professors and professors by state. Then, an associate professor can see in which state he is valued closer to a professor in terms of salary.</p>
<p>To answer this question, a researcher first needs to weed out bad data from the collection, because there are a few rows/documents in our dataset where the average salary is a null or empty string. To accomplish this cleaning of the dataset we will add the following stage:</p>
<pre><code>{$match: {AVGSalaryProfessors: {$not: {$type: 2}}, AVGSalaryAssociateProfessors: {$not: {$type: 2}}}}
</code></pre>
<p>This will filter out all the entities which have string values in those two fields. In MongoDB, each type is represented with a unique number - for strings, the type number is 2.</p>
<p>This dataset is a good example because in real world data analytics, engineers often have to deal with data cleanups as well.</p>
<p>Now that we have some stable data, we can continue to the next stage where we will average the salaries by state:</p>
<pre><code>{$group: {_id: &quot;$State&quot;, StateAVGSalaryProfessors: {$avg: &quot;$AVGSalaryProfessors&quot;}, StateAVGSalaryAssociateProfessors: {$avg: &quot;$AVGSalaryAssociateProfessors&quot;}}}
</code></pre>
<p>We just need to run a projection of the above result set and get the difference in state average salaries, as shown below in Stage 3 of our pipeline:</p>
<pre><code>{$project: {_id: 1, SalaryDifference: {$subtract: [&quot;$StateAVGSalaryProfessors&quot;, &quot;$StateAVGSalaryAssociateProfessors&quot;]}}}
</code></pre>
<p>This should give us the state-level average salary difference between professors and associate professors from a dataset of 7519 educational institutions all over the US. To make it even more convenient to interpret this information, let us do a simple sort so we know which state has the least difference, by adding a $sort stage:</p>
<pre><code>{$sort: { SalaryDifference: 1}}
</code></pre>
<p>From this dataset, it is apparent that Idaho, Kansas, and West Virginia are the three states where the difference between the salaries of associate professors and professors is the smallest compared to all the other states.</p>
<p>The full aggregation pipeline generated for this is shown below:</p>
<pre><code>[
  {$match: {AVGSalaryProfessors: {$not: {$type: 2}}, AVGSalaryAssociateProfessors: {$not: {$type: 2}}}},
  {$group: {_id: &quot;$State&quot;, StateAVGSalaryProfessors: {$avg: &quot;$AVGSalaryProfessors&quot;}, StateAVGSalaryAssociateProfessors: {$avg: &quot;$AVGSalaryAssociateProfessors&quot;}}},
  {$project: {_id: 1, SalaryDifference: {$subtract: [&quot;$StateAVGSalaryProfessors&quot;, &quot;$StateAVGSalaryAssociateProfessors&quot;]}}},
  {$sort: { SalaryDifference: 1}}
]
</code></pre>
<p>The resulting dataset looks like this. Researchers can also export these results to CSV in order to report on them using visualization packages like Tableau, or through simple Microsoft Excel charts.</p>
<p>Average Pay by Employment Type</p>
<p>Another example that we will explore in this article involves a dataset obtained from www.data.gov. Given the payroll information of all state and local government organizations in the United States of America, we would like to figure out the average pay of full-time and part-time “Financial Administration” employees in each state.</p>
<p>The dataset has been imported, resulting in 1975 documents where each document follows this schema:</p>
<pre><code>mongoose.Schema({
  State : String,
  GovernmentFunction : String,
  FullTimeEmployees : Number,
  VariationPCT : Number,
  FullTimePay : Number,
  PartTimeEmployees : Number,
  PartTimePay : Number,
  PartTimeHours : Number,
  FullTimeEquivalentEmployment : Number,
  TotalEmployees : Number,
  TotalMarchPay : Number
}, {collection: 'payroll'});
</code></pre>
<p>The answer to this question may help a Financial Administration employee choose the best state to move to. With our MongoDB aggregation pipeline based tool, this can be done quite easily:</p>
<p>In the first stage, filter on the GovernmentFunction column to discard all non-“Financial Administration” entities:</p>
<pre><code>{$match: {GovernmentFunction: 'Financial Administration'}}
</code></pre>
<p>In the next stage of the tutorial, we will group the entities by state and calculate the average full-time and part-time salaries in each state:</p>
<pre><code>{$group: {_id: '$State', FTP_AVG: {$avg: '$FullTimePay'}, PTM_AVG: {$avg: '$PartTimePay'}}}
</code></pre>
<p>Finally, we will sort the results from higher-paying states to lower-paying states:</p>
<pre><code>{$sort: {FTP_AVG: -1, PTM_AVG: -1}}
</code></pre>
<p>This should allow the tool to generate the following aggregation pipeline:</p>
<pre><code>[
  {$match: {GovernmentFunction: 'Financial Administration'}},
  {$group: {_id: '$State', FTP_AVG: {$avg: '$FullTimePay'}, PTM_AVG: {$avg: '$PartTimePay'}}},
  {$sort: {FTP_AVG: -1, PTM_AVG: -1}}
]
</code></pre>
<p>Running the aggregation pipeline should produce results like this:</p>
<p>Building Blocks</p>
<p>To build this business intelligence application we used MEAN, which is a combination of MongoDB, ExpressJS, AngularJS, and NodeJS.</p>
<p>MEAN Business intelligence</p>
<p>As you may already know, MongoDB is a schemaless document database. Even though each document that it stores is limited to 16MB in size, its flexibility and performance, along with the aggregation pipeline framework it provides, make MongoDB a perfect fit for this tool. Getting started with MongoDB is very easy, thanks to its comprehensive documentation.</p>
<p>Node.js, another integral component of MEAN, provides the event-driven server-side JavaScript environment. Node.js runs JavaScript using Google Chrome’s V8 engine. The scalability promise of Node.js is what is driving many organizations towards it.</p>
<p>Express.js is the most popular web application framework for Node.js. It makes it easy to build APIs or any other kind of server-side business layer for web applications. It is very fast because of its minimalist nature, but is also quite flexible.</p>
<p>AngularJS, created and maintained by a number of Google engineers, is rapidly becoming one of the most popular front-end JavaScript frameworks at our disposal.</p>
<p>There are two reasons why MEAN is so popular and our choice for application development at techXplorers:</p>
<ul>
<li>The skillset is simple. An engineer who understands JavaScript is good to go on all layers.</li>
<li>Communication between the front-end, business, and database layers happens through JSON objects, which saves us significant time in design and development at the different layers.</li>
</ul>
<p>Conclusion</p>
<p>In this MongoDB aggregation pipeline tutorial we have demonstrated a cost effective way to give researchers a tool where they can use production data as exploratory datasets and run different sets of transformations to analyze and construct models from.</p>
<p>We were able to develop and deploy this application end-to-end in just 3 days. This application was developed by a team of 4 experienced engineers (2 in the US and 2 in India), with a designer and a freelance UX expert helping us with some thoughts on interface design. At some point in the future, I will take the time to explain how this level of collaboration works to build awesome products in unbelievably short time.</p>
<p>We hope you take advantage of MongoDB’s Aggregation Pipeline, and put power in the hands of your researchers who can change the world with their clever analysis and insights.</p>
<p>This application is live to be played with here - <a href="http://apps.techxplorers.net/analytics/">http://apps.techxplorers.net/analytics/</a>.</p>
<p>The article is found on <a href="https://www.toptal.com/mongodb/business-intelligence-platform-using-mongodb-aggregation-pipeline">https://www.toptal.com/mongodb/business-intelligence-platform-using-mongodb-aggregation-pipeline</a>.</p>
<p>More MongoDB resources can be found on <a href="https://www.toptal.com/mongodb">https://www.toptal.com/mongodb</a>.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Swap Two Variables in Place]]></title><description><![CDATA[<div class="kg-card-markdown"><p>Usually, when a programmer is asked to swap two variables, they will:</p>
<ul>
<li>Create a <code>temp</code> variable to hold one of their values. <code>temp = x</code></li>
<li>Assign one to the other <code>x = y</code></li>
<li>Retrieve temporary variable <code>y = temp</code></li>
</ul>
<p>What if you were asked to do without adding a temp variable (i.e.</p></div>]]></description><link>http://ossterdam.com/2016/01/16/swap-variables-in-place/</link><guid isPermaLink="false">5ab69979186c6709275b28b7</guid><dc:creator><![CDATA[Mohamed Osama]]></dc:creator><pubDate>Sat, 16 Jan 2016 12:55:05 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/01/apple-desk-office-technology.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2016/01/apple-desk-office-technology.jpg" alt="Swap Two Variables in Place"><p>Usually, when a programmer is asked to swap two variables, they will:</p>
<ul>
<li>Create a <code>temp</code> variable to hold one of their values. <code>temp = x</code></li>
<li>Assign one to the other <code>x = y</code></li>
<li>Retrieve temporary variable <code>y = temp</code></li>
</ul>
<p>What if you were asked to do it without adding a temp variable (i.e., without using any extra space)? There are two easy ways to do so.</p>
<p>The first way uses addition and subtraction. Here is a Java implementation of this method (note that since Java passes primitives by value, the swap is visible only inside the method; in practice the trick is applied inline on the variables themselves).</p>
<pre><code>public static void swap(int x, int y) {
   x = y - x; // x = y - x (the difference)
   y = y - x; // y = y - (y - x) = original x
   x = x + y; // x = (y - x) + x = original y
}
</code></pre>
<p>The second way uses the <code>XOR</code> operator. With three <code>XOR</code> operations, you can swap the values of <code>x</code> and <code>y</code> easily.</p>
<pre><code>public static void XORSwap(int x, int y) {
   x = x ^ y; // x now holds x XOR y
   y = x ^ y; // y = (x ^ y) ^ y = original x
   x = x ^ y; // x = (x ^ y) ^ x = original y
}
</code></pre>
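<p>As a sanity check, here are both tricks traced in Ruby with concrete values (in real Ruby code you would simply write <code>x, y = y, x</code>, shown last for comparison):</p>

```ruby
x, y = 3, 7

# Addition/subtraction swap
x = y - x  # x = 4 (the difference)
y = y - x  # y = 7 - 4 = 3 (original x)
x = x + y  # x = 4 + 3 = 7 (original y)
# now x == 7, y == 3

# XOR swap, starting from x = 7, y = 3
x = x ^ y  # x = 0b111 ^ 0b011 = 0b100
y = x ^ y  # y = 0b100 ^ 0b011 = 0b111 = 7
x = x ^ y  # x = 0b100 ^ 0b111 = 0b011 = 3
# now x == 3, y == 7, swapped back

# The idiomatic Ruby way needs neither trick:
x, y = y, x
```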
</div>]]></content:encoded></item><item><title><![CDATA[Mongoid Cheat Sheet]]></title><description><![CDATA[Mongoid configuration, bulk inserting, bulk deletion and others. Before we start I would advice you to use Mongoid 5.0.0.]]></description><link>http://ossterdam.com/2016/01/10/mongoid-cheat-sheet/</link><guid isPermaLink="false">5ab69979186c6709275b28b5</guid><category><![CDATA[Open Source Software]]></category><dc:creator><![CDATA[Mohamed Osama]]></dc:creator><pubDate>Sun, 10 Jan 2016 04:44:27 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/01/sheet-1521899.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><blockquote>
<img src="http://ossterdam.com/content/images/2016/01/sheet-1521899.jpg" alt="Mongoid Cheat Sheet"><p>MongoDB is the next-generation database that helps businesses transform their industries by harnessing the power of data.</p>
</blockquote>
<p>That is how the creators of MongoDB describe their database. And YES, MongoDB is truly amazing. Its write speed is blazing fast, but you need to know how to deal with it. I am not a MongoDB expert, but I will share my knowledge about <code>Mongoid</code>.</p>
<p>Before we start, I would advise you to use <code>Mongoid 5.0.0</code> or above, since newer versions contain a lot of interesting and helpful updates.</p>
<ul>
<li>
<p>The <code>mongoid.yml</code> file should be structured in this way</p>
<pre><code>  development:
    clients:
      default:
        database: my_development
        hosts:
          - localhost:27017
</code></pre>
</li>
</ul>
<blockquote>
<p>If you were using a Mongoid version that precedes 5.0.0, please change the old <code>sessions</code> to <code>clients</code> as shown above.</p>
</blockquote>
<ul>
<li>
<p>Bulk Inserting<br>
To insert multiple documents into a MongoDB collection when you do not care about your model's callbacks, you should use <code>Model.collection.insert_many</code>, which takes an array of documents.<br>
For example, if you have a model <code>Company</code> with fields <code>name</code> and <code>revenue</code>, you can bulk insert multiple documents as follows</p>
<pre><code>  Company.collection.insert_many([{name: &quot;C1&quot;, revenue: 1000}, {name: &quot;C2&quot;, revenue: 2000}, {name: &quot;C3&quot;, revenue: 3000}, {name: &quot;C4&quot;, revenue: 4000}, {name: &quot;C5&quot;, revenue: 5000}])
</code></pre>
</li>
</ul>
<p>Instead of doing</p>
<pre><code>    Company.create(name: &quot;C1&quot;, revenue: 1000)
    Company.create(name: &quot;C2&quot;, revenue: 2000)
    Company.create(name: &quot;C3&quot;, revenue: 3000)
    Company.create(name: &quot;C4&quot;, revenue: 4000)
    Company.create(name: &quot;C5&quot;, revenue: 5000)
</code></pre>
<p>The latter will fire the models' callbacks, though!</p>
<ul>
<li>
<p>The difference between <code>Model.destroy_all</code> and <code>Model.delete_all</code> is that <code>destroy_all</code> fires callbacks for each destroyed model while <code>delete_all</code> does not.</p>
</li>
<li>
<p><code>Model.collection.delete_many</code> bulk deletes documents from a collection. It can be used as follows</p>
<pre><code>  array_of_names = [&quot;C1&quot;, &quot;C2&quot;, &quot;C3&quot;]
  Company.collection.delete_many({name: { :$in =&gt; array_of_names}})
</code></pre>
</li>
</ul>
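<p>The callback difference above can be illustrated with a tiny plain-Ruby stand-in (no Mongoid required; the <code>ToyCollection</code> class is invented for illustration): per-record removal fires a callback for each document, while bulk removal fires none.</p>

```ruby
# Mimics the destroy_all vs delete_all semantics.
class ToyCollection
  attr_reader :records, :callbacks_fired

  def initialize(records)
    @records = records
    @callbacks_fired = 0
  end

  def destroy_all
    # one "before_destroy"-style callback per record, then remove them
    @records.each { @callbacks_fired += 1 }
    @records.clear
  end

  def delete_all
    @records.clear # single bulk operation, no callbacks
  end
end

with_callbacks = ToyCollection.new([1, 2, 3])
with_callbacks.destroy_all   # callbacks_fired == 3

without_callbacks = ToyCollection.new([1, 2, 3])
without_callbacks.delete_all # callbacks_fired == 0
```

<p>This is also why <code>delete_all</code> is faster: it skips the per-record work entirely.</p>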
<blockquote>
<p>To be continued and you can ask for anything in the comments section.</p>
</blockquote>
</div>]]></content:encoded></item><item><title><![CDATA[How to: SSH Aliases using SSHez]]></title><description><![CDATA[<div class="kg-card-markdown"><p>If you are reading this article, probably you have SSH'd into loads of servers and you would like an easy way to do so.</p>
<p><code>SSHez</code> provides an easy way to set server aliases. <code>SSH</code> itself provides this function through the SSH config file located in <code>~/.ssh/config</code> but there is</p></div>]]></description><link>http://ossterdam.com/2016/01/05/how-to-ssh-aliases/</link><guid isPermaLink="false">5ab69979186c6709275b28b0</guid><category><![CDATA[Open Source Software]]></category><dc:creator><![CDATA[Mohamed Osama]]></dc:creator><pubDate>Tue, 05 Jan 2016 00:30:13 GMT</pubDate><media:content url="http://ossterdam.com/content/images/2016/01/ssh-aliases-1.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="http://ossterdam.com/content/images/2016/01/ssh-aliases-1.jpg" alt="How to: SSH Aliases using SSHez"><p>If you are reading this article, probably you have SSH'd into loads of servers and you would like an easy way to do so.</p>
<p><code>SSHez</code> provides an easy way to set server aliases. <code>SSH</code> itself provides this function through the SSH config file located in <code>~/.ssh/config</code>, but there is no way to add your aliases except to edit the file yourself.</p>
<h6 id="forexample">For example:</h6>
<pre><code>Host rest_api
   HostName rest.example.com
   User oss
   Port 1026

Host db_server
   HostName 142.123.122.11
   User oss

Host ng_client
  HostName angular.example.com
  User oss
</code></pre>
<p>This way you can access any of the above servers using <code>ssh hostname</code> (ex: <code>ssh rest_api</code> instead of <code>ssh oss@rest.example.com -p 1026</code>)</p>
<h5 id="sshez">SSHez</h5>
<p>It is a <a href="https://rubygems.org/gems/sshez">Ruby gem</a> that manages your <code>config</code> file for you. All you have to do is:</p>
<ul>
<li>
<p><code>gem install sshez</code> in your global gemset</p>
</li>
<li>
<p>Add a server alias <code>sshez &lt;alias&gt; oss@rest.example.com -p 1026</code></p>
</li>
<li>
<p>Connect to your server using <code>ssh &lt;alias&gt;</code></p>
</li>
<li>
<p>List your aliases <code>sshez list</code></p>
</li>
<li>
<p>Remove an alias <code>sshez remove &lt;alias&gt;</code></p>
</li>
</ul>
<p>It is that easy!</p>
<blockquote>
<p>The gem is currently in version 0.3.0, I will update the post whenever it is updated!</p>
</blockquote>
</div>]]></content:encoded></item></channel></rss>