9 core components every PHP application should have to scale

Posted on 15:25 by Dion Beetson

As I find myself working on more and more software applications within different development teams, I'm sometimes amazed that such large systems are missing some of the most fundamental components. There are usually several reasons for this, including time pressures, developer experience and a lack of solution planning.

Below is a list of the core components I really believe should exist in any well-architected application. They will help your application scale when the time comes.

I'll start with the most important.

1. Multiple Environment support

An application should support at least the following environments:

Development;
Staging;
Production;

Many applications don't provide this support, yet it's a simple component that will allow you to:

Easily bring on more developers without hacking configs.
Implement new staging environments (e.g. for new features, or for new QA teams).
Easily set up a new environment for unit testing.
Be in a good position when you are ready to incorporate continuous integration.
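Here is a minimal sketch of how an application might work out which environment it is running in. It assumes a hypothetical APP_ENV environment variable, set in your web server config or shell, and uses the environments listed above plus "test" for unit testing. Unknown values throw, so a typo fails loudly instead of silently running with the wrong settings.

```php
<?php
// Minimal sketch: detect the current environment.
// The APP_ENV variable name and the list of environment names are assumptions.
function resolveEnvironment($default = 'prod')
{
    $allowed = array('dev', 'test', 'staging', 'prod');

    $env = getenv('APP_ENV');
    if ($env === false || $env === '') {
        // Variable not set: use the default. Production is the safest choice,
        // because it should never print debug output to users.
        $env = $default;
    }

    if (!in_array($env, $allowed, true)) {
        throw new InvalidArgumentException(sprintf('Unknown environment "%s"', $env));
    }

    return $env;
}
```

A new developer or a CI server then only has to set one variable. Nobody needs to hack the shared configuration files.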

2. Decouple your configuration from your application

In any application I like to see the configuration decoupled from the PHP code. Look into separating your configuration into .ini or .yml files (or .xml if you really want to). For example:

config.ini
config_dev.ini
config_prod.ini

Your bootstrap will then load the correct config based on which environment you are running your application from.
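Here is a sketch of that bootstrap step. It assumes the config.ini / config_<env>.ini naming above and PHP's built-in parse_ini_file(). The function below is a hypothetical helper, not part of any framework. It loads the shared settings first and then overlays the environment-specific ones.

```php
<?php
// Minimal sketch of an environment-aware config loader.
// loadConfig() and the directory layout are hypothetical.
function loadConfig($configDir, $env)
{
    // Shared settings that apply to every environment
    $config = parse_ini_file($configDir . '/config.ini', true);
    if ($config === false) {
        throw new RuntimeException('Unable to read config.ini');
    }

    // Environment-specific overrides, e.g. config_dev.ini or config_prod.ini
    $envFile = $configDir . '/config_' . $env . '.ini';
    if (is_file($envFile)) {
        $overrides = parse_ini_file($envFile, true);
        if ($overrides === false) {
            throw new RuntimeException('Unable to read ' . basename($envFile));
        }
        // Values from the environment file replace the shared ones, section by section
        $config = array_replace_recursive($config, $overrides);
    }

    return $config;
}
```

For example, `$config = loadConfig(__DIR__ . '/config', 'dev');` returns the shared settings, with any value from config_dev.ini taking precedence. The code uses array_replace_recursive() rather than array_merge_recursive() so that an overridden value replaces the original. With array_merge_recursive(), the two values would be combined into an array.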

3. Use version control

I have to admit, I have never worked on an application that isn't under version control. But I have heard horror stories about developers manually versioning files like this:

index.html
index1.html
index2.html

I have even heard of developers SSH-ing onto production servers, opening up vim and making modifications directly on production! This is called "cowboy coding", and you won't earn much respect from other software engineers if you are caught doing it :-)

Take this a step further and have a branching strategy that answers questions such as:

What branch is the development stream?
What branch is the bug fix stream?
What branch is used on the staging environment?
What branch mimics production?

I highly recommend Git (after being converted from SVN a few years ago). It provides local branching, local commits and much more that SVN doesn't.

If you use Git and are looking for a branching strategy, take a look at this post. Our team implemented a similar variation of it very successfully.

http://nvie.com/posts/a-successful-git-branching-model/

4. Have a single repository (or use submodules in Git, or externals in SVN)

I have (unfortunately) worked on a few applications that use two or more repositories and require you to create a collection of magical symlinks between them to get the application up and running.

Maintaining multiple repositories that rely on each other is actually very expensive, time-wise, for an engineering team (I speak from experience here), and it usually causes a lot of code duplication. Try to stick to a single repository. However, if you need to separate your code for reuse (which is great), take advantage of Git submodules or SVN externals. That's what they are there for.

5. Have an error logging component available

I have always had the mentality that if you cannot log errors (or activities) within a component, then you shouldn't be building it. Being able to monitor errors or user activity lets you improve the component or feature. Although this section is about error logging, you could quite easily architect the same component to log more than just errors (such as user activity tracking).

Two problems I see in PHP applications are:

No centralized logging component.
The overuse of error_log().

So what problems can error_log() cause?

Every environment will most likely store logs in a different location (usually based on the OS or its specific Apache/PHP setup). This makes it really hard to tail error logs when you always have to go hunting for them.
Some environments will output to screen and some to an error log file, so there is no consistency.
It's hard (or hacky) to turn off logging for certain scenarios (unit testing, or staging environments) if you're using error_log().
It just doesn't scale. What happens when your application goes from 1 server to 3 servers? Each server holds its own error logs, so you will need additional software to aggregate them into a single location.
What if your logging requirements change? Suppose you suddenly want to log the user's IP or user ID. How can you intercept the call cleanly to add this information?

If instead we implement (or have available to us) a single logging component, we can do the following:

Log to a centralized location, whether that is a database table or a log server.
Scale to 'n' number of servers yet still have a centralized logging location.
Have a single entry point for logging errors/warnings:
$logger->logErr('This is an error');
$logger->logWarn('This is a warning');
Easily attach more information in the future when required, since all logs go through a single method.
If the logging component logs to a database, make this data available within an admin area, which can lead to faster debugging when something goes wrong.
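Here is a minimal sketch of such a component, reusing the logErr()/logWarn() method names above. The AppLogger class, its writer callable and the addProcessor() hook are hypothetical names chosen for illustration. The writer decides where records end up (a database table, a log server). Processors are the single place to attach extra data such as the user's IP or ID.

```php
<?php
// Minimal sketch of a centralized logger. AppLogger, the writer callable and
// addProcessor() are hypothetical names used for illustration only.
class AppLogger
{
    /** @var callable Receives each finished log record (an array). */
    private $writer;

    /** @var callable[] Each processor receives a record and returns it, possibly enriched. */
    private $processors = array();

    public function __construct(callable $writer)
    {
        $this->writer = $writer;
    }

    public function addProcessor(callable $processor)
    {
        $this->processors[] = $processor;
    }

    public function logErr($message)
    {
        $this->log('ERROR', $message);
    }

    public function logWarn($message)
    {
        $this->log('WARNING', $message);
    }

    // The single entry point that every log call goes through
    private function log($level, $message)
    {
        $record = array(
            'level'   => $level,
            'message' => (string) $message,
            'created' => date('Y-m-d H:i:s'),
        );

        // Extra data (IP, user ID, ...) can be attached here later without
        // touching any of the code that calls logErr()/logWarn()
        foreach ($this->processors as $processor) {
            $record = call_user_func($processor, $record);
        }

        call_user_func($this->writer, $record);
    }
}
```

In production the writer might insert into a database table. In unit tests it can simply collect records in an array, which also removes the "how do I turn off logging for tests" problem.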

6. Use asynchronous solutions (or a simple queuing component)

No matter what application you work on, there will come a time when you don't want to have to wait for a certain process to finish before returning a response to the user, for example:

Sending an email.
Making a web service call.
Running a memory- and CPU-intensive report.

I'm not going to recommend building a complex queuing system. If you need one have a look at:

What I will recommend, though, is queuing in its simplest form: being able to queue emails to send. When sending an email, your application needs to:

Build the email.
Connect to the email server.
Send the email content.
Then wait for a response.

This could take up to a few seconds, which is time you don't want your end user to spend waiting.

At a very high level, look at saving this information to a database table, allowing your application to return a response to the user almost instantaneously. Then all you need is a simple cronjob or daemon that monitors the database table and processes all of the unprocessed records. If you build this component, think about how it could be used in the future for something other than sending emails.
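Here is a minimal sketch of this idea using PDO. The email_queue table (id, recipient, subject, body, processed), the two function names and the $send callable are all hypothetical. $send would wrap whatever mailer you use (for example PHP's mail()) and return true on success.

```php
<?php
// Minimal sketch of a database-backed email queue using PDO.
// The email_queue table, its columns and the $send callable are hypothetical;
// swap in your own schema and mailer.

// Called during the web request: a fast INSERT instead of a slow SMTP round trip.
function queueEmail(PDO $db, $to, $subject, $body)
{
    $stmt = $db->prepare(
        'INSERT INTO email_queue (recipient, subject, body, processed) VALUES (?, ?, ?, 0)'
    );
    $stmt->execute(array($to, $subject, $body));
}

// Called from a cronjob or daemon: sends every unprocessed email and marks it
// as processed once it has been sent. Returns the number of emails sent.
function processEmailQueue(PDO $db, callable $send)
{
    $rows = $db->query(
        'SELECT id, recipient, subject, body FROM email_queue WHERE processed = 0 ORDER BY id'
    )->fetchAll(PDO::FETCH_ASSOC);

    $mark = $db->prepare('UPDATE email_queue SET processed = 1 WHERE id = ?');
    $sent = 0;

    foreach ($rows as $row) {
        // A failed send leaves the row unprocessed, so the next run retries it
        if ($send($row['recipient'], $row['subject'], $row['body'])) {
            $mark->execute(array($row['id']));
            $sent++;
        }
    }

    return $sent;
}
```

This sketch assumes a single worker. If several workers could run at once, each would need to claim rows first (for example with a status column updated inside a transaction) so the same email isn't sent twice.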

7. Support for multiple database adapters

As your application's traffic and processing requirements increase, you may find that your single database starts to struggle under the load.

Now imagine that when your application was first architected, the software engineer had this in mind and provided two (or more) database adapters:

One for writing (to the master).
One (or more) for reading (from a read-only slave, or even a load balancer sitting in front of many read-only slaves).

It would then be easy to start migrating the heavy read-only SQL queries to your read-only slave, which in turn would dramatically reduce the load on your master database. You could even go one step further and abstract the multiple adapters away from the application, so that your library classes automatically direct each query to either the master or a slave depending on the query being executed.

Again, this is a very simple component that can really come in handy when you need to start scaling your application quickly.
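Here is a minimal sketch of the "one step further" idea: routing queries by type on top of PDO. The ConnectionRouter class is hypothetical. The "starts with SELECT" rule is deliberately naive. A real implementation would also need to keep reads inside a transaction on the master and cope with replication lag.

```php
<?php
// Minimal sketch of read/write splitting on top of PDO. ConnectionRouter is a
// hypothetical class, and the "starts with SELECT" rule is a simplification.
class ConnectionRouter
{
    private $master;
    private $slaves;

    public function __construct(PDO $master, array $slaves = array())
    {
        $this->master = $master;
        // With no slaves configured, reads simply fall back to the master
        $this->slaves = $slaves ? array_values($slaves) : array($master);
    }

    // Picks a connection based on the SQL about to be executed
    public function connectionFor($sql)
    {
        $isRead = preg_match('/^\s*SELECT\b/i', $sql) === 1;
        if (!$isRead) {
            // INSERT, UPDATE, DELETE, DDL, ... always go to the master
            return $this->master;
        }

        // Spread reads randomly across the read-only slaves
        return $this->slaves[array_rand($this->slaves)];
    }

    public function execute($sql, array $params = array())
    {
        $stmt = $this->connectionFor($sql)->prepare($sql);
        $stmt->execute($params);

        return $stmt;
    }
}
```

Because the application only ever calls execute(), you can start with a single database (no slaves) and add read replicas later without touching the calling code.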

8. Support translations

Decoupling your textual content into translation files will give you the following benefits:

Non-developers can maintain copy on your site without relying on your engineering team.
The phrases you use all the time for messages, headings, etc. will be in a single place, helping to reduce duplication.
There is a single point to update textual copy within your application.
When the time comes to support new languages, you already have the infrastructure within your application.

9. Unit testing environment support

It's an interesting subject. In my experience, some developers love it and some don't want to know about it. In my opinion, that's because some of the developers pushing for unit testing push too hard, too fast; sometimes all you can hear is "TDD this" and "TDD that".

Let me first state that I'm not a firm believer in TDD. In practice:

It can take a lot of time to maintain;
You really need the support of the entire software engineering team;
Developers can end up writing tests just for the sake of code coverage.

When you start writing more test code than actual application code, alarm bells ring in my head. Remember: the more code you write, whether it's application or test code, the more time is required when you have to refactor in the future.

The middle ground...

Unit test the core components (payments, registration, messaging, etc.);
At least unit test the critical path.
Having support for unit testing within your application means developers can easily write a few tests to validate a new component.
Unit testing code can give you and your manager confidence in your components.
Whether you like it or not, you will most likely work with a developer or two who will realize the importance of automated testing - be prepared.
Include unit tests within your task estimates (they are part of delivering the entire feature). If you break them out into a separate task, they can more easily be dropped by product owners, managers, etc.

The last bit...

Some of you may believe this is over-engineering. To be honest, if you are working without a framework, it probably is. The better question is: why aren't you using a framework?

Setting up a base project with all or most of the above components shouldn't take longer than a week. Once it's set up, and if it's well architected so that each component is completely decoupled from every other component, you can easily drop it into any new project and you're ready to go.

You want to create just enough structure in your application to prepare yourself for scaling, but not so much that you spend 3 months setting it all up.

Lastly, all of the reasons above are why I use the Symfony 2 PHP framework. A standard install provides most of the components I've described. Using bundles within Symfony 2 gives you decoupled components and the ability to import bundles into new projects without copying and pasting code. If you haven't looked at the Symfony 2 framework, I highly recommend you do.

I hope you learned something from my blog post. If you have any questions, let me know.

Dion Beetson
Founder of www.ackwired.com
LinkedIn: www.linkedin.com/in/dionbeetson
Twitter: www.twitter.com/dionbeetson
Website: www.dionbeetson.com

Symfony2 logging application errors to a database table.

Posted on 13:43 by Dion Beetson

Symfony2 + Monolog + Doctrine = Centralized database logs

If you are running a large Symfony 2 based application, you may want to write your application error logs to a central database. There are a few tutorials on the Symfony 2 website about writing logs to different outputs (e.g. file, email, etc.), so if you have read those tutorials you will notice some similarities.

I thought it would be handy to collate all of the code I have found across the Internet into a single tutorial about writing all of your application logs to a database table.

The benefit of logging to a database is that you can run multiple servers and have your logs aggregated in a single location, which makes debugging easier. Even if you are running a single-server application, it doesn't hurt to be prepared, if you have the time.

This bundle can be dropped into your application and, once registered, should work transparently.

The bundle lives in the following directory:

/src/Tools/LogBundle/

Create the bundle registration classes:

/src/Tools/LogBundle/ToolsLogBundle.php

<?php

namespace Tools\LogBundle;

use Symfony\Component\HttpKernel\Bundle\Bundle;

class ToolsLogBundle extends Bundle
{
}

/src/Tools/LogBundle/DependencyInjection/ToolsLogExtension.php

<?php

namespace Tools\LogBundle\DependencyInjection;

use Symfony\Component\DependencyInjection\ContainerBuilder;
use Symfony\Component\DependencyInjection\Loader\YamlFileLoader;
use Symfony\Component\HttpKernel\DependencyInjection\Extension;
use Symfony\Component\Config\FileLocator;

class ToolsLogExtension extends Extension
{
    public function load(array $configs, ContainerBuilder $container)
    {
        $loader = new YamlFileLoader($container, new FileLocator(__DIR__.'/../Resources/config'));
        $loader->load('services.yml');
    }

    public function getAlias()
    {
        return 'tools_log';
    }
}

Now create the entity/repository classes for storing the logs

/src/Tools/LogBundle/Entity/SystemLog.php

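<?php

namespace Tools\LogBundle\Entity;

use Doctrine\ORM\Mapping as ORM;

/**
 * HasLifecycleCallbacks is required for the PrePersist/PreUpdate callbacks below to run.
 *
 * @ORM\Entity(repositoryClass="Tools\LogBundle\Entity\SystemLogRepository")
 * @ORM\Table(name="system_log")
 * @ORM\HasLifecycleCallbacks
 */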
class SystemLog
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue(strategy="AUTO")
     */
    private $id;

    /**
     * @ORM\Column(type="text", nullable=true)
     */
    private $log;

    /**
     * @ORM\Column(type="text", nullable=true)
     */
    private $serverData;

    /**
     * @ORM\Column(type="string", length=255, nullable=true)
     */
    private $level;

    /**
     * @ORM\Column(type="datetime")
     */
    private $modified;

    /**
     * @ORM\Column(type="datetime")
     */
    private $created;

    /**
     * @ORM\PreUpdate
     */
    public function setModifiedValue()
    {
        $this->modified = new \DateTime();
    }

    /**
     * @ORM\PrePersist
     */
    public function setCreatedValue()
    {
        $this->modified = new \DateTime();

        $this->created = new \DateTime();
    }

    /**
     * Get id
     *
     * @return integer
     */
    public function getId()
    {
        return $this->id;
    }

    /**
     * Set log
     *
     * @param string $log
     * @return SystemLog
     */
    public function setLog($log)
    {
        $this->log = $log;

        return $this;
    }

    /**
     * Get log
     *
     * @return string
     */
    public function getLog()
    {
        return $this->log;
    }

    /**
     * Set serverData
     *
     * @param string $serverData
     * @return SystemLog
     */
    public function setServerData($serverData)
    {
        $this->serverData = $serverData;

        return $this;
    }

    /**
     * Get serverData
     *
     * @return string
     */
    public function getServerData()
    {
        return $this->serverData;
    }

    /**
     * Set level
     *
     * @param string $level
     * @return SystemLog
     */
    public function setLevel($level)
    {
        $this->level = $level;

        return $this;
    }

    /**
     * Get level
     *
     * @return string
     */
    public function getLevel()
    {
        return $this->level;
    }

    /**
     * Set modified
     *
     * @param \DateTime $modified
     * @return SystemLog
     */
    public function setModified($modified)
    {
        $this->modified = $modified;

        return $this;
    }

    /**
     * Get modified
     *
     * @return \DateTime
     */
    public function getModified()
    {
        return $this->modified;
    }

    /**
     * Set created
     *
     * @param \DateTime $created
     * @return SystemLog
     */
    public function setCreated($created)
    {
        $this->created = $created;

        return $this;
    }

    /**
     * Get created
     *
     * @return \DateTime
     */
    public function getCreated()
    {
        return $this->created;
    }
}

/src/Tools/LogBundle/Entity/SystemLogRepository.php

<?php

namespace Tools\LogBundle\Entity;

use Doctrine\ORM\EntityRepository;

class SystemLogRepository extends EntityRepository
{
    /**
     * Find the latest 200 logs, newest first
     *
     * @return SystemLog[]
     */
    public function findLatest()
    {
        return $this->createQueryBuilder('l')
            ->orderBy('l.id', 'DESC')
            ->setMaxResults(200)
            ->getQuery()
            ->getResult();
    }
}

Now we configure our service

We define the logic to listen for application errors and call our own log handler.

/src/Tools/LogBundle/Resources/config/services.yml

# Scalars starting with @ or % are quoted: newer versions of the Symfony YAML parser reject them unquoted.
parameters:
    logger_database.class: Tools\LogBundle\Logger\DatabaseHandler

services:
    monolog.processor.request:
        class: Tools\LogBundle\Processor\RequestProcessor
        arguments: [ "@session" ]
        tags:
            - { name: monolog.processor, method: processRecord }
            - { name: kernel.event_listener, event: kernel.request, method: onKernelRequest }

    logger_database:
        class: "%logger_database.class%"
        calls:
            - [ setContainer, [ "@service_container" ] ]

    tools.backtrace_logger_listener:
        class: Tools\LogBundle\EventListener\BacktraceLoggerListener
        tags:
            - { name: "monolog.logger", channel: "backtrace" }
            - { name: "kernel.event_listener", event: "kernel.exception", method: "onKernelException" }
        arguments:
            - "@logger"

Define our request processor

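This class adds extra data to each log record. It extends Symfony's WebProcessor, whose onKernelRequest() method (wired up through the kernel.request tag above) stores the request's server parameters in $this->serverData. We also have access to the session object, so we could inspect it for additional information to log. This could be used for a variety of purposes, e.g. adding server data, POST data, etc.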

/src/Tools/LogBundle/Processor/RequestProcessor.php

<?php

namespace Tools\LogBundle\Processor;

use Symfony\Component\HttpFoundation\Session\Session;
use Symfony\Bridge\Monolog\Processor\WebProcessor;

class RequestProcessor extends WebProcessor
{
    private $_session;

    public function __construct(Session $session)
    {
        // The parent constructor is not called, so $this->serverData stays null
        // until WebProcessor::onKernelRequest() fills it from the request.
        $this->_session = $session;
    }

    public function processRecord(array $record)
    {
        // Use the server data captured on kernel.request, falling back to $_SERVER
        // for records logged before that event (or from the command line).
        // Using only one source avoids writing every value to the log twice.
        $serverData = is_array($this->serverData) ? $this->serverData : $_SERVER;

        $record['extra']['serverData'] = "";

        foreach ($serverData as $key => $value) {

            if( is_array($value) ) {
                $value = print_r($value, true);
            }

            $record['extra']['serverData'] .= $key . ": " . $value . "\n";
        }

        return $record;
    }
}
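The "key: value" flattening that processRecord() performs can be tried outside Symfony as a plain function. Below is a minimal sketch. The function name is hypothetical. Masking sensitive keys is an addition, not part of the processor above. $_SERVER can contain session cookies and HTTP auth credentials, which you may not want sitting in a log table. The default list of masked keys is an assumption to adapt.

```php
<?php
// Minimal sketch of the "key: value" flattening done in processRecord(), as a
// standalone function. The masking of sensitive keys is an addition, and the
// default key list is an assumption.
function flattenServerData(
    array $serverData,
    array $maskedKeys = array('HTTP_COOKIE', 'HTTP_AUTHORIZATION', 'PHP_AUTH_PW')
) {
    $output = '';

    foreach ($serverData as $key => $value) {
        if (in_array($key, $maskedKeys, true)) {
            // Don't copy session cookies or passwords into the log table
            $value = '***';
        } elseif (is_array($value)) {
            $value = print_r($value, true);
        }

        $output .= $key . ': ' . $value . "\n";
    }

    return $output;
}
```

Inside processRecord(), the loop could delegate to a function like this, keeping the processor itself small.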

Define our backtrace listener

The backtrace always comes in handy, so let's configure an event listener that logs every uncaught exception, backtrace included.

/src/Tools/LogBundle/EventListener/BacktraceLoggerListener.php

<?php

namespace Tools\LogBundle\EventListener;

use Symfony\Component\HttpKernel\Log\LoggerInterface;
use Symfony\Component\HttpKernel\Event\GetResponseForExceptionEvent;

class BacktraceLoggerListener
{
    private $_logger;

    public function __construct(LoggerInterface $logger = null)
    {
        $this->_logger = $logger;
    }

    public function onKernelException(GetResponseForExceptionEvent $event)
    {
        // The logger is optional, so do nothing if none was injected
        if (null === $this->_logger) {
            return;
        }

        // Casting the exception to a string includes its stack trace (and any previous exceptions)
        $this->_logger->addError((string) $event->getException());
    }
}
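Passing the exception to the logger works because it is converted to a string, and PHP's string form of an exception already contains the stack trace. If you want more control over that text, here is a minimal sketch of a formatter that walks the chain of previous exceptions. The function name and the "Caused by:" layout are hypothetical choices.

```php
<?php
// Minimal sketch: turn an exception (and any chained "previous" exceptions)
// into a log-friendly string with backtraces. The function name and output
// layout are hypothetical.
function formatExceptionForLog(Exception $e)
{
    $parts = array();

    // Walk the chain: the caught exception first, then whatever caused it
    for ($current = $e; $current !== null; $current = $current->getPrevious()) {
        $parts[] = sprintf(
            "%s: %s in %s:%d\nStack trace:\n%s",
            get_class($current),
            $current->getMessage(),
            $current->getFile(),
            $current->getLine(),
            $current->getTraceAsString()
        );
    }

    return implode("\n\nCaused by: ", $parts);
}
```

In onKernelException() you could then log formatExceptionForLog($event->getException()) instead of the plain string cast.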

Define our database handler

Now we just have to tell our database handler how to save the log to the database table.

/src/Tools/LogBundle/Logger/DatabaseHandler.php

<?php

namespace Tools\LogBundle\Logger;

use Monolog\Handler\AbstractProcessingHandler;
use Monolog\Logger;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Stores log records in the system_log database table
 */
class DatabaseHandler extends AbstractProcessingHandler
{
    /**
     * @var ContainerInterface
     */
    protected $_container;

    /**
     * @param integer $level  The minimum logging level at which this handler will be triggered
     * @param Boolean $bubble Whether the messages that are handled can bubble up the stack or not
     */
    public function __construct($level = Logger::DEBUG, $bubble = true)
    {
        parent::__construct($level, $bubble);
    }

    /**
     * @param ContainerInterface $container
     */
    public function setContainer(ContainerInterface $container)
    {
        $this->_container = $container;
    }

    /**
     * {@inheritdoc}
     */
    protected function write(array $record)
    {
        // Ignore the doctrine channel (unless it's a warning or worse), otherwise the INSERT
        // below would itself be logged and cause an infinite loop, as Doctrine likes to log... a lot...
        if ('doctrine' == $record['channel']) {
            if ((int) $record['level'] >= Logger::WARNING) {
                error_log($record['message']);
            }

            return;
        }

        // Only log warnings and above
        // TODO - ideally make this threshold a configuration variable
        if ((int) $record['level'] < Logger::WARNING) {
            return;
        }

        try {
            // Insert through DBAL rather than persisting a SystemLog entity, so the log bypasses
            // the entity manager's unit of work. It still shares the same connection, so an open
            // transaction that is later rolled back will discard the log row too.
            $conn = $this->_container->get('doctrine')->getConnection();

            $created = $record['datetime']->format('Y-m-d H:i:s');

            // Only present when the RequestProcessor has run for this record
            $serverData = isset($record['extra']['serverData']) ? $record['extra']['serverData'] : null;

            // Connection::insert() binds the values as parameters, so no manual quoting is needed
            $conn->insert('system_log', array(
                'log'        => $record['message'],
                'level'      => $record['level'],
                'serverData' => $serverData,
                'modified'   => $created,
                'created'    => $created,
            ));
        } catch (\Exception $e) {
            // Fall back to the PHP error log if something really bad happens
            error_log($record['message']);
            error_log($e->getMessage());
        }
    }
}

Configure monolog in config

Update your application config to use the new logger. There is no formatter entry: RequestProcessor is a processor, attached globally through its monolog.processor tag, not a formatter. Note: if you don't see any logs being written to the database, make sure your environment-specific configs (e.g. config_dev.yml and config_prod.yml) are not overriding monolog.handlers.main.

/app/config/config.yml

monolog:
    handlers:
        main:
            type: service
            level: warning
            id: logger_database

Register the bundle in your application

/app/AppKernel.php

class AppKernel extends Kernel
{
    public function registerBundles()
    {
        $bundles = array(
            ...
            new Tools\LogBundle\ToolsLogBundle(),
            ...
        );

        return $bundles;
    }
}

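That's about it.

Your application logs will now be written to the system_log table within your database (create the table first if it doesn't exist yet, e.g. with php app/console doctrine:schema:update --force).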

Some optional features

Use delayed insert

If your database platform supports it, you may want to delay the inserts to the database (as they are most likely a lower priority than other components of your application).

Output the latest logs within your application

You may want to view your logs within an admin section of your application. This again is pretty simple.

Build your controller

/src/Tools/LogBundle/Controller/AdminController.php

<?php

namespace Tools\LogBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Request;

use Sensio\Bundle\FrameworkExtraBundle\Configuration\Route;

class AdminController extends Controller
{
    /**
     * Lists the latest logs (mounted under /admin/log by the bundle routing)
     *
     * @Route("/")
     */
    public function indexAction(Request $request)
    {
        $em = $this->getDoctrine()->getEntityManager();
        $logs = $em->getRepository('Tools\LogBundle\Entity\SystemLog')->findLatest();

        // The template is rendered explicitly, so no Template annotation is needed
        return $this->render('ToolsLogBundle:default:index.html.twig', array(
            'logs' => $logs
        ));
    }
}

Define your bundle route

/src/Tools/LogBundle/Resources/config/routing.yml

tools_log_admin:
    resource: "@ToolsLogBundle/Controller/AdminController.php"
    type: annotation
    prefix: /admin/log

Add this routing file to your primary routing file.

/app/config/routing.yml

tools_log_bundle:
    resource: "@ToolsLogBundle/Resources/config/routing.yml"

Provide your view (twig) file

/src/Tools/LogBundle/Resources/views/default/index.html.twig

{# extends "YourBundle::layout.html.twig" #}
{% block content %}

        <h1>Logs</h1>

        <table class="table table-striped">
            <thead>
                <tr>
                    <th style="width: 30px;">Id</th>
                    <th style="width: 100px;">Level</th>
                    <th>Log</th>
                    <th style="width: 150px;">Created</th>
                </tr>
            </thead>
            <tbody>
            {% if logs|length > 0 %}
            {% for log in logs %}
                <tr>
                    <td>{{ log.id }}</td>
                    <td>{{ log.level }}</td>
                    <td>{{ log.log }}</td>
                    <td>{{ log.created.format('Y-m-d H:i:s') }}</td>
                </tr>
            {% endfor %}
            {% else %}
                <tr>
                    <td colspan="4">
                    No logs found
                    </td>
                </tr>
            {% endif %}
            </tbody>
        </table>

{% endblock %}

There you have it!

A simple admin page that will show you the latest 200 logs.

The URL will be:
www.yourdomain.com/admin/log

I highly recommend you update your firewall to restrict this page to admin users.

Thursday 31 January 2013