Tuesday, October 26, 2010

Notes from porting C# code to PHP

(Summary: I ported my TrueSkill implementation from C# to PHP and posted it on GitHub. It was my first real encounter with PHP and I learned a few things.)

I braced for the worst.

After years of hearing negative things about PHP, I had been led to believe that touching it would rot my brain. Ok, maybe that’s a bit much, but its reputation had me believe it was full of bad problems. Even the cool kids had issues with PHP. But I thought that it couldn’t be too bad because there was that one website that gets a few hits using a dialect of it. When Kaggle offered to sponsor a port of my TrueSkill C# code to PHP, I thought I’d finally have my first real encounter with PHP.


<?php echo "Disclaimer:"; ?>

To make the port quick, I kept most of the design and class structure from my C# implementation. This led to a less-than-optimal result since PHP really isn’t object-oriented. I didn’t do a deep dive on redesigning it in the native PHP way. I stuck with the philosophy that you can write quasi-C# in any language. Also, I didn’t use any of the web and database features that motivate most people to choose PHP in the first place. In other words, I didn’t cater to PHP’s specialty, so my reflections are probably an unfair and biased comparison as I was not using PHP the way it was intended. I expect that I missed tons of great things about PHP.

Personal disclaimers aside, even PHP book authors don’t claim that it’s the nicest language. Instead, they highlight the language’s popularity. I sort of got the feeling that people mainly choose PHP in lieu of languages like C# because of its current popularity and its perception of having a lower upfront cost, especially among cash-strapped startups. Matt Doyle, author of Beginning PHP 5.3, wrote the following while comparing PHP to other languages:

“Many would argue that C# is a nicer, better-organized language to program in than PHP, although C# is arguably harder to learn. Another advantage of ASP.NET is that C# is a compiled language, which generally means it runs faster than PHP’s interpreted scripts (although PHP compilers are available).” - p.5

He continued:

“ASP and ASP.NET have a couple of other disadvantages compared to PHP. First of all, they have a commercial license, which can mean spending additional money on server software, and hosting is often more expensive as a result. Secondly, ASP and ASP.NET are fairly heavily tied to the Windows platform, whereas the other technologies in this list are much more cross-platform.” - p.5

Next, he hinted that Ruby might eventually replace PHP’s reign:

“Like Python, Ruby is another general-purpose language that has gained a lot of traction with Web developers in recent years. This is largely due to the excellent Ruby on Rails application framework, which uses the Model-View-Controller (MVC) pattern, along with Ruby’s extensive object-oriented programming features, to make it easy to build a complete Web application very quickly. As with Python, Ruby is fast becoming a popular choice among Web developers, but for now, PHP is much more popular.” - p.6

and then elaborating on why PHP might be popular today:

“[T]his middle ground partly explains the popularity of PHP. The fact that you don’t need to learn a framework or import tons of libraries to do basic Web tasks makes the language easy to learn and use. On the other hand, if you need the extra functionality of libraries and frameworks, they’re there for you.” - p.7

Fair enough. However, to really understand the language, I needed to dive in personally and experience it firsthand. I took notes during the dive about some of the things that stuck out.

The Good Parts

  • It’s relatively easy to learn and get started with PHP. As a C# developer, I was able to pick up PHP in a few hours after a brief overview of the syntax from a book. Also, PHP has some decent online help.
  • PHP is available on almost all web hosts these days at no extra charge (in contrast with ASP.NET hosting). I can’t emphasize this enough because it’s a reason why I would still consider writing a small website in it.
  • I was pleasantly surprised to have unit test support with PHPUnit. This made me feel at home and made it easier to develop and debug code.
  • It’s very easy and reasonable to create a website in PHP using techniques like Model-View-Controller (MVC) designs that separate the view from the actual database model. The language doesn’t seem to pose any hindrance to this.
  • PHP has a “static” keyword that is sort of like a static version of a “this” reference. This was useful in creating a quasi-static “subclass” of my “Range” class for validating player and team sizes. This feature is formally known as late static binding.

The “When in Rome...” Parts

  • Class names use PascalCase while functions tend to use lowerCamelCase like Java whereas C# tends to use PascalCase for both. In addition, .NET in general seems to have more universally accepted naming conventions than PHP has.
  • PHP variables have a ‘$’ prefix which makes variables stick out:

    function increment($someNumber)
    {
    $result = $someNumber + 1;
    return $result;
    }
    This convention was probably copied from Perl’s scalar variable sigil. This makes sense because PHP was originally a set of Perl scripts intended to be a simpler Perl.
  • You access class members and functions using an arrow operator (“->”) like C++ instead of the C#/Java dot notation (“.”). That is, in PHP you say “$someClass->someMethod()” instead of “someClass.someMethod()
  • The arguments in a “foreach” statement are reversed from what C# uses. In PHP, you write:

    foreach($allItems as $currentItem) { ... }
    instead of the C# way:

    foreach(currentItem in allItems) { ... }
    One advantage to the PHP way is its special syntax that makes iterating through key/value pairs in an map easier:

    foreach($someArray as $key => $value) { ... }
    vs. the C# way of something like this:

    foreach(var pair in someDictionary)
    {
    // use pair.Key and pair.Value
    }
  • The “=>” operator in PHP denotes a map entry as in

    $numbers = array(1 => ‘one’, 2 => ‘two’, ...)
    In C#, the arrow “=>” is instead used for a lightweight lambda expression syntax:

    x => x * x
    To define the rough equivalent of the PHP array, you’d have to write this in C#

    var numbers = new Dictionary<int, string>{ {1, "one" }, {2, "two"} };
    On the one hand, the PHP notations for maps is cleaner, but it comes at a cost of having no lightweight lambda syntax (more on that later).
  • PHP has some “magical methods” such as “__construct” and “__toString” for the equivalent of C#’s constructor and ToString functionality. I like C#’s approach here, but I’m biased.

The “Ok, I guess” Parts

  • The free NetBeans IDE for PHP is pretty decent for writing PHP code. Using it in conjunction with PHP’s XDebug debugger functionality is a must. After my initial attempts at writing code with a basic notepad, I found NetBeans to be a very capable editor. My only real complaint with it is that I had some occasional cases where the editor would lock up and the debugger wouldn’t support things like watching variables. That said, it’s still good for being a free editor.
  • By default, PHP passes function arguments by value instead of by reference like C# does it. This probably caused the most difficulty with the port. Complicating things further is that PHP references are not like references in other languages. For example, using references usually incurs a performance penalty since extra work is required.
  • You can’t import types via namespaces alone like you can in C# (and Java for that matter). In PHP, you have to import each type manually:

    use Moserware\Skills\FactorGraphs\ScheduleLoop;
    use Moserware\Skills\FactorGraphs\ScheduleSequence;
    use Moserware\Skills\FactorGraphs\ScheduleStep;
    use Moserware\Skills\FactorGraphs\Variable;
    whereas in C# you can just say:

    using Moserware.Skills.FactorGraphs;
    PHP’s way makes things explicit and I can see that viewpoint, but it was a bit of a surprising requirement given how PHP usually required less syntax.
  • PHP lacks support for C#-like generics. On the one hand, I missed the generic type safety and performance benefits, but on the other hand it forced me to redesign some classes to not have an army of angle brackets (e.g. compare this class in C# to its PHP equivalent).
  • You have to manually call your parent class’s constructor in PHP if you want that feature:

    class BaseClass
    {
    function __construct() { ... }
    }

    class DerivedClass extends BaseClass
    {
    function __construct()
    {
    // this line is optional, but if you omit it, the BaseClass constructor will *not* be called
    parent::__construct();
    }
    }
    This gives you more flexibility, but it doesn’t enforce C#-like assumptions that your parent class’s constructor was called.
  • PHP doesn’t seem to have the concept of an implicit “$this” inside of a class. This forces you to always qualify class member variables with $this:

    class SomeClass
    {
    private $_someLocalVariable;
    function someMethod()
    {
    $someMethodVariable = $this->_someLocalVariable + 1;
    ...
    }
    }
    I put this in the “OK” category because some C# developers prefer to always be explicit on specifying “this” as well.
  • PHP allows you to specify the type of some (but not all kinds) of the arguments of a function:

    function myFunction(SomeClass $someClass, array $someArray, $someString) { ... }
    This is called “type hinting.” It seems that it is designed for enforcing API contracts instead of general IDE help as it actually causes a decrease in performance.
  • PHP doesn’t have the concept of LINQ, but it does support some similar functional-like concepts like array_map and array_reduce.
  • PHP has support for anonymous functions by using the “function($arg1, ...) {}” syntax. This is sort of reminiscent of how C# did the same thing in version 2.0 where you had to type out “delegate.” C# 3.0 simplified this with a lighter weight version (e.g. “x => x * x”). I’ve found that this seemingly tiny change “isn’t about doing the same thing faster, it allows me to work in a completely different manner” by employing functional concepts without thinking. It’s sort of a shame PHP didn’t elevate this concept with concise syntax. When C#’s lambda syntax was introduced in 3.0, it made me want to use them much more often. PHP’s lack of something similar is a strong discourager to the functional style and is a lesson that C++ guys have recently learned.
  • Item 4 of the PHP license states:
    Products derived from this software may not be called “PHP”, nor may “PHP” appear in their name, without prior written permission from group@php.net. You may indicate that your software works in conjunction with PHP by saying “Foo for PHP” instead of calling it “PHP Foo” or “phpfoo”
    This explains why you see carefully worded names like “HipHop for PHP” rather than something like “php2cpp.” This technically doesn’t stop you doesn’t stop you from having a project with the PHP name in it (e.g. PHPUnit) so long as the official PHP code is not included in it. However, it’s clear that the PHP group is trying to clean up its name from tarnished projects like PHP-Nuke. I understand their frustration, but this leads to an official preference for names like Zope and Smarty that seem to be less clear on what the project actually does. This position would be like Microsoft declaring that you couldn’t use the “#” suffix or the “Implementation Running On .Net (Iron)” prefix in your project name (but maybe that would lead to more creativity?).

The Frustrating Parts:

  • As someone who’s primarily worked with a statically typed language for the past 15 years, I prefer upfront compiler errors and warnings that C# offers and agree with Anders Hejlsberg’s philosophy:
    “I think one of the reasons that languages like Ruby for example (or Python) are becoming popular is really in many ways in spite of the fact that they are not typed... but because of the fact that they [have] very good metaprogramming support. I don’t see a lot of downsides to static typing other than the fact that it may not be practical to put in place, and it is harder to put in place and therefore takes longer for us to get there with static typing, but once you do have static typing. I mean, gosh, you know, like hey -- the compiler is going to report the errors before the space shuttle flies instead of whilst it’s flying, that’s a good thing!”
    But more dynamic languages like PHP have their supporters. For example, Douglas Crockford raves about JavaScript’s dynamic aspects:
    “I found over the years of working with JavaScript... I used to be of the religion that said ‘Yeah, absolutely brutally strong type systems. Figure it all out at compile time.’ I've now been converted to the other camp. I've found that the expressive power of JavaScript is so great. I've not found that I've lost anything in giving up the early protection [of statically compiled code]”
    I still haven’t seen where Crockford is coming from given my recent work with PHP. Personally, I think that given C# 4.0’s optional support of dynamic objects, the lines between the two worlds are grayer and that with C# you get the best of both worlds, but I’m probably biased here.
  • You don’t have to define variables in PHP. This reduces some coding “ceremony” to get to the essence of your code, but I think it removes a shock absorber/circuit-breaker that can be built into the language. This “feature” turned my typo into a bug and led to a runtime error. Fortunately, options like E_NOTICE can catch these, but it caught me off guard. Thankfully, NetBean’s auto-completion saved me from most of these types of errors.
  • PHP has built-in support for associative arrays, but you can’t use objects as keys or else you’ll get an “Illegal Offset Type” error. Because my C# API heavily relied on this ability and I didn’t want to redesign the structure, I created my own hashmap that supports object keys. This omission tended to reinforce the belief that PHP is not really object oriented. That said, I’m probably missing something and did it wrong.
  • PHP doesn’t support operator overloading. This made my GaussianDistribution and Matrix classes a little harder to work with by having to invent explicit names for the operators.
  • PHP lacks support for a C#-like property syntax. Having to write getters and setters made me feel like I was back programming in Java again.
  • My code ran slower in PHP. To be fair, most of the performance problem was in my horribly naive matrix implementation which could be improved with a better implementation. Regardless, it seems that larger sites deal with PHP’s performance problem by writing critical parts in compiled languages like C/C++ or by using caching layers such as memcached. One interesting observation is that the performance issue isn't really with the Zend Engine per-se but rather the semantics of the PHP language itself. Haiping Zhao on the HipHop for PHP team gave a good overview of the issue:
    “Around the time that we started the [HipHop for PHP] project, we absolutely looked into the Zend Engine. The first question you ask is 'The Zend Engine must be terribly implemented. That’s why it’s slow, right?' So we looked into the Zend Engine and tried different places, we looked at the hash functions to see if it’s sufficient and look some of the profiles the Zend Engine has and different parts of the Zend Engine. You finally realize that the Zend Engine is pretty compact. It just does what it promises. If you have that kind of semantics you just cannot avoid the dynamic function table, you cannot avoid the variable table, you just cannot avoid a lot of the things that they built... that’s the point that [you realize] PHP can also be called C++Script because the syntax is so similar then you ask yourself, 'What is the difference between the speed of these two different languages and those are the items that are... different like the dynamic symbol lookup (it’s not present in C++), the weak typing is not present in C++, everything else is pretty much the same. The Zend Engine is very close to C implementation. The layer is very very thin. I don’t think we can blame the Zend Engine for the slowness PHP has.”
    That said, I don’t think that performance alone would stop me from using PHP. It’s good enough for most things. Furthermore, I'm sure optimizers could use tricks like what the DLR and V8 use to squeak out more performance. However, I think that in practice, there is a case of diminishing returns where I/O (and not CPU time) typically become the limiting factor.

Parting Thoughts

Despite my brief encounter, I feel that I learned quite a bit and feel comfortable around PHP code now. I think my quick ramp-up highlights a core value of PHP: its simplicity. I did miss C#-like compiler warnings and type safety, but maybe that’s my own personal acquired taste. Although PHP does have some dubious features, it’s not nearly as bad as some people make it out to be. I think that its simplicity makes it a very respectable choice for the type of things it was originally designed to do like web templates. Although I still wouldn’t pick PHP as my first choice as a general purpose web programming language, I can now look at its features in a much more balanced way.

P.S. I’d love to hear suggestions on how to improve my implementation and learn where I did something wrong. Please feel free to use my PHP TrueSkill code and submit pull requests. As always, feel free to fork the code and port it to another language like Nate Parsons did with his JSkills Java port.