Phil Dreizen

XML and JSON are for TRANSPORT

I repeat: XML and JSON are for transporting data.

Just because you received data in XML or JSON form does not mean you need to keep examining, retrieving, and accessing that data in the format in which your program received it. You do not need to continuously re-parse your JSON and XML; you do not need to keep traversing your DOM object.

What you should do is work with XML and JSON (and whatever other format you use for transporting data) at your endpoints only. Take the XML/JSON you receive and turn it into data structures appropriate for your language. Turn it into primitives, into objects, into arrays and maps of primitives and objects. But DON'T KEEP IT AS XML OR JSON!!!

Code that isolates the transport format will be better for several reasons. The code that actually processes your data will be agnostic to whatever transport formats are available. If you later need your system to receive or transmit a new data format, the code that does the actual processing will not need to be modified. Instead, the code at your endpoints will just need to be able to turn NEW_TRANSPORT_FORMAT into native data structures, and native data structures back into NEW_TRANSPORT_FORMAT. The code will also be easier to read. You won't find yourself in weird-XML-processing land, or constantly-dealing-with-JSON-parse-exceptions land. You'll be in completely-normal-for-your-language land. This is especially true for XML, which can be particularly painful to work with, but it's true for JSON too.
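A minimal C sketch of that split, built around a hypothetical sensor-reading message (the struct, the field names and the functions are illustrative, not from any real system). Only reading_from_json and reading_to_json know about JSON; average_celsius works purely on native structs. The sscanf call is a stand-in for a real JSON library and accepts only one fixed message shape, so treat it as an assumption rather than a parser.

```c
#include <assert.h>
#include <stdio.h>

/* Native representation used by all processing code (hypothetical example type). */
struct reading {
    int sensor_id;
    double celsius;
};

/* Endpoint (JSON in): the only function that knows about the wire format.
   sscanf stands in for a real JSON parser and accepts only the fixed,
   hypothetical shape {"sensor_id":N,"celsius":X}, with optional whitespace.
   Returns 1 if both fields were read, 0 otherwise. */
int reading_from_json(const char *json, struct reading *out){
    assert(json != NULL && out != NULL);
    return sscanf(json, " { \"sensor_id\" : %d , \"celsius\" : %lf }",
                  &out->sensor_id, &out->celsius) == 2;
}

/* Processing: works on native structs and never sees JSON. */
double average_celsius(const struct reading *r, size_t n){
    double sum = 0.0;
    for(size_t k = 0; k < n; k++) sum += r[k].celsius;
    return n > 0 ? sum / (double)n : 0.0;
}

/* Endpoint (JSON out): serializes once, right before transmitting.
   %.17g prints enough digits for a double to survive the round trip.
   Returns what snprintf returns (the length the full output needs). */
int reading_to_json(const struct reading *r, char *buf, size_t size){
    assert(r != NULL && buf != NULL);
    return snprintf(buf, size, "{\"sensor_id\":%d,\"celsius\":%.17g}",
                    r->sensor_id, r->celsius);
}
```

Supporting a new transport format would mean writing another pair of endpoint functions; average_celsius would stay exactly as it is.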

When it comes to serializing and deserializing data in binary formats, programmers don't make this mistake. No one would think to deserialize binary data over and over just to access it; the idea is almost absurd. You deserialize your binary data once after receiving it, and serialize it once before transmitting it. But when it comes to marshalling and unmarshalling data in XML or JSON, this mistake seems to happen often enough. (I've never seen it happen with CSV data. Maybe it's TOO painful to work with?)

</RANT>

some very clever code to print an integer

While reading Microprocessors: A Programmer's View, a very old book on different computer architectures, I came across this really clever bit of code for printing an integer as ASCII text:

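#include <stdio.h>

/* Print i in decimal: the recursive call prints the leading digits, then the last digit is printed. */
void itoa(unsigned int i){
    if(i >= 10) itoa(i/10);
    putchar('0' + i%10);
}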

(I changed the function's name and the style of its parameters.)

It's just a really nice example of recursion. (Of course, it appeared in an example about how SPARC's "register window" approach to procedure calls could perform poorly on code like this. This was in 1990.)
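Since itoa writes straight to stdout, it's a bit awkward to reuse or test. Here is a sketch of the same recursion writing into a caller-supplied buffer instead; utoa_buf and utoa_rec are hypothetical names, not from the book, and the 11-byte buffer size assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stddef.h>

/* Same recursion as itoa above: emit the leading digits first (recursive call),
   then the last digit. Returns a pointer just past the last digit written. */
static char *utoa_rec(unsigned int i, char *p){
    if(i >= 10) p = utoa_rec(i/10, p);
    *p++ = (char)('0' + i%10);
    return p;
}

/* Hypothetical buffer-based variant: writes i in decimal into buf as a
   NUL-terminated string and returns buf. With a 32-bit unsigned int, 11 bytes
   (10 digits plus the NUL) are always enough. */
char *utoa_buf(unsigned int i, char *buf){
    assert(buf != NULL);
    *utoa_rec(i, buf) = '\0';
    return buf;
}
```

For example, `char s[11]; utoa_buf(4093, s);` leaves "4093" in s. The recursion depth equals the number of digits, so every digit still costs one procedure call.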

kupad.net: now with more comments! (alpha)

Because at least two people have requested comments, I've implemented a comment system. I haven't tested it much, so if you encounter bugs, please let me know about them! And please, make feature requests. To leave a comment, you'll need to sign in through a third party, like Google or Yahoo.

Adding comments introduces some...issues. So I wasn't originally in a rush to get it done.

The first issue is combating spam. There are lots of options for dealing with it. Widely used options like reCAPTCHA are in an escalating war with spammers, and as a result they've become so difficult to read that I find them too hostile to non-spammers like me. I considered rolling my own ASCII captcha: it would generate random words in ASCII art and prompt the user to enter the word shown. (In fact, I DID develop this and chose not to use it...yet...) Though a system like this would be fairly easy to break, any time spent breaking it would be specific to kupad.net, and not really worth a spammer's time. There are also services like Akismet that probably use Bayesian classifiers and the like to guess whether a particular comment is spam. Akismet is widely used right now, so it's probably a good choice.

Right now I don't have any of these in place. I'm hoping that requiring an OpenID login will reduce spam, though I don't actually know that it will help in any way. I do have a simple honeypot in place. Apparently, spambots can't resist filling in form fields, so I have a form field, hidden from display, that must be left blank for a comment submission to succeed.
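The honeypot check itself is tiny. A C sketch, assuming a hypothetical form_field array produced by whatever parses the POST body, and a made-up field name hp_website (the site's own code and field name aren't shown here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One submitted form field (hypothetical representation). */
struct form_field {
    const char *name;
    const char *value;
};

/* Name of the hidden honeypot input; "hp_website" is a made-up example. */
#define HONEYPOT_FIELD "hp_website"

/* Returns 1 if the submission passes the honeypot check, 0 otherwise.
   A browser submits the hidden input with an empty value, so a filled-in
   value suggests a bot; a missing field is also rejected, since a request
   that never came from the real form will usually lack it. */
int honeypot_ok(const struct form_field *fields, size_t n){
    assert(fields != NULL || n == 0);
    for(size_t k = 0; k < n; k++){
        if(fields[k].name != NULL && strcmp(fields[k].name, HONEYPOT_FIELD) == 0)
            return fields[k].value != NULL && fields[k].value[0] == '\0';
    }
    return 0;
}
```

Rejecting a missing field, not just a filled-in one, is a design choice; it also turns away scripts that post directly without ever loading the form.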

Then there's the concern that comes with any user-submitted data: security. Inviting users to comment invites users to try to break into the site (through things like SQL injection). And, especially since comments are displayed right back on the page, another concern is users leaving malicious JavaScript code in their comments (XSS). Third-party libraries like htmlpurifier help with the latter, at least.
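HTML Purifier filters comment HTML against a whitelist so that some markup can survive. A simpler, stricter technique, swapped in here for illustration, is to escape every HTML special character before echoing a comment back, so no markup survives at all. A C sketch (html_escape and entity_for are hypothetical names):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Entity for an HTML special character, or NULL if c needs no escaping. */
static const char *entity_for(char c){
    switch(c){
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '"':  return "&quot;";
    case '\'': return "&#39;";
    default:   return NULL;
    }
}

/* Returns a newly malloc'd copy of s with HTML special characters replaced by
   entities, or NULL if allocation fails. The caller frees the result. */
char *html_escape(const char *s){
    assert(s != NULL);
    size_t len = 0;
    for(const char *p = s; *p; p++){       /* first pass: measure */
        const char *e = entity_for(*p);
        len += e ? strlen(e) : 1;
    }
    char *out = malloc(len + 1);
    if(out == NULL) return NULL;
    char *q = out;
    for(const char *p = s; *p; p++){       /* second pass: copy and escape */
        const char *e = entity_for(*p);
        if(e){ size_t n = strlen(e); memcpy(q, e, n); q += n; }
        else *q++ = *p;
    }
    *q = '\0';
    return out;
}
```

Escaping like this is meant for text placed inside elements or quoted attributes, not for data inserted into script blocks or URLs. It also does nothing about SQL injection, which is usually handled separately with parameterized queries.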

And what to do about anonymous users? I ultimately decided that having some kind of identity will reduce flaming. So, in order to leave a comment, you'll need to authenticate using OpenID. You'll be able to use lots of services (Google, Yahoo, ...) to authenticate this way.

Finally, there's the fact that there will be bugs. So I'm looking forward to angry friends telling me how they tried to leave a comment but couldn't. Why did I bother implementing this from scratch again?

Starting again...again.

Hi everyone. I'm starting this website (and this server) from scratch after letting it go stale. The server is now running Debian Wheezy.

Some technical details follow. I'm trying something that is, arguably, insane: I'm writing the blog portion of this site in PHP, from scratch, backed by a dbm database. Each key is the time of a post, and each entry is a JSON string representing the post. It's arguably NoSQL-inspired, not that I know much of anything about NoSQL. In any case, something about the arrangement seems "simple" to me. It also lends itself to CLI tools for managing the posts. Also, I must be some kind of masochist.
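A sketch of the key/value scheme in C. The exact key format isn't specified above, so this assumes Unix timestamps zero-padded to 20 digits (so string order matches time order) and posts with just a title and a body; post_key, post_key_cmp and post_to_json are hypothetical names. It stops at producing the key and value strings, since qdbm and gdbm have different native C APIs for the actual store call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical key scheme: the post's Unix time, zero-padded to 20 digits,
   so comparing keys as strings also compares posts chronologically. */
void post_key(time_t t, char key[21]){
    assert(t >= 0);
    snprintf(key, 21, "%020lld", (long long)t);
}

/* qsort-style comparator over key strings (char *), e.g. for a CLI listing:
   hash-based dbm files generally return keys in no particular order. */
int post_key_cmp(const void *a, const void *b){
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Bounded output buffer; ok drops to 0 if anything had to be cut off. */
struct outbuf { char *buf; size_t size, len; int ok; };

static void put(struct outbuf *o, char c){
    if(o->len + 1 < o->size) o->buf[o->len++] = c;  /* keep room for the NUL */
    else o->ok = 0;
}

static void put_str(struct outbuf *o, const char *s){
    while(*s) put(o, *s++);
}

/* Writes s as a JSON string literal: quotes and backslashes are escaped,
   control characters are written as backslash-u plus four hex digits, and
   all other bytes (including UTF-8) pass through unchanged. */
static void put_json_string(struct outbuf *o, const char *s){
    put(o, '"');
    for(; *s; s++){
        unsigned char c = (unsigned char)*s;
        if(c == '"' || c == '\\'){
            put(o, '\\');
            put(o, (char)c);
        } else if(c < 0x20){
            char esc[7];
            snprintf(esc, sizeof esc, "\\u%04x", (unsigned)c);
            put_str(o, esc);
        } else {
            put(o, (char)c);
        }
    }
    put(o, '"');
}

/* Serializes a post with hypothetical fields title and body into buf as a
   JSON object. Returns 1 on success, 0 if buf was too small. */
int post_to_json(const char *title, const char *body, char *buf, size_t size){
    assert(title != NULL && body != NULL && buf != NULL && size > 0);
    struct outbuf o = { buf, size, 0, 1 };
    put_str(&o, "{\"title\":");
    put_json_string(&o, title);
    put_str(&o, ",\"body\":");
    put_json_string(&o, body);
    put(&o, '}');
    buf[o.len] = '\0';
    return o.ok;
}
```

One consequence of keying on time alone: two posts saved within the same second would collide on the same key.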

So, when deciding what kind of dbm to use, I made a "fun" discovery: Wheezy packages php5 with qdbm support and no gdbm support, but packages Python with gdbm support and no qdbm support. And there doesn't seem to be any way to rectify the situation using the repositories. This is quite annoying, since I had been considering using Python and PHP to interact with the same underlying db. On the bright side, it stopped me from doing that.