Just noticed that somebody has already made a product/service out of the idea of integrating Hadoop and Cassandra, two of the most hyped names in the Big Data/NoSQL space, which is exactly what I had been planning as the new architecture for my last project.
Datastax now offers Brisk:
- CassandraFS has the same interface as HDFS. So, in particular, you should be able to use most Hadoop add-ons with Brisk.
- CassandraFS has comparable performance to HDFS on sequential scans. That’s without predicate pushdown to Cassandra, which is Coming Soon but won’t be in the first Brisk release.
- Brisk/CassandraFS is much easier to administer than HDFS. In particular, there are no NameNodes, JobTracker single points of failure, or any other form of head node. Brisk/CassandraFS is strictly peer-to-peer.
- Cassandra is far superior to HBase for short-request use cases, specifically with 5-6X the random-access performance.
Check out their white paper.
A project that I worked on while I was a development team manager at VeriSign has served as the basis for an open source version:
EasyRec enables you to include item recommendations based on the behaviour of your website users.
From their website:
- User Actions are sent to the easyrec using the REST API. Possible actions are viewing, buying or rating an item.
- These user actions are stored in the database of the Recommendation Engine.
- The provided Analyzers periodically analyze all recorded data for identifying patterns to generate recommendations.
- These Recommendations can be accessed through calls to the easyrec webservice API and presented to a user.
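To make the four steps above a bit more concrete, here is a minimal in-memory sketch in Python. It is not easyrec's REST API or its analyzers: the class `TinyRecommender`, its method names, and the simple "items touched by the same user" counting are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


class TinyRecommender:
    """Hypothetical in-memory stand-in for the workflow; not easyrec's API."""

    def __init__(self):
        self.actions = []  # steps 1 and 2: recorded (user, action, item) tuples
        self.related = defaultdict(lambda: defaultdict(int))

    def record(self, user, action, item):
        # Step 1/2: accept a user action (view, buy, rate) and store it.
        self.actions.append((user, action, item))

    def analyze(self):
        # Step 3: count how often two items were touched by the same user.
        self.related.clear()
        items_by_user = defaultdict(set)
        for user, _action, item in self.actions:
            items_by_user[user].add(item)
        for items in items_by_user.values():
            for a, b in combinations(sorted(items), 2):
                self.related[a][b] += 1
                self.related[b][a] += 1

    def recommend(self, item, limit=3):
        # Step 4: return the items most often seen together with `item`.
        counts = self.related.get(item, {})
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [other for other, _count in ranked[:limit]]


rec = TinyRecommender()
rec.record("u1", "view", "sword")
rec.record("u1", "buy", "shield")
rec.record("u2", "view", "sword")
rec.record("u2", "buy", "shield")
rec.record("u2", "view", "potion")
rec.analyze()
print(rec.recommend("sword"))
```

A real engine would persist the actions in a database and run the analysis periodically, as the list describes; the sketch only shows how the pieces relate.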
I left VeriSign in 2008, so it will be quite interesting for me to see how it evolved.
I just came across Turiya Media, a startup concentrating on providing data mining for social and mobile games that sell virtual goods.
They were founded in 2009 and won the “Best Startup” prize at last year’s Game Developers Conference, but even now they still appear to be in stealth mode.
Interestingly, I have been thinking about data mining since I worked on Atlantis at plazz entertainment. Having talked with a lot of developers and publishers, I have the impression that most of them rely on basic data aggregations and averages as the basis for their optimization decisions. It would obviously be far more rewarding to segment your customers into groups and target them more specifically: predicting revenue curves and lifetime value, presenting them with items to purchase that other, “similar” users have bought, and so on.
If you are a game developer (or publisher) using data mining techniques, or are interested in this topic, please contact me via Xing or LinkedIn.
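As a rough illustration of why segments tell you more than a single average, here is a small Python sketch. The segment labels, the `heavy_threshold` of 50.0 and the sample numbers are invented for the example; a real segmentation would be derived from actual player data.

```python
from collections import defaultdict
from statistics import mean


def segment_players(spend_by_player, heavy_threshold=50.0):
    """Group players by total spend (labels and threshold are hypothetical)."""
    segments = defaultdict(list)
    for player, spend in spend_by_player.items():
        if spend <= 0:
            label = "non-payer"
        elif spend < heavy_threshold:
            label = "light spender"
        else:
            label = "heavy spender"
        segments[label].append(player)
    return dict(segments)


def average_spend(spend_by_player, segments):
    """Average spend per segment instead of one global average."""
    return {
        label: mean([spend_by_player[p] for p in players])
        for label, players in segments.items()
    }


# Made-up sample data, for illustration only.
spend = {"a": 0.0, "b": 5.0, "c": 120.0, "d": 15.0}
segments = segment_players(spend)
print(mean(spend.values()))             # one number for everybody
print(average_spend(spend, segments))   # one number per segment
```

The global average describes none of the four players well, while the per-segment view separates the one heavy spender from everyone else.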
I have been having mixed feelings about Flash development for a long time.
Fact is, Flash is here to stay – at least for the next 2-3 years, I assume, until maybe HTML5, Unity and/or others have gained more traction.
A question raised by a friend in a recent discussion was: What features, advantages etc. can Flash list on the “plus” side, besides an enormous installed user base? My reply was: “You already answered your question.”
The thing is, it does not actually matter how “bad”, performance-hogging or lacking in 3D features Flash is: if you want to reach a very broad audience, there is still no alternative to it.
Perhaps (un)surprisingly, little has changed since early 2008 when we decided to develop AT.LANT.IS in Flash.
Daniel James of Three Rings confirmed that with the Java version of Puzzle Pirates the majority of users did not even see the start screen of the game, because they first had to agree to “Trust” the Java applet, something they were not used to doing. That raised their suspicion, and as a result they left the game before it had even started. When they changed to Flash with Whirled, that problem was gone. (He gave some exact numbers, as he likes to do, but I can’t remember them.)
So, still a good opportunity for all you Flash coders charging a fortune for some AS3 lines…

I assume this is only interesting for a very limited crowd. Still, it is worth mentioning:
ImageMagick can turn into a complete CPU hog (meaning 99% usage) when several convert processes run at the same time, as described by several users in the forums.
What worked for me: disabling IM’s internal threading. Sounds weird, but apparently there are some issues (especially?/only?) on RHEL/CentOS machines.
After building with configure --disable-openmp, several parallel processes no longer stalled the CPU.
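If rebuilding is not an option, ImageMagick also reads the MAGICK_THREAD_LIMIT environment variable, which caps the number of threads per process. I have not verified that this cures the same stall as the rebuild did, so treat the following Python sketch as an assumption to test on your own machine; the helper names `single_thread_env` and `convert` are made up for the example.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with ImageMagick limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["MAGICK_THREAD_LIMIT"] = "1"
    return env


def convert(src, dst, extra_args=()):
    """Run ImageMagick's convert with threading capped (requires convert on PATH)."""
    cmd = ["convert", src, *extra_args, dst]
    return subprocess.run(cmd, env=single_thread_env(), check=True)
```

Calling `convert("in.png", "out.jpg", ["-resize", "50%"])` from several worker processes would then start each convert with a single thread, leaving the parallelism to the processes themselves.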