on Jun 15th, 2009

State of higher education in India with a focus on Computer Science

I just came back from a session on the state of graduate education in India. Here is a summary:

  • Just over 450,000 students in India graduate with an engineering degree.
  • About 150,000 of them graduate with a degree in either Computer Science or Information Technology.
  • There are about 1500 Engineering colleges in India.
  • Many of these colleges don’t even have a full professor on their rolls.
  • Currently about 750 students are pursuing a PhD at 15 of the most reputed institutions in the country, which means that about 80 to 90 students graduate with a PhD from these institutions.
  • The 15 reputed institutions include the IITs, the NITs, two of the IIITs (Hyderabad and Bangalore) and some autonomous institutions like BITS and Vellore.
  • The percentage of students in India who take up graduate education after their engineering degree is very low.
  • About a quarter of the students who earn PhDs from US universities are Indian.
  • Students of Indian and Chinese origin make up half of the graduate students in America.
  • Most people who earn their PhDs from universities in India either join small, focused research groups in IT companies or take up faculty positions.
  • This year the number of students applying for graduate education has increased dramatically, which only reinforces the view that graduate education is seen as a substitute for jobs and not as something of value in itself.
  • A couple of IITs got about 700 applications for master’s and PhD positions.

Apart from all this, the research output in India is not very high. Groups doing theory are considered to be doing some state-of-the-art research, while the other departments are not very highly regarded (I have a problem with this generalization, but we will keep that for another discussion). The researchers present had plenty of explanations for the dismal state of higher education. Some of the points mentioned were:

  • Lack of good, trained and motivated faculty members. This was attributed to the fact that salaries in academia are not on par with those in industry. (The pay commission’s revisions should do some good in this direction.)
  • Lack of exposure to the opportunities, challenges and rewards of research careers. (This is true of colleges that are not very reputed: the quality of the faculty is not up to the mark, which means the students don’t get enough exposure … you get the point.)
  • Societal pressure to secure a job, and that too through college placements, rather than pursue something the student really wants to do. A survey of the choices students make during the engineering seat selection process would confirm this. I even know of people who took up courses they had no interest in just because they were offered at a college where the placements were good.
  • Lack of funding for graduate students to attend conferences, workshops etc. (This was contested by a lot of people. I think the real problem lies in making students aware of the funds that are available for such purposes.)
  • Discrimination in favour of students who graduate from the IITs over those from other institutions. (Strong alumni networks are nothing new, though. Other colleges should aim to strengthen their own alumni networks and not work as silos.)

This is where I found the IIITs (particularly Hyderabad and Bangalore) to be very innovative in their approach. They are situated in the heartland of what can be considered the seat of innovation in India. Both have strong collaborations with the indigenous and multinational companies based in their respective cities, and they provide a wonderful platform for students to explore a mix of academic research and industry-relevant work in information technology. Both IIIT-H and IIIT-Bangalore have achieved recognition for their quality in industry and academia, and in good time too. I am positive that in a few years’ time these institutions will be deeply connected to the research and development communities of the information technology industry in India and will contribute significantly to the intellectual output of the country.

Disclaimer: the numbers mentioned in this post are courtesy of Ashwani Sharma, part of the External Research Programs team at Microsoft Research India.

on May 26th, 2009

Design Patterns Quick Reference

I am back to designing some software and want to use all my knowledge of object orientation and patterns to tackle common design problems. I found this great reference for the most commonly used design patterns that I must share. It lists all 23 of the core design patterns from the Gang of Four book. If you know what this is, print it out and revisit your designs. Thanks to Mark Turansky for the original upload.

Design Patterns card 2

Design patterns card 1

on Apr 28th, 2009

What goes into a good resume

I will be graduating soon and I am looking for good positions in Bangalore. My areas of interest can be found here. As a result, it’s time for me to redo my resume. I have always wondered what makes a good resume. Should there be an objective? I mean, it’s a resume, which already means you are looking for a job, so why the objective? Or should you list your achievements? The right question would be: what have you achieved that others will see as an achievement? Should I put experience above education? Should I put that section called personal info at the end?

Give me your input on what should go into a resume and what shouldn’t. If this turns out to be a good discussion, I am sure it will help out a lot of people like me.

Update: After receiving some feedback about my own resume, I am adding some more tips.

  • Even if, like me, you don’t believe that technology matters, you have to list the technologies you know in your resume. This is required because the HR staff who look at resumes usually filter them based on the skills mentioned. Leaving out the skills section will only keep your resume away from good opportunities.
  • It’s very important that you provide your contact information in multiple forms: phone number, at least two email IDs, home phone etc.
  • Nobody cares that you won first prize in your school’s annual dancing competition or that you helped organize your college fest. A recruiter told me that such things are useful only if you are applying for BPO jobs where you have to prove your leadership skills.
  • Do not write essays about your projects. Keep the descriptions short and let the recruiters/interviewers quiz you about them. This leaves more time for conversation and a healthy dialogue.

on Apr 13th, 2009

Limitations and Challenges in Cloud Computing for Applications

I was supposed to take part in a discussion about cloud computing at Cloudcamp Bangalore, but due to other commitments I could not attend the event. I had prepared a short writeup about the limitations and challenges of application clouds. Here is the full text of it.

Cloud computing is a way of providing dynamically scalable and available resources, such as computation and storage, as a service to users, who can use them to deploy their applications and data. Cloud computing can handle data in both the public and the private domain. But this seemingly harmless way of building applications has its own set of issues. I am primarily referring to application cloud providers, the kind where you deploy your applications, and not to storage and service clouds. Google AppEngine would be a good example of the kind of cloud I am describing. I note some of the issues here:

From the user’s perspective:

  1. New, unstructured and non-standard programming paradigm: Each cloud has its own supported programming languages and syntax requirements, though most of these clouds expose the typical hashtable-based cache and datastore interfaces. There is an urgent need to standardize these interfaces and the methods of programming against them. One of the reasons shared hosting environments work so well is that, as a programmer, I know I can move my PHP/Perl code to another server and it will work without too much fuss. Moving from one of the dozen-odd cloud providers to another requires considerable development effort, not to mention time (for businesses, this could spell doom). A look back at history shows languages like SQL and C being standardized to stop exactly this sort of undesirable proliferation.
  2. Restrictions on the programming model: For cloud-based applications to be highly available, they must be easy to mirror dynamically on multiple machines. Once mirrored, they can be served on demand by load-balancing servers, which keeps them highly available and spares users delays in being serviced. This is an old trick used by busy websites since the early days of web publishing, but those solutions were custom built for each website. Extending the concept to cloud platforms that serve thousands of applications requires the platform providers to automate replication and mirroring, which is easier said than done. The process can be made seamless when the program stores as little state information as possible. By state, I mean transactional variables, static variables, variables in the context of the entire application etc. These are almost a given in traditional programming environments but are very hard to come by in cloud-based environments. The unnatural workaround is to use the datastore or the cache to store the state of an application. There are also many other restrictions, such as the lack of privileges to install third-party libraries and no access to the file system to write files (which forces you to use the datastore and pay for it).
  3. A good local debugging experience: A good local development and debugging environment is a must for programming on the cloud, yet most cloud providers do not offer one. There is also a lack of good IDEs that can help with writing and debugging programs for the cloud. The providers that do offer a local debugging experience do not simulate real cloud-like conditions. Both from my personal experience and from conversations with other developers, I have come to realize that most people face problems when moving code from their local development servers to the actual cloud. These problems come from inconsistencies between the behavior of the local development environment and that of the cloud.
  4. Appropriate metrics and documentation of programming best practices: On a cloud, since a user pays for almost every CPU cycle, appropriate metrics on the usage of processing time and memory must be presented to the users. Typically, a profile of the application listing function names with the time taken, memory used and processing cycles consumed by each will help developers tune their code to use less processing power. However, most cloud providers do not provide enough statistics or profiling capabilities. The best solution is for cloud providers to abstract common code patterns into optimized libraries, so that users can be assured they are running the most efficient code for a given operation. An example of this is Apache Pig, which provides a scripting-like interface for data analysis on Apache Hadoop.
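
To make point 2 concrete, here is a minimal Python sketch of why in-memory state breaks once an application is mirrored, and how moving the state into a shared datastore or cache works around it. Everything in it is hypothetical and for illustration only: `SharedStore` is an in-memory stand-in for a provider's hashtable-style datastore, and `serve` imitates a round-robin load balancer. It is not any real provider's API.

```python
class SharedStore:
    """Hypothetical stand-in for a cloud datastore/cache with a key-value API."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class StatefulCounter:
    """Keeps the hit count in instance memory, so every mirror has its own copy."""

    def __init__(self):
        self.hits = 0

    def handle(self):
        self.hits += 1
        return self.hits


class StatelessCounter:
    """Keeps nothing between requests; the count lives in the shared store."""

    def __init__(self, store):
        self.store = store

    def handle(self):
        hits = self.store.get("hits", 0) + 1
        self.store.put("hits", hits)
        return hits


def serve(instances, requests):
    """Imitate a round-robin load balancer; return what each request saw."""
    return [instances[i % len(instances)].handle() for i in range(requests)]


# Two mirrors of each application, four requests each.
stateful = serve([StatefulCounter(), StatefulCounter()], 4)

store = SharedStore()
stateless = serve([StatelessCounter(store), StatelessCounter(store)], 4)

print(stateful)   # [1, 1, 2, 2]  (each mirror counts on its own)
print(stateless)  # [1, 2, 3, 4]  (one consistent count)
```

Note that the read-then-write in `StatelessCounter.handle` is not atomic. On a real datastore with concurrent requests it would need a transaction or an atomic increment, which is exactly the kind of construct the provider has to supply.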

From the provider’s perspective:

Here I look at challenges that cloud providers have to face:

  1. Ensuring availability of the cloud: This is crucial because clouds host critical business applications, for which downtime means monetary losses. Effective monitoring and load-balancing solutions have to be built. Most clouds employ virtualization technology to get the most out of any resource. In such cases, tools should be written to detect a resource hog early and move that application to a more powerful grid or machine, so that the other users get their share of the cloud without delays.
  2. Ensuring consistency: Both data and code are replicated on the cloud, and maintaining the consistency of data is crucial. This is the reason why most transactional updates are not allowed on the cloud. For example, sequence objects, which are almost a given in traditional databases, are not provided, probably because maintaining state across machines for such statements is non-trivial. Problems like distributed updates, locking, partitioning and sharding arise when dealing with data. Such constructs have to be provided to users, as most of them are a given in the non-cloud deployment space.
    Most datastores provided by cloud vendors (except the ones that provide cloud-based database services) do not support relational models, which means all object relations have to be established programmatically. This can easily lead to bad code, unnecessary joins, cascading problems and the many other problems developers faced before they had relational datastores to work with.
  3. Program verification: One of the biggest worries about deploying applications on the cloud is the correctness of the program in execution. Erroneous conditions, like infinite loops, can not only put the machine at risk of being overloaded and unavailable, but also cost the user a significant amount of money. Techniques like static analysis should be used to check code uploaded to the cloud for infinite loops, possible race conditions, null references, unreachable code etc. The uploaded code should also be optimized, or users should be given suggestions on how to optimize it to make the best use of the available resources.
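
As a rough illustration of the static analysis mentioned in point 3, the sketch below uses Python's standard `ast` module to flag `while` loops whose condition is a constant true value and whose body contains no `break`, `return` or `raise`. The function name `find_suspect_loops` and the sample code are made up for this example. The check is a deliberately coarse heuristic: it treats an exit statement in a nested loop as an exit from the outer loop, and it ignores exits such as `sys.exit()`, so it is nowhere near what a real provider would need.

```python
import ast


def find_suspect_loops(source):
    """Return line numbers of `while <constant true>` loops with no visible exit."""
    suspects = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.While):
            continue
        test = node.test
        # Only loops like `while True:` or `while 1:` are considered.
        if not (isinstance(test, ast.Constant) and bool(test.value)):
            continue
        # Coarse check: any break/return/raise anywhere inside counts as an exit.
        has_exit = any(
            isinstance(child, (ast.Break, ast.Return, ast.Raise))
            for child in ast.walk(node)
        )
        if not has_exit:
            suspects.append(node.lineno)
    return sorted(suspects)


sample = """
def poll():
    while True:
        pass

def read(queue):
    while True:
        item = queue.get()
        if item is None:
            break
"""

print(find_suspect_loops(sample))  # [3]
```

Deciding whether an arbitrary program terminates is undecidable in general, so a provider can only ever catch simple patterns like this one and would still need runtime limits on CPU time as a backstop.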

Conclusion: The cloud should become a complete, non-restrictive platform for applications. There should be no restrictions on the constructs, functionality and privileges available on the cloud. It should also be dead simple to move everyday applications onto the cloud without too much rework. This could mean writing migration utilities, import/export options and other artifacts that make the transition to a cloud much easier. This will prove essential because most live applications, at least currently, do not run on a cloud, and helping them migrate easily will mean more revenue and adoption.

on Mar 27th, 2009

Moffe – My own friendfeed emulator.

I started using friendfeed recently and have been using ff to post links and other interesting artifacts I find on the net. The ease with which I can post links from ff increased not only my posts on ff, and indirectly on twitter, but also the frustration of my friends. Instead of going directly to the link I shared, they now had to go to friendfeed and then, after another click, go to the address that I had shared.

I used friendfeed because I wanted to start a conversation around the items I shared. But, as with all social media sites, my ability to get the conversation started depended directly on the number of people who were using friendfeed. So I sat down to fix the problem myself, and after one night’s work, Moffe was born.

For the uninitiated, moffe stands for My Own FriendFeed Emulator. It offers the same features that friendfeed provides and also gives people an easy way to leave comments on the items I share. Plus, the link that I shared is embedded directly into the page. The outcome is that people can now leave comments on items I share and see the page, all with one click. I have also incorporated canned comments for the restless user who doesn’t have time to write a comment. Plus, I get to monitor moffe on this cool dashboard.

In essence, it’s a microsharing service which lets me keep my content on my site and not rely on other services like friendfeed, twitblogs etc. Thanks to Easy on the Slaw for giving me the twitter wrapper in PHP.