Are the days of the RDBMS numbered?
Dec 19th, 2007
Most programmers know databases and their importance. Thanks to the new generation of software as a service (SaaS) and web services, traditional RDBMSs are being used more sparingly, and their share is bound to shrink further as enterprises adopt SaaS platforms.
Data has far outgrown the domain of plain text. Today we talk of multimedia data, URLs, semantic data and many more application-specific formats. Information on the Web comes as JSON, XML, microformats and so on, often served through REST interfaces. With this variety in data formats and representations comes an inherent need for flexibility in storing and querying such information. Almost all database users know of the conceptual modelling required to design any database, the key principle being that the tighter the model, the more efficient the database. The integrity of the database is only as good as the integrity of the data. But you cannot talk of data integrity with the kind of formats available today.
Clearly, markup data dominates the web. Though databases have developed features to better support, store and validate markup data, they were never originally designed to store such a wide variety of loosely organized data. Querying such markup data is fruitless, and so is any attempt to index, sort or aggregate it. Developing a custom database capable of all these operations could be a solution, but given the non-standardized nature of this data and how likely it is to change, you would have a tough time scouring the web to keep up with the changes. Plus, these databases would not be semantically interoperable.
Developers are taking notice of a new scheme for storing data; I call it the bucket store. The design is roughly that of a hash table: data blocks are stored in buckets, and hashes are used to index or refer to these buckets. A little improvisation is added in the form of upper layers such as domains and groups, which play the role that schemas and tables play in a database and make the data easy to classify. The advantage of this scheme is heterogeneity in data formats and the absence of constraints.
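To make the bucket store idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the scheme described above, not the API of S3, SimpleDB or CouchDB; the names `BucketStore`, `put`, `get` and the fixed bucket count are all hypothetical choices of mine.

```python
import hashlib
from typing import Any


class BucketStore:
    """Toy bucket store: domain -> hashed bucket -> {key: data block}.

    Hypothetical illustration only; real services differ in every detail.
    """

    def __init__(self, num_buckets: int = 8) -> None:
        self.num_buckets = num_buckets
        # Each domain (the "upper layer") owns its own list of buckets.
        self._domains = {}

    def _bucket_index(self, key: str) -> int:
        # A stable hash of the key decides which bucket holds the block.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def put(self, domain: str, key: str, value: Any) -> None:
        # No schema and no constraints: any kind of value is accepted.
        buckets = self._domains.setdefault(
            domain, [{} for _ in range(self.num_buckets)]
        )
        buckets[self._bucket_index(key)][key] = value

    def get(self, domain: str, key: str, default: Any = None) -> Any:
        buckets = self._domains.get(domain)
        if buckets is None:
            return default
        return buckets[self._bucket_index(key)].get(key, default)


store = BucketStore()
# Heterogeneous data blocks live side by side in the same domain.
store.put("slides", "deck-1", {"title": "Intro", "pages": 12})
store.put("slides", "deck-1.xml", "<deck><title>Intro</title></deck>")
# The same key in another domain is a separate entry.
store.put("users", "deck-1", b"\x00\x01")
```

Note what is missing compared to an RDBMS: there is no query language, no index beyond the key hash, and nothing stops two entries in one domain from having completely different shapes. That is both the flexibility and the cost of the approach.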
Several products offer such services at dirt-cheap prices. Take Amazon’s S3, the recently launched SimpleDB, or CouchDB, which offers a host-it-yourself version of this storage. Amazon S3 has businesses running on top of it; of the many, I can recall SlideShare running on S3. With more mashups and more heterogeneous data being churned out by the web, more such non-DBMS storage options will be employed. Provided that this paradigm implements all the features important to enterprises, such as security, access control, backups and transactions, and that mature modeling methodologies that can rival the ER model are proposed, I don’t see any problem in this becoming the most viable and cost-effective option for data storage.