Aaron Cordova's Blog

Oracle learns from MapReduce

Reading the technical whitepaper on Oracle's Exadata reveals they've learned a thing or two from the MapReduce camp:

1. push computation to the data - From the paper "Exadata pushes SQL processing as close to the data (or disks) as possible and gets all the disks operating in parallel. This reduces CPU consumption on the database server, consumes much less bandwidth moving data between database servers and storage servers, and returns a query result set rather than entire tables."

2. automate parallelization details - "The database servers and Exadata Storage Server Software communicate using the iDB – the Intelligent Database protocol. iDB is implemented in the database kernel and transparently maps database operations to Exadata-enhanced operations. iDB implements a function shipping architecture in addition to the traditional data block shipping provided by the database. iDB is used to ship SQL operations down to the Exadata cells for execution and to return query result sets to the database kernel."

3. build in fault-tolerance - "The disk mirroring provided by ASM, combined with hot swappable Exadata disks, ensure the database can tolerate the failure of individual disk drives. Data is mirrored across cells to ensure that the failure of a cell will not result in loss of data, or inhibit data accessibility."
This is OK, but how about moving the failure handling up a layer, into the application? That way, entire servers, switches, RAID controllers, and even whole racks can fail too, without losing data or availability.
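
To make points 1 and 3 concrete, here is a minimal sketch of shipping a filter to the nodes that hold the data and handling node failure in the application instead of the storage layer. Everything in it (StorageNode, NodeDown, run_query, the sample rows) is a hypothetical name made up for illustration. It is not Exadata's iDB protocol or any Hadoop API, just the general idea under those assumptions.

```python
# Illustrative sketch only: StorageNode, NodeDown and run_query are
# hypothetical names, not part of Exadata, iDB, or Hadoop.

class NodeDown(Exception):
    """Raised when a (simulated) storage node cannot be reached."""


class StorageNode:
    def __init__(self, name, partitions, alive=True):
        self.name = name
        self.partitions = partitions  # {partition_id: list of row dicts}
        self.alive = alive

    def scan(self, partition_id, predicate):
        """Run the filter next to the data and return only matching rows."""
        if not self.alive:
            raise NodeDown(self.name)
        return [row for row in self.partitions[partition_id] if predicate(row)]


def run_query(replicas, predicate):
    """Ship `predicate` to one replica per partition, failing over on error."""
    results = []
    for partition_id, nodes in replicas.items():
        for node in nodes:
            try:
                results.extend(node.scan(partition_id, predicate))
                break  # this partition is done
            except NodeDown:
                continue  # the application simply tries the next replica
        else:
            raise RuntimeError(f"all replicas of partition {partition_id} are down")
    return results


p0 = [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}]
p1 = [{"id": 3, "amount": 500}, {"id": 4, "amount": 7}]

node_a = StorageNode("a", {0: p0, 1: p1}, alive=False)  # simulate a dead server
node_b = StorageNode("b", {0: p0})
node_c = StorageNode("c", {1: p1})

# Each partition lives on two nodes; node_a is listed first and is down.
replicas = {0: [node_a, node_b], 1: [node_a, node_c]}
big = run_query(replicas, lambda row: row["amount"] > 10)
print(big)
```

Only the matching rows cross the "network" here, and losing node_a costs nothing but a retry, because the copies live on separate machines rather than on mirrored disks inside one box.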

Very good, Oracle. Now if they could only learn a couple more lessons:

1. the value of using commodity hardware - this allows customers to leverage all the work being done by other users to make the hardware cost effective. By using the same type of servers used for everyday web/file/mail servers, you're always sure to get improvements and have a wide range of vendors - oops. That's one lesson Oracle refuses to learn. Customers hate the kind of lock-in Oracle promotes. And besides being widely used, commodity hardware is cheap. Oracle Exadata uses InfiniBand extensively to achieve high enough throughput to support a lot of synchronization across its architecture. Which brings us to our next lesson.

2. learn to relax - your constraints - Exadata implements full SQL, which requires a lot of synchronization across the architecture, making fast interconnects like InfiniBand necessary. InfiniBand is expensive. Scalability has to include your budget too - sure, if everyone had Oracle's money we could scale out using InfiniBand, but who wants their network costs to grow exponentially? Relaxing constraints - like throwing out referential integrity, multi-row transactions and such - to avoid all global synchronization is the only way to scale beyond a few really expensive racks.
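
As a rough illustration of what relaxing constraints buys you, here is a toy hash-partitioned store. ShardedStore and its methods are hypothetical names invented for this sketch, not any real system's API. The assumption is that every operation touches a single row, so each put or get lands on exactly one shard and no shard ever has to coordinate with another.

```python
import zlib

# Illustrative sketch only: ShardedStore is a hypothetical toy, with plain
# dicts standing in for separate commodity servers.

class ShardedStore:
    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Deterministic hash so a key always maps to the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, row):
        # Single-row write: exactly one shard is involved, no global lock.
        self.shards[self._shard_for(key)][key] = row

    def get(self, key):
        # Single-row read: again, only one shard is contacted.
        return self.shards[self._shard_for(key)].get(key)


store = ShardedStore(4)
store.put("user:1", {"name": "alice"})
store.put("user:2", {"name": "bob"})
print(store.get("user:1"))
```

What is missing is the point: there is no foreign-key check and no transaction spanning user:1 and user:2, so adding shards adds capacity without adding cross-node chatter.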