We created the Kiji Project to provide a framework for building Big Data applications. Kiji is a collection of modular components that can be combined as needed. The first Kiji component is KijiSchema, which provides a simple Java API for storing and managing typed data in HBase using Avro serialization. KijiSchema includes a simple DDL for defining and managing layouts and schemas, and it supports complex data types, column keys, and time-series data. It also manages evolving cell-level schemas and offers native MapReduce input and output formats.
KijiMR connects KijiSchema to Hadoop MapReduce, simplifying the development of predictive models on up-to-date data. KijiMR includes built-in bulk importers for loading data from a variety of sources directly into KijiSchema. The Gatherer interface lets developers scan over columns of a table and emit key-value pairs for use in MapReduce pipelines. Producers implement computation functions that update individual rows in a KijiSchema table. KijiMR also includes command-line tools that make it easier to launch and test MapReduce jobs without writing boilerplate code, builder APIs for constructing MapReduce jobs programmatically, and key-value store lookups for efficiently performing map-side joins and machine learning model scoring.
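The roles described above can be pictured with a small in-memory sketch. It is not the KijiMR API: the `Row`, `Gatherer`, and `Producer` types, the helper methods, and the column names `info:zip` and `derived:city` are all hypothetical, chosen only to illustrate the division of labor. A producer reads one row and writes a computed value back to it, here using a plain `Map` as a stand-in for a key-value store lookup (a map-side join), and a gatherer scans rows and emits key-value pairs for a later stage.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Illustrative sketch only: every type here is hypothetical, not the KijiMR API.
public class GathererProducerSketch {

    // A table row: an entity id plus a mutable map of column name to value.
    static final class Row {
        final String entityId;
        final Map<String, String> columns = new HashMap<>();
        Row(String entityId) { this.entityId = entityId; }
    }

    // Gatherer-like role: read one row, emit zero or more key-value pairs.
    interface Gatherer<K, V> {
        void gather(Row row, BiConsumer<K, V> emitter);
    }

    // Producer-like role: compute a value from one row and write it back to that row.
    interface Producer {
        void produce(Row row);
    }

    // Applies the gatherer to every row and collects the emitted pairs in scan order.
    static <K, V> List<Map.Entry<K, V>> runGatherer(List<Row> table, Gatherer<K, V> gatherer) {
        List<Map.Entry<K, V>> emitted = new ArrayList<>();
        for (Row row : table) {
            gatherer.gather(row, (k, v) -> emitted.add(new SimpleEntry<K, V>(k, v)));
        }
        return emitted;
    }

    // Applies the producer to every row, updating each row in place.
    static void runProducer(List<Row> table, Producer producer) {
        for (Row row : table) {
            producer.produce(row);
        }
    }

    // Two sample rows, each with a hypothetical "info:zip" column.
    static List<Row> sampleTable() {
        Row alice = new Row("alice");
        alice.columns.put("info:zip", "94110");
        Row bob = new Row("bob");
        bob.columns.put("info:zip", "10001");
        List<Row> table = new ArrayList<>();
        table.add(alice);
        table.add(bob);
        return table;
    }

    // Stand-in for a key-value store: a small in-memory side table.
    static Map<String, String> zipToCity() {
        Map<String, String> store = new HashMap<>();
        store.put("94110", "San Francisco");
        store.put("10001", "New York");
        return store;
    }

    public static void main(String[] args) {
        List<Row> table = sampleTable();
        Map<String, String> store = zipToCity();

        // Producer with a map-side join: look up each row's zip code, write the city back.
        runProducer(table, row ->
            row.columns.put("derived:city", store.get(row.columns.get("info:zip"))));

        // Gatherer: emit (city, entityId) pairs that a downstream reducer could group.
        List<Map.Entry<String, String>> pairs = runGatherer(table, (row, emit) ->
            emit.accept(row.columns.get("derived:city"), row.entityId));

        for (Map.Entry<String, String> pair : pairs) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

In this sketch the side table is held in memory next to the row being processed, so the join needs no shuffle step. That is the general idea behind a map-side join, though how KijiMR itself opens and caches its key-value stores is not shown here.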