Miró is our integrated analytical tool covering data extraction, manipulation, exploration, reporting, prediction (including uplift modelling), and test-driven data analysis. It features a web-based interface for mixed text and graphical output, as well as off-line script execution, and a Python API.
Almost every data science project begins with an exploratory phase in which the analyst learns about the data and tests ideas, usually using a mixture of ad hoc querying and aggregation, visualization, filtering, profiling, segmentation, deriving new fields and so forth. Miró is particularly well suited to this phase, and makes that work more valuable by keeping an executable audit trail of what has been done, so that the initial analysis can be carried forward efficiently into a more production-ready form.
Miró implements production-oriented analytics, meaning that it focuses on allowing analysts to get results as quickly and painlessly as possible, from data import to production-ready or near-production-ready output. Its Unix-style command-line interface is normally accessed through a web browser, allowing rich text and graphical output, but is also fully functional through a plain-text terminal, either locally or on a remote server.
Miró generates high-quality output, both textual and graphical, drawing inspiration from Edward Tufte: minimizing chart junk and maximizing meaningful information content. It can also produce animated output, HTML reports, PDF reports (through LaTeX source generation), text files and Excel spreadsheets, and can write directly to database tables.
Miró includes all the functionality from our open-source TDDA library for test-driven data analysis, together with various enhancements including
Miró reads and writes the same TDDA files as the open-source version, allowing the two to be mixed, but gives a more seamless, polished, supported experience compared with the open-source package.
In addition to standard predictive modelling approaches, Miró incorporates uplift modelling as a core analytical capability. Uplift models are used to analyse marketing campaigns in which a randomized control group has been kept. Uplift trees model the difference in behaviour between the members of the treated and control populations, helping marketers to understand which actions are effective for which segments and (equally importantly) which actions have negative effects on other segments. This is extremely powerful in the context of customer retention and sales campaigns such as cross-selling, up-selling and deep-selling. Miró features not only integrated significance-based uplift trees but also a suite of support tools for operations including
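As a rough illustration of the quantity an uplift model targets, the Python sketch below computes, for each segment, the difference in response rate between treated and control customers, together with a simple two-proportion z statistic as a significance check. The record layout, segment names and data are hypothetical, and this is not Miró's uplift-tree algorithm: a tree would discover the segments itself by splitting on customer attributes, whereas here they are given in advance.

```python
import math
from collections import defaultdict


def uplift_by_segment(records):
    """Per-segment uplift: treated response rate minus control response rate.

    records: iterable of (segment, treated, responded) tuples (hypothetical layout).
    Returns {segment: (uplift, z)}; segments lacking either group are skipped.
    """
    # Per segment: [treated n, treated responders, control n, control responders]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for segment, treated, responded in records:
        c = counts[segment]
        if treated:
            c[0] += 1
            c[1] += int(responded)
        else:
            c[2] += 1
            c[3] += int(responded)

    results = {}
    for segment, (t_n, t_r, c_n, c_r) in counts.items():
        if t_n == 0 or c_n == 0:
            continue  # uplift is undefined without both populations
        p_t, p_c = t_r / t_n, c_r / c_n
        uplift = p_t - p_c
        # Two-proportion z statistic using an unpooled standard error.
        se = math.sqrt(p_t * (1 - p_t) / t_n + p_c * (1 - p_c) / c_n)
        z = uplift / se if se > 0 else float("nan")
        results[segment] = (uplift, z)
    return results


# Hypothetical campaign: segment A responds well to treatment, segment B badly.
records = (
    [("A", True, True)] * 60 + [("A", True, False)] * 40
    + [("A", False, True)] * 40 + [("A", False, False)] * 60
    + [("B", True, True)] * 30 + [("B", True, False)] * 70
    + [("B", False, True)] * 45 + [("B", False, False)] * 55
)

for segment, (uplift, z) in sorted(uplift_by_segment(records).items()):
    print(f"{segment}: uplift={uplift:+.2f}, z={z:+.2f}")
```

In this made-up data, segment B has a negative uplift: contacting those customers lowers their response rate, which is the kind of harmful effect that a model of the response rate alone would not reveal.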
Miró provides multiple interfaces, including a programmatic interface (an API), a command-line/scripting interface and interactive web access. The API layer makes it a powerful base for embedded analytical applications. Miró also includes a very powerful expression language for data manipulation.
Miró datasets contain an audit trail showing the sequence of operations that produced any final dataset, allowing diagnosis of problems and tracking of data provenance. The audit trail also allows the full history of a dataset to be reliably traced, even when it has been worked on across multiple sessions, perhaps on multiple machines, by multiple people.
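One way to picture a dataset that carries its own audit trail is the Python sketch below, in which every transformation returns a new dataset whose history extends that of its input. The `Dataset` class, its `apply` method and the example operations are hypothetical and not Miró's API; a real audit trail would also record parameters, timestamps, users and machines.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical immutable dataset that records how it was produced."""
    rows: Tuple[dict, ...]
    history: Tuple[str, ...] = ()

    def apply(self, description: str,
              fn: Callable[[Tuple[dict, ...]], Iterable[dict]]) -> "Dataset":
        # The result inherits the full history of its input plus this step,
        # so provenance travels with the data wherever the data goes.
        return Dataset(tuple(fn(self.rows)), self.history + (description,))


raw = Dataset(
    rows=({"id": 1, "spend": 50.0}, {"id": 2, "spend": 250.0}),
    history=("load customers.csv",),
)
big = raw.apply("filter spend > 100",
                lambda rows: [r for r in rows if r["spend"] > 100])
scaled = big.apply("derive spend_k = spend / 1000",
                   lambda rows: [{**r, "spend_k": r["spend"] / 1000} for r in rows])

for step in scaled.history:
    print(step)
```

Because each dataset is immutable and its history is copied forward, the input `raw` keeps its own shorter history while `scaled` records every step that led to it.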
Miró automatically generates detailed logs that provide not only a further audit trail but also the ability to rerun analysis sessions, either verbatim or with specified modifications. It logs both command sequences and output (in multiple forms), meaning that work is never accidentally lost, results can always be traced, and ad hoc analyses can always be repeated or turned into reusable scripts.
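The idea of rerunning a logged session, verbatim or with modifications, can be sketched as follows. This assumes a hypothetical log format in which command lines start with `> ` and every other line is output; the commands themselves are invented for illustration and are not Miró syntax.

```python
# Hypothetical session log: commands prefixed with "> ", output lines indented.
LOG = [
    "> load sales_2023.csv",
    "  [output: table summary]",
    "> filter region = North",
    "  [output: record count]",
    "> aggregate revenue by month",
]


def commands_from_log(log_lines, substitutions=None):
    """Extract the command sequence from a log, applying optional text substitutions."""
    substitutions = substitutions or {}
    commands = []
    for line in log_lines:
        if not line.startswith("> "):
            continue  # output line, not a command
        cmd = line[2:]
        for old, new in substitutions.items():
            cmd = cmd.replace(old, new)
        commands.append(cmd)
    return commands


# Verbatim replay, and a modified replay turned into a reusable script.
verbatim = commands_from_log(LOG)
script = "\n".join(commands_from_log(LOG, {"2023": "2024"}))
print(script)
```

Simple string substitution is used here only to keep the sketch short; a more robust approach would parameterize specific command arguments rather than rewriting text.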
Miró is cross-platform (across Unix, Linux, Mac and Windows) with a focus on standards compliance.
All Miró functionality is available using its native back end, in which data is stored in its own column-oriented data store and all manipulations are performed directly by Miró code. This is suitable for both interactive and batch use.
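To illustrate what column orientation means in general (the internals of Miró's own store are not described here), the Python sketch below keeps each field in a separate typed `array`, so an aggregate over one field scans only that field's values. The `ColumnStore` class is hypothetical.

```python
from array import array


class ColumnStore:
    """Toy column-oriented table: each field is held in its own typed array."""

    def __init__(self, schema):
        # schema maps field name -> array typecode, e.g. "q" (int64) or "d" (float64)
        self.columns = {name: array(code) for name, code in schema.items()}

    def append(self, row):
        # A row is split across the column arrays rather than stored as one record.
        for name, values in self.columns.items():
            values.append(row[name])

    def __len__(self):
        return len(next(iter(self.columns.values()), []))

    def mean(self, name):
        # Only the requested column is scanned; other fields are never touched.
        values = self.columns[name]
        return sum(values) / len(values)


store = ColumnStore({"id": "q", "spend": "d"})
for row in ({"id": 1, "spend": 50.0}, {"id": 2, "spend": 250.0}):
    store.append(row)
print(len(store), store.mean("spend"))
```

The same layout also stores each column's values contiguously with a single type, which tends to help compression and vectorized processing in real column stores.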
A significant subset of Miró's functionality is also available using a database back end. In this mode, Miró connects to a database and collects metadata, but does not extract the main data from tables. Instead, Miró issues SQL (and in some cases calls in-database functions) to perform equivalent operations. Depending on the relative power and capacity of the machine running Miró and the database hardware, as well as the data volume and the nature of the operations being performed, this can be either faster or slower than extracting the data into Miró, performing the required analysis, and writing any results back. The level of support varies across database systems; supported systems include Postgres, Greenplum, MySQL, SQLite and MongoDB.
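The contrast between extracting data and issuing SQL can be sketched with Python's built-in `sqlite3` module (SQLite being one of the systems listed). The table, columns and functions are hypothetical, and the SQL is a hand-written example rather than what Miró actually generates.

```python
import sqlite3


def column_names(conn, table):
    # Metadata only: PRAGMA table_info describes columns without reading any rows.
    # The table name is interpolated directly, so it must come from trusted code.
    return [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]


def mean_by_region_extracted(conn):
    # Extract-then-compute: every row is transferred before aggregation.
    sums = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        total, n = sums.get(region, (0.0, 0))
        sums[region] = (total + amount, n + 1)
    return {region: total / n for region, (total, n) in sums.items()}


def mean_by_region_in_db(conn):
    # Push-down: the database aggregates, and only one row per region comes back.
    return dict(conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 100.0), ("North", 200.0), ("South", 50.0)])

print(column_names(conn, "sales"))
print(mean_by_region_extracted(conn) == mean_by_region_in_db(conn))
```

Both functions give the same answer; which is faster depends on exactly the factors listed above, chiefly how much data has to cross the connection and how capable each machine is.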
This approach also allows analytical workflows to be developed in one mode (most commonly using the native back end) and then deployed, with minimal or no changes, using a database. Some clients use this as their standard split between development and production.
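A minimal sketch of this develop-natively, deploy-on-a-database pattern, assuming a hypothetical `Backend` interface: the workflow function is written once against the interface and runs unchanged against an in-memory implementation or an SQLite-backed one. None of these class or method names are Miró's API.

```python
import sqlite3
from typing import Protocol


class Backend(Protocol):
    def total(self, column: str, min_value: float) -> float: ...


class NativeBackend:
    """In-memory rows; all computation happens in Python."""

    def __init__(self, rows):
        self.rows = rows

    def total(self, column, min_value):
        return sum(r[column] for r in self.rows if r[column] >= min_value)


class SQLiteBackend:
    """Same operation expressed as SQL and executed by the database."""

    def __init__(self, conn, table):
        self.conn, self.table = conn, table

    def total(self, column, min_value):
        # Identifiers cannot be bound as parameters; assumed to come from trusted code.
        sql = f"SELECT COALESCE(SUM({column}), 0) FROM {self.table} WHERE {column} >= ?"
        return self.conn.execute(sql, (min_value,)).fetchone()[0]


def big_spend_total(backend: Backend) -> float:
    # The workflow only talks to the interface, so it needs no changes to switch back end.
    return backend.total("spend", 100.0)


rows = [{"spend": 50.0}, {"spend": 250.0}, {"spend": 120.0}]
native = NativeBackend(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?)", [(r["spend"],) for r in rows])
in_db = SQLiteBackend(conn, "customers")

print(big_spend_total(native), big_spend_total(in_db))
```

Keeping the workflow code ignorant of where the data lives is what makes the development/production switch cheap; the cost is that every operation must have an implementation in each back end, which is why database support typically covers only a subset of the native functionality.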