Stochastic Solutions


Testing Data & Data Processes with AI & Python
Half-day Training • Edinburgh • 20th March 2019
Location: BMA Scotland, 14 Queen Street, Edinburgh, EH2 1LL, Scotland.
Tickets: £25 + VAT
DataFest 2019 brings together local and international talent, industry, academia and enthusiasts who all share at least one interest — data! With a desire across sectors to succeed at Data Driven Innovation, how can we be sure that our data — our raw material — is as good as it should be?
This training brings the ideas and benefits of test-driven development to the arena of data analysis. Using the open source Python TDDA (test-driven data analysis) library, we'll work with data in CSV files, Pandas DataFrames, and relational databases.

Part 1: Testing Data Processes and Pipelines

An introduction to reference tests and how they can be written for various kinds of analytical processes over different data types.
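As a rough illustration of the reference-test idea, the sketch below compares the output of an analytical process with a previously verified reference file, ignoring parts that are expected to vary between runs. It uses only the Python standard library and is not the tdda API; the names normalise, matches_reference and report, and the date pattern, are assumptions made for this example.

```python
import re
import tempfile
from pathlib import Path


def normalise(text, ignore_patterns=()):
    """Blank out substrings that are expected to vary between runs."""
    for pattern in ignore_patterns:
        text = re.sub(pattern, "<IGNORED>", text)
    return text


def matches_reference(actual, ref_path, ignore_patterns=()):
    """Return True if `actual` equals the stored reference after normalisation."""
    expected = Path(ref_path).read_text()
    return normalise(actual, ignore_patterns) == normalise(expected, ignore_patterns)


def report(total, run_date):
    """Hypothetical analytical process that produces a small text report."""
    return f"Generated: {run_date}\nTotal sales: {total}\n"


DATE = r"\d{4}-\d{2}-\d{2}"  # run dates differ on every run, so ignore them

with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "report.txt"
    ref.write_text(report(1250, "2019-03-19"))  # the verified reference output

    # Same result on a later date: passes, because the date is ignored.
    same = matches_reference(report(1250, "2019-03-20"), ref, [DATE])
    # A changed total: fails, which is exactly what the test should catch.
    changed = matches_reference(report(1300, "2019-03-20"), ref, [DATE])
```

The design choice worth noting is that the comparison is exact except for explicitly declared variable parts, so any unexpected change in the output is flagged.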

Part 2: Using AI to Generate Constraints from Data and their use for Detecting Bad Data

Using constraints to verify data.
Crucially, we will show not only how constraints can be used to detect changes and problems in data, but also how those constraints can be generated automatically using AI methods in the tdda library.
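To make the idea concrete, here is a deliberately simple stand-in for constraint generation and verification, written in plain Python rather than with the tdda library. It infers a type, a range and a nullability rule for each field from known-good records, then reports which new records break those rules. The functions discover and verify, and the example fields, are assumptions for illustration only.

```python
def discover(rows):
    """Infer simple per-field constraints from example rows (a list of dicts).

    Assumes `rows` is non-empty and every field has at least one non-null value.
    """
    constraints = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        present = [v for v in values if v is not None]
        constraints[field] = {
            "type": type(present[0]).__name__,
            "min": min(present),
            "max": max(present),
            "allow_nulls": len(present) < len(values),
        }
    return constraints


def verify(rows, constraints):
    """Return (row_index, field, reason) for every constraint violation."""
    failures = []
    for i, row in enumerate(rows):
        for field, c in constraints.items():
            value = row.get(field)
            if value is None:
                if not c["allow_nulls"]:
                    failures.append((i, field, "null"))
            elif type(value).__name__ != c["type"]:
                failures.append((i, field, "type"))
            elif not c["min"] <= value <= c["max"]:
                failures.append((i, field, "range"))
    return failures


# Known-good data from which the constraints are generated.
good = [
    {"age": 34, "height_cm": 172.5},
    {"age": 51, "height_cm": 160.0},
    {"age": 27, "height_cm": 181.2},
]
constraints = discover(good)

# New data to check: one out-of-range age and one unexpected null.
new = [
    {"age": 40, "height_cm": 170.0},
    {"age": -3, "height_cm": 175.0},
    {"age": 30, "height_cm": None},
]
failures = verify(new, constraints)
# failures == [(1, "age", "range"), (2, "height_cm", "null")]
```

Constraints generated this way describe only what was seen in the example data, so in practice they would normally be reviewed and loosened or tightened before being used to reject records.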
The methods and tools are applicable to structured data and data pipelines using any software, not just Python.


The course is primarily aimed at practising data scientists with some familiarity with Python, or programmers coming to data science. Previous experience of testing and Pandas will be advantageous but is not required.
Although the library used is a Python library, the data testing is almost entirely language-neutral, and even the tests for data processes can exercise code written in other languages from within a Python test script.
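One way to picture testing a non-Python process from a Python test script is to run it as an external command and compare what it prints with a verified reference. The sketch below assumes a hypothetical helper, run_process, and uses the Python interpreter itself as a stand-in command purely so that the example runs anywhere; in practice the command could be a program written in any language.

```python
import subprocess
import sys


def run_process(command):
    """Run an external command and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in for a pipeline step written in any language.
command = [sys.executable, "-c", "print('id,total'); print('1,42')"]

expected = "id,total\n1,42\n"  # the verified reference output
actual = run_process(command)
matches = actual == expected
```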
Non-programmers with an interest in QA for data and data processes will also benefit from some of the overview material, and are welcome to attend, but may need more help with the hands-on parts of the course.


It is essential that attendees bring a laptop (Mac, Linux or Windows) with a working Python environment that has Pandas, NumPy and the TDDA library installed (tdda; available with pip from PyPI, and in source form on GitHub).
Detailed instructions on system configuration will be supplied to registered attendees before the session, as well as instructions on how to test the installation.
Help will be available at the venue in the 30 minutes before the start of the workshop (from 13:30) for anyone unable to configure their environment.
Company number SC329851. Registered office: 16 Summerside Street, Edinburgh, EH6 4NU.
Copyright © Stochastic Solutions Limited 2007–2023.