What is PFA for?

Hardening a data analysis

ML/AI deployment has not gotten any easier. A recent Gartner report showed that only 53% of projects make it from artificial intelligence (AI) prototype to production.

There are usually significant differences between environments for building models and environments for deploying models.

What is needed is a solution that is designed for portability, stability, safety, and security.

Modelers can build models in a Python development environment then deploy them to a pure Java environment with no code change with guaranteed

PFA offers

Development: insight comes from exploratory tinkering. Production: scalability comes from good design.

The Portable Format for Analytics (PFA) is a common language to help smooth the transition from development to production. PFA-enabled analysis tools can export machine learning or statistical models as JSON documents with a structure defined by the PFA specification. For instance, suppose a machine learning algorithm produces a classifier that has to be run in another application. If it produces that classifier in PFA format, any PFA-enabled application running on any system can execute it in a safe, controlled way.

Developer tools that speak PFA can deploy their inference engines on production environments that understand PFA. The only connection between the two worlds is the PFA document, a human-readable text file. In fact, this text file could have contributions from several statistical packages, or it could be modified by JSON-manipulating tools or by hand before it is delivered.
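As a rough illustration of what such a document looks like, the sketch below builds a very small PFA document in Python, exports it as JSON text, and then edits that text with an ordinary JSON library before delivery. The top-level fields "input", "output", and "action" come from the PFA specification; the variable names and the specific edit (changing a constant) are only assumptions made for this example.

```python
import json

# A minimal PFA document: add 100 to every input value.
# "input" and "output" declare types; "action" lists the expressions to run.
pfa_document = {
    "input": "double",
    "output": "double",
    "action": [{"+": ["input", 100]}],
}

# What a modeling tool would export: plain, human-readable JSON text.
exported = json.dumps(pfa_document, indent=2)

# Any JSON-manipulating tool can adjust the document before delivery.
# Here the constant is changed from 100 to 250 (an arbitrary example edit).
edited = json.loads(exported)
edited["action"][0]["+"][1] = 250
delivered = json.dumps(edited, indent=2)
```

Nothing in this sketch executes the model; it only shows that the hand-off between development and production is ordinary text.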

By contrast, scoring engines in custom formats present the system maintainers with three options: (a) try to install the data analyst’s tool across the production environment, including all of its dependencies, (b) port the algorithm and spend weeks chasing small (but compounding) numerical errors, or (c) dumb down the analytic. None of these are good options.

Separation of concerns

PFA enables the safe deployment of models.

Tools such as Hadoop and Storm provide automated data pipelines, separating the data flow from the functions that are performed on data (mappers and reducers in Hadoop, spouts and bolts in Storm). Ordinarily, these functions are written in code that has access to the pipeline internals, the host operating system, the remote filesystem, the network, etc. However, all they should do is math.

PFA completes the abstraction by encapsulating these functions as PFA documents. From the point of view of the pipeline system, the documents are configuration files that may be loaded or replaced independently of the pipeline code.

This separation of concerns allows the data analysis to evolve independently of the pipeline. Since scoring engines written in PFA are not capable of accessing or manipulating their environment, they cannot jeopardize the production system. Data analysts can focus on the mathematical correctness of their algorithms, and security reviews are only needed when the pipeline itself changes.
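The idea can be sketched with a toy example. The code below is not a PFA implementation: it is a hypothetical evaluator that understands only three arithmetic functions, plus a hypothetical Pipeline class standing in for the production system. It is meant to show two things under those assumptions: the document is loaded and replaced like configuration, and a document can only call functions the host chooses to provide.

```python
import json

# Hypothetical whitelist: the only operations this toy engine can perform.
ALLOWED = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def evaluate(expr, value):
    """Evaluate one expression against a single input value."""
    if isinstance(expr, (int, float)):
        return expr
    if expr == "input":
        return value
    if isinstance(expr, dict) and len(expr) == 1:
        (name, args), = expr.items()
        if name not in ALLOWED or not isinstance(args, list):
            raise ValueError(f"function not permitted: {name}")
        return ALLOWED[name](*(evaluate(a, value) for a in args))
    raise ValueError(f"unsupported expression: {expr!r}")

class Pipeline:
    """Stand-in for pipeline code; it only loads and runs documents."""

    def __init__(self, document_text):
        self.load(document_text)

    def load(self, document_text):
        # Replacing the analytic is a configuration change, not a code change.
        self.action = json.loads(document_text)["action"]

    def process(self, value):
        # The value of the last expression is the result.
        result = None
        for expr in self.action:
            result = evaluate(expr, value)
        return result

pipeline = Pipeline(
    '{"input": "double", "output": "double", "action": [{"+": ["input", 100]}]}'
)
first = pipeline.process(1.0)

# Swap in a different analytic without touching the pipeline code.
pipeline.load(
    '{"input": "double", "output": "double", "action": [{"*": ["input", 2]}]}'
)
second = pipeline.process(1.0)

# A document that asks for anything outside the whitelist is rejected.
# "os.system" is a made-up function name used only to illustrate the point.
pipeline.load('{"input": "double", "output": "double", "action": [{"os.system": ["ls"]}]}')
try:
    pipeline.process(1.0)
    rejected = False
except ValueError:
    rejected = True
```

A real PFA host supports a far larger function library and type-checks documents before running them, but the shape of the arrangement is the same: the pipeline owns all access to the outside world, and the document only describes calculations.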

This decoupling is important because statistical and machine learning models usually change more quickly than the applications and frameworks that run them. Model details are often tweaked in response to discoveries about the data, and models frequently need to be refreshed with new training samples.

Safety applies to the act of deployment as well. This is important when critical target environments must remain stable, when deployment itself may run into issues (such as pushes to edge devices), or when there are concerns about the safety of the edge device itself. Decoupling also enables new forms of interaction with AI/ML, such as codeless programming and personal assistants.

Just as a PFA inference engine is not capable of accessing or manipulating its environment, the underlying code that understands PFA is untouched as new or updated inference engines are pushed out. Operations can focus on deploying targeted engines and not worry about the embedded application.

Flexibility and safety

As models push to the edge, new methods of model encapsulation are required. Edge devices require stricter execution parameters and safe code deployment. Traditionally, model deployment to the edge required custom code in languages not oriented toward machine learning, such as JavaScript, restricting what could be achieved. Edge devices have since grown more powerful, with AI technology embedded in CPU architectures.

The Predictive Model Markup Language (PMML) was an attempt to bridge this gap by standardizing several of the most common kinds of scoring engines. Like PFA, PMML documents are intermediate text files (XML) produced by data analysis tools and consumed by an executable in the production environment. New functionality has been added to PMML over the past 17 years, but it is still based on tables of model parameters. Even a modest extension of a PMML inference engine requires a new version of PMML to be adopted, which can take years.

PFA serves this purpose with far more generality. Unlike PMML, PFA has control structures to direct program flow, a true type system for both model parameters and data, and statistical functions that are much more finely grained and can accept callbacks to modify their behavior. The author of a PFA document can construct new types of models from building blocks without waiting for the new model to be explicitly added to the specification.
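To make these three features concrete, here is a small sketch of a PFA document, written as a Python dictionary, that uses each of them: Avro-style type declarations, an if/then/else control structure, and a user-defined function passed as a callback to a library function. The function name "clip" and the clipping logic are invented for this example, and the document has not been validated against a PFA implementation, so treat it as an illustration of the style rather than a tested engine.

```python
import json

pfa_document = {
    # Type system: Avro-style schemas describe the input and output data.
    "input": {"type": "array", "items": "double"},
    "output": {"type": "array", "items": "double"},
    "action": [
        # Callback: the library function a.map applies a user-defined
        # function (referenced with the "u." prefix) to every element.
        {"a.map": ["input", {"fcn": "u.clip"}]}
    ],
    "fcns": {
        # "clip" is a hypothetical user-defined function for this sketch.
        "clip": {
            "params": [{"x": "double"}],
            "ret": "double",
            "do": [
                # Control structure: replace negative values with zero.
                {"if": {"<": ["x", 0.0]}, "then": 0.0, "else": "x"}
            ],
        }
    },
}

# The document that would be handed to a PFA-enabled host.
document_text = json.dumps(pfa_document, indent=2)
```

The point of the example is that the clipping behavior is assembled from general-purpose pieces inside the document itself, rather than being a named model type that the specification has to enumerate.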

PFA is more flexible than PMML, but safer than custom code. In the language of optimization, it is the most flexible way to describe a scoring engine subject to the constraint that it won’t break the data pipeline.

Overview of PFA capabilities

The following contribute to PFA’s flexibility:

The following contribute to PFA’s safety:

To learn more, read the tutorials (which have interactive examples, so that you can see PFA in action) or the complete reference, which are linked in the sidebar or at the top of this page.