Python is one of the world’s most popular programming languages, used ubiquitously for machine learning and AI projects, with many open-source, free programs and projects added daily. One major drawback of Python—when taking it into an enterprise, multi-user, multi-project environment—is that it lacks proper, functional versioning. There is also no out-of-the-box mechanism to deliver load-balanced, multi-node Python processing.
These are major operational challenges in deploying Python as an enterprise-grade solution.
To plug the gap, Pyramid’s “Virtual Environments” manager allows administrators to create and govern multiple environments of Python on one or more dedicated “AI” servers. The environments can then be used to stage and run Python scripting within Pyramid’s many scripting and ML-related functions across the Model, Discover, and Formulate tools.
Python by itself has no inherent versioning switches, and it can only deploy specific versions of third-party packages for a given Python “installation.” This makes it extremely difficult and complicated to create and run different Python projects using a single setup, as each program may use a different version of the language engine or different package versions—or both.
Most third-party BI tools mostly treat Python as a light add-on, exposing limited Python capabilities (see the previous blog) within single-version Python environments.
Pyramid’s Virtual Environments manager can create and maintain multiple virtual environments on one or more “AI” servers. Different versions of Python can operate on the same host, so it is possible to deploy multiple coexisting versions. Environments can be added, removed, and edited. They can also be secured and exposed to different tenants in a multi-tenant deployment.
One of the externalities of the Pyramid approach is that each virtual environment can be hosted on one or more servers. As such, Pyramid can also deliver a multi-node, load-balanced solution for Python processing—where customers choose to replicate virtual environments across the Pyramid cluster.
Ingrid is the data scientist at GIA Logistics, which is using Pyramid to create analytical reports and dashboards on their operational data store housed in an SAP Hana database. Ingrid wants to run a Python script that reads an input table from SAP Hana, then have the script combine multiple CSV files and perform additional ETL data cleaning operations, all using the “geopy” library version 1.17.1 on Python 3.6.3 to retrieve an address coordinate from an address, producing a single output. She also wants to use another third-party script that calculates the geo distance from the closest branch. This script uses “geopy” library with version 2.0.0. that uses Python 3.7 since there are functionality limitations in the 1.17.1 geopy package.
Ingrid must be able to include both scripts in her data model, even though they use different versions of the same library, as well as different Python versions. She also has many workflows running Python scripts in the environment and will require multiple servers to accommodate this workload.
GIA Logistics has two Pyramid AI servers that have been deployed to both Python 3.6.3 and 3.7 environments to accommodate multiple users running Python scripts.
Ingrid copies her first script, which uses the Python Legacy Environment, containing the geopy1.7.1 library on Python 3.6.3, to retrieve the address coordinates from an address. The same script also performs additional ETL and data cleaning operations.
She then copies her second script using the Pyramid Python Environment that uses the geopy 2.0.0 library on Python version 3.7 to calculate the geo distance from the closest branch.
Ingrid is then able to print a report with all the latitude and longitude coordinates details. Then she can reuse them whenever required as they are now stored in the SAP Hana database.
Ingrid can then reuse both the distance and the coordinates as well, as they are now stored in the SAP Hana database.
Python is used extensively for machine learning and AI projects, but lacks a solid mechanism for versioning and load balancing. Users wanting to utilize specific third-party packages face a challenge to run different Python programs from the same setup.
Most third-party BI tools offer single-version Python environments, which are limited to small data sets and small business problems. These are meant to be used “in-line,” rather than as proper development processes that address real-world machine learning challenges.
Pyramid solves the entire problem set, allowing administrators to deploy multiple coexisting virtual environments for Python on multiple servers. Python packages can be installed with the ability to specify any version, providing the ability to run on multiple servers, across multiple users with multiple projects. And suddenly, an enterprise grade approach to delivering Python-based analytics is achievable.