Network Optimization and AI Inferencing Management for Telepathology Reference Implementation

Published: 03/02/2021


Overview

The Network Optimization and AI Inferencing Management for Telepathology Reference Implementation uses a digital pathology workload example to showcase how an optimized software architecture can simplify and automate the handling of networking challenges and optimize AI model deployment and management within a hospital system.

Telepathology poses unique challenges to the medical community:  

  • Efficient and accurate data management for sharing within or outside of IT infrastructures.  
  • Very large files (80 GB uncompressed).  
  • An ecosystem extending from the Whole Slide Imaging (WSI) equipment at the edge to on-premise and cloud server platforms. 
  • A multi-access network (i.e., wired, WiFi, 4G/5G, etc.), where logic is needed to route the data properly and where security is of the utmost importance and cannot be violated. 

This RI provides solutions to some of those unique challenges: 

  • Automated network abstraction that helps avoid complex data routing and traffic shaping and gives confidence in efficient data sharing and AI model utilization.
  • Reduced ‘hands-on’ management of data routing as well as of AI model optimization within the IT infrastructure. 

Select Configure & Download to download the reference implementation and the software listed below.  

Configure & Download

Time to Complete: 60 - 90 minutes

Programming Language: Python*

Available Software: OpenNESS version 20.12, Google Cloud SDK, OpenVINO™ Model Server (OVMS) version 2021.2


Target System Requirements 

Controller and Node Systems 

  • One of the following processors: 
    • Intel® Xeon® scalable processor. 
    • Intel® Xeon® processor D. 
  • At least 128 GB RAM. 
  • At least 256 GB hard drive. 
  • Intel® Ethernet Converged Network Adapter X710-DA4.
  • CentOS* 7.8.2003.  
  • An Internet connection.

Client Systems

  • One of the following processors: 
    • Intel® Core™ processor. 
    • Intel® Xeon® processor. 
  • At least 8 GB RAM. 
  • At least 256 GB hard drive. 
  • Intel® Ethernet Converged Network Adapter X710-DA4.
  • An Internet connection. 

How It Works

Component Overview 

The Network Optimization and AI Inferencing Management for Telepathology RI includes the OpenVINO™ Model Server (OVMS) along with OpenNESS software.  

Open Network Edge Services Software (OpenNESS) Toolkit 

The Open Network Edge Services Software (OpenNESS) toolkit enables developers to port existing cloud applications to the edge, provides components for building platform software, and supports building and deploying E2E edge services in the field as well as benchmarking for network/on-premise edge RFPs. Learn more.  

OpenVINO™ Model Server (OVMS) 

OpenVINO™ Model Server (OVMS) is a scalable, high-performance solution for serving machine learning models optimized for Intel® architectures. The server provides an inference service via gRPC or REST API - making it easy to deploy new algorithms and AI experiments using the same architecture as TensorFlow* Serving for any models trained in a framework that is supported by OpenVINO.  

The server implements gRPC and REST API framework with data serialization and deserialization using TensorFlow Serving API, and OpenVINO™ as the inference execution provider. Model repositories may reside on a locally accessible file system (for example, NFS), Google Cloud Storage* (GCS), Amazon S3*, MinIO*, or Azure Blob Storage*. Learn more.  
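
For illustration, a minimal Python sketch of such a gRPC call (using the grpc and tensorflow-serving-api packages) could look like the following. This is not the client shipped with the RI; the gRPC port and the input tensor name "input" are assumptions, so check the served model's metadata for the actual values.

# Hypothetical gRPC inference call against OVMS via the TensorFlow Serving API.
import grpc
import numpy as np
from tensorflow import make_tensor_proto
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("<controller-IP>:9000")  # assumed gRPC port
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

tile = np.zeros((1, 256, 256, 1), dtype=np.float32)  # dummy tile; use real WSI data

request = predict_pb2.PredictRequest()
request.model_spec.name = "stardist-0001"
request.inputs["input"].CopyFrom(make_tensor_proto(tile, shape=tile.shape))

response = stub.Predict(request, 10.0)  # 10-second timeout
print(response.outputs)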

Grafana 

Data visualization is done in Grafana, which allows you to view the cluster monitoring dashboard. System usage metrics such as Network I/O pressure, Total usage, Pods CPU usage, System services CPU usage, Containers CPU usage, and All Processors CPU usage can be displayed.

Figure 1. Architecture Diagram

The Client here is a machine holding medical images on which inference is to be performed. It continuously sends RPC calls to OVMS, which performs inference on the underlying hardware and returns the result to the client. The result is then pushed to InfluxDB and fetched by Grafana for visualization. In parallel, the generated labelled image is passed to a Flask server that is integrated with Grafana. Prometheus also sends pod metrics such as memory usage and CPU usage to Grafana. For manual execution and tinkering, follow the detailed startup steps below.
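
To make the metrics flow concrete, below is an illustrative Python sketch of pushing an inference result to InfluxDB with the influxdb 1.x client; the database name, measurement, and field names are assumptions, not the RI's actual schema.

# Hypothetical push of inference metrics to InfluxDB for Grafana to fetch.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="<controller-IP>", port=8086, database="telepathology")  # assumed database
points = [{
    "measurement": "inference_metrics",  # assumed measurement name
    "fields": {
        "latency_ms": 42.0,   # example: end-to-end inference latency
        "cell_count": 118,    # example: nuclei detected by StarDist
    },
}]
client.write_points(points)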


Get Started 

Prerequisites 

Make sure that the following conditions are met to ensure a smooth installation process. 

1. Hardware Requirements 

  • Make sure you have a fresh CentOS 7.8.2003 installation with the Hardware specified in the Target System Requirements section. 

2. Proxy Settings 

  • If you are behind a proxy network, please ensure that proxy addresses are configured in the system. 
export http_proxy=<proxy-address>:<proxy-port> 

export https_proxy=<proxy-address>:<proxy-port>

3. Date & Time  

  • Make sure that the Date & Time are in sync with current local time.

4. IP Address Conflict 

  • Make sure that the Edge Controller IP is not conflicting with OpenNESS reserved IPs. For more details, please refer to IP address range allocation for various CNIs and interfaces in the Troubleshooting section.

Step 1: Install Google Cloud SDK*  

Follow the steps below from the controller device to install the Google Cloud SDK*.  

Note You need a Google Cloud Platform account to complete the installation and utilize the RI. 

1. Download the Google Cloud SDK package for Linux using the following command: 

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-318.0.0-linux-x86_64.tar.gz

2. Extract the downloaded package and install the Google Cloud SDK from the extracted directory using the following commands: 

tar -xf google-cloud-sdk-318.0.0-linux-x86_64.tar.gz 
./google-cloud-sdk/install.sh 
./google-cloud-sdk/bin/gcloud init 

3. Enter the account details and configure the cloud project when prompted. 

Figure 2. Google Cloud Project SDK

 

Note Restart the terminal after initializing the Google Cloud SDK. 

Step 2: Install the Reference Implementation  

Note Before installing the Reference Implementation, make sure that the Ubuntu* 18.04 Client system is on the same network as the Server, so that they are able to communicate. 

Note On the Ubuntu 18.04 Client system, follow the steps below to be able to ssh as the root user: 

  • Open the file /etc/ssh/sshd_config and go to the following line: 
PermitRootLogin without-password
  • Change that line to the following (uncomment it if it is commented out): 
PermitRootLogin yes 
  • Save the file and then restart the SSH server using the following command: 
sudo service ssh restart
  • Set the root password using the following command:  
sudo passwd 

Select Configure & Download to download the reference implementation and then follow the steps below from the Controller to install it. 

Configure & Download

1. Make sure that the Target System Requirements are met properly before proceeding further.   

  • For single-device mode, only one machine is needed. (Both the Controller and the Edge Node will be on the same device.)  
  • For multi-device mode, make sure you have at least two machines (one for the Controller and the other for the Edge Node).  

Note Multi-device mode is not currently supported for this release. 

2. Open a new terminal as the root user and move the downloaded zip package to the /root folder.  

sudo passwd
su
mv <path-of-downloaded-directory>/Network_Optimization_and_AI_Inferencing_Management_for_Telepathology.zip /root 

3. Go to /root directory using the following command and unzip the RI.   

cd /root  
unzip Network_Optimization_and_AI_Inferencing_Management_for_Telepathology.zip 

4. Go to the Network_Optimization_and_AI_Inferencing_Management_for_Telepathology/ directory using the following command.  

cd Network_Optimization_and_AI_Inferencing_Management_for_Telepathology 

5. Change permission of the executable edgesoftware file.  

chmod 755 edgesoftware 

 6. Run the command below to install the Reference Implementation:  

./edgesoftware install

7. During the installation, you will be prompted for the Product Key. The Product Key is contained in the email you received from Intel confirming your download.

Note Installation logs are available at path: /var/log/esb-cli/Network_Optimization_and_AI_Inferencing_Management_for_Telepathology//install.log

Figure 3. Product Key

8. During the installation, you will be prompted to configure a few things before installing OpenNESS. Refer to the screenshot below to configure. 

Note Multi-device mode is not currently supported. Select Single Device when prompted to select the type of installation.

Figure 4. OpenNESS Configuration

 

Note If you are using a Microsoft Azure* instance, please enter the Private IP address as the IP address of Controller. 

9. During the installation, you will be prompted for the IP address of the controller and client. Enter the correct IP addresses.

Figure 5. IP Address of Client and Controller

 

10. When the installation is complete, you will see the message Installation of package complete and the installation status for each module.  

Figure 6. Installation Status

 

11. If OpenNESS is installed, running the following command should show output similar to the image below. All the pods should be in either the Running or Completed state.  

kubectl get pods -A 
Figure 7. Status of Pods

Step 3: Copy the Model Files to Google Cloud Storage 

1. Navigate to the node directory.  

cd /root/<path_to_the_RI_directory>/network_optimization_and_ai_inferencing_management_for_telepathology/Network_Optimization_and_AI_Inferencing_Management_for_Telepathology_1.0.0/Network_Optimization_and_AI_Inferencing_Management_for_Telepathology/TelePathology/node/

2. Create a Cloud Storage bucket using the Guide.  

3. Provide the required permissions to the storage bucket using the following steps: Click on the storage bucket to view the Bucket details, then click on Permissions and select Add.

Figure 8. Required Permissions

 

  • Select allUsers in the New members section. 
  • Select Storage Legacy Bucket Owner in the Role section. 
  • Click on ADD ANOTHER ROLE.
  • Select Storage Object Viewer in the next Role section. 
  • Click on SAVE.

Figure 9. Add Roles

  • Click on ALLOW PUBLIC ACCESS when prompted.

Figure 10. Allow Public Access

 

4. Upload the models/ directory to the Google Cloud storage bucket from an Internet browser. Alternatively, to upload the models/ directory to the bucket from the terminal, use the following command: 

gsutil cp -r models/ gs://<your_google_cloud_storage_bucket_name>/ 
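
As an optional check, the upload can be verified with the google-cloud-storage Python client instead of gsutil; credentials come from the gcloud environment configured in Step 1, and the bucket name is a placeholder.

# Hypothetical verification that the models/ directory landed in the bucket.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("<your_google_cloud_storage_bucket_name>", prefix="models/"):
    print(blob.name)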

Step 4: Start the OpenVINO™ Model Server (OVMS) 

1. Navigate to the deploy directory from the terminal. 

cd deploy/

2. Run the command below. 

kubectl create namespace monitoring

3. Run the command below to start OVMS. 

helm install --set model_name=stardist-0001,model_path=gs://<bucket-name>/models/,http_proxy=<http-proxy>,https_proxy=<https-proxy>  ovms ./ovms

4. Use the following command to start the server. 

helm install server ./server --set hostIP=<controller-IP> 

5. Run the commands below from the same directory to deploy Grafana, InfluxDB and Prometheus containers. 

helm install grafana ./grafana --set hostIP=<controller-IP> 
helm install influxdb  ./influxdb 
helm install prometheus  ./prometheus 

6. Run the command below to make sure the status of the ovms pod is Running.

kubectl get pods

7. Run the command below to observe the logs. 

kubectl logs -f <ovms_pod_name>
Figure 11. Logs of OVMS Pod

 

Note In the logs from the OVMS pod, verify that the model INFO lines are similar to the highlighted section in the screenshot above. The number of model versions should be 1. 


Run the Application 

Note The following steps to run the application should be executed from a Client system. The Client should be a different system but on the same network as the Server, so that they are able to communicate. 

Once the installation is completed on the server, navigate to the following directory from the client system. 

# su 
# cd /root/TelePathology/client/

Activate the client's Python virtual environment using the following command. 

# source tele_pathology/bin/activate 

Start the client using the following commands. 

# chmod +x run.sh 
# ./run.sh <IP_address_of_Controller>
Figure 12. Run the Application

 

Once the server is started, sample input images from the images/ directory on the client are sent to the server system for inference and processing. The results are sent back to the client and can be visualized on a dashboard on the client.
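
As a quick sanity check from the client, the model's serving state can be queried over the TensorFlow Serving-compatible REST API that OVMS exposes; the REST port below is an assumption, so use whichever port the ovms Helm chart exposes.

# Hypothetical status query; expect a version state of "AVAILABLE".
import requests

resp = requests.get("http://<controller-IP>:<rest-port>/v1/models/stardist-0001")
print(resp.json())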


Data Visualization on Grafana 

1. Navigate to http://<controller-IP>:30800 in your browser. 

2. Log in with admin as both the username and the password.  

3. Click on Home. 

4. Select Telepathology Inference Metrics.

Figure 13. Inference Metrics on Grafana Dashboard

5. To view the cluster monitoring dashboard, click on the Dashboard name from the top-left corner of Grafana and select Kubernetes cluster monitoring dashboard.

Figure 14. Choosing the Grafana Dashboard

6. The dashboard displays the system usage metrics.

Figure 15. Kubernetes Cluster Monitoring on Grafana Dashboard

You can scroll down the dashboard and view specific metrics like Pods CPU usage and Containers CPU usage, as shown below. 

Figure 16. Kubernetes Cluster Monitoring on Grafana Dashboard (scroll down) 

Stop the Application

To remove the deployment of this reference implementation, run the following commands. 

Note This will remove all the running pods and the data and configuration stored in the device. 

helm delete server 
helm delete grafana 
helm delete influxdb 
helm delete prometheus

Summary and Next Steps

This reference implementation highlights flexibility, modularity, and ease of deployment on-premise and at the network edge through the OpenNESS application. Coupling this application with the OpenVINO™ Model Server's scalability for AI model deployment and management provides the software components needed to enable a telepathology service.   

As a next step, you can experiment with accuracy and throughput by using other pathology datasets.   

To understand more about OpenNESS architecture, building blocks and implementation types, we recommend this GitHub page.  


Learn More 

To continue learning, see the following guides and software resources: 


Troubleshooting 

Pods Status Check 

Verify that the pods are Ready as well as in the Running state using the command below: 

kubectl get pods -A

If they are in the ImagePullBackOff state, manually pull the images using: 

docker login 
docker pull <image-name>

If any pods are not in Running state, use the following command: 

kubectl describe -n <namespace> pod <pod_name>

Docker Pull Limit Issue 

If a Docker pull limit error is observed, log in with your Docker premium account. 

If Harbor Pods are not in Running state, please login using the below command: 

docker login

If Harbor Pods are in Running state, please login using the below commands: 

docker login 
docker login https://<Machine_IP>:30003  
<Username – admin> 
<Password - Harbor12345> 

Installation Failure 

If the OpenNESS installation fails while pulling the OpenNESS namespace pods such as Grafana, Telemetry, TAS, etc., reboot the system and run the remaining commands after the reboot: 

reboot 
su  
swapoff -a  
systemctl restart kubelet   # wait until all pods are in the "Running" state 
./edgesoftware install 

Pod Status Shows “ContainerCreating” for Long Time 

If a pod's status shows ContainerCreating, Error, or CrashLoopBackOff for a while (5 minutes or more), reboot the system and run the remaining commands after the reboot: 

reboot 
su  
swapoff -a  
systemctl restart kubelet   # wait until all pods are in the "Running" state 
./edgesoftware install 

Subprocess:32 Issue 

If you see any error related to subprocess, run the command below: 

pip install --ignore-installed subprocess32==3.5.4 

ImportError 

If you observe an ImportError while using run.sh from the client, run the following command:

apt-get install ffmpeg libsm6 libxext6 -y

IP Address Range Allocation for Various CNIs and Interfaces 

The OpenNESS Experience Kits deployment allocates and reserves a set of IP address ranges for different CNIs and interfaces. The server or host IP address should not conflict with the default address allocation. If there is a critical need to use a server IP address that falls within these ranges, the default addresses used by OpenNESS must be modified. 

The following files specify the CIDRs for CNIs and interfaces. These are the IP address ranges allocated and used by default, listed here for reference. 

flavors/media-analytics-vca/all.yml:19:vca_cidr: "172.32.1.0/12" 
group_vars/all/10-default.yml:90:calico_cidr: "10.243.0.0/16" 
group_vars/all/10-default.yml:93:flannel_cidr: "10.244.0.0/16" 
group_vars/all/10-default.yml:96:weavenet_cidr: "10.32.0.0/12" 
group_vars/all/10-default.yml:99:kubeovn_cidr: "10.16.0.0/16,100.64.0.0/16,10.96.0.0/12" 
roles/kubernetes/cni/kubeovn/controlplane/templates/crd_local.yml.j2:13:  cidrBlock: "192.168.{{ loop.index0 + 1 }}.0/24"

The 192.168.x.y range is used for SRIOV and interface service IP address allocation in the Kube-OVN CNI, so it is not allowed for the server IP address. Avoid the entire range defined by the netmask, as overlapping addresses may conflict with routing rules. 

For example, suppose the server/host IP address is required to use 192.168.x.y, while this range is used by default for SRIOV interfaces in OpenNESS. The cidrBlock range in the roles/kubernetes/cni/kubeovn/controlplane/templates/crd_local.yml.j2 file can then be changed to 192.167.{{ loop.index0 + 1 }}.0/24 so that the SRIOV interfaces use a different IP segment.
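
As an illustrative aid (not part of the RI), a few lines of Python with the standard ipaddress module can test whether a candidate server IP falls inside any of the default ranges above; the 192.168.0.0/16 entry summarizes the per-index /24 blocks generated by the template.

# Check a proposed server IP against the default OpenNESS CIDR allocations.
import ipaddress

RESERVED_CIDRS = [
    "172.32.1.0/12",    # vca_cidr
    "10.243.0.0/16",    # calico_cidr
    "10.244.0.0/16",    # flannel_cidr
    "10.32.0.0/12",     # weavenet_cidr
    "10.16.0.0/16",     # kubeovn_cidr
    "100.64.0.0/16",    # kubeovn_cidr
    "10.96.0.0/12",     # kubeovn_cidr
    "192.168.0.0/16",   # SRIOV / Kube-OVN interface service blocks
]

def conflicts(server_ip):
    ip = ipaddress.ip_address(server_ip)
    return [c for c in RESERVED_CIDRS if ip in ipaddress.ip_network(c, strict=False)]

print(conflicts("192.168.1.10"))  # ['192.168.0.0/16']
print(conflicts("10.0.0.5"))      # []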

Support Forum 

If you're unable to resolve your issues, contact the Support Forum.  


Citations

*github.com/mpicbg-csbd/stardist
@inproceedings{schmidt2018,
  author    = {Uwe Schmidt and Martin Weigert and Coleman Broaddus and Gene Myers},
  title     = {Cell Detection with Star-Convex Polygons},
  booktitle = {Medical Image Computing and Computer Assisted Intervention - {MICCAI} 
  2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part {II}},
  pages     = {265--273},
  year      = {2018},
  doi       = {10.1007/978-3-030-00934-2_30}
}

@inproceedings{weigert2020,
  author    = {Martin Weigert and Uwe Schmidt and Robert Haase and Ko Sugawara and Gene Myers},
  title     = {Star-convex Polyhedra for 3D Object Detection and Segmentation in Microscopy},
  booktitle = {The IEEE Winter Conference on Applications of Computer Vision (WACV)},
  month     = {March},
  year      = {2020},
  doi       = {10.1109/WACV45572.2020.9093435}
}
**bbbc.broadinstitute.org/BBBC038
"We used image set BBBC038v1, available from the Broad Bioimage Benchmark Collection [Caicedo et al., Nature Methods, 2019]."
