AWS for Industries
Intelligent rig operations classification with HITL on AWS
In the oil and gas industry, rig reports are essential for monitoring the performance and operations of drilling rigs. These reports contain a vast amount of unstructured data, including comments from rig personnel regarding rig activities such as drilling progress, equipment maintenance, process interruptions, and safety observations (Figure 1). In blog #1, we discussed a scalable solution that extracts data from rig reports and converts unstructured PDF reports into structured databases. In this blog, we discuss how to use the extracted data for machine learning (ML)—specifically, converting rig operation codes from one operator’s scheme to another’s using text classification.
The purpose of logging daily rig operations is twofold: first, to keep a record of the operations performed, and second, to learn from historical data so that future rig operations can be improved—optimizing rig performance, reducing downtime and carbon emissions, and improving process safety. To analyze daily rig operational data, operations are classified into primary and secondary codes that are specific to each oil company. In joint venture programs, oil companies therefore need to convert rig operation codes between each other’s coding schemes every day. Traditionally, these codes are converted manually—a tedious, time-consuming, and costly task that can delay the identification of critical issues affecting rig operations. A self-adaptive ML model that performs automated code conversion from one operator’s scheme to another can transform this traditionally manual task. In this blog post, we propose an intelligent rig operations classification solution built on HAQM Web Services (AWS).
Figure 1. Sample of drilling operation summary log (commonly known as a time log)
For data extraction from rig reports, the solution uses HAQM Textract—an ML service that automatically extracts text, handwriting, layout elements, and data from scanned documents. (More details on scalable data extraction are available in blog #1.) Next, the solution uses HAQM Comprehend—a natural language processing (NLP) service that uses ML to uncover valuable insights and connections in text—to automate the classification of comments on rig reports, facilitating rapid and accurate analysis of rig operations. (Read more about analyzing reports with NLP here.) Finally, the solution sends these results to HAQM Augmented AI (HAQM A2I)—a service for human review of ML predictions—which initiates a human-in-the-loop (HITL) workflow for manual review of predicted classes with low confidence scores. Manual corrections are stored and can be used to train new HAQM Comprehend models, improving accuracy in the next training cycle. In this way, even with limited labeled data for the initial HAQM Comprehend model, oil companies can use the proposed solution to automate data extraction and, after several training cycles, achieve classification accuracy approaching 100 percent. This accurate data can then generate valuable insights into rig operations, helping operators optimize drilling processes and improve rig performance.
Solution architecture
Figure 2. Architecture diagram for the drilling operation classification inference pipeline using drilling reports
The high-level architecture (Figure 2) for the data extraction and classification inference pipeline consists of several AWS services. First, the rig report PDFs are added to the input bucket in HAQM Simple Storage Service (HAQM S3), an object storage service offering cutting-edge scalability, data availability, security, and performance. Then, in the data extraction step, HAQM Simple Queue Service (HAQM SQS)—which provides fully managed message queuing for microservices, distributed systems, and serverless applications—initiates HAQM Textract through a TextractSubmission Lambda function. Once HAQM Textract finishes its job, it sends a message to HAQM Simple Notification Service (HAQM SNS), a fully managed Pub/Sub service for application-to-application (A2A) and application-to-person (A2P) messaging. This message triggers a PDF2Json Lambda function, which is configured to extract relevant information from the document (such as the time log table from a drilling report) and store this data in an HAQM S3 bucket. Moving the data to HAQM S3 triggers another HAQM SQS queue, which manages the HAQM Comprehend classification job as a batch process. Next, in the classification step, the extracted data is passed to a ClassificationJobSubmission Lambda function, which runs an HAQM Comprehend inference job using a pretrained HAQM Comprehend custom classifier. The solution uses AWS Systems Manager Parameter Store—a capability of AWS Systems Manager that provides secure, hierarchical storage for configuration data management and secrets management—to record which classifier model to use for inference. This step classifies the extracted information into predefined categories.
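To make this concrete, the following is a minimal sketch of what a ClassificationJobSubmission Lambda handler could look like. The environment variable names, S3 URIs, and IAM role are illustrative assumptions, not values from the original solution.

```python
# Minimal sketch of the ClassificationJobSubmission Lambda handler.
# Parameter names, bucket paths, and the IAM role ARN are assumptions.
import os
import json
import boto3

ssm = boto3.client("ssm")
comprehend = boto3.client("comprehend")


def handler(event, context):
    # The classifier ARN is kept in Parameter Store so the pipeline can
    # switch to a newly trained model without a code change.
    classifier_arn = ssm.get_parameter(
        Name=os.environ["CLASSIFIER_ARN_PARAM"]  # e.g. /rig-ops/classifier-arn
    )["Parameter"]["Value"]

    # Each SQS record points at a batch of extracted time log comments in S3.
    for record in event["Records"]:
        body = json.loads(record["body"])
        comprehend.start_document_classification_job(
            JobName=f"rig-ops-{record['messageId']}",
            DocumentClassifierArn=classifier_arn,
            InputDataConfig={
                "S3Uri": body["input_s3_uri"],
                "InputFormat": "ONE_DOC_PER_LINE",  # one comment per line
            },
            OutputDataConfig={"S3Uri": body["output_s3_uri"]},
            DataAccessRoleArn=os.environ["COMPREHEND_ROLE_ARN"],
        )
```

Reading the classifier ARN from Parameter Store at invocation time is what lets the retraining pipeline (described later) promote a new model without redeploying the function.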
After classification, in the validation step, a RulesEngine Lambda function uses a confidence score threshold to determine whether the classification results require human validation. If the confidence score is lower than the threshold, the comments are dispatched to HAQM A2I for human validation. If the confidence score is higher than the threshold, the data is sent directly to HAQM S3 for further processing.
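A minimal sketch of this routing logic is shown below, assuming the Comprehend output has already been parsed into per-comment predictions. The threshold value, flow definition ARN, bucket name, and field names are illustrative assumptions.

```python
# Sketch of the RulesEngine threshold check. All names and the 0.80
# threshold are illustrative assumptions, not the published solution.
import json
import os
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")
s3 = boto3.client("s3")

CONFIDENCE_THRESHOLD = float(os.environ.get("CONFIDENCE_THRESHOLD", "0.80"))


def route_prediction(prediction, loop_name):
    """Send low-confidence predictions to A2I; pass the rest through."""
    if prediction["score"] >= CONFIDENCE_THRESHOLD:
        # High confidence: write straight to the validated-results bucket.
        s3.put_object(
            Bucket=os.environ["VALIDATED_BUCKET"],
            Key=f"auto/{loop_name}.json",
            Body=json.dumps(prediction),
        )
        return "auto-approved"

    # Low confidence: open a human loop so a reviewer can correct the class.
    a2i.start_human_loop(
        HumanLoopName=loop_name,  # must be unique per review task
        FlowDefinitionArn=os.environ["FLOW_DEFINITION_ARN"],
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )
    return "sent-to-human-review"
```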
Next, a FlattenA2IOutput Lambda function combines the results of HAQM A2I human validation with those of the automated validation process, performing an extract, transform, load (ETL) operation to flatten the output from HAQM A2I. Finally, an ExportA2IOutput Lambda function updates the time log tables with the converted rig operation classification codes. These tables can be loaded into HAQM QuickSight—a service that powers data-driven organizations with unified business intelligence (BI) at hyperscale—to build dashboards that provide visualizations and insights for further analysis and decision-making. These dashboards can also be integrated with oil companies’ in-house applications.
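As a rough illustration of the flattening step, the sketch below collapses one A2I per-loop output document (which contains the original inputContent alongside the reviewers’ humanAnswers) into a single flat row. The answer field names “Category” and “Activity” are assumptions matching the custom review UI shown later in Figure 3b.

```python
# Sketch of flattening one A2I output record. Field names inside
# inputContent and answerContent are illustrative assumptions.
import json


def flatten_a2i_record(a2i_output: dict) -> dict:
    """Collapse one A2I output document into a single flat row."""
    original = a2i_output["inputContent"]
    # Take the first reviewer's answer; a real pipeline might reconcile
    # multiple reviewers' answers instead.
    answer = a2i_output["humanAnswers"][0]["answerContent"]
    return {
        "comment": original.get("comment"),
        "predicted_class": original.get("predicted_class"),
        "confidence": original.get("score"),
        "corrected_category": answer.get("Category"),
        "corrected_activity": answer.get("Activity"),
    }
```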
HAQM Comprehend custom classification
To use HAQM Comprehend for training the custom model, the first step is to prepare the data by creating a labeled dataset. This dataset can be obtained from a database where the business already has labeled comments and corresponding classes. The labeled dataset is then used to train a custom classification model using the custom classifier feature of HAQM Comprehend, which facilitates the creation of a model specific to users’ needs based on their own labeled data. Users select the language of the text and the categories for classification—in this case, rig activities such as drilling, tripping, and circulation. The trained model can then be used to classify the remaining comments on rig reports.
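For reference, HAQM Comprehend’s multi-class training input is a two-column CSV (label, then text) with no header row. The sketch below starts a classifier training job from such a file; the bucket path, role ARN, and sample rows are illustrative assumptions.

```python
# Sketch: training a custom classifier from a labeled CSV in S3.
# Multi-class format is "label,text" per line with no header, e.g.:
#   DRILLING,"Drilled 12-1/4in hole section ahead"
#   TRIPPING,"POOH to change worn bit"
# The bucket, role ARN, and classifier name are placeholders.
import boto3

comprehend = boto3.client("comprehend")

response = comprehend.create_document_classifier(
    DocumentClassifierName="rig-ops-classifier-v1",
    LanguageCode="en",
    Mode="MULTI_CLASS",
    DataAccessRoleArn="arn:aws:iam::111122223333:role/ComprehendDataAccess",
    InputDataConfig={"S3Uri": "s3://example-bucket/training/labels.csv"},
)
print(response["DocumentClassifierArn"])
```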
To evaluate the accuracy of the custom model, users can rely on the built-in evaluation tool in HAQM Comprehend, which provides metrics such as precision, recall, and F1 score. Overall, the HAQM Comprehend custom classification feature provides a flexible and scalable solution for training custom models for specific use cases, such as classifying comments on rig reports. We discuss HAQM Comprehend model retraining in a later section.
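These metrics can also be pulled programmatically once training completes, which is useful for gating whether a newly trained model should replace the current one. A minimal sketch (the ARN is a placeholder):

```python
# Sketch: reading the built-in evaluation metrics for a trained
# classifier. The classifier ARN below is a placeholder.
import boto3

comprehend = boto3.client("comprehend")

props = comprehend.describe_document_classifier(
    DocumentClassifierArn="arn:aws:comprehend:us-east-1:111122223333:"
    "document-classifier/rig-ops-classifier-v1"
)["DocumentClassifierProperties"]

metrics = props["ClassifierMetadata"]["EvaluationMetrics"]
print(f"Precision: {metrics['Precision']:.3f}")
print(f"Recall:    {metrics['Recall']:.3f}")
print(f"F1 score:  {metrics['F1Score']:.3f}")
```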
HAQM A2I
After the initial training of the ML pipeline using HAQM Comprehend, it is important to continue retraining the model to improve its accuracy over time. One effective way to achieve this is HITL validation through HAQM A2I. Based on HAQM Comprehend prediction confidence, the RulesEngine Lambda function routes classified comments with low confidence scores to HAQM A2I, where human reviewers validate the classification results in a custom web user interface. The interface provides reviewing instructions and a link to the source document for quick reference (Figure 3b). In the example shown in Figure 3b, time log entries are presented in batches of five comments per review. The first comment was classified incorrectly, which the reviewer can fix by changing the “Category” and “Activity” fields. These corrections are stored in HAQM S3 and then fed back into the training dataset to improve classification accuracy (a sketch of this merge step follows Figure 3b). As more data is processed, correctly classified, and used for retraining, the model becomes increasingly accurate and requires less human validation over time. This continual feedback and human review help keep data accuracy as close to 100 percent as possible, promoting accurate business decisions, improving the model’s predictive power, and adapting to changing data patterns and trends.
Figure 3a. Time log table as found in a drilling report
Figure 3b. HAQM A2I HITL user interface to correct classification results for drilling operations
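The sketch below illustrates how corrected records could be folded back into the training CSV, assuming flattened A2I records like those produced earlier. The bucket layout, keys, and field names are assumptions for illustration.

```python
# Sketch: appending human corrections to the training CSV in S3.
# Paths, keys, and record field names are illustrative assumptions.
import csv
import io
import json
import boto3

s3 = boto3.client("s3")


def append_corrections(bucket: str, correction_keys: list, training_key: str):
    """Append (label, text) rows from reviewed records to the training CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for key in correction_keys:
        record = json.loads(
            s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        )
        writer.writerow([record["corrected_activity"], record["comment"]])

    # Read the existing training file and append the new rows.
    existing = s3.get_object(Bucket=bucket, Key=training_key)["Body"].read()
    if existing and not existing.endswith(b"\n"):
        existing += b"\n"
    s3.put_object(
        Bucket=bucket,
        Key=training_key,
        Body=existing + buf.getvalue().encode("utf-8"),
    )
```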
Model retraining
The model retraining pipeline begins with a schedule in HAQM EventBridge, a service that helps users build event-driven applications at scale across AWS, existing systems, or software-as-a-service (SaaS) applications. This schedule initiates the pipeline (Figure 4) by triggering a data preparation Lambda function, which uses a container image stored in HAQM Elastic Container Registry (HAQM ECR), a fully managed container registry offering high-performance hosting. The container image contains the necessary data preparation scripts and is used to create a processing job in HAQM SageMaker, a service used to build, train, and deploy ML models for any use case with fully managed infrastructure, tools, and workflows. This processing job prepares the data for training. Once the processing job is complete, a message is sent to HAQM EventBridge, which in turn triggers a training Lambda function. The training Lambda function initiates an HAQM Comprehend training job, which trains the new model (a sketch of the processing-job submission follows Figure 4).
Figure 4. HAQM Comprehend model training pipeline
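For illustration, the data preparation Lambda function might launch its SageMaker processing job as sketched below. The image URI, instance type, role ARN, and S3 paths are placeholder assumptions.

```python
# Sketch of the data preparation Lambda: launches a SageMaker
# processing job from an HAQM ECR image. All names are placeholders.
import time
import boto3

sagemaker = boto3.client("sagemaker")


def handler(event, context):
    job_name = f"rig-ops-data-prep-{int(time.time())}"
    sagemaker.create_processing_job(
        ProcessingJobName=job_name,
        RoleArn="arn:aws:iam::111122223333:role/SageMakerProcessingRole",
        AppSpecification={
            # Container image holding the data preparation scripts.
            "ImageUri": "111122223333.dkr.ecr.us-east-1.amazonaws.com/rig-ops-prep:latest",
        },
        ProcessingResources={
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            }
        },
        ProcessingInputs=[{
            "InputName": "corrections",
            "S3Input": {
                "S3Uri": "s3://example-bucket/a2i-corrections/",
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }],
        ProcessingOutputConfig={
            "Outputs": [{
                "OutputName": "training-data",
                "S3Output": {
                    "S3Uri": "s3://example-bucket/training/",
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
            }]
        },
    )
    return {"processing_job": job_name}
```

When the processing job completes, its status-change event can trigger the training Lambda function, which starts the HAQM Comprehend training job (the create_document_classifier call shown earlier) against the freshly prepared dataset.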
Business impact of the solution
The intelligent drilling report classification solution can have a significant impact on the oil and gas industry. By automating the classification of comments on drilling reports, the solution can provide valuable insights into rig operations, including identification of potential issues and performance optimization.
Two of the key benefits of the solution are improved safety and reduced nonproductive time (NPT). By automatically identifying comments related to safety incidents and NPT, companies can quickly analyze which operational changes are leading to unfavorable events and take corrective action to avoid costly downtime, damage to equipment, and loss of life. In addition, by identifying issues early, companies can proactively implement measures to prevent future incidents.
Another important benefit of the solution is cost reduction. By automating the classification of drilling reports, companies can save time and resources that would otherwise be spent on manual data processing, freeing up employees to focus on more valuable tasks, such as analyzing the insights gained from the data. In addition, by optimizing rig performance, companies can reduce operational costs, such as fuel and maintenance expenses, which can have a significant impact on the bottom line.
Furthermore, the solution can increase efficiency by providing near real-time insights into rig operations. By quickly identifying potential issues, companies can take corrective action before those issues become larger problems. This ability can help companies optimize rig performance, reduce downtime, and increase production. The solution can also help companies identify areas where they can improve processes and procedures, leading to even greater efficiencies over time.
Overall, the intelligent rig report classification solution can have a transformative impact on the oil and gas industry. As a result, companies that implement the solution are likely to gain a competitive advantage in the market, leading to long-term success and growth.
Conclusion
In conclusion, the implementation of the intelligent drilling report classification solution on AWS can have a significant impact on the oil and gas industry. With the help of NLP and ML technologies, the solution can provide valuable insights into rig operations and can identify potential issues before they become major problems. By retraining the ML model with the help of HITL validation, the solution can continually improve its accuracy and relevance over time. The benefits of the solution include improved safety, reduced costs, increased efficiency, and better operational performance. Overall, the intelligent drilling report classification solution on AWS can be a game-changer for the oil and gas industry and can pave the way for a more automated and efficient future. In blog #3, we will describe how to use generative artificial intelligence (AI) to search for information in thousands of drilling reports.