Big image data analysis for COPD

Motivation / Scope

Chronic-Obstructive- Pulmonary-Disease (COPD) is a chronic disorder which kills millions of individuals worldwide each year and disables millions more. It is typically, although not exclusively, a disease of older adults and is primarily attributed to chronic exposure to lung irritating substances contained in cigarette smoke and air pollution but even chronic untreated asthma and infectious diseases.

In most forms of COPD, airways become progressively obstructed and lung tissue slowly destroyed. Although there is currently no cure for COPD there are treatments being tested which could possibly help slow down its rate of progression. Critical to accessing the treatment of such therapies is the use of objective and precise measures of the extent or stage of the disease in individuals affected.

Assessment of COPD through radiologic imaging

One assessment of COPD is through radiologic imaging with CT scans. With these scans, a radiologist can assess the general extent of COPD rather rapidly, but it is difficult for a human observer to determine fine variations in the extent of the disease. This is only possible through the use of objective calculated indexes. Several such quantitative measurements of COPD extent are currently being evaluated. Most of these index the amount of lung destruction based on the reduction in lung density to x-rays as measured in CT. Two common indexes are PD15 and LAA-850, but the best parameters to use to calibrate these indexes to best reflect disease extent are still under study.

Different approaches for COPD assessment

Approach 1: Manual Lung Segmentation and Analysis (past)

In the not too distant past, the calculation of lung quantities was performed “manually”, at least in the sense of the identification of lung tissue. In this system, a radiologist, or more often, a radiology resident, would examine each slice of a Thoracic CT exam of a patient one by one. A typical Thoracic CT exam contains a few hundred images of the lungs usually in the axial plane. The resident would use interactive computer software (often ad-hoc software) to outline the lungs in each slice. This process usually takes 10 to 30 seconds, depending on experience. A typical manual lung segmentation on one CT scan of 300 images can therefore take 1 to 2 hours. Once lungs are identified, it is usually straight forward to perform the calculation of any of a number of quantitative indexes of COPD staging, such as PD15 and LAA-850.

Considering costs, if a radiologist performs this task themselves, the cost in time would be on the order of a few hundred CHF. A radiology resident would cost about 50 CHF, and a medical student or purposely trained biologist would cost a little less. However, very few intelligent, educated persons could tolerate the tedious work involved in manually segmenting lungs in CT images for more than a few dozen exams before moving on to other interests.

Approach 2: Semi automatic (today)

There are actually several levels of “Semi”-automatic lung quantification, but perhaps the most common situation is one in which the radiologist is required to provide an initial seed (such as the location of the trachea) on one or two slices and then the software identifies the entirety of the lungs in all slices and performs the quantization calculation. Another common scenario is one in which the software initially performs a fully automatic lung segmentation, but the radiologist must check the results and make manual corrections for cases of erroneous segmentation. This approach can consumes 15 minutes per exam at a typical radiologist’s cost of CHF 50. This approach is less tedious than the fully manual approach and is generally well tolerated for a few dozen or even a few hundred exams. Beyond this number, again the monotony of this approach often becomes unbearable for most users.

Approach 3: Fully automatic (future)

For extracting COPD quantities (i.e. PD15/LAA-850) from truly large numbers of CT exams, such as thousands or tens of thousands a fully automatic approach is necessary. There are a few free and commercial software packages already on the market for performing completely automatic lung segmentation and quantification. While some of these products could in theory be run in batch to process large numbers of CT exams, non appear to be produced intentionally with the aim of processing large numbers of exams and may not provide a robust solution. The 4Quant solution was build from the ground-up with the intent of providing a robust system for quantifying tens of thousands or even millions of CT exams. It takes advantage of many of the big- data techniques which 4Quant has successfully applied in previous non-medical big-data applications. With the 4Quant solution, the radiologist identifies a list of existing CT exams to process, and the COPD-RIQUA performs all of the processing without any further user interaction. In fact, users don't even need to view the actual lung images, just the final numeric quantities.

Comparison of approaches (pros and cons)

Tables Approach 1 Approach 2 Approach 3
Description manual approach batch processing parallel processing
Quantification of extent of disease 2 hours 10 min 0.5 sec
Throughput 1 radiologist can process 4 patients per day 1 radiologist can process 48 patients per day 1 radiologist can process 57’600 per day
Limitations error prone / tedious work less tedious but exhausting radiologist can focus on relevant questions

Limited number of subjects

Most studies so far, have used a limited number of subjects, because the manual or semi-automatic methods available for calculating COPD indexes is rather work intensive. To truly assess the value of specific quantification measures will require the analysis of thousands if not tens of thousands of cases. Furthermore, the results of these limited studies are often assessed in a limited statistical manor.

Detailed description of our big data approach

Completely automatic method for calculating COPD indexes

The purpose of this work is to develop a completely automatic method for calculating COPD indexes on large archives of CT lung data already in existence and to then perform advanced Big-Data analysis statistics to improve understanding of these indexes. The hope is that from such analysis, the ideal quantitative indices for quickly assessing an individual's COPD staging will be determined and eventually contribute to improved individualised therapy.

Results

The 4Quant COPD-RIQUA application was developed in collaboration with 4Quant’s clinical partner, the Department of Radiology and Nuclear Medicine of the University Hospital of Basel (USB), Switzerland. Together they have tested this module on a proof-of- concept study of 50 COPD CT cases and are preparing a pilot study of several hundred cases over the next weeks. USB is currently establishing an international collaboration to accumulate several thousands of COPD CT cases. This data will be analyzed using the COPD-RIQUA application.

Outlook

The 4Quant COPD-RIQUA solution goes beyond simply producing a set of several thousand COPD quantification values. 4Quant in collaboration USB intends to apply its experience in big data analysis, deep learning and …. to extract the most effective quantifiers of COPD CT’s. The goal would be to produce an effective, robust method for automatically processing future patients’ CTs to produce an array of clinical predictive indicies. These indicies would provide the most precise objective indicators of patient COPD staging and prognosis. These could be used to help predict the effectiveness of existing therapies and help clinicians choose the most appropriate, idividualized therapy for each patient. These indicies could then be utilized in clinical trials of new drugs to receive an early indication of the effectiveness of these drugs. Clinical trials of drugs which prove effective could be improved and trials of non-effective drugs could be terminated earlier than currently required thereby saving research organizations and subjects of unnecessary burden and costs.