On April 18, 2020, we held an AMA on Reddit to answer questions about Cough against COVID, our open access effort to build an AI tool that uses cough sounds, symptoms and contextual information for early screening of COVID-19. Dr Rahul Panicker from Wadhwani Institute for Artificial Intelligence, Dr Peter Small from Global Good and Professor James Zou from Stanford answered questions from a community with around two million users. Here are edited excerpts from that chat.
- Is checking for symptoms like cough and fever effective against a virus that can spread to others before symptoms start? How can we identify community spread when it may be happening before symptoms are present?
James Zou: Right, this tool is not designed for the individuals who are infected but do not show any symptoms. There is a substantial number of cases who show relatively mild symptoms early on in the infection but who may not get tested — for example, due to limited testing capacity. If cough can help to suggest which individuals are more likely to be positives, then this can be used to prioritize testing and other healthcare support.
- Do you folks expect to be able to eventually tap into other respiratory sounds (apart from cough sounds) for improving your system? The sounds I refer to are here.
Peter Small: Thanks for pointing me to that, super interesting. I’m sure we can! As a clinician, I know there is a wealth of information in these sounds. For a century, people like me have used our ears to tease out that information, some did better than others (frankly I never felt particularly skilled with my stethoscope compared to others). If we had the right data to train acoustic AI I’m confident we could infer far more clinical information and make everyone an expert clinician — even patients themselves.
- What ethical considerations or impact have you implemented or addressed with this project?
Rahul Panicker: Ethical questions are very important to consider here, and it’s front and centre for us. There are multiple aspects to consider for an initiative like this. And that is why we have a coalition with diverse expertise here, from clinical to AI to med-tech development and validation to public health to large scale deployment. For one, all data will be anonymized before being made accessible. Secondly, we are assessing criteria for access to the dataset, to ensure some checks to avoid the proliferation of spurious solutions. Third, we will ensure a rigorous assessment of any solution we develop, including testing for biases. A key part of this is including feedback loops post-deployment. Fourth, any deployment by us will be free and make best efforts to ensure global access. I could keep going.
- How are you going to distinguish the sound of a cough? Sound of dry cough which occurs early in the disease will be different from the productive cough later in the disease as the mucus and secretion build up in the respiratory tract. How do you plan to do it on a large scale? Sound quality depends on the quality of the capturing devices. Or is it going to be limited to the hospital setting with a standardised device?
Rahul Panicker: I will let Peter, the clinician here, answer the first question. To do this on a large scale, we are doing crowdsourced data collection online on coughagainstcovid.org. In addition, we are also collecting data in clinical settings, where we expect higher quality ground truth, though the quantity may be lower. You’re right, sound quality does depend on the capturing device. However, we expect that most of the useful information will be in the temporality of sound pattern, and less in the amplitude, which is what microphones tend to distort more. This is what all our friends who are experts in speech processing tell us. Ultimately, for this to be widely useful, we need it to work with varying microphone qualities. Robustifying the algorithm will be a key part of the AI development.
Peter Small: The beauty of machine learning is that we just need collections of coughs annotated by the medical conditions of the cougher (which can be fully anonymous), and then the computer finds their distinguishing characteristics – it’s magic to me, so I’ll let my colleagues explain how. The great thing about mobile phones is that they all have high-quality microphones so sound quality is not an issue and special recorders are not required. In this first step, we are using solicited coughs to gather those critical data sets. Everyone can (should?) donate their cough to science.
- Do you collect multiple cough sounds from the user or just one? Is there a concern that people will cough differently when they know it’s for the app? Could that possibly change the diagnosis?
Rahul Panicker: Yes, what you bring up is certainly a possibility, and we do collect multiple coughs from one person. Also, we collect samples of speech as well, to help calibrate to individuals. You’re right, there may be variation in coughs. We’re looking to see if there are features that show up regardless. And, being an online tool, we’ll keep experimenting as we or others gather new information. This is very much an experiment, but one that we believe is worth trying at this point!
- Since the cough sound is collected using different devices with varying degrees of quality, how do you expect to resolve the difference in device quality? In the same light, cough sound can be different based on the distance to the device itself. How do you expect to resolve this issue as well?
James Zou: Good question. For the machine learning algorithm to be robust, we need it to see cough data from diverse devices and situations (e.g. distance) during the training process. That way the algorithm can learn to identify features in the cough that are likely to be invariant across devices and that are potentially indicative of COVID. Collecting multiple coughs and deep breath sounds from the same individual could also improve the algorithm.
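One common complement to collecting recordings from many devices is to simulate device variation on the recordings already in hand. The sketch below is illustrative only and is not something the panel described doing: the function names `simulate_device` and `augment`, the gain and noise ranges, and the synthetic "cough" signal are all assumptions made for the example.

```python
import math
import random


def simulate_device(samples, gain, noise_std, rng):
    """Scale a clip by `gain` (stand-in for distance or mic sensitivity),
    add Gaussian noise, and clip the result to the range [-1, 1]."""
    out = []
    for s in samples:
        v = gain * s + rng.gauss(0.0, noise_std)
        out.append(max(-1.0, min(1.0, v)))
    return out


def augment(samples, n_variants, seed=0):
    """Return `n_variants` perturbed copies of one recording.
    The gain and noise ranges are arbitrary, assumed values."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        gain = rng.uniform(0.3, 1.0)        # quieter = farther from the mic
        noise_std = rng.uniform(0.0, 0.02)  # mild background / sensor noise
        variants.append(simulate_device(samples, gain, noise_std, rng))
    return variants


# Synthetic stand-in for a cough: a decaying 300 Hz burst sampled at 16 kHz.
clip = [
    math.exp(-t / 800.0) * math.sin(2 * math.pi * 300 * t / 16000)
    for t in range(4000)
]
variants = augment(clip, n_variants=5)
print(len(variants), len(variants[0]))
```

Each variant keeps the timing of the original clip while changing its loudness and noise level, so a model trained on all of them sees the same event under several simulated recording conditions.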
- What partners are you planning to use to consolidate this kind of information? A large number of apps have started to collect this kind of information. Are you planning to coordinate collections of databases for data mining or are you planning on app development at the device level? What funding base? Are you hiring directly or contracting?
Rahul Panicker: Possible deployment models include a WhatsApp chatbot, a web interface, API calls by existing symptom checkers, and possibly even dedicated apps. This initiative is funded by the Bill and Melinda Gates Foundation. We currently have capabilities across the collaborators to develop and deploy the solution. But help is welcome. Two areas where help would be welcome are data collection and more channels for deployment. In addition, please spread the word by asking folks to donate their coughs at coughagainstcovid.org.
Peter Small: Yes, there are about 8 different groups that I know of who are working on acoustic AI and cough – the more mind-share this gets the better the chances it can be made to work. Many of them (us included) are committed to open access to the data and free access to any resulting solution in low and middle-income countries. Addressing this global pandemic with acoustic AI is too big a technical challenge and the potential impact too urgent to be addressed by small siloed efforts. We have recently reached out to these groups and asked that all our data be pooled together.
I’ll let others reply about hiring, contracting and funding.
- I am curious about the end goal accuracy of the test. How accurate do you think the AI will get at recognizing these specific coughs? Surely people with the same sickness/disease will have different coughs and those with completely different illnesses have a chance to have very similar sounding coughs?
James Zou: These are good questions! It’s hard to say too much about accuracy before we have collected enough data to train and test the models. We are collaborating closely with clinicians, and their experience is that there is a wealth of information in people’s coughing and breathing sounds. So we think it’s a promising hypothesis to test whether this information can inform COVID-19 diagnosis.
- What is your target sensitivity and specificity for this diagnostic tool? What evidence have you gathered so far that cough data has a realistic shot of achieving these targets?
Rahul Panicker: The key goal is early screening, not diagnosis. The hope is that this can then be followed by a diagnostic test, which is in short supply in many places. Towards this, our current target sensitivity (recall) is >80%. We think that precision (or positive predictive value) is a better metric to shoot for than sensitivity since a useful target sensitivity will change with prevalence. At this point, we’d like a precision of >50%. There is evidence from other past studies suggesting the ability to distinguish between respiratory diseases. Secondly, there is anecdotal evidence from clinicians and patients that the COVID cough is different (though it may not be different from certain other severe flu coughs). We’re hoping that these, along with other contextual information, will provide a useful lift over purely symptomatic early-stage screening, which is of limited utility as far as we know.
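To make the dependence on prevalence concrete, here is a minimal sketch of how precision follows from sensitivity, specificity and prevalence via Bayes' rule. It is not the project's code: the function name `precision_from_rates`, the 90% specificity, and the three prevalence values are assumptions chosen for illustration. Only the 80% sensitivity echoes the target mentioned in the answer.

```python
def precision_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value (precision) from Bayes' rule.

    precision = TP / (TP + FP), with
      TP = sensitivity * prevalence
      FP = (1 - specificity) * (1 - prevalence)
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions at these rates")
    return true_pos / (true_pos + false_pos)


# Hypothetical screening tool: 80% sensitivity, 90% specificity (assumed).
for prevalence in (0.01, 0.05, 0.20):
    ppv = precision_from_rates(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f} -> precision={ppv:.2f}")
```

With these assumed rates, precision rises from about 0.07 at 1% prevalence to about 0.30 at 5% and about 0.67 at 20%, even though the classifier itself has not changed.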
- Since up to 40-80% may be asymptomatic (no cough, no fever), do you think you will be held liable for spreading misinformation and false negative data among communities?
Peter Small: It’s critical that whatever comes of this be appropriately implemented – I would never say that someone definitely does not have COVID based on cough, and we need to be sure that such abuse is not propagated! Cough is both a symptom (a condition experienced and reported by a patient) and a sign (a finding observed by a health care worker). Having spent my professional career fighting TB, another disease spread through the air, I’ve long wondered about asymptomatic transmission. I wonder if some folks (like myself, who coughs every day, or my wife, who seems to never cough) develop a COVID cough (or any cough) but do not really notice it for a few days, until it gets severe or is associated with other symptoms. Thus, while I’m asymptomatic, I do have a cough. Having the ability to objectively measure cough with AI would be one way to better understand asymptomatic transmission.
- Has your project been reviewed and approved by an IRB?
Rahul Panicker: We have facility-based data collection efforts that are undergoing IRB review. The crowdsourcing effort is a citizen science initiative. But we are maintaining the highest standards. For example, our data platform is HIPAA compliant. We pay close attention to data security and anonymization as well.
- Are you focusing on collecting data from already identified patients in collaboration with hospitals? Is that data more valuable than the data the public submits (most of whom might be COVID negative)?
Peter Small: This is a data-hungry project! We and others working on the use of acoustic AI to combat COVID and other diseases need lots of data to train and test on.
We are looking for coughs that have known etiologies – ideally definitively diagnosed and documented. You are correct that collaborations with researchers and health facilities may have a high yield for relatively small numbers of such data but given the success of other “citizen science” efforts Wadhwani AI is hopeful that we can get very high numbers of slightly less certain data.
But what the world really needs is a tool for recognizing the cause of a cough early before they spread COVID (or TB or whatever). Institutions will over-represent late-stage COVID and thus, we are exploring ways to get cough sounds from people early in the disease – ideally from the first few coughs after they get sick – perhaps even before they themselves become aware they are coughing. Everyone’s ideas and coughs are welcome.