Dr. Stanley Njoku
PH.D IN BUSINESS ANALYTICS & DATA SCIENCE
The Solutions to Most Business Needs Could Be Embedded in Their Internal Datasets
I am deeply passionate about uncovering solutions within large data sets and collaborating with stakeholders to enhance business outcomes!
Dissertation:
Using Machine Learning To Improve Clinical Trials for New Drug Development
New drug development has remained complex for decades: bringing a single new drug to approval by the Food and Drug Administration (FDA) takes 10 to 15 years and costs an average of $2.8 billion, with a failure rate of over 90% (Ekins et al., 2019).
For any drug to be approved by the FDA, it must undergo rigorous clinical trial phases that involve human subjects, from Phase I through Phase III. Phase II accounts for the largest number of failures (Vijayan, Kihlberg, Cross, & Poongavanam, 2022).

Many clinical trials have been withdrawn, suspended, terminated, or delayed due to insufficient recruitment of human subjects to show efficacy and toxicity. To successfully demonstrate efficacy and toxicity and gain FDA approval, sufficient and convincing safety data must be available. However, recruiting sufficient human subjects for clinical trials poses severe challenges for the pharmaceutical industry.
Approximately 80% of clinical trials fail to meet their recruitment timelines, with delays costing between $600,000 and $8 million per day, according to experts.
Can Machine Learning Techniques Improve Clinical Trials by Reducing the Recruitment Timeline?
Prolonged recruitment and failure to meet the designed enrollment population of human subjects have resulted in most clinical trials being terminated, withdrawn, or suspended. Other factors, such as lack of funding, may also revolve around attracting and retaining human subjects.
The number of human subjects needed to demonstrate statistical significance at a predefined level of efficacy is crucial (Fogel, 2018). More than half of clinical trial expenses are associated with delays due to prolonged recruitment (Cai et al., 2021).
Each clinical protocol must provide an estimated duration of the study and the maximum number of human subjects needed to measure effectiveness and safety and to demonstrate a statistically significant level of efficacy (Fogel, 2018).
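To make the link between efficacy level and enrollment size concrete, here is a minimal sketch of a textbook sample-size approximation for comparing two response rates. It is not the method used in the dissertation; the function name `subjects_per_arm`, the response rates, the significance level, and the power are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def subjects_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm for a two-sided test
    comparing two response rates (normal approximation).

    All inputs are hypothetical planning values, not study data.
    """
    if p_control == p_treatment:
        raise ValueError("response rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# A smaller expected difference between arms requires many more subjects.
n_small_effect = subjects_per_arm(0.50, 0.60)
n_large_effect = subjects_per_arm(0.50, 0.70)
print(n_small_effect, n_large_effect)
```

The sketch shows why recruitment targets grow quickly as the expected treatment effect shrinks, which is one reason enrollment shortfalls can leave a trial unable to demonstrate efficacy.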
How Inclusive Is Patient Selection in Clinical Trials?
Review and analyze inclusiveness in patient selection and the predominant race and gender in clinical trial data.
Are There Race and Gender Disparities in Clinical Trials?
In this study, race and gender disparities will be analyzed using clinical trial studies published on ClinicalTrials.gov.
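One simple way to look at such disparities is to compare each group's share of trial participants with its share of a reference population. The sketch below assumes participant records are available as dictionaries with hypothetical `race` and `gender` fields; the records, group labels, and reference shares are made-up placeholders, not data or results from the study.

```python
from collections import Counter


def group_shares(participants, field):
    """Return each group's fraction of participants for the given field."""
    counts = Counter(person[field] for person in participants)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def representation_gap(trial_shares, reference_shares):
    """Trial share minus reference share; negative means underrepresented."""
    return {
        group: trial_shares.get(group, 0.0) - reference
        for group, reference in reference_shares.items()
    }


# Placeholder records for illustration only.
sample = [
    {"race": "Group A", "gender": "F"},
    {"race": "Group A", "gender": "M"},
    {"race": "Group A", "gender": "M"},
    {"race": "Group B", "gender": "M"},
]

race_shares = group_shares(sample, "race")
gender_shares = group_shares(sample, "gender")

# Hypothetical reference population split evenly between the two groups.
race_gap = representation_gap(race_shares, {"Group A": 0.5, "Group B": 0.5})
print(race_shares, gender_shares, race_gap)
```

A negative gap flags a group that is enrolled at a lower rate than its share of the reference population, which is the kind of signal a disparity review would then investigate further.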
This quantitative research study extracted numerous datasets from various databases for exploratory and logical insights
The qualitative approach analyzed the data for insights, while the quantitative method applied various machine learning techniques.
PHD – Research Data Summary Table
– Total Records (unique): ~426.6 million
– Total Words: ~9.44 Million
Data Source | Dataset Description | Volume | Purpose / Insight |
ClinicalTrials.gov | Total Clinical Trial Studies (2012–2022) | 165,630 records | Benchmark dataset for overall trial performance analysis |
ClinicalTrials.gov | Failed Interventional Trials | 23,167 records | Understand causes of failure (terminated, withdrawn, suspended) |
ClinicalTrials.gov | Recruiting Clinical Trials (as of May 14, 2023) | 64,685 records | Identify trials currently seeking participants |
— Interventional Trials | Focused recruitment dataset | 47,128 records | Used for phase-level recruitment analysis and sponsor performance |
— Observational Trials | Excluded from deep analysis | 17,557 records | Not used for main recruitment insights |
PhysioNet (Beth Israel ED) | Emergency Department Admissions | ~425,000 patients | Used to assess feasibility of patient recruitment via real-world hospital data |
PhysioNet (All Files) | Combined detailed ED records (e.g., diagnosis, vitals) | 361 million records | Benchmarking, hypothesis testing, and diversity/inclusion evaluation |
Social Media (TNBC Foundation) | Triple Negative Breast Cancer Foundation Text Data | 8,625,000 words | Analyzed for patient concerns and community voice |
Social Media (ACS) | American Cancer Society Text Data | 815,374 words | Supplementary social insights |
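As a quick arithmetic check on the table, the sketch below recomputes two of its subtotals from the listed volumes: the recruiting-trial split and the combined social media word count. The variable names are illustrative; the figures are copied from the rows above, and no other totals are derived here.

```python
# Volumes copied from the summary table.
recruiting_total = 64_685   # Recruiting Clinical Trials (as of May 14, 2023)
interventional = 47_128     # Interventional Trials
observational = 17_557      # Observational Trials
tnbc_words = 8_625_000      # TNBC Foundation text data
acs_words = 815_374         # American Cancer Society text data

# The interventional and observational rows should add up to the recruiting total.
recruiting_sum = interventional + observational

# Combined word count across both social media sources.
total_words = tnbc_words + acs_words

# Share of recruiting trials that are interventional.
interventional_share = interventional / recruiting_total

print(recruiting_sum == recruiting_total)
print(f"{total_words / 1e6:.2f} million words")
print(f"{interventional_share:.1%} interventional")
```

The two sub-rows account for every recruiting record, and the word counts agree with the stated total of about 9.44 million words.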
Dissertation Overview and Chapters
The aim of this research is to explore ways machine learning techniques could shorten the recruitment timeline in clinical trials for new drug development. A mixed-methods approach will be used, combining inductive and deductive thinking and offsetting the limitations of relying exclusively on either a qualitative or a quantitative approach.
CHAPTER I: INTRODUCTION
CHAPTER II: LITERATURE REVIEW
CHAPTER III: METHOD
CHAPTER IV: RESULTS
CHAPTER V: FINDINGS & RECOMMENDATIONS
- Limitation, Findings…
Diversity in Clinical Trials
Underrepresentation of minority groups, pregnant women, children, and the elderly is a major challenge in clinical trials (Ramamoorthy et al., 2022).
Business Analytics and Data Science (PhD)
In progress….
14+ years of experience!
In a wide range of professional roles across various industries.
Education
ASc, Bachelor of Science (BSc) & Master of Applied Science (MASc).
Experts Believe That Most Companies Lose 20-30% in Revenue Every Year Due to Inefficiencies
In this digital era, any business that is slow to adopt digital solutions and innovative strategies is likely to keep losing 20-30% in revenue annually due to inefficient workflows. Forward-thinking businesses, by contrast, are benefiting from adopting Agile methodology for successful product delivery. Process inefficiency will continue to be the biggest threat to companies without robust and innovative strategies. Harvard Business Review reports that 60 percent of companies experience an increase in revenue and profits after using an Agile approach.
“Progress is impossible without change, and those who cannot change their minds cannot change anything.”
– George Bernard Shaw
Popular Data Science questions
The most common Data Science questions for professional data scientists.