Software Engineer with an AI & Data focus.

I’m Michael McCallion — a Software Engineer and MSc Artificial Intelligence graduate. I build machine learning pipelines, data-driven applications, and clean full-stack projects.

Dissertation Project

My MSc research investigated whether human “breathprints” can act as a reliable biometric identifier, using machine learning on volatile organic compound (VOC) breath data.

Human Breathprints as Biometric Identifiers: Assessing Uniqueness and Reliability

MSc Artificial Intelligence · Machine Learning · VOCs / Breath Analysis · Feature Selection + Classification

Traditional biometrics (fingerprint, face, iris) are effective but can be intrusive or inconvenient in some healthcare contexts. This project explores breath as a non-invasive biometric: you simply exhale into a sensor, producing a VOC “signature” that may be distinctive enough to identify an individual within a group.

Research question

Can VOC patterns in exhaled breath uniquely and reliably identify an individual (i.e., act like a fingerprint), despite natural variability caused by factors like diet, health, environment, and time?

  • Assess uniqueness via patterns across individuals
  • Reduce high-dimensional VOC data into informative components
  • Prevent overfitting given limited samples
  • Evaluate classification performance using standard metrics

Dataset & constraints

The dataset contains breath samples from 11 subjects collected across multiple days. VOCs were measured using mass spectrometry methods (SESI-MS / Q-TOF in the source study).

  • High dimensionality: many VOC features
  • Small sample size: risk of overfitting
  • Variability: VOCs fluctuate over time
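With many features and few samples, a common safeguard is to keep every fitted step inside cross-validation so that no information from held-out samples leaks into scaling or modelling. The sketch below shows that idea only. It assumes scikit-learn, uses synthetic data in place of the real VOC measurements, and picks an arbitrary 4-fold split; it is not the evaluation protocol used in the dissertation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the breath data: 11 "subjects", far more features than samples.
X, y = make_classification(
    n_samples=132, n_features=300, n_informative=20, n_redundant=0,
    n_classes=11, n_clusters_per_class=1, random_state=0,
)

# The scaler lives inside the pipeline, so it is re-fitted on each training fold only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))

# Stratified folds keep every subject represented in each split.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
fold_scores = cross_val_score(model, X, y, cv=cv)

print("Accuracy per fold:", fold_scores.round(3))
print("Mean accuracy:", fold_scores.mean().round(3))
```

Reporting the spread across folds, not just the mean, gives a feel for how unstable the estimate is on a dataset this small.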

Pipeline (Iteration 1)

Goal: reduce dimensionality early, then select features.

  • StandardScaler (standardise each feature to zero mean and unit variance)
  • PCA (retain most variance, reduce dimensions)
  • LDA (supervised separation between subjects)
  • Regression-based feature selection: Lasso / Ridge / ElasticNet
  • Classifiers: Logistic Regression, SVM, Random Forest
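As an illustration, the Iteration 1 ordering could be wired together roughly like this with scikit-learn. This is a sketch under assumptions, not the dissertation code: the data is synthetic, the 95% variance threshold, the Lasso `alpha`, and the choice to keep 5 components are placeholders, and using `SelectFromModel` with a Lasso fitted on integer subject labels is one plausible reading of "regression-based feature selection".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the VOC matrix (11 classes, high-dimensional).
X, y = make_classification(
    n_samples=220, n_features=300, n_informative=20, n_redundant=0,
    n_classes=11, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

iteration_1 = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),          # keep 95% of variance (assumed value)
    ("lda", LinearDiscriminantAnalysis()),    # at most n_classes - 1 = 10 components
    # Keep the 5 components with the largest |Lasso coefficient| (assumed value).
    ("select", SelectFromModel(Lasso(alpha=0.01), threshold=-np.inf, max_features=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

iteration_1.fit(X_train, y_train)
accuracy = iteration_1.score(X_test, y_test)
print(f"Iteration 1 style pipeline, test accuracy: {accuracy:.3f}")
```

Note that in this ordering the selector only ever sees the handful of LDA components, not the original VOC features, so it has very little left to choose from.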

Pipeline (Iteration 2)

Goal: test whether ordering matters (feature selection first).

  • StandardScaler
  • Regression feature selection first (Lasso / Ridge / ElasticNet)
  • LDA after selection (maximise class separability)
  • Same classifiers: Logistic Regression, SVM, Random Forest
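The revised ordering can be sketched the same way. Again this is illustrative only: it assumes scikit-learn, synthetic data, a Lasso-based `SelectFromModel` that keeps 40 features, and default-ish classifier settings, none of which are taken from the dissertation. The helper name `make_iteration_2` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the VOC matrix (11 classes, high-dimensional).
X, y = make_classification(
    n_samples=220, n_features=300, n_informative=20, n_redundant=0,
    n_classes=11, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

def make_iteration_2(classifier):
    """Scale, select original features first, then apply LDA, then classify."""
    return Pipeline([
        ("scale", StandardScaler()),
        # Keep the 40 original features with the largest |Lasso coefficient| (assumed value).
        ("select", SelectFromModel(Lasso(alpha=0.01, max_iter=10000),
                                   threshold=-np.inf, max_features=40)),
        ("lda", LinearDiscriminantAnalysis()),
        ("clf", classifier),
    ])

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    model = make_iteration_2(clf).fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

Here the selector works on the original features, and LDA then builds its discriminant axes from the reduced set. Scores on synthetic data say nothing about the real breath dataset.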

Key finding: ordering matters

The core insight from this work is that the order in which dimensionality reduction and feature selection are applied can substantially change classification outcomes. In my experiments, the second iteration (feature selection → LDA) more than doubled Logistic Regression accuracy, from 25.64% to 56.41%.

Iteration 1 — Classification metrics

Model                 Accuracy   Precision   Recall    F1
Logistic Regression   25.64%     28.61%      25.64%    24.46%
SVM                   30.77%     40.72%      30.77%    33.03%
Random Forest         33.33%     32.46%      33.33%    32.01%

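Interpretation: performance is modest overall. The best accuracy here is Random Forest (33.33%), which is above the roughly 9% expected from random guessing across 11 equally represented subjects but far from reliable identification. This suggests the problem is hard under this pipeline, the dataset is challenging (small and variable), or both.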
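For reference, the four columns in the table above could be computed along these lines. In that table recall equals accuracy in every row, which is what support-weighted averaging produces, so this sketch assumes weighted averages. It uses scikit-learn and a tiny made-up set of labels, not the real predictions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up labels for three "subjects", purely for illustration.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

accuracy = accuracy_score(y_true, y_pred)

# average="weighted" averages per-class scores weighted by each class's sample count.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1:        {f1:.2%}")
```

With weighted averaging, recall always equals accuracy, so precision and F1 are the columns that add information about how errors are distributed across subjects.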

Iteration 2 — Classification metrics

Model                 Accuracy   Precision   Recall    F1
Logistic Regression   56.41%     65.81%      56.41%    55.99%
SVM                   41.03%     61.05%      41.03%    44.74%
Random Forest         38.46%     32.46%      33.33%    32.01%

Interpretation: Logistic Regression improves dramatically under the revised ordering (accuracy 25.64% → 56.41%, F1 24.46% → 55.99%), and SVM accuracy also rises (30.77% → 41.03%). This indicates that the feature space the pipeline produces can matter more than the choice of classifier alone.

What the results suggest

  • Promise: Breathprints can contain identifying information, but performance depends heavily on processing choices.
  • Hard problem: High-dimensional signals + temporal variability make classification non-trivial.
  • Representation matters: A better feature space can unlock stronger results even with simpler models.

Limitations & future work

  • Scale: Larger labelled datasets to improve generalisation
  • Stability: Evaluate repeatability across longer time windows
  • Methods: Explore non-linear reduction and stronger ensembles
  • Validation: Test on unseen cohorts / external datasets

Next steps would focus on improving robustness and validation to move from “promising” to “reliable in practice”.

Other Projects

A few additional projects from my MSc modules, focused on practical machine learning and interactive demos.

Reinforcement Learning: Q-Learning Gridworld Agent

Deep Learning Module · Reinforcement Learning · Q-Learning · Epsilon-Greedy

Built a model-free reinforcement learning agent that learns an optimal policy in a 5×5 gridworld with obstacles. The agent improves over episodes by balancing exploration vs exploitation using an epsilon-greedy strategy.

What it demonstrates

  • Discrete state/action RL with a Q-table
  • Reward shaping (goal reward + step penalty)
  • Learning progression visible through shorter paths
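The ideas listed above can be sketched in a few dozen lines of plain Python. This is a minimal illustration, not the code behind the demo: the obstacle layout, reward values (+10 at the goal, -1 per step), and hyperparameters are all assumed.

```python
import random

SIZE = 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}            # hypothetical layout
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1          # assumed hyperparameters

def step(state, action):
    """Apply an action; hitting a wall or obstacle leaves the agent in place."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < SIZE and 0 <= col < SIZE) or (row, col) in OBSTACLES:
        row, col = state
    next_state = (row, col)
    reward = 10.0 if next_state == GOAL else -1.0   # goal reward + step penalty
    return next_state, reward, next_state == GOAL

def greedy_action(q, state):
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])

def train(episodes=2000, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    lengths = []                                    # steps taken in each episode
    for _ in range(episodes):
        state, steps = START, 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability EPSILON, otherwise exploit.
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy_action(q, state)
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: bootstrap from the best action in the next state.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state, steps = next_state, steps + 1
            if done:
                break
        lengths.append(steps)
    return q, lengths

def greedy_path(q, max_steps=50):
    """Follow the learned policy from START without exploration."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        state, _, _ = step(state, ACTIONS[greedy_action(q, state)])
        path.append(state)
    return path

q_table, episode_lengths = train()
path = greedy_path(q_table)
print("Mean steps, first 100 episodes:", sum(episode_lengths[:100]) / 100)
print("Mean steps, last 100 episodes:", sum(episode_lengths[-100:]) / 100)
print("Greedy path:", path)
```

Comparing the mean episode length early and late in training is a simple way to see the "shorter paths" effect without a visualisation.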

Try it live

This is an interactive demo — you can train the agent and watch it navigate the grid.

About Me

Software Engineer with industry experience and an MSc in Artificial Intelligence (Distinction).

I’m a Software Engineer with experience building and supporting reliability-critical software in a regulated manufacturing environment at Abbott Diabetes Care. I worked mainly across C# and SQL, developing internal applications used on the factory floor, troubleshooting production issues, and improving system reliability through clean, maintainable code.

I recently completed an MSc in Artificial Intelligence (Distinction), where I focused on applied machine learning and evaluation. My dissertation explored whether human “breathprints” can be used as a non-invasive biometric identifier using VOC data. I enjoy building practical projects that combine strong engineering with ML and data.

Outside of coding: I’m into fishing and playing the guitar. I’m always building small projects to learn and stay sharp.

  • ✅ Software Engineering: C# / SQL / Python
  • ✅ AI/ML: feature engineering, model evaluation, classification
  • ✅ Building and deploying full-stack projects (Flask + web UI)
  • ✅ Comfortable owning work end-to-end: build → test → deploy

Contact

If you want to chat about roles, projects, or collaborations — fire me a message.

Get in touch

The best way to reach me is via email. I typically reply within a day.

Email Me