
Neonatal Seizure Detection from EEG Signals using Deep Learning Methods

Research Article | DOI: https://doi.org/10.31579/2578-8868/378


  • Gurneet Kaur 1*
  • Sukhwinder Singh 1

Head of Neurosurgical Department, Juaneda Miramar, Palma de Mallorca, Balearic Islands, Spain.

*Corresponding Author: Gurneet Kaur, Head of Neurosurgical Department, Juaneda Miramar, Palma de Mallorca, Balearic Islands, Spain.

Citation: Gurneet Kaur, Sukhwinder Singh (2025), Neonatal Seizure Detection from EEG Signals using Deep Learning Methods, J. Neuroscience and Neurological Surgery, 17(5); DOI:10.31579/2578-8868/378

Copyright: © 2025, Gonçalo Januário. This is an open-access article distributed under the terms of The Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Received: 09 May 2025 | Accepted: 30 May 2025 | Published: 26 June 2025

Keywords: epilepsy; electroencephalography; deep learning; wavelet decomposition; Daubechies transform; convolutional neural networks

Abstract

Neonatal epilepsy, a serious neurological condition in newborns, is typically marked by irregular brain activity that results in repeated seizures. This can have a detrimental effect on neurodevelopment and may raise the risk of cognitive and behavioral abnormalities. To prevent long-term neurological damage and developmental delays, accurate detection of such seizures is crucial. Neonatal seizures can be subtle and may not exhibit any clear physical signs, complicating timely diagnosis and treatment. To assist in diagnosis, electroencephalogram (EEG) recordings can be analyzed, but manual inspection is time-consuming and burdensome for neurologists. An automated EEG review would therefore enable more frequent monitoring of neonates at risk. Hence, an automated system for detecting seizures has been created, trained on multichannel annotated EEG recordings from 79 full-term neonates who were admitted to Helsinki University Hospital. The system extracts wavelet coefficients from 10-second EEG segments using multilevel Daubechies 4 (db4) wavelet decomposition and uses these features to train a 2-Dimensional Convolutional Neural Network (2D-CNN). The model achieved 100% accuracy, sensitivity, and F1-score, outperforming existing state-of-the-art methods and providing a reliable solution for automated seizure detection.

Introduction

Neonatal seizures are the most commonly encountered neurological disorder in newborns and often indicate underlying brain dysfunction. The neonatal period is usually considered to be the first 4 weeks (28 days) of a full-term newborn's life, a critical stage of brain growth and development [1]. Compared with adults and children, neonates are more susceptible to seizures because of incomplete inhibitory control mechanisms and a higher ratio of excitatory neurotransmitters. These seizures carry significant risks, including neuronal damage and long-term neurodevelopmental impairments. It is estimated that around 7-10% of neonates are at risk of death, while 23-50% are likely to develop some abnormality [2]. Unlike seizures in older populations, neonatal seizures often lack physical signs, making diagnosis particularly difficult [3]. While various neuro-imaging and signal processing techniques have been explored, electroencephalography (EEG) remains the benchmark for seizure detection, particularly in neonates, due to its cost-effective, noninvasive nature and remarkable capability to capture real-time electrophysiological activity. However, interpreting neonatal EEG requires considerable clinical expertise and is susceptible to variability and subjectivity in experts' opinions [4,5]. This issue has led to a burgeoning development of automated, computer-aided seizure detection systems.

Early approaches focused on long-established data-driven machine learning (ML) techniques.

Machine learning-based classifiers were mostly fed features extracted through various domain analyses, such as time or frequency [6]. Machine learning approaches mainly comprise two parts: feature extraction and classification. Classifiers like Naïve Bayes, Support Vector Machines (SVMs), and Random Forests are commonly used [7]. Numerous techniques, such as the Discrete Wavelet Transform, Principal Component Analysis, and signal chunking, have been used for feature extraction. Many studies used these techniques and achieved accuracies up to 99% with a Naïve Bayes classifier [8] and around 100% using SVM classifiers [9]. For instance, Biswal et al. [10] utilized a Naïve Bayes classifier to analyze 3,277 EEG reports, which were categorized based on the presence or absence of epileptiform discharges or seizures; this approach resulted in an AUC of 99.05%. Meanwhile, Runarsson et al. [11] created a real-time neonatal seizure detection system using Support Vector Machines (SVM) and half-wave attribute histograms, which demonstrated high specificity (up to 100%) and sensitivity (over 80%). However, the requirement of handcrafted features can limit the scalability and generalizability of machine learning methods.
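As an illustration of this two-stage pipeline (feature extraction, then classification), the sketch below trains an SVM on simple hand-crafted time-domain statistics. The synthetic data, the chosen statistics, and all names are illustrative assumptions, not the features used in the cited studies.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for EEG windows: 200 windows of 2,560 samples each.
X_raw = rng.standard_normal((200, 2560))
y = rng.integers(0, 2, 200)  # illustrative labels: 0 = non-seizure, 1 = seizure

def handcrafted_features(window):
    """Simple time-domain statistics of the kind used as hand-crafted EEG features."""
    return np.array([window.mean(), window.std(), np.abs(np.diff(window)).mean()])

# Stage 1: feature extraction.
X = np.array([handcrafted_features(w) for w in X_raw])

# Stage 2: classification with an SVM.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
preds = clf.predict(X_te)  # one label per test window
```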

Recent advances in deep learning (DL) have considerably improved performance in automated seizure detection. Models such as Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and hybrid architectures such as CNN-LSTM and CNN-RNN can automatically extract and learn useful features from raw EEG data [12]. Gramacki et al. [13] trained a deep learning framework on chunks of EEG signals obtained via a sliding window technique, resulting in an average accuracy of 96-97% on the Helsinki dataset. Another study, by Hogan et al. [2], used the same dataset to train a CNN model and achieved an AUC of 0.982. An approach by O'Shea et al. [14] used two distinct CNNs on a large clinical dataset and found that an 11-layer deep architecture significantly surpassed the performance of shallower architectures, boosting accuracy from 82.6% to 86.8%. Furthermore, O'Shea et al. [15] proposed a fully convolutional model on raw multi-channel EEG signals, leveraging weakly labeled data to enhance training efficiency, and achieved a high accuracy of 98.5%. To determine the severity levels of neonatal epileptic episodes in an actual medical dataset, Debelo et al. [16] employed a deep CNN that was quite efficient, with accuracy, specificity, and precision around 92%. An innovative 1D-CNN model was suggested by Sameer et al. [17] with remarkable accuracy up to 99.83%, using only an average of seven training epochs and thus substantially reducing training time. Li et al. [18] introduced a 1-dimensional convolutional neural network (1D-CNN) model that incorporates meta-learning, continuously adjusting the model's weights until an optimal configuration is reached; this model achieved an average performance of over 92.63% in sensitivity, specificity, and F1 score. Meanwhile, Ullah et al. [19] proposed a pyramidal 1D-CNN to address the challenge of a large number of learnable parameters, as it uses 61

Materials and Methods

This research utilizes a neonatal EEG dataset comprising multi-channel EEG recordings from 79 neonates. These infants were admitted to the NICU at Helsinki University Hospital, Finland, between 2010 and 2014 [26]. The EEG recordings were conducted using 19 electrodes [30] according to the "International 10-20 system (Fp1, Fp2, F3, F4, F7, F8, Fz, C3, C4, Cz, P3, P4, Pz, T3, T4, T5, T6, O1, O2)", with a sampling rate of 256 Hz and a one-second resolution, measured in microvolts (µV) [30]. Three experts independently annotated the EEG recordings, and during this procedure the signals were in a standard bipolar montage ('double banana'). Figure 1A shows the electrode arrangement in the standard bipolar montage. Among the neonates, 40 had seizures confirmed by all experts, 17 had seizures identified by one or two experts, and 22 were seizure-free. In total, 1,379 seizures were annotated. Annotation details are provided in Table 1A. The dataset is publicly accessible at [31].

Figure 1A: Full 18-channel (double banana) montage [41].


Table 1A: Numbers of seizures annotated by 3 experts (marked as A, B and C) for every infant [13].

The dataset includes 79 EDF files containing the EEG recordings and 3 annotation files, one for each expert [13]. The raw signals were in a unipolar montage and hence could not be used directly; instead, a standard bipolar montage was created using 18 electrode pairs: "Fp2-F4, F4-C4, C4-P4, P4-O2, Fp1-F3, F3-C3, C3-P3, P3-O1, Fp2-F8, F8-T4, T4-T6, T6-O2, Fp1-F7, F7-T3, T3-T5, T5-O1, Fz-Cz, Cz-Pz". In this research, data from the 40 neonates annotated by all three experts and the 17 neonates annotated by one or two experts were chosen. To ensure consistency, only seizure events agreed upon by at least two experts were retained; for example, if two out of three experts labeled a time segment as a seizure, it was accepted as such in the final dataset. To segment the EEG signals, a sliding window approach was used, a common method in seizure detection studies [13,32,33]. A window size of 10 seconds with a 1-second overlap was applied for seizure segments, while non-seizure segments used non-overlapping windows. This resulted in 5,294 seizure and 19,858 non-seizure windows, saved in CSV format. For feature extraction, each EEG window was processed using the Discrete Wavelet Transform (DWT) with the Daubechies 4 (db4) wavelet [34]. The signal was decomposed into four levels, separating it into approximation (A) and detail (D) coefficients. The model uses coefficients A4, D4, D3, and D2, which correspond to the delta and theta bands (A4), alpha band (D4), beta band (D3), and gamma band (D2), respectively. This decomposition helps isolate key frequency components relevant to seizure detection, as illustrated in Figure 1B. Each 10-second EEG window, sampled at 256 Hz, contains 2,560 samples. Using db4 wavelet decomposition up to 4 levels [34], the signal is split into approximation and detail coefficients, giving 2,560 coefficients per window (A4: 160, D4: 160, D3: 320, D2: 640, plus D1: 1,280). With 18 channels, each window forms an 18 × 2,560 feature matrix. The dataset was divided in a 70:30 ratio, with 70% reserved for training and 30% for testing. The features were reshaped into 3D tensors for CNN input; specifically, a single tensor had dimensions 20 × 2,560 × 18 with a corresponding label vector of size 20 × 1.
Each tensor contains 10 seizure and 10 non-seizure windows. This transformation from 1D to 3D is essential for compatibility with the 2D-CNN model. The tensors formed are visually represented in Figures 1C and 1D. Figure 1C represents a single layer of a tensor with dimensions 20 × 2,560, where 20 (10 + 10) is the number of seizure and non-seizure instances and 2,560 is the number of wavelet coefficients. Figure 1D represents a single tensor with 18 layers depth-wise, representing the 18 channels.
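The segmentation and decomposition steps above can be sketched as follows with PyWavelets. The synthetic signal stands in for real EEG, and `mode="periodization"` is an assumption chosen because it reproduces the coefficient counts stated in the text (160 + 160 + 320 + 640 + 1,280 = 2,560).

```python
import numpy as np
import pywt

FS = 256              # sampling rate (Hz)
WIN = 10 * FS         # 2,560 samples per 10-second window
N_CHANNELS = 18

rng = np.random.default_rng(1)
eeg = rng.standard_normal((N_CHANNELS, 60 * FS))  # 1 minute of synthetic 18-channel EEG

# Non-overlapping 10-second windows (as used for non-seizure segments).
windows = [eeg[:, s:s + WIN] for s in range(0, eeg.shape[1] - WIN + 1, WIN)]

def db4_features(window):
    """4-level db4 decomposition per channel; concatenating all coefficients
    yields an 18 x 2,560 matrix (A4: 160, D4: 160, D3: 320, D2: 640, D1: 1,280)."""
    rows = []
    for ch in window:
        cA4, cD4, cD3, cD2, cD1 = pywt.wavedec(ch, "db4", mode="periodization", level=4)
        rows.append(np.concatenate([cA4, cD4, cD3, cD2, cD1]))
    return np.stack(rows)

feat = db4_features(windows[0])  # shape (18, 2560)
```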

The architecture of the model, drawing inspiration from Gramacki et al. [13], is composed of three blocks.

Figure 1B: Wavelet Decomposition using db4

Figure 1C: A depiction of a tensor layer and label vector

Figure 1D: The 3D tensor fed to the CNN model.

Each block includes a Conv2D layer, a batch normalization layer, a ReLU activation function, a max pooling layer, and a dropout layer [13]. The architecture is represented in Figure 1E. The filter count starts at 128 in the first block, decreases to 64 in the second, and further reduces to 32 in the third block. The dropout rate is set at 25% for the first two blocks, while the final convolutional block has a dropout rate of 50%. Following the convolutional layers, there are two dense layers, one with 64 units and the other with 32 units, both employing L2 regularization. This is succeeded by a dense layer that gives a single-unit output. The model uses binary cross-entropy as its loss function and the Adam optimizer with a learning rate of 0.001, and was trained for 300 epochs with a batch size of 32. An elaborated summary of the model parameters is provided in Table 1B.
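A minimal Keras sketch of the three-block architecture described above follows. The input layout (2,560 coefficients × 18 channels × 1), the 3 × 3 kernels, the 2 × 2 pooling, and the L2 coefficient of 0.01 are assumptions, as the text does not specify them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def conv_block(x, filters, dropout):
    # Conv2D -> BatchNorm -> ReLU -> MaxPool -> Dropout, as described in the text.
    x = layers.Conv2D(filters, (3, 3), padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.MaxPooling2D((2, 2))(x)
    return layers.Dropout(dropout)(x)

inp = tf.keras.Input(shape=(2560, 18, 1))  # assumed layout: coefficients x channels x 1
x = conv_block(inp, 128, 0.25)             # filter counts 128 -> 64 -> 32
x = conv_block(x, 64, 0.25)
x = conv_block(x, 32, 0.50)
x = layers.Flatten()(x)
x = layers.Dense(64, kernel_regularizer=regularizers.l2(0.01), activation="relu")(x)
x = layers.Dense(32, kernel_regularizer=regularizers.l2(0.01), activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)  # single-unit output

model = models.Model(inp, out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
```

Training would then call `model.fit(...)` with `batch_size=32` and `epochs=300` as listed in Table 1B.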

Figure 1E: Architecture of proposed model

Parameter                Value
Optimization Algorithm   Adam Optimizer
Activation Function      Sigmoid
Loss Function            Binary Cross Entropy
Learning Rate            0.001
Batch Size               32
Epochs                   300

Table 1B: Model Parameters

The most common metrics for assessing model performance include accuracy, sensitivity, and specificity [13]. These metrics are based on a confusion matrix, which shows the number of correct and incorrect predictions for each class. Predictions are typically categorized as "True Positive (TP)" (a positive instance correctly predicted as positive), "True Negative (TN)" (a negative instance correctly predicted as negative), "False Positive (FP)" (a negative instance wrongly predicted as positive), and "False Negative (FN)" (a positive instance wrongly predicted as negative). The formulas for calculating accuracy, sensitivity, and specificity are as follows [35]:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
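These metrics can be computed directly from the confusion-matrix counts; the counts used below are illustrative, not results from this study.

```python
def metrics(tp, tn, fp, fn):
    """Standard classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # true-positive rate (recall)
    specificity = tn / (tn + fp)      # true-negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, sensitivity, specificity, f1

# Illustrative counts (not from the paper):
acc, sen, spe, f1 = metrics(tp=90, tn=95, fp=5, fn=10)
print(round(acc, 3), round(sen, 3), round(spe, 3))  # 0.925 0.9 0.95
```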

Results

The proposed method was evaluated using the performance metrics described earlier. The extracted features were fed into the model, and after training, the output layer produced a vector of size 20×1 in which each component had a value between 0 and 1. To sharpen the model's assessment, a threshold of 0.5 was applied: elements with values less than or equal to 0.5 were assigned class 0, while those above 0.5 were assigned class 1. Notably, the output elements mostly fell within the intervals [0, 0.3] or [0.7, 1], so the threshold provides a clear distinction between actual and predicted outputs and improves the comparability of the results. The accuracy and loss values obtained by the proposed model in the training, testing, and validation phases are summarized in Table 2A. Both training and testing accuracy reached 100%, while the loss approached 0. The slight residual in the loss values can be attributed to the penalty term introduced by the L2 regularizer. Figures 2A and 2B depict the confusion matrices for the training and testing phases, respectively. Computing the evaluation metrics from these confusion matrices, the model achieved 100% accuracy and sensitivity, with an F1 score of 1. Additionally, Figures 2C and 2D illustrate the accuracy and loss curves during the training phase. The curves fluctuate in the early epochs but quickly stabilize, with accuracy reaching nearly 100%.
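The thresholding step can be sketched as follows; the output vector here is hypothetical, not the model's actual output.

```python
import numpy as np

# Hypothetical 20 x 1 vector of sigmoid output probabilities.
probs = np.array([0.02, 0.95, 0.88, 0.10, 0.75, 0.03, 0.99, 0.25, 0.81, 0.07,
                  0.92, 0.15, 0.68, 0.04, 0.89, 0.30, 0.97, 0.12, 0.85, 0.05])

# Threshold at 0.5: values <= 0.5 -> class 0 (non-seizure), > 0.5 -> class 1 (seizure).
labels = (probs > 0.5).astype(int)
print(labels.sum())  # number of windows classified as seizure
```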

Figure 2A: Confusion Matrix for training phase.

Figure 2B: Confusion Matrix for testing phase.

Figure 2C: Accuracy Curve for training phase.

Figure 2D: Loss Curve for training phase.

Phase        Accuracy   Loss
Training     1.0        0.0005
Validation   1.0        0.0005
Testing      1.0        0.0004

Table 2A: Accuracy and Loss values in various phases.

Discussion

The proposed CNN-based model accurately distinguishes between seizure and non-seizure occurrences in EEG signals by utilizing features derived from wavelet transformations. The model achieved perfect classification performance, with 100% accuracy, sensitivity, and an F1 score of 1 in both the training and testing phases. This indicates strong generalization capability, likely aided by the distinct separation of output probabilities and the robustness of the chosen wavelet features. The use of dropout layers and L2 regularization effectively mitigated overfitting, as evident from the stable loss and accuracy curves.

Numerous other approaches have been devised to detect epileptic seizures. To evaluate the effectiveness of the proposed method, a comparison was conducted with several existing deep learning techniques using the same dataset. The baseline model by Gramacki et al. [13] employed a 10-second sliding window technique to extract features and form sub-datasets, which were then fed to a CNN, yielding an average accuracy of 96-97% across various sub-datasets. Tanveer et al. [26] utilized a combination of three 2-D models, achieving an accuracy rate of 96.3%. Visalini et al. [28] implemented a "triplet half-band filter" along with "wavelet packet decomposition" before inputting the data into a Deep Belief Network, which resulted in an accuracy of 98.7%. Raeisi et al. [29] introduced a "Spatio-Temporal Graph Attention Network (ST-GAT)" that effectively captures both temporal and spatial characteristics, achieving 96.6% accuracy on the Helsinki dataset [31]. Daly et al. [36] introduced a deep neural network leveraging longer EEG segments, data augmentation, residual connections, and a robust optimizer, achieving 97.73% accuracy on over 4,570 hours of clinical EEG recordings. The results are summarized in Table 3A. While these methods demonstrate high accuracy, none achieved perfect classification. In contrast, the proposed method reached 100% accuracy, sensitivity, and F1 score, highlighting its superior performance on the same dataset.

Several other studies have also explored time-frequency domain analysis for EEG signal classification; however, their reported accuracies were often unsatisfactory, as shown in Table 3B.

Authors                 Methods                                                                               Accuracy
Gramacki et al. [13]    Sliding window design and CNN model                                                   96-97%
Tanveer et al. [26]     Ensemble model using 3 different CNN models                                           96.3%
Visalini et al. [28]    Deep Belief Network with Triplet Half-Band filter and Wavelet Packet Decomposition    98.7%
Raeisi et al. [29]      Spatio-Temporal Graph Attention Network (ST-GAT)                                      96.6%
Daly et al. [36]        CNN with residual connections and data augmentation                                   97.73%

Table 3A: Comparison with existing methods using same dataset

Authors                 Methods                                                                               Accuracy
Cho et al. [37]         Wavelet Transform and SVM                                                             80.54%
Khan et al. [39]        Discrete Wavelet Transform and CNN model                                              87.8%
Sharma et al. [40]      Analytic time-frequency flexible wavelet transform, fractal dimension and LS-SVM      100%
Thasneem et al. [38]    Wavelet-based features and linear classifier                                          99.5%

Table 3B: Comparison with existing methods using time-frequency analysis.

Cho et al. [37] applied wavelet transform followed by Support Vector Machine (SVM) classification, resulting in 80.54% accuracy. Thasneem et al. [38] improved on this by combining wavelet-based features with a linear classifier, reaching 99.5% accuracy. Some studies employed approaches, particularly wavelet analysis, comparable to the proposed method. For instance, Khan et al. [39] extracted features using the discrete wavelet transform and classified the data using a CNN, but achieved a lower accuracy of 87.8%. While Sharma et al. [40] reported 100% accuracy, their method differed considerably, incorporating machine learning models with a flexible wavelet transform and fractal dimension-based features.

The suggested method streamlines the classification process by employing a simplified deep learning framework, achieving results that are either on par with or better than those obtained through intricate preprocessing or manually crafted feature extraction.

Conclusion

Several methods have been developed for detecting neonatal seizures using EEG data. Before the advent of deep learning, traditional machine learning algorithms were widely employed, utilizing features derived from various domains for analysis. In this study, wavelet decomposition using the Daubechies 4 transform was performed on 10-second windows of EEG data. The data were split in a 70:30 ratio, with 70% used for training and 30% reserved for testing. The proposed method achieved 100% accuracy, sensitivity, and F1 score.

References
