~/abin
May 11, 2026

ADT308 Comprehensive Course Work β€” Full Notes

ADT308 Comprehensive Course Work β€” Full Notes

Format: 50 MCQ Γ— 1 mark | 1 hour | No negative marking Modules: ML | DSA | OS | DBMS | Data Science


MODULE 1 β€” Machine Learning

1.1 ML Categories

  • Supervised β€” labelled training data. Examples: classification, regression, earthquake prediction (seismic data β†’ next quake).
  • Unsupervised β€” finds hidden structure in unlabelled data. Examples: clustering, PCA.
  • Reinforcement Learning β€” agent learns via reward/penalty.
  • 3 types of ML techniques (Q1 JAN24 answer).

1.2 ML Models

  • Predictive β€” output involves target variable.
  • Descriptive β€” pattern discovery, no target.
  • ML = AI subset of algorithms that improve performance at a task over time with experience (All of the above).

1.3 Feature Types

  • Nominal β€” names, no order (red, blue).
  • Ordinal β€” ordered grades (A, B, C, D, E, F) ← grade example.
  • Categorical β€” discrete categories.
  • Boolean β€” true/false.

1.4 Training Modes

  • Batch / Offline learning β€” entire dataset trained at once (both a and b).
  • Online learning β€” incremental updates.

1.5 Metrics

  • Accuracy = correct predictions / total predictions.
  • Precision = TP / (TP + FP) β€” ratio of correct positives to predicted positives.
  • Recall = TP / (TP + FN).
  • F1 = harmonic mean of precision and recall.
  • Confusion Matrix β€” table comparing predicted vs actual labels.
  • Residual = Observed βˆ’ Predicted.
  • ROC curve β€” TPR vs FPR.

1.6 Algorithms

Algorithm Type Key Fact
Linear Regression Supervised regression Minimize least square error; line of best fit; Y = b0 + b1X + Ξ΅
Logistic Regression Supervised classification Discriminative
NaΓ―ve Bayes Supervised Generative model; popular for text; works with missing features by integrating posteriors
SVM Supervised Solves convex optimization; depends on kernel selection + kernel params + soft margin (All)
Decision Tree Supervised Uses Gini index (purity of nodes); prone to overfitting
Random Forest Supervised ensemble Bagging-based; high complexity, time-consuming
KNN Supervised Lazy / instance-based / non-parametric learner
K-means Unsupervised Heuristic = Lloyd’s algorithm; uses Euclidean distance; fails on outliers, varied densities, non-convex shapes
Hierarchical (Agglomerative) Unsupervised Produces dendrogram (tree showing closeness)
DBSCAN Unsupervised Density-based; outlier detection
PCA Unsupervised Dimensionality reduction
Apriori Association Finds frequent itemsets

1.7 Important Concepts

  • Bayes’ theorem: P(H|E) = P(E|H)Β·P(H) / P(E)
  • SVM kernel trick = implicitly map data to higher-dimensional space.
  • Soft margin SVM allows misclassifications; hard margin does not.
  • Bias-Variance tradeoff = model complexity vs generalization ability.
  • Regularization = reduces model complexity (avoid overfitting).
  • Cross-validation = assess generalization on independent data.
  • LOOCV issue = high variance.
  • Pre-processing order: Normalize β†’ PCA β†’ normalize PCA output β†’ training.
  • Gradient = 0 at extrema (min/max of function).
  • Ensemble learning = train multiple models, combine predictions.
  • AdaBoost = assigns weights to base models by performance.
  • Boosting = decreases bias + variance, 2-class classifier.
  • Vanishing gradient fix = ReLU activation / Batch normalization.
  • Inference engines = forward + backward chaining.

MODULE 2 β€” Data Structures & Algorithms

2.1 Linear Structures

Stack (LIFO)

  • Push/pop at top
  • Used for: balanced parentheses, expression evaluation, function call
  • Stack from queue β†’ needs 2 queues

Queue (FIFO)

  • Enqueue rear, dequeue front
  • Queue from stack β†’ needs 2 stacks
  • Circular queue: size 11, front/rear at q[2], 9th element β†’ q[10]

Deque β€” insert/delete at both ends.

  • Standard ops: InsertFront, InsertRear, DeleteFront, DeleteRear

Priority Queue β€” efficiently implemented with Binary Heap / Fibonacci Heap.

Hash Table

  • Key-value pairs
  • Linear probing for collision: insert 142 at next free slot after collision at index 2
  • (2x+5) mod 7 hash: insert 1,4,9,6 β†’ 1, _, 9, 6, _, _, 4

2.2 Expression Notation

  • Infix a+b-c+d*(e/f) β†’ Postfix ab+c-def/*+
  • Prefix eval + - * 3 2 / 8 4 1 = (3*2) βˆ’ (8/4) + 1 = 6 βˆ’ 2 + 1 = 5
  • Postfix eval 10 5 + 60 6 / * 8 - = 15 * 10 βˆ’ 8 = 142

2.3 Trees

BST properties

  • Left subtree < root < right subtree
  • Both subtrees also BST
  • In-order traversal β†’ ascending sorted order

Traversals

  • Preorder: root, L, R
  • Inorder: L, root, R
  • Postorder: L, R, root
  • Given preorder 15,10,12,11,20,18,16,19 β†’ postorder = 11,12,10,16,19,18,20,15

Heap β€” complete binary tree; max/min heap.

2.4 Graphs

  • Complete graph edges = n(n-1)/2
  • Simple graph adjacency matrix: symmetric, 0 diagonal
  • DFS with k tree edges on n vertices β†’ n βˆ’ k connected components
  • Dijkstra β€” shortest path from source to all (weighted)
  • BFS β€” unweighted shortest path
  • Prim’s / Kruskal’s β€” Minimum Spanning Tree

2.5 Sorting

Algo Best Avg Worst Notes
Bubble O(n) O(nΒ²) O(nΒ²)
Insertion O(n) O(nΒ²) O(nΒ²) Best when already sorted
Selection O(nΒ²) O(nΒ²) O(nΒ²) Max swaps = nβˆ’1
Merge O(n log n) O(n log n) O(n log n) Divide & Conquer
Quick O(n log n) O(n log n) O(nΒ²); O(n log n) with median pivot
Heap O(n log n) O(n log n) O(n log n)
Counting O(n+k) O(n+k) O(n+k) 0 comparisons between input
Radix O(dΒ·n) β€” β€” Card sorter method
  • Binary search: O(log n); cannot apply to sorted linked list (no random access).
  • Insertion sort definition: putting element in appropriate place in sorted list yields larger sorted list.

2.6 Linked List

  • Worst-case delete intermediate node (given pointer): O(n) β€” need previous pointer.
  • Recursive print: 1β†’2β†’3β†’4β†’5β†’6 with fun(startβ†’nextβ†’next) β†’ prints 1 3 5 5 3 1.

MODULE 3 β€” Operating Systems

3.1 Process Management

Process States: New β†’ Ready β†’ Running β†’ Waiting/Blocked β†’ Terminated

  • Time slice expires β†’ Ready state
  • Process fails β†’ log written to log file

CPU Scheduling

  • Basis of multiprogramming OS
  • Non-preemptive: FIFO (First-In First-Out), SJF (non-preemptive)
  • Preemptive: Round Robin, SRTF, Multilevel Queue (with feedback)

Round Robin properties

  • Works like SJF for large time quantum
  • Does NOT use prior burst time knowledge
  • Small quantum β†’ poor response for short processes

SJF avg waiting time β€” sort by burst time when arrived, compute waiting.

Aging β€” increase priority of waiting jobs to prevent starvation (ensures finite-time termination).

3.2 Deadlock

4 Necessary Conditions: Mutual exclusion, Hold-and-wait, No preemption, Circular wait.

Approach Algorithm
Prevention Negate one of 4 conditions
Avoidance Banker’s Algorithm β€” examines resource allocation state
Detection + Recovery Wait-for graph

Invalid prevention scheme: “Never request a resource after releasing any resource” (this is invalid).

  • Cycle in wait-for graph with single-instance resources β†’ deadlock.
  • Multi-instance: cycle β‰  deadlock necessarily.

3.3 Memory Management

Paging

  • Page size 4KB, process 103KB β†’ internal fragmentation = 4 βˆ’ (103 mod 4) = 1 KB
  • Frame 4KB, PTE 2 bytes β†’ addressable physical = 2^16 frames Γ— 2^12 bytes = 2^28 bytes
  • TLB: 32-bit virtual, 4KB page β†’ 20-bit page number; 128 entries 4-way set assoc β†’ 32 sets (5 bits) β†’ TLB tag = 20 βˆ’ 5 = 15 bits

Page Replacement Algorithms

Algo Belady’s anomaly Notes
FIFO Yes
LRU No
Optimal No Theoretical best
  • Reference string 7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1 size 4 FIFO β†’ 14 faults.
  • LRU 4 frames on A,B,C,D,A,B,E,A,B,C,D,E β†’ 8 faults.

Demand paging β€” page brought to memory on requirement. Thrashing β€” processes frequently access pages not in memory. Dirty bit β€” page modified after load into cache/memory. Bootstrap program initializes system, starts OS.

3.4 File Systems

Allocation methods

  • Contiguous, Linked, Indexed (and extension for large files in Unix)
  • Max file size in indexed = f(block size, blocks for index, address size)
  • File info stored in separate directory structure
  • OS keeps open-file table for active files

Disk Scheduling

  • SCAN β€” head moves end-to-end servicing requests, reverses at end
  • C-SCAN β€” head returns to start without servicing on return
  • LOOK / C-LOOK β€” like SCAN/C-SCAN but reverse at last request, not disk end

Memory allocation strategies (first/best/worst fit) β€” select free hole from available holes.

3.5 Misc OS

  • Disk requires device driver.
  • Real-time OS: RTLinux, QNX, VxWorks. Palm OS = NOT real-time.
  • Protection: every CPU address checked against relocation and limit registers.
  • Page fault service 10ms, mem access 1Β΅s, 99.99% hit β†’ avg = 0.9999Γ—1Β΅s + 0.0001Γ—10000Β΅s β‰ˆ 1.9999 Β΅s.
  • RR + FIFO is NOT non-preemptive. FIFO alone = non-preemptive.

MODULE 4 β€” DBMS

4.1 Architecture / Views

  • Conceptual view = total database content (logical structure)
  • Internal/Physical view = storage details
  • External view = user-specific
  • DBMS = application software

4.2 E-R Model

Construct Symbol
Entity Rectangle
Attribute Ellipse
Relationship Diamond
Weak entity Double-outlined rectangle
Multivalued attribute Double ellipse
Derived attribute Dashed ellipse
  • Generalization β€” bottom-up (combine entities).
  • Specialization β€” top-down (split entity into sub-entities).
  • Aggregation β€” relationship as entity.

E-R β†’ Relational mapping (key Q!)

  • 3 entities E1, E2, E3 + R1(1:M E1-E2) + R2(M:M E1-E2) + R3(M:M E2-E3), no attribute relationships β†’ 5 tables
    • E1, E2, E3 (3 entity tables) + R2 (M:M needs own table) + R3 (M:M needs own table). R1 = 1:M merges into M side.

4.3 Normalization

NF Rule
1NF Atomic attributes, no repeating groups
2NF 1NF + no partial dependencies on candidate key
3NF 2NF + no transitive dependencies
BCNF every determinant is a candidate key (stronger than 3NF)
4NF handles multi-valued dependencies
5NF handles join dependencies
  • Every relation from E-R model is in 1NF (atomic by default).
  • Every BCNF ∈ 3NF, not vice versa.
  • For R(A,B,C,D,E) with {Bβ†’A, Aβ†’C, BCβ†’D, ACβ†’BE}: find candidate key, check FDs β†’ highest NF.
  • For R(A,B,C,D,E,F) with {Cβ†’F, Eβ†’A, ECβ†’D, Aβ†’B}: EC is candidate key (covers all attrs via closure).

Armstrong’s Axioms: Reflexivity, Augmentation, Transitivity. (Pseudotransitivity = NOT a primary axiom.)

Decomposition desirable property: Dependency preservation + lossless join.

4.4 SQL

Category Commands
DDL CREATE, ALTER, DROP, TRUNCATE
DML SELECT, INSERT, UPDATE, DELETE β€” query + modify tuples
DCL GRANT, REVOKE
TCL COMMIT, ROLLBACK
  • DROP TABLE = remove relation from DB.
  • Drop column: ALTER TABLE t DROP COLUMN c;
  • COUNT() counts rows.
  • Aggregate functions: AVG, SUM, COUNT, MIN, MAX.
  • NULL check: WHERE addr IS NULL (not =NULL).
  • CREATE SCHEMA β€” multiple CREATE TABLE/VIEW/GRANT in single transaction.
  • Self-join via table aliases β€” TRUE.
  • Outer-join is NOT fundamental relational algebra. Natural β‰  outer join.

Relational Algebra

  • Οƒ (selection) β€” filter rows
  • Ο€ (projection) β€” select columns ← PROJECTION
  • β‹ˆ (natural join)
  • Equivalent of SELECT name, course_id FROM instructor, teaches WHERE instructor_ID = teaches_ID β†’ SELECT name, course_id FROM instructor NATURAL JOIN teaches;

4.5 Keys

Key Definition
Super key Set of attrs uniquely identifying tuple
Candidate key Minimal super key
Primary key Chosen candidate key
Foreign key References primary key of another table β†’ represents relationship

4.6 Relationships

  • One-to-many = One teacher β†’ many classes.
  • Self-join via aliases.
  • Join of R(m) β‹ˆ S(n) β†’ max = mΒ·n, min = 0.

4.7 Distributed DB

  • Location transparency β€” query must specify fragments (data location handled implicitly).

MODULE 5 β€” Data Science

5.1 Foundations

  • Data Science languages: R, Python (R chosen in MCQs).
  • Modern data science attributed to William S (William S. Cleveland).
  • Data Science = inspecting + cleaning + transforming data (all of the above).
  • Data fishing = data dredging.

5.2 Data Representation

  • Data frame = structured representation of data.
  • rep() β€” vector of repeated values in R.
  • Coding = process of quantifying data.

5.3 Visualization

  • Scatterplot β€” relationship between 2 quantitative variables.
  • Bar/Pie β€” categorical.
  • Line β€” trends over time.

5.4 Data Preparation

  • After acquiring data: Data cleansing.
  • Missing data handling: remove column / replace with median / ignore row (all valid).
  • Bayesian missing features β†’ integrate posteriors over missing features.
  • Outlier detection: Clustering, PCA, DBSCAN. Association rule mining = NOT outlier detection.

5.5 Statistics

  • Perfect negative correlation = βˆ’1.
  • Correlation +1 = perfect positive. 0 = no correlation.

5.6 Classification & Clustering Recap (overlap with M1)

  • Classification algos: Decision Tree, NaΓ―ve Bayes, SVM, KNN.
  • K-means, DBSCAN, Apriori = NOT classification (clustering/association).
  • Hierarchical clustering output = dendrogram.
  • High-dimensional clustering β€” K-Means (DBSCAN/hierarchical struggle with curse of dimensionality).

5.7 Regression

  • Simple linear: Y = b0 + b1X + Ξ΅.
  • Regression = describes associations + models relationships (all).

5.8 Data Warehousing

  • Heterogeneous DB integration approaches = 2 (Query-driven + Update-driven).
  • Data discrimination = classification mapping to predefined classes.

5.9 Applications

  • Recommendation systems, image/speech recognition, price comparison.
  • Privacy Checker β‰  data science app.

QUICK-FIRE ANSWER BANK (Most-Asked Verbatim Repeats)

Q Answer
Bagging algo Random Forest
Boosting weight-by-perf AdaBoost
Generative model NaΓ―ve Bayes
Lazy learner KNN
K-means heuristic Lloyd’s
K-means distance Euclidean
Hidden structure unlabelled Unsupervised
PCA purpose Dimensionality reduction
Confusion matrix purpose Predicted vs actual labels
SVM problem type Convex optimization
Belady’s anomaly FIFO
Deadlock avoidance Banker’s
Bootstrap Initializes system
Process after time-slice Ready
Non-preemptive FIFO
Process fail log Log file
Frame 4KB, PTE 2B addressable 2^28 bytes
Internal frag 4KB / 103KB 1 KB
Dirty bit Page modified after load
Aging Priority increase
Not RTOS Palm OS
Disk req driver Disk
E-R attribute Ellipse
E-R weak entity Double rectangle
Min tables E1/E2/E3 R1(1:M) R2(M:M) R3(M:M) 5
Select columns op PROJECTION
Drop relation DROP
Count rows COUNT(*)
4NF handles Multi-valued dependency
Minimal super key Candidate key
Inter-table relationship key Foreign key
Conceptual view Total DB content
Top-down sub-entity Specialization
Data Science lang R
Quantifying data Coding
Structured data rep Data frame
2 quantitative vars plot Scatterplot
Vector of repeats in R rep()
Perfect negative correlation βˆ’1
Dendrogram producer Hierarchical clustering
Frequent itemsets Apriori
Vanishing gradient fix ReLU / Batch norm
SVM kernel trick Implicit higher-dim mapping
Soft vs hard SVM Soft allows misclassification
Postfix of a+b-c+d*(e/f) ab+c-def/*+
Divide & conquer sort Merge
Binary search complexity O(log n)
Heap sort worst O(n log n)
Bubble sort O(nΒ²)
Complete graph edges n(nβˆ’1)/2
Best sort for sorted list Insertion
BST ascending traversal In-order
Stack from queue needs 2 queues
Priority queue impl Binary/Fibonacci heap
Balanced parentheses DS Stack
Counting sort comparisons 0
Card sorter method Radix

NUMERICAL PROBLEM PATTERNS

  1. Page fault calculation β€” given reference string + frames + FIFO/LRU β†’ simulate.
  2. Internal fragmentation = page_size βˆ’ (process_size mod page_size).
  3. Addressable memory = 2^(PTE bits) Γ— page_size.
  4. TLB tag bits = page_number_bits βˆ’ set_index_bits.
  5. SJF average wait time = sort by burst at each decision, compute (start βˆ’ arrival).
  6. CPU utilization = Ξ£(burst_i / period_i).
  7. Hash insert position = apply h(k), linear probe on collision.
  8. Postfix/prefix evaluation β€” stack-based left-to-right.
  9. Postorder from preorder + BST property β€” reconstruct tree.
  10. Find candidate key β€” compute attribute closure, identify minimal cover.

EXAM STRATEGY

  • No negative marking β†’ answer ALL 50.
  • Allocate 1 min/Q average. Numerical Qs may need 2 min.
  • Skip-and-return for unclear Qs; lock in repeats first.
  • “All of the above” common correct option for definition Qs.
  • First-attempt accuracy matters β€” careful reading > speed.