ADT308 Comprehensive Course Work β Full Notes
ADT308 Comprehensive Course Work β Full Notes
Format: 50 MCQ Γ 1 mark | 1 hour | No negative marking Modules: ML | DSA | OS | DBMS | Data Science
MODULE 1 β Machine Learning
1.1 ML Categories
- Supervised β labelled training data. Examples: classification, regression, earthquake prediction (seismic data β next quake).
- Unsupervised β finds hidden structure in unlabelled data. Examples: clustering, PCA.
- Reinforcement Learning β agent learns via reward/penalty.
- 3 types of ML techniques (Q1 JAN24 answer).
1.2 ML Models
- Predictive β output involves target variable.
- Descriptive β pattern discovery, no target.
- ML = AI subset of algorithms that improve performance at a task over time with experience (All of the above).
1.3 Feature Types
- Nominal β names, no order (red, blue).
- Ordinal β ordered grades (A, B, C, D, E, F) β grade example.
- Categorical β discrete categories.
- Boolean β true/false.
1.4 Training Modes
- Batch / Offline learning β entire dataset trained at once (both a and b).
- Online learning β incremental updates.
1.5 Metrics
- Accuracy = correct predictions / total predictions.
- Precision = TP / (TP + FP) β ratio of correct positives to predicted positives.
- Recall = TP / (TP + FN).
- F1 = harmonic mean of precision and recall.
- Confusion Matrix β table comparing predicted vs actual labels.
- Residual = Observed β Predicted.
- ROC curve β TPR vs FPR.
1.6 Algorithms
| Algorithm | Type | Key Fact |
|---|---|---|
| Linear Regression | Supervised regression | Minimize least square error; line of best fit; Y = b0 + b1X + Ξ΅ |
| Logistic Regression | Supervised classification | Discriminative |
| NaΓ―ve Bayes | Supervised | Generative model; popular for text; works with missing features by integrating posteriors |
| SVM | Supervised | Solves convex optimization; depends on kernel selection + kernel params + soft margin (All) |
| Decision Tree | Supervised | Uses Gini index (purity of nodes); prone to overfitting |
| Random Forest | Supervised ensemble | Bagging-based; high complexity, time-consuming |
| KNN | Supervised | Lazy / instance-based / non-parametric learner |
| K-means | Unsupervised | Heuristic = Lloyd’s algorithm; uses Euclidean distance; fails on outliers, varied densities, non-convex shapes |
| Hierarchical (Agglomerative) | Unsupervised | Produces dendrogram (tree showing closeness) |
| DBSCAN | Unsupervised | Density-based; outlier detection |
| PCA | Unsupervised | Dimensionality reduction |
| Apriori | Association | Finds frequent itemsets |
1.7 Important Concepts
- Bayes’ theorem: P(H|E) = P(E|H)Β·P(H) / P(E)
- SVM kernel trick = implicitly map data to higher-dimensional space.
- Soft margin SVM allows misclassifications; hard margin does not.
- Bias-Variance tradeoff = model complexity vs generalization ability.
- Regularization = reduces model complexity (avoid overfitting).
- Cross-validation = assess generalization on independent data.
- LOOCV issue = high variance.
- Pre-processing order: Normalize β PCA β normalize PCA output β training.
- Gradient = 0 at extrema (min/max of function).
- Ensemble learning = train multiple models, combine predictions.
- AdaBoost = assigns weights to base models by performance.
- Boosting = decreases bias + variance, 2-class classifier.
- Vanishing gradient fix = ReLU activation / Batch normalization.
- Inference engines = forward + backward chaining.
MODULE 2 β Data Structures & Algorithms
2.1 Linear Structures
Stack (LIFO)
- Push/pop at top
- Used for: balanced parentheses, expression evaluation, function call
- Stack from queue β needs 2 queues
Queue (FIFO)
- Enqueue rear, dequeue front
- Queue from stack β needs 2 stacks
- Circular queue: size 11, front/rear at q[2], 9th element β q[10]
Deque β insert/delete at both ends.
- Standard ops: InsertFront, InsertRear, DeleteFront, DeleteRear
Priority Queue β efficiently implemented with Binary Heap / Fibonacci Heap.
Hash Table
- Key-value pairs
- Linear probing for collision: insert 142 at next free slot after collision at index 2
- (2x+5) mod 7 hash: insert 1,4,9,6 β
1, _, 9, 6, _, _, 4
2.2 Expression Notation
- Infix
a+b-c+d*(e/f)β Postfixab+c-def/*+ - Prefix eval
+ - * 3 2 / 8 4 1= (3*2) β (8/4) + 1 = 6 β 2 + 1 = 5 - Postfix eval
10 5 + 60 6 / * 8 -= 15 * 10 β 8 = 142
2.3 Trees
BST properties
- Left subtree < root < right subtree
- Both subtrees also BST
- In-order traversal β ascending sorted order
Traversals
- Preorder: root, L, R
- Inorder: L, root, R
- Postorder: L, R, root
- Given preorder 15,10,12,11,20,18,16,19 β postorder =
11,12,10,16,19,18,20,15
Heap β complete binary tree; max/min heap.
2.4 Graphs
- Complete graph edges = n(n-1)/2
- Simple graph adjacency matrix: symmetric, 0 diagonal
- DFS with k tree edges on n vertices β n β k connected components
- Dijkstra β shortest path from source to all (weighted)
- BFS β unweighted shortest path
- Prim’s / Kruskal’s β Minimum Spanning Tree
2.5 Sorting
| Algo | Best | Avg | Worst | Notes |
|---|---|---|---|---|
| Bubble | O(n) | O(nΒ²) | O(nΒ²) | |
| Insertion | O(n) | O(nΒ²) | O(nΒ²) | Best when already sorted |
| Selection | O(nΒ²) | O(nΒ²) | O(nΒ²) | Max swaps = nβ1 |
| Merge | O(n log n) | O(n log n) | O(n log n) | Divide & Conquer |
| Quick | O(n log n) | O(n log n) | O(nΒ²); O(n log n) with median pivot | |
| Heap | O(n log n) | O(n log n) | O(n log n) | |
| Counting | O(n+k) | O(n+k) | O(n+k) | 0 comparisons between input |
| Radix | O(dΒ·n) | β | β | Card sorter method |
- Binary search: O(log n); cannot apply to sorted linked list (no random access).
- Insertion sort definition: putting element in appropriate place in sorted list yields larger sorted list.
2.6 Linked List
- Worst-case delete intermediate node (given pointer): O(n) β need previous pointer.
- Recursive print:
1β2β3β4β5β6withfun(startβnextβnext)β prints1 3 5 5 3 1.
MODULE 3 β Operating Systems
3.1 Process Management
Process States: New β Ready β Running β Waiting/Blocked β Terminated
- Time slice expires β Ready state
- Process fails β log written to log file
CPU Scheduling
- Basis of multiprogramming OS
- Non-preemptive: FIFO (First-In First-Out), SJF (non-preemptive)
- Preemptive: Round Robin, SRTF, Multilevel Queue (with feedback)
Round Robin properties
- Works like SJF for large time quantum
- Does NOT use prior burst time knowledge
- Small quantum β poor response for short processes
SJF avg waiting time β sort by burst time when arrived, compute waiting.
Aging β increase priority of waiting jobs to prevent starvation (ensures finite-time termination).
3.2 Deadlock
4 Necessary Conditions: Mutual exclusion, Hold-and-wait, No preemption, Circular wait.
| Approach | Algorithm |
|---|---|
| Prevention | Negate one of 4 conditions |
| Avoidance | Banker’s Algorithm β examines resource allocation state |
| Detection + Recovery | Wait-for graph |
Invalid prevention scheme: “Never request a resource after releasing any resource” (this is invalid).
- Cycle in wait-for graph with single-instance resources β deadlock.
- Multi-instance: cycle β deadlock necessarily.
3.3 Memory Management
Paging
- Page size 4KB, process 103KB β internal fragmentation = 4 β (103 mod 4) = 1 KB
- Frame 4KB, PTE 2 bytes β addressable physical = 2^16 frames Γ 2^12 bytes = 2^28 bytes
- TLB: 32-bit virtual, 4KB page β 20-bit page number; 128 entries 4-way set assoc β 32 sets (5 bits) β TLB tag = 20 β 5 = 15 bits
Page Replacement Algorithms
| Algo | Belady’s anomaly | Notes |
|---|---|---|
| FIFO | Yes | |
| LRU | No | |
| Optimal | No | Theoretical best |
- Reference string
7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1size 4 FIFO β 14 faults. - LRU 4 frames on
A,B,C,D,A,B,E,A,B,C,D,Eβ 8 faults.
Demand paging β page brought to memory on requirement. Thrashing β processes frequently access pages not in memory. Dirty bit β page modified after load into cache/memory. Bootstrap program initializes system, starts OS.
3.4 File Systems
Allocation methods
- Contiguous, Linked, Indexed (and extension for large files in Unix)
- Max file size in indexed = f(block size, blocks for index, address size)
- File info stored in separate directory structure
- OS keeps open-file table for active files
Disk Scheduling
- SCAN β head moves end-to-end servicing requests, reverses at end
- C-SCAN β head returns to start without servicing on return
- LOOK / C-LOOK β like SCAN/C-SCAN but reverse at last request, not disk end
Memory allocation strategies (first/best/worst fit) β select free hole from available holes.
3.5 Misc OS
- Disk requires device driver.
- Real-time OS: RTLinux, QNX, VxWorks. Palm OS = NOT real-time.
- Protection: every CPU address checked against relocation and limit registers.
- Page fault service 10ms, mem access 1Β΅s, 99.99% hit β avg = 0.9999Γ1Β΅s + 0.0001Γ10000Β΅s β 1.9999 Β΅s.
- RR + FIFO is NOT non-preemptive. FIFO alone = non-preemptive.
MODULE 4 β DBMS
4.1 Architecture / Views
- Conceptual view = total database content (logical structure)
- Internal/Physical view = storage details
- External view = user-specific
- DBMS = application software
4.2 E-R Model
| Construct | Symbol |
|---|---|
| Entity | Rectangle |
| Attribute | Ellipse |
| Relationship | Diamond |
| Weak entity | Double-outlined rectangle |
| Multivalued attribute | Double ellipse |
| Derived attribute | Dashed ellipse |
- Generalization β bottom-up (combine entities).
- Specialization β top-down (split entity into sub-entities).
- Aggregation β relationship as entity.
E-R β Relational mapping (key Q!)
- 3 entities E1, E2, E3 + R1(1:M E1-E2) + R2(M:M E1-E2) + R3(M:M E2-E3), no attribute relationships β 5 tables
- E1, E2, E3 (3 entity tables) + R2 (M:M needs own table) + R3 (M:M needs own table). R1 = 1:M merges into M side.
4.3 Normalization
| NF | Rule |
|---|---|
| 1NF | Atomic attributes, no repeating groups |
| 2NF | 1NF + no partial dependencies on candidate key |
| 3NF | 2NF + no transitive dependencies |
| BCNF | every determinant is a candidate key (stronger than 3NF) |
| 4NF | handles multi-valued dependencies |
| 5NF | handles join dependencies |
- Every relation from E-R model is in 1NF (atomic by default).
- Every BCNF β 3NF, not vice versa.
- For R(A,B,C,D,E) with {BβA, AβC, BCβD, ACβBE}: find candidate key, check FDs β highest NF.
- For R(A,B,C,D,E,F) with {CβF, EβA, ECβD, AβB}: EC is candidate key (covers all attrs via closure).
Armstrong’s Axioms: Reflexivity, Augmentation, Transitivity. (Pseudotransitivity = NOT a primary axiom.)
Decomposition desirable property: Dependency preservation + lossless join.
4.4 SQL
| Category | Commands |
|---|---|
| DDL | CREATE, ALTER, DROP, TRUNCATE |
| DML | SELECT, INSERT, UPDATE, DELETE β query + modify tuples |
| DCL | GRANT, REVOKE |
| TCL | COMMIT, ROLLBACK |
- DROP TABLE = remove relation from DB.
- Drop column:
ALTER TABLE t DROP COLUMN c; - COUNT() counts rows.
- Aggregate functions: AVG, SUM, COUNT, MIN, MAX.
- NULL check:
WHERE addr IS NULL(not=NULL). CREATE SCHEMAβ multiple CREATE TABLE/VIEW/GRANT in single transaction.- Self-join via table aliases β TRUE.
- Outer-join is NOT fundamental relational algebra. Natural β outer join.
Relational Algebra
- Ο (selection) β filter rows
- Ο (projection) β select columns β PROJECTION
- β (natural join)
- Equivalent of
SELECT name, course_id FROM instructor, teaches WHERE instructor_ID = teaches_IDβSELECT name, course_id FROM instructor NATURAL JOIN teaches;
4.5 Keys
| Key | Definition |
|---|---|
| Super key | Set of attrs uniquely identifying tuple |
| Candidate key | Minimal super key |
| Primary key | Chosen candidate key |
| Foreign key | References primary key of another table β represents relationship |
4.6 Relationships
- One-to-many = One teacher β many classes.
- Self-join via aliases.
- Join of R(m) β S(n) β max = mΒ·n, min = 0.
4.7 Distributed DB
- Location transparency β query must specify fragments (data location handled implicitly).
MODULE 5 β Data Science
5.1 Foundations
- Data Science languages: R, Python (R chosen in MCQs).
- Modern data science attributed to William S (William S. Cleveland).
- Data Science = inspecting + cleaning + transforming data (all of the above).
- Data fishing = data dredging.
5.2 Data Representation
- Data frame = structured representation of data.
rep()β vector of repeated values in R.- Coding = process of quantifying data.
5.3 Visualization
- Scatterplot β relationship between 2 quantitative variables.
- Bar/Pie β categorical.
- Line β trends over time.
5.4 Data Preparation
- After acquiring data: Data cleansing.
- Missing data handling: remove column / replace with median / ignore row (all valid).
- Bayesian missing features β integrate posteriors over missing features.
- Outlier detection: Clustering, PCA, DBSCAN. Association rule mining = NOT outlier detection.
5.5 Statistics
- Perfect negative correlation = β1.
- Correlation +1 = perfect positive. 0 = no correlation.
5.6 Classification & Clustering Recap (overlap with M1)
- Classification algos: Decision Tree, NaΓ―ve Bayes, SVM, KNN.
- K-means, DBSCAN, Apriori = NOT classification (clustering/association).
- Hierarchical clustering output = dendrogram.
- High-dimensional clustering β K-Means (DBSCAN/hierarchical struggle with curse of dimensionality).
5.7 Regression
- Simple linear: Y = b0 + b1X + Ξ΅.
- Regression = describes associations + models relationships (all).
5.8 Data Warehousing
- Heterogeneous DB integration approaches = 2 (Query-driven + Update-driven).
- Data discrimination = classification mapping to predefined classes.
5.9 Applications
- Recommendation systems, image/speech recognition, price comparison.
- Privacy Checker β data science app.
QUICK-FIRE ANSWER BANK (Most-Asked Verbatim Repeats)
| Q | Answer |
|---|---|
| Bagging algo | Random Forest |
| Boosting weight-by-perf | AdaBoost |
| Generative model | NaΓ―ve Bayes |
| Lazy learner | KNN |
| K-means heuristic | Lloyd’s |
| K-means distance | Euclidean |
| Hidden structure unlabelled | Unsupervised |
| PCA purpose | Dimensionality reduction |
| Confusion matrix purpose | Predicted vs actual labels |
| SVM problem type | Convex optimization |
| Belady’s anomaly | FIFO |
| Deadlock avoidance | Banker’s |
| Bootstrap | Initializes system |
| Process after time-slice | Ready |
| Non-preemptive | FIFO |
| Process fail log | Log file |
| Frame 4KB, PTE 2B addressable | 2^28 bytes |
| Internal frag 4KB / 103KB | 1 KB |
| Dirty bit | Page modified after load |
| Aging | Priority increase |
| Not RTOS | Palm OS |
| Disk req driver | Disk |
| E-R attribute | Ellipse |
| E-R weak entity | Double rectangle |
| Min tables E1/E2/E3 R1(1:M) R2(M:M) R3(M:M) | 5 |
| Select columns op | PROJECTION |
| Drop relation | DROP |
| Count rows | COUNT(*) |
| 4NF handles | Multi-valued dependency |
| Minimal super key | Candidate key |
| Inter-table relationship key | Foreign key |
| Conceptual view | Total DB content |
| Top-down sub-entity | Specialization |
| Data Science lang | R |
| Quantifying data | Coding |
| Structured data rep | Data frame |
| 2 quantitative vars plot | Scatterplot |
| Vector of repeats in R | rep() |
| Perfect negative correlation | β1 |
| Dendrogram producer | Hierarchical clustering |
| Frequent itemsets | Apriori |
| Vanishing gradient fix | ReLU / Batch norm |
| SVM kernel trick | Implicit higher-dim mapping |
| Soft vs hard SVM | Soft allows misclassification |
Postfix of a+b-c+d*(e/f) |
ab+c-def/*+ |
| Divide & conquer sort | Merge |
| Binary search complexity | O(log n) |
| Heap sort worst | O(n log n) |
| Bubble sort | O(nΒ²) |
| Complete graph edges | n(nβ1)/2 |
| Best sort for sorted list | Insertion |
| BST ascending traversal | In-order |
| Stack from queue needs | 2 queues |
| Priority queue impl | Binary/Fibonacci heap |
| Balanced parentheses DS | Stack |
| Counting sort comparisons | 0 |
| Card sorter method | Radix |
NUMERICAL PROBLEM PATTERNS
- Page fault calculation β given reference string + frames + FIFO/LRU β simulate.
- Internal fragmentation = page_size β (process_size mod page_size).
- Addressable memory = 2^(PTE bits) Γ page_size.
- TLB tag bits = page_number_bits β set_index_bits.
- SJF average wait time = sort by burst at each decision, compute (start β arrival).
- CPU utilization = Ξ£(burst_i / period_i).
- Hash insert position = apply h(k), linear probe on collision.
- Postfix/prefix evaluation β stack-based left-to-right.
- Postorder from preorder + BST property β reconstruct tree.
- Find candidate key β compute attribute closure, identify minimal cover.
EXAM STRATEGY
- No negative marking β answer ALL 50.
- Allocate 1 min/Q average. Numerical Qs may need 2 min.
- Skip-and-return for unclear Qs; lock in repeats first.
- “All of the above” common correct option for definition Qs.
- First-attempt accuracy matters β careful reading > speed.