Credit Risk in Indian Banking: RBI Data Analysis

Posted on July 28, 2026 by Editor

Credit Risk in Indian Banking: What RBI’s Data Actually Shows

Every risk professional in Indian banking eventually asks the same question: is credit risk actually improving, or does it just look that way in aggregate numbers? Based on the Reserve Bank of India’s own published data, the answer is both. System-wide asset quality has genuinely strengthened over the past five years. But the composition of that risk is shifting in a direction that deserves closer attention.

This piece works through RBI’s Financial Stability Reports (FSR), sectoral credit data, and the newly finalized Expected Credit Loss (ECL) framework. Together they show what’s really happening with credit risk in Indian banking between 2020 and 2025. It does not rely on a proprietary survey or projected estimates dressed up as findings. In fact, every figure below is sourced directly to a named RBI report. That distinction matters: in a domain where regulators, auditors, and rating agencies check your numbers, credibility is the entire product.

Three questions structure the analysis. How has aggregate asset quality moved since the 2020 pandemic shock? Where is risk concentrating today, even as headline numbers improve? And what does the incoming ECL regime signal about how Indian banks will need to manage credit risk going forward?

Methodology and Data Sources

This analysis draws on primary RBI publications, cross-checked across multiple reporting periods for consistency.

RBI Financial Stability Reports (FSR): Published twice yearly. They consolidate Gross NPA (GNPA) and Net NPA (NNPA) ratios of Scheduled Commercial Banks (SCBs), capital adequacy (CRAR), bank-group-wise asset quality, and stress test results. Specifically, this piece uses FSR editions from January 2021 through December 2025.

RBI Sectoral Deployment of Bank Credit data: Monthly data on credit growth to industry, services, agriculture, and personal loans. It’s sourced from 41 banks representing roughly 95% of non-food credit.

RBI’s ECL framework releases: The draft ECL directions (October 2025) and final directions (April 2026). These describe the shift from incurred-loss to forward-looking PD/LGD/EAD-based provisioning, effective April 1, 2027.

One scope note: RBI does not publish a standardized “default rate by loan product” table. Where this article cites loan-category or bank-group figures, they are GNPA ratios: the share of gross advances classified as non-performing. It’s the metric RBI itself uses, and the one directly comparable across periods.

Finding 1: Asset Quality Has Improved for Five Consecutive Years

SCB GNPA: from 8% to 2.1% in five years

Period	GNPA Ratio	NNPA Ratio	Source
March 2020	8.4%	—	RBI FSR, Jan 2021
September 2020	7.5%	—	RBI FSR, Jan 2021
March 2024	2.8%	0.6%	RBI FSR, Jun 2024
March 2025	2.3%	0.5%	RBI FSR, Jun 2025
September 2025	2.1%–2.2%	—	RBI FSR, Dec 2025
March 2027 (projected, baseline)	1.9%	—	RBI FSR, Dec 2025

RBI’s January 2021 report recorded a September 2020 GNPA ratio of 7.5%, down from 8.4% in March 2020. That was a system still absorbing the pandemic shock. GNPA had fallen to 2.8% by June 2024, then to 2.3% by March 2025. It touched a multi-decade low of 2.1% by September 2025, and RBI projects further improvement to 1.9% by March 2027 under its baseline scenario.

In practice, this reflects five years of balance sheet cleanup: post-IBC resolution of legacy corporate stress, tighter underwriting after the 2018–2020 NBFC stress episode, and stronger capital buffers overall. Meanwhile, system-wide CRAR remains comfortably above regulatory minimums, with public sector banks at 16% and private banks at 18.1% as of September 2025.

In short, aggregate GNPA is a lagging confirmation of underwriting discipline, not a leading indicator. A PD model trained mainly on 2020–2022 stressed data will overstate current default risk. One trained only on 2023–2025 benign data risks understating tail risk in the next downturn.

Explore our Credit Risk Modeling Certification Training for a structured approach to PD estimation across credit cycles.

Finding 2: Improvement Isn’t Even Across Bank Groups

PSBs are catching up fast

For instance, PSB GNPA fell sharply from 3.7% in March 2024 to 2.8% in March 2025. Meanwhile, private bank GNPA held roughly stable at 2.8% over the same period, and foreign banks improved from 1.2% to 0.9%.

Even so, this convergence matters. For most of the post-2015 asset-quality-review era, PSB asset quality lagged private banks significantly, largely on corporate exposures. That gap has now nearly closed at the aggregate level. However, remaining risk differs by bank group, which leads to the more consequential finding below.

Finding 3: Unsecured Retail Is Where New Risk Concentrates

The retail risk hiding inside a good headline number

This is the most important finding for practitioners, because it sits underneath the reassuring headline number. According to RBI’s December 2025 FSR, roughly 53.1% of retail loan slippages now originate from unsecured products like personal loans and credit cards. At private banks, unsecured loans account for nearly 76% of fresh slippages. GNPA on unsecured retail loans stood at 1.8%, versus 1.1% for overall retail advances.

In other words, the 2.1% aggregate GNPA figure blends a very clean secured/corporate book with a smaller, faster-deteriorating unsecured retail book. RBI flagged this as a fintech-adjacent risk, tied to fast credit growth in small-ticket personal loans to borrowers under 35 through digital lending channels.

This pattern, in fact, tracks with operational experience. Unsecured lending has weaker recovery mechanics (no collateral to liquidate, higher LGD), shorter behavioral history on new-to-credit borrowers, and faster origination cycles that compress underwriting review. Moreover, it is the segment where forward-looking provisioning matters most, since unsecured risk builds up quietly between formal NPA recognition points.

As a result, portfolio-level GNPA alone is no longer sufficient. Overall, segment-level GNPA and vintage curves for unsecured retail belong alongside the aggregate number in any board-level risk dashboard.

Finding 4: ECL Will Formalize This Shift

Why the 2027 ECL shift matters here

RBI has issued directions introducing forward-looking ECL provisioning, replacing the incurred-loss model. It takes effect April 1, 2027, for scheduled commercial banks excluding RRBs, Small Finance Banks, and payments banks. ECL provisioning must be based on a bank’s own historical PD and LGD data spanning at least five years, subject to RBI-specified floors. Accounts 30–90 days past due move into Stage 2, a materially earlier trigger than the current framework.

Overall, the shift aligns India’s prudential norms with global IFRS 9 standards. In addition, it requires closer integration between finance and risk functions, as forward-looking macroeconomic scenarios become a formal input to provisioning.

Indeed, this is a direct regulatory response to Finding 3. An incurred-loss model recognizes impairment only after default has effectively occurred. ECL requires estimating expected loss, via PD, LGD, and EAD, well before that point, catching unsecured deterioration earlier in the cycle.

Even so, for banks building this capability, it isn’t a compliance task to fully outsource. RBI has explicitly made a bank’s board and senior management responsible for the adequacy of the ECL framework. Consequently, internal teams need working fluency in PD/LGD/EAD construction, not just the ability to read vendor output. However, it’s worth noting that the standard formula, Expected Loss = PD × LGD × EAD, assumes independence between the three components. In practice they’re correlated: LGD tends to rise in the same downturns that push PD higher. That’s why RBI’s stress tests apply adverse scenarios jointly rather than multiplying baseline figures in isolation.

What This Means for Banks and Risk Teams

First, aggregate GNPA improvement is real but incomplete. Segment-level monitoring, especially for unsecured retail, deserves as much attention as the headline ratio.
PD/LGD model recency matters. RBI’s own five-year minimum spans both a stressed period (2020–2021) and a benign one (2023–2025). Models need to represent both.
Collateral still matters, but isn’t the whole story. Unsecured products drive a disproportionate share of new slippages. In turn, this argues for tighter underwriting in that segment, not a wholesale retreat from unsecured lending.
Finally, the 2027 ECL deadline is closer than it looks. In practice, building five years of clean PD/LGD data and validation capability is a multi-year undertaking. Banks starting in 2026 are already behind institutions that began in 2024–2025.
Recovery rate discipline matters for LGD. LGD = 1 − Recovery Rate only holds up when ‘recovery rate’ is the economic, discounted, net-of-cost rate, not the nominal amount eventually collected.

Explore our Credit Risk Modeling Certification Training to build PD, LGD, and EAD modeling skills ahead of the 2027 ECL transition, or see Understanding Credit Risk: Definition and Types for foundational concepts referenced throughout.

FAQ

What is the current GNPA ratio of Indian banks?

As of September 2025, SCB GNPA stood at 2.1%, a multi-decade low, per RBI’s December 2025 Financial Stability Report.

Is unsecured lending riskier than secured lending right now?

Yes, and the gap is widening. Unsecured retail GNPA was 1.8% versus 1.1% for overall retail advances, and unsecured products drove over half of all retail slippages.

When does RBI’s ECL framework take effect?

RBI’s ECL Directions were issued 27 April 2026 and take effect April 1, 2027. They apply to commercial banks, excluding small finance banks, payments banks, and local area banks.

Does EL = PD × LGD × EAD fully capture expected loss?

It’s the standard starting formula, but it assumes PD, LGD, and EAD move independently. In stress, they’re correlated — which is why RBI applies adverse scenarios jointly rather than multiplying baseline values.

Conclusion

The data supports a measured conclusion, not a triumphant one. Indeed, Indian banking’s asset quality genuinely improved for five straight years, and RBI’s own numbers back that up without embellishment. However, the same data shows risk isn’t disappearing. Instead, it’s relocating toward unsecured retail lending, addressed through a regulatory shift that will demand more rigorous PD, LGD, and EAD modeling capability than most institutions currently have in-house. For risk analysts, credit officers, and model validators, that combination is telling: improving headline numbers alongside a harder compliance mandate. It’s exactly why 2025–2027 is a build-capability window, not a wait-and-see one.

This analysis is based on RBI’s Financial Stability Reports, Sectoral Deployment of Bank Credit data, and RBI’s ECL Directions (2025–2026). Figures are reported as published at the cited dates; readers should consult original RBI releases for the most current data.

Ready to Build These Skills Hands-On?

Understanding the theory behind PD, LGD, and EAD is the first step. Building bankable, interview-ready models — in Python or SAS, on real credit datasets, aligned to Basel and IFRS 9 — is what actually moves a career forward.

Explore Dexlab Analytics’ Credit Risk Modeling certification program to build PD, LGD, and EAD models from scratch, work through IFRS 9 ECL frameworks, and learn model validation techniques used by practicing risk teams.

What is Credit Risk? Understanding Credit Risk Definition and Types

Posted on July 17, 2026 by Editor

Understanding Credit Risk: Definition and Types

Introduction

In March 2021, a little-known family office called Archegos Capital Management defaulted on margin calls from its banks. Within days, Credit Suisse lost $5.5 billion. Nomura lost close to $2.5 billion. Morgan Stanley and UBS lost roughly $1 billion and $774 million. Combined, global banks lost more than $10 billion — not because of a market crash, but because of one counterparty’s concentrated, hidden leverage.

That’s credit risk in its purest form. It’s the possibility that someone you’ve extended money or exposure to won’t pay it back, and the cascading damage that follows when large exposures go bad at once.

Most people equate credit risk with loan defaults. That’s only part of the picture. Credit risk shows up in derivatives, trade settlements, corporate bonds, and interbank lending. It appears anywhere one party depends on another to deliver.

This guide breaks down what credit risk actually means. It covers the distinct types every risk professional needs to recognize, and how banks measure and manage it in practice, in India and globally.

What is Credit Risk?

Credit risk is the possibility that a borrower or counterparty fails to meet a financial obligation, causing a loss to the lender. That’s the formal definition. In plain terms: it’s the risk that you lend money, extend credit, or enter a contract with someone, and they don’t hold up their end.
The Reserve Bank of India’s Guidance Note on Credit Risk Management frames it more precisely. Credit risk can be an individual transaction risk — the chance that one specific loan goes bad. Or it can be a portfolio risk, which looks at how credit losses behave in aggregate across a bank’s book. A single bad loan is a manageable, expected cost of doing business. Thousands of correlated bad loans, going bad at once because they share a common vulnerability, is a solvency event.

Credit risk isn’t limited to banks lending to individuals or businesses. It appears in:

Loans and advances — the most familiar form, where a borrower fails to repay principal or interest
Bonds and fixed-income securities — where an issuer defaults on coupon payments or principal at maturity
Derivatives contracts — where a counterparty can’t meet its obligations under a swap, option, or forward
Trade finance and settlement — where one party in a transaction fails to deliver cash or securities as agreed
Guarantees and letters of credit — where a bank stands behind another party’s obligation and gets called on to pay

Under the Basel framework, credit risk-weighted assets typically make up the largest share of the capital a bank must hold. That’s larger than market risk or operational risk combined, for most commercial banks. This is why understanding credit risk isn’t a niche specialty. It’s the foundation most of banking risk management sits on.

Types of Credit Risk

Credit risk isn’t one uniform threat. RBI’s own framework splits transaction-level credit risk into default risk and rating migration risk. It splits portfolio-level risk into intrinsic risk and concentration risk. Layered on top of that, banking practice recognizes several distinct sub-types worth understanding individually. Three matter most for anyone building a working knowledge of the field: counterparty risk, concentration risk, and settlement risk.

The distinction isn’t academic. Each type demands a different measurement approach, a different mitigation strategy, and often a different team within a bank’s risk function. A credit officer underwriting a retail loan thinks primarily about default risk on that single borrower. A treasury desk trading derivatives thinks primarily about counterparty risk and daily mark-to-market exposure. A chief risk officer reviewing the whole institution thinks about concentration across the entire book. Does the bank have too much riding on one sector, one region, or one large group of related borrowers? Confusing these categories, or managing them with a single generic framework, is exactly how risks that look small individually compound into something systemic.

Counterparty Risk

Counterparty risk is the risk that the other party in a financial contract fails to fulfill their side of the deal. This isn’t a traditional borrower — it’s a trading or derivatives counterparty. It’s especially relevant in derivatives, securities lending, and prime brokerage relationships. There, exposure isn’t a fixed loan amount. It’s a fluctuating mark-to-market value.

The Archegos collapse is the clearest recent illustration. Archegos used total return swaps to build enormous, concentrated positions in a small number of stocks. It never owned the shares directly, and never disclosed the size of its bets. Its prime brokers — Credit Suisse, Nomura, Morgan Stanley, Goldman Sachs, and UBS — each saw only their own slice of Archegos’s exposure. None had visibility into the full picture. When the fund’s portfolio value fell roughly 30% in four days in March 2021, it couldn’t meet margin calls. Its brokers had to liquidate billions of dollars in positions simultaneously. Those unable to exit fast enough absorbed massive losses.

Analysts who studied the collapse point to a specific, technical failure mode: wrong-way risk. This occurs when a bank’s exposure to a counterparty grows precisely as that counterparty’s ability to pay deteriorates. Archegos’s swap exposure ballooned in tandem with its portfolio’s decline. The worse things got, the more the banks were owed, and the less able Archegos was to cover it. The European Central Bank later reviewed 23 major banks’ derivatives exposures. It found “material shortcomings” in how counterparty credit risk was governed across the industry.

Concentration Risk

Concentration risk arises when a bank’s credit exposure clusters too heavily around a single borrower, sector, geography, or risk factor. Diversification is supposed to protect a lender. If one borrower fails, the loss should be a small fraction of the total book. Concentration undermines that protection entirely.

Archegos illustrates this too, in a second way. The fund’s own portfolio was concentrated in a handful of technology and media stocks. When those specific names fell, there was no offsetting position to cushion the blow. The losses hit every position at once. Banks face the mirror image of this problem in lending. Heavy exposure to one industry, real estate for instance, or one large corporate group, means a single sector downturn can impair a disproportionate share of the loan book. RBI’s prudential framework directly addresses this. It sets single-borrower and group-borrower exposure limits specifically to prevent Indian banks from building the kind of concentrated exposure that made Archegos so dangerous to its lenders.

Settlement Risk

Settlement risk is sometimes called Herstatt risk, after the 1974 collapse of Bankhaus Herstatt. It’s the risk that one party in a transaction delivers its side — cash or securities — while the counterparty fails to deliver theirs. This typically happens because of timing gaps, or the counterparty’s failure between trade execution and final settlement. Herstatt was shut down by German regulators mid-day. By then it had already received Deutsche Mark payments from counterparties, but it hadn’t yet sent back the US dollars it owed in return. Banks on the other side of those foreign exchange trades lost their payments outright.

That single event reshaped global payments infrastructure. It led directly to the creation of CLS Bank, a settlement system designed specifically to eliminate this timing gap in foreign exchange transactions. CLS does this by settling both legs of a trade simultaneously. Settlement risk remains a live concern anywhere payment and delivery aren’t simultaneous. It’s a reminder that credit risk isn’t only about long-term loans — it can materialize in a transaction that’s supposed to finish within hours.

Default Risk Explained

Default risk sits at the center of every credit risk framework. It’s the risk that a borrower simply stops paying. Understanding what causes defaults, and how professionals frame the probability of one occurring, is foundational to everything else in credit risk modeling.

It’s also the oldest form of credit risk banks have grappled with. Long before derivatives, prime brokerage, or cross-border settlement systems existed, lenders were already trying to predict which borrowers would repay and which wouldn’t. Every other risk type covered in this guide is, in some sense, a more specialized variant of the same underlying question: will the party on the other side of this transaction deliver what they owe? Default risk asks that question in its most direct form, applied to a straightforward loan or credit exposure.

What Causes Defaults

Defaults rarely happen for one isolated reason. They typically result from a mix of factors:

Cash flow deterioration. A borrower’s income or business revenue falls below what’s needed to service debt. For individuals, this might mean job loss. For corporates, it might mean a demand shock or margin compression.
Over-leverage. A borrower takes on more debt than their income can realistically support, leaving no buffer when conditions turn even mildly unfavorable.
Macroeconomic shocks. Recessions, interest rate spikes, or sector-wide downturns push otherwise healthy borrowers into distress simultaneously — the mechanism behind concentration risk turning into realized losses.
Governance and fraud. Misrepresented financials, diverted funds, or outright fraud can turn an apparently strong borrower into a default risk overnight. Archegos itself is a governance case as much as a market one — Bill Hwang was later convicted on fraud and racketeering charges tied to how the fund misled its own counterparties about position size and concentration.
Willful default. Occasionally, a borrower who can pay simply chooses not to, usually when the cost of default (reputational or legal) is judged lower than the cost of repayment.

The Probability Framework

Rather than treating default as a binary surprise, credit risk professionals model it as a probability. A Probability of Default (PD) is expressed as a percentage likelihood that a borrower defaults within a defined time horizon, typically 12 months for regulatory purposes. A PD of 2% doesn’t mean a specific borrower is “slightly risky.” It means that, across a large pool of similar borrowers, roughly 2 in 100 are expected to default within that window.

This probabilistic framing matters. It converts an unpredictable individual event into something a bank can price, provision for, and manage at portfolio scale. Building a reliable PD estimate is a discipline in its own right — deciding which borrower characteristics matter, which statistical technique to use, and how to validate the result. We cover the full methodology, including logistic regression and survival analysis, in our detailed guide to PD estimation.

What matters at this stage is the underlying logic: default isn’t modeled as a yes/no outcome for an individual borrower. It’s modeled as a rate across a population, calibrated against historical data and adjusted for current conditions.

Credit Risk in International Banking

Credit risk doesn’t stop at national borders, and neither does its regulation. Two frameworks matter most for anyone working in or around Indian banking. One is the International Accounting Standards Board’s approach to credit loss recognition. The other is RBI’s domestic prudential framework.

Why does a domestic Indian bank need to care about an international accounting standard? Because capital markets are global, even when a bank’s loan book isn’t. Foreign investors, rating agencies, and cross-border lenders all benchmark a bank’s provisioning and capital adequacy against international norms, whether or not that bank operates outside India. A framework that looks conservative by domestic standards can still look under-provisioned by global standards. That gap shows up directly in borrowing costs, credit ratings, and investor confidence.

The IASB and IFRS 9

The International Accounting Standards Board (IASB) issued IFRS 9 specifically to fix a weakness exposed by the 2008 financial crisis. The old “incurred loss” model only recognized credit losses after a default had already happened — by which point it was too late for provisions to cushion the blow. IFRS 9 replaced that with an Expected Credit Loss (ECL) model. It requires banks to recognize losses based on forward-looking estimates, before default occurs. IFRS 9 has been adopted across more than 140 jurisdictions worldwide, making it one of the most widely applied accounting standards in global banking.

RBI’s Framework and India’s ECL Transition

India has historically run on a different system: the Income Recognition and Asset Classification (IRAC) norms. These classify loans as standard, sub-standard, doubtful, or loss, based on how many days they’re overdue, and provision accordingly. That’s closer to the old “incurred loss” approach IFRS 9 was designed to replace.

That’s now changing. RBI has confirmed that an ECL-based provisioning framework, with prudential floors, will apply to all Scheduled Commercial Banks from April 1, 2027. Under the new norms, banks will classify financial assets into Stage 1, Stage 2, or Stage 3. That classification depends on assessed credit losses at initial recognition, and at each subsequent reporting date — directly mirroring the IFRS 9 structure used globally.

RBI has also been actively updating its credit risk rules to align with Basel Committee on Banking Supervision (BCBS) standards. In 2026, it revised its counterparty credit risk framework specifically to bring India’s derivatives exposure measurement closer to global norms. That’s a change regulators internationally have prioritized in the years following the Archegos collapse. India’s Capital-to-Risk-Weighted-Assets Ratio (CRAR) requirement of 9% for scheduled commercial banks also exceeds the 8% global Basel III minimum. That reflects RBI’s consistently more conservative capital stance relative to international baselines.

Measuring Credit Risk

Defining and categorizing credit risk only gets a risk team so far. Managing it requires quantifying it — turning a qualitative concern into numbers that inform pricing, provisioning, and capital decisions. Four metrics form the core toolkit.

Probability of Default (PD)

As covered above, PD is the percentage likelihood that a borrower defaults within a set time horizon. It’s typically estimated through logistic regression or survival analysis, using historical repayment data, bureau scores, and financial ratios as inputs. PD is the starting point for nearly every downstream credit risk calculation.

Loss Given Default (LGD)

Default doesn’t automatically mean total loss. LGD measures the proportion of exposure a lender expects to actually lose after accounting for recoveries — collateral liquidation, guarantees, or restructuring. Getting LGD right requires more care than it first appears. It has to reflect the economic recovery rate, discounted for the time value of money and net of recovery costs — not just the raw cash eventually collected. A recovery that takes three years to materialize is worth meaningfully less than the same amount recovered immediately.

Credit Ratings

Credit ratings translate a borrower’s creditworthiness into a standardized letter grade, from investment-grade (AAA down to BBB-) to speculative or junk grades below that. In India, these come from agencies like CRISIL, ICRA, and CARE; internationally, from S&P, Moody’s, and Fitch. Ratings aren’t a substitute for internal PD modeling. But they serve two practical purposes. They give banks an independent, externally validated view of risk. And they directly feed into regulatory risk-weighting under the Basel Standardised Approach, where a lower rating translates into a higher capital charge against that exposure.

Expected Loss and Exposure at Default (EAD)

Exposure at Default (EAD) measures how much a lender is actually on the hook for at the moment a default happens. For revolving credit like credit cards, this can run higher than the current outstanding balance, since distressed borrowers often draw down more of their available limit before defaulting. Combining EAD with PD and LGD produces Expected Loss (EL) — the amount a bank should provision for a given exposure on average. It’s worth noting this combined figure is a standard industry simplification. It assumes PD, LGD, and EAD move independently, which understates risk in downturns, when all three tend to move against the lender simultaneously. Regulators address this specific gap by requiring “downturn” LGD and EAD estimates, rather than accepting benign-cycle averages alone.

Used together, these four metrics let a bank move from “this borrower feels risky” to a specific number. That number can be priced into a loan, provisioned for on the balance sheet, and reported to regulators with a defensible methodology behind it.

Conclusion

Credit risk is broader and more structured than “will this loan get repaid.” It spans counterparty exposure in derivatives, concentration across a portfolio, settlement timing in payments, and default at the level of an individual borrower. Each has distinct causes, distinct historical failures, and distinct measurement approaches. Understanding what is credit risk at this level of detail is the foundation every subsequent risk modeling technique builds on, from PD estimation through to full expected-loss calculation.

The Archegos collapse, the 2027 shift to ECL-based provisioning in India, and RBI’s ongoing alignment with Basel counterparty risk standards all point to the same conclusion. Credit risk management is not static. It evolves in direct response to the failures that expose its gaps.

Explore our Credit Risk Modeling Certification Training to master Risk Analytics. Build the PD, LGD, and EAD models covered here from scratch. Work through India’s transition to ECL provisioning, and learn the model validation techniques practicing risk teams use every day.

Complete Guide to Credit Risk Modeling Techniques

Posted on July 13, 2026 by Editor

Complete Guide to Credit Risk Modeling Techniques

A practical framework for PD, LGD, EAD and expected loss modeling in banking

Introduction

Between March 2018 and March 2021, gross non-performing assets at India’s public sector banks fell from a peak of 14.58% to 9.11% of advances. By September 2025, the system-wide gross NPA ratio for all scheduled commercial banks had dropped further, to a multi-decade low of 2.15%. That turnaround wasn’t accidental. Tighter underwriting, RBI’s Asset Quality Review, the Insolvency and Bankruptcy Code, and — underlying all of it — better credit risk models drove it.

The lesson for anyone building a career in banking analytics is straightforward. Credit risk is the single largest risk category banks carry on their balance sheets. How well a bank measures it directly determines capital adequacy, profitability, and long-term survival. A bank that underestimates default risk lends into losses it didn’t provision for. A bank that overestimates it prices good borrowers out of the market and loses share to competitors.

This guide walks through the complete framework professionals use to model credit risk. It covers the regulatory logic that makes credit risk modeling non-negotiable. It covers the three parameters that quantify it — PD, LGD, EAD. And it covers the statistical and machine learning techniques used to estimate each one. Maybe you’re a risk analyst preparing for an FRM exam. Maybe you’re a data scientist moving into banking analytics. Maybe you’re evaluating a credit risk modeling certification. Either way, this article builds the structured foundation technical interviews and real project work both expect.

By the end, you’ll understand the formulas. You’ll understand why each one exists, how professionals estimate it in practice, and where Indian banks apply it today.

What is Credit Risk and Why It Matters

Credit risk is the possibility that a borrower fails to meet a contractual debt obligation. The borrower could be an individual, a company, or a counterparty. Either way, it results in a financial loss for the lender. It is the risk that repayment doesn’t happen as agreed: a missed EMI, a defaulted corporate bond, a counterparty that can’t settle a derivative contract.

For a bank, credit risk isn’t one risk among many — it’s usually the dominant one. Loans and advances typically make up the largest share of a bank’s assets. Under the Basel framework, credit risk-weighted assets (RWA) form the biggest component of the capital a bank must hold. When credit risk is mismeasured, everything built on top of that measurement goes wrong too: capital adequacy ratios, loan pricing, provisioning, dividend capacity.

Why Credit Risk Modeling Matters to a Bank’s Survival

Three consequences flow directly from how well a bank models credit risk:

Capital adequacy. Under Basel III, banks must hold capital proportional to the risk-weighted value of their assets. The Capital-to-Risk-Weighted-Assets Ratio (CRAR) for India’s scheduled commercial banks stood at a strong 17.2% as of September 2025 — well above the regulatory minimum. Risk measurement and provisioning discipline improved sharply after 2015, and it shows.
Provisioning under IFRS 9. IFRS 9 replaced the incurred-loss model with an expected credit loss (ECL) model. Banks must now recognize losses before a default happens, based on modeled probabilities. That makes credit risk models a direct input into the profit and loss statement, not just a back-office risk tool.
Pricing and portfolio strategy. A bank that can accurately rank borrowers by risk can price loans correctly, set appropriate limits, and choose which segments to grow or shrink. A bank that can’t ends up either losing good customers to competitors with sharper pricing, or accumulating bad loans that eventually show up as NPAs.

The RBI Regulatory Backdrop

The Reserve Bank of India has steadily tightened the credit risk framework banks operate under. Its 2015 Asset Quality Review forced transparent recognition of stressed assets that restructuring had previously hidden. More recently, in November 2023, the RBI raised risk weights on unsecured consumer credit — from 100% to 125% for banks. NBFC exposures saw a further increase. The goal: curb underpriced risk-taking in retail lending. The RBI has also begun articulating principle-based guidance for AI use in credit decisioning through its evolving regulatory framework. Model governance, not just model accuracy, is now squarely on the regulator’s radar.

This regulatory environment is precisely why credit risk modeling has become a core competency banks and NBFCs actively hire for — not an optional analytics add-on.

Core Components of Credit Risk Modeling

Every credit risk model, regardless of the statistical technique behind it, answers one question: how much money could the bank lose on this exposure, and how likely is that loss?

The industry-standard approach breaks “expected loss” into three independently modeled components:

1. Probability of Default (PD)

PD is the likelihood that a borrower will fail to meet their obligations within a defined time horizon. That’s typically 12 months for regulatory capital purposes, or lifetime PD under IFRS 9 for certain asset stages. PD is expressed as a percentage. A PD of 3% means a 3% chance of default within the horizon, based on the borrower’s characteristics.

2. Loss Given Default (LGD)

Default doesn’t always mean total loss. If a borrower defaults, the bank usually recovers something — through collateral liquidation, guarantees, or restructuring. LGD is the proportion of the exposure the bank expects to actually lose after recoveries. It’s expressed as a percentage of the exposure. An LGD of 40% means that, on average, the bank recovers 60% of what it’s owed after a default.

3. Exposure at Default (EAD)

EAD is the total value the bank carries at the moment of default — not the sanctioned limit, but the amount actually outstanding. For a term loan, this is close to the outstanding balance. Revolving facilities like credit cards or cash credit limits work differently. There, EAD must account for the possibility that the borrower draws down more of the limit before defaulting.

Putting It Together: Expected Loss

These three parameters combine into the foundational credit risk formula:

Expected Loss (EL) = PD × LGD × EAD

Consider a simple example: a bank has an outstanding exposure (EAD) of ₹1 crore to a borrower with a PD of 4% and an LGD of 45%.

Expected Loss = 0.04 × 0.45 × ₹1,00,00,000 = ₹1,80,000

This ₹1.8 lakh isn’t a one-off loss estimate for a single account. It’s the amount the bank should provision for, on average, across a portfolio of similar loans. Multiply this calculation across thousands of accounts, segmented by product, geography, and borrower type. The result is a portfolio-level expected loss figure that feeds directly into provisioning, pricing, and capital planning.

Why This Formula Is a Simplification

EL = PD × LGD × EAD is the standard, industry-wide formula — it’s not wrong. But it rests on an assumption worth naming explicitly: that PD, LGD, and EAD are independent of one another. In reality they aren’t. Recoveries tend to get worse exactly when default rates spike, because both track the same macroeconomic cycle. A recession depresses collateral values and pushes more borrowers into default at the same time. Multiplying three independent point estimates understates loss in exactly the scenarios that matter most. That’s why regulators require downturn LGD and downturn EAD/CCF add-ons rather than accepting benign-cycle averages.

From Single-Period EL to Lifetime ECL

The formula above is also a single-period number, typically 12 months. Under IFRS 9, lifetime ECL for Stage 2 and Stage 3 assets isn’t one multiplication. It’s a discounted sum of marginal expected losses across every remaining period of the loan’s life:

Lifetime ECL = Σₜ (marginal PDₜ × LGDₜ × EADₜ) × discount factorₜ

That distinction matters in practice. A 12-month EL figure and a lifetime ECL figure for the same loan can differ substantially. Conflating the two is a common — and consequential — modeling error.

Each of the three components — probability of default, loss given default, and EAD — demands its own data, statistical techniques, and validation standards. Getting the overall expected loss number right depends entirely on getting each of these three right individually. It also depends on staying honest about where the simplifying assumptions break down.

Probability of Default (PD) Estimation

PD estimation is where most credit risk modeling careers begin, because it has the richest data history and the most mature statistical toolkit.

The Data Foundation

PD models draw on historical loan performance data: borrower demographics, financial ratios, repayment history, bureau scores (like CIBIL in India), and macroeconomic variables. The target variable is binary. Did the borrower default within the observation window, typically defined as 90 days past due (DPD), or not?

Method 1: Logistic Regression

Logistic regression remains the industry workhorse for PD modeling, and for good reason. It’s interpretable, regulator-friendly, and produces a probability output directly — exactly what PD requires.

The logistic regression model estimates:

PD = 1 / (1 + e^-(β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ))

X₁ through Xₙ are borrower characteristics: debt-to-income ratio, bureau score, vintage, loan-to-value ratio, and so on. Analysts estimate the β coefficients from historical default data.

Worked example: Suppose a simplified model uses two variables — bureau score and debt-to-income (DTI) ratio — and produces this equation:

Logit(PD) = -3.5 + (-0.02 × Bureau Score) + (2.1 × DTI)

For a borrower with a bureau score of 720 and a DTI of 0.35:

Logit(PD) = -3.5 + (-0.02 × 720) + (2.1 × 0.35) = -3.5 − 14.4 + 0.735 = -17.165

PD = 1 / (1 + e^17.165) ≈ 0.00003, or effectively near-zero risk — consistent with a strong bureau score and moderate leverage.

In practice, banks convert these raw PD outputs into a credit scorecard. It’s a points-based system: each variable band — score range, income bracket, tenure — contributes points that sum to a final score. That score maps back to a PD and a risk grade, AAA to D for instance. Most retail lending decision engines run on exactly this system.

Method 2: Survival Analysis

Logistic regression answers “will this borrower default within 12 months?” It doesn’t naturally answer “when.” Survival analysis models the time to default instead, using techniques like the Cox Proportional Hazards model or Kaplan-Meier estimation. It treats default as an “event.” It treats non-defaulted, still-active loans as “censored” observations.

This matters for two practical reasons. First, it lets banks estimate lifetime PD curves, which IFRS 9 requires for Stage 2 and Stage 3 assets. Second, it naturally handles loans that are still performing at the end of the observation period. It doesn’t treat them as “non-defaults” the way a simple logistic model would. That subtlety matters a great deal in mortgage and long-tenure corporate lending portfolios.

Machine Learning Extensions

Random forests, gradient boosting (XGBoost, LightGBM), and neural networks increasingly supplement — not always replace — logistic regression. This works particularly well where alternative data is available: transaction behavior, digital footprint, utility payment history. Industry research on large Indian banks backs this up. ML models can improve default prediction accuracy over ratio-based or bureau-only assessments, particularly for thin-file borrowers without a long credit history. The trade-off is interpretability. Regulators and internal model validation teams expect PD models to be explainable. That’s why techniques like SHAP (SHapley Additive exPlanations) have become standard for justifying ML-based PD outputs to auditors and regulators.

Common Mistakes in PD Modeling

Treating bureau score as a sufficient standalone predictor without controlling for portfolio-specific behavior
Ignoring population stability — a PD model trained on pre-pandemic data can be badly miscalibrated for a post-pandemic portfolio
Failing to account for right-censoring when using a fixed observation window
Overfitting on a small default sample, which is common in low-default portfolios like corporate or sovereign lending

Loss Given Default (LGD) and Recovery

PD tells you whether a loss will happen. LGD tells you how big it will be once it does. Most practitioners consider it the harder of the two to model well. Default events are relatively rare, and recovery processes can take years to conclude.

What Drives LGD

LGD is shaped primarily by three factors:

Collateral coverage and quality. A secured home loan with a well-documented, liquid property as collateral typically carries a far lower LGD than an unsecured personal loan. The reason: the bank has a tangible asset to recover value from.
Seniority of claim. In corporate lending, senior secured lenders recover more than subordinated or unsecured creditors in a resolution or liquidation.
Recovery mechanism and timeline. In India, recovery routes include SARFAESI Act enforcement, Debt Recovery Tribunals, and the Insolvency and Bankruptcy Code (IBC). These matter enormously for LGD estimation. IBC-driven recoveries have averaged around 94% of the fair value of resolved businesses, though considerably less against the original claim amount. System-wide NPA recovery rates for scheduled commercial banks have roughly doubled, from 13.2% in FY18 to 26.2% in FY25, reflecting stronger legal recovery infrastructure.

The Recovery Rate Relationship

LGD and recovery rate are two sides of the same coin:

LGD = 1 − Recovery Rate

That relationship is correct. But it’s easy to misapply, because “recovery rate” is doing a lot of work in that equation. It has to mean the economic recovery rate: the present value of net recoveries, after two adjustments. First, discount every recovered cash flow back to the default date, to account for the time value of money. Second, subtract the direct and indirect costs of recovery — legal fees, collateral liquidation costs, workout team overhead. A workout can take two to three years in India, even under IBC timelines. A rupee recovered in year three is worth meaningfully less than a rupee recovered on day one.

The common mistake: using the nominal recovery rate instead. That’s raw cash eventually recovered, divided by exposure, with no discounting and no cost deduction. It overstates the recovery rate and, by direct consequence, understates LGD. Say a bank nominally recovers 65% of exposure post-default, but that recovery arrives over three years and costs 8% of exposure in legal and liquidation expenses. The economic recovery rate falls well below 65%. The resulting LGD lands correspondingly higher than the naive “35%” the nominal figure would suggest.

Collateral Valuation

For secured lending, LGD modeling starts with realistic collateral valuation — not the value at origination, but the expected value at the time of liquidation, discounted for:

Market depreciation of the asset class (property, equipment, vehicles)
Haircut for forced-sale conditions versus fair market value
Time-to-recovery, since a three-year legal process erodes present value even if the nominal recovery is high
Direct recovery costs (legal fees, auctioneer fees, administrative costs)

Statistical Methods for LGD

Workout LGD (the standard approach): Banks track every actual default in their historical data. They record all cash flows recovered post-default: collateral sale proceeds, settlement payments, guarantee invocations. Then they discount those cash flows back to the default date, using an appropriate discount rate, and net off recovery costs. This produces an empirical, account-level LGD. Analysts then average and segment it by product, collateral type, and vintage.
Regression-based LGD models: Raw workout LGD is bounded between 0 and 1. It can occasionally exceed 1, when recovery costs surpass recoveries. Banks often use techniques suited to bounded outcomes instead. One option is beta regression. Another is a two-stage model: first predict whether any recovery occurs at all, then model the recovery amount conditional on recovery happening.
Downturn LGD: Basel requires banks to estimate LGD under economic downturn conditions, not just average conditions, because collateral values and recovery rates both tend to fall exactly when default rates rise. This is a critical, frequently underestimated requirement. An LGD model calibrated only on benign-cycle data will understate loss severity in a stress scenario.

A Practical Note on LGD in Indian Retail Lending

For Indian home loans, LGD modeling relies heavily on loan-to-value (LTV) ratio at origination and current LTV, adjusted for property price movements. Property is the dominant recovery source here. Unsecured personal loans and credit cards work differently — they carry substantially higher and more volatile LGD, since recovery depends almost entirely on borrower cooperation, collection agency effectiveness, or write-off. That’s part of why the RBI’s 2023 risk-weight increase on unsecured credit targeted loss severity concerns explicitly.
<h2id=”exposure-at-default”>Exposure at Default (EAD) Calculation

EAD often gets treated as the “simple” component of the PD × LGD × EAD formula, but any product with a revolving or undrawn component needs its own careful modeling.

Why EAD Isn’t Just the Current Balance

For a fully drawn term loan, EAD is straightforward. It stays close to the outstanding principal balance at any point in time, adjusted for scheduled amortization. Products like credit cards, overdrafts, and cash credit facilities work differently. A borrower can draw down additional funds between the assessment date and the moment of default. A borrower approaching financial distress often draws their credit line closer to the limit right before defaulting. That means EAD tends to run higher than the current outstanding balance, for exactly the accounts where it matters most.

The Credit Conversion Factor (CCF)

To capture this, banks use the Credit Conversion Factor (CCF) — the proportion of the currently undrawn commitment they expect to see drawn down before default.

EAD = Current Outstanding Balance + (CCF × Undrawn Commitment)

Worked example: A borrower has a credit card with a sanctioned limit of ₹5,00,000. The current outstanding balance is ₹2,00,000, leaving an undrawn commitment of ₹3,00,000. Historical data shows that, on average, borrowers who eventually default draw down 60% of their remaining undrawn limit before the default event (CCF = 0.60).

EAD = ₹2,00,000 + (0.60 × ₹3,00,000) = ₹2,00,000 + ₹1,80,000 = ₹3,80,000

This runs meaningfully higher than the ₹2,00,000 current balance. The expected loss calculation should use the ₹3,80,000 figure, not the current balance.

How CCF Is Estimated

Banks typically estimate CCF empirically, using a cohort approach. They identify accounts that defaulted. They look back at each account’s utilization level 12 months before default. Then they measure how much of the then-undrawn limit got drawn down by the time of default. Averaging this ratio across the portfolio — segmented by product type and current utilization band — produces the CCF estimates that feed EAD models.

CCF runs highest for revolving retail products like credit cards and overdrafts. It runs near-zero for term loans with no undrawn component. Corporate revolving credit facilities and working capital limits also require dedicated CCF modeling. Distressed corporate borrowers frequently draw down committed but unused credit lines as a liquidity buffer before default becomes evident.

EAD Under Basel and IFRS 9

Under the Basel Internal Ratings-Based (IRB) approach, EAD estimation follows the same downturn-conditioning logic as LGD. CCFs should reflect what happens under stressed conditions, since utilization tends to spike precisely when the broader environment deteriorates. Under IFRS 9, EAD projections also need to extend across the lifetime of the facility for Stage 2 and Stage 3 exposures, not just a fixed 12-month window.

Real-World Implementation in Banking

The theory behind PD, LGD, and EAD only matters if it translates into disciplined implementation. India’s banking sector over the past decade offers a clear illustration of what that looks like at scale.

The Turnaround in Numbers

Following the RBI’s 2015 Asset Quality Review, public sector banks’ gross NPA ratio rose sharply, as transparent recognition brought hidden stress to light. It peaked at 14.58% in March 2018. What followed was a sustained, model-driven cleanup. Recapitalization, tighter underwriting standards, and the Insolvency and Bankruptcy Code combined to bring the PSB gross NPA ratio down to 9.11% by March 2021, and further to 2.58% by March 2025. System-wide, across all scheduled commercial banks, the gross NPA ratio reached a multi-decade low of around 1.8%–2.15% through late 2025 and into 2026, per the RBI’s Financial Stability Report.

At the institution level, this shows up clearly in individual bank results. For the quarter ended March 2026, State Bank of India — India’s largest lender — reported a gross NPA ratio of 1.49% and a net NPA ratio of 0.39%. HDFC Bank reported a gross NPA ratio of 1.15% and net NPA ratio of 0.38%. These aren’t accidents of a benign credit cycle alone. They reflect years of investment: early warning systems, scorecard-based underwriting, stressed-asset monitoring infrastructure.

What Changed Operationally

Three implementation shifts are consistently cited across the sector:

Early Warning Systems (EWS). Public sector banks have rolled out automated EWS frameworks with roughly 80 distinct triggers. These pull in third-party data to flag stress in borrowing accounts before they slip into NPA status. This shifts credit risk management from reactive classification to proactive monitoring.
Machine learning-augmented scorecards. Industry research examining major Indian banks — SBI, HDFC, ICICI, Kotak Mahindra — found that machine learning techniques consistently outperform pure ratio-based or bureau-score-only assessments. The winning combination: logistic regression alongside random forests and neural networks, incorporating alternative data like transaction behavior and digital usage patterns. It works especially well for borrowers with thin credit files.
Faster, more granular underwriting. The same research found loan approval times at many Indian banks have compressed, from several days to a matter of minutes for eligible segments. Automated, model-based decisioning drives this, rather than manual file review. That shift only became possible once PD and exposure models earned enough validated trust to run with minimal manual override.

The Governance Layer

None of this works without governance. The RBI’s ongoing regulatory review — including its recently articulated principle-based framework for AI use in banking — signals something important. As banks lean further into ML-driven credit risk models, model validation, explainability, and monitoring will matter as much as raw predictive accuracy. A model that can’t be explained to an auditor or regulator can’t be deployed at scale in a regulated balance sheet, no matter how well it performs statistically.

Interview Questions to Test Your Understanding

Write out the expected loss formula and explain what each component represents.
Why is logistic regression still preferred over more complex ML models for regulatory PD models?
What’s the difference between workout LGD and downturn LGD, and why does Basel require the latter?
Explain why EAD for a credit card is typically higher than the current outstanding balance.
How would you validate a PD model’s performance? (Hint: think Gini coefficient, KS statistic, and calibration testing.)

Summary

Credit risk modeling breaks down a complex question: how much could a bank lose, and how likely is it? It splits that question into three independently estimated, rigorously validated components — Probability of Default, Loss Given Default, and Exposure at Default — combined through EL = PD × LGD × EAD. Analysts typically estimate PD through logistic regression and survival analysis, increasingly supplemented by explainable machine learning. LGD depends on collateral quality, recovery mechanisms, and downturn conditions. EAD requires modeling credit conversion factors for any revolving exposure. Together, these three parameters drive capital adequacy, IFRS 9 provisioning, and loan pricing. India’s banking sector’s asset quality turnaround over the past decade proves the point: disciplined credit risk modeling delivers at scale.

Frequently Asked Questions

Q1. What is the difference between credit risk and credit risk modeling?

Credit risk is the underlying possibility of borrower default. Credit risk modeling is the quantitative discipline of measuring that risk. It uses statistical and machine learning techniques to estimate PD, LGD, and EAD, so a bank can price, provision for, and manage the risk.

Q2. Is credit risk modeling the same as credit scoring?

Credit scoring is a subset of credit risk modeling, focused specifically on PD estimation for underwriting decisions. Full credit risk modeling also covers LGD, EAD, portfolio-level expected loss, stress testing, and regulatory capital calculation.

Q3. Which technique is better for PD modeling: logistic regression or machine learning?

Neither wins universally. Logistic regression remains preferred for regulatory capital models, thanks to its interpretability and regulator familiarity. Machine learning models can improve accuracy, particularly with alternative data. But they require additional explainability tooling — like SHAP — to meet governance and audit requirements.

Q4. What skills do I need to build a career in credit risk modeling?

You need statistics (logistic regression, survival analysis), proficiency in Python or SAS, familiarity with Basel and IFRS 9, and hands-on exposure to real credit datasets. Structured training that pairs regulatory context with practical model-building is generally the fastest path in.

Q5. How is credit risk modeling connected to IFRS 9?

IFRS 9 requires banks to recognize expected credit losses using forward-looking PD, LGD, and EAD estimates, rather than waiting for an actual default. Stage 1 assets use a 12-month figure, close to the standard EL = PD × LGD × EAD calculation. Stage 2 and 3 assets use lifetime ECL instead — a discounted sum of marginal expected losses across the loan’s remaining life, not a single multiplication.

Ready to Build These Skills Hands-On?

MongoDB Basics Part-II

Posted on February 20, 2021 by Dexlab

In our previous blog we discussed about few of the basic functions of MQL like .find() , .count() , .pretty() etc. and in this blog we will continue to do the same. At the end of the blog there is a quiz for you to solve, feel free to test your knowledge and wisdom you have gained so far.

Given below is the list of functions that can be used for data wrangling:-

updateOne() :- This function is used to change the current value of a field in a single document.

After changing the database to “sample_geospatial” we want to see what the document looks like? So for that we will use .findOne() function.

Now lets update the field value of “recrd” from ‘ ’ to “abc” where the “feature_type” is ‘Wrecks-Visible’.

Now within the .updateOne() funtion any thing in the first part of { } is the condition on the basis of which we want to update the given document and the second part is the changes which we want to make. Here we are saying that set the value as “abc” in the “recrd” field . In case you wanted to increase the value by a certain number ( assuming that the value is integer or float) you can use “$inc” instead.

2. updateMany() :- This function updates many documents at once based on the condition provided.

3. deleteOne() & deleteMany() :- These functions are used to delete one or many documents based on the given condition or field.

4. Logical Operators :-

“$and” : It is used to match all the conditions.

“$or” : It is used to match any of the conditions.

The first code matches both the conditions i.e. name should be “Wetpaint” and “category_code” should be “web”, whereas the second code matches any one of the conditions i.e. either name should be “Wetpaint” or “Facebook”. Try these codes and see the difference by yourself.

So, with that we come to the end of the discussion on the MongoDB Basics. Hopefully it helped you understand the topic, for more information you can also watch the video tutorial attached down this blog. The blog is designed and prepared by Niharika Rai, Analytics Consultant, DexLab Analytics DexLab Analytics offers machine learning courses in Gurgaon. To keep on learning more, follow DexLab Analytics blog.

Introduction to MongoDB

Posted on February 17, 2021February 17, 2021 by Dexlab

MongoDB is a document based database program which was developed by MongoDB Inc. and is licensed under server side public license (SSPL). It can be used across platforms and is a non-relational database also known as NoSQL, where NoSQL means that the data is not stored in the conventional tabular format and is used for unstructured data as compared to SQL and that is the major difference between NoSQL and SQL.
MongoDB stores document in JSON or BSON format. JSON also known as JavaScript Object notation is a format where data is stored in a key value pair or array format which is readable for a normal human being whereas BSON is nothing but the JSON file encoded in the binary format which is quite hard for a human being to understand.
Structure of MongoDB which uses a query language MQL(Mongodb query language):-
Databases:- Databases is a group of collections.
Collections:- Collection is a group fields.
Fields:- Fields are nothing but key value pairs
Just for an example look at the image given below:-

Here I am using MongoDB Compass a tool to connect to Atlas which is a cloud based platform which can help us write our queries and start performing all sort of data extraction and deployment techniques. You can download MongoDB Compass via the given link https://www.mongodb.com/try/download/compass

In the above image in the red box we have our databases and if we click on the “sample_training” database we will see a list of collections similar to the tables in sql.

Now lets write our first query and see what data in “companies” collection looks like but before that select the “companies” collection.

Now in our filter cell we can write the following query:-

In the above query “name” and “category_code” are the key values also known as fields and “Wetpaint” and “web” are the pair values on the basis of which we want to filter the data.
What is cluster and how to create it on Atlas?
MongoDB cluster also know as sharded cluster is created where each collection is divided into shards (small portions of the original data) which is a replica set of the original collection. In case you want to use Atlas there is an unpaid version available with approximately 512 mb space which is free to use. There is a pre-existing cluster in MongoDB named Sandbox , which currently I am using and you can use it too by following the given steps:-
1. Create a free account or sign in using your Google account on
https://www.mongodb.com/cloud/atlas/lp/try2-in?utm_source=google&utm_campaign=gs_apac_india_search_brand_atlas_desktop&utm_term=mongodb%20atlas&utm_medium=cpc_paid_search&utm_ad=e&utm_ad_campaign_id=6501677905&gclid=CjwKCAiAr6-ABhAfEiwADO4sfaMDS6YRyBKaciG97RoCgBimOEq9jU2E5N4Jc4ErkuJXYcVpPd47-xoCkL8QAvD_BwE
2. Click on “Create an Organization”.
3. Write the organization name “MDBU”.
4. Click on “Create Organization”.
5. Click on “New Project”.
6. Name your project M001 and click “Next”.
7. Click on “Build a Cluster”.
8. Click on “Create a Cluster” an option under which free is written.
9. Click on the region closest to you and at the bottom change the name of the cluster to “Sandbox”.
10. Now click on connect and click on “Allow access from anywhere”.
11. Create a Database User and then click on “Create Database User”.
username: m001-student
password: m001-mongodb-basics
12. Click on “Close” and now load your sample as given below :

Loading may take a while….
13. Click on collections once the sample is loaded and now you can start using the filter option in a similar way as in MongoDB Compass
In my next blog I’ll be sharing with you how to connect Atlas with MongoDB Compass and we will also learn few ways in which we can write query using MQL.

So, with that we come to the end of the discussion on the MongoDB. Hopefully it helped you understand the topic, for more information you can also watch the video tutorial attached down this blog. The blog is designed and prepared by Niharika Rai, Analytics Consultant, DexLab Analytics DexLab Analytics offers machine learning courses in Gurgaon. To keep on learning more, follow DexLab Analytics blog.

MongoDB Basics Part-I

Posted on February 12, 2021February 17, 2021 by Dexlab

In this particular blog we will discuss about few of the basic functions of MQL (MongoDB Query Language) and we will also see how to use them? We will be using MongoDB Compass shell (MongoSH Beta) which is available in the latest version of MongoDB Compass.

Connect your Atlas cluster to your MongoDB Compass to get started. Latest version of MongoDB Compass will have this shell, so if you don’t find this shell then please install the latest version for this to work.

Now lets start with the functions.

find() :- You need this function for data extraction in the shell.

In the shell we need to first write the “use database name” code to access the database then use .find() to extract data which has name “Wetpaint”

For the above query we get the following result:-

The above result brings us to another function .pretty() .

2. pretty() :- this function helps us see the result more clearly.

Try it yourself to compare the results.

3. count() :- Now lets see how many entries we have by the company name “Wetpaint”.

So we have only one document.

4. Comparison operators :-

“$eq” : Equal to

“$neq”: Not equal to

“$gt”: Greater than

“$gte”: Greater than equal to

“$lt”: Less than

“$lte”: Less than equal to

Lets see how this works.

5. findOne() :- To get a single document from a collection we use this function.

6. insert() :- This is used to insert documents in a collection.

Now lets check if we have been able to insert this document or not.

Notice that a unique id has been added to the document by default. The given id has to be unique or else there will be an error. To provide a user defined id use “_id”.

ARIMA (Auto-Regressive Integrated Moving Average)

Posted on January 27, 2021February 13, 2021 by Dexlab

This is another blog added to the series of time series forecasting. In this particular blog I will be discussing about the basic concepts of ARIMA model.

So what is ARIMA?

ARIMA also known as Autoregressive Integrated Moving Average is a time series forecasting model that helps us predict the future values on the basis of the past values. This model predicts the future values on the basis of the data’s own lags and its lagged errors.

When a data does not reflect any seasonal changes and plus it does not have a pattern of random white noise or residual then an ARIMA model can be used for forecasting.

There are three parameters attributed to an ARIMA model p, q and d :-

p :- corresponds to the autoregressive part

q:- corresponds to the moving average part.

d:- corresponds to number of differencing required to make the data stationary.

In our previous blog we have already discussed in detail what is p and q but what we haven’t discussed is what is d and what is the meaning of differencing (a term missing in ARMA model).

Since AR is a linear regression model and works best when the independent variables are not correlated, differencing can be used to make the model stationary which is subtracting the previous value from the current value so that the prediction of any further values can be stabilized . In case the model is already stationary the value of d=0. Therefore “differencing is the minimum number of deductions required to make the model stationary”. The order of d depends on exactly when your model becomes stationary i.e. in case the autocorrelation is positive over 10 lags then we can do further differencing otherwise in case autocorrelation is very negative at the first lag then we have an over-differenced series.

The formula for the ARIMA model would be:-

To check if ARIMA model is suited for our dataset i.e. to check the stationary of the data we will apply Dickey Fuller test and depending on the results we will using differencing.

In my next blog I will be discussing about how to perform time series forecasting using ARIMA model manually and what is Dickey Fuller test and how to apply that, so just keep on following us for more.

So, with that we come to the end of the discussion on the ARIMA Model. Hopefully it helped you understand the topic, for more information you can also watch the video tutorial attached down this blog. The blog is designed and prepared by Niharika Rai, Analytics Consultant, DexLab Analytics DexLab Analytics offers machine learning courses in Gurgaon. To keep on learning more, follow DexLab Analytics blog.

ARMA- Time Series Analysis Part 4

Posted on January 25, 2021February 13, 2021 by Dexlab

ARMA(p,q) model in time series forecasting is a combination of Autoregressive Process also known as AR Process and Moving Average (MA) Process where p corresponds to the autoregressive part and q corresponds to the moving average part.

Autoregressive Process (AR) :- When the value of Y_t in a time series data is regressed over its own past value then it is called an autoregressive process where p is the order of lag into consideration.

Where,

Y_t = observation which we need to find out.

α₁= parameter of an autoregressive model

Y_t-1= observation in the previous period

u_t= error term

The equation above follows the first order of autoregressive process or AR(1) and the value of p is 1. Hence the value of Y_t in the period ‘t’ depends upon its previous year value and a random term.

Moving Average (MA) Process :- When the value of Y_t of order q in a time series data depends on the weighted sum of current and the q recent errors i.e. a linear combination of error terms then it is called a moving average process which can be written as :-

y_t = observation which we need to find out

α= constant term

β_ut-q= error over the period q .

ARMA (Autoregressive Moving Average) Process :-

The above equation shows that value of Y in time period ‘t’ can be derived by taking into consideration the order of lag p which in the above case is 1 i.e. previous year’s observation and the weighted average of the error term over a period of time q which in case of the above equation is 1.

How to decide the value of p and q?

Two of the most important methods to obtain the best possible values of p and q are ACF and PACF plots.

ACF (Auto-correlation function) :- This function calculates the auto-correlation of the complete data on the basis of lagged values which when plotted helps us choose the value of q that is to be considered to find the value of Y_t. In simple words how many years residual can help us predict the value of Y_t can obtained with the help of ACF, if the value of correlation is above a certain point then that amount of lagged values can be used to predict Y_t.

Using the stock price of tesla between the years 2012 and 2017 we can use the .acf() method in python to obtain the value of p.

.DataReader() method is used to extract the data from web.

The above graph shows that beyond the lag 350 the correlation moved towards 0 and then negative.

PACF (Partial auto-correlation function) :- Pacf helps find the direct effect of the past lag by removing the residual effect of the lags in between. Pacf helps in obtaining the value of AR where as acf helps in obtaining the value of MA i.e. q. Both the methods together can be use find the optimum value of p and q in a time series data set.

Lets check out how to apply pacf in python.

As you can see in the above graph after the second lag the line moved within the confidence band therefore the value of p will be 2.

So, with that we come to the end of the discussion on the ARMA Model. Hopefully it helped you understand the topic, for more information you can also watch the video tutorial attached down this blog. The blog is designed and prepared by Niharika Rai, Analytics Consultant, DexLab Analytics DexLab Analytics offers machine learning courses in Gurgaon. To keep on learning more, follow DexLab Analytics blog.

Autocorrelation- Time Series – Part 3

Posted on January 22, 2021February 13, 2021 by Dexlab

Autocorrelation is a special case of correlation. It refers to the relationship between successive values of the same variables .For example if an individual with a consumption pattern:-

spends too much in period 1 then he will try to compensate that in period 2 by spending less than usual. This would mean that Ut is correlated with Ut+1 . If it is plotted the graph will appear as follows :

Positive Autocorrelation : When the previous year’s error effects the current year’s error in such a way that when a graph is plotted the line moves in the upward direction or when the error of the time t-1 carries over into a positive error in the following period it is called a positive autocorrelation.
Negative Autocorrelation : When the previous year’s error effects the current year’s error in such a way that when a graph is plotted the line moves in the downward direction or when the error of the time t-1 carries over into a negative error in the following period it is called a negative autocorrelation.

Now there are two ways of detecting the presence of autocorrelation
By plotting a scatter plot of the estimated residual (ei) against one another i.e. present value of residuals are plotted against its own past value.

If most of the points fall in the 1st and the 3rd quadrants , autocorrelation will be positive since the products are positive.

If most of the points fall in the 2nd and 4th quadrant , the autocorrelation will be negative, because the products are negative.
By plotting ei against time : The successive values of ei are plotted against time would indicate the possible presence of autocorrelation .If e’s in successive time show a regular time pattern, then there is autocorrelation in the function. The autocorrelation is said to be negative if successive values of ei changes sign frequently.
First Order of Autocorrelation (AR-1)
When t-1 time period’s error affects the error of time period t (current time period), then it is called first order of autocorrelation.
AR-1 coefficient p takes values between +1 and -1
The size of this coefficient p determines the strength of autocorrelation.
A positive value of p indicates a positive autocorrelation.
A negative value of p indicates a negative autocorrelation
In case if p = 0, then this indicates there is no autocorrelation.
To explain the error term in any particular period t, we use the following formula:-

Where Vt= a random term which fulfills all the usual assumptions of OLS
How to find the value of p?

One can estimate the value of ρ by applying the following formula :-

Credit Risk in Indian Banking: What RBI’s Data Actually Shows

Methodology and Data Sources

Finding 1: Asset Quality Has Improved for Five Consecutive Years

SCB GNPA: from 8% to 2.1% in five years

Finding 2: Improvement Isn’t Even Across Bank Groups

PSBs are catching up fast

Finding 3: Unsecured Retail Is Where New Risk Concentrates

The retail risk hiding inside a good headline number

Finding 4: ECL Will Formalize This Shift

Why the 2027 ECL shift matters here

What This Means for Banks and Risk Teams

FAQ

Conclusion

Ready to Build These Skills Hands-On?

Online Certification Courses to get you started

Understanding Credit Risk: Definition and Types

Introduction

What is Credit Risk?

Types of Credit Risk

Counterparty Risk

Concentration Risk

Settlement Risk

Default Risk Explained

What Causes Defaults

The Probability Framework

Credit Risk in International Banking

The IASB and IFRS 9

RBI’s Framework and India’s ECL Transition

Measuring Credit Risk

Probability of Default (PD)

Loss Given Default (LGD)

Credit Ratings

Expected Loss and Exposure at Default (EAD)

Conclusion

Online Certification Courses to get you started

Complete Guide to Credit Risk Modeling Techniques

Introduction

What is Credit Risk and Why It Matters

Why Credit Risk Modeling Matters to a Bank’s Survival

The RBI Regulatory Backdrop

Core Components of Credit Risk Modeling

1. Probability of Default (PD)

2. Loss Given Default (LGD)

3. Exposure at Default (EAD)

Putting It Together: Expected Loss

Why This Formula Is a Simplification

From Single-Period EL to Lifetime ECL

Probability of Default (PD) Estimation

The Data Foundation

Method 1: Logistic Regression

Method 2: Survival Analysis

Machine Learning Extensions

Common Mistakes in PD Modeling

Loss Given Default (LGD) and Recovery

What Drives LGD

The Recovery Rate Relationship

Collateral Valuation

Statistical Methods for LGD

A Practical Note on LGD in Indian Retail Lending

Why EAD Isn’t Just the Current Balance

The Credit Conversion Factor (CCF)

How CCF Is Estimated

EAD Under Basel and IFRS 9

Real-World Implementation in Banking

The Turnaround in Numbers

What Changed Operationally

The Governance Layer

Interview Questions to Test Your Understanding

Summary

Frequently Asked Questions

Ready to Build These Skills Hands-On?

Online Certification Courses to get you started

Classroom or Online Certification Courses to get you started

Classroom or Online Certification Courses to get you started

Classroom or Online Certification Courses to get you started

Classroom or Online Certification Courses to get you started

Classroom or Online Certification Courses to get you started

Call us to know more

Gurgaon

Kolkata