Skip to content

C++: Add expressions with type data to cpp/extraction-information#21544

Merged
paldepind merged 2 commits intogithub:mainfrom
paldepind:cpp/extraction-information-expr-types
Mar 24, 2026
Merged

C++: Add expressions with type data to cpp/extraction-information#21544
paldepind merged 2 commits intogithub:mainfrom
paldepind:cpp/extraction-information-expr-types

Conversation

@paldepind
Copy link
Contributor

@paldepind paldepind commented Mar 23, 2026

Follow up from #21512. Extends the query to also include data for expressions in the source with types. Like before the QL is more or less copy pasted from the similar queries for Rust/Java/C#.

Here's a comparison table for nmap similar to the one in the previous PR. The first row is the existing data, and the three below are from cpp/telemetry/extraction-information.

traced BMN BMN + deps inst
Expressions with a known type (existing data) 433,251 729,293 850,145
Percentage of expressions with known type 99.999 98.985 98.976
Number of expressions with known type 285,612 585,821 500,554
Number of expressions with unknown type 2 6,009 5,178

Again, the new numbers show more clearly the actual changes from BMN and dependency installation. Though at least for this project the call target data seemed more revealing. Two observations:

  • The increase in the number of expressions with a known type is relatively much greater than the increase in the number of calls with a target.
  • The percentage of expressions with a known type stays very high in both BMN cases. This seems to suggest that the reduction in expressions with a known type for dependency installation is because expressions are completely lost (and not just extracted without a type).

Perhaps others have some additional insight, but to me it seems that the resolved calls metric is more significant.

Here's the data shown in DCA.

@github-actions github-actions bot added the C++ label Mar 23, 2026
@paldepind paldepind marked this pull request as ready for review March 24, 2026 08:01
@paldepind paldepind requested a review from a team as a code owner March 24, 2026 08:01
Copilot AI review requested due to automatic review settings March 24, 2026 08:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Extends the C/C++ cpp/telemetry/extraction-information metric query to also report statistics about expression type availability for expressions originating from source files, complementing the existing “resolved call target” quality metrics.

Changes:

  • Added expression type quality metrics (known vs unknown/erroneous type) to the extraction-information telemetry output.
  • Introduced shared “relevant source file” filtering to ensure metrics are restricted to source files with a relative path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
cpp/ql/src/Telemetry/ExtractorInformation.ql Adds expression-type stats keys/values to the emitted telemetry metrics alongside call-target stats.
cpp/ql/src/Telemetry/DatabaseQuality.qll Defines the expression-type stats computation and refactors source-file scoping via a RelevantFile helper.

import cpp
import codeql.util.ReportStats

/** A file that is included in the quality statistics. */
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment for RelevantFile is a bit vague given the actual predicate (source file + has relative path). Consider clarifying that this is specifically a source file included in the database-quality metrics (and why exists(getRelativePath()) is part of the criteria).

Suggested change
/** A file that is included in the quality statistics. */
/** A source file that is included in the database-quality statistics, that is,
* a file coming from the source code (`fromSource()`) and having a relative
* path in the database (`exists(getRelativePath())`), so that only files
* properly mapped into the database file tree are counted. */

Copilot uses AI. Check for mistakes.
@paldepind paldepind added the no-change-note-required This PR does not need a change note label Mar 24, 2026
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

SourceExpr() { this.getFile() instanceof RelevantFile }
}

predicate find(SourceExpr e) { not hasGoodType(e) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this predicate isn't clear to me, and it's unused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I used it during testing and forgot to remove it. Fixed in 8cb5380.

@paldepind paldepind merged commit 0ed037d into github:main Mar 24, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants