significantly speed up import time of ua_parser #171
Closed
asottile-sentry wants to merge 1 commit into ua-parser:0.x from asottile-sentry:speed-up-import-time
Contributor:
This seems to be a repeat of #57, and with similar issues: it "fixes" an "issue" which can largely already be fixed (by importing the library itself lazily), and in the process creates a new issue which can't (the cost is now paid at first use rather than on init, leading to first-use slowdowns, made worse if preforking). Lazy initialisation should be more of a possibility if #116 ever gets finished, but for the same reason as #57 I don't think it's a useful consideration for 0.x.
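The "importing the library itself lazily" workaround mentioned in this comment can be sketched as follows. This is a minimal illustration, not code from the PR: `lazy_module` and `parse_user_agent` are hypothetical names, and the call site assumes the 0.x entry point `ua_parser.user_agent_parser.Parse`.

```python
import functools
import importlib
from types import ModuleType


@functools.cache
def lazy_module(name: str) -> ModuleType:
    """Import `name` on the first call and reuse the module afterwards."""
    return importlib.import_module(name)


def parse_user_agent(ua_string: str):
    # Hypothetical call site: ua_parser is imported only when a user agent
    # actually needs parsing, so the cost moves from startup to first use.
    parser = lazy_module("ua_parser.user_agent_parser")
    return parser.Parse(ua_string)
```

The same caveat raised in the comment applies to this pattern: the first call to `parse_user_agent` pays the full import cost, and in a preforking server each worker pays it separately unless the module is imported before forking.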
Author:
rather than everyone paying the lazy tax I would hope the library itself would help with that -- understandable though
Closed
masklinn added a commit to masklinn/uap-python that referenced this pull request on Feb 13, 2024

Support is added for lazy builtin matchers (with a separately compiled file), as well as loading json or yaml files using lazy matchers.

Lazy matchers are very much a tradeoff: they improve import speed, but slow down run speed, possibly dramatically. Use them by default for the re2 parser, but not the basic parser: experimentally, on Python 3.11

- importing the package itself takes ~36ms
- importing the lazy matchers takes ~36ms (including the package, so ~0)
- importing the eager matchers takes ~97ms

The eager matchers have a significant import overhead, *however* running the bench on the sample file, the lazy matchers cause a runtime increase of 700~800ms on the basic parser bench, as that ends up instantiating *every* regex (likely due to match failures). Relatively this is not huge (~2.5%), but the tradeoff doesn't seem great, especially since the parser itself is initialized lazily.

The re2 parser does much better, only losing 20~30ms (~1%). This is likely because it only needs to compile a fraction of the regexes (156 out of 1162 as of regexes.yaml version 0.18), and possibly because it gets to avoid some of the most expensive to compile ones.

Fixes ua-parser#171, fixes ua-parser#173
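The tradeoff described in this commit message can be illustrated with a small sketch of a lazy matcher. `LazyRegex` and `first_match` are hypothetical names chosen for the example, not the classes uap-python ships; the point is only that a linear scan compiles every pattern it tries before reaching a hit, and all of them on a miss.

```python
import re
from typing import Iterable, Optional


class LazyRegex:
    """Holds a pattern string and compiles it only when first searched."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        self._compiled: Optional[re.Pattern] = None

    @property
    def compiled(self) -> bool:
        return self._compiled is not None

    def search(self, text: str) -> Optional[re.Match]:
        if self._compiled is None:
            # The compilation cost is paid here, at first use, not at import.
            self._compiled = re.compile(self.pattern)
        return self._compiled.search(text)


def first_match(matchers: Iterable[LazyRegex], text: str) -> Optional[LazyRegex]:
    """Try matchers in order; every matcher tried gets compiled."""
    for matcher in matchers:
        if matcher.search(text):
            return matcher
    return None
```

With a list of such matchers, a user agent that matches nothing forces every pattern to be compiled, which is consistent with the commit message's observation that the basic parser bench ends up instantiating every regex.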
masklinn added a commit to masklinn/uap-python that referenced this pull request on Feb 17, 2024

Support is added for lazy builtin matchers (with a separately compiled file), as well as loading json or yaml files using lazy matchers.

Lazy matchers are very much a tradeoff: they improve import speed, but slow down run speed, possibly dramatically. Use them by default for the re2 parser, but not the basic parser: experimentally, on Python 3.11

- importing the package itself takes ~36ms
- importing the lazy matchers takes ~36ms (including the package, so ~0)
- importing the eager matchers takes ~97ms

The eager matchers have a significant import overhead, *however* running the bench on the sample file, the lazy matchers cause a runtime increase of 700~800ms on the basic parser bench, as that ends up instantiating *every* regex (likely due to match failures). Relatively this is not huge (~2.5%), but the tradeoff doesn't seem great, especially since the parser itself is initialized lazily.

The re2 parser does much better, only losing 20~30ms (~1%). This is likely because it only needs to compile a fraction of the regexes (156 out of 1162 as of regexes.yaml version 0.18), and possibly because it gets to avoid some of the most expensive to compile ones.

Fixes ua-parser#171, fixes ua-parser#173
masklinn added a commit to masklinn/uap-python that referenced this pull request on Feb 18, 2024

Add lazy builtin matchers (with a separately compiled file), as well as loading json or yaml files using lazy matchers.

Lazy matchers are very much a tradeoff: they improve import speed (and memory consumption until triggered), but slow down run speed, possibly dramatically:

- importing the package itself takes ~36ms
- importing the lazy matchers takes ~36ms (including the package, so ~0) and ~70kB RSS
- importing the eager matchers takes ~97ms and ~780kB RSS
- triggering the instantiation of the lazy matchers adds ~800kB RSS
- running bench on the sample file using the lazy matcher has 700~800ms overhead compared to the eager matchers

While the lazy matchers are less costly across the board until they're used, benching the sample file causes the loading of *every* regex (likely due to matching failures), which has a 700~800ms overhead over eager matchers and increases the RSS by ~800kB (on top of the original 70kB). Thus lazy matchers are not a great default for the basic parser, though they might be a good opt-in if the user only ever uses one of the domains (especially if it's not the devices one, as that's by far the largest).

With the re2 parser however, only 156 of the 1162 regexes get evaluated, leading to a minor CPU overhead of 20~30ms (1% of bench time) and a more reasonable memory overhead. Thus use the lazy matcher for the re2 parser.

On the more net-negative but relatively minor side of things, the pregenerated lazy matchers file adds 120k to the on-disk requirements of the library, and ~25k to the wheel archive. This is also what the _regexes and _matchers precompiled files do. pyc files seem to be even bigger (~130k) so the tradeoff is dubious even if they are slightly faster.

Fixes ua-parser#171, fixes ua-parser#173
masklinn added a commit that referenced this pull request on Feb 18, 2024

Add lazy builtin matchers (with a separately compiled file), as well as loading json or yaml files using lazy matchers.

Lazy matchers are very much a tradeoff: they improve import speed (and memory consumption until triggered), but slow down run speed, possibly dramatically:

- importing the package itself takes ~36ms
- importing the lazy matchers takes ~36ms (including the package, so ~0) and ~70kB RSS
- importing the eager matchers takes ~97ms and ~780kB RSS
- triggering the instantiation of the lazy matchers adds ~800kB RSS
- running bench on the sample file using the lazy matcher has 700~800ms overhead compared to the eager matchers

While the lazy matchers are less costly across the board until they're used, benching the sample file causes the loading of *every* regex (likely due to matching failures), which has a 700~800ms overhead over eager matchers and increases the RSS by ~800kB (on top of the original 70kB). Thus lazy matchers are not a great default for the basic parser, though they might be a good opt-in if the user only ever uses one of the domains (especially if it's not the devices one, as that's by far the largest).

With the re2 parser however, only 156 of the 1162 regexes get evaluated, leading to a minor CPU overhead of 20~30ms (1% of bench time) and a more reasonable memory overhead. Thus use the lazy matcher for the re2 parser.

On the more net-negative but relatively minor side of things, the pregenerated lazy matchers file adds 120k to the on-disk requirements of the library, and ~25k to the wheel archive. This is also what the _regexes and _matchers precompiled files do. pyc files seem to be even bigger (~130k) so the tradeoff is dubious even if they are slightly faster.

Fixes #171, fixes #173
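Import-time figures like the ones quoted in this commit message can be approximated by timing a fresh interpreter against an empty one. The following is a rough sketch under that assumption, not the method used for the numbers above; `startup_seconds` and `import_overhead` are hypothetical helper names.

```python
import subprocess
import sys
import time


def startup_seconds(code: str, runs: int = 5) -> float:
    """Best-of-`runs` wall time for a fresh interpreter executing `code`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_overhead(module: str, runs: int = 5) -> float:
    """Extra startup time of `import module` over an empty interpreter."""
    baseline = startup_seconds("pass", runs)
    return startup_seconds(f"import {module}", runs) - baseline
```

For example, `import_overhead("ua_parser.user_agent_parser")` would give a noisy estimate of the package's import cost in seconds. For a per-module breakdown, CPython's `-X importtime` option is usually the more precise tool.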
empty python interpreter
before
after