Lazily compile regexes to prevent expensive compilation at import.#95
rtibbles wants to merge 1 commit into ua-parser:master from
So while I get the sentiment, the flip side is also not great. The primary use case for this library was a web service, and there I'd personally prefer to take the hit at import time rather than lazily over time: I'd rather the server take a bit longer to start up and begin serving requests than have the first N requests take longer to serve. Ultimately, I don't think there's a way to win here and cover all concerns. If you'd rather not take the hit at import time, I'd propose wrapping the import in your function so it happens on demand. Something like:

```python
def do_the_thing(user_agent_string):
    # Deferred import: ua_parser (and its regex compilation) is loaded on the
    # first call instead of at application startup.
    from ua_parser import user_agent_parser as uap
    return uap.Parse(user_agent_string)
```

This way you control when the module gets imported and choose to take the hit when needed. I think the only reasonable alternative is to control this behavior with an environment variable that is checked when the module is loaded, and to toggle the behavior based on that. Otherwise, this change is going to have a negative effect on folks who prefer to take the hit on import. Thoughts?
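A minimal sketch of how the environment-variable toggle suggested above could work, assuming regexes are wrapped in a small holder object. The variable name `UA_PARSER_LAZY_REGEXES` and the `LazyPattern` class are hypothetical; ua-parser defines neither.

```python
import os
import re

# Hypothetical flag, read once when the module is loaded.
LAZY = os.environ.get("UA_PARSER_LAZY_REGEXES") == "1"


class LazyPattern:
    """Holds a regex source; compiles it immediately (eager) or on first use."""

    def __init__(self, source, eager):
        self.source = source
        self._compiled = re.compile(source) if eager else None

    @property
    def compiled(self):
        # In lazy mode, the first access pays the compilation cost.
        if self._compiled is None:
            self._compiled = re.compile(self.source)
        return self._compiled

    def search(self, text):
        return self.compiled.search(text)


# Placeholder patterns; eager unless the environment opts into laziness.
PATTERNS = [LazyPattern(src, eager=not LAZY) for src in (r"Firefox/(\d+)", r"Chrome/(\d+)")]
```

Because the flag is evaluated at import, a web service keeps the current eager behavior by default, while a short-lived script can set the variable to defer the cost.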
I'll close this for now, for the reasons @mattrobenolt explained and because providing a choice here would likely require a fair amount of design work. So would trying to find a better way to organise or look things up (which might also be an option). I'd be interested in your use case, though, @rtibbles: what is the situation in which you are unhappy paying the price up front but happy to pay pretty much the same one later on during the run?
With that said, @rtibbles, I'd be interested in having more information about your system and possibly the size of your … (on my machine, with a 7860-line …, the import takes under 0.2 s on Python 3.6 through 3.10):

```
> time python3.6 -c 'import ua_parser.user_agent_parser'
python3.6 -c 'import ua_parser.user_agent_parser' 0.13s user 0.04s system 91% cpu 0.180 total
> time python3.7 -c 'import ua_parser.user_agent_parser'
python3.7 -c 'import ua_parser.user_agent_parser' 0.12s user 0.04s system 92% cpu 0.170 total
> time python3.8 -c 'import ua_parser.user_agent_parser'
python3.8 -c 'import ua_parser.user_agent_parser' 0.11s user 0.04s system 91% cpu 0.165 total
> time python3.9 -c 'import ua_parser.user_agent_parser'
python3.9 -c 'import ua_parser.user_agent_parser' 0.12s user 0.04s system 91% cpu 0.167 total
> time python3.10 -c 'import ua_parser.user_agent_parser'
python3.10 -c 'import ua_parser.user_agent_parser' 0.11s user 0.04s system 92% cpu 0.163 total
```

The import times you're referring to are an order of magnitude slower, which seems odd. Is it some sort of strangely slow SoC?
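To check how much of a slow import is actually regex compilation on a given machine, one could time the compilation step on its own. A minimal sketch, assuming you have the regex source strings at hand; `time_compilation` is a hypothetical helper and the patterns are placeholders, not the ones ua-parser ships:

```python
import re
import time


def time_compilation(patterns):
    """Return (seconds spent compiling, compiled patterns) with the re cache cleared."""
    re.purge()  # drop the re module's internal cache so each compile does real work
    start = time.perf_counter()
    compiled = [re.compile(p) for p in patterns]
    elapsed = time.perf_counter() - start
    return elapsed, compiled


# Placeholder patterns; substitute the regex strings actually loaded by the parser.
SAMPLE_PATTERNS = [r"(Firefox)/(\d+)\.(\d+)", r"(Chrome)/(\d+)\.(\d+)", r"(Edg)/(\d+)"]

elapsed, compiled = time_compilation(SAMPLE_PATTERNS)
print(f"compiled {len(compiled)} patterns in {elapsed:.6f} s")
```

For a per-module breakdown of the whole import, Python 3.7+ also supports `python -X importtime -c 'import ua_parser.user_agent_parser'`.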
Currently, all regexes are compiled at module import, so every possible regex is compiled up front whether or not it is ever used.
In local testing on my dev machine, this makes importing `ua_parser` take ~2.5 seconds.

Deferring regex compilation has two benefits:
One possible future optimization based on this would be to order the parsers by browser/OS/device prevalence so that common lookups require less compilation. However, since the ordering is determined in uap_core and is sensitive to the needs of potentially conflicting regexes, I did not attempt to implement that here.
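A minimal sketch of lazy compilation, assuming a first-match-wins lookup (which is why the parser ordering matters). `LazyParser` and `parse_family` are hypothetical names for illustration, not ua-parser's classes: each regex is compiled only when a lookup actually reaches it, so parsers after the first match stay uncompiled.

```python
import re


class LazyParser:
    """Holds one regex source and compiles it only when first asked to match."""

    def __init__(self, pattern, family):
        self.pattern = pattern
        self.family = family
        self._regex = None

    @property
    def is_compiled(self):
        return self._regex is not None

    def match(self, ua):
        if self._regex is None:
            self._regex = re.compile(self.pattern)  # compilation cost paid here, once
        return self.family if self._regex.search(ua) else None


def parse_family(parsers, ua):
    # First match wins, so parsers after the matching one are never compiled.
    for parser in parsers:
        family = parser.match(ua)
        if family is not None:
            return family
    return "Other"


parsers = [
    LazyParser(r"Opera", "Opera"),
    LazyParser(r"Firefox/", "Firefox"),
    LazyParser(r"Chrome/", "Chrome"),
]
firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
print(parse_family(parsers, firefox_ua), sum(p.is_compiled for p in parsers))
```

Here the Firefox lookup compiles only the first two patterns; an unmatched user agent forces all of them to compile. This illustrates why moving common browsers earlier would reduce compilation, and also why reordering is risky when earlier regexes are meant to shadow later ones.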