ROX-13714: Implement a new rate limited logger, and use it to log failed auth messages.#3984
ROX-13714: Implement a new rate limited logger, and use it to log failed auth messages.#3984
Conversation
1cd26dc to
7d2bea6
Compare
|
Images are ready for the commit at 5106014. To use with deploy scripts, first |
b692755 to
bb0f3cd
Compare
|
@rukletsov @mtodor @theencee Ping |
pkg/logging/rate_limited_logger.go
Outdated
| } | ||
|
|
||
| // SetLimiter sets how many logLines are emitted in a given interval for a specific identifier. | ||
| func (rl *RateLimitedLogger) SetLimiter(limiter string, logLines float64, interval time.Duration, burst int) { |
There was a problem hiding this comment.
I understand that you try to save on type conversion but what does it mean passing 3.14 here? Make it int if you want "num lines per interval" interface or replace it with a single linesPerSecond float64 if you want float.
mtodor
left a comment
There was a problem hiding this comment.
I have looked at the implementation, and I would suggest some bigger changes.
The main concern for me is SetLimiter.
- we need to call it before every log line. Otherwise, logs could be ignored
- the first call of
SetLimiterdefines the limit for all logs afterward
I would suggest defining rate limiting when an instance of NewRateLimitLogger. That allows us to create logs with different rates and use them. i.e.
logRateLimit5min = logging.NewRateLimitLogger(logging.LoggerForModule(), 100, 1, 5*time.Minute, 1)
logRateLimit10sec = logging.NewRateLimitLogger(logging.LoggerForModule(), 100, 1, 10*time.Second, 1)
And then we can call log.
logRateLimit5min.WarnL(ri.Hostname, "Token validation failed for hostname %v: %v", ri.Hostname, err)
This would require a change in allowLog. We should create a limiter if it does not exist. i.e.
func (rl *RateLimitedLogger) allowLog(limiter string) bool {
if lim, ok := rl.rateLimiters.Get(limiter); ok {
return lim.(*rate.Limiter).Allow()
}
rl.rateLimiters.Add(limiter, rate.NewLimiter(rl.baseLimit, rl.burst))
return true
}
I assume that rl.baseLimit and rl.burst are created with a constructor.
@clickboo What do you think about these suggestions?
All good suggestions, but overkill IMO given rate limiting at ingres. Implemented, nonetheless. |
0e8f49c to
01403c1
Compare
|
@clickboo: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
mtodor
left a comment
There was a problem hiding this comment.
I have looked at the solution. It looks good. I like what you did here.
I would not completely rely on ingress. Ingres is something that we will use in ACSCS. And with that in place, we don't even need this functionality. This PR is really relevant for on-prem deployments too. For on-prem deployments, we don't know if the customer's infrastructure has such kind of protection or not. We know that without rate-limiting logs, there is a problem that central crashes because of accumulated logs. As I remember from our meeting, that's one reason we are adding this functionality in central.
There is one thing that I'm interested in, and I don't know if that is relevant for this implementation because it's not described in the ticket, so it's hard to know.
Hypothetical case. Let's say we have code like this:
log.WarnL(ri.Hostname, "Record 1")
// several lines of code
log.WarnL(ri.Hostname, "Record 2")
Our current implementation will only output the following:
Record 1
2nd log line will never be output, regardless of execution reaching it. Could that be a problem that we have partial context and we are losing important information?
Besides that, we can get this in. If we notice that additional functionality is required or that we can improve something, we can always add it.
…led auth messages.
01403c1 to
5106014
Compare
Description
Checklist
Unit test and regression tests addedEvaluated and added CHANGELOG entry if requiredDetermined and documented upgrade stepsDocumented user facing changes (create PR based on [openshift/openshift-docs](https://github.com/openshift/openshift-docs) and merge into rhacs-docs)If any of these don't apply, please comment below.
Testing Performed
Manual using curl on cluster.