ROX-13714: Implement a new rate limited logger, and use it to log failed auth messages. by clickboo · Pull Request #3984 · stackrox/stackrox

clickboo · 2022-12-01T17:50:27Z

Description

Added a new rate limited logger
Added support to rate limit failed auth messages by hostname
Added hostname to failed auth log messages

Checklist

Investigated and inspected CI test results
~~Unit test and regression tests added~~
~~Evaluated and added CHANGELOG entry if required~~
~~Determined and documented upgrade steps~~
~~Documented user facing changes (create PR based on [openshift/openshift-docs](https://github.com/openshift/openshift-docs) and merge into rhacs-docs)~~

If any of these don't apply, please comment below.

Testing Performed

Manual testing using curl against a cluster.

ghost · 2022-12-01T20:41:59Z

Images are ready for the commit at 5106014.

To use with deploy scripts, first export MAIN_IMAGE_TAG=3.73.x-352-g5106014c4a.

clickboo · 2022-12-06T19:13:52Z

@rukletsov @mtodor @theencee Ping

pkg/grpc/authn/basic/extractor.go

rukletsov · 2022-12-06T20:59:24Z

pkg/logging/rate_limited_logger.go

+}
+
+// SetLimiter sets how many logLines are emitted in a given interval for a specific identifier.
+func (rl *RateLimitedLogger) SetLimiter(limiter string, logLines float64, interval time.Duration, burst int) {


I understand that you are trying to save on type conversion, but what does passing 3.14 here mean? Make it an int if you want a "num lines per interval" interface, or replace it with a single linesPerSecond float64 if you want a float.
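To make the two options in this comment concrete, here is a minimal sketch of how an integer "lines per interval" interface could be reduced to a single events-per-second value, which is how golang.org/x/time/rate (presumably the `rate` package used in this PR) expresses a limit. The helper `linesPerSecond` and the constants `fiveMinutes` and `tenSeconds` are hypothetical names introduced for illustration; they are not part of the PR.

```go
package main

import (
	"errors"
	"time"
)

// Example intervals, mirroring the ones used later in the review thread.
// These constants are hypothetical and exist only for illustration.
const (
	fiveMinutes = 5 * time.Minute
	tenSeconds  = 10 * time.Second
)

// linesPerSecond converts "lines log lines per interval" into an
// events-per-second value. Taking lines as an int rules out calls such as
// "3.14 lines per interval", which have no obvious meaning.
func linesPerSecond(lines int, interval time.Duration) (float64, error) {
	if lines <= 0 {
		return 0, errors.New("lines must be positive")
	}
	if interval <= 0 {
		return 0, errors.New("interval must be positive")
	}
	return float64(lines) / interval.Seconds(), nil
}
```

With this shape, the conversion to a float happens once, inside the helper, instead of at every call site.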

pkg/logging/rate_limited_logger.go

mtodor

I have looked at the implementation, and I would suggest some bigger changes.

The main concern for me is SetLimiter.

We need to call it before every log line; otherwise, logs could be ignored.
The first call of SetLimiter defines the limit for all logs afterward.

I would suggest defining the rate limit when creating an instance with NewRateLimitLogger. That allows us to create loggers with different rates and use them, e.g.:

logRateLimit5min = logging.NewRateLimitLogger(logging.LoggerForModule(), 100, 1, 5*time.Minute, 1)
logRateLimit10sec = logging.NewRateLimitLogger(logging.LoggerForModule(), 100, 1, 10*time.Second, 1)

And then we can call log.

logRateLimit5min.WarnL(ri.Hostname, "Token validation failed for hostname %v: %v", ri.Hostname, err)

This would require a change in allowLog. We should create a limiter if it does not exist, e.g.:

func (rl *RateLimitedLogger) allowLog(limiter string) bool {
   if lim, ok := rl.rateLimiters.Get(limiter); ok {
      return lim.(*rate.Limiter).Allow()
   }

   rl.rateLimiters.Add(limiter, rate.NewLimiter(rl.baseLimit, rl.burst))

   return true
}

I assume that rl.baseLimit and rl.burst are set by the constructor.

@clickboo What do you think about these suggestions?
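The suggestion above (fix the rate and burst at construction time, and create a per-identifier limiter on first use) can be sketched with the standard library only. This is an illustration under stated assumptions, not the PR's implementation: `keyedLimiter`, `newKeyedLimiter`, and `allow` are hypothetical names, a small hand-written token bucket stands in for `rate.Limiter`, and a plain map stands in for `rl.rateLimiters`, so unlike a size-bounded cache it grows with the number of distinct keys.

```go
package main

import (
	"math"
	"sync"
	"time"
)

// bucket is a minimal token bucket for a single key.
type bucket struct {
	tokens float64
	last   time.Time
}

// keyedLimiter holds one bucket per key. The rate and burst are fixed when
// the limiter is constructed, so callers never configure them per log line.
type keyedLimiter struct {
	mu        sync.Mutex
	perSecond float64 // tokens added per second
	burst     float64 // maximum number of stored tokens
	buckets   map[string]*bucket
}

func newKeyedLimiter(perSecond float64, burst int) *keyedLimiter {
	return &keyedLimiter{
		perSecond: perSecond,
		burst:     float64(burst),
		buckets:   make(map[string]*bucket),
	}
}

// allow reports whether an event for key may proceed. A bucket is created
// on first use, and the lookup and creation happen under one lock so two
// goroutines cannot both create a bucket for the same key.
func (kl *keyedLimiter) allow(key string) bool {
	kl.mu.Lock()
	defer kl.mu.Unlock()

	now := time.Now()
	b, ok := kl.buckets[key]
	if !ok {
		b = &bucket{tokens: kl.burst, last: now}
		kl.buckets[key] = b
	}
	// Refill according to elapsed time, capped at the burst size.
	elapsed := now.Sub(b.last).Seconds()
	b.tokens = math.Min(kl.burst, b.tokens+elapsed*kl.perSecond)
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```

One difference from the allowLog suggestion is worth noting: in this sketch the first call for a key consumes a token, whereas the suggested version returns true for the first call without consulting the newly created limiter.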

pkg/grpc/authn/tokenbased/extractor.go

pkg/logging/rate_limited_logger.go

clickboo · 2022-12-13T02:36:16Z

All good suggestions, but overkill IMO given rate limiting at ingress. Implemented, nonetheless.

openshift-ci · 2022-12-13T05:35:52Z

@clickboo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/gke-ui-e2e-tests | `ef25080` | link | false | `/test gke-ui-e2e-tests` |
| ci/prow/gke-postgres-ui-e2e-tests | `ef25080` | link | false | `/test gke-postgres-ui-e2e-tests` |
| ci/prow/gke-postgres-nongroovy-e2e-tests | `01403c1` | link | false | `/test gke-postgres-nongroovy-e2e-tests` |

mtodor

I have looked at the solution. It looks good. I like what you did here.

I would not rely completely on ingress. Ingress is something that we will use in ACSCS, and with that in place, we don't even need this functionality. However, this PR is also relevant for on-prem deployments, where we don't know whether the customer's infrastructure has this kind of protection. We know that without rate-limited logs, central can crash because of accumulated logs. As I remember from our meeting, that's one reason we are adding this functionality to central.

There is one thing that I'm interested in, and I don't know if that is relevant for this implementation because it's not described in the ticket, so it's hard to know.

Hypothetical case. Let's say we have code like this:

log.WarnL(ri.Hostname, "Record 1")
// several lines of code
log.WarnL(ri.Hostname, "Record 2")

Our current implementation will only output the following:

Record 1

The 2nd log line will never be output, even though execution reaches it. Could it be a problem that we have only partial context and are losing important information?

Besides that, we can get this in. If we notice that additional functionality is required or that we can improve something, we can always add it.
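The hypothetical case above can be reproduced with a small sketch. It assumes a limiter that allows one line per key per window, which is a simplification of whatever the merged logger does; `windowLimiter`, `emitted`, `sharedKeyDemo`, `perMessageKeyDemo`, and `demoWindow` are all hypothetical names. The second demo shows one possible mitigation, keying by hostname plus message, which is not something this PR implements and which multiplies the number of limiters per host.

```go
package main

import (
	"sync"
	"time"
)

// demoWindow is an arbitrary window chosen for illustration.
const demoWindow = time.Hour

// windowLimiter allows one event per key per window.
type windowLimiter struct {
	mu     sync.Mutex
	window time.Duration
	last   map[string]time.Time
}

func newWindowLimiter(window time.Duration) *windowLimiter {
	return &windowLimiter{window: window, last: make(map[string]time.Time)}
}

func (l *windowLimiter) allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	if t, ok := l.last[key]; ok && now.Sub(t) < l.window {
		return false
	}
	l.last[key] = now
	return true
}

// emitted returns the messages that pass the limiter when keyFor derives
// the limiter key for each message.
func emitted(l *windowLimiter, keyFor func(msg string) string, msgs ...string) []string {
	var out []string
	for _, m := range msgs {
		if l.allow(keyFor(m)) {
			out = append(out, m)
		}
	}
	return out
}

// sharedKeyDemo keys both records by hostname only, as in the case above.
func sharedKeyDemo(host string) []string {
	byHost := func(string) string { return host }
	return emitted(newWindowLimiter(demoWindow), byHost, "Record 1", "Record 2")
}

// perMessageKeyDemo keys each record by hostname and message text.
func perMessageKeyDemo(host string) []string {
	byHostAndMsg := func(msg string) string { return host + "|" + msg }
	return emitted(newWindowLimiter(demoWindow), byHostAndMsg, "Record 1", "Record 2")
}
```

With a shared key only "Record 1" passes; with the composite key both records pass, at the cost of a higher total log volume per host.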

github-actions bot added the area/ui label Dec 1, 2022

clickboo force-pushed the boo-rate-limited-logger branch from 1cd26dc to 7d2bea6 Compare December 1, 2022 20:17

clickboo force-pushed the boo-rate-limited-logger branch 7 times, most recently from b692755 to bb0f3cd Compare December 2, 2022 22:07

clickboo requested review from mtodor, rukletsov and theencee December 5, 2022 01:50

clickboo changed the title ~~ROX-13713: Implement a new rate limited logger, and use it to log failed auth messages.~~ ROX-13714: Implement a new rate limited logger, and use it to log failed auth messages. Dec 5, 2022

rukletsov reviewed Dec 6, 2022

mtodor reviewed Dec 12, 2022

clickboo force-pushed the boo-rate-limited-logger branch from 0e8f49c to 01403c1 Compare December 13, 2022 04:14

clickboo requested a review from mtodor December 15, 2022 22:03

mtodor approved these changes Dec 19, 2022

ROX-13713: Implement a new rate limited logger, and use it to log fai…

5106014

…led auth messages.

clickboo force-pushed the boo-rate-limited-logger branch from 01403c1 to 5106014 Compare January 13, 2023 18:55

clickboo merged commit a2943c2 into master Jan 13, 2023

clickboo deleted the boo-rate-limited-logger branch January 13, 2023 19:55
