Skip to content

ROX-13714: Implement a new rate limited logger, and use it to log failed auth messages.#3984

Merged
clickboo merged 1 commit intomasterfrom
boo-rate-limited-logger
Jan 13, 2023
Merged

ROX-13714: Implement a new rate limited logger, and use it to log failed auth messages.#3984
clickboo merged 1 commit intomasterfrom
boo-rate-limited-logger

Conversation

@clickboo
Copy link
Contributor

@clickboo clickboo commented Dec 1, 2022

Description

  1. Added a new rate limited logger
  2. Added support to rate limit failed auth messages by hostname
  3. Added hostname to failed auth log messages

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required
  • Determined and documented upgrade steps
  • Documented user facing changes (create PR based on [openshift/openshift-docs](https://github.com/openshift/openshift-docs) and merge into rhacs-docs)

If any of these don't apply, please comment below.

Testing Performed

Manual using curl on cluster.

@clickboo clickboo force-pushed the boo-rate-limited-logger branch from 1cd26dc to 7d2bea6 Compare December 1, 2022 20:17
@ghost
Copy link

ghost commented Dec 1, 2022

Images are ready for the commit at 5106014.

To use with deploy scripts, first export MAIN_IMAGE_TAG=3.73.x-352-g5106014c4a.

@clickboo clickboo force-pushed the boo-rate-limited-logger branch 7 times, most recently from b692755 to bb0f3cd Compare December 2, 2022 22:07
@clickboo clickboo changed the title ROX-13713: Implement a new rate limited logger, and use it to log failed auth messages. ROX-13714: Implement a new rate limited logger, and use it to log failed auth messages. Dec 5, 2022
@clickboo
Copy link
Contributor Author

clickboo commented Dec 6, 2022

@rukletsov @mtodor @theencee Ping

}

// SetLimiter sets how many logLines are emitted in a given interval for a specific identifier.
func (rl *RateLimitedLogger) SetLimiter(limiter string, logLines float64, interval time.Duration, burst int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that you try to save on type conversion but what does it mean passing 3.14 here? Make it int if you want "num lines per interval" interface or replace it with a single linesPerSecond float64 if you want float.

Copy link
Contributor

@mtodor mtodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked at the implementation, and I would suggest some bigger changes.

The main concern for me is SetLimiter.

  • we need to call it before every log line. Otherwise, logs could be ignored
  • the first call of SetLimiter defines the limit for all logs afterward

I would suggest defining rate limiting when an instance of NewRateLimitLogger. That allows us to create logs with different rates and use them. i.e.

logRateLimit5min = logging.NewRateLimitLogger(logging.LoggerForModule(), 100, 1, 5*time.Minute, 1)
logRateLimit10sec = logging.NewRateLimitLogger(logging.LoggerForModule(), 100, 1, 10*time.Second, 1)

And then we can call log.

logRateLimit5min.WarnL(ri.Hostname, "Token validation failed for hostname %v: %v", ri.Hostname, err)

This would require a change in allowLog. We should create a limiter if it does not exist. i.e.

func (rl *RateLimitedLogger) allowLog(limiter string) bool {
   if lim, ok := rl.rateLimiters.Get(limiter); ok {
      return lim.(*rate.Limiter).Allow()
   }

   rl.rateLimiters.Add(limiter, rate.NewLimiter(rl.baseLimit, rl.burst))

   return true
}

I assume that rl.baseLimit and rl.burst are created with a constructor.

@clickboo What do you think about these suggestions?

@clickboo
Copy link
Contributor Author

I have looked at the implementation, and I would suggest some bigger changes.

The main concern for me is SetLimiter.

  • we need to call it before every log line. Otherwise, logs could be ignored
  • the first call of SetLimiter defines the limit for all logs afterward

I would suggest defining rate limiting when an instance of NewRateLimitLogger. That allows us to create logs with different rates and use them. i.e.

logRateLimit5min = logging.NewRateLimitLogger(logging.LoggerForModule(), 100, 1, 5*time.Minute, 1)
logRateLimit10sec = logging.NewRateLimitLogger(logging.LoggerForModule(), 100, 1, 10*time.Second, 1)

And then we can call log.

logRateLimit5min.WarnL(ri.Hostname, "Token validation failed for hostname %v: %v", ri.Hostname, err)

This would require a change in allowLog. We should create a limiter if it does not exist. i.e.

func (rl *RateLimitedLogger) allowLog(limiter string) bool {
   if lim, ok := rl.rateLimiters.Get(limiter); ok {
      return lim.(*rate.Limiter).Allow()
   }

   rl.rateLimiters.Add(limiter, rate.NewLimiter(rl.baseLimit, rl.burst))

   return true
}

I assume that rl.baseLimit and rl.burst are created with a constructor.

@clickboo What do you think about these suggestions?

All good suggestions, but overkill IMO given rate limiting at ingres. Implemented, nonetheless.

@clickboo clickboo force-pushed the boo-rate-limited-logger branch from 0e8f49c to 01403c1 Compare December 13, 2022 04:14
@openshift-ci
Copy link

openshift-ci bot commented Dec 13, 2022

@clickboo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-ui-e2e-tests ef25080 link false /test gke-ui-e2e-tests
ci/prow/gke-postgres-ui-e2e-tests ef25080 link false /test gke-postgres-ui-e2e-tests
ci/prow/gke-postgres-nongroovy-e2e-tests 01403c1 link false /test gke-postgres-nongroovy-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@clickboo clickboo requested a review from mtodor December 15, 2022 22:03
Copy link
Contributor

@mtodor mtodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked at the solution. It looks good. I like what you did here.

I would not completely rely on ingress. Ingres is something that we will use in ACSCS. And with that in place, we don't even need this functionality. This PR is really relevant for on-prem deployments too. For on-prem deployments, we don't know if the customer's infrastructure has such kind of protection or not. We know that without rate-limiting logs, there is a problem that central crashes because of accumulated logs. As I remember from our meeting, that's one reason we are adding this functionality in central.

There is one thing that I'm interested in, and I don't know if that is relevant for this implementation because it's not described in the ticket, so it's hard to know.

Hypothetical case. Let's say we have code like this:

log.WarnL(ri.Hostname, "Record 1")
// several lines of code
log.WarnL(ri.Hostname, "Record 2")

Our current implementation will only output the following:

Record 1

2nd log line will never be output, regardless of execution reaching it. Could that be a problem that we have partial context and we are losing important information?

Besides that, we can get this in. If we notice that additional functionality is required or that we can improve something, we can always add it.

@clickboo clickboo force-pushed the boo-rate-limited-logger branch from 01403c1 to 5106014 Compare January 13, 2023 18:55
@clickboo clickboo merged commit a2943c2 into master Jan 13, 2023
@clickboo clickboo deleted the boo-rate-limited-logger branch January 13, 2023 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants