feat(hydro_test): Example of produce and consume from Kafka #2805
akainth015 wants to merge 1 commit
| # https://github.com/GitoxideLabs/cargo-smart-release/issues/36 | ||
| example_test = { path = "../example_test", version = "^0.0.0", optional = true } | ||
| hydro_build_utils = { path = "../hydro_build_utils", version = "^0.0.1", optional = true } | ||
| rdkafka = { version = "0.39.0", optional = true, features = ["cmake-build", "ssl-vendored"] } |
Is there an option to use aws-lc-sys here? Then we wouldn't need to install libcurl.
Shadaj found an open PR that fixes the issue, but it hasn't gotten traction. That's why we're seeing the issue with curl.h even though we're not using curl.
Lol, do we know anyone at Confluent we can ping to merge this PR?
| "clippy", | ||
| # https://github.com/dtolnay/trybuild?tab=readme-ov-file#troubleshooting | ||
| "rust-src", | ||
| "llvm-tools", |
Linker tools for the statically-linked libraries that get pulled in by my changes
- Add hydro_test/src/kafka/mod.rs with kafka_producer, kafka_consumer, dest_kafka, and setup_topic helpers following the SQS PR hydro-project#2746 pattern
- Complete hydro_test/examples/kafka.rs: leader produces 1M financial transactions, consumer cluster computes per-account balances
- Add 'kafka' feature flag gating rdkafka and futures-util as optional deps
- Add llvm-tools component to rust-toolchain.toml for rust-lld
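The per-account balance computation named in the commit message can be pictured as a keyed fold over the consumed transactions. The sketch below is a standard-library-only illustration, not the PR's Hydro dataflow: the `Transaction` struct, its field names, and the `balances` function are assumptions made for the example, since the actual message format is not shown here.

```rust
use std::collections::HashMap;

/// Hypothetical transaction shape; the PR's actual message format is not shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Transaction {
    pub account_id: u64,
    /// Signed amount in cents: positive for deposits, negative for withdrawals.
    pub amount_cents: i64,
}

/// Folds a sequence of transactions into one running balance per account.
pub fn balances<I>(transactions: I) -> HashMap<u64, i64>
where
    I: IntoIterator<Item = Transaction>,
{
    let mut totals = HashMap::new();
    for tx in transactions {
        // Accounts start at zero the first time they are seen.
        *totals.entry(tx.account_id).or_insert(0) += tx.amount_cents;
    }
    totals
}
```

Because addition is commutative and associative, a fold like this gives the same totals regardless of the order in which partitions deliver messages, which is presumably what makes it a convenient workload for a consumer cluster.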
| - name: Install nightly Rust channel | ||
| run: rustup toolchain add nightly | ||
| - name: Install libcurl dev (needed by rdkafka-sys) |
| - name: Install libcurl dev (needed by rdkafka-sys) | |
| - name: Install libcurl dev (needed by rdkafka-sys) (REMOVE AFTER https://github.com/confluentinc/librdkafka/pull/5230) |
| "metadata_options": { | ||
| "http_tokens": "required", | ||
| "http_endpoint": "enabled" | ||
| }, |
| "metadata_options": { | |
| "http_tokens": "required", | |
| "http_endpoint": "enabled" | |
| }, | |
| // Required for Kafka (?) | |
| "metadata_options": { | |
| "http_tokens": "required", | |
| "http_endpoint": "enabled" | |
| }, |
No, this configures the instance to use IMDSv2. Every EC2 instance should be using IMDSv2 now.
| std::thread::spawn(move || { | ||
| handle.block_on(setup_topic(&brokers, &topic, num_partitions, &security_protocol)); | ||
| }) | ||
| .join() |
| #[arg(long, default_value = "m7i.large")] | ||
| aws_instance_type: String, | ||
| /// AWS AMI ID (Amazon Linux 2) |
We should probably use AL2023, since AL2 is EOL in June. It would also be nice if we could use the SSM parameter to resolve the default AMI instead of pinning to an image: https://docs.aws.amazon.com/linux/al2023/ug/ec2.html#launch-via-aws-cli
| /// Run mode: "produce" (produce only, prints topic name), "consume" (consume only, requires --topic), or "both" (default) | ||
| #[arg(long, default_value = "both")] | ||
| mode: String, |
| brokers: &'a str, | ||
| group_id: &'a str, | ||
| topic: &'a str, | ||
| security_protocol: &'a str, |
From a consumer's point of view, I'd think providing a ClientConfig would be more flexible here.
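To make the flexibility argument concrete, here is a minimal sketch of the idea. rdkafka's `ClientConfig` is essentially a string key/value map of librdkafka properties, so this example uses a plain `HashMap` stand-in to stay runnable without the crate. The names `ConfigMap`, `default_consumer_config`, and `with_overrides` are hypothetical and do not appear in the PR; the property keys are standard librdkafka settings.

```rust
use std::collections::HashMap;

/// Stand-in for `rdkafka::ClientConfig`, modeled as a plain key/value map so
/// the sketch compiles without the crate. Hypothetical, not the PR's type.
pub type ConfigMap = HashMap<String, String>;

/// Builds the defaults that the fixed `brokers` / `group_id` /
/// `security_protocol` fields would otherwise hard-code.
pub fn default_consumer_config(
    brokers: &str,
    group_id: &str,
    security_protocol: &str,
) -> ConfigMap {
    let mut config = ConfigMap::new();
    config.insert("bootstrap.servers".to_owned(), brokers.to_owned());
    config.insert("group.id".to_owned(), group_id.to_owned());
    config.insert("security.protocol".to_owned(), security_protocol.to_owned());
    config
}

/// Applies caller-supplied settings on top of the defaults; later values win.
pub fn with_overrides(mut base: ConfigMap, overrides: &[(&str, &str)]) -> ConfigMap {
    for (key, value) in overrides {
        base.insert((*key).to_owned(), (*value).to_owned());
    }
    base
}
```

With this shape, a caller who needs a setting the helper never anticipated (for example `auto.offset.reset`) can pass it through without a change to the helper's signature. The topic stays a separate argument because it is a subscription target rather than a client property.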
| )) => { | ||
| producer.poll(std::time::Duration::from_millis(100)); | ||
| } | ||
| Err((e, _)) => panic!("Failed to send message to Kafka: {}", e), |
I'd expect that we don't want to panic in lib code. I'd like to talk about how we surface errors operationally.
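One possible direction, sketched under assumptions: keep the poll-and-retry behavior for a full queue, but return every other failure to the caller as a `Result` instead of panicking. The `SendError` enum and `send_with_retry` function below are hypothetical stand-ins written against the standard library only; they are not rdkafka types and not the PR's code. The `on_queue_full` callback marks the place where the existing `producer.poll(...)` call would go.

```rust
use std::fmt;

/// Hypothetical error type standing in for the producer's real error.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SendError {
    /// The local producer queue is full; polling and retrying may help.
    QueueFull,
    /// Any other failure, carried as a message for the caller to handle.
    Fatal(String),
}

impl fmt::Display for SendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SendError::QueueFull => write!(f, "producer queue is full"),
            SendError::Fatal(msg) => write!(f, "failed to send message: {msg}"),
        }
    }
}

impl std::error::Error for SendError {}

/// Calls `try_send` until it succeeds, fails with a non-retryable error, or
/// `max_retries` queue-full retries are used up. Returns the retry count.
pub fn send_with_retry<S, P>(
    mut try_send: S,
    mut on_queue_full: P,
    max_retries: usize,
) -> Result<usize, SendError>
where
    S: FnMut() -> Result<(), SendError>,
    P: FnMut(),
{
    let mut retries = 0;
    loop {
        match try_send() {
            Ok(()) => return Ok(retries),
            Err(SendError::QueueFull) if retries < max_retries => {
                on_queue_full(); // e.g. poll the producer to drain deliveries
                retries += 1;
            }
            // Fatal errors and exhausted retries go back to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

Returning the error leaves the operational decision (log, count in a metric, drop, or abort) to the code that embeds the helper, which is the discussion the comment asks for.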
| sent.for_each(q!({ | ||
| let count = std::cell::Cell::new(0usize); | ||
| move |producer| { | ||
| let c = count.get() + 1; | ||
| count.set(c); | ||
| if c >= num_messages { | ||
| rdkafka::producer::Producer::flush( | ||
| &*producer, | ||
| std::time::Duration::from_secs(30), | ||
| ) | ||
| .expect("Failed to flush producer"); | ||
| println!("PRODUCE_DONE {}", c); | ||
| } | ||
| } |
Not sure if rdkafka exposes async endpoints, but it would be best to put this in a dest_sink; the Sink could handle flushing directly.
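The shape of that suggestion can be sketched without any async machinery: a wrapper owns the producer, forwards each message, and flushes exactly once when it is closed, so no message counter is needed to decide when to flush. Everything below is a hypothetical, synchronous, standard-library-only illustration. The `Producer` trait, `FlushOnClose`, and `RecordingProducer` are invented for the example and are not rdkafka or Hydro APIs. A real version would presumably implement the `futures` crate's `Sink` trait, whose `poll_flush` and `poll_close` methods play the role of `close` here.

```rust
/// Hypothetical minimal producer interface, standing in for a Kafka producer.
pub trait Producer {
    fn send(&mut self, payload: &[u8]);
    fn flush(&mut self);
}

/// Wrapper that forwards messages and flushes once when closed.
pub struct FlushOnClose<P: Producer> {
    producer: P,
    sent: usize,
}

impl<P: Producer> FlushOnClose<P> {
    pub fn new(producer: P) -> Self {
        Self { producer, sent: 0 }
    }

    /// Forwards one message to the underlying producer.
    pub fn send(&mut self, payload: &[u8]) {
        self.producer.send(payload);
        self.sent += 1;
    }

    /// Flushes pending messages, then hands back the producer and the count.
    pub fn close(mut self) -> (P, usize) {
        self.producer.flush();
        (self.producer, self.sent)
    }
}

/// In-memory test double that records what it was asked to do.
#[derive(Default)]
pub struct RecordingProducer {
    pub messages: Vec<Vec<u8>>,
    pub flushes: usize,
}

impl Producer for RecordingProducer {
    fn send(&mut self, payload: &[u8]) {
        self.messages.push(payload.to_vec());
    }

    fn flush(&mut self) {
        self.flushes += 1;
    }
}
```

Tying the flush to the end of the stream rather than to `c >= num_messages` also covers the case where the stream ends early.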
This PR adds a Kafka example that can produce, consume, or do both, and prints basic throughput information. The performance is not perfect yet, but we'll make changes to the Hydro API that let us do even better.