Show HN: We built a ClickHouse-based logging service (github.com/highlight)
192 points by vadman97 on April 21, 2023 | 110 comments
Hey HN! I'm one of the co-founders of highlight.io, an open-source monitoring tool.

Today we're sharing a ClickHouse-based logging solution we've been working on. We want to show how we built it and how you can try it out and give feedback. Since we started working on highlight.io, we've been hyper-focused on "cohesion": ensuring that when you install your monitoring stack, all of the resources in that stack (user interactions, requests, traces, logs, etc.) are connected in a consumable way. We've written more about our philosophy on this here [1].

We started building towards this by connecting your client-side app and your server-side exceptions with session replay and exception monitoring; i.e. if an error happened in a server-side app, we would make it easy (with session replay) to trace all the steps that a user took leading up to it.

Especially for larger companies using highlight.io, the request to tie in logs came up repeatedly, and we wanted to build this with the same philosophy in mind. Now, you'll see client-side and server-side logs all in one place, brought together in the context of a user session, as well as logs in the context of an error.

Like the rest of our stack, this project is written in Go and TypeScript, and for log ingestion/querying, we're using ClickHouse [2]. Before deciding on ClickHouse, we were planning to use OpenSearch (an AWS fork of Elasticsearch [3]) for this part of our product, but as our traffic increased, we encountered quite a few pains with OpenSearch's write throughput. After evaluating a few options, we eventually landed on ClickHouse (their cloud offering was icing on the cake), which has also proven to be much more cost-effective so far.

Building with ClickHouse from scratch has been an exciting journey. Eric (the mastermind behind this project) wrote a blog post [4] on a handful of ClickHouse learnings we've gathered since starting the project.

For those wanting to try out the product locally, you can run the following commands [5]:

  git clone --recurse-submodules https://github.com/highlight/highlight
  cd highlight/docker
  ./run-hobby.sh

To send logs to highlight, you can use your own OpenTelemetry implementation [6] or use our SDKs [7], which provide lightweight wrappers over OTEL.
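
For the curious, here's a minimal sketch of the bring-your-own-OTEL route in Go. The endpoint and attribute names are placeholders (the default local OTLP/HTTP port), not our real ingest address; see [6] for the actual setup.

  // Rough sketch only: a standard OpenTelemetry Go trace pipeline pointed at an
  // OTLP endpoint, with a log-like event attached to a span.
  package main

  import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    "go.opentelemetry.io/otel/trace"
  )

  func main() {
    ctx := context.Background()

    exp, err := otlptracehttp.New(ctx,
      otlptracehttp.WithEndpoint("localhost:4318"), // placeholder collector address
      otlptracehttp.WithInsecure(),
    )
    if err != nil {
      log.Fatal(err)
    }

    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
    defer func() { _ = tp.Shutdown(ctx) }()
    otel.SetTracerProvider(tp)

    // Record a span and attach a structured, log-like event to it.
    _, span := tp.Tracer("example").Start(ctx, "checkout")
    span.AddEvent("order created", trace.WithAttributes(
      attribute.String("level", "info"),
      attribute.String("user_id", "42"),
    ))
    span.End()
  }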

Like the rest of highlight.io, we plan to make money from this with our hosted cloud offering. For those interested in trying out the cloud-hosted version, you can get set up at app.highlight.io.

To open the floor for feedback, we would love to get your thoughts on what we've built so far. Beyond that, what parts of a logging product do you wish you had in your current setup? And are there any notable pain points of using a hosted monitoring product? (We're toying with the idea of an enterprise deployment.) Excited to hear from everyone.

[1]: https://highlight.io/docs/general/company/product-philosphy

[2]: https://clickhouse.com

[3]: https://news.ycombinator.com/item?id=26780848

[4]: https://www.highlight.io/blog/how-we-built-logging-with-clic...

[5]: https://www.highlight.io/docs/getting-started/self-host/self...

[6]: https://www.highlight.io/docs/getting-started/backend-loggin...

[7]: https://www.highlight.io/docs/getting-started/overview#for-y...



Wow, I can't wait to try this out; it could be a Sentry killer, and doubly so given the friendly license.

Also, thank you for introducing me to air (https://github.com/highlight/highlight/blob/sdk/highlight-go...) as that also looks super handy

p.s. for Show HN historians, here is the prior thread: https://news.ycombinator.com/item?id=34897645


Why do you want to replace Sentry? It's mature and works well.

Not sure how the license makes a difference to your day-to-day development.

But I do love projects that can be self-hosted, so Sentry is very nice on that front, especially since it doesn't limit its capabilities in the self-hosted version.

Also, this one doesn't even have a PHP SDK, and the GitHub issue doesn't show much demand for it, which suggests it still has some way to go to mature, given the limited interest at this point.

Couldn't they just reuse other projects' SDKs or contribute to OpenTelemetry? Developing yet another batch of SDKs for a dozen languages seems like wasted time.

But I do hope we get a nice open-source (self-hostable) tool that can do what it claims (error reporting, tracing, logging, metrics), because there's no mature option that does all of that well at the moment.


> But I do hope we get a nice open-source (self-hostable) tool that can do what it claims (error reporting, tracing, logging, metrics), because there's no mature option that does all of that well at the moment.

Keep an eye out for Highlight then!

> Couldn't they just reuse other projects' SDKs or contribute to OpenTelemetry? Developing yet another batch of SDKs for a dozen languages seems like wasted time.

We use OpenTelemetry for all of our SDKs. All we do is thinly wrap the SDK so developers don't have to deal with OpenTelemetry internals if they don't want to.

There's a doc on it here: https://www.highlight.io/docs/general/company/open-source/co...
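
To illustrate the idea (this is not our actual SDK, just the rough shape of the pattern), a thin wrapper can be as small as this:

  // Illustrative only: a wrapper package that hides OpenTelemetry types from callers.
  // Exporter/provider wiring would live behind a single Start() call (omitted here).
  package monitor

  import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
  )

  // Log records a message on the active span so callers never import OTEL directly.
  func Log(ctx context.Context, msg string, kv ...string) {
    span := trace.SpanFromContext(ctx) // returns a no-op span if none is active
    attrs := make([]attribute.KeyValue, 0, len(kv)/2)
    for i := 0; i+1 < len(kv); i += 2 {
      attrs = append(attrs, attribute.String(kv[i], kv[i+1]))
    }
    span.AddEvent(msg, trace.WithAttributes(attrs...))
  }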


Thank you for the response and sorry for not doing the homework there.

Looking forward to seeing the project take off, since right now we have to glue together various projects to achieve what you're aiming for.


No worries, and we appreciate it.


Isn’t Sentry also open source, and also using ClickHouse in at least some parts?


Yes to the second: https://github.com/getsentry/self-hosted/blob/23.4.0/docker-... And only after the embargo is over to the first: https://github.com/getsentry/self-hosted/blob/23.4.0/LICENSE...

I also miss the "good old days" when running Sentry was like 3 containers, not the 32 of modern Sentry.


Yeah, fair point. Sentry has added quite a few features and a lot of complexity, probably for good reasons like serving large customers. But still.


FWIW, we don't really see a need for more than a handful of containers, even long term; we want to make it easy to self-host Highlight. Nonetheless, Sentry is a reputable project, and they deserve the success they've had so far.


Thanks, appreciate the kind note on the license. And yeah, air is really nice; we've been using it since the beginning. It gives us a JS-hot-reloading-like experience with a server-side app.


Did you consider VictoriaMetrics at all? It's based on the ClickHouse design. I'm considering it for a time-series historian, so a different application, but I think it could work wherever ClickHouse works too.


Haven't heard of it, unfortunately. What kind of time-series data are you thinking of storing? If you're building an application on top of it, you're better off using an OLAP DB like ClickHouse, InfluxDB, etc. to have access to lower-level constructs.


My time series come from PLCs in hydroelectric power stations. Usually about 1000 series per turbine and another 1000 for general plant or river data. So it's just a timestamp and a bool, 16- or 32-bit (u)int, or 32-bit float. I'll just use Grafana as the front end. It seems to replace industry heavyweight offerings such as OSIsoft PI.


For those who have not used VictoriaMetrics, it's a high-performance time-series database that can serve as a drop-in replacement for Prometheus. We host ClickHouse at our company but use VictoriaMetrics for long-term storage of Prometheus metrics across tenants. It's fast, solid, and super economical to run.

If you want to learn more here's a webinar I did with Roman Kavronenko at VictoriaMetrics that compares it with ClickHouse.

https://www.youtube.com/watch?v=sCrdp8hIhJM

Disclaimer: I work for Altinity.


Does it work well for storing logs? I thought it was great for metrics.


ClickHouse has been around longer and is battle-tested at Yandex. In what cases might VictoriaMetrics be a better fit?


Haven't seen VictoriaMetrics! Seems like they don't have a cloud offering, though?


They do have a "managed" offering (on AWS): https://victoriametrics.com/products/managed/.


Very cool. My mistake for overlooking that. VictoriaMetrics seems more like a metrics store? Something comparable to InfluxDB? Is that right?


Correct! It is a time-series database for storing metrics. It's open source, including the clustered version, and comes with many additional tools for alerting, collecting, and visualizing metrics. Many in the VictoriaMetrics community switched from InfluxDB to VictoriaMetrics for performance reasons. See also the following doc comparing the two: https://docs.victoriametrics.com/guides/migrate-from-influx....

Disclaimer: I'm one of the VictoriaMetrics maintainers.


“Deploy a hobby instance in one line on Linux with Docker (recommended 16 CPU cores, 32GB RAM, 256GB disk)”

Can't say I would call these specs “hobby” at all.


Apologies for the inconsistency here. Our Docker resource requirement recommendations were out of date after some recent improvements (https://github.com/highlight/highlight/pull/5074 and https://github.com/highlight/highlight/pull/4993).

Just updated this: 8GB of RAM, 4 CPUs, and 64 GB of disk space.


8GB of RAM? For a hobby? Very resource-intensive, eh?


I agree with your sentiment here. IMO they chose the wrong words for what they meant; e.g. ~0.5-1GB of memory usage is more like a hobby setup (assuming you run it on the same hardware as the services you monitor).

However, after scrolling through the GitHub page, I feel like this is not a service aimed at people (I might be completely mistaken) who either have only a small set of services to monitor (and/or understand the logs and/or have the interest to do it themselves) or keep their homelab at a relatively low financial priority (1x 8GB 2400 CL17 is €15 here).

8GB for a single service in a (home) environment is, IMO, still a lot, but I think it's a sort-of reasonable figure for what it does and what it needs to do to make that happen.


We have had folks in our Discord successfully run Highlight on a Raspberry Pi with 4GB of RAM, so our recommendation is definitely on the safe side. We're running multiple infra services in the Docker stack (Postgres, OpenSearch, ClickHouse, InfluxDB, Kafka, Redis) that we would look to consolidate in the future to help with running on leaner instances.


Totally agree with that. 8GB might be too much for the open-source product to become popular.

I would say 4GB makes more sense to me; I know how much engineering effort that requires, though. LOL


Unlikely to happen if there are still Java services running in that stack. For instance, Elasticsearch/OpenSearch is good at what it does (which is full-text search), but pumping massive amounts of logs into it is never going to be a light solution. Solutions such as Loki, which just index tags and dump the raw content into an object storage bucket, are the cheapest, and I guess ClickHouse ends up somewhere in the middle between those two, depending on how well you configure it to fit the data. Or, vice versa, how well you can configure the data to fit the technical solution.


This is one of the reasons we’re considering moving off OpenSearch.


Just about anything with only one instance is essentially hobby.


> Just about anything with only one instance is essentially hobby.

Not high availability? Sure.

However, I've seen software out there that ran as a monolith with a single deployment unit, facilitated lots of business processes and entire teams of people for continued development. Not everything necessarily needs high uptime, either. Some software can also serve particular time zones and have ample windows for scheduled maintenance, upgrades and so on.

There's probably at least a few classifications between hobby projects on one end and HA distributed systems on the other.

>> Can't say I would call these specs “hobby” at all.

With this, however, I'm inclined to agree. In my eyes, "hobby" would imply something more along the lines of: "Just give this half a CPU core and about 512 MB of RAM, maybe up to a GB of storage depending on what you'll use it for, it'll probably work well enough for a few users."

Some software that mostly fits that definition, in my experience: Nextcloud, Apache2/Nginx/Caddy, Grav, Mattermost, Gitea, Heimdall, YOURLS, PrivateBin, phpBB, Uptime Kuma, Zabbix, PostgreSQL, MySQL/MariaDB, Redis, RabbitMQ, Docker Swarm and plenty others.

Some software that needs more resources: SonarQube, PeerTube (for encoding), OpenProject (Ruby app), Sonatype Nexus (bloated Java app, but lots of functionality), Matomo (issues with displaying historic data with low resources), BackupPC (compression of backups), K3s and other Kubernetes cluster distros, and plenty of others, too.

Not to say that it somehow makes the software worse, just that people have different expectations. Perhaps more realistic expectations on my part for hobby software should be: "You should be able to launch it with whatever spare resources your laptop has."


Hard disagree. A hobby instance should be able to be spun up in the cloud on a free or near free tier.


How do you compare against something like SigNoz (YC-backed)? You should probably add them to your competitors list, since they are also an open-source APM.


Highlight is a full-stack observability platform, so our recordings start from a user's frontend session and associate server-side errors and logs to provide debugging context. We've been building quite a lot to bolster the backend observability use case, including releasing the new logging product. We can't offer feature parity for backend traces and metrics yet, but we're planning to get there by the end of the year.


Is this just a "monitoring" platform, or does it also intend to use ClickHouse as a "log analysis" platform a la Mozilla Hindsight?

Which is to say: is there some obvious hook-point to add watcher jobs (as Clickhouse stored procedures, maybe?) to process correlations in the inputs to the system? (Where by "watcher jobs", I mean things like "create a record in a table if a user makes requests across endpoints A, B, C, in that order — with other arbitrary requests from that user in between — in a five-minute sliding window, at some point within 12 hours of the user's registration.")

Using a DB as a log-analysis system would be pretty great, if it was practicable, as you'd be able to correlate present events with events from the distant past (or with statistical aggregations of all of history up to the present), rather than having to build your correlations only from what can be buffered in memory.

But most log-analysis platforms need extremely high event-processing-job performance to scale — Mozilla Hindsight, mentioned before, is a rewrite in C+Lua of a previous system (Mozilla Heka) where the fundamental bottleneck was the Golang runtime. I'd be curious to know whether ClickHouse sprocs/triggers/etc. have been tuned to function at that sort of scale...
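
For concreteness, here's roughly what I mean, expressed as a periodic query rather than a trigger. The "requests" table and its columns are made up, windowFunnel is a real ClickHouse aggregate that counts ordered steps hit within a sliding window, and I've simplified the 12-hours-after-registration condition to a plain time filter.

  // Hypothetical sketch: a "watcher" run as a periodic query rather than a
  // stored procedure. The requests table and columns are made up.
  package main

  import (
    "context"
    "fmt"
    "log"

    "github.com/ClickHouse/clickhouse-go/v2"
  )

  const watcherQuery = `
  SELECT user_id
  FROM requests
  WHERE timestamp >= now() - INTERVAL 12 HOUR   -- simplification of the registration window
  GROUP BY user_id
  HAVING windowFunnel(300)(                     -- 300s = five-minute sliding window
           timestamp,
           endpoint = '/a',
           endpoint = '/b',
           endpoint = '/c') = 3`

  func main() {
    conn, err := clickhouse.Open(&clickhouse.Options{Addr: []string{"localhost:9000"}})
    if err != nil {
      log.Fatal(err)
    }
    rows, err := conn.Query(context.Background(), watcherQuery)
    if err != nil {
      log.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
      var userID string
      if err := rows.Scan(&userID); err != nil {
        log.Fatal(err)
      }
      fmt.Println("pattern matched for user:", userID)
    }
  }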


> Which is to say: is there some obvious hook-point to add watcher jobs (as Clickhouse stored procedures, maybe?) to process correlations in the inputs to the system? (Where by "watcher jobs", I mean things like "create a record in a table if a user makes requests across endpoints A, B, C, in that order — with other arbitrary requests from that user in between — in a five-minute sliding window, at some point within 12 hours of the user's registration.")

This is not something we have planned, nor something we've heard requested yet. Is the use case doing analytics on your web application? Or more of a complex tracing use case that creates new traces as time passes?

FWIW, Highlight is designed (in its current state) for basic log search and alerts.


In our case: fraud detection.

We want to notice users whom we ban for breaking the Terms of Use of our API SaaS, who then create new accounts and immediately resume doing the same thing they were doing before they were banned — where the metadata is all different (they rotate VPNs, get a new [stolen] credit card, etc.) but where there is a distinctive "activity-pattern fingerprint" to their ToU-breaking activity (different per violating user, but the same between the "incarnations" of the same user); and where having that fingerprint in the context of a brand-new user is implausible, since nobody could learn to use our API to do such a complex thing so quickly.


Oh, very interesting. I don't think we're going to build something for that use case specifically, but long term we do plan to build a metrics product, which you could use to analyze these patterns. It's interesting to hear what people use these sorts of products for. Thanks for sharing.


I'm curious about the motivation for choosing ClickHouse over Apache Pinot and Apache Druid. It could be helpful for other folks choosing an OLAP DB among them.


For us, a significant reason was the ClickHouse cloud-hosted offering, rather than having to manage a cluster ourselves. Their use of S3 as the backing storage medium means that large-scale data retention is quite affordable.

A good comparison we've referenced: https://leventov.medium.com/comparison-of-the-open-source-ol...


For reference, Apache Druid has an equivalent in Imply Polaris, and Apache Pinot has an equivalent in Startree. I can't speak for Startree, but Polaris similarly uses S3 for backing.


When I was highly engaged with Imply (Druid) a few years ago, S3 was also used as a backing storage. Is this not the case anymore?


I think both Pinot and Druid offer cloud-hosted solutions nowadays. Maybe you started early enough that only ClickHouse had that offering. Is cloud hosting the only reason you chose ClickHouse? I'm also wondering whether it's possible to let users choose the data source.


Slight hijack: I/we went through a very similar tech-selection process for time-series metrics (not logging) ~1.5 years ago. We looked at Druid, Elasticsearch, Timescale, and a bunch of others.

Main takeaways were: the SQL flavor and its aggregations in CH are amazing. Running on a single node for dev laptops is trivial. It’s crazy fast with almost zero tuning.

It does not surprise me at all that CH is powering new products and startups.

Note: hosted CH did not exist yet. We are using Altinity to run our cluster.


> Note: hosted CH did not exist yet. We are using Altinity to run our cluster.

It exists now, actually. We (Highlight) are on hosted ClickHouse, which went GA a few months ago. https://clickhouse.com/cloud


Thanks for the shout out! "Altinity" in this case means Altinity.Cloud, which is a high-performance cloud ClickHouse. It's been around for over 2.5 years.

Disclaimer: I work at Altinity.


if you can afford SQL then you're not really doing timeseries in any meaningful sense


Clickhouse is fast and doesn’t have absurd architectural complexity.


+1


Not having to deal with a JVM is a major plus tbh.


I've seen so many variations of this comment on HN and I'm still not sure why not having to deal with the JVM is a major plus.


I'm perfectly fine with JVMs, but at a guess, some of it is the usual snobbery for anything strange. But some of it is due to associating JVMs with enterprise nightmares. And some is that JVM tuning is a bit of a dark art. I've made some very good money going in and turning JVM knobs that others were afraid to touch. (The secret, by the way, is to hack together some decent load simulation and then measure not just median numbers but things like 99th percentile latency.)


Have you ever operated a fleet of critical JVM instances and needed more memory? Don’t go over 32GB of RAM in an instance or the operating characteristics of your entire app change. Compressed memory pointers - oops. They are a blast to debug/operate!

https://stackoverflow.com/questions/25120546/trick-behind-jv...


JVM runtimes have a relatively high startup cost, are not often good 'citizens' in an instance running multiple types of software, and the build processes for a lot of JVM deliverables are an ungodly mess.

Many of those bells and whistles are near-necessary in the enterprise world, but you have the accumulated mass of 'red zones' and developmental landmines in that ecosystem that can quickly turn you off it as a whole if you want to understand the whole system.


I still don't understand some of this -- I developed in Java for 5+ years.

>JVM runtimes have a relatively high startup cost

I think many people are okay with that when developing server software that's going to run for weeks at a time. It can get a bit annoying when trying to rapidly iterate. And I think things are changing pretty quickly with AOT builds and general improvements.

>and the build processes for a lot of JVM deliverables is an ungodly mess.

I recall using "mvn package." That's it. This was on two different systems that served a good bit of traffic and weren't simple trivial projects.


I don't know if it's a standard Java thing or just an IntelliJ thing, but there's a setting that will hot-patch a running JVM when you change code. Things can get messy if you (or your dependencies) make assumptions about the ClassLoader being used, but other than that it works great.

Still not as good as C#'s debugger in Visual Studio (hit a breakpoint, edit the code, drag the execution back before the problem, resume and run the patched version) but nothing I've seen really is.

Setting up Gradle projects is a bit more involved depending on your setup, but in the end it's still a single command to build an executable JAR.


Yeah, it's been a second since I've used IntelliJ/Spring, but I recall that being the case as well.

Gradle is something I've never messed with, but that makes sense.


I take it you haven't experienced the hell that is dealing with Hadoop JARs. It's absolutely ridiculous.


Having to worry about GC in a database is a pretty bad experience. It also tends to require way more resources than necessary, and just a pretty complex configuration


gc isn't the issue, the jvm is the issue


basically, the jvm is technically sophisticated but operationally complicated

it sucks to use

many people believe otherwise, but those people have rich jvm experience, which is not easy to get


Druid has like 9 different node types and inherits the whole Hadoop configuration mess and complexity


3, and there's absolutely no need for hadoop, particularly with MSQ


Anecdata: tried out Druid and ClickHouse for my SaaS. Couldn’t get Druid working. CH ran in 2 minutes.


Interesting. Just found another post from yesterday about the comparison: https://news.ycombinator.com/item?id=35642522, though it's coming from the Pinot team.


I suggest adding a pricing tier between $0 and $50/month. I nearly bounced until I saw you also offer very reasonable usage-based billing. You could just calculate what $5/month of usage-based billing could buy and list it as a tier, but having the lowest tier be $50 sends an incorrect signal that this isn't for hobbyists.


That's good feedback. We'll update the pricing page accordingly when the time comes. Question: do you worry about usage-based billing with respect to surprise charges? Or do you expect a way to limit this?


Both. IMO you should have a plan where the user pays e.g. $7 and when the resources for it are drained, you start refusing requests until the throttle period expires.

It's extremely useful to prototype and experiment with a project knowing it has a total budget that will not be surpassed.

One more idea: pay as you go. I pay $10, that turns out to be not enough, I pay another $20 and get immediately unblocked.


Makes sense. We're going to do #1 for sure. #2 is more difficult because we have to be cognizant of our internal costs. But I will share it with the team nonetheless. Thank you.


No, I'm perfectly happy with usage-based billing. But when I see the cheapest option costing $50 my heuristic is that there probably isn't an affordable other option. I was on mobile, so the usage-based section was below the fold.

Limiting would be ideal, but all I was suggesting is that you indicate above the fold that you have cheaper options


Sounds good, will update copy accordingly. Thanks for the feedback.


Intuitively it seems like logging and error reporting software would be prime targets for supply chain attacks, as they have access to a treasure trove of exploitable data. A strategic modification in some tiny transitive dependency or two and your logging server could log not just for you. Presumably, gathered data could be used to deanonymize users or exploit your software.

How do people go about vetting and/or self-hosting those, if you operate [software for] a business with actual customers? Is stripping sensitive data at the client enough? Do you lock down outgoing connections through external networking configuration if you self-host? Am I being too paranoid?


This is an interesting topic I've explored a bit. The tl;dr is that if your business can't afford or recover from a vendor having an incident, you shouldn't be using a vendor. This is often why highly regulated industries are slower-moving and more expensive to operate: you need to spend a lot of time vetting your vendors both in policy (via contracts, compliance requirements, certification requirements) and in practice (sign NDAs and review source code for components of interest, run joint penetration tests, fund bug bounty programs).

That said, there are mitigations you can take. There are end-to-end encrypted log solutions out there. Honeycomb.io used to have (they might still?) an interesting offering I used at one employer to encrypt sensitive fields in logs leaving our infrastructure. They had the UI set up to talk to our encryption service and decode things on the fly in the user's browser-side UI so that they (Honeycomb) never had direct, unfettered access to sensitive data.

There are other approaches you can take, but things get tricky when you either need to audit your vendor's access to your data or assume that your vendor can't secure your data to your satisfaction. Better to do it yourself at that point if you have the resourcing to do so.
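
As a toy illustration of that field-level approach: the hard parts (key management and the decrypt-in-the-UI half) are omitted, and this just shows sealing one value with AES-GCM before it leaves your infrastructure.

  // Minimal sketch: encrypt sensitive attribute values locally so the vendor
  // only ever stores ciphertext. Key handling here is a placeholder.
  package main

  import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "encoding/base64"
    "fmt"
    "log"
  )

  // encryptField seals one attribute value with AES-GCM under a locally held key.
  func encryptField(key []byte, plaintext string) (string, error) {
    block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
    if err != nil {
      return "", err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
      return "", err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
      return "", err
    }
    sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil) // nonce || ciphertext
    return base64.StdEncoding.EncodeToString(sealed), nil
  }

  func main() {
    key := make([]byte, 32) // in practice, fetched from your KMS / encryption service
    if _, err := rand.Read(key); err != nil {
      log.Fatal(err)
    }
    ct, err := encryptField(key, "jane@example.com")
    if err != nil {
      log.Fatal(err)
    }
    // Ship the ciphertext as the log attribute instead of the raw email.
    fmt.Println("email_enc =", ct)
  }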


Is this like schema-less? How do you do indexing for individual log fields and stuff?


They put log attributes into a map and then index that map. See https://www.highlight.io/blog/how-we-built-logging-with-clic... (search for CREATE TABLE).
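
Roughly this shape, if I'm reading the post right; the column names below are guesses rather than their exact schema, with attributes in a Map column and a bloom-filter skip index over the map keys.

  // Sketch of a ClickHouse logs table with schema-less attributes in a Map
  // column; the exact columns/settings are assumptions, not highlight's schema.
  package main

  import (
    "context"
    "log"

    "github.com/ClickHouse/clickhouse-go/v2"
  )

  const createLogs = `
  CREATE TABLE IF NOT EXISTS logs (
      Timestamp     DateTime64(9),
      SeverityText  LowCardinality(String),
      Body          String,
      LogAttributes Map(LowCardinality(String), String),
      INDEX idx_attr_keys mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1
  ) ENGINE = MergeTree
  ORDER BY Timestamp`

  func main() {
    conn, err := clickhouse.Open(&clickhouse.Options{Addr: []string{"localhost:9000"}})
    if err != nil {
      log.Fatal(err)
    }
    if err := conn.Exec(context.Background(), createLogs); err != nil {
      log.Fatal(err)
    }
    // A single attribute can then be queried without a fixed schema, e.g.:
    //   SELECT Body FROM logs WHERE LogAttributes['user_id'] = '42'
  }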


You get structured attribute search without a schema with Highlight. More in our docs: https://www.highlight.io/docs/general/product-features/loggi...


I started using BigQuery with stream inserts for logging. I have around 300k inserts a day and it costs a whole cent a day for writes and even less for reads (I don’t read much, though).
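
For anyone curious, the streaming-insert path is only a few lines of Go; the project, dataset, and table names below are placeholders rather than my actual setup.

  // Not a specific production setup, just a minimal sketch of BigQuery
  // streaming inserts from Go with placeholder identifiers.
  package main

  import (
    "context"
    "log"
    "time"

    "cloud.google.com/go/bigquery"
  )

  // LogRow maps struct fields to columns of the target table.
  type LogRow struct {
    Timestamp time.Time `bigquery:"timestamp"`
    Level     string    `bigquery:"level"`
    Message   string    `bigquery:"message"`
  }

  func main() {
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
    if err != nil {
      log.Fatal(err)
    }
    defer client.Close()

    ins := client.Dataset("logs").Table("app_logs").Inserter()
    rows := []*LogRow{{Timestamp: time.Now(), Level: "info", Message: "checkout started"}}
    if err := ins.Put(ctx, rows); err != nil { // streaming insert
      log.Fatal(err)
    }
  }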


At ~3.5 inserts/sec you could just as well use SQLite.


What does your ingest setup look like with BigQuery? Are you using something like Fluentd to pipe logs over?


How does this compare to Loki?


In isolation (just our logging product), Loki is comparable. It would be interesting to do a benchmark on Loki and see what a comparison looks like. Beyond logging, we do session replay and error monitoring and tie all of these things together.


I think what matters is the interface: a clean interface that shows structured logs in a table view, rather than throwing raw text at the user like many products do, plus a nice way to filter them.

Loki + Grafana isn't really good for log viewing at all. I use Metabase to read logs sent to ClickHouse, which gives a far nicer interface.


Whoa, didn't know folks did that. I always thought of Metabase as a BI tool, but the table UI is quite nice.


This is not logging. It's web session telemetry.


As a non-web developer, I was also confused by generic terms like "monitoring", etc., and had to scroll a bit before realizing it's just for web stuff.


That's good feedback, thanks. What would you prefer we write? Just mention "web" somewhere in the headline?


It would be better to use OpenTelemetry vocabulary. This seems to be a RUM product: https://github.com/open-telemetry/oteps/issues/169


Why isn't it logging? You can send us raw logs if you like as well: https://www.highlight.io/docs/getting-started/backend-loggin...


Because in the industry "logging" has been used to refer to syslog and similar formats for many decades.


The new feature we're announcing is the server-side + browser-side log ingest, query interface, and alerting engine. Our web session recording product is something we've had for a while, but it now closely integrates with logs to help you debug.


How does it compare to OpenReplay?

> Before deciding on ClickHouse, we were planning to use OpenSearch

You should have tried Quickwit :)

Anyway, sounds like a great project, best of luck!

[1] https://github.com/openreplay/openreplay

[2] https://github.com/quickwit-oss/quickwit


Interesting, thanks for sharing. Do you implement your own storage with Quickwit or can it be backed by a cloud storage solution like S3?

Our session replay is similar to OpenReplay but we've focused a lot of effort on making a cohesive backend debugging experience. Highlight sessions give you backend error monitoring and logging out of the box to make it easy to get to the root cause of a bug.


I see, thanks for your answer.

> Do you implement your own storage with Quickwit or can it be backed by a cloud storage solution like S3?

I'm a co-founder of Quickwit, and I haven't used OpenReplay directly; I just know OpenReplay used ClickHouse + Quickwit for search. I don't know if this is a common setup or if it was just used for very specific use cases. You should ask them :)


Great progress on this front. I will try it out soon. One question: is the self-hosted version meant to be used by deploying a local instance of ClickHouse, or by connecting to managed ClickHouse Cloud (or are both possible)?


By default, our hobby deployment reads from a local copy of ClickHouse. If you want to point it to a cloud-hosted version, it would be pretty easy to adjust the environment variables accordingly.

https://www.highlight.io/docs/getting-started/self-host/self...

If you're interested in doing that, feel free to message us in our community.


FYI: your graphic has a typo in it. You have "The open-source, fullstack monitoring platform" correct in your headline, but the graphic has a comma after "fullstack." You may want to fix that.

Also, I'd recommend against "read more here"; instead, simply link the words "Session Replay", etc. This is friendlier to screen-reader users in particular, but also to everyone, as readers are drawn to the blue highlighted text, and you'd rather they see the important word first, not "here" first.


Thanks for the feedback. Fixed here: https://github.com/highlight/highlight/pull/5085


Do you have any plans to ingest trace/span data in the future?


Yes, we do. It's on our public roadmap: https://github.com/orgs/highlight/projects/11/views/1 (marked as "Future Work")

We'll likely get to it by Q3 of this year. We hope that the design choices we're making (ClickHouse, OTEL, etc.) will set us up to support tracing easily when we get to it.


I would absolutely try this out, but Elixir support is a must. Is there anything in progress?


Just added an issue: https://github.com/highlight/highlight/issues/5082

We don't have concrete plans for supporting Elixir, but hopefully this opens the floor for anyone interested.


For those who are lost, ClickHouse is a column-oriented database. This person built a Docker container that can copy logs into that database.


That's one way to summarize it :)

We do quite a bit more though with session replay and error monitoring for your full-stack web apps.


Thank You for adding that.

I had to look some things up to figure out what the conversation was about.


You can't usefully keep a raw stream of logs in an indexed database like ClickHouse.

The volume for any nontrivial organization is too large.


Log storage is a standard use case for ClickHouse and has been for years. Our company (Altinity) currently hosts or supports numerous online services that store and query logs. The standard implementation approach is to store log messages in one column and use the others as indexes on interesting properties such as time, service name, transaction ID, host name, etc. You can then build a log viewer that implements slicing-and-dicing queries to locate interesting messages. ClickHouse is much faster and more cost-efficient than competing solutions like Loki or Elasticsearch.

Log messages often compress very well (>95%), so storage is not as much of an issue as you might think.

Disclaimer: I work for Altinity


What's your suggestion for doing it efficiently?

And what kind of volume is it that ClickHouse can't handle when Uber can?

https://eng.uber.com/logging/


ClickHouse's data and control planes are well defined, so many folks end up using S3 (or something like it) as a backing store. From what we've heard, this is what ClickHouse Cloud does behind the scenes: https://clickhouse.com/cloud


Why is this linking to a random markdown page?



Apologies for the confusion. We meant to link to our main readme: https://github.com/highlight/highlight/blob/main/README.md



