Show HN: We built a ClickHouse-based logging service (github.com/highlight)
192 points by vadman97 on April 21, 2023 | 110 comments
Hey HN! I'm one of the co-founders of highlight.io, an open-source monitoring tool.

Today we're sharing a ClickHouse-based logging solution we've been working on. We want to show how we built it and how you can try it out and give feedback. Since we started working on highlight.io, we've been hyper-focused on "cohesion": ensuring that when you install your monitoring stack, all of the resources in that stack (user interactions, requests, traces, logs, etc.) are connected in a consumable way. We've written more about our philosophy on this here [1].

We started building towards this by connecting your client-side app and your server-side exceptions with session replay and exception monitoring; i.e. if an error happened in a server-side app, we would make it easy (with session replay) to trace all the steps that a user took leading up to it.

Especially for larger companies using highlight.io, the request to tie in logs came up repeatedly, and we wanted to build this with the same philosophy in mind. Now, you'll see client-side and server-side logs all in one place, brought together in the context of a user session, as well as logs in the context of an error.

Like the rest of our stack, this project is written in Go and TypeScript, and for log ingestion/querying, we're using ClickHouse [2]. Before deciding on ClickHouse, we were planning to use OpenSearch (an AWS fork of Elasticsearch [3]) for this part of our product, but as our traffic increased, we encountered quite a few pains with OpenSearch's write throughput. After evaluating a few options, we eventually landed on ClickHouse (their cloud offering was icing on the cake), which has also proven to be much more cost-effective so far.

Building with ClickHouse from scratch has been an exciting journey. Eric (the mastermind behind this project) wrote a blog post [4] on a handful of ClickHouse learnings we've gathered since starting the project.

For those wanting to try out the product locally, you can run the following commands [5]:

  git clone --recurse-submodules https://github.com/highlight/highlight
  cd highlight/docker
  ./run-hobby.sh

To send logs to highlight, you can use your own OpenTelemetry implementation [6] or use our SDKs [7], which provide lightweight wrappers over OTEL.
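
For the curious, here's a minimal sketch of the bring-your-own-OTEL route in Go. The endpoint and attribute names are placeholders (the default local OTLP/HTTP port), not our real ingest address; see [6] for the actual setup.

  // Rough sketch only: a standard OpenTelemetry Go trace pipeline pointed at an
  // OTLP endpoint, with a log-like event attached to a span.
  package main

  import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    "go.opentelemetry.io/otel/trace"
  )

  func main() {
    ctx := context.Background()

    exp, err := otlptracehttp.New(ctx,
      otlptracehttp.WithEndpoint("localhost:4318"), // placeholder collector address
      otlptracehttp.WithInsecure(),
    )
    if err != nil {
      log.Fatal(err)
    }

    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
    defer func() { _ = tp.Shutdown(ctx) }()
    otel.SetTracerProvider(tp)

    // Record a span and attach a structured, log-like event to it.
    _, span := tp.Tracer("example").Start(ctx, "checkout")
    span.AddEvent("order created", trace.WithAttributes(
      attribute.String("level", "info"),
      attribute.String("user_id", "42"),
    ))
    span.End()
  }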

Like the rest of highlight.io, we plan to make money from this with our hosted cloud offering. For those interested in trying out the cloud-hosted version, you can get set up at app.highlight.io.

To open the floor for feedback, we would love to get your thoughts on what we've built so far. Beyond that, what parts of a logging product do you wish you had in your current setup? And are there any notable pain points of using a hosted monitoring product? (We're toying with the idea of an enterprise deployment.) Excited to hear from everyone.

[1]: https://highlight.io/docs/general/company/product-philosphy

[2]: https://clickhouse.com

[3]: https://news.ycombinator.com/item?id=26780848

[4]: https://www.highlight.io/blog/how-we-built-logging-with-clic...

[5]: https://www.highlight.io/docs/getting-started/self-host/self...

[6]: https://www.highlight.io/docs/getting-started/backend-loggin...

[7]: https://www.highlight.io/docs/getting-started/overview#for-y...



Wow, I can't wait to try this out; it could be a Sentry killer, and doubly so given the friendly license.

Also, thank you for introducing me to air (https://github.com/highlight/highlight/blob/sdk/highlight-go...) as that also looks super handy

p.s. for Show HN historians, here is the prior thread: https://news.ycombinator.com/item?id=34897645


Why do you want to replace Sentry? It's mature and works well.

Not sure how the license makes a difference to your day-to-day development.

But I do love projects that can be self-hosted, so Sentry is very nice on that front, especially since it doesn't limit its capabilities in the self-hosted version.

Also, this one doesn't even have a PHP SDK, and the GitHub issue doesn't show much demand for it, which suggests it still has some way to go to mature, given the limited interest at this point.

Couldn't they just reuse other projects' SDKs or contribute to OpenTelemetry? Developing yet another batch of SDKs for a dozen languages seems like wasted time.

But I do hope we get a nice open-source (self-hostable) tool that can do what it claims (error reporting, tracing, logging, metrics), because there's no mature option that does all of that well at the moment.


> But I do hope we get a nice open-source (self-hostable) tool that can do what it claims (error reporting, tracing, logging, metrics), because there's no mature option that does all of that well at the moment.

Keep an eye out for Highlight then!

> Couldn't they just reuse other projects' SDKs or contribute to OpenTelemetry? Developing yet another batch of SDKs for a dozen languages seems like wasted time.

We use OpenTelemetry for all of our SDKs. All we do is thinly wrap the SDK so developers don't have to deal with OpenTelemetry internals if they don't want to.

There's a doc on it here: https://www.highlight.io/docs/general/company/open-source/co...
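
To illustrate the idea (this is not our actual SDK, just the rough shape of the pattern), a thin wrapper can be as small as this:

  // Illustrative only: a wrapper package that hides OpenTelemetry types from callers.
  // Exporter/provider wiring would live behind a single Start() call (omitted here).
  package monitor

  import (
    "context"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
  )

  // Log records a message on the active span so callers never import OTEL directly.
  func Log(ctx context.Context, msg string, kv ...string) {
    span := trace.SpanFromContext(ctx) // returns a no-op span if none is active
    attrs := make([]attribute.KeyValue, 0, len(kv)/2)
    for i := 0; i+1 < len(kv); i += 2 {
      attrs = append(attrs, attribute.String(kv[i], kv[i+1]))
    }
    span.AddEvent(msg, trace.WithAttributes(attrs...))
  }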


Thank you for the response and sorry for not doing the homework there.

Looking forward to seeing the project take off, since right now we have to glue together various projects to achieve what you're aiming for.


No worries, and we appreciate it.


Isn’t Sentry also open source, and also using ClickHouse in at least some parts?


Yes to the second: https://github.com/getsentry/self-hosted/blob/23.4.0/docker-... And only after the embargo is over to the first: https://github.com/getsentry/self-hosted/blob/23.4.0/LICENSE...

I also miss the "good old days" when running Sentry was like 3 containers, not the 32 of modern Sentry.


Yeah, fair point. Sentry has added quite a few features and a lot of complexity, probably for good reasons like serving large customers. But still.


FWIW, we don't really see a need for more than a handful of containers, even long term; we want to make it easy to self-host Highlight. Nonetheless, Sentry is a reputable project, and they deserve the success they've had so far.


Thanks, appreciate the kind note on the license. And yeah, air is really nice; we've been using it since the beginning. It gives us a JS-hot-reloading-like experience with a server-side app.


Did you consider VictoriaMetrics at all? It's based on the ClickHouse design. I'm considering it for a time-series historian, so a different application, but I think it could work wherever ClickHouse works too.


Haven't heard of it, unfortunately. What kind of time-series data are you thinking of storing? If you're building an application on top of it, you're better off using an OLAP DB like ClickHouse, InfluxDB, etc. to have access to lower-level constructs.


My time series come from PLCs in hydroelectric power stations. Usually about 1000 series per turbine and another 1000 for general plant or river data. So it's just a timestamp and a bool, 16- or 32-bit (u)int, or 32-bit float. I'll just use Grafana as the front end. It seems to replace industry heavyweight offerings such as OSIsoft PI.


For those who have not used VictoriaMetrics, it's a high-performance time-series database that can serve as a drop-in replacement for Prometheus. We host ClickHouse at our company but use VictoriaMetrics for long-term storage of Prometheus metrics across tenants. It's fast, solid, and super economical to run.

If you want to learn more here's a webinar I did with Roman Kavronenko at VictoriaMetrics that compares it with ClickHouse.

https://www.youtube.com/watch?v=sCrdp8hIhJM

Disclaimer: I work for Altinity.


Does it work well for storing logs? I thought it was great for metrics.


ClickHouse has been around longer and is battle-tested at Yandex. In what cases might VictoriaMetrics be a better fit?


Haven't seen VictoriaMetrics! Seems like they don't have a cloud offering, though?


They do have a "managed" offering (on AWS): https://victoriametrics.com/products/managed/.


Very cool. My mistake for overlooking that. VictoriaMetrics seems more like a metrics store? Something comparable to InfluxDB? Is that right?


Correct! It is a time-series database for storing metrics. It's open source, including the clustered version, and comes with many additional tools for alerting, collecting, and visualizing metrics. Many in the VictoriaMetrics community switched from InfluxDB to VictoriaMetrics for performance reasons. See also the following doc comparing the two: https://docs.victoriametrics.com/guides/migrate-from-influx....

Disclaimer: I'm one of the VictoriaMetrics maintainers.


“Deploy a hobby instance in one line on Linux with Docker (recommended 16 CPU cores, 32GB RAM, 256GB disk)”

Can't say I would call these specs “hobby” at all.


Apologies for the inconsistency here. Our Docker resource requirement recommendations were out of date after some recent improvements (https://github.com/highlight/highlight/pull/5074 and https://github.com/highlight/highlight/pull/4993).

Just updated this: 8GB of RAM, 4 CPUs, and 64 GB of disk space.


8GB of RAM? For a hobby? Very resource-intensive, eh?


I agree with your sentiment here. IMO they chose the wrong words for what they meant; e.g. ~0.5-1GB of memory usage is more like a hobby setup (assuming you run it on the same hardware as the services you monitor).

However, after scrolling through the GitHub page, I feel like this is not a service aimed at people (I might be completely mistaken) who either have only a small set of services to monitor (and/or understand the logs and/or have the interest to do it themselves) or keep their homelab at a relatively low financial priority (1x 8GB 2400 CL17 is €15 here).

8GB for a single service in a (home) environment is, IMO, still a lot, but I think it's a sort-of reasonable figure for what it does and what it needs to do to make that happen.


We have had folks in our Discord successfully run Highlight on a Raspberry Pi with 4GB of RAM, so our recommendation is definitely on the safe side. We're running multiple infra services in the Docker stack (Postgres, OpenSearch, ClickHouse, InfluxDB, Kafka, Redis) that we would look to consolidate in the future to help with running on leaner instances.


Totally agree with that. 8GB might be too much for the open-source product to become popular.

I would say 4GB makes more sense to me; I know how much engineering effort that requires, though. LOL


Unlikely to happen if there are still Java services running in that stack. For instance, Elasticsearch/OpenSearch is good at what it does (which is full-text search), but pumping massive amounts of logs into it is never going to be a light solution. Solutions such as Loki, which just index tags and dump the raw content into an object storage bucket, are the cheapest, and I guess ClickHouse ends up somewhere in the middle between those two, depending on how well you configure it to fit the data. Or, vice versa, how well you can configure the data to fit the technical solution.


This is one of the reasons we’re considering moving off OpenSearch.


Just about anything with only one instance is essentially hobby.


> Just about anything with only one instance is essentially hobby.

Not high availability? Sure.

However, I've seen software out there that ran as a monolith with a single deployment unit, facilitated lots of business processes and entire teams of people for continued development. Not everything necessarily needs high uptime, either. Some software can also serve particular time zones and have ample windows for scheduled maintenance, upgrades and so on.

There's probably at least a few classifications between hobby projects on one end and HA distributed systems on the other.

>> Can't say I would call these specs “hobby” at all.

With this, however, I'm inclined to agree. In my eyes, "hobby" would imply something more along the lines of: "Just give this half a CPU core and about 512 MB of RAM, maybe up to a GB of storage depending on what you'll use it for, it'll probably work well enough for a few users."

Some software that mostly fits that definition, in my experience: Nextcloud, Apache2/Nginx/Caddy, Grav, Mattermost, Gitea, Heimdall, YOURLS, PrivateBin, phpBB, Uptime Kuma, Zabbix, PostgreSQL, MySQL/MariaDB, Redis, RabbitMQ, Docker Swarm and plenty others.

Some software that needs more resources: SonarQube, PeerTube (for encoding), OpenProject (Ruby app), Sonatype Nexus (bloated Java app, but lots of functionality), Matomo (issues with displaying historic data with low resources), BackupPC (compression of backups), K3s and other Kubernetes cluster distros, and plenty of others, too.

Not to say that it somehow makes the software worse, just that people have different expectations. Perhaps more realistic expectations on my part for hobby software should be: "You should be able to launch it with whatever spare resources your laptop has."


Hard disagree. A hobby instance should be able to be spun up in the cloud on a free or near free tier.


How do you compare against something like SigNoz (YC-backed)? You should probably add them to your competitors list, since they are also an open-source APM.


Highlight is a full-stack observability platform, so our recordings start from a user's frontend session and associate server-side errors and logs to provide debugging context. We've been building quite a lot to bolster the backend observability use case, including releasing the new logging product. We can't offer feature parity for backend traces and metrics yet, but we're planning to get there by the end of the year.


Is this just a "monitoring" platform, or does it also intend to use ClickHouse as a "log analysis" platform a la Mozilla Hindsight?

Which is to say: is there some obvious hook-point to add watcher jobs (as Clickhouse stored procedures, maybe?) to process correlations in the inputs to the system? (Where by "watcher jobs", I mean things like "create a record in a table if a user makes requests across endpoints A, B, C, in that order — with other arbitrary requests from that user in between — in a five-minute sliding window, at some point within 12 hours of the user's registration.")

Using a DB as a log-analysis system would be pretty great, if it was practicable, as you'd be able to correlate present events with events from the distant past (or with statistical aggregations of all of history up to the present), rather than having to build your correlations only from what can be buffered in memory.

But most log-analysis platforms need extremely high event-processing-job performance to scale — Mozilla Hindsight, mentioned before, is a rewrite in C+Lua of a previous system (Mozilla Heka) where the fundamental bottleneck was the Golang runtime. I'd be curious to know whether ClickHouse sprocs/triggers/etc. have been tuned to function at that sort of scale...
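
For concreteness, here's roughly what I mean, expressed as a periodic query rather than a trigger. The "requests" table and its columns are made up, windowFunnel is a real ClickHouse aggregate that counts ordered steps hit within a sliding window, and I've simplified the 12-hours-after-registration condition to a plain time filter.

  // Hypothetical sketch: a "watcher" run as a periodic query rather than a
  // stored procedure. The requests table and columns are made up.
  package main

  import (
    "context"
    "fmt"
    "log"

    "github.com/ClickHouse/clickhouse-go/v2"
  )

  const watcherQuery = `
  SELECT user_id
  FROM requests
  WHERE timestamp >= now() - INTERVAL 12 HOUR   -- simplification of the registration window
  GROUP BY user_id
  HAVING windowFunnel(300)(                     -- 300s = five-minute sliding window
           timestamp,
           endpoint = '/a',
           endpoint = '/b',
           endpoint = '/c') = 3`

  func main() {
    conn, err := clickhouse.Open(&clickhouse.Options{Addr: []string{"localhost:9000"}})
    if err != nil {
      log.Fatal(err)
    }
    rows, err := conn.Query(context.Background(), watcherQuery)
    if err != nil {
      log.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
      var userID string
      if err := rows.Scan(&userID); err != nil {
        log.Fatal(err)
      }
      fmt.Println("pattern matched for user:", userID)
    }
  }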


> Which is to say: is there some obvious hook-point to add watcher jobs (as Clickhouse stored procedures, maybe?) to process correlations in the inputs to the system? (Where by "watcher jobs", I mean things like "create a record in a table if a user makes requests across endpoints A, B, C, in that order — with other arbitrary requests from that user in between — in a five-minute sliding window, at some point within 12 hours of the user's registration.")

This is not something we have planned, nor something we've heard requested yet. Is the use case doing analytics on your web application? Or more of a complex tracing use case that creates new traces as time passes?

FWIW, Highlight is designed (in its current state) for basic log search and alerts.


In our case: fraud detection.

We want to notice users whom we ban for breaking the Terms of Use of our API SaaS, who then create new accounts and immediately resume doing the same thing they were doing before they were banned — where the metadata is all different (they rotate VPNs, get a new [stolen] credit card, etc.) but where there is a distinctive "activity-pattern fingerprint" to their ToU-breaking activity (different per violating user, but the same between the "incarnations" of the same user); and where having that fingerprint in the context of a brand-new user is implausible, since nobody could learn to use our API to do such a complex thing so quickly.


Oh, very interesting. I don't think we're going to build something for that use case specifically, but long term we do plan to build a metrics product, which you could use to analyze these patterns. It's interesting to hear what people use these sorts of products for. Thanks for sharing.


I'm curious about the motivation for choosing ClickHouse over Apache Pinot and Apache Druid. It could be helpful for other folks choosing an OLAP DB among them.


For us, a significant reason was the ClickHouse cloud-hosted offering, rather than having to manage a cluster ourselves. Their use of S3 as the backing storage medium means that large-scale data retention is quite affordable.

A good comparison we've referenced: https://leventov.medium.com/comparison-of-the-open-source-ol...


For reference, Apache Druid has an equivalent in Imply Polaris, and Apache Pinot has an equivalent in Startree. I can't speak for Startree, but Polaris similarly uses S3 for backing.


When I was highly engaged with Imply (Druid) a few years ago, S3 was also used as a backing storage. Is this not the case anymore?


I think both Pinot and Druid offer cloud-hosted solutions nowadays. Maybe you started early enough that only ClickHouse had that offering. Is cloud hosting the only reason you chose ClickHouse? I'm also wondering whether it's possible to let users choose the data source.


Slight hijack: I/we went through a very similar tech-selection process for time-series metrics (not logging) ~1.5 years ago. We looked at Druid, Elasticsearch, Timescale, and a bunch of others.

Main takeaways were: the SQL flavor and its aggregations in CH are amazing. Running on a single node for dev laptops is trivial. It’s crazy fast with almost zero tuning.

It does not surprise me at all that CH is powering new products and startups.

Note: hosted CH did not exist yet. We are using Altinity to run our cluster.


> Note: hosted CH did not exist yet. We are using Altinity to run our cluster.

It exists now, actually. We (Highlight) are on hosted ClickHouse, which went GA a few months ago. https://clickhouse.com/cloud


Thanks for the shout out! "Altinity" in this case means Altinity.Cloud, which is a high-performance cloud ClickHouse. It's been around for over 2.5 years.

Disclaimer: I work at Altinity.


if you can afford SQL then you're not really doing timeseries in any meaningful sense


Clickhouse is fast and doesn’t have absurd architectural complexity.


+1


Not having to deal with a JVM is a major plus tbh.


I've seen so many variations of this comment on HN and I'm still not sure why not having to deal with the JVM is a major plus.


I'm perfectly fine with JVMs, but at a guess, some of it is the usual snobbery for anything strange. But some of it is due to associating JVMs with enterprise nightmares. And some is that JVM tuning is a bit of a dark art. I've made some very good money going in and turning JVM knobs that others were afraid to touch. (The secret, by the way, is to hack together some decent load simulation and then measure not just median numbers but things like 99th percentile latency.)


Have you ever operated a fleet of critical JVM instances and needed more memory? Don’t go over 32GB of RAM in an instance or the operating characteristics of your entire app change. Compressed memory pointers - oops. They are a blast to debug/operate!

https://stackoverflow.com/questions/25120546/trick-behind-jv...


JVM runtimes have a relatively high startup cost, are not often good 'citizens' in an instance running multiple types of software, and the build processes for a lot of JVM deliverables are an ungodly mess.

Many of those bells and whistles are near-necessary in the enterprise world, but you have the accumulated mass of 'red zones' and developmental landmines in that ecosystem that can quickly turn you off it as a whole if you want to understand the whole system.


I still don't understand some of this -- I developed in Java for 5+ years.

>JVM runtimes have a relatively high startup cost

I think many people are okay with that when developing server software that's going to run for weeks at a time. It can get a bit annoying when trying to rapidly iterate. And I think things are changing pretty quickly with AOT builds and general improvements.

>and the build processes for a lot of JVM deliverables is an ungodly mess.

I recall using "mvn package." That's it. This was on two different systems that served a good bit of traffic and weren't simple trivial projects.


I don't know if it's a standard Java thing or just an IntelliJ thing, but there's a setting that will hot-patch a running JVM when you change code. Things can get messy if you (or your dependencies) make assumptions about the ClassLoader being used, but other than that it works great.

Still not as good as C#'s debugger in Visual Studio (hit a breakpoint, edit the code, drag the execution back before the problem, resume and run the patched version) but nothing I've seen really is.

Setting up Gradle projects is a bit more involved depending on your setup, but in the end it's still a single command to build an executable JAR.


Yeah, it's been a second since I've used IntelliJ/Spring, but I recall that being the case as well.

Gradle is something I've never messed with, but that makes sense.


I take it you haven't experienced the hell that is dealing with Hadoop JARs. It's absolutely ridiculous.


Having to worry about GC in a database is a pretty bad experience. It also tends to require way more resources than necessary, and just a pretty complex configuration


gc isn't the issue, the jvm is the issue


basically, the jvm is technically sophisticated but operationally complicated

it sucks to use

many people believe otherwise, but those people have rich jvm experience, which is not easy to get


Druid has like 9 different node types and inherits the whole Hadoop configuration mess and complexity


3, and there's absolutely no need for hadoop, particularly with MSQ


Anecdata: tried out Druid and ClickHouse for my SaaS. Couldn’t get Druid working. CH ran in 2 minutes.


Interesting. Just found another post from yesterday about the comparison: https://news.ycombinator.com/item?id=35642522, though it's coming from the Pinot team.


I suggest adding a pricing tier between $0 and $50/month. I nearly bounced until I saw you also offer very reasonable usage-based billing. You could just calculate what $5/month of usage-based billing could buy and list it as a tier, but having the lowest tier be $50 sends an incorrect signal that this isn't for hobbyists.


That's good feedback. We'll update the pricing page accordingly when the time comes. Question: do you worry about usage-based billing with respect to surprise charges? Or do you expect a way to limit this?


Both. IMO you should have a plan where the user pays e.g. $7 and when the resources for it are drained, you start refusing requests until the throttle period expires.

It's extremely useful to prototype and experiment with a project knowing it has a total budget that will not be surpassed.

One more idea: pay as you go. I pay $10, that turns out to be not enough, I pay another $20 and get immediately unblocked.


Makes sense. We're going to do #1 for sure. #2 is more difficult because we have to be cognizant of our internal costs. But I will share it with the team nonetheless. Thank you.


No, I'm perfectly happy with usage-based billing. But when I see the cheapest option costing $50 my heuristic is that there probably isn't an affordable other option. I was on mobile, so the usage-based section was below the fold.

Limiting would be ideal, but all I was suggesting is that you indicate above the fold that you have cheaper options


Sounds good, will update copy accordingly. Thanks for the feedback.


Intuitively it seems like logging and error reporting software would be prime targets for supply chain attacks, as they have access to a treasure trove of exploitable data. A strategic modification in some tiny transitive dependency or two and your logging server could log not just for you. Presumably, gathered data could be used to deanonymize users or exploit your software.

How do people go about vetting and/or self-hosting those, if you operate [software for] a business with actual customers? Is stripping sensitive data at the client enough? Do you lock down outgoing connections through external networking configuration if you self-host? Am I being too paranoid?


This is an interesting topic I've explored a bit. The tl;dr is that if your business can't afford or recover from a vendor having an incident, you shouldn't be using a vendor. This is often why highly regulated industries are slower-moving and more expensive to operate: you need to spend a lot of time vetting your vendors both in policy (via contracts, compliance requirements, certification requirements) and in practice (sign NDAs and review source code for components of interest, run joint penetration tests, fund bug bounty programs).

That said, there are mitigations you can take. There are end-to-end encrypted log solutions out there. Honeycomb.io used to have (they might still?) an interesting offering I used at one employer to encrypt sensitive fields in logs leaving our infrastructure. They had the UI set up to talk to our encryption service and decode things on the fly in the user's browser-side UI so that they (Honeycomb) never had direct, unfettered access to sensitive data.

There are other approaches you can take, but things get tricky when you either need to audit your vendor's access to your data or assume that your vendor can't secure your data to your satisfaction. Better to do it yourself at that point if you have the resourcing to do so.
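
As a toy illustration of that field-level approach: the hard parts (key management and the decrypt-in-the-UI half) are omitted, and this just shows sealing one value with AES-GCM before it leaves your infrastructure.

  // Minimal sketch: encrypt sensitive attribute values locally so the vendor
  // only ever stores ciphertext. Key handling here is a placeholder.
  package main

  import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "encoding/base64"
    "fmt"
    "log"
  )

  // encryptField seals one attribute value with AES-GCM under a locally held key.
  func encryptField(key []byte, plaintext string) (string, error) {
    block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
    if err != nil {
      return "", err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
      return "", err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
      return "", err
    }
    sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil) // nonce || ciphertext
    return base64.StdEncoding.EncodeToString(sealed), nil
  }

  func main() {
    key := make([]byte, 32) // in practice, fetched from your KMS / encryption service
    if _, err := rand.Read(key); err != nil {
      log.Fatal(err)
    }
    ct, err := encryptField(key, "jane@example.com")
    if err != nil {
      log.Fatal(err)
    }
    // Ship the ciphertext as the log attribute instead of the raw email.
    fmt.Println("email_enc =", ct)
  }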


Is this like schema-less? How do you do indexing for individual log fields and stuff?


They put log attributes into a map and then index that map. See https://www.highlight.io/blog/how-we-built-logging-with-clic... (search for CREATE TABLE).
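
Roughly this shape, if I'm reading the post right; the column names below are guesses rather than their exact schema, with attributes in a Map column and a bloom-filter skip index over the map keys.

  // Sketch of a ClickHouse logs table with schema-less attributes in a Map
  // column; the exact columns/settings are assumptions, not highlight's schema.
  package main

  import (
    "context"
    "log"

    "github.com/ClickHouse/clickhouse-go/v2"
  )

  const createLogs = `
  CREATE TABLE IF NOT EXISTS logs (
      Timestamp     DateTime64(9),
      SeverityText  LowCardinality(String),
      Body          String,
      LogAttributes Map(LowCardinality(String), String),
      INDEX idx_attr_keys mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1
  ) ENGINE = MergeTree
  ORDER BY Timestamp`

  func main() {
    conn, err := clickhouse.Open(&clickhouse.Options{Addr: []string{"localhost:9000"}})
    if err != nil {
      log.Fatal(err)
    }
    if err := conn.Exec(context.Background(), createLogs); err != nil {
      log.Fatal(err)
    }
    // A single attribute can then be queried without a fixed schema, e.g.:
    //   SELECT Body FROM logs WHERE LogAttributes['user_id'] = '42'
  }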


You get structured attribute search without a schema with Highlight. More in our docs: https://www.highlight.io/docs/general/product-features/loggi...


I started using BigQuery with stream inserts for logging. I have around 300k inserts a day and it costs a whole cent a day for writes and even less for reads (I don’t read much, though).
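
For anyone curious, the streaming-insert path is only a few lines of Go; the project, dataset, and table names below are placeholders rather than my actual setup.

  // Not a specific production setup, just a minimal sketch of BigQuery
  // streaming inserts from Go with placeholder identifiers.
  package main

  import (
    "context"
    "log"
    "time"

    "cloud.google.com/go/bigquery"
  )

  // LogRow maps struct fields to columns of the target table.
  type LogRow struct {
    Timestamp time.Time `bigquery:"timestamp"`
    Level     string    `bigquery:"level"`
    Message   string    `bigquery:"message"`
  }

  func main() {
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
    if err != nil {
      log.Fatal(err)
    }
    defer client.Close()

    ins := client.Dataset("logs").Table("app_logs").Inserter()
    rows := []*LogRow{{Timestamp: time.Now(), Level: "info", Message: "checkout started"}}
    if err := ins.Put(ctx, rows); err != nil { // streaming insert
      log.Fatal(err)
    }
  }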


At ~3.5 inserts/sec you could just as well use SQLite.


What does your ingest setup look like with BigQuery? Are you using something like Fluentd to pipe logs over?


How does this compare to Loki?


In isolation (just our logging product), Loki is comparable. It would be interesting to do a benchmark on Loki and see what a comparison looks like. Beyond logging, we do session replay and error monitoring and tie all of these things together.


I think what matters is the interface: a clean interface that shows structured logs in a table view, rather than throwing raw text at the user like many products do, plus a nice way to filter them.

Loki + Grafana isn't really good for log viewing at all. I use Metabase to read logs sent to ClickHouse, which gives a far nicer interface.


Whoa, didn't know folks did that. I always thought of Metabase as a BI tool, but the table UI is quite nice.


This is not logging. It's web session telemetry.


As a non-web developer, I was also confused by generic terms like "monitoring", etc., and had to scroll a bit before realizing it's just for web stuff.


That's good feedback, thanks. What would you prefer we write? Just mention "web" somewhere in the headline?


It would be better to use OpenTelemetry vocabulary. This seems to be a RUM product: https://github.com/open-telemetry/oteps/issues/169


Why isn't it logging? You can send us raw logs if you like as well: https://www.highlight.io/docs/getting-started/backend-loggin...


Because in the industry "logging" has been used to refer to syslog and similar formats for many decades.


The new feature we're announcing is the server-side + browser-side log ingest, query interface, and alerting engine. Our web session recording product is something we've had for a while, but it now closely integrates with logs to help you debug.


How does it compare to OpenReplay?

> Before deciding on ClickHouse, we were planning to use OpenSearch

You should have tried Quickwit :)

Anyway, sounds like a great project, best of luck!

[1] https://github.com/openreplay/openreplay

[2] https://github.com/quickwit-oss/quickwit


Interesting, thanks for sharing. Do you implement your own storage with Quickwit or can it be backed by a cloud storage solution like S3?

Our session replay is similar to OpenReplay but we've focused a lot of effort on making a cohesive backend debugging experience. Highlight sessions give you backend error monitoring and logging out of the box to make it easy to get to the root cause of a bug.


I see, thanks for your answer.

> Do you implement your own storage with Quickwit or can it be backed by a cloud storage solution like S3?

I'm a co-founder of Quickwit, and I haven't used OpenReplay directly; I just know OpenReplay used ClickHouse + Quickwit for search. I don't know if this is a common setup or if it was just used for very specific use cases. You should ask them :)


Great progress on this front. I will try it out soon. One question: is the self-hosted version meant to be used by deploying a local instance of ClickHouse, or by connecting to managed ClickHouse Cloud (or are both possible)?


By default, our hobby deployment reads from a local copy of ClickHouse. If you want to point it to a cloud-hosted version, it would be pretty easy to adjust the environment variables accordingly.

https://www.highlight.io/docs/getting-started/self-host/self...

If you're interested in doing that, feel free to message us in our community.


FYI: your graphic has a typo in it. You have "The open-source, fullstack monitoring platform" correct in your headline, but the graphic has a comma after "fullstack." You may want to fix that.

Also, I'd recommend against "read more here"; instead, simply link the words "Session Replay", etc. This is friendlier to screen-reader users in particular, but also to everyone, as readers are drawn to the blue highlighted text, and you'd rather they see the important word first, not "here" first.


Thanks for the feedback. Fixed here: https://github.com/highlight/highlight/pull/5085


Do you have any plans to ingest trace/span data in the future?


Yes, we do. It's on our public roadmap: https://github.com/orgs/highlight/projects/11/views/1 (marked as "Future Work")

We'll likely get to it by Q3 of this year. We hope that the design choices we're making (ClickHouse, OTEL, etc.) will set us up to support tracing easily when we get to it.


I would absolutely try this out, but Elixir support is a must. Is there anything in progress?


Just added an issue: https://github.com/highlight/highlight/issues/5082

We don't have concrete plans for supporting Elixir, but hopefully this opens the floor for anyone interested.


For those who are lost, ClickHouse is a column-oriented database. This person built a Docker container that can copy logs into that database.


That's one way to summarize it :)

We do quite a bit more though with session replay and error monitoring for your full-stack web apps.


Thank You for adding that.

I had to look some things up to figure out what the conversation was about.


You can't usefully keep a raw stream of logs in an indexed database like ClickHouse.

The volume for any nontrivial organization is too large.


Log storage is a standard use case for ClickHouse and has been for years. Our company (Altinity) currently hosts or supports numerous online services that store and query logs. The standard implementation approach is to store log messages in one column and use the others as indexes on interesting properties such as time, service name, transaction ID, host name, etc. You can then build a log viewer that implements slicing-and-dicing queries to locate interesting messages. ClickHouse is much faster and more cost-efficient than competing solutions like Loki or Elasticsearch.

Log messages often compress very well (>95%), so storage is not as much of an issue as you might think.

Disclaimer: I work for Altinity


What's your suggestion for doing it efficiently?

And what kind of volume is it that ClickHouse can't handle when Uber can?

https://eng.uber.com/logging/


ClickHouse's data and control planes are well defined, so many folks end up using S3 (or something like it) as a backing store. From what we've heard, this is what ClickHouse Cloud does behind the scenes: https://clickhouse.com/cloud


Why is this linking to a random markdown page?



Apologies for the confusion. We meant to link to our main readme: https://github.com/highlight/highlight/blob/main/README.md



