• 10 Posts
  • 98 Comments
Joined 2 years ago
Cake day: June 29th, 2023


  • This is true only if the decisions are made independently. If you let people decide after they’ve seen the metrics, it no longer holds.

    Here’s an example of the first. You go to a farmers’ market with a cow and ask everyone to write on a piece of paper what they think its weight is. If you collect the replies and average them, you’ll find the mean of all the answers is quite close to the real weight. A mix of non-experts and experts irons out a good answer somehow.

    Now take the average experience of going to a restaurant. One might have opened just recently, with great food and great staff, but only 5 reviews, averaging 3.8 or so. Another restaurant nearby has been open for 3-4 years and has 1000 reviews at maybe 3.9. People will usually pick the one with more reviews because the extra information makes it feel like the safer option. But if you hid the review counts and asked them to choose just by looking at the venue and the menu, they would probably pick the first one.

    Group dynamics are quite interesting, and the psychology behind this is quite funky sometimes :D There’s a rough simulation of the independence effect below.
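
    To make the independence point concrete, here’s a minimal sketch in Python (all the numbers are made up): independent guesses average out close to the true weight, while guesses anchored to the running crowd average stay dragged toward whatever the first few people said.

    ```python
    import random

    TRUE_WEIGHT = 650  # hypothetical cow weight in kg
    N = 500

    # Independent guesses: everyone estimates on their own, with their own error.
    independent = [TRUE_WEIGHT + random.gauss(0, 80) for _ in range(N)]

    # Anchored guesses: each person sees the running average of earlier guesses
    # and mostly copies it, adding only a little of their own judgement.
    anchored = [random.gauss(TRUE_WEIGHT * 1.15, 80)]  # the first guess happens to be high
    for _ in range(N - 1):
        crowd_avg = sum(anchored) / len(anchored)
        own_guess = TRUE_WEIGHT + random.gauss(0, 80)
        anchored.append(0.9 * crowd_avg + 0.1 * own_guess)

    print("independent mean:", round(sum(independent) / N))  # lands within a few kg of 650
    print("anchored mean:   ", round(sum(anchored) / N))     # stays noticeably high, pulled by the early guesses
    ```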



  • If you find that OCR doesn’t get you very far, maybe try a small VLM (vision-language model) to parse PNGs of the pages. For example, Nanonets OCR will do this, although it’s quite slow if you don’t have a GPU. It will give you a Markdown version of each page, which you can then translate with another tool.

    PaddleOCR might also be useful, since it focuses on Chinese, although it’s a bit more involved to set up (there’s a rough sketch of what that looks like below). Some other options are MinerU and Mistral OCR (the latter is paid, but you can test it for free if you upload the file to Mistral’s library).
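
    For reference, this is roughly what the PaddleOCR route looks like in Python. It’s a sketch against the classic 2.x API (the file name and `lang` setting are placeholders, and newer 3.x releases have changed the interface somewhat):

    ```python
    # pip install paddlepaddle paddleocr  (CPU build; for GPU use paddlepaddle-gpu)
    from paddleocr import PaddleOCR

    # lang="ch" loads the Chinese + English recognition models;
    # the first run downloads the model weights automatically.
    ocr = PaddleOCR(use_angle_cls=True, lang="ch")

    # Run detection + recognition on one exported page image.
    result = ocr.ocr("page_001.png", cls=True)

    # Each entry is (bounding box, (text, confidence)); print the recognized lines.
    for box, (text, confidence) in result[0]:
        print(f"{confidence:.2f}  {text}")
    ```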



  • You’re right! Sorry for the typo. The older nomic-embed-text model is often used in examples, but granite-embedding is more recent and smaller (30M parameters) for English-only text. If your use case is multilingual, they also offer a bigger one (278M parameters) that handles English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Simplified Chinese. I would test them out a bit to see which works best for you.

    Furthermore, if you’re not dependent on MariaDB for something else in your system, there are other vector databases I would recommend. Qdrant works quite well, and you can integrate it pretty easily into something like LangChain; there’s a rough sketch of a granite-embedding + Qdrant setup below. It really depends on how far you want to push your RAG workflow, but let me know if you have any other questions.
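
    As a starting point, here’s a minimal sketch of that kind of pipeline: embed a few snippets with granite-embedding through the Ollama Python client and store/search them in an in-memory Qdrant instance. The collection name and texts are just placeholders, and the exact client calls can differ a bit between versions of the ollama and qdrant-client packages.

    ```python
    # pip install ollama qdrant-client
    # assumes an Ollama server is running and `ollama pull granite-embedding` has been done
    import ollama
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    docs = [
        "MariaDB recently added vector search support.",
        "Qdrant is a dedicated vector database with payload filtering.",
        "granite-embedding is a 30M-parameter English embedding model.",
    ]

    # Embed all documents in one call; the response holds one vector per input string.
    vectors = ollama.embed(model="granite-embedding", input=docs)["embeddings"]

    # In-memory Qdrant instance, handy for testing before running a real server.
    client = QdrantClient(":memory:")
    client.create_collection(
        collection_name="notes",
        vectors_config=VectorParams(size=len(vectors[0]), distance=Distance.COSINE),
    )
    client.upsert(
        collection_name="notes",
        points=[
            PointStruct(id=i, vector=vec, payload={"text": doc})
            for i, (vec, doc) in enumerate(zip(vectors, docs))
        ],
    )

    # Query: embed the question and fetch the closest stored snippet.
    question = ollama.embed(model="granite-embedding", input=["which database handles vectors?"])
    hits = client.query_points(
        collection_name="notes",
        query=question["embeddings"][0],
        limit=1,
    )
    print(hits.points[0].payload["text"])
    ```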