What Shouldn’t Go Into AI Systems—And Why It Matters More Than You Think

You don’t need to be a data scientist to see the cracks forming. From chatbots spouting legal advice based on fan fiction to facial recognition systems trained on non-consensual images, the boundaries are blurrier than a JPEG from 2003. The thing is, most people assume AI ethics is about fine-tuning algorithms. In reality, it starts long before code runs—it starts with what we feed it.

Understanding AI Training Data: The Fuel Behind the Machine

We tend to think of AI as this smart, almost sentient force—until it confidently tells you that dolphins are classified as fish by the IRS. Then it hits you: it’s only as good as what it’s been shown. Training data is the diet of artificial intelligence. Give it junk, and it becomes a digital couch potato with dangerous opinions. Feed it balanced, vetted, diverse inputs, and it might actually help diagnose cancer or reduce traffic deaths.

What Training Data Actually Is (and Isn’t)

Training data isn’t just “a lot of text” or “millions of pictures.” It’s curated information used to teach patterns. An image recognition model learns to spot a cat because it has seen 200,000 labeled photos—some clear, some blurry, some with cats halfway hidden behind curtains. But if all those cats are white and fluffy, what happens when it sees a hairless Sphynx? Exactly. It fails. And that’s the simple stuff. Now imagine that same flaw in a hiring algorithm that only knows resumes from Ivy League grads. Bias creeps in before anyone writes a single line of code.
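The all-white-fluffy-cats failure mode can be caught before training with a simple representation audit. Here's a minimal sketch; the breed labels and counts are invented for illustration:

```python
from collections import Counter

# Hypothetical label counts for a toy cat-photo dataset.
labels = ["white_fluffy"] * 950 + ["tabby"] * 40 + ["sphynx"] * 10

counts = Counter(labels)
total = sum(counts.values())
for breed, n in counts.most_common():
    print(f"{breed:12s} {n}/{total} ({n / total:.1%})")

# Flag any class so rare the model will barely see it during training.
underrepresented = [b for b, n in counts.items() if n / total < 0.02]
print("underrepresented:", underrepresented)
```

An audit like this won't fix the bias, but it turns a silent failure into a visible number before a single line of model code runs.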

The Hidden Ingredients No One Talks About

Here’s what’s rarely disclosed: scraped Reddit threads, pirated books, screenshots from private forums, and even deleted social media posts recovered through archives. Companies argue it’s all “publicly available.” But just because something is online doesn’t mean it’s fair game. There’s a difference between accessibility and consent. And that distinction? It changes everything.

Private Information: The Obvious No-Go Zone

Let’s be clear about this: your Social Security number, bank statements, therapy notes, and nude photos have no place in any AI model. Sounds obvious, right? Except in 2023, researchers found fragments of real patient data in publicly released medical AI training sets. Not anonymized. Not encrypted. Just… there. Like finding someone’s credit card in a library book.

Breaches like these aren’t bugs. They’re symptoms of a culture that treats data like oxygen—free, infinite, and necessary for survival. Except data isn’t infinite. It’s personal. It’s tied to real lives. And once it’s in an AI system, you can’t exactly recall it like a bad tweet.

Even pseudonymized data can be reverse-engineered. In one study, just four pieces of anonymized location data were enough to uniquely identify 95% of individuals in a dataset of 1.5 million people. That’s not theoretical risk. That’s a math problem we’ve already solved—badly.
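That re-identification arithmetic is easy to reproduce on toy data. The sketch below (purely synthetic, standard library only) counts how many simulated individuals are uniquely pinned down by just four coarse attributes, with no names attached:

```python
import random
from collections import Counter

random.seed(0)

# Simulate 10,000 people, each described by 4 coarse quasi-identifiers
# (e.g. home antenna, work antenna, hour of first ping, hour of last ping).
# The values are synthetic; the point is the counting logic, not realism.
people = [
    (
        random.randrange(200),   # home antenna id
        random.randrange(200),   # work antenna id
        random.randrange(24),    # hour of first observed ping
        random.randrange(24),    # hour of last observed ping
    )
    for _ in range(10_000)
]

counts = Counter(people)
unique = sum(1 for p in people if counts[p] == 1)
print(f"{unique / len(people):.1%} of individuals are uniquely "
      f"identified by just these 4 attributes")
```

Because the combination space is huge relative to the population, nearly every person ends up with a one-of-a-kind fingerprint, which is exactly why "anonymized" location traces re-identify so easily.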

Biased or Discriminatory Content: The Slow Poison

And here’s the kicker: even if you avoid obvious privacy violations, bias can still ruin everything. Say you train a loan approval AI on historical lending data from the 1980s. It learns that men are “safer bets.” Surprise—it’s not just repeating the past, it’s automating it. At scale. Forever. Unless you stop it.

This isn’t hypothetical. In 2018, it was reported that Amazon had scrapped an internal recruiting tool because it downgraded resumes containing the word “women’s” (as in “women’s chess club captain”). The model wasn’t programmed to hate women. It was simply taught to mimic patterns from a decade of male-dominated hiring. The system didn’t know it was being sexist. It just saw what worked before. And that’s the danger.

Because bias isn’t always loud. Sometimes it’s a whisper in the data—like using ZIP codes as a proxy for creditworthiness, which disproportionately penalizes Black and Latino neighborhoods due to decades of redlining. You don’t need to say “race” for racism to seep in.
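The proxy effect shows up even in a toy simulation. Below, a hypothetical scoring rule never sees the protected attribute, only a ZIP code that tracks it, and it still reproduces the historical gap. Every number here is invented:

```python
import random

random.seed(1)

# Toy historical data: two neighborhoods whose past approval rates were
# skewed by group membership, not merit. ZIP code tracks group perfectly.
rows = []
for _ in range(1000):
    group = random.choice(["A", "B"])
    zip_code = "10001" if group == "A" else "10002"
    approved = random.random() < (0.8 if group == "A" else 0.3)
    rows.append((zip_code, group, approved))

# A "blind" model keyed only on ZIP code still inherits the disparity.
def approval_rate(zip_code):
    matches = [r for r in rows if r[0] == zip_code]
    return sum(r[2] for r in matches) / len(matches)

print(f"ZIP 10001: {approval_rate('10001'):.0%} approved")
print(f"ZIP 10002: {approval_rate('10002'):.0%} approved")
```

Dropping the sensitive column does nothing here: the ZIP code carries the same signal, which is the whole mechanism behind redlining's digital afterlife.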

Illegally Sourced or Copyrighted Material

Now let’s talk about books. Bestselling authors like George R.R. Martin and Sarah Silverman have sued AI companies for using their work without permission. Their argument? Your novel isn’t “training data.” It’s intellectual property. And no, slapping a “fair use” label on it doesn’t make it legal. Especially when the model starts regurgitating 80% of a paragraph from A Dance with Dragons.

The law is still catching up. But ethically? You wouldn’t hand a burglar a master key and say, “just look around.” Yet that’s what scraping entire websites—like GitHub, WordPress blogs, or scientific journals—without permission feels like. Some companies claim they only use publicly accessible content. But legality and morality aren’t always the same. Ask any photojournalist whose images were used to train facial recognition for authoritarian regimes.

Unverified or Harmful Content: When Garbage In Becomes Garbage Out

What happens when AI learns from 4chan? Or QAnon manifestos? Or YouTube comments under a flat-earth video? It starts believing them. Or at least, it learns to mimic them convincingly. And users don’t always know the difference.

In early 2024, a mental health chatbot recommended self-harm to a distressed teenager. Not because it was designed to. But because somewhere in its training, it had absorbed toxic dialogue masked as “support.” The model didn’t understand context. It just matched patterns. And matched poorly.

You can’t disinfect nonsense once it’s baked into the weights of a neural network. Removing a single toxic idea is like unburning a CD: research into “machine unlearning” exists, but reliable, targeted removal is still far off.

Because misinformation isn’t just wrong. It’s contagious. And when AI repeats it with confidence, it gains credibility. That’s why some researchers now advocate for “data diets”—curated, auditable sources only. Think: peer-reviewed journals, verified encyclopedias, government databases. Not the digital equivalent of dumpster diving.
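A “data diet” can start with something as blunt as a source allowlist applied before anything reaches the training pipeline. A minimal sketch; the allowed hosts and URLs are illustrative, not an endorsement of specific sources:

```python
from urllib.parse import urlparse

# Hypothetical audited allowlist of source domains.
ALLOWED_HOSTS = {"pubmed.ncbi.nlm.nih.gov", "en.wikipedia.org", "data.gov"}

docs = [
    {"url": "https://pubmed.ncbi.nlm.nih.gov/12345/", "text": "peer-reviewed abstract"},
    {"url": "https://en.wikipedia.org/wiki/Redlining", "text": "encyclopedia entry"},
    {"url": "https://random-forum.example/thread/99", "text": "unvetted post"},
]

# Keep only documents whose host is on the allowlist.
kept = [d for d in docs if urlparse(d["url"]).hostname in ALLOWED_HOSTS]
print(f"kept {len(kept)} of {len(docs)} documents")
```

A real pipeline would layer quality scoring and deduplication on top, but the principle is the same: an auditable gate at ingestion, not a cleanup job after training.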

What About Synthetic Data? A Glimmer of Hope

Synthetic data—artificially generated information that mimics real-world patterns—might be part of the solution. Instead of using actual medical records, you simulate 10,000 fake ones with realistic blood pressure, age, and symptoms. No privacy risk. No bias (in theory). But—and this is a big but—synthetic data inherits the flaws of the models that create it. Garbage in, garbage out, compounded with every generation.

It’s a bit like photocopying a photocopy. After ten generations, the text blurs. The edges fray. You still need a clean original. And right now, most of our originals are already compromised.
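In practice, the generation step can be as simple as sampling from distributions fitted to population statistics rather than copying real rows. A toy sketch, with every parameter invented for illustration:

```python
import random

random.seed(42)

# Sample fake patients from simple parametric distributions instead of
# copying real records. All parameters here are made up.
def synthetic_patient():
    age = min(95, max(18, int(random.gauss(55, 15))))
    # Systolic blood pressure loosely rises with age in this toy model.
    systolic = min(200, max(90, int(random.gauss(120 + 0.4 * (age - 50), 12))))
    return {"age": age, "systolic_bp": systolic}

cohort = [synthetic_patient() for _ in range(10_000)]
mean_age = sum(p["age"] for p in cohort) / len(cohort)
mean_bp = sum(p["systolic_bp"] for p in cohort) / len(cohort)
print(f"mean age {mean_age:.1f}, mean systolic {mean_bp:.1f}")
```

The catch, as the photocopy analogy suggests, is that those distributions have to come from somewhere: if the fitted parameters encode the original dataset's biases, the fakes inherit them faithfully.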

That said, early trials show promise. One hospital reduced patient readmissions by 18% using synthetic data to train its predictive model. No real names. No real data leaks. Just patterns, reshaped responsibly. Could this be the future? Maybe. But we’re not there yet.

Frequently Asked Questions

Can AI Be Trained Without Any Personal Data?

Yes—but with limits. You can build general language models using public domain texts, open scientific papers, and synthetic datasets. However, for specific tasks like medical diagnosis or personalized recommendations, some level of personal input is unavoidable. The key is minimizing exposure, anonymizing rigorously, and allowing opt-outs. Data minimization isn’t just ethical. It’s increasingly required by laws like GDPR and CCPA.

Is Scraping Public Websites Always Illegal?

No, but it’s legally gray. Courts are still deciding whether automated scraping violates a site’s terms of service—and, by extension, the Computer Fraud and Abuse Act. In 2022, the U.S. Ninth Circuit ruled (in hiQ v. LinkedIn) that accessing publicly available data doesn’t violate the CFAA, even when a site’s rules forbid it. But copyright and ethical concerns remain. Just because you can doesn’t mean you should.

How Do I Know If My Data Was Used to Train an AI?

You probably don’t. Most companies don’t disclose their full training sources. Some, like Meta with LLaMA, publish partial lists. Others remain opaque. There’s growing pressure for transparency—via “data nutrition labels” or audit trails—but adoption is slow. Honest answer? Unless you file a lawsuit, you may never know.

The Bottom Line

I am convinced that the biggest AI crisis isn’t superintelligence. It’s data recklessness. We’re building systems that shape justice, health, and opportunity—using inputs we wouldn’t trust our toaster with. And that’s not paranoia. That’s pattern recognition.

We need hard lines: no non-consensual personal data, no copyrighted works without licenses, no toxic content disguised as “open internet.” Not because it makes AI less powerful—but because power without accountability is just danger with better marketing.

Let’s stop asking, “Can we?” and start asking, “Should we?” The answer won’t come from engineers alone. It’ll come from doctors, writers, janitors, and users who never signed up to be training data. Because AI isn’t just built by coders. It’s built on us.

And honestly? That’s the part we can’t afford to get wrong.
