Crowdsourcing
Obtain information and ideas from a large group of people
Introduction to crowdsourcing and related concepts:
1. Crowdsourcing
Crowdsourcing is the practice of gathering information, ideas, services, or content from a large group of people, typically via the internet.
Examples of Crowdsourcing
- Wikipedia: Volunteers collaboratively create and edit content.
- Waze: Drivers share real-time traffic data to improve navigation.
- GoFundMe: People contribute small amounts of money to collectively fund larger projects.
2. Human Computation
Human computation refers to systems that combine human skills with computer processing power to solve complex problems.
Examples of Human Computation
- reCAPTCHA: Users help digitize text by verifying distorted words.
- Foldit: Players solve puzzles to help scientists understand protein folding.
- Amazon Mechanical Turk (MTurk): A platform where workers perform small tasks that AI struggles with, such as sentiment analysis or content moderation.
reCAPTCHA
Have you ever come across an image like this?
If so, you’ve unknowingly contributed to one of the largest human computation projects in the world. CAPTCHA technology helps transcribe content with almost no extra effort from the people solving it.
Every image is shown to multiple users, and once enough people have transcribed the text the same way, the computer accepts that answer as correct. In its first 4 years, reCAPTCHA transcribed the entire New York Times archive of over 13 million articles.
Machine learning (ML) and AI models have been trained on datasets created by human computation projects like reCAPTCHA, which has helped improve automated transcription. Nowadays, reCAPTCHA asks users to identify objects in images, and those answers help train computers to do the same.
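The agreement rule described above (show the same image to several users and accept an answer once enough of them match) can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual implementation: the function name `consensus_label` and the thresholds `min_votes` and `min_agreement` are assumptions made up for this example.

```python
from collections import Counter


def consensus_label(responses, min_votes=3, min_agreement=0.75):
    """Return the agreed transcription, or None if there is no consensus yet.

    min_votes and min_agreement are hypothetical thresholds chosen for
    this sketch; a real system would tune them.
    """
    # Normalize answers so "Morning" and " morning " count as the same vote.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough people have answered yet

    answer, votes = Counter(cleaned).most_common(1)[0]
    if votes / len(cleaned) >= min_agreement:
        return answer  # enough users agree, so store this answer
    return None  # users disagree too much, keep collecting answers


# Three of four users agree, which meets the 75% threshold.
print(consensus_label(["morning", "Morning", "morning ", "mourning"]))
# Only two answers so far, so no decision is made.
print(consensus_label(["morning", "mourning"]))
```

Requiring both a minimum number of votes and a minimum share of agreement keeps a single careless or malicious answer from being stored as correct.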
Click here to watch a video clip from one of the creators of the captcha (30 seconds)
Class Activity
Click the form below to fill out a CAPTCHA as a class!
Popcorn Hack
Find a website that uses reCAPTCHA.
3. Citizen Science and Public Datasets
Citizen science is a form of crowdsourcing where volunteers participate in scientific research by collecting, categorizing, or analyzing data.
Examples of Citizen Science
- Zooniverse: A platform where volunteers classify images for scientific projects.
- eBird: Birdwatchers contribute sightings, creating valuable migration data. Researchers use this data to graph trends over time.
- Globe at Night: Citizens measure light pollution by observing stars.
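As a rough illustration of how researchers might turn volunteer sightings into a trend over time, the sketch below adds up reports per year for one species. The records are made-up placeholder values, not real eBird data, and the record layout `(year, species, count)` and the function name `yearly_totals` are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical sighting records: (year, species, number of birds reported).
# These values are invented for illustration only.
sightings = [
    (2021, "barn swallow", 4),
    (2021, "barn swallow", 2),
    (2022, "barn swallow", 3),
    (2022, "house finch", 7),
    (2023, "barn swallow", 1),
]


def yearly_totals(records, species):
    """Sum the reported counts per year for one species."""
    totals = defaultdict(int)
    for year, name, count in records:
        if name == species:
            totals[year] += count
    # Sort by year so the result reads as a timeline.
    return dict(sorted(totals.items()))


# Print a simple text "graph": one # per bird reported that year.
for year, total in yearly_totals(sightings, "barn swallow").items():
    print(year, "#" * total)
```

Each volunteer contributes only a few rows, but once thousands of rows are combined the yearly totals start to show patterns that no single observer could see.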
Public datasets
Many databases, both digital and physical, hold large pools of data contributed by citizen scientists. With the Internet, more people can contribute data, and researchers or hobbyists can use that data in their own analyses.
- Kaggle datasets
- Google Public Datasets (instructions to access datasets here)
- Data.gov
Bias: The Dangers of Public Data
While public datasets can be useful for graphing trends, developing new technologies and innovations, conducting research, and more, they always run the risk of being biased and discriminatory.
Data comes from people, and people carry their own inherent biases. When that data is used to train AI or ML models, the models can produce biased results:
- Facial recognition software was found to be more likely to falsely report that Asian faces had their eyes closed than faces of other races
- Facial recognition technology used by law enforcement gave false positive matches for Black and Asian faces at a higher rate than for white faces because the tech was trained mainly on white faces
- When Amazon used software to sort through resumes, those mentioning the word “women’s” (women’s sports clubs, women’s colleges) were more likely to get discarded because the majority of past Amazon employees were male
When an AI trained on a specific dataset yields biased results, it’s important to stop and consider where those results may have come from.
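One way to "stop and consider" is to measure the error rate separately for each group instead of looking only at overall accuracy. The sketch below computes a false positive rate per group, in the spirit of the facial recognition examples above. The records are invented toy values, not measurements from any real system, and the names `false_positive_rates`, `group_a`, and `group_b` are assumptions for this example.

```python
def false_positive_rates(records):
    """Return the false positive rate for each group.

    records: iterable of (group, predicted_match, actual_match).
    A false positive is a predicted match that was not an actual match.
    """
    negatives = {}   # per group: cases where there was no real match
    false_pos = {}   # per group: of those, how many the system flagged
    for group, predicted, actual in records:
        if not actual:
            negatives[group] = negatives.get(group, 0) + 1
            if predicted:
                false_pos[group] = false_pos.get(group, 0) + 1
    return {g: false_pos.get(g, 0) / n for g, n in negatives.items()}


# Toy data, made up for illustration only.
records = (
    [("group_a", False, False)] * 3
    + [("group_a", True, False)] * 1
    + [("group_a", True, True)] * 1
    + [("group_b", False, False)] * 2
    + [("group_b", True, False)] * 2
    + [("group_b", True, True)] * 1
)

for group, rate in false_positive_rates(records).items():
    print(group, rate)
```

In this toy data the system is wrong twice as often for `group_b` as for `group_a`, a gap that a single overall accuracy number would hide.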
Popcorn Hack
Find another public dataset and describe its purpose. What do you think are the pros and cons of this dataset?
4. Volunteer Computing and Distributed Computing
Volunteer and distributed computing involve using the idle processing power of thousands of personal computers to perform large-scale computations.
Examples of Volunteer and Distributed Computing
- SETI@Home: Volunteers’ computers analyze radio signals in the search for extraterrestrial life.
- Folding@Home: Participants donate their computer power to simulate protein folding for disease research.
- BOINC (Berkeley Open Infrastructure for Network Computing): A platform for distributed computing projects across various fields.
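The core idea behind these projects, splitting one large job into small work units that many computers process independently, can be sketched as follows. This is a simplified single-machine simulation, not how BOINC or any of the projects above is implemented: the function names, the round-robin assignment, and the sum-of-squares task are all assumptions chosen to keep the example short.

```python
def make_work_units(start, stop, unit_size):
    """Split the range [start, stop) into small (low, high) work units."""
    return [(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]


def volunteer_compute(unit):
    """The work one volunteer's computer does: sum the squares in its unit."""
    lo, hi = unit
    return sum(n * n for n in range(lo, hi))


def run_project(start, stop, unit_size, volunteers):
    """Hand out work units round-robin and combine the partial results."""
    units = make_work_units(start, stop, unit_size)
    assignments = {name: [] for name in volunteers}
    total = 0
    for i, unit in enumerate(units):
        name = volunteers[i % len(volunteers)]  # next volunteer in turn
        assignments[name].append(unit)
        total += volunteer_compute(unit)  # simulated here on one machine
    return total, assignments


total, assignments = run_project(0, 1000, 128, ["alice", "bob", "carol"])
print(total)
for name, units in assignments.items():
    print(name, units)
```

Because each unit can be computed without knowing anything about the others, the server only has to hand out ranges and add up the answers. Real projects also have to handle volunteers who go offline or return wrong results, which this sketch leaves out.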
Benefits and Risks
Topic | Benefits | Risks |
---|---|---|
Crowdsourcing | - Aggregates collective knowledge, leading to innovative solutions. - Enhances efficiency by distributing tasks to many people. - Cost-effective for organizations. - Faster problem-solving with large-scale participation. | - Vulnerable to misinformation and vandalism. - Lack of diversity despite open access. - Prone to scams and fraud in crowdfunding platforms. |
Human Computation | - Solves problems that are difficult for machines but easy for humans (e.g., image recognition). - Efficient for processing large datasets through microtasks. - Improves machine learning algorithms by validating data. - Can utilize unpaid volunteers or paid microworkers. | - Potential for human error or bias in results. - Limited scalability for complex or large-scale problems. |
Citizen Science | - Broadens participation in scientific research, increasing data collection capacity. - Allows non-experts to contribute to real scientific projects. - Encourages public engagement with science. - Promotes scientific literacy and community involvement. | - Data quality may vary due to inconsistent or inaccurate contributions. - Requires proper verification to prevent false or misleading data. - Privacy concerns when participants share personal data. |
Volunteer Computing | - Harnesses idle computing power to solve complex problems. - Cost-effective access to large-scale processing power. - Accelerates scientific simulations and data analysis. | - Security risks, as distributed networks are vulnerable to hacking. - Reliability issues due to inconsistent user participation. |
Essential Knowledge from CB
Learning Objective | Description |
---|---|
IOC-1.E | Explain how people participate in problem-solving processes at scale. |
IOC-1.E.1 | Widespread access to information and public data facilitates the identification of problems, development of solutions, and dissemination of results. |
IOC-1.E.2 | Science has been affected by using distributed and “citizen science” to solve scientific problems. |
IOC-1.E.3 | Citizen science is scientific research conducted in whole or part by distributed individuals, many of whom may not be scientists, who contribute relevant data to research using their own computing devices. |
IOC-1.E.4 | Crowdsourcing is the practice of obtaining input or information from a large number of people via the Internet. |
IOC-1.E.5 | Human capabilities can be enhanced by collaboration via computing. |
IOC-1.E.6 | Crowdsourcing offers new models for collaboration, such as connecting businesses or social causes with funding. |
Hacks
Question 1
Explain the concept of crowdsourcing. Provide 2 examples (that have not been mentioned in this lesson) of how it is commonly used in different fields.
Question 2
Identify a real-world example of a successful crowdsourcing project. Explain the project, its goals, and the positive outcomes achieved through the collaboration of a large group of people.
Question 3
What are some drawbacks of crowdsourcing and why would certain groups denounce crowdsourcing? Provide specific example(s).
Question 4
Find a public data set that would work with your Pilot City project.
MCQ
- Which of the following is NOT an example of citizen science?
  - A) When perfecting their recommendation algorithms, Amazon looks at the data of its users to see what items users tend to buy at the same time.
  - B) In order to increase computing power, NASA creates a program that people can run on their personal computing devices when they have free RAM to enhance their supercomputing power.
  - C) Volunteers can watch a live camera feed of a canal in the Netherlands and press a button to open the gate to let fish through when they are trapped on one side.
  - D) When they see an invasive species in the wild, people can report their sightings on a website so researchers can use that data to map trends over time.
- Which of the following would benefit from crowdsourcing?
  - A) An online wiki about the popular anime Jojo’s Bizarre Adventure.
  - B) An online ticket purchasing system that allows people to buy tickets for the art museum.
  - C) A free online application that allows users to convert images from one format to another.
  - D) An automated inventory management system for a large warehouse.
- A software company wants to improve its product’s user interface. Which crowdsourcing approach would be most effective for gathering diverse feedback while minimizing development costs?
  - A) Hire professional UI consultants
  - B) Release a beta version with built-in feedback tools
  - C) Survey only existing customers via email
  - D) Conduct in-person focus groups